SQL API

  • Combine all your DataSets and express your entire logic as a single SQL statement via the SQL API.
  • In a single SQL statement, join data from systems such as Kafka, Elasticsearch, HBase, Hive, and Teradata (more to come); see the sketch after this list.
  • Execute your SQL in batch mode or continuous streaming mode.
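
As an illustration of combining systems in one statement, the sketch below joins a Kafka dataset with a Hive dataset in a single batch SQL. The dataset pcatalog.flights_kafka appears later on this page; pcatalog.airports_hive and the column names are hypothetical placeholders, not real catalog entries.

// A minimal sketch: join a Kafka dataset with a Hive dataset in one SQL.
// pcatalog.airports_hive and the join columns are illustrative only.
val joinedDf = gimel.sql.GimelQueryProcessor.executeBatch(
  """SELECT k.flight_number, h.airport_name
    |FROM pcatalog.flights_kafka k
    |JOIN pcatalog.airports_hive h
    |ON k.origin_code = h.airport_code""".stripMargin,
  sparkSession)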

Controlling SQL output and behaviour

| Property for SQL results | Notes |
| --- | --- |
| gimel.query.results.show.rows.only | Set to "true" to print only the result rows, suppressing the surrounding informational messages. (Default: false) |
| gimel.query.results.show.rows.threshold | Maximum number of rows to display in interactive mode. (Default: 1000) |
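
Both properties can be set on the SparkSession before a query runs; a minimal sketch (the threshold value is illustrative):

// Show only result rows, and cap interactive display at 200 rows
sparkSession.conf.set("gimel.query.results.show.rows.only", "true")
sparkSession.conf.set("gimel.query.results.show.rows.threshold", "200")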

SQL on Scala (spark-shell | spark-submit)

Execute Batch SQL - Spark Shell / Program

// Show only result rows (can also be set via SQL, as at the end of this block)
sparkSession.conf.set("gimel.query.results.show.rows.only", "true")

// Kafka throttling, checkpointing, and logging options, set via SQL
sparkSession.sql("set gimel.kafka.throttle.streaming.window.seconds=20")      // streaming window size, in seconds
sparkSession.sql("set gimel.kafka.throttle.streaming.parallelism.factor=20")  // parallelism factor for streaming reads
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.save=true")         // save Kafka read checkpoints
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.clear=false")       // keep (do not clear) existing checkpoints
sparkSession.sql("set gimel.kafka.throttle.batch.fetchRowsOnFirstRun=100")    // rows to fetch on the first batch run
sparkSession.sql("set gimel.kafka.throttle.batch.maxRecordsPerPartition=50")  // cap on records per partition per batch
sparkSession.sql("set gimel.logging.level=INFO")                              // Gimel logging verbosity
sparkSession.sql("set gimel.query.results.show.rows.only=true")               // print only result rows

// Run a batch SQL against the Kafka dataset
val df = gimel.sql.GimelQueryProcessor.executeBatch("SELECT count(*) FROM pcatalog.flights_kafka", sparkSession)
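
As the val df assignment suggests, executeBatch returns a regular Spark DataFrame, so the usual DataFrame operations apply; a minimal follow-up:

// Inspect the result like any other Spark DataFrame
df.printSchema()
df.show()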

Execute Stream SQL - Spark Shell / Program

// Continuously read from the Kafka dataset and write the aggregate into Elasticsearch
gimel.sql.GimelQueryProcessor.executeStream("INSERT INTO pcatalog.flights_elastic SELECT count(*) FROM pcatalog.flights_kafka", sparkSession)
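
In streaming mode, this call is expected to run continuously, repeatedly reading micro-batches from the Kafka dataset and writing the results into Elasticsearch; with gimel.kafka.throttle.streaming.window.seconds=20 set as shown earlier, each micro-batch window would span roughly 20 seconds.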