Complex data access through the Apache Spark SQL API.

The main functionality of the Spark SQL Connector is to allow the execution of Spark jobs that extract structured data using Spark SQL capabilities. Version 1.0.0 allows a user to submit a job (defined as a SQL query) to a Spark standalone cluster and retrieve the results as a collection of entities. A typical use case is defining a context with sources such as a JSON file on Hadoop, a Cassandra database, and a PostgreSQL database, then executing a SQL query (a Spark SQL job) that joins and filters those data sources and produces, as a result, a list of objects with the requested data.
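
The connector wraps plain Spark SQL calls, so the use case above corresponds roughly to the Spark 1.6 Scala sketch below. The Cassandra source, which would go through the DataStax spark-cassandra-connector, is omitted for brevity; all paths, table names, and connection settings are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Sketch only: the kind of Spark SQL job the connector submits.
    // Every path, table name, and connection setting is hypothetical.
    object JoinExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("join-example"))
        val sqlContext = new SQLContext(sc)

        // JSON file on HDFS, registered as a temporary table
        sqlContext.read.json("hdfs:///data/events.json").registerTempTable("events")

        // PostgreSQL table exposed through the generic JDBC data source
        sqlContext.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/sales")
          .option("dbtable", "customers")
          .load()
          .registerTempTable("customers")

        // One SQL query that joins and filters both sources
        val result = sqlContext.sql(
          """SELECT c.name, e.amount
            |FROM customers c JOIN events e ON c.id = e.customer_id
            |WHERE e.amount > 100""".stripMargin)

        // The connector would return these rows as the message payload
        result.collect().foreach(println)
      }
    }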

V1.0.0 - 01/02/2017

Version V1.0.0 Compatibility (tested)

Software          Version
Mule Runtime      EE 3.6.x - 3.8.3
Anypoint Studio   6.0 or later
Apache Spark      1.6.2 with Scala 2.10 or 2.11

Features

  1. Data sources - globally configure CSV, JSON, Avro, and Parquet files on Hadoop HDFS (1), JDBC-compatible databases (2), and Cassandra databases (3) as data sources (see the first sketch after this list).

  2. Execute SQL - inside a flow, retrieve a DataFrame by running a SQL query (org.apache.spark.sql.SQLContext#sql(String)) on a Spark cluster and transform it into an array or iterator in the Mule message payload (sqlSelect and customSql processors; see the second sketch after this list).

  3. Dynamically manage data sources - add data sources at runtime (addJdbcDataProperties, addFileDataProperties and addCassandraDataProperties processors).

  4. Temporary tables - register (sqlSelect processor) and remove (dropTemporaryTable processor) temporary tables, usable within SQL queries.
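
For orientation, here is a sketch of how the three data-source types map onto plain Spark 1.6 read calls. The CSV and Avro formats come from the external spark-csv and spark-avro packages in Spark 1.x, the Cassandra format from the DataStax spark-cassandra-connector, and every path, keyspace, and option value below is hypothetical.

    import org.apache.spark.sql.SQLContext

    // Sketch only: one Spark 1.6 read call per supported source type.
    // Assumes an existing SQLContext; all paths and options are hypothetical.
    def registerSources(sqlContext: SQLContext): Unit = {
      // (1) Files on HDFS: Parquet and JSON are built in; CSV and Avro
      // need the spark-csv and spark-avro packages on Spark 1.x
      sqlContext.read.parquet("hdfs:///data/sales.parquet").registerTempTable("sales")
      sqlContext.read.json("hdfs:///data/events.json").registerTempTable("events")
      sqlContext.read.format("com.databricks.spark.csv")
        .option("header", "true")
        .load("hdfs:///data/users.csv")
        .registerTempTable("users")
      sqlContext.read.format("com.databricks.spark.avro")
        .load("hdfs:///data/clicks.avro")
        .registerTempTable("clicks")

      // (2) Any JDBC-compatible database
      sqlContext.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/sales")
        .option("dbtable", "customers")
        .load()
        .registerTempTable("customers")

      // (3) Cassandra, via the DataStax spark-cassandra-connector
      sqlContext.read.format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "shop", "table" -> "orders"))
        .load()
        .registerTempTable("orders")
    }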
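
Likewise, a sketch of the Spark 1.6 calls behind the execute-SQL and temporary-table features, assuming the hypothetical tables registered above:

    import org.apache.spark.sql.{Row, SQLContext}

    // Sketch only: what sqlSelect/customSql and dropTemporaryTable
    // reduce to in plain Spark 1.6. Table names are hypothetical.
    def runQuery(sqlContext: SQLContext): Unit = {
      // Run the query (org.apache.spark.sql.SQLContext#sql(String))
      val df = sqlContext.sql(
        "SELECT c.name, o.total FROM customers c JOIN orders o ON c.id = o.customer_id")

      // Materialise the result as an array (small result sets)...
      val rows: Array[Row] = df.collect()

      // ...or consume it partition by partition as a local iterator
      val it: Iterator[Row] = df.rdd.toLocalIterator

      // Register the result itself as a temporary table for later queries,
      // then drop it when it is no longer needed
      df.registerTempTable("joined_result")
      sqlContext.dropTempTable("joined_result")
    }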

Support Resources

  • Learn how to Install Anypoint Connectors using Anypoint Exchange.

  • Access MuleSoft’s MuleForge Forum to pose questions and get help from Mule’s broad community of users.

  • To access MuleSoft’s expert support team, subscribe to Mule ESB Enterprise and log in to MuleSoft’s Customer Portal.