Complex data access through the Apache Spark SQL API.

The main functionality of the Spark SQL Connector is to allow the execution of Spark jobs that extract structured data using Spark SQL capabilities. Version 1.0.0 allows a user to submit a job (defined as a SQL query) to a Spark standalone cluster and retrieve the results as a collection of entities. A typical use case is defining a context with sources such as a JSON file on Hadoop, a Cassandra database, and a PostgreSQL database, then executing a SQL query (a Spark SQL job) that joins and filters those data sources and produces, as a result, a list of objects with the requested data.
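
The connector wraps plain Spark SQL calls, so the use case above corresponds roughly to the Spark 1.6 Scala sketch below. The Cassandra source, which would go through the DataStax spark-cassandra-connector, is omitted for brevity; all paths, table names, and connection settings are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Sketch only: the kind of Spark SQL job the connector submits.
    // Every path, table name, and connection setting is hypothetical.
    object JoinExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("join-example"))
        val sqlContext = new SQLContext(sc)

        // JSON file on HDFS, registered as a temporary table
        sqlContext.read.json("hdfs:///data/events.json").registerTempTable("events")

        // PostgreSQL table exposed through the generic JDBC data source
        sqlContext.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/sales")
          .option("dbtable", "customers")
          .load()
          .registerTempTable("customers")

        // One SQL query that joins and filters both sources
        val result = sqlContext.sql(
          """SELECT c.name, e.amount
            |FROM customers c JOIN events e ON c.id = e.customer_id
            |WHERE e.amount > 100""".stripMargin)

        // The connector would return these rows as the message payload
        result.collect().foreach(println)
      }
    }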

V1.0.0 - 01/02/2017

Version V1.0.0 Compatibility (tested)

Software          Version
Mule Runtime      EE 3.6.x - 3.8.3
Anypoint Studio   6.0 or later
Apache Spark      1.6.2 with Scala 2.10 or 2.11

Features

  1. Data sources - globally configure CSV, JSON, Avro, and Parquet files on Hadoop HDFS (1), JDBC-compatible databases (2), and Cassandra databases (3) as data sources (see the first sketch after this list).

  2. Execute SQL - inside a flow, retrieve a DataFrame by running a SQL query (org.apache.spark.sql.SQLContext#sql(String)) on a Spark cluster and transform it into an array or iterator in the Mule message payload (sqlSelect and customSql processors; see the second sketch after this list).

  3. Dynamically manage data sources - add data sources at runtime (addJdbcDataProperties, addFileDataProperties and addCassandraDataProperties processors).

  4. Temporary tables - register (sqlSelect processor) and remove (dropTemporaryTable processor) temporary tables, usable within SQL queries.
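
For orientation, here is a sketch of how the three data-source types map onto plain Spark 1.6 read calls. The CSV and Avro formats come from the external spark-csv and spark-avro packages in Spark 1.x, the Cassandra format from the DataStax spark-cassandra-connector, and every path, keyspace, and option value below is hypothetical.

    import org.apache.spark.sql.SQLContext

    // Sketch only: one Spark 1.6 read call per supported source type.
    // Assumes an existing SQLContext; all paths and options are hypothetical.
    def registerSources(sqlContext: SQLContext): Unit = {
      // (1) Files on HDFS: Parquet and JSON are built in; CSV and Avro
      // need the spark-csv and spark-avro packages on Spark 1.x
      sqlContext.read.parquet("hdfs:///data/sales.parquet").registerTempTable("sales")
      sqlContext.read.json("hdfs:///data/events.json").registerTempTable("events")
      sqlContext.read.format("com.databricks.spark.csv")
        .option("header", "true")
        .load("hdfs:///data/users.csv")
        .registerTempTable("users")
      sqlContext.read.format("com.databricks.spark.avro")
        .load("hdfs:///data/clicks.avro")
        .registerTempTable("clicks")

      // (2) Any JDBC-compatible database
      sqlContext.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/sales")
        .option("dbtable", "customers")
        .load()
        .registerTempTable("customers")

      // (3) Cassandra, via the DataStax spark-cassandra-connector
      sqlContext.read.format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "shop", "table" -> "orders"))
        .load()
        .registerTempTable("orders")
    }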
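
Likewise, a sketch of the Spark 1.6 calls behind the execute-SQL and temporary-table features, assuming the hypothetical tables registered above:

    import org.apache.spark.sql.{Row, SQLContext}

    // Sketch only: what sqlSelect/customSql and dropTemporaryTable
    // reduce to in plain Spark 1.6. Table names are hypothetical.
    def runQuery(sqlContext: SQLContext): Unit = {
      // Run the query (org.apache.spark.sql.SQLContext#sql(String))
      val df = sqlContext.sql(
        "SELECT c.name, o.total FROM customers c JOIN orders o ON c.id = o.customer_id")

      // Materialise the result as an array (small result sets)...
      val rows: Array[Row] = df.collect()

      // ...or consume it partition by partition as a local iterator
      val it: Iterator[Row] = df.rdd.toLocalIterator

      // Register the result itself as a temporary table for later queries,
      // then drop it when it is no longer needed
      df.registerTempTable("joined_result")
      sqlContext.dropTempTable("joined_result")
    }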

Support Resources

  • Learn how to Install Anypoint Connectors using Anypoint Exchange.

  • Access MuleSoft’s MuleForge Forum to pose questions and get help from Mule’s broad community of users.

  • To access MuleSoft’s expert support team, subscribe to Mule ESB Enterprise and log in to MuleSoft’s Customer Portal.