Main Content

Deploy Applications Using the MATLAB API for Spark

Create and execute MATLAB® applications against Spark™ using the MATLAB API for Spark

Supported Platform: Linux® only.

Using the MATLAB API for Spark to deploy an application consists of two parts:

  • Creating your application using the MATLAB API for Spark and packaging it as a standalone application in the MATLAB desktop environment.

  • Executing the standalone application against a Spark-enabled cluster from a Linux shell.

While creating your application using the MATLAB API for Spark, you can use Spark functions such as flatMap, mapPartitions, aggregate, and others in your MATLAB code. The API exposes the Spark programming model to MATLAB, allowing for MATLAB implementations of numerous Spark functions. Many of these MATLAB implementations accept function handles or anonymous functions as inputs to perform various types of analyses.
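For example, the following sketch passes anonymous functions to two RDD methods. It assumes an RDD of numeric values named numbersRDD has already been created from a SparkContext (as outlined in the workflow below); the variable names are placeholders, and the map and reduce methods belong to the matlab.compiler.mlspark.RDD interface listed under Classes.

    % Assumes numbersRDD is an existing matlab.compiler.mlspark.RDD
    % of numeric values, created from a SparkContext as shown later.

    % Pass an anonymous function to map to square each element.
    squaredRDD = numbersRDD.map(@(x) x.^2);

    % Pass an anonymous function to reduce to sum the squared values.
    total = squaredRDD.reduce(@(a, b) a + b);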

The API lets you interactively run your application from within the MATLAB desktop environment in a nondistributed mode on a single machine. A second MATLAB session on the same machine serves as a worker. This functionality can be helpful in debugging your application before deploying it to a Spark-enabled cluster. You must configure your MATLAB environment for interactive debugging using the MATLAB API for Spark. For more information, see Configure Environment for Interactive Debugging.
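As a minimal illustration, an interactive, nondistributed run typically differs from a cluster run only in the master URL passed to the SparkConf object described below; the 'local[1]' value here refers to Spark's single-worker local mode and is shown as an assumption, not a required setting.

    % Configure the application to run nondistributed on the local
    % machine, which is useful for interactive debugging.
    conf = matlab.compiler.mlspark.SparkConf( ...
        'AppName', 'myApp', ...
        'Master', 'local[1]');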

The general workflow for using the MATLAB API for Spark is as follows; a code sketch illustrating these steps appears after the list:

  1. Specify Spark properties.

  2. Create a SparkConf object.

  3. Create a SparkContext object.

  4. Create an RDD object from the data.

  5. Perform operations on the RDD object.
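The following word-count style sketch walks through the five steps. The cluster master URL, HDFS path, application name, and Spark property values are placeholders that you would replace with values for your own cluster; the class and method names follow the matlab.compiler.mlspark interfaces listed under Classes below.

    % Step 1: Specify Spark properties as key-value pairs.
    sparkProperties = containers.Map( ...
        {'spark.executor.cores', 'spark.executor.memory'}, ...
        {'1', '2g'});

    % Step 2: Create a SparkConf object.
    conf = matlab.compiler.mlspark.SparkConf( ...
        'AppName', 'myApp', ...
        'Master', 'yarn-client', ...
        'SparkProperties', sparkProperties);

    % Step 3: Create a SparkContext object from the configuration.
    sc = matlab.compiler.mlspark.SparkContext(conf);

    % Step 4: Create an RDD from a text file stored on the cluster.
    linesRDD = sc.textFile('hdfs://hadoop01:54310/user/data/input.txt');

    % Step 5: Perform operations on the RDD: split lines into words,
    % pair each word with a count of 1, sum the counts per word, and
    % collect the result back to the MATLAB process.
    wordsRDD  = linesRDD.flatMap(@(line) strsplit(line, ' '));
    pairsRDD  = wordsRDD.map(@(word) {word, 1});
    countsRDD = pairsRDD.reduceByKey(@(acc, value) acc + value, 4);
    result    = countsRDD.collect();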

You can package an application created with this API into a standalone application using the mcc command or deploytool. You can then run the application on a Spark-enabled cluster from a Linux shell.
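For instance, packaging from the MATLAB command prompt might look like the following; the application name myApp.m is an assumption for illustration. On Linux, mcc also generates a run_myApp.sh shell script, which you launch from a Linux shell, passing the MATLAB Runtime location followed by any application arguments.

    % Package the Spark application in myApp.m (a hypothetical file)
    % as a standalone application.
    mcc -m myApp.m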

Note

MATLAB applications developed using the MATLAB API for Spark cannot be deployed if they contain tall arrays.

For a complete example, see Example on Deploying Applications to Spark Using the MATLAB API for Spark. You can follow the same instructions to deploy applications created using the MATLAB API for Spark to Cloudera® CDH.

Classes

matlab.compiler.mlspark.SparkConf      Interface class to configure an application with Spark parameters as key-value pairs
matlab.compiler.mlspark.SparkContext   Interface class to initialize a connection to a Spark-enabled cluster
matlab.compiler.mlspark.RDD            Interface class to represent a Spark Resilient Distributed Dataset (RDD)

Topics