Deploy Applications Using the MATLAB API for Spark
Supported Platform: Linux® only.
Using the MATLAB API for Spark to deploy an application consists of two parts:
Creating your application using the MATLAB API for Spark and packaging it as a standalone application in the MATLAB desktop environment.
Executing the standalone application against a Spark enabled cluster from a Linux shell.
While creating your application using the MATLAB API for Spark, you can use Spark functions such as flatMap, mapPartitions, aggregate, and others in your MATLAB code. The API exposes the Spark programming model to MATLAB, allowing for MATLAB implementations of numerous Spark functions. Many of these MATLAB implementations accept function handles or anonymous functions as inputs to perform various types of analyses.
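As a minimal sketch of passing function handles to Spark functions, the snippet below applies an anonymous function with flatMap and a stored function handle with map. The application name, the 'local[1]' master setting, and the sample data are illustrative assumptions, and the code presumes a MATLAB session already configured for the MATLAB API for Spark.

```matlab
% Sketch only: 'handleSketch', 'local[1]', and the sample data are assumptions.
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'handleSketch', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

% Create an RDD from a small cell array.
letters = sc.parallelize({'A', 'B'});

% Anonymous function as input: each element expands into two elements.
expanded = letters.flatMap(@(x) {x, 1});

% Stored function handle as input: applied to every element of an RDD.
doubleIt = @(x) 2 * x;
numbers = sc.parallelize({1, 2, 3});
doubled = numbers.map(doubleIt);

% Bring the results back into the MATLAB session for inspection.
expandedResult = expanded.collect();
doubledResult = doubled.collect();

% Release the connection when finished.
delete(sc)
```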
The API lets you interactively run your application from within the MATLAB desktop environment in a nondistributed mode on a single machine. A second MATLAB session on the same machine serves as a worker. This functionality can be helpful in debugging your application prior to deploying it on a Spark enabled cluster. It is necessary to configure your MATLAB environment for interactive debugging using the MATLAB API for Spark. For more information, see Configure Environment for Interactive Debugging.
The general workflow for using the MATLAB API for Spark is as follows:
Specify Spark properties.
Create a SparkConf object.
Create a SparkContext object.
Create an RDD object from the data.
Perform operations on the RDD object.
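The five steps above can be sketched as follows. This is an illustrative outline, not a complete application: the property value, the application name 'workflowSketch', the 'local[1]' master setting, and the sample data are assumptions chosen for a single-machine run.

```matlab
% Step 1: specify Spark properties as key-value pairs (value is an assumption).
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

% Step 2: create a SparkConf object ('workflowSketch' and 'local[1]' are assumptions).
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'workflowSketch', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp);

% Step 3: create a SparkContext object from the configuration.
sc = matlab.compiler.mlspark.SparkContext(conf);

% Step 4: create an RDD object from the data (here, a small cell array).
rdd = sc.parallelize({1, 2, 3, 4, 5});

% Step 5: perform operations on the RDD object.
squared = rdd.map(@(x) x^2);            % transform each element
total = squared.reduce(@(a, b) a + b);  % combine elements into one value
disp(total)

% Release the connection when finished.
delete(sc)
```

When targeting a Spark enabled cluster rather than a single machine, the master setting and Spark properties would need to match that cluster's configuration.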
You can package an application created with this API into a standalone application using the mcc command or deploytool. You can then run the application on a Spark enabled cluster from a Linux shell.
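As a rough sketch of the packaging step, the mcc command can be issued from the MATLAB command prompt. The file name deployApp.m is hypothetical, and the runtime path shown in the comment is a placeholder.

```matlab
% Package the hypothetical file deployApp.m as a standalone application.
% The -m option requests a standalone application.
mcc -m deployApp.m

% On Linux, mcc also generates a shell script wrapper next to the executable.
% From a Linux shell, the application would then be launched along these lines,
% where <MATLAB_Runtime_path> is a placeholder for the runtime location:
%   ./run_deployApp.sh <MATLAB_Runtime_path>
```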
Note
MATLAB applications developed using the MATLAB API for Spark cannot be deployed if they contain tall arrays.
For a complete example, see Example on Deploying Applications to Spark Using the MATLAB API for Spark. You can follow the same instructions to deploy applications created using the MATLAB API for Spark to Cloudera® CDH.
Classes
matlab.compiler.mlspark.SparkConf | Interface class to configure an application with Spark parameters as key-value pairs
matlab.compiler.mlspark.SparkContext | Interface class to initialize a connection to a Spark enabled cluster
matlab.compiler.mlspark.RDD | Interface class to represent a Spark Resilient Distributed Dataset (RDD)
Topics
- Configure Environment for Interactive Debugging
Configure your MATLAB environment to interactively make calls and debug your application using the MATLAB API for Spark.
- Apache Spark Basics
Learn basic Apache Spark™ concepts and see how these concepts relate to deploying MATLAB applications to Spark.
- Example on Deploying Applications to Spark Using the MATLAB API for Spark
Complete example showing how to deploy an application to Spark using the MATLAB API for Spark.