Configure a Hadoop Cluster
Parallel MATLAB® code that contains tall (MATLAB) arrays and mapreduce (MATLAB) functions can be submitted to the Hadoop cluster from suitably configured MATLAB clients.
To configure the client to run MATLAB code on the cluster, you must already be able to submit to the cluster from the intended client machine. The client machine must have a Hadoop® installation that can access the cluster outside of MATLAB.
Many Hadoop distributions do not support direct access to Linux®-based clusters from Windows® clients. Users of Windows clients typically need to set up a Linux gateway node that can be accessed from the Windows client via SSH or VNC. The cluster can then be accessed from this gateway node.
Cluster Configuration
Integrate MATLAB Parallel Server™ with your cluster infrastructure. For instructions, see Install and Configure MATLAB Parallel Server for Third-Party Schedulers.
If your cluster requires Kerberos authentication, ensure your MATLAB Parallel Server installations have been configured correctly. For instructions, see Kerberos Authentication.
Client Configuration
Ensure your client can access the Hadoop cluster outside MATLAB.
Ensure your client MATLAB installation has been configured for Kerberos authentication if your cluster requires it. For instructions, see Kerberos Authentication.
To access the cluster from within MATLAB, set up a parallel.cluster.Hadoop (Parallel Computing Toolbox) object using the following statements.
setenv('HADOOP_HOME', '/path/to/hadoop/install')
cluster = parallel.cluster.Hadoop;
Use mapreducer (MATLAB) to specify that mapreduce runs on the Hadoop cluster object.
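The client setup steps above can be sketched end to end as follows. This is a minimal illustration rather than a definitive configuration: the install path and the HDFS input and output locations are placeholders, and countMapper and countReducer are hypothetical functions, assumed here to be saved in their own files on the MATLAB path.

```matlab
% Point MATLAB at the local Hadoop installation (placeholder path).
setenv('HADOOP_HOME', '/path/to/hadoop/install');
cluster = parallel.cluster.Hadoop;

% Make the Hadoop cluster the execution environment for mapreduce.
mr = mapreducer(cluster);

% Hypothetical input: delimited text files already stored in HDFS.
ds = datastore('hdfs:///path/to/input/*.csv');

% Hypothetical output location in HDFS.
outputFolder = 'hdfs:///path/to/output';
result = mapreduce(ds, @countMapper, @countReducer, mr, ...
    'OutputFolder', outputFolder);

% Read the key-value results back to the client.
tbl = readall(result);

% --- countMapper.m (hypothetical) ---
function countMapper(data, ~, intermKVStore)
    % data is a table holding one chunk of the input; emit its row count.
    add(intermKVStore, 'rows', height(data));
end

% --- countReducer.m (hypothetical) ---
function countReducer(key, intermValIter, outKVStore)
    % Sum the per-chunk row counts emitted by the mapper.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```

Passing mr explicitly to mapreduce keeps the choice of execution environment visible at the call site, which can help when the same session also runs local mapreduce jobs.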
For examples of how to run parallel MATLAB code on your Hadoop cluster, see Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox) and Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox).
Kerberos Authentication
If the cluster uses Kerberos authentication that requires the Oracle® Java® Cryptography Extension, you must configure all installations of MATLAB and MATLAB Parallel Server. If you are using Hortonworks® or Cloudera® distributions, it is likely that you need to complete these configuration steps.
The configuration instructions are the same for client and worker MATLAB installations.
Starting in R2018b, configure your MATLAB installation by enabling the appropriate security policy in the Java installation.
In the MATLAB Editor, open the file
${MATLAB_ROOT}/sys/java/jre/${ARCH}/jre/lib/security/java.security
Change the line
#crypto.policy=unlimited
to
crypto.policy=unlimited
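One way to confirm that the policy change took effect is to ask the Java runtime bundled with MATLAB for the largest AES key length it permits. The sketch below relies on the standard Java method javax.crypto.Cipher.getMaxAllowedKeyLength; it assumes MATLAB has been restarted after editing java.security, since the JVM is expected to read this file at startup.

```matlab
% Query the JVM that MATLAB is running for its maximum permitted AES key length.
maxKeyLength = javaMethod('getMaxAllowedKeyLength', 'javax.crypto.Cipher', 'AES');

% With the unlimited policy active, the JVM reports Integer.MAX_VALUE (2147483647).
if maxKeyLength == 2147483647
    disp('Unlimited-strength cryptography policy is active.');
else
    fprintf('Policy is limited: maximum AES key length is %d bits.\n', maxKeyLength);
end
```

Because the instructions are the same for client and worker installations, the same check can be run on each of them.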
For previous releases, you must download additional security files from Oracle.
Download the Oracle Java Cryptography Extension zip file from the Oracle Java SE page.
Unzip the downloaded zip file into a temporary folder.
Replace the files local_policy.jar and US_export_policy.jar in the folder ${MATLABROOT}/sys/java/jre/${ARCH}/jre/lib/security with the downloaded versions.
Hadoop Version Support
MATLAB mapreduce is supported on Hadoop 2.x clusters. Support for Hadoop 1.x clusters has been removed.
MATLAB tall arrays are supported on Spark™ enabled Hadoop 2.x clusters. The client can run on any supported architecture, while the cluster must run on Linux or Mac. Cross-platform combinations of client and cluster are supported.
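For tall arrays on a Spark enabled Hadoop cluster, the cluster object also needs to know where Spark is installed. The following is a minimal sketch under stated assumptions: both install paths and the HDFS data location are placeholders, and the HadoopInstallFolder and SparkInstallFolder properties are set in the constructor instead of through HADOOP_HOME.

```matlab
% Placeholder install locations; replace with the paths on your client.
cluster = parallel.cluster.Hadoop( ...
    'HadoopInstallFolder', '/path/to/hadoop/install', ...
    'SparkInstallFolder', '/path/to/spark/install');

% Use the Spark enabled Hadoop cluster as the tall array execution environment.
mapreducer(cluster);

% Hypothetical input: delimited text files already stored in HDFS.
ds = datastore('hdfs:///path/to/data/*.csv');
tt = tall(ds);

% Tall operations are deferred; gather triggers evaluation on the cluster.
numRows = gather(size(tt, 1));
```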
Functionality | Result | Use Instead | Compatibility Considerations |
---|---|---|---|
Support for running MATLAB mapreduce on Hadoop 1.x clusters has been removed. | Errors | Use clusters that have Hadoop 2.x installed to run MATLAB mapreduce. | Migrate MATLAB mapreduce code that runs on Hadoop 1.x to Hadoop 2.x. |
See Also
parallel.cluster.Hadoop (Parallel Computing Toolbox)
Related Topics
- Install and Configure MATLAB Parallel Server for Third-Party Schedulers
- Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox)
- Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox)
- Read and Analyze Hadoop Sequence File (MATLAB)