Is there a way to safely launch multiple instances of MATLAB Compiler code at the same time?

PierreMB on 29 Jul 2020
Commented: Leos Pohl on 22 Jul 2021
Hi,
I run functions compiled with MATLAB Compiler on high-performance computing servers. I'm having issues when I launch several jobs of my compiled code that use parallelization at the same time. For instance, yesterday I launched 12 jobs of 4 workers each simultaneously, and 3 of them failed. The error messages for these jobs are:
Job #1
Failed to locate and destroy old interactive jobs.
Error using parallel.Job/delete (line 1295)
The job storage metadata file '/home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/matlab_metadata.mat' does not exist or is corrupt. For assistance recovering job data, contact MathWorks Support Team. Otherwise, delete all files in the JobStorageLocation and try again.
Job #11
Failed to start pool.
Error using save
Unable to write file /home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/Job12.in.mat: No such file or directory.
Job #12
Failed to start pool.
Error using parallel.Cluster/createConcurrentJob (line 1136)
Cannot write file /home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/Job12.in.mat.
My guess is that when MATLAB or the MATLAB Runtime creates the parallel pool, it reads, writes, and deletes temporary job files in a storage folder shared by every task, and the concurrent jobs conflict over those files.
Is there a way for me or my server administrator to fix this? It happens quite often, and it is annoying to have to restart every failed job each time.
Note that I tried adding a 45 s delay between task launches, but I still get these errors. The server's job scheduler is Slurm.
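A minimal sketch of one possible workaround, assuming the conflicts really come from the shared job storage folder visible in the paths above (.../local_cluster_jobs/R2018b): give each Slurm job its own JobStorageLocation before starting the pool. parcluster, the JobStorageLocation property, and parpool are standard Parallel Computing Toolbox names; the folder naming scheme and the fallback used when SLURM_JOB_ID is unset are hypothetical choices for illustration, not a documented recipe.

```matlab
% Sketch, assuming each compiled task runs as its own Slurm job and that
% Slurm sets SLURM_JOB_ID. The folder naming below is hypothetical.
c = parcluster('local');

% Build a per-job storage folder so concurrent jobs never share
% matlab_metadata.mat or JobN.in.mat files.
jobId = getenv('SLURM_JOB_ID');
if isempty(jobId)
    % Fall back to a random unique name when not running under Slurm.
    [~, jobId] = fileparts(tempname);
end
storageDir = fullfile(tempdir, ['matlab_job_storage_' jobId]);
if ~exist(storageDir, 'dir')
    mkdir(storageDir);
end
c.JobStorageLocation = storageDir;

% Start the pool on the isolated cluster object (4 workers, as in the question).
pool = parpool(c, 4);

% ... parallel work (parfor, spmd, etc.) goes here ...

% Shut the pool down, then remove the per-job folder and its contents.
delete(pool);
rmdir(storageDir, 's');
```

Because each job would then write only to its own folder, the launch delay should no longer matter, although this has not been checked against the failures above. Setting the MCR_CACHE_ROOT environment variable to a per-job folder in the Slurm batch script may give similar isolation, since the failing paths sit under the MATLAB Runtime cache (~/.mcrCache9.5).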

