Automated Classifier Selection with Bayesian Optimization

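This example shows how to use fitcauto to automatically try a selection of classification model types with different hyperparameter values, given training predictor and response data. The function uses Bayesian optimization to select models and their hyperparameter values, and computes the cross-validation classification error for each model. After the optimization is complete, fitcauto returns the model, trained on the entire data set, that is expected to best classify new data. The example then checks the model performance on test data.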

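Load Sample Data

This example uses the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau and can be used to predict whether an individual earns more than $50,000 per year.

Load the sample data census1994, which contains the training data adultdata and the test data adulttest. Preview the first few rows of the training data set.

load census1994
head(adultdata)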
ans=8×15 table
age    workClass           fnlwgt        education    education_num    marital_status           occupation           relationship     race     sex       capital_gain    capital_loss    hours_per_week    native_country    salary
___    ________________    __________    _________    _____________    _____________________    _________________    _____________    _____    ______    ____________    ____________    ______________    ______________    ______
39     State-gov           77516         Bachelors    13               Never-married            Adm-clerical         Not-in-family    White    Male      2174            0               40                United-States     <=50K
50     Self-emp-not-inc    83311         Bachelors    13               Married-civ-spouse       Exec-managerial      Husband          White    Male      0               0               13                United-States     <=50K
38     Private             2.1565e+05    HS-grad      9                Divorced                 Handlers-cleaners    Not-in-family    White    Male      0               0               40                United-States     <=50K
53     Private             2.3472e+05    11th         7                Married-civ-spouse       Handlers-cleaners    Husband          Black    Male      0               0               40                United-States     <=50K
28     Private             3.3841e+05    Bachelors    13               Married-civ-spouse       Prof-specialty       Wife             Black    Female    0               0               40                Cuba              <=50K
37     Private             2.8458e+05    Masters      14               Married-civ-spouse       Exec-managerial      Wife             White    Female    0               0               40                United-States     <=50K
49     Private             1.6019e+05    9th          5                Married-spouse-absent    Other-service        Not-in-family    Black    Female    0               0               16                Jamaica           <=50K
52     Self-emp-not-inc    2.0964e+05    HS-grad      9                Married-civ-spouse       Exec-managerial      Husband          White    Male      0               0               45                United-States     >50K

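Each row contains the demographic information for one adult. The last column, salary, shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year.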

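Use Automated Model Selection

Use fitcauto to automatically find an appropriate classifier for the data in adultdata. Set the observation weights, and specify to run the Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Because parallel timing is not reproducible, parallel Bayesian optimization does not necessarily yield reproducible results.

Because of the complexity of the optimization, this process can take some time, especially for larger data sets. By default, fitcauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.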
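Where repeatable results matter more than speed, one option is a serial run with a fixed random seed and a smaller search budget. The sketch below is an illustration under assumptions, not the configuration used in this example: the seed, the budget of 30 evaluations, the restriction to tree and naive Bayes learners, and the variable names mdlSerial and resultsSerial are all arbitrary choices.

```matlab
% Illustrative serial variant (assumed settings, not the example's own call).
load census1994

rng('default')  % fix the random seed so the serial search can be repeated

% Assumed options: no parallel pool, a reduced budget, and no optimization plot.
options = struct('UseParallel',false, ...
    'MaxObjectiveEvaluations',30, ...
    'ShowPlots',false);

% Restrict the search to two fast learner types (an arbitrary choice).
[mdlSerial,resultsSerial] = fitcauto(adultdata,'salary', ...
    'Weights','fnlwgt', ...
    'Learners',{'tree','nb'}, ...
    'HyperparameterOptimizationOptions',options);
```

A smaller budget and learner set shortens the search but also narrows what the optimizer can find. The example itself continues with the parallel configuration.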

options = struct('UseParallel',true);
[mdl,results] = fitcauto(adultdata,'salary','Weights','fnlwgt', ...
    'HyperparameterOptimizationOptions',options);
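Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
Copying objective function to workers...
Done copying objective function to workers.
Total iterations (MaxObjectiveEvaluations): 90
Total time (MaxTime): Inf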
|===========================================================================================================================================|
| Iter | Active workers | Eval result | Validation loss | Time for training & validation (sec) | Observed min validation loss | Estimated min validation loss | Learner | Hyperparameter: Value |
|===========================================================================================================================================|
| 1 | 6 | Best | 0.16287 | 4.3468 | 0.16287 | 0.16287 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 2 | 5 | Accept | 0.14389 | 6.1049 | 0.14162 | 0.14287 | tree | MinLeafSize: 21 |
| 3 | 5 | Best | 0.14162 | 5.6195 | 0.14162 | 0.14287 | tree | MinLeafSize: 50 |
| 4 | 6 | Accept | 0.15626 | 74.156 | 0.14162 | 0.14287 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 283 |
| | | | | | | | | MinLeafSize: 7330 |
| 5 | 6 | Accept | 0.15603 | 77.293 | 0.14162 | 0.14287 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 295 |
| | | | | | | | | MinLeafSize: 3 |
| 6 | 6 | Accept | 0.16027 | 5.6224 | 0.14162 | 0.14842 | tree | MinLeafSize: 5 |
| 7 | 6 | Accept | 0.17343 | 8.6209 | 0.14162 | 0.15576 | tree | MinLeafSize: 2 |
| 8 | 6 | Accept | 0.15103 | 4.8867 | 0.14162 | 0.15392 | tree | MinLeafSize: 8 |
| 9 | 6 | Accept | 0.17642 | 1.1808 | 0.14162 | 0.15449 | tree | MinLeafSize: 1663 |
| 10 | 6 | Accept | 0.15927 | 5.0734 | 0.14162 | 0.15343 | tree | MinLeafSize: 6 |
| 11 | 6 | Accept | 0.17009 | 1.6504 | 0.14162 | 0.15533 | tree | MinLeafSize: 1272 |
| 12 | 6 | Accept | 0.17869 | 1.0308 | 0.14162 | 0.154 | tree | MinLeafSize: 2744 |
| 13 | 6 | Accept | 0.17961 | 116.64 | 0.14162 | 0.154 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 274.23 |
| 14 | 5 | Accept | 0.15128 | 118.36 | 0.14162 | 0.15383 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 241 |
| | | | | | | | | MinLeafSize: 23 |
| 15 | 5 | Accept | 0.15177 | 115.42 | 0.14162 | 0.15383 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 235 |
| | | | | | | | | MinLeafSize: 40 |
| 16 | 5 | Accept | 0.15116 | 115.49 | 0.14162 | 0.15326 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 235 |
| | | | | | | | | MinLeafSize: 40 |
| 17 | 6 | Accept | 0.14887 | 63.412 | 0.14162 | 0.15326 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 0.56014 |
| 18 | 6 | Accept | 0.17869 | 0.89318 | 0.14162 | 0.15219 | tree | MinLeafSize: 2712 |
| 19 | 6 | Accept | 0.17676 | 59.781 | 0.14162 | 0.15219 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 208 |
| | | | | | | | | MinLeafSize: 4208 |
| 20 | 6 | Accept | 0.15086 | 81.42 | 0.14162 | 0.15219 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 2.4778 |
| 21 | 6 | Accept | 0.16287 | 0.64656 | 0.14162 | 0.15219 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 22 | 6 | Accept | 0.14943 | 75.578 | 0.14162 | 0.15219 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 1.6195 |
| 23 | 6 | Accept | 0.16287 | 0.49489 | 0.14162 | 0.15219 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 24 | 6 | Accept | 0.14926 | 68.642 | 0.14162 | 0.15219 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 1.2371 |
| 25 | 6 | Accept | 0.16287 | 0.5124 | 0.14162 | 0.15219 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 26 | 6 | Accept | 0.15609 | 58.267 | 0.14162 | 0.15219 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 247 |
| | | | | | | | | MinLeafSize: 1 |
| 27 | 6 | Accept | 0.16287 | 0.93385 | 0.14162 | 0.15219 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 28 | 6 | Accept | 0.15554 | 4.3668 | 0.14162 | 0.15067 | tree | MinLeafSize: 7 |
| 29 | 6 | Accept | 0.15087 | 127.01 | 0.14162 | 0.15067 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 289 |
| | | | | | | | | MinLeafSize: 9 |
| 30 | 6 | Accept | 0.15142 | 127.39 | 0.14162 | 0.15067 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 289 |
| | | | | | | | | MinLeafSize: 9 |
| 31 | 6 | Accept | 0.14177 | 2.6306 | 0.14162 | 0.14707 | tree | MinLeafSize: 116 |
| 32 | 6 | Accept | 0.16287 | 1.1225 | 0.14162 | 0.14707 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 33 | 6 | Accept | 0.15737 | 56.258 | 0.14162 | 0.14707 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 233 |
| | | | | | | | | MinLeafSize: 5308 |
| 34 | 6 | Accept | 0.15158 | 97.559 | 0.14162 | 0.14707 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 214 |
| | | | | | | | | MinLeafSize: 133 |
| 35 | 6 | Accept | 0.1719 | 96.392 | 0.14162 | 0.14707 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 223 |
| | | | | | | | | MinLeafSize: 1526 |
| 36 | 6 | Accept | 0.16287 | 0.42054 | 0.14162 | 0.14707 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 37 | 6 | Accept | 0.14441 | 3.5932 | 0.14162 | 0.14598 | tree | MinLeafSize: 18 |
| 38 | 6 | Accept | 0.16287 | 0.34693 | 0.14162 | 0.14598 | nb | DistributionNames: normal |
| | | | | | | | | Width: NaN |
| 39 | 6 | Accept | 0.14432 | 3.4661 | 0.14162 | 0.145 | tree | MinLeafSize: 19 |
| 40 | 6 | Accept | 0.14291 | 2.3121 | 0.14162 | 0.14321 | tree | MinLeafSize: 231 |
| 41 | 6 | Accept | 0.15278 | 96.086 | 0.14162 | 0.14321 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 3.5668 |
| 42 | 6 | Accept | 0.15068 | 1.9847 | 0.14162 | 0.14348 | tree | MinLeafSize: 412 |
| 43 | 6 | Accept | 0.14705 | 2.1122 | 0.14162 | 0.14343 | tree | MinLeafSize: 305 |
| 44 | 6 | Accept | 0.14186 | 2.3835 | 0.14162 | 0.14309 | tree | MinLeafSize: 168 |
| 45 | 6 | Accept | 0.16209 | 1.9821 | 0.14162 | 0.14302 | tree | MinLeafSize: 573 |
| 46 | 5 | Accept | 0.15783 | 53.627 | 0.14135 | 0.14271 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 211 |
| | | | | | | | | MinLeafSize: 125 |
| 47 | 5 | Best | 0.14135 | 3.1329 | 0.14135 | 0.14271 | tree | MinLeafSize: 63 |
| 48 | 4 | Accept | 0.15637 | 63.578 | 0.14135 | 0.14236 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 252 |
| | | | | | | | | MinLeafSize: 485 |
| 49 | 4 | Accept | 0.1448 | 2.1012 | 0.14135 | 0.14236 | tree | MinLeafSize: 263 |
| 50 | 3 | Accept | 0.1513 | 114.35 | 0.14135 | 0.14224 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 253 |
| | | | | | | | | MinLeafSize: 13 |
| 51 | 3 | Accept | 0.14271 | 2.2737 | 0.14135 | 0.14224 | tree | MinLeafSize: 133 |
| 52 | 6 | Accept | 0.14349 | 1.9707 | 0.14135 | 0.14224 | tree | MinLeafSize: 199 |
| 53 | 3 | Accept | 0.15337 | 1.6887 | 0.14135 | 0.14235 | tree | MinLeafSize: 441 |
| 54 | 3 | Accept | 0.17869 | 1.049 | 0.14135 | 0.14235 | tree | MinLeafSize: 1821 |
| 55 | 3 | Accept | 0.1785 | 0.9639 | 0.14135 | 0.14235 | tree | MinLeafSize: 3523 |
| 56 | 3 | Accept | 0.18062 | 0.63917 | 0.14135 | 0.14235 | tree | MinLeafSize: 4359 |
| 57 | 6 | Accept | 0.14673 | 3.2067 | 0.14135 | 0.14207 | tree | MinLeafSize: 12 |
| 58 | 6 | Accept | 0.14238 | 2.3081 | 0.14135 | 0.14215 | tree | MinLeafSize: 177 |
| 59 | 5 | Accept | 0.16352 | 125.94 | 0.14135 | 0.1419 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 297 |
| | | | | | | | | MinLeafSize: 823 |
| 60 | 5 | Accept | 0.14162 | 2.849 | 0.14135 | 0.1419 | tree | MinLeafSize: 50 |
| 61 | 5 | Best | 0.14113 | 2.6499 | 0.14113 | 0.14173 | tree | MinLeafSize: 83 |
| 62 | 5 | Accept | 0.14178 | 2.9853 | 0.14113 | 0.14153 | tree | MinLeafSize: 40 |
| 63 | 5 | Accept | 0.14157 | 2.8701 | 0.14113 | 0.14153 | tree | MinLeafSize: 42 |
| 64 | 5 | Accept | 0.15886 | 1.7188 | 0.14113 | 0.14161 | tree | MinLeafSize: 532 |
| 65 | 5 | Accept | 0.14529 | 3.6593 | 0.14113 | 0.14151 | tree | MinLeafSize: 14 |
| 66 | 4 | Accept | 0.23856 | 41.472 | 0.14113 | 0.14151 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 209 |
| | | | | | | | | MinLeafSize: 8676 |
| 67 | … | Accept | 0.14702 | 4.0559 | 0.14113 | 0.14151 | tree | MinLeafSize: 10 |
| 68 | 4 | Best | 0.14058 | 2.8472 | 0.14058 | 0.14148 | tree | MinLeafSize: 30 |
| 69 | 4 | Accept | 0.14168 | 2.1868 | 0.14058 | 0.14143 | tree | MinLeafSize: 112 |
| 70 | 4 | Accept | 0.14072 | 2.9698 | 0.14058 | 0.14144 | tree | MinLeafSize: 28 |
| 71 | 4 | Accept | 0.14117 | 2.8824 | 0.14058 | 0.14114 | tree | MinLeafSize: 29 |
| 72 | 4 | Best | 0.14046 | 2.8853 | 0.14046 | 0.14112 | tree | MinLeafSize: 25 |
| 73 | 4 | Accept | 0.14184 | 2.8532 | 0.14046 | 0.14103 | tree | MinLeafSize: 24 |
| 74 | 4 | Accept | 0.14112 | 2.7998 | 0.14046 | 0.14102 | tree | MinLeafSize: 33 |
| 75 | 4 | Accept | 0.14331 | 3.0835 | 0.14046 | 0.141 | tree | MinLeafSize: 23 |
| 76 | 4 | Accept | 0.14089 | 2.9637 | 0.14046 | 0.14086 | tree | MinLeafSize: 31 |
| 77 | 4 | Accept | 0.14046 | 3.0017 | 0.14046 | 0.14083 | tree | MinLeafSize: 25 |
| 78 | 3 | Accept | 0.15093 | 91.952 | 0.14046 | 0.14085 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 222 |
| | | | | | | | | MinLeafSize: 27 |
| 79 | … | Accept | 0.14046 | 2.9993 | 0.14046 | 0.14085 | tree | MinLeafSize: 25 |
| 80 | 6 | Accept | 0.14046 | 2.7739 | 0.14046 | 0.14073 | tree | MinLeafSize: 25 |
| 81 | 2 | Accept | 0.18178 | 101.13 | 0.14046 | 0.14068 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 868.86 |
| 82 | … | Accept | 0.14184 | 3.2218 | 0.14046 | 0.14068 | tree | MinLeafSize: 24 |
| 83 | … | Accept | 0.17807 | 0.82685 | 0.14046 | 0.14068 | tree | MinLeafSize: 3874 |
| 84 | … | Accept | 0.15989 | 1.8729 | 0.14046 | 0.14068 | tree | MinLeafSize: … |
| 85 | … | … | 0.15103 | 3.8835 | 0.14046 | 0.14068 | tree | MinLeafSize: 8 |
| 86 | 6 | Accept | 0.14046 | 2.5909 | 0.14046 | 0.14067 | tree | MinLeafSize: 25 |
| 87 | 6 | Accept | 0.14331 | 3.5433 | 0.14046 | 0.14067 | tree | MinLeafSize: 23 |
| 88 | 6 | Accept | 0.23856 | 47.904 | 0.14046 | 0.14067 | ensemble | Method: Bag |
| | | | | | | | | NumLearningCycles: 258 |
| | | | | | | | | MinLeafSize: 12543 |
| 89 | 6 | Accept | 0.14914 | 59.665 | 0.14046 | 0.14067 | nb | DistributionNames: kernel |
| | | | | | | | | Width: 0.37688 |
| 90 | 6 | Accept | 0.15604 | 68.731 | 0.14046 | 0.14067 | ensemble | Method: LogitBoost |
| | | | | | | | | NumLearningCycles: 262 |
| | | | | | | | | MinLeafSize: 2 |

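__________________________________________________________
Optimization completed.
Total iterations: 90
Total elapsed time: 577.1419 seconds
Total time for training and validation: 2558.1542 seconds

Best observed learner is a tree model with:
    MinLeafSize: 25
Observed validation loss: 0.14046
Time for training and validation: 2.8853 seconds

Best estimated learner (returned model) is a tree model with:
    MinLeafSize: 25
Estimated validation loss: 0.14067
Estimated time for training and validation: 2.8824 seconds

Documentation for fitcauto display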

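The final model returned by fitcauto corresponds to the best estimated learner. Before returning the model, the function retrains it using the entire training data (adultdata), the listed Learner (or model) type, and the displayed hyperparameter values.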
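To make the retraining step concrete, the sketch below fits a decision tree directly with the reported hyperparameter value. It is only an approximation under assumptions: it takes MinLeafSize = 25 from the summary above, uses the same response and weight variables, and leaves every other fitctree setting at its default, which may not match what fitcauto does internally. The name treeMdl is hypothetical.

```matlab
% Approximate manual equivalent of the returned model (assumed settings).
load census1994

% Train a single classification tree on the full training table, using
% fnlwgt as observation weights and the MinLeafSize reported by the search.
treeMdl = fitctree(adultdata,'salary', ...
    'Weights','fnlwgt', ...
    'MinLeafSize',25);
```

Fitting the model this way skips the search entirely, so it is useful mainly for inspecting or reproducing one specific configuration.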

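Evaluate Test Set Performance

Evaluate the performance of the returned model mdl on the test set adulttest by using a confusion matrix and a receiver operating characteristic (ROC) curve.

Find the predicted labels and score values for the test set.

[labels,scores] = predict(mdl,adulttest);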

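Create a confusion matrix from the test set results. The diagonal elements indicate the number of correctly classified instances of a given class. The off-diagonal elements count the misclassified observations.

confusionchart(adulttest.salary,labels)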

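Compute the test set classification accuracy. The accuracy is the percentage of correctly classified test set observations.

accuracy = (1-loss(mdl,adulttest,'salary'))*100
accuracy = 85.1513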
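The link between a confusion matrix and accuracy can be checked on a small made-up example. The labels below are invented for illustration and are not drawn from the census data; the sketch computes plain, unweighted accuracy as the share of observations on the diagonal.

```matlab
% Toy labels (invented for illustration, not taken from adulttest).
trueLabels = categorical(["<=50K";"<=50K";">50K";">50K";"<=50K"]);
predLabels = categorical(["<=50K";">50K";">50K";">50K";"<=50K"]);

% Rows of C are true classes, columns are predicted classes,
% both ordered as listed in classOrder.
[C,classOrder] = confusionmat(trueLabels,predLabels);

% Unweighted accuracy: correctly classified observations (the diagonal)
% divided by all observations, expressed as a percentage.
toyAccuracy = 100*sum(diag(C))/sum(C(:));
```

Here four of the five toy observations sit on the diagonal, so toyAccuracy is 80.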

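To plot the ROC curve for the score values corresponding to the label '<=50K', find the column of scores that corresponds to that label. The column order in scores matches the order of the classes in the trained model.

mdl.ClassNames
ans = 2×1 categorical
     <=50K
     >50K

Because '<=50K' is listed first, the first column of scores corresponds to that label.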
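Rather than reading the column position off the display, the index can also be looked up by name. The sketch below uses stand-in values: classNames mimics mdl.ClassNames and toyScores is an invented score matrix, so the variable names and numbers are assumptions for illustration only.

```matlab
% Stand-ins for mdl.ClassNames and the scores returned by predict.
classNames = categorical(["<=50K";">50K"]);
toyScores = [0.9 0.1; 0.3 0.7];

% Find which column belongs to the label of interest, then extract it.
col = find(classNames == "<=50K");
labelScores = toyScores(:,col);
```

Looking up the column this way keeps later code correct if the class order ever differs from what was expected.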

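Plot the ROC curve and compute the area under the curve (AUC). The ROC curve shows the true positive rate versus the false positive rate for different thresholds of the classifier output. For a perfect classifier, whose true positive rate is always 1 regardless of the threshold, AUC = 1. For a binary classifier that randomly assigns observations to classes, AUC = 0.5. A large AUC value (close to 1) indicates good classifier performance.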
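The two reference values can be illustrated with perfcurve on invented data. In this sketch the labels and scores are made up: one score vector separates the classes perfectly, and the other is constant, which serves here as a simple stand-in for an uninformative classifier rather than a truly random one.

```matlab
% Invented binary labels: 1 is treated as the positive class.
toyLabels = [1;1;1;0;0;0];

% Scores that rank every positive above every negative.
perfectScores = [0.9;0.8;0.7;0.3;0.2;0.1];

% Scores that carry no information about the class.
constantScores = 0.5*ones(6,1);

% The fourth output of perfcurve is the area under the ROC curve.
[~,~,~,aucPerfect] = perfcurve(toyLabels,perfectScores,1);
[~,~,~,aucConstant] = perfcurve(toyLabels,constantScores,1);
```

With these inputs aucPerfect should equal 1, and aucConstant should come out at 0.5 because the only available thresholds classify everything as positive or everything as negative.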

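[X,Y,~,AUC] = perfcurve(adulttest.salary,scores(:,1),'<=50K');
plot(X,Y)
title('ROC Curve')
xlabel('False Positive Rate')
ylabel('True Positive Rate')

AUC
AUC = 0.8947

Based on the accuracy and AUC values, the classifier performs well on the test data.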
