
Binning Explorer Case Study Example

This example shows how to use the Binning Explorer app. Use Binning Explorer to bin the data, plot the binned data information, and export a creditscorecard object. Then use the creditscorecard object with functions from Financial Toolbox™ to fit a logistic regression model, determine a score for the data, determine the probabilities of default, and validate the credit scorecard model using three different metrics.

Step 1. Load credit scorecard data into the MATLAB workspace.

Use the CreditCardData.mat file to load the data into the MATLAB® workspace (using a dataset from Refaat 2011).

load CreditCardData
disp(data(1:10,:))
    CustID    CustAge    TmAtAddress    ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    __________    _________    __________    _______    _______    _________    ________    ______
    1         53         62             Tenant        Unknown      50000         55         Yes        1055.9       0.22        0
    2         61         22             Home Owner    Employed     52000         25         Yes        1161.6       0.24        0
    3         47         30             Tenant        Employed     37000         61         No         877.23       0.29        0
    4         50         75             Home Owner    Employed     53000         20         Yes        157.37       0.08        0
    5         68         56             Home Owner    Employed     53000         14         Yes        561.84       0.11        0
    6         65         13             Home Owner    Employed     48000         59         Yes        968.18       0.15        0
    7         34         32             Home Owner    Unknown      32000         26         Yes        717.82       0.02        1
    8         50         57             Other         Employed     51000         33         No         3041.2       0.13        0
    9         50         10             Tenant        Unknown      52000         25         Yes        115.56       0.02        1
    10        49         30             Home Owner    Unknown      53000         23         Yes        718.5        0.17        1

Step 2. Import the data into Binning Explorer.

Open Binning Explorer from the MATLAB toolstrip: On the Apps tab, under Computational Finance, click the app icon. Alternatively, you can enter binningExplorer on the MATLAB command line. For more information on starting Binning Explorer from the command line, see Start from MATLAB Command Line Using Data or an Existing creditscorecard Object.

From the Binning Explorer toolstrip, select Import Data to open the Import Data window.

Import Data dialog box

Under Step 1, select data.

Under Step 2, optionally set the Variable Type for each of the predictors. By default, the last column in the data ('status' in this example) is set to 'Response'. The response value with the highest count (0 in this example) is set to 'Good'. All other variables are considered predictors. However, in this example, because 'CustID' is not a predictor, set the Variable Type for 'CustID' to Do not include.

Note

If the input MATLAB table contains a column for weights, from the Step 2 pane, use the Variable Type column and click the drop-down list to select Weights. For more information on using observation weights with a creditscorecard object, see Credit Scorecard Modeling Using Observation Weights.

If the data contains missing values, from the Step 2 pane, set Bin missing data: to Yes. For more information on working with missing data, see Credit Scorecard Modeling with Missing Values.

Under Step 3, leave Monotone as the default initial binning algorithm.

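Click Import Data to complete the import operation. Automatic binning using the selected algorithm is applied to all predictors as they are imported into Binning Explorer.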

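The bins are plotted and displayed for each predictor. When you click an individual predictor plot in the Overview pane, the details for that predictor plot display in the main pane and in the Bin Information and Predictor Information panes at the bottom of the app.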

Predictor plots display after binning

Binning Explorer performs automatic binning for every predictor variable, using the default 'Monotone' algorithm with default algorithm options. A monotonic, ideally linear trend in the Weight of Evidence (WOE) is often desirable for credit scorecards because this translates into linear points for a given predictor. WOE trends are visualized on the plots for each predictor in Binning Explorer.
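For readers who prefer the command line, the import and automatic binning steps above can be approximated without the app. This is a minimal sketch, assuming Financial Toolbox is installed and CreditCardData.mat is on the path; it uses the creditscorecard, autobinning, bininfo, and plotbins functions, and it is not a record of what the app does internally.

```matlab
% Assumes Financial Toolbox is installed and CreditCardData.mat is on the path.
load CreditCardData                      % loads the table "data"

% Treat CustID as an identifier, not a predictor. By default the last
% column ('status') is the response.
sc = creditscorecard(data,'IDVar','CustID');

% Apply automatic binning to all predictors (default algorithm: 'Monotone').
sc = autobinning(sc);

% Inspect the bin statistics and plot the bins with the WOE trend
% for one predictor.
bi = bininfo(sc,'ResStatus');
disp(bi)
plotbins(sc,'ResStatus')
```

The bininfo table plays the role of the Bin Information pane, and plotbins produces a plot comparable to the predictor plots in the app.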

Perform some initial data exploration. Inquire about predictor statistics for the 'ResStatus' categorical variable.

Click the ResStatus plot. The Bin Information pane contains the "Good" and "Bad" frequencies and other bin statistics, such as weight of evidence (WOE).

Bin information display

For numeric data, the same statistics are displayed. Click the CustIncome plot. The Bin Information pane is updated with the information about CustIncome.

Bin information for CustIncome predictor
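The WOE statistic shown in the Bin Information pane can be illustrated with a small calculation. The sketch below uses base MATLAB only and assumes the common definition of WOE as the log of the ratio between a bin's share of all "Good" observations and its share of all "Bad" observations. The counts are hypothetical and are not taken from CreditCardData.

```matlab
% Hypothetical Good/Bad counts for three bins (illustrative values only).
good = [300; 350; 150];
bad  = [200; 150;  50];

% Share of all Goods and of all Bads that falls in each bin.
distGood = good / sum(good);
distBad  = bad  / sum(bad);

% WOE per bin: log of the ratio of the two shares.
woe = log(distGood ./ distBad);

% Information value: WOE contributions aggregated over all bins.
infoValue = sum((distGood - distBad) .* woe);

disp([good bad woe])
fprintf('Information value: %.4f\n', infoValue)
```

A bin with a negative WOE holds proportionally more "Bad" than "Good" observations, and a bin with a positive WOE holds proportionally more "Good" observations.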

Step 3. Fine-tune the bins using manual binning in Binning Explorer.

Click the CustAge predictor plot. Notice that bins 1 and 2 have similar WOEs, as do bins 5 and 6.

CustAge predictor plot

To merge bins 1 and 2, from the main pane, use Ctrl + click or Shift + click to multiselect bins 1 and 2. The selected bins display with a blue outline to show that they are ready to merge.

CustAge predictor plot with two bins selected

On the Binning Explorer toolstrip, the Edges text boxes display values for the edges of the selected bins to merge.

Edges text boxes used to merge the selected bins for the CustAge predictor

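Click Merge to finish merging bins 1 and 2. The CustAge predictor plot is updated with the new bin information, and the details in the Bin Information and Predictor Information panes are also updated.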

CustAge predictor plot with the two selected bins merged

Next, merge bins 4 and 5, because they also have similar WOEs.

CustAge predictor plot with bins 4 and 5 selected for merging

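The CustAge predictor plot is updated with the new bin information. The details in the Bin Information and Predictor Information panes are also updated.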

Repeat this merge operation for the following bins that have similar WOEs:

  • For CustIncome, merge bins 3, 4, and 5.

  • For TmWBank, merge bins 2 and 3.

  • For AMBalance, merge bins 2 and 3.

Now the bins for all predictors have close-to-linear WOE trends.
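The manual merges can also be expressed at the command line by setting the cut points of each numeric predictor directly with modifybins. This is a sketch under two assumptions: the creditscorecard object is built from the data at the command line rather than exported from the app, and the cut points are the bin edges that appear in the displaypoints output in Step 6 of this example.

```matlab
% Assumes Financial Toolbox is installed and CreditCardData.mat is on the path.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Set the final bin edges directly. The values are read from the bin
% ranges printed by displaypoints later in this example.
sc = modifybins(sc,'CustAge','CutPoints',[37 40 46 58]);
sc = modifybins(sc,'CustIncome','CutPoints',[29000 33000 42000 47000]);
sc = modifybins(sc,'TmWBank','CutPoints',[12 45 71]);
sc = modifybins(sc,'AMBalance','CutPoints',[558.88 1597.44]);

% Confirm the resulting bins for one predictor.
disp(bininfo(sc,'CustAge'))
```

Setting cut points replaces the whole set of edges for a predictor, so the result does not depend on which bins the automatic algorithm produced first.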

Step 4. Export the creditscorecard object from Binning Explorer.

After you complete your binning assignments using Binning Explorer, click Export, then click Export Scorecard and provide a creditscorecard object name. The creditscorecard object (sc) is saved to the MATLAB workspace.

Step 5. Fit a logistic regression model.

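Use the fitmodel function to fit a logistic regression model to the WOE data. fitmodel internally bins the training data, transforms it into WOE values, maps the response variable so that 'Good' is 1, and fits a linear logistic regression model. By default, fitmodel uses a stepwise procedure to determine which predictors belong in the model.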

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8954, Chi2Stat = 32.545914, PValue = 1.1640961e-08
2. Adding TmWBank, Deviance = 1467.3249, Chi2Stat = 23.570535, PValue = 1.2041739e-06
3. Adding AMBalance, Deviance = 1455.858, Chi2Stat = 11.466846, PValue = 0.00070848829
4. Adding EmpStatus, Deviance = 1447.6148, Chi2Stat = 8.2432677, PValue = 0.0040903428
5. Adding CustAge, Deviance = 1442.06, Chi2Stat = 5.5547849, PValue = 0.018430237
6. Adding ResStatus, Deviance = 1437.9435, Chi2Stat = 4.1164321, PValue = 0.042468555
7. Adding OtherCC, Deviance = 1433.7372, Chi2Stat = 4.2063597, PValue = 0.040272676

Generalized Linear regression model:
    logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE         tStat     pValue
                   ________    _______    ______    __________
    (Intercept)    0.7024      0.064      10.975    5.0407e-28
    CustAge        0.61562     0.24783    2.4841    0.012988
    ResStatus      1.3776      0.65266    2.1107    0.034799
    EmpStatus      0.88592     0.29296    3.024     0.0024946
    CustIncome     0.69836     0.21715    3.216     0.0013001
    TmWBank        1.106       0.23266    4.7538    1.9958e-06
    OtherCC        1.0933      0.52911    2.0662    0.038806
    AMBalance      1.0437      0.32292    3.2322    0.0012285

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.42e-16
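To see the WOE-transformed data that the model is fitted to, and to keep the fitted model for inspection, something like the following can be used. It is a sketch, assuming Financial Toolbox and Statistics and Machine Learning Toolbox are available. It starts from automatically binned data rather than the manually merged bins, so its coefficients are not expected to match the output above exactly.

```matlab
% Assumes Financial Toolbox and Statistics and Machine Learning Toolbox.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Training data with each predictor value replaced by the WOE of its bin.
woeData = bindata(sc,'OutputType','WOE');
disp(woeData(1:5,:))

% Fit with the default stepwise selection, suppress the console report,
% and keep the fitted generalized linear model as a second output.
[sc,mdl] = fitmodel(sc,'Display','off');
disp(mdl.Coefficients)
```

The second output makes the coefficient table available as data, which is convenient when the coefficients are needed in later calculations.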

Step 6. Review and format scorecard points.

After fitting the logistic model, the points are unscaled by default and come directly from the combination of WOE values and model coefficients. Use the displaypoints function to summarize the scorecard points.

p1 = displaypoints(sc);
disp(p1)
    Predictors      Bin                   Points
    ____________    __________________    _________
    'CustAge'       '[-Inf,37)'           -0.15314
    'CustAge'       '[37,40)'             -0.062247
    'CustAge'       '[40,46)'              0.045763
    'CustAge'       '[46,58)'              0.22888
    'CustAge'       '[58,Inf]'             0.48354
    'ResStatus'     'Tenant'              -0.031302
    'ResStatus'     'Home Owner'           0.12697
    'ResStatus'     'Other'                0.37652
    'EmpStatus'     'Unknown'             -0.076369
    'EmpStatus'     'Employed'             0.31456
    'CustIncome'    '[-Inf,29000)'        -0.45455
    'CustIncome'    '[29000,33000)'       -0.1037
    'CustIncome'    '[33000,42000)'        0.077768
    'CustIncome'    '[42000,47000)'        0.24406
    'CustIncome'    '[47000,Inf]'          0.43536
    'TmWBank'       '[-Inf,12)'           -0.18221
    'TmWBank'       '[12,45)'             -0.038279
    'TmWBank'       '[45,71)'              0.39569
    'TmWBank'       '[71,Inf]'             0.95074
    'OtherCC'       'No'                  -0.193
    'OtherCC'       'Yes'                  0.15868
    'AMBalance'     '[-Inf,558.88)'        0.3552
    'AMBalance'     '[558.88,1597.44)'    -0.026797
    'AMBalance'     '[1597.44,Inf]'       -0.21168

Use modifybins to give the bins more descriptive labels.

sc = modifybins(sc,'CustAge','BinLabels',...
    {'Up to 36' '37 to 39' '40 to 45' '46 to 57' '58 and up'});

sc = modifybins(sc,'CustIncome','BinLabels',...
    {'Up to 28999' '29000 to 32999' '33000 to 41999' '42000 to 46999' '47000 and up'});

sc = modifybins(sc,'TmWBank','BinLabels',...
    {'Up to 11' '12 to 44' '45 to 70' '71 and up'});

sc = modifybins(sc,'AMBalance','BinLabels',...
    {'Up to 558.87' '558.88 to 1597.43' '1597.44 and up'});

p1 = displaypoints(sc);
disp(p1)
    Predictors      Bin                    Points
    ____________    ___________________    _________
    'CustAge'       'Up to 36'             -0.15314
    'CustAge'       '37 to 39'             -0.062247
    'CustAge'       '40 to 45'              0.045763
    'CustAge'       '46 to 57'              0.22888
    'CustAge'       '58 and up'             0.48354
    'ResStatus'     'Tenant'               -0.031302
    'ResStatus'     'Home Owner'            0.12697
    'ResStatus'     'Other'                 0.37652
    'EmpStatus'     'Unknown'              -0.076369
    'EmpStatus'     'Employed'              0.31456
    'CustIncome'    'Up to 28999'          -0.45455
    'CustIncome'    '29000 to 32999'       -0.1037
    'CustIncome'    '33000 to 41999'        0.077768
    'CustIncome'    '42000 to 46999'        0.24406
    'CustIncome'    '47000 and up'          0.43536
    'TmWBank'       'Up to 11'             -0.18221
    'TmWBank'       '12 to 44'             -0.038279
    'TmWBank'       '45 to 70'              0.39569
    'TmWBank'       '71 and up'             0.95074
    'OtherCC'       'No'                   -0.193
    'OtherCC'       'Yes'                   0.15868
    'AMBalance'     'Up to 558.87'          0.3552
    'AMBalance'     '558.88 to 1597.43'    -0.026797
    'AMBalance'     '1597.44 and up'       -0.21168

Points are usually scaled and are also often rounded. To round and scale the points, use the formatpoints function. For example, you can set a target level of points corresponding to a target odds level and also set the required points-to-double-the-odds (PDO).

TargetPoints = 500;
TargetOdds = 2;
PDO = 50; % Points to double the odds

sc = formatpoints(sc,'PointsOddsAndPDO',[TargetPoints TargetOdds PDO]);
p2 = displaypoints(sc);
disp(p2)
    Predictors      Bin                    Points
    ____________    ___________________    ______
    'CustAge'       'Up to 36'             53.239
    'CustAge'       '37 to 39'             59.796
    'CustAge'       '40 to 45'             67.587
    'CustAge'       '46 to 57'             80.796
    'CustAge'       '58 and up'            99.166
    'ResStatus'     'Tenant'               62.028
    'ResStatus'     'Home Owner'           73.445
    'ResStatus'     'Other'                91.446
    'EmpStatus'     'Unknown'              58.777
    'EmpStatus'     'Employed'             86.976
    'CustIncome'    'Up to 28999'          31.497
    'CustIncome'    '29000 to 32999'       56.805
    'CustIncome'    '33000 to 41999'       69.896
    'CustIncome'    '42000 to 46999'       81.891
    'CustIncome'    '47000 and up'         95.69
    'TmWBank'       'Up to 11'             51.142
    'TmWBank'       '12 to 44'             61.524
    'TmWBank'       '45 to 70'             92.829
    'TmWBank'       '71 and up'            132.87
    'OtherCC'       'No'                   50.364
    'OtherCC'       'Yes'                  75.732
    'AMBalance'     'Up to 558.87'         89.908
    'AMBalance'     '558.88 to 1597.43'    62.353
    'AMBalance'     '1597.44 and up'       49.016
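The scaling applied by the 'PointsOddsAndPDO' option can be checked by hand. The following is a minimal sketch in base MATLAB, assuming the usual scorecard scaling in which the score is a linear function of the log-odds and the offset is spread evenly across the seven predictors in the fitted model. The names factor, offset, and numPredictors are illustrative and are not part of the toolbox. The input is the set of unscaled CustAge points from the first displaypoints output.

```matlab
% Scaling parameters used in this example.
targetPoints = 500;
targetOdds   = 2;
pdo          = 50;

% Linear scaling of log-odds: score = offset + factor*log(odds).
factor = pdo / log(2);                             % points per unit of log-odds
offset = targetPoints - factor * log(targetOdds);  % score at odds of 1

% Assumption: the offset is shared equally by the predictors in the model.
numPredictors = 7;

% Unscaled CustAge points, copied from the first displaypoints output.
unscaled = [-0.15314 -0.062247 0.045763 0.22888 0.48354];

% Scaled points per bin.
scaled = factor * unscaled + offset / numPredictors;
fprintf('%.3f\n', scaled)
```

Under these assumptions the result agrees with the scaled CustAge points in the table above (53.239, 59.796, 67.587, 80.796, 99.166) to the displayed precision.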

Step 7. Score the data.

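Use the score function to compute the scores for the training data. You can also pass an optional data input to score, for example, validation data. The points per predictor for each customer are provided as an optional output.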

[Scores,Points] = score(sc);
disp(Scores(1:10))
disp(Points(1:10,:))
    528.2044
    554.8861
    505.2406
    564.0717
    554.8861
    586.1904
    441.8755
    515.8125
    524.4553
    508.3169

    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________
    80.796     62.028       58.777       95.69         92.829     75.732     62.353
    99.166     73.445       86.976       95.69         61.524     75.732     62.353
    80.796     62.028       86.976       69.896        92.829     50.364     62.353
    80.796     73.445       86.976       95.69         61.524     75.732     89.908
    99.166     73.445       86.976       95.69         61.524     75.732     62.353
    99.166     73.445       86.976       95.69         92.829     75.732     62.353
    53.239     73.445       58.777       56.805        61.524     75.732     62.353
    80.796     91.446       86.976       95.69         61.524     50.364     49.016
    80.796     62.028       58.777       95.69         61.524     75.732     89.908
    80.796     73.445       58.777       95.69         61.524     75.732     62.353
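Each score is the sum of the customer's points across the predictors. The short check below, in base MATLAB with illustrative variable names, adds the points in the first row of the Points output and compares the total with the first entry of Scores.

```matlab
% Points for the first customer, copied from the first row of the
% Points output above (CustAge through AMBalance).
pointsRow = [80.796 62.028 58.777 95.69 92.829 75.732 62.353];

% The customer's score is the sum over predictors.
totalScore = sum(pointsRow);

% First entry of the Scores output above.
reportedScore = 528.2044;

fprintf('Sum of points: %.3f, reported score: %.4f\n', totalScore, reportedScore)
```

The small difference between the two numbers comes from the points being displayed to five significant digits.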

Step 8. Calculate the probability of default.

To calculate the probability of default, use the probdefault function.

pd = probdefault(sc);
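Because the points were formatted with target points, target odds, and PDO, the odds of being "Good" double every PDO points, and the probability of default follows from the odds. The sketch below uses base MATLAB with illustrative variable names and applies that relationship to three scores that appear in the validation table in Step 9. It assumes the formatting parameters set in Step 6 (500 points, odds of 2, PDO of 50).

```matlab
% Formatting parameters from Step 6.
targetPoints = 500;
targetOdds   = 2;
pdo          = 50;

% Three scores taken from the validation table in Step 9.
exampleScores = [369.4 377.86 379.78];

% Odds of "Good" double for every pdo points above the target score.
oddsGood = targetOdds * 2.^((exampleScores - targetPoints) / pdo);

% Probability of default is the probability of "Bad".
probDefault = 1 ./ (1 + oddsGood);

fprintf('%.4f\n', probDefault)
```

Under these assumptions the results agree with the ProbDefault values listed for those scores (0.7535, 0.73107, 0.7258) to about three decimal places.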

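Define the probability of being "Good" and plot the predicted odds versus the formatted scores. Visually check that the target points and target odds match and that the points-to-double-the-odds (PDO) relationship holds.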

ProbGood = 1-pd;
PredictedOdds = ProbGood./pd;

figure
scatter(Scores,PredictedOdds)
title('Predicted Odds vs. Score')
xlabel('Score')
ylabel('Predicted Odds')

hold on

xLimits = xlim;
yLimits = ylim;

% Target points and odds
plot([TargetPoints TargetPoints],[yLimits(1) TargetOdds],'k:')
plot([xLimits(1) TargetPoints],[TargetOdds TargetOdds],'k:')

% Target points plus PDO
plot([TargetPoints+PDO TargetPoints+PDO],[yLimits(1) 2*TargetOdds],'k:')
plot([xLimits(1) TargetPoints+PDO],[2*TargetOdds 2*TargetOdds],'k:')

% Target points minus PDO
plot([TargetPoints-PDO TargetPoints-PDO],[yLimits(1) TargetOdds/2],'k:')
plot([xLimits(1) TargetPoints-PDO],[TargetOdds/2 TargetOdds/2],'k:')

hold off

Plot of predicted odds versus score

Step 9. Validate the credit scorecard model using the CAP, ROC, and Kolmogorov-Smirnov statistic.

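The creditscorecard object supports three validation methods: the Cumulative Accuracy Profile (CAP), the Receiver Operating Characteristic (ROC), and the Kolmogorov-Smirnov (KS) statistic. For more information on CAP, ROC, and KS, see validatemodel.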

[Stats,T] = validatemodel(sc,'Plot',{'CAP','ROC','KS'});
disp(Stats)
disp(T(1:15,:))
    Measure                   Value
    ______________________    _______
    'Accuracy Ratio'          0.32225
    'Area under ROC curve'    0.66113
    'KS statistic'            0.22324
    'KS score'                499.18

    Scores    ProbDefault    TrueBads    FalseBads    TrueGoods    FalseGoods    Sensitivity    FalseAlarm    PctObs
    ______    ___________    ________    _________    _________    __________    ___________    __________    __________
    369.4     0.7535         0           1            802          397           0              0.0012453     0.00083333
    377.86    0.73107        1           1            802          396           0.0025189      0.0012453     0.0016667
    379.78    0.7258         2           1            802          395           0.0050378      0.0012453     0.0025
    391.81    0.69139        3           1            802          394           0.0075567      0.0012453     0.0033333
    394.77    0.68259        3           2            801          394           0.0075567      0.0024907     0.0041667
    395.78    0.67954        4           2            801          393           0.010076       0.0024907     0.005
    396.95    0.67598        5           2            801          392           0.012594       0.0024907     0.0058333
    398.37    0.67167        6           2            801          391           0.015113       0.0024907     0.0066667
    401.26    0.66276        7           2            801          390           0.017632       0.0024907     0.0075
    403.23    0.65664        8           2            801          389           0.020151       0.0024907     0.0083333
    405.09    0.65081        8           3            800          389           0.020151       0.003736      0.0091667
    405.15    0.65062        11          5            798          386           0.027708       0.0062267     0.013333
    405.37    0.64991        11          6            797          386           0.027708       0.007472      0.014167
    406.18    0.64735        12          6            797          385           0.030227       0.007472      0.015
    407.14    0.64433        13          6            797          384           0.032746       0.007472      0.015833

CAP curve

ROC curve

K-S plot
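The validation measures can be illustrated with a small calculation in base MATLAB. The sketch below assumes the standard definitions: the KS statistic is the largest gap between the cumulative distributions of "Bad" and "Good" observations ordered by score, and the accuracy ratio equals twice the area under the ROC curve minus one. The scores and default flags are hypothetical. Only the area under the ROC curve is taken from the Stats output above.

```matlab
% Hypothetical scores and default flags (1 = bad), for illustration only.
scores = [300 350 400 450 500 550 600 650];
isBad  = [1   1   0   1   0   0   0   0];
isGood = 1 - isBad;

% Order observations from riskiest (lowest score) to safest.
[sortedScores, order] = sort(scores);
sortedBad  = isBad(order);
sortedGood = isGood(order);

% Cumulative share of bads and of goods at or below each score.
cumBad  = cumsum(sortedBad)  / sum(sortedBad);
cumGood = cumsum(sortedGood) / sum(sortedGood);

% KS statistic and the score at which the largest gap occurs.
[ksStat, idx] = max(cumBad - cumGood);
ksScore = sortedScores(idx);
fprintf('KS statistic: %.2f at score %d\n', ksStat, ksScore)

% Accuracy ratio implied by the reported area under the ROC curve.
auroc = 0.66113;
accuracyRatio = 2*auroc - 1;
fprintf('Accuracy ratio: %.5f\n', accuracyRatio)
```

For the hypothetical data the largest gap is 0.8 at a score of 450. The accuracy ratio computed from the reported area under the ROC curve agrees with the reported 0.32225 up to rounding.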
