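ML / FE: Using feature engineering (analyzing pairwise correlations between numeric features) on the AllstateClaimsSeverity dataset (Kaggle 2016 competition) for regression prediction of claim cost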
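Contents
1. Dataset overview
Dataset / AllstateClaimsSeverity: a detailed guide to the AllstateClaimsSeverity dataset (Kaggle 2016 competition), covering its introduction, download, and example applications
2. Data visualization
T1. Plotting the heatmap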
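A minimal sketch of this step, assuming a DataFrame named `train` whose columns are the continuous features. The synthetic data and the column names `cont1` to `cont4` are placeholders rather than the real dataset, and the colour map and figure size are arbitrary choices. The names `cols`, `size` and `data_corr` are chosen to match the ones used in the scatter-plot code of T2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the continuous columns of the training set (assumption).
rng = np.random.default_rng(0)
base = rng.normal(size=500)
train = pd.DataFrame({
    "cont1": base,
    "cont2": base * 0.8 + rng.normal(scale=0.3, size=500),  # strongly related to cont1
    "cont3": rng.normal(size=500),                           # independent noise
    "cont4": -base + rng.normal(scale=0.5, size=500),        # negatively related to cont1
})

cols = train.columns      # column labels, looked up by position later
size = len(cols)          # number of numeric features
data_corr = train.corr()  # Pearson correlation matrix, size x size

# Draw the matrix as an annotated heatmap on a fixed -1..1 colour scale.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(data_corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap (synthetic data)")
fig.tight_layout()
fig.savefig("corr_heatmap.png")
```

Fixing `vmin` and `vmax` keeps the colour scale symmetric around zero, so positive and negative correlations of the same strength get equally saturated colours.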
T2. Plotting scatter plots
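import matplotlib.pyplot as plt
import seaborn as sns

# Assumes train (DataFrame), cols (its numeric column labels), size (number of
# numeric columns) and data_corr (their correlation matrix) are already defined.
threshold = 0.5  # minimum absolute correlation for a pair to be reported
corr_list = []
# Walk the upper triangle so that each pair of features is visited once.
for i in range(0, size):
    for j in range(i + 1, size):
        v = data_corr.iloc[i, j]
        # Keep strong positive (below 1) and strong negative correlations.
        if (threshold <= v < 1) or (v <= -threshold):
            corr_list.append([v, i, j])

# Sort by absolute correlation, strongest first, and print each pair.
s_corr_list = sorted(corr_list, key=lambda x: -abs(x[0]))
for v, i, j in s_corr_list:
    print("%s and %s = %.2f" % (cols[i], cols[j], v))

# One scatter plot per highly correlated pair.
for v, i, j in s_corr_list:
    # seaborn 0.9 renamed pairplot's size argument to height
    sns.pairplot(train, height=6, x_vars=[cols[i]], y_vars=[cols[j]])
    plt.title('AllstateClaimsSeverity: Scatter plot of only the highly correlated pairs')
    plt.show()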
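The pair-selection step can also be sketched without the nested loops. This is an alternative formulation, not the code used in this post: the helper name `correlated_pairs` and the toy DataFrame are hypothetical, and unlike the loop version it does not exclude pairs whose correlation is exactly 1.

```python
import numpy as np
import pandas as pd

def correlated_pairs(df, threshold=0.5):
    """Return (col_a, col_b, r) for pairs with |r| >= threshold, strongest first."""
    corr = df.corr()
    cols = corr.columns
    # Strict upper triangle: every unordered pair exactly once, no diagonal.
    rows, cs = np.triu_indices(len(cols), k=1)
    values = corr.to_numpy()[rows, cs]
    keep = np.abs(values) >= threshold  # NaN correlations compare False and are dropped
    pairs = [(cols[i], cols[j], float(v))
             for i, j, v in zip(rows[keep], cs[keep], values[keep])]
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Toy data (hypothetical): b rises with a, c falls with a, d is unrelated.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.8],
    "c": [5.0, 3.9, 3.1, 2.0, 1.2],
    "d": [1.0, -1.0, 0.0, -1.0, 1.0],
})

pairs = correlated_pairs(df, threshold=0.5)
for col_a, col_b, r in pairs:
    print("%s and %s = %.2f" % (col_a, col_b, r))
```

Each returned tuple carries the column labels directly, so it can be passed to `sns.pairplot` as `x_vars=[col_a], y_vars=[col_b]` in the same way as the loop above.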