ML · KMeans: Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm


By 飞机 · 2022-09-19 11:54:20 · 49,167 views
Category: News

Contents

Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Design approach

Output

Core code

Related articles
ML · KMeans: Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm
ML · KMeans: Implementation of two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Design approach
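As the title and the output below suggest, the pipeline is: load the 1460 × 81 housing training table, keep the two features LotFrontage and GarageArea (plus SalePrice for inspection), normalize each column, and fit a 2-cluster KMeans model. A minimal data-preparation sketch follows; the file path, variable names and NaN handling are assumptions, and the scaling is inferred from the output, whose values are consistent with dividing each column by its maximum.

# Minimal data-preparation sketch, assuming the 1460 x 81 housing training CSV
# shown in the output below (file path, variable names and NaN handling are assumptions).
import pandas as pd

train_boston_data = pd.read_csv('train.csv')            # hypothetical path
print('train_boston_data.shape', train_boston_data.shape)
print(train_boston_data.head())

# Numeric-column correlation matrix (the 38 x 38 table in the output)
print(train_boston_data.select_dtypes(include='number').corr())

# Keep the two clustering features plus the target for inspection
train_t = train_boston_data[['LotFrontage', 'GarageArea', 'SalePrice']].copy()
train_t = train_t.dropna()                               # LotFrontage has missing values (assumed handling)
print('train_t.head()\n', train_t.head())

# Normalize each column by its maximum; the scaled values in the output
# (e.g. LotFrontage 65.0 -> 0.207668) match x / x.max()
train_t = train_t / train_t.max()
print('after scale, train_t.head()\n', train_t.head())

# The two features actually fed to KMeans
X = train_t[['LotFrontage', 'GarageArea']].values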

Output

train_boston_data.shape (1460, 81)
   Id  MSSubClass MSZoning  ... SaleType SaleCondition  SalePrice
0   1          60       RL  ...       WD        Normal     208500
1   2          20       RL  ...       WD        Normal     181500
2   3          60       RL  ...       WD        Normal     223500
3   4          70       RL  ...       WD       Abnorml     140000
4   5          60       RL  ...       WD        Normal     250000
[5 rows x 81 columns]

train_t.head()
   LotFrontage  GarageArea  SalePrice
0         65.0         548     208500
1         80.0         460     181500
2         68.0         608     223500
3         60.0         642     140000
4         84.0         836     250000

after scale, train_t.head()
   LotFrontage  GarageArea  SalePrice
0     0.207668    0.386460   0.276159
1     0.255591    0.324401   0.240397
2     0.217252    0.428773   0.296026
3     0.191693    0.452750   0.185430
4     0.268371    0.589563   0.331126

   LotFrontage  GarageArea
0     0.207668    0.386460
1     0.255591    0.324401
2     0.217252    0.428773
3     0.191693    0.452750
4     0.268371    0.589563

                     Id  MSSubClass  LotFrontage  ...    MoSold    YrSold  SalePrice
Id             1.000000  0.011156 -0.010601  ...  0.021172  0.000712 -0.021917
MSSubClass     0.011156  1.000000 -0.386347  ... -0.013585 -0.021407 -0.084284
LotFrontage   -0.010601 -0.386347  1.000000  ...  0.011200  0.007450  0.351799
LotArea       -0.033226 -0.139781  0.426095  ...  0.001205 -0.014261  0.263843
OverallQual   -0.028365  0.032628  0.251646  ...  0.070815 -0.027347  0.790982
OverallCond    0.012609 -0.059316 -0.059213  ... -0.003511  0.043950 -0.077856
YearBuilt     -0.012713  0.027850  0.123349  ...  0.012398 -0.013618  0.522897
YearRemodAdd  -0.021998  0.040581  0.088866  ...  0.021490  0.035743  0.507101
MasVnrArea    -0.050298  0.022936  0.193458  ... -0.005965 -0.008201  0.477493
BsmtFinSF1    -0.005024 -0.069836  0.233633  ... -0.015727  0.014359  0.386420
BsmtFinSF2    -0.005968 -0.065649  0.049900  ... -0.015211  0.031706 -0.011378
BsmtUnfSF     -0.007940 -0.140759  0.132644  ...  0.034888 -0.041258  0.214479
TotalBsmtSF   -0.015415 -0.238518  0.392075  ...  0.013196 -0.014969  0.613581
1stFlrSF       0.010496 -0.251758  0.457181  ...  0.031372 -0.013604  0.605852
2ndFlrSF       0.005590  0.307886  0.080177  ...  0.035164 -0.028700  0.319334
LowQualFinSF  -0.044230  0.046474  0.038469  ... -0.022174 -0.028921 -0.025606
GrLivArea      0.008273  0.074853  0.402797  ...  0.050240 -0.036526  0.708624
BsmtFullBath   0.002289  0.003491  0.100949  ... -0.025361  0.067049  0.227122
BsmtHalfBath  -0.020155 -0.002333 -0.007234  ...  0.032873 -0.046524 -0.016844
FullBath       0.005587  0.131608  0.198769  ...  0.055872 -0.019669  0.560664
HalfBath       0.006784  0.177354  0.053532  ... -0.009050 -0.010269  0.284108
BedroomAbvGr   0.037719 -0.023438  0.263170  ...  0.046544 -0.036014  0.168213
KitchenAbvGr   0.002951  0.281721 -0.006069  ...  0.026589  0.031687 -0.135907
TotRmsAbvGrd   0.027239  0.040380  0.352096  ...  0.036907 -0.034516  0.533723
Fireplaces    -0.019772 -0.045569  0.266639  ...  0.046357 -0.024096  0.466929
GarageYrBlt    0.000072  0.085072  0.070250  ...  0.005337 -0.001014  0.486362
GarageCars     0.016570 -0.040110  0.285691  ...  0.040522 -0.039117  0.640409
GarageArea     0.017634 -0.098672  0.344997  ...  0.027974 -0.027378  0.623431
WoodDeckSF    -0.029643 -0.012579  0.088521  ...  0.021011  0.022270  0.324413
OpenPorchSF   -0.000477 -0.006100  0.151972  ...  0.071255 -0.057619  0.315856
EnclosedPorch  0.002889 -0.012037  0.010700  ... -0.028887 -0.009916 -0.128578
3SsnPorch     -0.046635 -0.043825  0.070029  ...  0.029474  0.018645  0.044584
ScreenPorch    0.001330 -0.026030  0.041383  ...  0.023217  0.010694  0.111447
PoolArea       0.057044  0.008283  0.206167  ... -0.033737 -0.059689  0.092404
MiscVal       -0.006242 -0.007683  0.003368  ... -0.006495  0.004906 -0.021190
MoSold         0.021172 -0.013585  0.011200  ...  1.000000 -0.145721  0.046432
YrSold         0.000712 -0.021407  0.007450  ... -0.145721  1.000000 -0.028923
SalePrice     -0.021917 -0.084284  0.351799  ...  0.046432 -0.028923  1.000000
[38 rows x 38 columns]

k_means_cluster_centers
[[0.1938454  0.21080405]
 [0.25140958 0.44595543]]
k_means_labels_unique
[0 1]
0 [1 1 1 ... 0 0 0]
0 [1 1 1 ... 0 0 0] [False False False ...  True  True  True]
1 [1 1 1 ... 0 0 0]
1 [1 1 1 ... 0 0 0] [ True  True  True ... False False False]
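The cluster centers, labels and per-cluster boolean masks at the end of the output can be reproduced with a 2-cluster fit along the following lines. This is a sketch only: the post does not show its estimator arguments, so init, n_init and random_state here are assumptions, and X is the two-feature array from the data-preparation sketch above.

# Minimal 2-cluster KMeans sketch over the two scaled features X
# (init / n_init / random_state are assumptions, not the post's exact settings).
import numpy as np
from sklearn.cluster import KMeans

k_means = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
k_means.fit(X)

k_means_cluster_centers = k_means.cluster_centers_
k_means_labels = k_means.labels_
k_means_labels_unique = np.unique(k_means_labels)
print('k_means_cluster_centers\n', k_means_cluster_centers)
print('k_means_labels_unique\n', k_means_labels_unique)

# One boolean mask per cluster, matching the "0 [...] [False ...]" lines above;
# each mask can be used to plot that cluster's points separately.
for k in k_means_labels_unique:
    my_members = (k_means_labels == k)
    print(k, k_means_labels, my_members)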

Core code

class KMeans Found at: sklearn.cluster._kmeans

class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
    """K-Means clustering.

    Read more in the :ref:`User Guide <k_means>`.

    Parameters
    ----------
    n_clusters : int, default=8
        The number of clusters to form as well as the number of
        centroids to generate.

    init : {'k-means++', 'random', ndarray, callable}, default='k-means++'
        Method for initialization:

        'k-means++' : selects initial cluster centers for k-means
        clustering in a smart way to speed up convergence. See section
        Notes in k_init for more details.

        'random': choose `n_clusters` observations (rows) at random from data
        for the initial centroids.

        If an ndarray is passed, it should be of shape (n_clusters, n_features)
        and gives the initial centers.

        If a callable is passed, it should take arguments X, n_clusters and a
        random state and return an initialization.

    n_init : int, default=10
        Number of time the k-means algorithm will be run with different
        centroid seeds. The final results will be the best output of
        n_init consecutive runs in terms of inertia.

    max_iter : int, default=300
        Maximum number of iterations of the k-means algorithm for a
        single run.

    tol : float, default=1e-4
        Relative tolerance with regards to Frobenius norm of the difference
        in the cluster centers of two consecutive iterations to declare
        convergence.
        It's not advised to set `tol=0` since convergence might never be
        declared due to rounding errors. Use a very small number instead.

    precompute_distances : {'auto', True, False}, default='auto'
        Precompute distances (faster but takes more memory).

        'auto' : do not precompute distances if n_samples * n_clusters > 12
        million. This corresponds to about 100MB overhead per job using
        double precision.

        True : always precompute distances.

        False : never precompute distances.

        .. deprecated:: 0.23
            'precompute_distances' was deprecated in version 0.22 and will be
            removed in 0.25. It has no effect.

    verbose : int, default=0
        Verbosity mode.

    random_state : int, RandomState instance, default=None
        Determines random number generation for centroid initialization. Use
        an int to make the randomness deterministic.
        See :term:`Glossary <random_state>`.

    copy_x : bool, default=True
        When pre-computing distances it is more numerically accurate to center
        the data first. If copy_x is True (default), then the original data is
        not modified. If False, the original data is modified, and put back
        before the function returns, but small numerical differences may be
        introduced by subtracting and then adding the data mean. Note that if
        the original data is not C-contiguous, a copy will be made even if
        copy_x is False. If the original data is sparse, but not in CSR format,
        a copy will be made even if copy_x is False.

    n_jobs : int, default=None
        The number of OpenMP threads to use for the computation. Parallelism is
        sample-wise on the main cython loop which assigns each sample to its
        closest center.

        ``None`` or ``-1`` means using all processors.

        .. deprecated:: 0.23
            ``n_jobs`` was deprecated in version 0.23 and will be removed in
            0.25.

    algorithm : {"auto", "full", "elkan"}, default="auto"
        K-means algorithm to use. The classical EM-style algorithm is "full".
        The "elkan" variation is more efficient on data with well-defined
        clusters, by using the triangle inequality. However it's more memory
        intensive due to the allocation of an extra array of shape
        (n_samples, n_clusters).

        For now "auto" (kept for backward compatibility) chooses "elkan" but it
        might change in the future for a better heuristic.

        .. versionchanged:: 0.18
            Added Elkan algorithm

    Attributes
    ----------
    cluster_centers_ : ndarray of shape (n_clusters, n_features)
        Coordinates of cluster centers. If the algorithm stops before fully
        converging (see ``tol`` and ``max_iter``), these will not be
        consistent with ``labels_``.

    labels_ : ndarray of shape (n_samples,)
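For reference, a minimal standalone usage of the KMeans estimator documented above (a generic toy example, not the post's own code; exact label ordering may differ between runs):

# Standalone KMeans usage: fit, labels_, predict, cluster_centers_
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)                      # e.g. [1 1 1 0 0 0]
print(kmeans.predict([[0, 0], [12, 3]]))   # e.g. [1 0]
print(kmeans.cluster_centers_)             # e.g. [[10.  2.] [ 1.  2.]]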
