ML · KMeans: Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm


By 飞机 · 2022-09-19 11:54:20 · 49,167 views
Category: News

Contents

Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Design approach

Output

Core code

Related articles
ML · KMeans: Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm
ML · KMeans: Implementation of two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Two-cluster analysis of the Boston house-price dataset (two features + normalization) with the KMeans algorithm

Design approach
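As the title and the output below suggest, the pipeline is: load the 1460 × 81 housing training table, keep the two features LotFrontage and GarageArea (plus SalePrice for inspection), normalize each column, and fit a 2-cluster KMeans model. A minimal data-preparation sketch follows; the file path, variable names and NaN handling are assumptions, and the scaling is inferred from the output, whose values are consistent with dividing each column by its maximum.

# Minimal data-preparation sketch, assuming the 1460 x 81 housing training CSV
# shown in the output below (file path, variable names and NaN handling are assumptions).
import pandas as pd

train_boston_data = pd.read_csv('train.csv')            # hypothetical path
print('train_boston_data.shape', train_boston_data.shape)
print(train_boston_data.head())

# Numeric-column correlation matrix (the 38 x 38 table in the output)
print(train_boston_data.select_dtypes(include='number').corr())

# Keep the two clustering features plus the target for inspection
train_t = train_boston_data[['LotFrontage', 'GarageArea', 'SalePrice']].copy()
train_t = train_t.dropna()                               # LotFrontage has missing values (assumed handling)
print('train_t.head()\n', train_t.head())

# Normalize each column by its maximum; the scaled values in the output
# (e.g. LotFrontage 65.0 -> 0.207668) match x / x.max()
train_t = train_t / train_t.max()
print('after scale, train_t.head()\n', train_t.head())

# The two features actually fed to KMeans
X = train_t[['LotFrontage', 'GarageArea']].values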

Output

train_boston_data.shape (1460, 81)
   Id  MSSubClass MSZoning  ... SaleType SaleCondition  SalePrice
0   1          60       RL  ...       WD        Normal     208500
1   2          20       RL  ...       WD        Normal     181500
2   3          60       RL  ...       WD        Normal     223500
3   4          70       RL  ...       WD       Abnorml     140000
4   5          60       RL  ...       WD        Normal     250000
[5 rows x 81 columns]

train_t.head()
   LotFrontage  GarageArea  SalePrice
0         65.0         548     208500
1         80.0         460     181500
2         68.0         608     223500
3         60.0         642     140000
4         84.0         836     250000

after scale, train_t.head()
   LotFrontage  GarageArea  SalePrice
0     0.207668    0.386460   0.276159
1     0.255591    0.324401   0.240397
2     0.217252    0.428773   0.296026
3     0.191693    0.452750   0.185430
4     0.268371    0.589563   0.331126

   LotFrontage  GarageArea
0     0.207668    0.386460
1     0.255591    0.324401
2     0.217252    0.428773
3     0.191693    0.452750
4     0.268371    0.589563

                     Id  MSSubClass  LotFrontage  ...    MoSold    YrSold  SalePrice
Id             1.000000  0.011156 -0.010601  ...  0.021172  0.000712 -0.021917
MSSubClass     0.011156  1.000000 -0.386347  ... -0.013585 -0.021407 -0.084284
LotFrontage   -0.010601 -0.386347  1.000000  ...  0.011200  0.007450  0.351799
LotArea       -0.033226 -0.139781  0.426095  ...  0.001205 -0.014261  0.263843
OverallQual   -0.028365  0.032628  0.251646  ...  0.070815 -0.027347  0.790982
OverallCond    0.012609 -0.059316 -0.059213  ... -0.003511  0.043950 -0.077856
YearBuilt     -0.012713  0.027850  0.123349  ...  0.012398 -0.013618  0.522897
YearRemodAdd  -0.021998  0.040581  0.088866  ...  0.021490  0.035743  0.507101
MasVnrArea    -0.050298  0.022936  0.193458  ... -0.005965 -0.008201  0.477493
BsmtFinSF1    -0.005024 -0.069836  0.233633  ... -0.015727  0.014359  0.386420
BsmtFinSF2    -0.005968 -0.065649  0.049900  ... -0.015211  0.031706 -0.011378
BsmtUnfSF     -0.007940 -0.140759  0.132644  ...  0.034888 -0.041258  0.214479
TotalBsmtSF   -0.015415 -0.238518  0.392075  ...  0.013196 -0.014969  0.613581
1stFlrSF       0.010496 -0.251758  0.457181  ...  0.031372 -0.013604  0.605852
2ndFlrSF       0.005590  0.307886  0.080177  ...  0.035164 -0.028700  0.319334
LowQualFinSF  -0.044230  0.046474  0.038469  ... -0.022174 -0.028921 -0.025606
GrLivArea      0.008273  0.074853  0.402797  ...  0.050240 -0.036526  0.708624
BsmtFullBath   0.002289  0.003491  0.100949  ... -0.025361  0.067049  0.227122
BsmtHalfBath  -0.020155 -0.002333 -0.007234  ...  0.032873 -0.046524 -0.016844
FullBath       0.005587  0.131608  0.198769  ...  0.055872 -0.019669  0.560664
HalfBath       0.006784  0.177354  0.053532  ... -0.009050 -0.010269  0.284108
BedroomAbvGr   0.037719 -0.023438  0.263170  ...  0.046544 -0.036014  0.168213
KitchenAbvGr   0.002951  0.281721 -0.006069  ...  0.026589  0.031687 -0.135907
TotRmsAbvGrd   0.027239  0.040380  0.352096  ...  0.036907 -0.034516  0.533723
Fireplaces    -0.019772 -0.045569  0.266639  ...  0.046357 -0.024096  0.466929
GarageYrBlt    0.000072  0.085072  0.070250  ...  0.005337 -0.001014  0.486362
GarageCars     0.016570 -0.040110  0.285691  ...  0.040522 -0.039117  0.640409
GarageArea     0.017634 -0.098672  0.344997  ...  0.027974 -0.027378  0.623431
WoodDeckSF    -0.029643 -0.012579  0.088521  ...  0.021011  0.022270  0.324413
OpenPorchSF   -0.000477 -0.006100  0.151972  ...  0.071255 -0.057619  0.315856
EnclosedPorch  0.002889 -0.012037  0.010700  ... -0.028887 -0.009916 -0.128578
3SsnPorch     -0.046635 -0.043825  0.070029  ...  0.029474  0.018645  0.044584
ScreenPorch    0.001330 -0.026030  0.041383  ...  0.023217  0.010694  0.111447
PoolArea       0.057044  0.008283  0.206167  ... -0.033737 -0.059689  0.092404
MiscVal       -0.006242 -0.007683  0.003368  ... -0.006495  0.004906 -0.021190
MoSold         0.021172 -0.013585  0.011200  ...  1.000000 -0.145721  0.046432
YrSold         0.000712 -0.021407  0.007450  ... -0.145721  1.000000 -0.028923
SalePrice     -0.021917 -0.084284  0.351799  ...  0.046432 -0.028923  1.000000
[38 rows x 38 columns]

k_means_cluster_centers
[[0.1938454  0.21080405]
 [0.25140958 0.44595543]]
k_means_labels_unique
[0 1]
0 [1 1 1 ... 0 0 0]
0 [1 1 1 ... 0 0 0] [False False False ...  True  True  True]
1 [1 1 1 ... 0 0 0]
1 [1 1 1 ... 0 0 0] [ True  True  True ... False False False]
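The cluster centers, labels and per-cluster boolean masks at the end of the output can be reproduced with a 2-cluster fit along the following lines. This is a sketch only: the post does not show its estimator arguments, so init, n_init and random_state here are assumptions, and X is the two-feature array from the data-preparation sketch above.

# Minimal 2-cluster KMeans sketch over the two scaled features X
# (init / n_init / random_state are assumptions, not the post's exact settings).
import numpy as np
from sklearn.cluster import KMeans

k_means = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
k_means.fit(X)

k_means_cluster_centers = k_means.cluster_centers_
k_means_labels = k_means.labels_
k_means_labels_unique = np.unique(k_means_labels)
print('k_means_cluster_centers\n', k_means_cluster_centers)
print('k_means_labels_unique\n', k_means_labels_unique)

# One boolean mask per cluster, matching the "0 [...] [False ...]" lines above;
# each mask can be used to plot that cluster's points separately.
for k in k_means_labels_unique:
    my_members = (k_means_labels == k)
    print(k, k_means_labels, my_members)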

Core code

class KMeans Found at: sklearn.cluster._kmeans

class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
    """K-Means clustering.

    Read more in the :ref:`User Guide <k_means>`.

    Parameters
    ----------
    n_clusters : int, default=8
        The number of clusters to form as well as the number of
        centroids to generate.

    init : {'k-means++', 'random', ndarray, callable}, default='k-means++'
        Method for initialization:

        'k-means++' : selects initial cluster centers for k-means
        clustering in a smart way to speed up convergence. See section
        Notes in k_init for more details.

        'random': choose `n_clusters` observations (rows) at random from data
        for the initial centroids.

        If an ndarray is passed, it should be of shape (n_clusters, n_features)
        and gives the initial centers.

        If a callable is passed, it should take arguments X, n_clusters and a
        random state and return an initialization.

    n_init : int, default=10
        Number of time the k-means algorithm will be run with different
        centroid seeds. The final results will be the best output of
        n_init consecutive runs in terms of inertia.

    max_iter : int, default=300
        Maximum number of iterations of the k-means algorithm for a
        single run.

    tol : float, default=1e-4
        Relative tolerance with regards to Frobenius norm of the difference
        in the cluster centers of two consecutive iterations to declare
        convergence.
        It's not advised to set `tol=0` since convergence might never be
        declared due to rounding errors. Use a very small number instead.

    precompute_distances : {'auto', True, False}, default='auto'
        Precompute distances (faster but takes more memory).

        'auto' : do not precompute distances if n_samples * n_clusters > 12
        million. This corresponds to about 100MB overhead per job using
        double precision.

        True : always precompute distances.

        False : never precompute distances.

        .. deprecated:: 0.23
            'precompute_distances' was deprecated in version 0.22 and will be
            removed in 0.25. It has no effect.

    verbose : int, default=0
        Verbosity mode.

    random_state : int, RandomState instance, default=None
        Determines random number generation for centroid initialization. Use
        an int to make the randomness deterministic.
        See :term:`Glossary <random_state>`.

    copy_x : bool, default=True
        When pre-computing distances it is more numerically accurate to center
        the data first. If copy_x is True (default), then the original data is
        not modified. If False, the original data is modified, and put back
        before the function returns, but small numerical differences may be
        introduced by subtracting and then adding the data mean. Note that if
        the original data is not C-contiguous, a copy will be made even if
        copy_x is False. If the original data is sparse, but not in CSR format,
        a copy will be made even if copy_x is False.

    n_jobs : int, default=None
        The number of OpenMP threads to use for the computation. Parallelism is
        sample-wise on the main cython loop which assigns each sample to its
        closest center.

        ``None`` or ``-1`` means using all processors.

        .. deprecated:: 0.23
            ``n_jobs`` was deprecated in version 0.23 and will be removed in
            0.25.

    algorithm : {"auto", "full", "elkan"}, default="auto"
        K-means algorithm to use. The classical EM-style algorithm is "full".
        The "elkan" variation is more efficient on data with well-defined
        clusters, by using the triangle inequality. However it's more memory
        intensive due to the allocation of an extra array of shape
        (n_samples, n_clusters).

        For now "auto" (kept for backward compatibility) chooses "elkan" but it
        might change in the future for a better heuristic.

        .. versionchanged:: 0.18
            Added Elkan algorithm

    Attributes
    ----------
    cluster_centers_ : ndarray of shape (n_clusters, n_features)
        Coordinates of cluster centers. If the algorithm stops before fully
        converging (see ``tol`` and ``max_iter``), these will not be
        consistent with ``labels_``.

    labels_ : ndarray of shape (n_samples,)
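For reference, a minimal standalone usage of the KMeans estimator documented above (a generic toy example, not the post's own code; exact label ordering may differ between runs):

# Standalone KMeans usage: fit, labels_, predict, cluster_centers_
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)                      # e.g. [1 1 1 0 0 0]
print(kmeans.predict([[0, 0], [12, 3]]))   # e.g. [1 0]
print(kmeans.cluster_centers_)             # e.g. [[10.  2.] [ 1.  2.]]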
