ML - KMeans: Two-cluster analysis of the Boston housing price dataset (two features + normalization) with the KMeans algorithm
Implementation of the two-cluster KMeans analysis on the Boston housing price dataset (two features + normalization)
    train_boston_data.shape (1460, 81)
       Id  MSSubClass MSZoning  ... SaleType  SaleCondition SalePrice
    0   1          60       RL  ...       WD         Normal    208500
    1   2          20       RL  ...       WD         Normal    181500
    2   3          60       RL  ...       WD         Normal    223500
    3   4          70       RL  ...       WD        Abnorml    140000
    4   5          60       RL  ...       WD         Normal    250000

    [5 rows x 81 columns]
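The shape and `head()` output above is what pandas prints after reading a CSV file. The sketch below reproduces that step on an in-memory stand-in, so it runs without the real file. The variable name `train_boston_data` follows the printed label; `csv_text` is a hypothetical sample built only from values visible in this article, and the real file path is not given in the source.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: five rows and a subset of the
# 81 columns, using only values that appear in the printed output.
csv_text = """Id,MSSubClass,MSZoning,LotFrontage,GarageArea,SaleType,SaleCondition,SalePrice
1,60,RL,65.0,548,WD,Normal,208500
2,20,RL,80.0,460,WD,Normal,181500
3,60,RL,68.0,608,WD,Normal,223500
4,70,RL,60.0,642,WD,Abnorml,140000
5,60,RL,84.0,836,WD,Normal,250000
"""

# With the real data this would be pd.read_csv(<path to the training CSV>).
train_boston_data = pd.read_csv(io.StringIO(csv_text))

print("train_boston_data.shape", train_boston_data.shape)
print(train_boston_data.head())
```

On the full file the first line would report `(1460, 81)`, as shown in the article.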
    train_t.head()
       LotFrontage  GarageArea  SalePrice
    0         65.0         548     208500
    1         80.0         460     181500
    2         68.0         608     223500
    3         60.0         642     140000
    4         84.0         836     250000
    after scale,train_t.head()
       LotFrontage  GarageArea  SalePrice
    0     0.207668    0.386460   0.276159
    1     0.255591    0.324401   0.240397
    2     0.217252    0.428773   0.296026
    3     0.191693    0.452750   0.185430
    4     0.268371    0.589563   0.331126
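The article does not show the scaling code. Dividing the raw values by the scaled ones gives the same constant in every row of a column (about 313 for LotFrontage, 1418 for GarageArea and 755000 for SalePrice). That pattern matches dividing each column by a single value, most plausibly the column maximum. The sketch below reproduces the printed numbers under that assumption; the divisors in `col_max` are inferred from the output, not taken from the author's code.

```python
import pandas as pd

# The five raw rows printed by train_t.head().
train_t = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0],
    "GarageArea": [548, 460, 608, 642, 836],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Assumption: these divisors are the full-data column maxima. They are
# inferred from raw / scaled in the printed output.
col_max = pd.Series({"LotFrontage": 313.0, "GarageArea": 1418.0, "SalePrice": 755000.0})

# DataFrame / Series aligns the Series index with the column names.
scaled = train_t / col_max

print("after scale,train_t.head()")
print(scaled.round(6))
```

With the full table loaded, the same effect would come from `train_t / train_t.max()`, provided the inferred divisors really are the column maxima.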
       LotFrontage  GarageArea
    0     0.207668    0.386460
    1     0.255591    0.324401
    2     0.217252    0.428773
    3     0.191693    0.452750
    4     0.268371    0.589563
                          Id  MSSubClass  LotFrontage  ...     MoSold     YrSold  SalePrice
    Id              1.000000    0.011156    -0.010601  ...   0.021172   0.000712  -0.021917
    MSSubClass      0.011156    1.000000    -0.386347  ...  -0.013585  -0.021407  -0.084284
    LotFrontage    -0.010601   -0.386347     1.000000  ...   0.011200   0.007450   0.351799
    LotArea        -0.033226   -0.139781     0.426095  ...   0.001205  -0.014261   0.263843
    OverallQual    -0.028365    0.032628     0.251646  ...   0.070815  -0.027347   0.790982
    OverallCond     0.012609   -0.059316    -0.059213  ...  -0.003511   0.043950  -0.077856
    YearBuilt      -0.012713    0.027850     0.123349  ...   0.012398  -0.013618   0.522897
    YearRemodAdd   -0.021998    0.040581     0.088866  ...   0.021490   0.035743   0.507101
    MasVnrArea     -0.050298    0.022936     0.193458  ...  -0.005965  -0.008201   0.477493
    BsmtFinSF1     -0.005024   -0.069836     0.233633  ...  -0.015727   0.014359   0.386420
    BsmtFinSF2     -0.005968   -0.065649     0.049900  ...  -0.015211   0.031706  -0.011378
    BsmtUnfSF      -0.007940   -0.140759     0.132644  ...   0.034888  -0.041258   0.214479
    TotalBsmtSF    -0.015415   -0.238518     0.392075  ...   0.013196  -0.014969   0.613581
    1stFlrSF        0.010496   -0.251758     0.457181  ...   0.031372  -0.013604   0.605852
    2ndFlrSF        0.005590    0.307886     0.080177  ...   0.035164  -0.028700   0.319334
    LowQualFinSF   -0.044230    0.046474     0.038469  ...  -0.022174  -0.028921  -0.025606
    GrLivArea       0.008273    0.074853     0.402797  ...   0.050240  -0.036526   0.708624
    BsmtFullBath    0.002289    0.003491     0.100949  ...  -0.025361   0.067049   0.227122
    BsmtHalfBath   -0.020155   -0.002333    -0.007234  ...   0.032873  -0.046524  -0.016844
    FullBath        0.005587    0.131608     0.198769  ...   0.055872  -0.019669   0.560664
    HalfBath        0.006784    0.177354     0.053532  ...  -0.009050  -0.010269   0.284108
    BedroomAbvGr    0.037719   -0.023438     0.263170  ...   0.046544  -0.036014   0.168213
    KitchenAbvGr    0.002951    0.281721    -0.006069  ...   0.026589   0.031687  -0.135907
    TotRmsAbvGrd    0.027239    0.040380     0.352096  ...   0.036907  -0.034516   0.533723
    Fireplaces     -0.019772   -0.045569     0.266639  ...   0.046357  -0.024096   0.466929
    GarageYrBlt     0.000072    0.085072     0.070250  ...   0.005337  -0.001014   0.486362
    GarageCars      0.016570   -0.040110     0.285691  ...   0.040522  -0.039117   0.640409
    GarageArea      0.017634   -0.098672     0.344997  ...   0.027974  -0.027378   0.623431
    WoodDeckSF     -0.029643   -0.012579     0.088521  ...   0.021011   0.022270   0.324413
    OpenPorchSF    -0.000477   -0.006100     0.151972  ...   0.071255  -0.057619   0.315856
    EnclosedPorch   0.002889   -0.012037     0.010700  ...  -0.028887  -0.009916  -0.128578
    3SsnPorch      -0.046635   -0.043825     0.070029  ...   0.029474   0.018645   0.044584
    ScreenPorch     0.001330   -0.026030     0.041383  ...   0.023217   0.010694   0.111447
    PoolArea        0.057044    0.008283     0.206167  ...  -0.033737  -0.059689   0.092404
    MiscVal        -0.006242   -0.007683     0.003368  ...  -0.006495   0.004906  -0.021190
    MoSold          0.021172   -0.013585     0.011200  ...   1.000000  -0.145721   0.046432
    YrSold          0.000712   -0.021407     0.007450  ...  -0.145721   1.000000  -0.028923
    SalePrice      -0.021917   -0.084284     0.351799  ...   0.046432  -0.028923   1.000000

    [38 rows x 38 columns]
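The 38 x 38 table is a pairwise correlation matrix over the numeric columns. In its SalePrice column, GarageArea (0.623431) correlates more strongly with price than LotFrontage (0.351799), and OverallQual (0.790982) is the highest value shown. The source does not include the call that produced the matrix. The sketch below shows one way to get such a matrix with pandas on the five printed rows; `df` is a hypothetical sample, so its numbers will differ from the full-data values.

```python
import pandas as pd

# Hypothetical five-row sample built from values shown in the article.
df = pd.DataFrame({
    "MSZoning": ["RL", "RL", "RL", "RL", "RL"],  # non-numeric, excluded below
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0],
    "GarageArea": [548, 460, 608, 642, 836],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)

# Rank features by their correlation with the target column.
print(corr["SalePrice"].drop("SalePrice").sort_values(ascending=False))
```

Selecting the numeric columns explicitly avoids relying on how a particular pandas version treats string columns in `corr()`.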
    k_means_cluster_centers
    [[0.1938454  0.21080405]
     [0.25140958 0.44595543]]
    k_means_labels_unique
    [0 1]
    0 [1 1 1 ... 0 0 0]
    0 [1 1 1 ... 0 0 0] [False False False ...  True  True  True]
    1 [1 1 1 ... 0 0 0]
    1 [1 1 1 ... 0 0 0] [ True  True  True ... False False False]
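The output above lists two cluster centers in the scaled (LotFrontage, GarageArea) plane, the unique labels, and a boolean membership mask for each cluster. The sketch below shows how such output can be produced with scikit-learn. It is not the author's code: the data `X` is synthetic (two blobs placed near the printed centers), and `n_init=10` and `random_state=0` are assumptions made so the run is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the scaled (LotFrontage, GarageArea) pairs:
# two blobs placed near the centers printed in the article.
X = np.vstack([
    rng.normal(loc=[0.19, 0.21], scale=0.02, size=(50, 2)),
    rng.normal(loc=[0.25, 0.45], scale=0.02, size=(50, 2)),
])

k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
k_means_cluster_centers = k_means.cluster_centers_
k_means_labels = k_means.labels_
k_means_labels_unique = np.unique(k_means_labels)

print("k_means_cluster_centers")
print(k_means_cluster_centers)
print("k_means_labels_unique")
print(k_means_labels_unique)

cluster_sizes = {}
for k in k_means_labels_unique:
    # Boolean mask: True where a sample belongs to cluster k.
    my_members = k_means_labels == k
    print(k, k_means_labels, my_members)
    # The mask selects the rows of X for this cluster, e.g. for plotting.
    cluster_sizes[int(k)] = X[my_members].shape[0]
```

The two masks are complementary, which matches the `False ... True` and `True ... False` patterns in the printed output.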
    class KMeans Found at: sklearn.cluster._kmeans

    class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
        """K-Means clustering.

        Read more in the :ref:`User Guide <k_means>`.

        Parameters
        ----------

        n_clusters : int, default=8
            The number of clusters to form as well as the number of
            centroids to generate.

        init : {'k-means++', 'random', ndarray, callable}, default='k-means++'
            Method for initialization:

            'k-means++' : selects initial cluster centers for k-mean
            clustering in a smart way to speed up convergence. See section
            Notes in k_init for more details.

            'random': choose `n_clusters` observations (rows) at random from data
            for the initial centroids.

            If an ndarray is passed, it should be of shape (n_clusters, n_features)
            and gives the initial centers.

            If a callable is passed, it should take arguments X, n_clusters and a
            random state and return an initialization.

        n_init : int, default=10
            Number of times the k-means algorithm will be run with different
            centroid seeds. The final results will be the best output of
            n_init consecutive runs in terms of inertia.

        max_iter : int, default=300
            Maximum number of iterations of the k-means algorithm for a
            single run.

        tol : float, default=1e-4
            Relative tolerance with regards to Frobenius norm of the difference
            in the cluster centers of two consecutive iterations to declare
            convergence.
            It's not advised to set `tol=0` since convergence might never be
            declared due to rounding errors. Use a very small number instead.

        precompute_distances : {'auto', True, False}, default='auto'
            Precompute distances (faster but takes more memory).

            'auto' : do not precompute distances if n_samples * n_clusters > 12
            million. This corresponds to about 100MB overhead per job using
            double precision.

            True : always precompute distances.

            False : never precompute distances.

            .. deprecated:: 0.23
                'precompute_distances' was deprecated in version 0.22 and will be
                removed in 0.25. It has no effect.

        verbose : int, default=0
            Verbosity mode.

        random_state : int, RandomState instance, default=None
            Determines random number generation for centroid initialization. Use
            an int to make the randomness deterministic.
            See :term:`Glossary <random_state>`.

        copy_x : bool, default=True
            When pre-computing distances it is more numerically accurate to center
            the data first. If copy_x is True (default), then the original data is
            not modified. If False, the original data is modified, and put back
            before the function returns, but small numerical differences may be
            introduced by subtracting and then adding the data mean. Note that if
            the original data is not C-contiguous, a copy will be made even if
            copy_x is False. If the original data is sparse, but not in CSR format,
            a copy will be made even if copy_x is False.

        n_jobs : int, default=None
            The number of OpenMP threads to use for the computation. Parallelism is
            sample-wise on the main cython loop which assigns each sample to its
            closest center.

            ``None`` or ``-1`` means using all processors.

            .. deprecated:: 0.23
                ``n_jobs`` was deprecated in version 0.23 and will be removed in
                0.25.

        algorithm : {"auto", "full", "elkan"}, default="auto"
            K-means algorithm to use. The classical EM-style algorithm is "full".
            The "elkan" variation is more efficient on data with well-defined
            clusters, by using the triangle inequality. However it's more memory
            intensive due to the allocation of an extra array of shape
            (n_samples, n_clusters).

            For now "auto" (kept for backward compatibility) chooses "elkan" but it
            might change in the future for a better heuristic.

            .. versionchanged:: 0.18
                Added Elkan algorithm

        Attributes
        ----------
        cluster_centers_ : ndarray of shape (n_clusters, n_features)
            Coordinates of cluster centers. If the algorithm stops before fully
            converging (see ``tol`` and ``max_iter``), these will not be
            consistent with ``labels_``.

        labels_ : ndarray of shape (n_samples,)
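To connect the docstring to practice, the sketch below passes several of the documented parameters explicitly and inspects the two documented attributes. It is an illustration, not the article's code. `X` is synthetic, and `init_centers` is a hypothetical starting guess with the `(n_clusters, n_features)` shape that the `init` entry requires. The `precompute_distances`, `n_jobs` and `algorithm` parameters are left out because the docstring marks the first two as deprecated and the accepted `algorithm` values differ between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic two-feature data: two well-separated blobs.
X = np.vstack([
    rng.normal(loc=[0.19, 0.21], scale=0.02, size=(60, 2)),
    rng.normal(loc=[0.25, 0.45], scale=0.02, size=(60, 2)),
])

# Hypothetical initial centers, shape (n_clusters, n_features) = (2, 2).
init_centers = np.array([[0.20, 0.20], [0.30, 0.50]])

km = KMeans(
    n_clusters=2,
    init=init_centers,  # explicit ndarray initialization
    n_init=1,           # one run is enough when the start is fixed
    max_iter=300,
    tol=1e-4,
    random_state=0,
).fit(X)

print(km.cluster_centers_.shape)  # (n_clusters, n_features)
print(km.labels_.shape)           # (n_samples,)

# Recompute the nearest center for every sample and compare with labels_.
dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
print(np.array_equal(nearest, km.labels_))
```

The final comparison relates to the note under `cluster_centers_`: on a run that converges, each label should point to the closest fitted center.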