ML kNNC: Classification Prediction with the kNN Algorithm on the Iris Dataset (PCA Preprocessing + 3D Scatter Plot Visualization)


joyce · 2022-09-19 11:54:11
Column: News


Contents

Classification Prediction with the kNN Algorithm on the Iris Dataset (PCA Preprocessing + 3D Scatter Plot Visualization)

Design Approach

Output

Core Code


Related articles
ML kNNC: Classification Prediction with the kNN Algorithm on the Iris Dataset (PCA Preprocessing + 3D Scatter Plot Visualization)
ML kNNC: Classification Prediction with the kNN Algorithm on the Iris Dataset (PCA Preprocessing + 3D Scatter Plot Visualization): Implementation

Classification Prediction with the kNN Algorithm on the Iris Dataset (PCA Preprocessing + 3D Scatter Plot Visualization)

Design Approach
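In outline, judging from the title and the output below: load the iris data, reduce the four features to three principal components with PCA, visualize the three classes in a 3D scatter plot, then compare a hand-written kNN classifier against sklearn's KNeighborsClassifier on both the raw and the PCA-transformed features. Below is a minimal sketch of the loading + PCA + 3D-scatter step; the file name 'iris.data' and the column names are assumptions read off the output that follows, not code from the original post.

# Sketch: load iris, project to 3 PCA components, draw a 3D scatter plot.
# The file name 'iris.data' and the column names are assumptions based on
# the output shown below.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

cols = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'type']
df = pd.read_csv('iris.data', names=cols)

X = df[cols[:4]].values
y = df['type'].astype('category').cat.codes   # encode the three labels as 0/1/2

X3 = PCA(n_components=3).fit_transform(X)     # 4 features -> 3 principal components

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X3[:, 0], X3[:, 1], X3[:, 2], c=y)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_zlabel('PC3')
plt.show()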

Output

(149, 5)
   5.1  3.5  1.4  0.2  Iris-setosa
0  4.9  3.0  1.4  0.2  Iris-setosa
1  4.7  3.2  1.3  0.2  Iris-setosa
2  4.6  3.1  1.5  0.2  Iris-setosa
3  5.0  3.6  1.4  0.2  Iris-setosa
4  5.4  3.9  1.7  0.4  Iris-setosa
(149, 5)
   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width            type
0           4.5          2.3           1.3          0.3     Iris-setosa
1           6.3          2.5           5.0          1.9  Iris-virginica
2           5.1          3.4           1.5          0.2     Iris-setosa
3           6.3          3.3           6.0          2.5  Iris-virginica
4           6.8          3.2           5.9          2.3  Iris-virginica
Split point: 29
label_classes: ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
kNNDIY model prediction, on the original data: 0.95
kNN model prediction, on the original data: [0.96666667 1. 0.93333333 1. 0.93103448]
kNN model prediction, after PCA on the original data: [1. 0.96 0.95918367]
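The last three lines report, in order: the accuracy of the hand-written "kNNDIY" classifier on a held-out split, five cross-validation scores for sklearn's kNN on the raw features, and three cross-validation scores after PCA. The post does not show the DIY implementation or the exact split logic, so the sketch below is a hedged reconstruction: a plain Euclidean majority-vote classifier stands in for kNNDIY, the reading of "Split point: 29" is one plausible interpretation, and X and y are the arrays built in the Design Approach sketch above.

# Hedged reconstruction of the evaluation behind the printout above.
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.decomposition import PCA

def knn_diy_predict(X_train, y_train, X_test, k=5):
    """Plain kNN: majority vote among the k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.array(preds)

# One plausible reading of "Split point: 29": hold out the first 29 rows.
split = 29
y_arr = np.asarray(y)
X_test, y_test = X[:split], y_arr[:split]
X_train, y_train = X[split:], y_arr[split:]
acc = np.mean(knn_diy_predict(X_train, y_train, X_test) == y_test)
print('kNNDIY model prediction, on the original data:', acc)

# Five scores -> 5-fold cross-validation on the raw features.
knn = KNeighborsClassifier(n_neighbors=5)
print(cross_val_score(knn, X, y, cv=5))

# Three scores -> 3-fold cross-validation on the PCA-transformed features.
print(cross_val_score(knn, PCA(n_components=3).fit_transform(X), y, cv=3))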

Core Code

# KNeighborsClassifier, found in sklearn.neighbors._classification

class KNeighborsClassifier(NeighborsBase, KNeighborsMixin,
                           SupervisedIntegerMixin, ClassifierMixin):
    """Classifier implementing the k-nearest neighbors vote.

    Read more in the :ref:`User Guide <classification>`.

    Parameters
    ----------
    n_neighbors : int, default=5
        Number of neighbors to use by default for :meth:`kneighbors` queries.

    weights : {'uniform', 'distance'} or callable, default='uniform'
        weight function used in prediction. Possible values:

        - 'uniform' : uniform weights. All points in each neighborhood
          are weighted equally.
        - 'distance' : weight points by the inverse of their distance.
          in this case, closer neighbors of a query point will have a
          greater influence than neighbors which are further away.
        - [callable] : a user-defined function which accepts an
          array of distances, and returns an array of the same shape
          containing the weights.

    algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
        Algorithm used to compute the nearest neighbors:

        - 'ball_tree' will use :class:`BallTree`
        - 'kd_tree' will use :class:`KDTree`
        - 'brute' will use a brute-force search.
        - 'auto' will attempt to decide the most appropriate algorithm
          based on the values passed to :meth:`fit` method.

        Note: fitting on sparse input will override the setting of
        this parameter, using brute force.

    leaf_size : int, default=30
        Leaf size passed to BallTree or KDTree. This can affect the
        speed of the construction and query, as well as the memory
        required to store the tree. The optimal value depends on the
        nature of the problem.

    p : int, default=2
        Power parameter for the Minkowski metric. When p = 1, this is
        equivalent to using manhattan_distance (l1), and euclidean_distance
        (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

    metric : str or callable, default='minkowski'
        the distance metric to use for the tree. The default metric is
        minkowski, and with p=2 is equivalent to the standard Euclidean
        metric. See the documentation of :class:`DistanceMetric` for a
        list of available metrics.
        If metric is "precomputed", X is assumed to be a distance matrix and
        must be square during fit. X may be a :term:`sparse graph`,
        in which case only "nonzero" elements may be considered neighbors.

    metric_params : dict, default=None
        Additional keyword arguments for the metric function.

    n_jobs : int, default=None
        The number of parallel jobs to run for neighbors search.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.
        Doesn't affect :meth:`fit` method.

    Attributes
    ----------
    classes_ : array of shape (n_classes,)
        Class labels known to the classifier

    effective_metric_ : str or callable
        The distance metric used. It will be same as the `metric` parameter
        or a synonym of it, e.g. 'euclidean' if the `metric` parameter set to
        'minkowski' and `p` parameter set to 2.

    effective_metric_params_ : dict
        Additional keyword arguments for the metric function. For most
        metrics will be same with `metric_params` parameter, but may also
        contain the `p` parameter value if the `effective_metric_` attribute
        is set to 'minkowski'.

    outputs_2d_ : bool
        False when `y`'s shape is (n_samples, ) or (n_samples, 1) during fit
        otherwise True.

    Examples
    --------
    >>> X = [[0], [1], [2], [3]]
    >>> y = [0, 0, 1, 1]
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> neigh = KNeighborsClassifier(n_neighbors=3)
    >>> neigh.fit(X, y)
    KNeighborsClassifier(...)
    >>> print(neigh.predict([[1.1]]))
    [0]
    >>> print(neigh.predict_proba([[0.9]]))
    [[0.66666667 0.33333333]]

    See also
    --------
    RadiusNeighborsClassifier
    KNeighborsRegressor
    RadiusNeighborsRegressor
    NearestNeighbors

    Notes
    -----
    See :ref:`Nearest Neighbors <neighbors>` in the online documentation
    for a discussion of the choice of ``algorithm`` and ``leaf_size``.

    .. warning::

       Regarding the Nearest Neighbors algorithms, if it is found that two
       neighbors, neighbor `k+1` and `k`, have identical distances
       but different labels, the results will depend on the ordering of the
       training data.

    https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
    """
    @_deprecate_positional_args
    def __init__(self, n_neighbors=5, *, weights='uniform',
                 algorithm='auto', leaf_size=30, p=2, metric='minkowski',
                 metric_params=None, n_jobs=None, **kwargs):
        super().__init__(n_neighbors=n_neighbors, algorithm=algorithm,
                         leaf_size=leaf_size, metric=metric, p=p,
                         metric_params=metric_params, n_jobs=n_jobs, **kwargs)
        self.weights = _check_weights(weights)

    def predict(self, X):
        """Predict the class labels for the provided data.

        Parameters
        ----------
        X : array-like of shape (n_queries, n_features), \
                or (n_queries, n_indexed) if metric == 'precomputed'
            Test samples.

        Returns
        -------
        y : ndarray of shape (n_queries,) or (n_queries, n_outputs)
            Class labels for each data sample.
        """
        X = check_array(X, accept_sparse='csr')

        neigh_dist, neigh_ind = self.kneighbors(X)
        classes_ = self.classes_
        _y = self._y
        if not self.outputs_2d_:
            _y = self._y.reshape((-1, 1))
            classes_ = [self.classes_]

        n_outputs = len(classes_)
        n_queries = _num_samples(X)
        weights = _get_weights(neigh_dist, self.weights)

        y_pred = np.empty((n_queries, n_outputs), dtype=classes_[0].dtype)
        for k, classes_k in enumerate(classes_):
            if weights is None:
                mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
            else:
                mode, _ = weighted_mode(_y[neigh_ind, k], weights, axis=1)

            mode = np.asarray(mode.ravel(), dtype=np.intp)
            y_pred[:, k] = classes_k.take(mode)

        if not self.outputs_2d_:
            y_pred = y_pred.ravel()

        return y_pred

    def predict_proba(self, X):
        """Return probability estimates for the test data X.

        Parameters
        ----------
        X : array-like of shape (n_queries, n_features), \
                or (n_queries, n_indexed) if metric == 'precomputed'
            Test samples.

        Returns
        -------
        p : ndarray of shape (n_queries, n_classes), or a list of n_outputs
            of such arrays if n_outputs > 1.
            The class probabilities of the input samples. Classes are ordered
            by lexicographic order.
        """
        X = check_array(X, accept_sparse='csr')

        neigh_dist, neigh_ind = self.kneighbors(X)

        classes_ = self.classes_
        _y = self._y
        if not self.outputs_2d_:
            _y = self._y.reshape((-1, 1))
            classes_ = [self.classes_]

        n_queries = _num_samples(X)

        weights = _get_weights(neigh_dist, self.weights)
        if weights is None:
            weights = np.ones_like(neigh_ind)

        all_rows = np.arange(X.shape[0])
        probabilities = []
        for k, classes_k in enumerate(classes_):
            pred_labels = _y[:, k][neigh_ind]
            proba_k = np.zeros((n_queries, classes_k.size))

            # a simple ':' index doesn't work right
            for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
                proba_k[all_rows, idx] += weights[:, i]

            # normalize 'votes' into real [0, 1] probabilities
            normalizer = proba_k.sum(axis=1)[:, np.newaxis]
            normalizer[normalizer == 0.0] = 1.0
            proba_k /= normalizer

            probabilities.append(proba_k)

        if not self.outputs_2d_:
            probabilities = probabilities[0]

        return probabilities
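To tie the source above back to this task, here is a short hedged usage example, continuing with the X and y arrays built in the earlier sketches: fit on a train split, then inspect both the hard labels from predict and the normalized vote fractions from predict_proba.

# Quick usage of the class above on the iris arrays built earlier.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5, weights='distance').fit(X_train, y_train)

print(clf.predict(X_test[:3]))        # hard labels from the (distance-weighted) vote
print(clf.predict_proba(X_test[:3]))  # per-class vote fractions, rows sum to 1
print(clf.score(X_test, y_test))      # mean accuracy on the held-out split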
