ML之kNNC:基于iris莺尾花数据集(PCA处理+三维散点图可视化)利用kNN算法实现分类预测
目录
基于iris莺尾花数据集(PCA处理+三维散点图可视化)利用kNN算法实现分类预测
相关文章
ML之kNNC:基于iris莺尾花数据集(PCA处理+三维散点图可视化)利用kNN算法实现分类预测
ML之kNNC:基于iris莺尾花数据集(PCA处理+三维散点图可视化)利用kNN算法实现分类预测实现
- (149, 5)
- 5.1 3.5 1.4 0.2 Iris-setosa
- 0 4.9 3.0 1.4 0.2 Iris-setosa
- 1 4.7 3.2 1.3 0.2 Iris-setosa
- 2 4.6 3.1 1.5 0.2 Iris-setosa
- 3 5.0 3.6 1.4 0.2 Iris-setosa
- 4 5.4 3.9 1.7 0.4 Iris-setosa
- (149, 5)
- Sepal_Length Sepal_Width Petal_Length Petal_Width type
- 0 4.5 2.3 1.3 0.3 Iris-setosa
- 1 6.3 2.5 5.0 1.9 Iris-virginica
- 2 5.1 3.4 1.5 0.2 Iris-setosa
- 3 6.3 3.3 6.0 2.5 Iris-virginica
- 4 6.8 3.2 5.9 2.3 Iris-virginica
- 切分点: 29
- label_classes: ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
- kNNDIY模型预测,基于原数据: 0.95
- kNN模型预测,基于原数据预测: [0.96666667 1. 0.93333333 1. 0.93103448]
- kNN模型预测,原数据PCA处理后: [1. 0.96 0.95918367]
- class KNeighborsClassifier Found at: sklearn.neighbors._classification
-
- class KNeighborsClassifier(NeighborsBase, KNeighborsMixin,
- SupervisedIntegerMixin, ClassifierMixin):
- """Classifier implementing the k-nearest neighbors vote.
-
- Read more in the :ref:`User Guide <classification>`.
-
- Parameters
- ----------
- n_neighbors : int, default=5
- Number of neighbors to use by default for :meth:`kneighbors` queries.
-
- weights : {'uniform', 'distance'} or callable, default='uniform'
- weight function used in prediction. Possible values:
-
- - 'uniform' : uniform weights. All points in each neighborhood
- are weighted equally.
- - 'distance' : weight points by the inverse of their distance.
- in this case, closer neighbors of a query point will have a
- greater influence than neighbors which are further away.
- - [callable] : a user-defined function which accepts an
- array of distances, and returns an array of the same shape
- containing the weights.
-
- algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
- Algorithm used to compute the nearest neighbors:
-
- - 'ball_tree' will use :class:`BallTree`
- - 'kd_tree' will use :class:`KDTree`
- - 'brute' will use a brute-force search.
- - 'auto' will attempt to decide the most appropriate algorithm
- based on the values passed to :meth:`fit` method.
-
- Note: fitting on sparse input will override the setting of
- this parameter, using brute force.
-
- leaf_size : int, default=30
- Leaf size passed to BallTree or KDTree. This can affect the
- speed of the construction and query, as well as the memory
- required to store the tree. The optimal value depends on the
- nature of the problem.
-
- p : int, default=2
- Power parameter for the Minkowski metric. When p = 1, this is
- equivalent to using manhattan_distance (l1), and euclidean_distance
- (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
-
- metric : str or callable, default='minkowski'
- the distance metric to use for the tree. The default metric is
- minkowski, and with p=2 is equivalent to the standard Euclidean
- metric. See the documentation of :class:`DistanceMetric` for a
- list of available metrics.
- If metric is "precomputed", X is assumed to be a distance matrix and
- must be square during fit. X may be a :term:`sparse graph`,
- in which case only "nonzero" elements may be considered neighbors.
-
- metric_params : dict, default=None
- Additional keyword arguments for the metric function.
-
- n_jobs : int, default=None
- The number of parallel jobs to run for neighbors search.
- ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
- ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
- for more details.
- Doesn't affect :meth:`fit` method.
-
- Attributes
- ----------
- classes_ : array of shape (n_classes,)
- Class labels known to the classifier
-
- effective_metric_ : str or callble
- The distance metric used. It will be same as the `metric` parameter
- or a synonym of it, e.g. 'euclidean' if the `metric` parameter set to
- 'minkowski' and `p` parameter set to 2.
-
- effective_metric_params_ : dict
- Additional keyword arguments for the metric function. For most
- metrics
- will be same with `metric_params` parameter, but may also contain the
- `p` parameter value if the `effective_metric_` attribute is set to
- 'minkowski'.
-
- outputs_2d_ : bool
- False when `y`'s shape is (n_samples, ) or (n_samples, 1) during fit
- otherwise True.
-
- Examples
- --------
- >>> X = [[0], [1], [2], [3]]
- >>> y = [0, 0, 1, 1]
- >>> from sklearn.neighbors import KNeighborsClassifier
- >>> neigh = KNeighborsClassifier(n_neighbors=3)
- >>> neigh.fit(X, y)
- KNeighborsClassifier(...)
- >>> print(neigh.predict([[1.1]]))
- [0]
- >>> print(neigh.predict_proba([[0.9]]))
- [[0.66666667 0.33333333]]
-
- See also
- --------
- RadiusNeighborsClassifier
- KNeighborsRegressor
- RadiusNeighborsRegressor
- NearestNeighbors
-
- Notes
- -----
- See :ref:`Nearest Neighbors <neighbors>` in the online
- documentation
- for a discussion of the choice of ``algorithm`` and ``leaf_size``.
-
- .. warning::
-
- Regarding the Nearest Neighbors algorithms, if it is found that two
- neighbors, neighbor `k+1` and `k`, have identical distances
- but different labels, the results will depend on the ordering of the
- training data.
-
- https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
- """
- -meta"> @_deprecate_positional_args
- def __init__(self, n_neighbors=5,
- *, weights='uniform', algorithm='auto', leaf_size=30,
- p=2, metric='minkowski', metric_params=None, n_jobs=None, **
- kwargs):
- super().__init__(n_neighbors=n_neighbors, algorithm=algorithm,
- leaf_size=leaf_size, metric=metric, p=p, metric_params=metric_params,
- n_jobs=n_jobs, **kwargs)
- self.weights = _check_weights(weights)
-
- def predict(self, X):
- """Predict the class labels for the provided data.
- Parameters
- ----------
- X : array-like of shape (n_queries, n_features), \
- or (n_queries, n_indexed) if metric == 'precomputed'
- Test samples.
- Returns
- -------
- y : ndarray of shape (n_queries,) or (n_queries, n_outputs)
- Class labels for each data sample.
- """
- X = check_array(X, accept_sparse='csr')
- neigh_dist, neigh_ind = self.kneighbors(X)
- classes_ = self.classes_
- _y = self._y
- if not self.outputs_2d_:
- _y = self._y.reshape((-1, 1))
- classes_ = [self.classes_]
- n_outputs = len(classes_)
- n_queries = _num_samples(X)
- weights = _get_weights(neigh_dist, self.weights)
- y_pred = np.empty((n_queries, n_outputs), dtype=classes_[0].
- dtype)
- for k, classes_k in enumerate(classes_):
- if weights is None:
- mode, _ = stats.mode(_y[neigh_indk], axis=1)
- else:
- mode, _ = weighted_mode(_y[neigh_indk], weights, axis=1)
- mode = np.asarray(mode.ravel(), dtype=np.intp)
- y_pred[:k] = classes_k.take(mode)
-
- if not self.outputs_2d_:
- y_pred = y_pred.ravel()
- return y_pred
-
- def predict_proba(self, X):
- """Return probability estimates for the test data X.
- Parameters
- ----------
- X : array-like of shape (n_queries, n_features), \
- or (n_queries, n_indexed) if metric == 'precomputed'
- Test samples.
- Returns
- -------
- p : ndarray of shape (n_queries, n_classes), or a list of n_outputs
- of such arrays if n_outputs > 1.
- The class probabilities of the input samples. Classes are ordered
- by lexicographic order.
- """
- X = check_array(X, accept_sparse='csr')
- neigh_dist, neigh_ind = self.kneighbors(X)
- classes_ = self.classes_
- _y = self._y
- if not self.outputs_2d_:
- _y = self._y.reshape((-1, 1))
- classes_ = [self.classes_]
- n_queries = _num_samples(X)
- weights = _get_weights(neigh_dist, self.weights)
- if weights is None:
- weights = np.ones_like(neigh_ind)
- all_rows = np.arange(X.shape[0])
- probabilities = []
- for k, classes_k in enumerate(classes_):
- pred_labels = _y[:k][neigh_ind]
- proba_k = np.zeros((n_queries, classes_k.size))
- a simple ':' index doesn't work right
- for i, idx in enumerate(pred_labels.T): loop is O(n_neighbors)
- proba_k[all_rowsidx] += weights[:i]
-
- normalize 'votes' into real [0,1] probabilities
- normalizer = proba_k.sum(axis=1)[:np.newaxis]
- normalizer[normalizer == 0.0] = 1.0
- proba_k /= normalizer
- probabilities.append(proba_k)
-
- if not self.outputs_2d_:
- probabilities = probabilities[0]
- return probabilities
网站声明:如果转载,请联系本站管理员。否则一切后果自行承担。
加入交流群
请使用微信扫一扫!