sklearn: A detailed guide to SelectFromModel in sklearn.feature_selection (introduction and usage)


小李 2022-09-19 13:26:12
Category: News


Contents

Introduction to SelectFromModel

1. Feature selection using SelectFromModel and LassoCV

2. L1-based feature selection

3. Tree-based feature selection

How to use SelectFromModel

1. The source code of SelectFromModel


Introduction to SelectFromModel

        SelectFromModel is a meta-transformer that can be used with any estimator that exposes a coef_ or feature_importances_ attribute after fitting. A feature is considered unimportant and removed if its corresponding coef_ or feature_importances_ value is below the given threshold parameter. Besides specifying the threshold numerically, you can pass a string to use a built-in heuristic for finding it. The available heuristics are "mean", "median", and float multiples of these such as "0.1*mean".
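To make the string heuristics concrete, here is a minimal sketch. The dataset (iris), the RandomForestClassifier base estimator and the helper name fit_selector are choices made for this illustration and do not come from the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)


def fit_selector(threshold):
    """Fit a fresh SelectFromModel with the given threshold setting.

    Hypothetical helper for this example only.
    """
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    return SelectFromModel(forest, threshold=threshold).fit(X, y)


# Maps each threshold setting to (resolved threshold, support mask, importances).
results = {}
for setting in ("mean", "median", "0.1*mean", 0.2):
    selector = fit_selector(setting)
    importances = selector.estimator_.feature_importances_
    mask = selector.get_support()
    results[setting] = (selector.threshold_, mask, importances)
    print(f"{setting!r:>10}: threshold_={selector.threshold_:.4f}, "
          f"kept {int(mask.sum())} of {X.shape[1]} features")
```

Whatever form the setting takes, threshold_ holds the resolved numeric value, and the kept features are exactly those whose importance is greater than or equal to it.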

Official documentation: https://scikit-learn.org/stable/modules/feature_selection.html#feature-selection-using-selectfrommodel

    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

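Parameters

estimator: object. The base estimator the transformer is built on. It may be already fitted (with prefit=True) or unfitted; either way it must expose feature_importances_ or coef_ once fitted.

threshold: string or float, optional, default None. The cut-off for feature selection: features whose importance is greater than or equal to it are kept, and the rest are dropped. "median" or "mean" uses the median or the mean of the feature importances, optionally with a scaling factor such as "1.25*mean". When left as None, 1e-5 is used if the estimator has an l1 penalty, whether set explicitly or implied (e.g. Lasso); otherwise "mean" is used.

prefit: bool, default False. Whether an already fitted model is passed to the constructor. If True, call transform directly; SelectFromModel then cannot be used with cross_val_score, GridSearchCV, or similar utilities that clone the estimator. If False, train the model with fit and then call transform to select features.

norm_order: non-zero int, inf, or -inf, default 1. The order of the norm applied to each feature's vector of coefficients before it is compared with threshold. It only matters when the estimator's coef_ is 2-dimensional.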
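The default threshold rule described above can be checked with a short sketch. The diabetes dataset and the Lasso/Ridge pair are assumptions chosen for illustration; the point is only that an L1-penalized estimator gets 1e-5 while anything else falls back to "mean".

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)

# threshold is left as None in both cases, so the default rule applies.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
ridge_selector = SelectFromModel(Ridge(alpha=1.0)).fit(X, y)

# For a 1-D coef_, the importance of a feature is the absolute coefficient.
lasso_importance = np.abs(lasso_selector.estimator_.coef_)
ridge_importance = np.abs(ridge_selector.estimator_.coef_)

print("Lasso threshold_:", lasso_selector.threshold_)
print("Ridge threshold_:", ridge_selector.threshold_)
print("Lasso keeps", int(lasso_selector.get_support().sum()), "features")
print("Ridge keeps", int(ridge_selector.get_support().sum()), "features")
```

With Lasso, the tiny threshold effectively keeps every feature whose coefficient was not driven to zero; with Ridge, only features with above-average absolute coefficients survive.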

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e. when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """

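Attributes

estimator_: an estimator. The fitted base estimator the transformer is built on. It is stored only when an unfitted estimator is passed to SelectFromModel, that is, when prefit is False.

threshold_: float. The threshold value actually used for feature selection.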
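The two ways of supplying the estimator can be sketched side by side. The iris data, the ExtraTreesClassifier settings and the LogisticRegression step in the pipeline are assumptions made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Route 1: fit the estimator yourself, then wrap it with prefit=True
# and call transform directly.
fitted_trees = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
prefit_selector = SelectFromModel(fitted_trees, prefit=True)
X_prefit = prefit_selector.transform(X)

# Route 2: pass an unfitted estimator; fit() clones it and stores the
# fitted clone as estimator_.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))
selector.fit(X, y)
X_fit = selector.transform(X)
print("estimator_ stored after fit:", hasattr(selector, "estimator_"))
print("threshold_ after fit:", selector.threshold_)

# Only route 2 is suitable for utilities that clone the estimator,
# such as cross_val_score.
pipeline = make_pipeline(
    SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0)),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipeline, X, y, cv=5)
print("cross-validated accuracy per fold:", scores)
```

Putting the selector inside a pipeline also means the feature selection is re-learned on each training fold rather than on the full dataset.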

1. Feature selection using SelectFromModel and LassoCV

    # Author: Manoj Kumar <mks542@nyu.edu>
    # License: BSD 3 clause

    print(__doc__)

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.datasets import load_boston
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LassoCV

    # Load the boston dataset.
    # Note: load_boston was removed in scikit-learn 1.2, so this example
    # only runs on older releases.
    X, y = load_boston(return_X_y=True)

    # We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
    clf = LassoCV()

    # Set a minimum threshold of 0.25
    sfm = SelectFromModel(clf, threshold=0.25)
    sfm.fit(X, y)
    n_features = sfm.transform(X).shape[1]

    # Reset the threshold till the number of features equals two.
    # Note that the attribute can be set directly instead of repeatedly
    # fitting the metatransformer.
    while n_features > 2:
        sfm.threshold += 0.1
        X_transform = sfm.transform(X)
        n_features = X_transform.shape[1]

    # Plot the selected two features from X.
    plt.title(
        "Features selected from Boston using SelectFromModel with "
        "threshold %0.3f." % sfm.threshold)
    feature1 = X_transform[:, 0]
    feature2 = X_transform[:, 1]
    plt.plot(feature1, feature2, 'r.')
    plt.xlabel("Feature number 1")
    plt.ylabel("Feature number 2")
    plt.ylim([np.min(feature2), np.max(feature2)])
    plt.show()
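On scikit-learn releases where load_boston is no longer available, the same idea can be tried on another dataset. The sketch below swaps in load_diabetes (an assumption, not the original author's data) and, instead of raising the threshold in steps of 0.1, sets it once to the second-largest importance; like the original loop, it changes the threshold attribute without refitting.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# Fit once; the threshold can be changed afterwards without refitting.
sfm = SelectFromModel(LassoCV(cv=5))
sfm.fit(X, y)

# For a 1-D coef_, the importance of a feature is the absolute coefficient.
importance = np.abs(sfm.estimator_.coef_)

# Keep only the two most important features by setting the threshold to the
# second-largest importance (assumes the top importances are not tied).
sfm.threshold = float(np.sort(importance)[-2])
X_transform = sfm.transform(X)
selected = sfm.get_support(indices=True)

print("threshold:", sfm.threshold)
print("selected feature indices:", selected)
print("reduced shape:", X_transform.shape)
```

The two selected columns can then be plotted against each other exactly as in the listing above.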

2. L1-based feature selection

    >>> from sklearn.svm import LinearSVC
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.feature_selection import SelectFromModel
    >>> X, y = load_iris(return_X_y=True)
    >>> X.shape
    (150, 4)
    >>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
    >>> model = SelectFromModel(lsvc, prefit=True)
    >>> X_new = model.transform(X)
    >>> X_new.shape
    (150, 3)
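Because iris has three classes, the fitted LinearSVC has a 2-dimensional coef_, which is the case where norm_order applies. The following sketch recomputes the per-feature importances by hand with np.linalg.norm and compares them with what SelectFromModel selects; random_state=0 is an addition for reproducibility and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False, random_state=0).fit(X, y)

# One row of coefficients per class, one column per feature.
print("coef_ shape:", lsvc.coef_.shape)

# Collapse each feature's column of coefficients into a single importance.
importance_l1 = np.linalg.norm(lsvc.coef_, ord=1, axis=0)
importance_l2 = np.linalg.norm(lsvc.coef_, ord=2, axis=0)

# threshold is None and the penalty is "l1", so the default cut-off is 1e-5.
mask_l1 = SelectFromModel(lsvc, prefit=True, norm_order=1).get_support()
mask_l2 = SelectFromModel(lsvc, prefit=True, norm_order=2).get_support()

print("L1 importances:", importance_l1, "->", mask_l1)
print("L2 importances:", importance_l2, "->", mask_l2)
```

With the near-zero default threshold both norms drop only the features whose coefficients are zero for every class; the choice of norm_order starts to matter once a larger numeric threshold is used.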

3. Tree-based feature selection

    >>> from sklearn.ensemble import ExtraTreesClassifier
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.feature_selection import SelectFromModel
    >>> X, y = load_iris(return_X_y=True)
    >>> X.shape
    (150, 4)
    >>> clf = ExtraTreesClassifier(n_estimators=50)
    >>> clf = clf.fit(X, y)
    >>> clf.feature_importances_
    array([ 0.04..., 0.05..., 0.4..., 0.4...])
    >>> model = SelectFromModel(clf, prefit=True)
    >>> X_new = model.transform(X)
    >>> X_new.shape
    (150, 2)
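The transformed array no longer says which columns survived. As a small sketch, get_support can map the selection back to feature names; random_state=0 is added here for reproducibility and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
model = SelectFromModel(clf, prefit=True)  # default threshold here is "mean"

# Indices of the kept columns, then their human-readable names.
kept = model.get_support(indices=True)
kept_names = [iris.feature_names[i] for i in kept]

for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name:<20} importance={importance:.3f}")
print("mean importance:", clf.feature_importances_.mean())
print("kept features:", kept_names)

X_new = model.transform(X)
```

The kept features are those whose importance is at least the mean importance printed above.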

How to use SelectFromModel

1. The source code of SelectFromModel

    # class SelectFromModel, found at: sklearn.feature_selection.from_model
    class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
        """Meta-transformer for selecting features based on importance weights.

        .. versionadded:: 0.17

        Parameters
        ----------
        estimator : object
            The base estimator from which the transformer is built.
            This can be both a fitted (if ``prefit`` is set to True)
            or a non-fitted estimator. The estimator must have either a
            ``feature_importances_`` or ``coef_`` attribute after fitting.

        threshold : string, float, optional default None
            The threshold value to use for feature selection. Features whose
            importance is greater or equal are kept while the others are
            discarded. If "median" (resp. "mean"), then the ``threshold`` value is
            the median (resp. the mean) of the feature importances. A scaling
            factor (e.g., "1.25*mean") may also be used. If None and if the
            estimator has a parameter penalty set to l1, either explicitly
            or implicitly (e.g, Lasso), the threshold used is 1e-5.
            Otherwise, "mean" is used by default.

        prefit : bool, default False
            Whether a prefit model is expected to be passed into the constructor
            directly or not. If True, ``transform`` must be called directly
            and SelectFromModel cannot be used with ``cross_val_score``,
            ``GridSearchCV`` and similar utilities that clone the estimator.
            Otherwise train the model using ``fit`` and then ``transform`` to do
            feature selection.

        norm_order : non-zero int, inf, -inf, default 1
            Order of the norm used to filter the vectors of coefficients below
            ``threshold`` in the case where the ``coef_`` attribute of the
            estimator is of dimension 2.

        Attributes
        ----------
        estimator_ : an estimator
            The base estimator from which the transformer is built.
            This is stored only when a non-fitted estimator is passed to the
            ``SelectFromModel``, i.e when prefit is False.

        threshold_ : float
            The threshold value used for feature selection.
        """
        def __init__(self, estimator, threshold=None, prefit=False,
                     norm_order=1):
            self.estimator = estimator
            self.threshold = threshold
            self.prefit = prefit
            self.norm_order = norm_order

        def _get_support_mask(self):
            # SelectFromModel can directly call on transform.
            if self.prefit:
                estimator = self.estimator
            elif hasattr(self, 'estimator_'):
                estimator = self.estimator_
            else:
                raise ValueError(
                    'Either fit SelectFromModel before transform or set "prefit='
                    'True" and pass a fitted estimator to the constructor.')
            scores = _get_feature_importances(estimator, self.norm_order)
            threshold = _calculate_threshold(estimator, scores, self.threshold)
            return scores >= threshold

        def fit(self, X, y=None, **fit_params):
            """Fit the SelectFromModel meta-transformer.

            Parameters
            ----------
            X : array-like of shape (n_samples, n_features)
                The training input samples.

            y : array-like, shape (n_samples,)
                The target values (integers that correspond to classes in
                classification, real numbers in regression).

            **fit_params : Other estimator specific parameters

            Returns
            -------
            self : object
                Returns self.
            """
            if self.prefit:
                raise NotFittedError(
                    "Since 'prefit=True', call transform directly")
            self.estimator_ = clone(self.estimator)
            self.estimator_.fit(X, y, **fit_params)
            return self

        @property
        def threshold_(self):
            scores = _get_feature_importances(self.estimator_, self.norm_order)
            return _calculate_threshold(self.estimator, scores, self.threshold)

        @if_delegate_has_method('estimator')
        def partial_fit(self, X, y=None, **fit_params):
            """Fit the SelectFromModel meta-transformer only once.

            Parameters
            ----------
            X : array-like of shape (n_samples, n_features)
                The training input samples.

            y : array-like, shape (n_samples,)
                The target values (integers that correspond to classes in
                classification, real numbers in regression).

            **fit_params : Other estimator specific parameters

            Returns
            -------
            self : object
                Returns self.
            """
            if self.prefit:
                raise NotFittedError(
                    "Since 'prefit=True', call transform directly")
            if not hasattr(self, "estimator_"):
                self.estimator_ = clone(self.estimator)
            self.estimator_.partial_fit(X, y, **fit_params)
            return self
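The listing calls two helpers, _get_feature_importances and _calculate_threshold, whose bodies are not shown. The sketch below is a simplified reconstruction based only on the behaviour the docstring describes, not the library's actual code; the names importances_sketch and threshold_sketch are hypothetical, and the result is compared against SelectFromModel on iris as a sanity check.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel


def importances_sketch(estimator, norm_order=1):
    """Hypothetical stand-in: one non-negative importance per feature."""
    importances = getattr(estimator, "feature_importances_", None)
    if importances is None:
        coef = getattr(estimator, "coef_", None)
        if coef is None:
            raise ValueError("estimator has neither feature_importances_ nor coef_")
        coef = np.asarray(coef)
        if coef.ndim == 1:
            importances = np.abs(coef)
        else:
            # 2-D coef_: take the norm of each feature's column of coefficients.
            importances = np.linalg.norm(coef, axis=0, ord=norm_order)
    return np.asarray(importances)


def threshold_sketch(estimator, importances, threshold=None):
    """Hypothetical stand-in: resolve the threshold setting to a float."""
    if threshold is None:
        uses_l1 = (getattr(estimator, "penalty", None) == "l1"
                   or "Lasso" in type(estimator).__name__)
        threshold = 1e-5 if uses_l1 else "mean"
    if isinstance(threshold, str):
        scale, reference = 1.0, threshold.strip()
        if "*" in threshold:
            scale_text, reference = threshold.split("*")
            scale, reference = float(scale_text.strip()), reference.strip()
        if reference == "mean":
            return scale * np.mean(importances)
        if reference == "median":
            return scale * np.median(importances)
        raise ValueError("expected 'mean', 'median' or a scaled form of them")
    return float(threshold)


X, y = load_iris(return_X_y=True)
trees = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

scores = importances_sketch(trees)
cutoff = threshold_sketch(trees, scores, "0.5*median")
sketch_mask = scores >= cutoff  # the same comparison _get_support_mask returns

sklearn_mask = SelectFromModel(trees, threshold="0.5*median",
                               prefit=True).get_support()
print("sketch :", sketch_mask)
print("sklearn:", sklearn_mask)
```

Seen this way, _get_support_mask does little more than compute one score per feature, resolve the threshold, and return the boolean comparison.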