ML: How LGBMClassifier, XGBClassifier, and CatBoostClassifier compute feature_importances_ — a detailed source-code walkthrough
LGBMClassifier.feature_importances_: computed with the 'split' importance type
`feature_importances_` is a property with `importance_type='split'` by default; it simply forwards the stored `importance_type` to the underlying Booster (reconstructed from lightgbm's `sklearn.py`, abridged):

```python
@property
def feature_importances_(self):
    """Get feature importances.

    Note
    ----
    Feature importance in sklearn interface used to normalize to 1,
    it's deprecated after 2.0.4 and is the same as Booster.feature_importance() now.
    """
    if self._n_features is None:
        raise LGBMNotFittedError('No feature_importances found. Need to call fit beforehand.')
    return self.booster_.feature_importance(self.importance_type)
```

The `booster_` property just returns the fitted Booster object:

```python
@property
def booster_(self):
    """Get the underlying lightgbm Booster of this model."""
    if self._Booster is None:
        raise LGBMNotFittedError('No booster found. Need to call fit beforehand.')
    return self._Booster
```

The Booster also exposes `num_feature(self)`, which returns the number of features in the model. The actual computation happens in `Booster.feature_importance`, whose signature and docstring explain the two importance types:

```python
def feature_importance(self, importance_type='split', iteration=None):
    """Get feature importances.

    Parameters
    ----------
    importance_type : string, optional (default="split")
        How the importance is calculated.
        If "split", result contains numbers of times the feature is used in a model.
        If "gain", result contains total gains of splits which use the feature.
    iteration : int or None, optional (default=None)
        Limit number of iterations in the feature importance calculation.

    Returns
    -------
    result : numpy array
        Array with feature importances.
    """
```
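Conceptually, 'split' importance is nothing more than counting how often each feature appears as a split node across the ensemble. Here is a minimal standalone sketch of that idea (the `split_importance` helper and the toy tree data are invented for illustration, not lightgbm code):

```python
def split_importance(trees, n_features):
    """Count per-feature split occurrences over an ensemble.

    Each tree is represented here as a flat list of the feature
    indices used at its internal (split) nodes.
    """
    counts = [0] * n_features
    for tree in trees:
        for feature_idx in tree:
            counts[feature_idx] += 1
    return counts

# Toy ensemble of 3 trees splitting on features 0..2
toy_trees = [[0, 2, 0], [1, 0], [2, 2]]
print(split_importance(toy_trees, 3))  # [3, 1, 3]
```

With `importance_type='gain'`, lightgbm would instead accumulate the split gains per feature rather than a plain count.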
XGBClassifier.feature_importances_: computed with the 'weight' importance type
The default is `importance_type="weight"`; the supported types are `weight`, `gain`, `cover`, `total_gain`, and `total_cover`. The property delegates to `Booster.get_score` and normalizes the result (reconstructed from xgboost's `sklearn.py`; exact lines vary slightly by version):

```python
@property
def feature_importances_(self):
    """
    Feature importances property

    .. note:: Feature importance is defined only for tree boosters

        Feature importance is only defined when the decision tree model is
        chosen as base learner (`booster=gbtree`). It is not defined for
        other base learner types, such as linear learners (`booster=gblinear`).

    Returns
    -------
    feature_importances_ : array of shape ``[n_features]``
    """
    b = self.get_booster()
    score = b.get_score(importance_type=self.importance_type)
    all_features = [score.get(f, 0.) for f in b.feature_names]
    all_features = np.array(all_features, dtype=np.float32)
    return all_features / all_features.sum()
`get_score` dumps the trees as text and parses them. For 'weight' it counts splits per feature; for the other types it accumulates the per-split statistic and, unless a 'total_' variant was requested, averages it over the split count (reconstructed from xgboost's `core.py`, docstring abridged):

```python
def get_score(self, fmap='', importance_type='weight'):
    """Get feature importance of each feature.

    .. note:: Feature importance is defined only for tree boosters

        Feature importance is only defined when the decision tree model is
        chosen as base learner (`booster=gbtree`). It is not defined for
        other base learner types, such as linear learners (`booster=gblinear`).

    Parameters
    ----------
    fmap : str (optional)
        The name of feature map file.
    importance_type : str, default 'weight'
        One of 'weight', 'gain', 'cover', 'total_gain', 'total_cover'.
    """
    allowed_importance_types = ['weight', 'gain', 'cover', 'total_gain', 'total_cover']
    if importance_type not in allowed_importance_types:
        msg = ("importance_type mismatch, got '{}', expected one of " +
               repr(allowed_importance_types))
        raise ValueError(msg.format(importance_type))

    # if it's weight, then omap stores the number of missing values
    if importance_type == 'weight':
        # do a simpler tree dump to save time
        trees = self.get_dump(fmap, with_stats=False)

        fmap = {}
        for tree in trees:
            for line in tree.split('\n'):
                # look for the opening square bracket
                arr = line.split('[')
                # if no opening bracket (leaf node), ignore this line
                if len(arr) == 1:
                    continue

                # extract feature name from string between []
                fid = arr[1].split(']')[0].split('<')[0]

                if fid not in fmap:
                    # if the feature hasn't been seen yet
                    fmap[fid] = 1
                else:
                    fmap[fid] += 1

        return fmap

    average_over_splits = True
    if importance_type == 'total_gain':
        importance_type = 'gain'
        average_over_splits = False
    elif importance_type == 'total_cover':
        importance_type = 'cover'
        average_over_splits = False

    trees = self.get_dump(fmap, with_stats=True)

    importance_type += '='
    fmap = {}
    gmap = {}
    for tree in trees:
        for line in tree.split('\n'):
            # look for the opening square bracket
            arr = line.split('[')
            # if no opening bracket (leaf node), ignore this line
            if len(arr) == 1:
                continue

            # look for the closing bracket, extract only info within that bracket
            fid = arr[1].split(']')

            # extract gain or cover from string after closing bracket
            g = float(fid[1].split(importance_type)[1].split(',')[0])

            # extract feature name from string before closing bracket
            fid = fid[0].split('<')[0]

            if fid not in fmap:
                # if the feature hasn't been seen yet
                fmap[fid] = 1
                gmap[fid] = g
            else:
                fmap[fid] += 1
                gmap[fid] += g

    # calculate average value (gain/cover) for each feature
    if average_over_splits:
        for fid in gmap:
            gmap[fid] = gmap[fid] / fmap[fid]

    return gmap
```
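The parsing logic can be exercised without xgboost installed. The sketch below (the `score_from_dump` helper and the hand-written dump string are illustrative inventions, simplified from the real `get_score`) shows how a text dump like `0:[f0<0.5] yes=1,no=2,gain=4.0,cover=10` is turned into per-feature scores:

```python
def score_from_dump(trees, importance_type='weight'):
    """Parse xgboost-style text tree dumps into per-feature scores.

    'weight' counts splits per feature; 'gain'/'cover' average the
    corresponding statistic over that feature's split count.
    """
    key = importance_type + '='
    fmap, gmap = {}, {}
    for tree in trees:
        for line in tree.split('\n'):
            arr = line.split('[')
            if len(arr) == 1:                 # leaf node: no '[', skip
                continue
            parts = arr[1].split(']')
            fid = parts[0].split('<')[0]      # feature name inside [...]
            fmap[fid] = fmap.get(fid, 0) + 1
            if importance_type != 'weight':
                # statistic appears after the closing bracket, e.g. "gain=4.0,"
                g = float(parts[1].split(key)[1].split(',')[0])
                gmap[fid] = gmap.get(fid, 0.0) + g
    if importance_type == 'weight':
        return fmap
    return {fid: gmap[fid] / fmap[fid] for fid in gmap}

dump = ["0:[f0<0.5] yes=1,no=2,gain=4.0,cover=10\n"
        "1:leaf=0.1\n"
        "2:[f1<1.0] yes=3,no=4,gain=2.0,cover=6\n"
        "3:leaf=0.2\n"
        "4:leaf=-0.2"]
print(score_from_dump(dump, 'weight'))  # {'f0': 1, 'f1': 1}
print(score_from_dump(dump, 'gain'))    # {'f0': 4.0, 'f1': 2.0}
```

Each feature splits once in this toy dump, so 'weight' is 1 for both and 'gain' equals each split's own gain.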
CatBoostClassifier.feature_importances_: branches on is_groupwise_metric(loss)
CatBoostClassifier.feature_importances_:

```python
@property
def feature_importances_(self):
    loss = self._object._get_loss_function_name()
    if loss and is_groupwise_metric(loss):
        return np.array(getattr(self, "_loss_value_change", None))
    else:
        return np.array(getattr(self, "_prediction_values_change", None))
```
In essence, CatBoost compares the metric (the loss function) obtained with the model in the normal case (with the feature included) against the metric for a model built as if that feature had been excluded from every tree in the ensemble. The larger the difference, the more important the feature.
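That loss-difference idea can be illustrated with a toy additive model (this is a conceptual sketch, not CatBoost's actual algorithm; the `importance_by_loss_change` helper, the MSE loss, and the contribution matrix are all invented for illustration):

```python
def loss(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def importance_by_loss_change(contribs, targets):
    """Score feature f by how much the loss grows when f's additive
    contribution is dropped from every prediction.

    contribs[i][f] is feature f's contribution to prediction i.
    """
    full_preds = [sum(row) for row in contribs]
    base = loss(full_preds, targets)
    n_features = len(contribs[0])
    scores = []
    for f in range(n_features):
        preds_wo_f = [sum(row) - row[f] for row in contribs]
        # bigger loss increase => more important feature
        scores.append(loss(preds_wo_f, targets) - base)
    return scores

# Feature 0 carries most of the signal, feature 1 contributes little
contribs = [[2.0, 0.1], [1.0, -0.1], [3.0, 0.0]]
targets = [2.1, 0.9, 3.0]
scores = importance_by_loss_change(contribs, targets)
# Removing feature 0 hurts the loss far more than removing feature 1
```

Note that CatBoost does not literally retrain a model per feature; it uses an efficient approximation of what the ensemble would predict without the feature, but the importance it reports follows this loss-change intuition.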