ML with XGBoost: Binary classification on the Pima Indians Diabetes dataset (predicting diabetes onset within 5 years)



Contents

Output

Design approach

Core code


Output

X_train contents:
[[ 3. 102. 44. ... 30.8 0.4 26. ]
 [ 1. 77. 56. ... 33.3 1.251 24. ]
 [ 9. 124. 70. ... 35.4 0.282 34. ]
 ...
 [ 0. 57. 60. ... 21.7 0.735 67. ]
 [ 1. 105. 58. ... 24.3 0.187 21. ]
 [ 8. 179. 72. ... 32.7 0.719 36. ]]
y_train contents:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1.
 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.
 1. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1.
 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0.
 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 0. 1.
 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1.
 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0.
 0. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 1.
 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1.
 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1.]
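
For reference, here is a minimal sketch of how a split like the one above could be produced. The file name pima-indians-diabetes.csv, the comma delimiter, and the split parameters (test_size=0.33, random_state=7) are assumptions; the post does not show its data-loading code.

import numpy as np
from sklearn.model_selection import train_test_split

# Pima Indians diabetes data: 8 clinical features per row, last column is the
# 0/1 label (diabetes onset within 5 years). The file name is a placeholder.
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X, y = dataset[:, 0:8], dataset[:, 8]

# Split parameters are assumed, not taken from the post.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=7)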

Design approach
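
In outline: load the Pima Indians diabetes data (8 clinical features per sample, with a 0/1 label for diabetes onset within 5 years), split it into training and test sets, fit an XGBClassifier through its scikit-learn API, then predict on the held-out set and score accuracy.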

Core code

# class XGBClassifier, found at: xgboost.sklearn

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,too-many-arguments,invalid-name
    __doc__ = "Implementation of the scikit-learn API for XGBoost classification.\n\n" \
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])

    def __init__(self, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, random_state=0, seed=None, missing=None, **kwargs):
        super(XGBClassifier, self).__init__(max_depth, learning_rate, n_estimators, silent,
                                            objective, booster, n_jobs, nthread, gamma,
                                            min_child_weight, max_delta_step, subsample,
                                            colsample_bytree, colsample_bylevel,
                                            reg_alpha, reg_lambda, scale_pos_weight,
                                            base_score, random_state, seed, missing, **kwargs)

    def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ
        """
        Fit gradient boosting classifier

        Parameters
        ----------
        X : array_like
            Feature matrix
        y : array_like
            Labels
        sample_weight : array_like
            Weight for each instance
        eval_set : list, optional
            A list of (X, y) pairs to use as a validation set for
            early-stopping
        sample_weight_eval_set : list, optional
            A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
            instance weights on the i-th validation set.
        eval_metric : str, callable, optional
            If a str, should be a built-in evaluation metric to use. See
            doc/parameter.rst. If callable, a custom evaluation metric. The call
            signature is func(y_predicted, y_true) where y_true will be a
            DMatrix object such that you may need to call the get_label
            method. It must return a str, value pair where the str is a name
            for the evaluation and value is the value of the evaluation
            function. This objective is always minimized.
        early_stopping_rounds : int, optional
            Activates early stopping. Validation error needs to decrease at
            least every <early_stopping_rounds> round(s) to continue training.
            Requires at least one item in evals. If there's more than one,
            will use the last. If early stopping occurs, the model will have
            three additional fields: bst.best_score, bst.best_iteration and
            bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
            default value in predict method if not any other value is specified).
            (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
            and/or num_class appears in the parameters)
        verbose : bool
            If `verbose` and an evaluation set is used, writes the evaluation
            metric measured on the validation set to stderr.
        xgb_model : str
            file name of stored xgb model or 'Booster' instance Xgb model to be
            loaded before training (allows training continuation).
        callbacks : list of callback functions
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using :ref:`callback_api`.
            Example:

            .. code-block:: python

                [xgb.callback.reset_learning_rate(custom_rates)]
        """
        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)
        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying XGB instance
            xgb_options["objective"] = "multi:softprob"
            xgb_options['num_class'] = self.n_classes_

        feval = eval_metric if callable(eval_metric) else None
        if eval_metric is not None:
            if callable(eval_metric):
                eval_metric = None
            else:
                xgb_options.update({"eval_metric": eval_metric})

        self._le = XGBLabelEncoder().fit(y)
        training_labels = self._le.transform(y)

        if eval_set is not None:
            if sample_weight_eval_set is None:
                sample_weight_eval_set = [None] * len(eval_set)
            evals = list(
                DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
                        missing=self.missing, weight=sample_weight_eval_set[i],
                        nthread=self.n_jobs)
                for i in range(len(eval_set)))
            nevals = len(evals)
            eval_names = ["validation_{}".format(i) for i in range(nevals)]
            evals = list(zip(evals, eval_names))
        else:
            evals = ()

        self._features_count = X.shape[1]

        if sample_weight is not None:
            train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
                                    missing=self.missing, nthread=self.n_jobs)
        else:
            train_dmatrix = DMatrix(X, label=training_labels,
                                    missing=self.missing, nthread=self.n_jobs)

        self._Booster = train(xgb_options, train_dmatrix, self.n_estimators,
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)

        self.objective = xgb_options["objective"]
        if evals_result:
            for val in evals_result.items():
                evals_result_key = list(val[1].keys())[0]
                evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
            self.evals_result_ = evals_result

        if early_stopping_rounds is not None:
            self.best_score = self._Booster.best_score
            self.best_iteration = self._Booster.best_iteration
            self.best_ntree_limit = self._Booster.best_ntree_limit
        return self

    def predict(self, data, output_margin=False, ntree_limit=None, validate_features=True):
        """
        Predict with `data`.
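
Given the class above, here is a minimal usage sketch on the train/test split from the Output section. The hyperparameters are just the defaults from __init__; eval_set, eval_metric='logloss', and early_stopping_rounds=10 are illustrative values exercising the early-stopping path described in the fit docstring, not settings taken from the post.

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

model = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100,
                      objective='binary:logistic')

# eval_set plus early_stopping_rounds exercises the early-stopping branch in
# fit(): boosting stops once logloss on validation_0 fails to improve for
# 10 consecutive rounds.
model.fit(X_train, y_train,
          eval_set=[(X_test, y_test)],
          eval_metric='logloss',
          early_stopping_rounds=10,
          verbose=False)

y_pred = model.predict(X_test)  # 0/1 labels: diabetes onset within 5 years
print('Accuracy: %.2f%%' % (accuracy_score(y_test, y_pred) * 100.0))
print('best_iteration:', model.best_iteration)  # set because early stopping was active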
