ML with XGBoost: binary classification on the Pima Indians Diabetes dataset (predicting whether diabetes develops within 5 years)

Contents
X_train contents:

```
[[ 3. 102. 44. ... 30.8 0.4 26. ]
 [ 1. 77. 56. ... 33.3 1.251 24. ]
 [ 9. 124. 70. ... 35.4 0.282 34. ]
 ...
 [ 0. 57. 60. ... 21.7 0.735 67. ]
 [ 1. 105. 58. ... 24.3 0.187 21. ]
 [ 8. 179. 72. ... 32.7 0.719 36. ]]
```
y_train contents:

```
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1.
 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.
 1. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1.
 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0.
 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 0. 1.
 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1.
 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0.
 0. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 1.
 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1.
 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1.]
```
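For context, `X_train` and `y_train` above come from splitting the Pima data into train and test partitions. The sketch below shows the usual shape of that step; the synthetic array merely stands in for the real CSV (the filename, split ratio, and seed here are illustrative, not taken from this article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The real workflow would load the CSV, e.g.:
#   dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
#   X, y = dataset[:, :8], dataset[:, 8]
# A synthetic array with the same shape stands in here.
rng = np.random.RandomState(7)
X = rng.rand(768, 8)                       # 768 samples, 8 clinical features
y = (rng.rand(768) > 0.65).astype(float)   # binary label: diabetes within 5 years

# Hold out a third of the data for evaluation (ratio and seed are illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7)

print(X_train.shape)  # (514, 8)
```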
The `XGBClassifier` class is found in `xgboost.sklearn`:

```python
class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,too-many-arguments,invalid-name
    __doc__ = "Implementation of the scikit-learn API for XGBoost classification.\n\n" \
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])

    def __init__(self, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, random_state=0, seed=None, missing=None, **kwargs):
        super(XGBClassifier, self).__init__(
            max_depth, learning_rate, n_estimators, silent,
            objective, booster, n_jobs, nthread, gamma, min_child_weight,
            max_delta_step, subsample, colsample_bytree, colsample_bylevel,
            reg_alpha, reg_lambda, scale_pos_weight,
            base_score, random_state, seed, missing, **kwargs)
```
```python
    def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ
        """
        Fit gradient boosting classifier

        Parameters
        ----------
        X : array_like
            Feature matrix
        y : array_like
            Labels
        sample_weight : array_like
            Weight for each instance
        eval_set : list, optional
            A list of (X, y) pairs to use as a validation set for
            early-stopping
        sample_weight_eval_set : list, optional
            A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
            instance weights on the i-th validation set.
        eval_metric : str, callable, optional
            If a str, should be a built-in evaluation metric to use. See
            doc/parameter.rst. If callable, a custom evaluation metric. The call
            signature is func(y_predicted, y_true) where y_true will be a
            DMatrix object such that you may need to call the get_label
            method. It must return a str, value pair where the str is a name
            for the evaluation and value is the value of the evaluation
            function. This objective is always minimized.
        early_stopping_rounds : int, optional
            Activates early stopping. Validation error needs to decrease at
            least every <early_stopping_rounds> round(s) to continue training.
            Requires at least one item in evals. If there's more than one,
            will use the last. If early stopping occurs, the model will have
            three additional fields: bst.best_score, bst.best_iteration and
            bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
            default value in predict method if not any other value is specified).
            (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
            and/or num_class appears in the parameters)
        verbose : bool
            If `verbose` and an evaluation set is used, writes the evaluation
            metric measured on the validation set to stderr.
        xgb_model : str
            file name of stored xgb model or 'Booster' instance Xgb model to be
            loaded before training (allows training continuation).
        callbacks : list of callback functions
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using :ref:`callback_api`.
            Example:

            .. code-block:: python

                [xgb.callback.reset_learning_rate(custom_rates)]
        """
        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)
        xgb_options = self.get_xgb_params()
        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None
        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying XGB instance
            xgb_options["objective"] = "multi:softprob"
            xgb_options['num_class'] = self.n_classes_
        feval = eval_metric if callable(eval_metric) else None
        if eval_metric is not None:
            if callable(eval_metric):
                eval_metric = None
            else:
                xgb_options.update({"eval_metric": eval_metric})
        self._le = XGBLabelEncoder().fit(y)
        training_labels = self._le.transform(y)
        if eval_set is not None:
            if sample_weight_eval_set is None:
                sample_weight_eval_set = [None] * len(eval_set)
            evals = list(
                DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
                        missing=self.missing, weight=sample_weight_eval_set[i],
                        nthread=self.n_jobs)
                for i in range(len(eval_set)))
            nevals = len(evals)
            eval_names = ["validation_{}".format(i) for i in range(nevals)]
            evals = list(zip(evals, eval_names))
        else:
            evals = ()
        self._features_count = X.shape[1]
        if sample_weight is not None:
            train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
                                    missing=self.missing, nthread=self.n_jobs)
        else:
            train_dmatrix = DMatrix(X, label=training_labels,
                                    missing=self.missing, nthread=self.n_jobs)
        self._Booster = train(xgb_options, train_dmatrix, self.n_estimators,
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)
        self.objective = xgb_options["objective"]
        if evals_result:
            for val in evals_result.items():
                evals_result_key = list(val[1].keys())[0]
                evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
            self.evals_result_ = evals_result
        if early_stopping_rounds is not None:
            self.best_score = self._Booster.best_score
            self.best_iteration = self._Booster.best_iteration
            self.best_ntree_limit = self._Booster.best_ntree_limit
        return self
```
```python
    def predict(self, data, output_margin=False, ntree_limit=None, validate_features=True):
        """
        Predict with `data`.
        ...
        """
```