ML with XGBoost: Binary classification on the Pima Indians Diabetes dataset (predicting diabetes onset within 5 years)



Contents

Output

Design approach

Core code


Output

X_train contents:
[[ 3. 102. 44. ... 30.8 0.4 26. ]
 [ 1. 77. 56. ... 33.3 1.251 24. ]
 [ 9. 124. 70. ... 35.4 0.282 34. ]
 ...
 [ 0. 57. 60. ... 21.7 0.735 67. ]
 [ 1. 105. 58. ... 24.3 0.187 21. ]
 [ 8. 179. 72. ... 32.7 0.719 36. ]]
y_train contents:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1.
 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.
 1. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1.
 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0.
 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 0. 1.
 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1.
 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0.
 0. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 1.
 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1.
 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1.]
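
For reference, here is a minimal sketch of how a split like the one above could be produced. The file name pima-indians-diabetes.csv, the comma delimiter, and the split parameters (test_size=0.33, random_state=7) are assumptions; the post does not show its data-loading code.

import numpy as np
from sklearn.model_selection import train_test_split

# Pima Indians diabetes data: 8 clinical features per row, last column is the
# 0/1 label (diabetes onset within 5 years). The file name is a placeholder.
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X, y = dataset[:, 0:8], dataset[:, 8]

# Split parameters are assumed, not taken from the post.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=7)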

Design approach
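
In outline: load the Pima Indians diabetes data (8 clinical features per sample, with a 0/1 label for diabetes onset within 5 years), split it into training and test sets, fit an XGBClassifier through its scikit-learn API, then predict on the held-out set and score accuracy.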

Core code

# class XGBClassifier, found at: xgboost.sklearn

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,too-many-arguments,invalid-name
    __doc__ = "Implementation of the scikit-learn API for XGBoost classification.\n\n" \
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])

    def __init__(self, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, random_state=0, seed=None, missing=None, **kwargs):
        super(XGBClassifier, self).__init__(max_depth, learning_rate, n_estimators, silent,
                                            objective, booster, n_jobs, nthread, gamma,
                                            min_child_weight, max_delta_step, subsample,
                                            colsample_bytree, colsample_bylevel,
                                            reg_alpha, reg_lambda, scale_pos_weight,
                                            base_score, random_state, seed, missing, **kwargs)

    def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ
        """
        Fit gradient boosting classifier

        Parameters
        ----------
        X : array_like
            Feature matrix
        y : array_like
            Labels
        sample_weight : array_like
            Weight for each instance
        eval_set : list, optional
            A list of (X, y) pairs to use as a validation set for
            early-stopping
        sample_weight_eval_set : list, optional
            A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
            instance weights on the i-th validation set.
        eval_metric : str, callable, optional
            If a str, should be a built-in evaluation metric to use. See
            doc/parameter.rst. If callable, a custom evaluation metric. The call
            signature is func(y_predicted, y_true) where y_true will be a
            DMatrix object such that you may need to call the get_label
            method. It must return a str, value pair where the str is a name
            for the evaluation and value is the value of the evaluation
            function. This objective is always minimized.
        early_stopping_rounds : int, optional
            Activates early stopping. Validation error needs to decrease at
            least every <early_stopping_rounds> round(s) to continue training.
            Requires at least one item in evals. If there's more than one,
            will use the last. If early stopping occurs, the model will have
            three additional fields: bst.best_score, bst.best_iteration and
            bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
            default value in predict method if not any other value is specified).
            (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
            and/or num_class appears in the parameters)
        verbose : bool
            If `verbose` and an evaluation set is used, writes the evaluation
            metric measured on the validation set to stderr.
        xgb_model : str
            file name of stored xgb model or 'Booster' instance Xgb model to be
            loaded before training (allows training continuation).
        callbacks : list of callback functions
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using :ref:`callback_api`.
            Example:

            .. code-block:: python

                [xgb.callback.reset_learning_rate(custom_rates)]
        """
        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)
        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying XGB instance
            xgb_options["objective"] = "multi:softprob"
            xgb_options['num_class'] = self.n_classes_

        feval = eval_metric if callable(eval_metric) else None
        if eval_metric is not None:
            if callable(eval_metric):
                eval_metric = None
            else:
                xgb_options.update({"eval_metric": eval_metric})

        self._le = XGBLabelEncoder().fit(y)
        training_labels = self._le.transform(y)

        if eval_set is not None:
            if sample_weight_eval_set is None:
                sample_weight_eval_set = [None] * len(eval_set)
            evals = list(
                DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
                        missing=self.missing, weight=sample_weight_eval_set[i],
                        nthread=self.n_jobs)
                for i in range(len(eval_set)))
            nevals = len(evals)
            eval_names = ["validation_{}".format(i) for i in range(nevals)]
            evals = list(zip(evals, eval_names))
        else:
            evals = ()

        self._features_count = X.shape[1]

        if sample_weight is not None:
            train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
                                    missing=self.missing, nthread=self.n_jobs)
        else:
            train_dmatrix = DMatrix(X, label=training_labels,
                                    missing=self.missing, nthread=self.n_jobs)

        self._Booster = train(xgb_options, train_dmatrix, self.n_estimators,
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)

        self.objective = xgb_options["objective"]
        if evals_result:
            for val in evals_result.items():
                evals_result_key = list(val[1].keys())[0]
                evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
            self.evals_result_ = evals_result

        if early_stopping_rounds is not None:
            self.best_score = self._Booster.best_score
            self.best_iteration = self._Booster.best_iteration
            self.best_ntree_limit = self._Booster.best_ntree_limit
        return self

    def predict(self, data, output_margin=False, ntree_limit=None, validate_features=True):
        """
        Predict with `data`.
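
Given the class above, here is a minimal usage sketch on the train/test split from the Output section. The hyperparameters are just the defaults from __init__; eval_set, eval_metric='logloss', and early_stopping_rounds=10 are illustrative values exercising the early-stopping path described in the fit docstring, not settings taken from the post.

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

model = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100,
                      objective='binary:logistic')

# eval_set plus early_stopping_rounds exercises the early-stopping branch in
# fit(): boosting stops once logloss on validation_0 fails to improve for
# 10 consecutive rounds.
model.fit(X_train, y_train,
          eval_set=[(X_test, y_test)],
          eval_metric='logloss',
          early_stopping_rounds=10,
          verbose=False)

y_pred = model.predict(X_test)  # 0/1 labels: diabetes onset within 5 years
print('Accuracy: %.2f%%' % (accuracy_score(y_test, y_pred) * 100.0))
print('best_iteration:', model.best_iteration)  # set because early stopping was active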
