ML / Xgboost: Binary classification on the Pima Indians Diabetes dataset with an Xgboost model (7f-CrVa + grid-search tuning)


西牛粗犷 · 2022-09-19 15:25:17

Contents

Output

Design approach

Core code


Output

Design approach
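The design diagram from the original post is not reproduced here. In outline, the pipeline is: load the Pima Indians Diabetes data, define an XGBClassifier, build a grid of candidate learning rates, then search it with GridSearchCV under stratified 10-fold cross-validation. A minimal loading sketch follows; the file name pima-indians-diabetes.csv and its layout (eight feature columns plus a binary label) are assumptions about the standard UCI copy, not taken from the original:

from numpy import loadtxt
from xgboost import XGBClassifier

# Assumed layout: columns 0-7 hold the eight clinical features,
# column 8 holds the 0/1 diabetes label.
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:, 0:8]
Y = dataset[:, 8]

model = XGBClassifier()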

Core code

from sklearn.model_selection import GridSearchCV, StratifiedKFold

learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]  # illustrative candidate values
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)
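Once the search has run, the result object exposes the usual GridSearchCV attributes. A short follow-up sketch for reading the scores back out (this step is assumed, not shown in the original post):

# Best mean CV score (negative log-loss, so values closer to 0 are better)
# and the learning rate that achieved it.
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# Mean and standard deviation across the 10 folds for every candidate.
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, std, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, std, param))

For reference, the GridSearchCV docstring from the scikit-learn source (circa version 0.19, which still documents the since-removed fit_params and iid parameters) follows.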
class GridSearchCV(BaseSearchCV):
    """Exhaustive search over specified parameter values for an estimator.

    Important members are fit, predict.

    GridSearchCV implements a "fit" and a "score" method.
    It also implements "predict", "predict_proba", "decision_function",
    "transform" and "inverse_transform" if they are implemented in the
    estimator used.

    The parameters of the estimator used to apply these methods are optimized
    by cross-validated grid-search over a parameter grid.

    Read more in the :ref:`User Guide <grid_search>`.

    Parameters
    ----------
    estimator : estimator object.
        This is assumed to implement the scikit-learn estimator interface.
        Either estimator needs to provide a ``score`` function,
        or ``scoring`` must be passed.

    param_grid : dict or list of dictionaries
        Dictionary with parameters names (string) as keys and lists of
        parameter settings to try as values, or a list of such
        dictionaries, in which case the grids spanned by each dictionary
        in the list are explored. This enables searching over any sequence
        of parameter settings.

    scoring : string, callable, list/tuple, dict or None, default: None
        A single string (see :ref:`scoring_parameter`) or a callable
        (see :ref:`scoring`) to evaluate the predictions on the test set.

        For evaluating multiple metrics, either give a list of (unique)
        strings or a dict with names as keys and callables as values.

        NOTE that when using custom scorers, each scorer should return a
        single value. Metric functions returning a list/array of values can
        be wrapped into multiple scorers that return one value each.

        See :ref:`multimetric_grid_search` for an example.

        If None, the estimator's default scorer (if available) is used.

    fit_params : dict, optional
        Parameters to pass to the fit method.

        .. deprecated:: 0.19
           ``fit_params`` as a constructor argument was deprecated in version
           0.19 and will be removed in version 0.21. Pass fit parameters to
           the ``fit`` method instead.

    n_jobs : int, default=1
        Number of jobs to run in parallel.

    pre_dispatch : int, or string, optional
        Controls the number of jobs that get dispatched during parallel
        execution. Reducing this number can be useful to avoid an
        explosion of memory consumption when more jobs get dispatched
        than CPUs can process. This parameter can be:

            - None, in which case all the jobs are immediately
              created and spawned. Use this for lightweight and
              fast-running jobs, to avoid delays due to on-demand
              spawning of the jobs

            - An int, giving the exact number of total jobs that are
              spawned

            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

    iid : boolean, default=True
        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,
        and not the mean loss across the folds.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:

        - None, to use the default 3-fold cross validation,
        - integer, to specify the number of folds in a `(Stratified)KFold`,
        - An object to be used as a cross-validation generator.
        - An iterable yielding train, test splits.

        For integer/None inputs, if the estimator is a classifier and ``y`` is
        either binary or multiclass, :class:`StratifiedKFold` is used. In all
        other cases, :class:`KFold` is used.

        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    refit : boolean, or string, default=True
        Refit an estimator using the best found parameters on the whole
        dataset.

        For multiple metric evaluation, this needs to be a string denoting the
        scorer is used to find the best parameters for refitting the estimator
        at the end.

        The refitted estimator is made available at the ``best_estimator_``
        attribute and permits using ``predict`` directly on this
        ``GridSearchCV`` instance.

        Also for multiple metric evaluation, the attributes ``best_index_``,
        ``best_score_`` and ``best_parameters_`` will only be available if
        ``refit`` is set and all of them will be determined w.r.t this specific
        scorer.

        See ``scoring`` parameter to know more about multiple metric
        evaluation.

    verbose : integer
        Controls the verbosity: the higher, the more messages.

    error_score : 'raise' (default) or numeric
        Value to assign to the score if an error occurs in estimator fitting.
        If set to 'raise', the error is raised. If a numeric value is given,
        FitFailedWarning is raised. This parameter does not affect the refit
        step, which will always raise the error.

    return_train_score : boolean, optional
        If ``False``, the ``cv_results_`` attribute will not include training
        scores.

        Current default is ``'warn'``, which behaves as ``True`` in addition
        to raising a warning when a training score is looked up.
        That default will be changed to ``False`` in 0.21.
        Computing training scores is used to get insights on how different
        parameter settings impact the overfitting/underfitting trade-off.
        However computing the scores on the training set can be
        computationally expensive and is not strictly required to select
        the parameters that yield the best generalization performance.

    Examples
    --------
    >>> from sklearn import svm, datasets
    >>> from sklearn.model_selection import GridSearchCV
    >>> iris = datasets.load_iris()
    >>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
    >>> svc = svm.SVC()
    >>> clf = GridSearchCV(svc, parameters)
    >>> clf.fit(iris.data, iris.target)
    ...                             # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    GridSearchCV(cv=None, error_score=...,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         decision_function_shape='ovr', degree=..., gamma=...,
                         kernel='rbf', max_iter=-1, probability=False,
                         random_state=None, shrinking=True, tol=...,
                         verbose=False),
           fit_params=None, iid=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=..., return_train_score=...,
           scoring=..., verbose=...)
    >>> sorted(clf.cv_results_.keys())
    ...                             # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    ['mean_fit_time', 'mean_score_time', 'mean_test_score',...
     'mean_train_score', 'param_C', 'param_kernel', 'params',...
     'rank_test_score', 'split0_test_score',...
     'split0_train_score', 'split1_test_score', 'split1_train_score',...
     'split2_test_score', 'split2_train_score',...
     'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]

    Attributes
    ----------
    cv_results_ : dict of numpy (masked) ndarrays
        A dict with keys as column headers and values as columns, that can be
        imported into a pandas ``DataFrame``.

        For instance the below given table

        +------------+-----------+------------+-----------------+---+---------+
        |param_kernel|param_gamma|param_degree|split0_test_score|...|rank_t...|
        +============+===========+============+=================+===+=========+
        |  'poly'    |    --     |     2      |       0.8       |...|    2    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'poly'    |    --     |     3      |       0.7       |...|    4    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'rbf'     |    0.1    |     --     |       0.8       |...|    3    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'rbf'     |    0.2    |     --     |       0.9       |...|    1    |
        +------------+-----------+------------+-----------------+---+---------+

        will be represented by a ``cv_results_`` dict of::

            {
            'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                                         mask = [False False False False]...)
            'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                                        mask = [ True  True False False]...),
            'param_degree': masked_array(data = [2.0 3.0 -- --],
                                         mask = [False False  True  True]...),
            'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
            'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
            'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
            'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
            'rank_test_score'    : [2, 4, 3, 1],
            'split0_train_score' : [0.8, 0.9, 0.7],
            'split1_train_score' : [0.82, 0.5, 0.7],
            'mean_train_score'   : [0.81, 0.7, 0.7],
            'std_train_score'    : [0.03, 0.03, 0.04],
            'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
            'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
            'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
            'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
            'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
            }

        NOTE

        The key ``'params'`` is used to store a list of parameter
        settings dicts for all the parameter candidates.

        The ``mean_fit_time``, ``std_fit_time``, ``mean_score_time`` and
        ``std_score_time`` are all in seconds.

        For multi-metric evaluation, the scores for all the scorers are
        available in the ``cv_results_`` dict at the keys ending with that
        scorer's name (``'_<scorer_name>'``) instead of ``'_score'`` shown
        above. ('split0_test_precision', 'mean_train_precision' etc.)

    best_estimator_ : estimator or dict
        Estimator that was chosen by the search, i.e. estimator
        which gave highest score (or smallest loss if specified)
        on the left out data. Not available if ``refit=False``.

        See ``refit`` parameter for more information on allowed values.

    best_score_ : float
        Mean cross-validated score of the best_estimator

        For multi-metric evaluation, this is present only if ``refit`` is
        specified.

    best_params_ : dict
        Parameter setting that gave the best results on the hold out data.

        For multi-metric evaluation, this is present only if ``refit`` is
        specified.
    """
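As the docstring notes, cv_results_ is laid out so it can be loaded straight into a pandas DataFrame, which is often the easiest way to compare candidates. A small usage sketch (pandas is an assumption here, not used in the original post):

import pandas as pd

# One row per parameter candidate; cv_results_ keys become columns.
results = pd.DataFrame(grid_result.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))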
