EL (Ensemble Learning) Bagging: Building a Bagging model on the Kaggle Titanic dataset to predict whether each passenger survived


辅助宝 · 2022-09-19 15:41:22 · 51230 views
Column: News


Contents

Output

Design Approach

Core Code


Output

Design Approach

Core Code

```python
bagging_clf = BaggingRegressor(clf_LoR, n_estimators=10, max_samples=0.8,
                               max_features=1.0, bootstrap=True,
                               bootstrap_features=False, n_jobs=-1)
bagging_clf.fit(X, y)
```
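In the original tutorial, `clf_LoR` is a logistic-regression base estimator and `X`, `y` are the preprocessed Titanic features and the `Survived` column; none of that preparation is shown in this post. A minimal self-contained sketch of the same setup, with a `LogisticRegression` and synthetic data standing in for those objects:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingRegressor

# Stand-ins for the tutorial's preprocessed Titanic data (not shown in the post)
rng = np.random.RandomState(42)
X = rng.rand(200, 5)                                    # feature matrix
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)  # 0/1 "Survived" labels

# Stand-in for the tutorial's clf_LoR base estimator
clf_LoR = LogisticRegression(solver='liblinear')

bagging_clf = BaggingRegressor(clf_LoR, n_estimators=10, max_samples=0.8,
                               max_features=1.0, bootstrap=True,
                               bootstrap_features=False, n_jobs=-1)
bagging_clf.fit(X, y)

# Averaged 0/1 votes form a survival score; threshold at 0.5 for a class label
predictions = (bagging_clf.predict(X) > 0.5).astype(int)
print(predictions[:10])
```

Note that a regressor is deliberately wrapped around a classifier here: `BaggingRegressor` averages the base estimators' 0/1 predictions, so its output is a score in [0, 1] that is then thresholded.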
`BaggingRegressor` is defined in `sklearn.ensemble.bagging`:

```python
class BaggingRegressor(BaseBagging, RegressorMixin):
    """A Bagging regressor.

    A Bagging regressor is an ensemble meta-estimator that fits base
    regressors each on random subsets of the original dataset and then
    aggregates their individual predictions (either by voting or by
    averaging) to form a final prediction. Such a meta-estimator can
    typically be used as a way to reduce the variance of a black-box
    estimator (e.g., a decision tree), by introducing randomization into
    its construction procedure and then making an ensemble out of it.

    This algorithm encompasses several works from the literature. When
    random subsets of the dataset are drawn as random subsets of the
    samples, then this algorithm is known as Pasting [1]_. If samples are
    drawn with replacement, then the method is known as Bagging [2]_. When
    random subsets of the dataset are drawn as random subsets of the
    features, then the method is known as Random Subspaces [3]_. Finally,
    when base estimators are built on subsets of both samples and features,
    then the method is known as Random Patches [4]_.

    Read more in the :ref:`User Guide <bagging>`.

    Parameters
    ----------
    base_estimator : object or None, optional (default=None)
        The base estimator to fit on random subsets of the dataset.
        If None, then the base estimator is a decision tree.

    n_estimators : int, optional (default=10)
        The number of base estimators in the ensemble.

    max_samples : int or float, optional (default=1.0)
        The number of samples to draw from X to train each base estimator.
        - If int, then draw `max_samples` samples.
        - If float, then draw `max_samples * X.shape[0]` samples.

    max_features : int or float, optional (default=1.0)
        The number of features to draw from X to train each base estimator.
        - If int, then draw `max_features` features.
        - If float, then draw `max_features * X.shape[1]` features.

    bootstrap : boolean, optional (default=True)
        Whether samples are drawn with replacement.

    bootstrap_features : boolean, optional (default=False)
        Whether features are drawn with replacement.

    oob_score : bool
        Whether to use out-of-bag samples to estimate
        the generalization error.

    warm_start : bool, optional (default=False)
        When set to True, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit
        a whole new ensemble.

    n_jobs : int, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance
        used by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the building process.

    Attributes
    ----------
    estimators_ : list of estimators
        The collection of fitted sub-estimators.

    estimators_samples_ : list of arrays
        The subset of drawn samples (i.e., the in-bag samples) for each base
        estimator. Each subset is defined by a boolean mask.

    estimators_features_ : list of arrays
        The subset of drawn features for each base estimator.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training
        set. If n_estimators is small it might be possible that a data point
        was never left out during the bootstrap. In this case,
        `oob_prediction_` might contain NaN.

    References
    ----------
    .. [1] L. Breiman, "Pasting small votes for classification in large
           databases and on-line", Machine Learning, 36(1), 85-103, 1999.
    .. [2] L. Breiman, "Bagging predictors", Machine Learning, 24(2),
           123-140, 1996.
    .. [3] T. Ho, "The random subspace method for constructing decision
           forests", Pattern Analysis and Machine Intelligence, 20(8),
           832-844, 1998.
    .. [4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine
           Learning and Knowledge Discovery in Databases, 346-361, 2012.
    """
```
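The docstring distinguishes four sampling schemes (Pasting, Bagging, Random Subspaces, Random Patches) purely by how samples and features are drawn. A sketch of how each maps onto the constructor flags, using an illustrative tree base estimator and random data:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(100, 8), rng.rand(100)
tree = DecisionTreeRegressor(random_state=0)

schemes = {
    # Pasting: random sample subsets, drawn WITHOUT replacement
    "pasting": BaggingRegressor(tree, max_samples=0.6, bootstrap=False,
                                random_state=0),
    # Bagging: random sample subsets, drawn WITH replacement
    "bagging": BaggingRegressor(tree, max_samples=0.6, bootstrap=True,
                                random_state=0),
    # Random Subspaces: all samples, random feature subsets
    "subspaces": BaggingRegressor(tree, max_samples=1.0, bootstrap=False,
                                  max_features=0.5, random_state=0),
    # Random Patches: random subsets of both samples and features
    "patches": BaggingRegressor(tree, max_samples=0.6, max_features=0.5,
                                random_state=0),
}
for name, model in schemes.items():
    print(name, model.fit(X, y).predict(X[:1]))
```

The base estimator and data above are placeholders; only the `max_samples` / `max_features` / `bootstrap` combinations carry the point.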
The constructor and the `predict` method:

```python
    def __init__(self,
                 base_estimator=None,
                 n_estimators=10,
                 max_samples=1.0,
                 max_features=1.0,
                 bootstrap=True,
                 bootstrap_features=False,
                 oob_score=False,
                 warm_start=False,
                 n_jobs=1,
                 random_state=None,
                 verbose=0):
        super(BaggingRegressor, self).__init__(
            base_estimator,
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=bootstrap,
            bootstrap_features=bootstrap_features,
            oob_score=oob_score,
            warm_start=warm_start,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose)

    def predict(self, X):
        """Predict regression target for X.

        The predicted regression target of an input sample is computed as
        the mean predicted regression targets of the estimators in the
        ensemble.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        y : array of shape = [n_samples]
            The predicted values.
        """
        check_is_fitted(self, "estimators_features_")

        # Check data
        X = check_array(X, accept_sparse=['csr', 'csc'])

        # Parallel loop
        n_jobs, n_estimators, starts = _partition_estimators(self.n_estimators,
                                                             self.n_jobs)

        all_y_hat = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
            delayed(_parallel_predict_regression)(
                self.estimators_[starts[i]:starts[i + 1]],
                self.estimators_features_[starts[i]:starts[i + 1]],
                X)
            for i in range(n_jobs))

        # Reduce
        y_hat = sum(all_y_hat) / self.n_estimators

        return y_hat
```
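`predict` is just the per-sample mean over the fitted sub-estimators, each applied to its own drawn feature subset. That property can be checked directly on a small illustrative fit:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(80, 4), rng.rand(80)

model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=5,
                         random_state=0).fit(X, y)

# Average each sub-estimator's prediction over its own feature subset
manual = np.mean(
    [est.predict(X[:, feats])
     for est, feats in zip(model.estimators_, model.estimators_features_)],
    axis=0)

assert np.allclose(manual, model.predict(X))
```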
The estimator validation and out-of-bag scoring helpers:

```python
    def _validate_estimator(self):
        """Check the estimator and set the base_estimator_ attribute."""
        super(BaggingRegressor, self)._validate_estimator(
            default=DecisionTreeRegressor())

    def _set_oob_score(self, X, y):
        n_samples = y.shape[0]

        predictions = np.zeros((n_samples,))
        n_predictions = np.zeros((n_samples,))

        for estimator, samples, features in zip(self.estimators_,
                                                self.estimators_samples_,
                                                self.estimators_features_):
            # Create mask for OOB samples
            mask = ~samples

            predictions[mask] += estimator.predict((X[mask, :])[:, features])
            n_predictions[mask] += 1

        if (n_predictions == 0).any():
            warn("Some inputs do not have OOB scores. "
                 "This probably means too few estimators were used "
                 "to compute any reliable oob estimates.")
            n_predictions[n_predictions == 0] = 1

        predictions /= n_predictions

        self.oob_prediction_ = predictions
        self.oob_score_ = r2_score(y, predictions)
```
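`_set_oob_score` is triggered by passing `oob_score=True` to the constructor (which requires `bootstrap=True`): each training sample is then scored only by the estimators whose bootstrap draw left it out. An illustrative run on synthetic data:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.rand(300, 4)
y = X[:, 0] + 0.05 * rng.randn(300)   # near-linear synthetic target

model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                         oob_score=True, bootstrap=True,
                         random_state=1).fit(X, y)

print(model.oob_score_)           # R^2 estimated without a held-out set
print(model.oob_prediction_[:5])  # per-sample out-of-bag predictions
```

With enough estimators, every sample is left out of at least one bootstrap draw, so the warning in `_set_oob_score` about missing OOB scores is not triggered.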
Original article: https://www.xckfsq.com/news/show.html?id=3353