Python category-encoders: A Detailed Guide to the category-encoders Library (Introduction, Installation, and Usage)
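category-encoders is a set of scikit-learn-style transformers that encode categorical variables as numbers using a variety of techniques. Ordinal, one-hot, and hashing encoders have close equivalents in existing scikit-learn releases, but the transformers in this library share some useful properties, notably first-class support for pandas DataFrames as input and explicit selection of the columns to encode through the cols argument.
Documentation: http://contrib.scikit-learn.org/category_encoders/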
pip install category-encoders
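There are two types of encoders: unsupervised and supervised. Unsupervised encoders are fitted on the features alone, while supervised encoders also use the target y when fitting.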
Unsupervised example:

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from category_encoders import BinaryEncoder
import pandas as pd
from sklearn.datasets import load_boston

# prepare some data
bunch = load_boston()
y = bunch.target
X = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# use binary encoding to encode two categorical features
enc = BinaryEncoder(cols=['CHAS', 'RAD']).fit(X)

# transform the dataset
numeric_dataset = enc.transform(X)
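To make the idea behind binary encoding more concrete, here is a rough pure-Python sketch. It is not the library's implementation: the function name binary_encode is hypothetical, and the sketch assumes that each distinct category is first given an integer code (starting at 1, in order of first appearance) and that the code is then written out as a fixed-width row of bits, one bit per output column.

```python
def binary_encode(values):
    """Illustrative only: integer-code each category, then expand the code into bits."""
    # Assign integer codes 1, 2, 3, ... in order of first appearance (an assumption
    # made for this sketch).
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes) + 1
    if not codes:
        return []
    # Number of bit columns needed to represent the largest code.
    width = max(codes.values()).bit_length()
    # Most significant bit first, one list of bits per input row.
    return [
        [(codes[v] >> shift) & 1 for shift in range(width - 1, -1, -1)]
        for v in values
    ]


# Made-up values for a categorical column such as RAD.
rad = [1, 2, 3, 24, 2, 24]
encoded = binary_encode(rad)
for value, bits in zip(rad, encoded):
    print(value, bits)
```

With four distinct categories, three bit columns are enough, whereas one-hot encoding would need four. The saving grows with the number of categories.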
Supervised example:

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from category_encoders import TargetEncoder
import pandas as pd
from sklearn.datasets import load_boston

# prepare some data: first 250 rows for training, the remaining rows for testing
bunch = load_boston()
y_train = bunch.target[0:250]
y_test = bunch.target[250:506]
X_train = pd.DataFrame(bunch.data[0:250], columns=bunch.feature_names)
X_test = pd.DataFrame(bunch.data[250:506], columns=bunch.feature_names)

# use target encoding to encode two categorical features
enc = TargetEncoder(cols=['CHAS', 'RAD'])

# fit on the training data (features and target), then transform both datasets
training_numeric_dataset = enc.fit_transform(X_train, y_train)
testing_numeric_dataset = enc.transform(X_test)
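The fit-on-train, transform-on-test pattern above matters because a supervised encoder learns its mapping from the target. The following pure-Python sketch illustrates the general idea of target (mean) encoding under simplifying assumptions. The function names and the smoothing parameter m are hypothetical, and the additive smoothing used here is a stand-in for illustration; the library applies its own smoothing scheme.

```python
from collections import defaultdict


def fit_target_means(categories, targets, m=1.0):
    """Illustrative only: learn a smoothed mean of the target for each category."""
    prior = sum(targets) / len(targets)  # global mean of the training target
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # Additive smoothing (an assumption of this sketch): pull each category
    # mean toward the global prior, more strongly for rare categories.
    mapping = {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}
    return mapping, prior


def transform(categories, mapping, prior):
    """Replace each category with its learned value; unseen categories get the prior."""
    return [mapping.get(c, prior) for c in categories]


# Made-up training data.
train_cat = ["a", "a", "b", "b"]
train_y = [10.0, 20.0, 30.0, 40.0]
mapping, prior = fit_target_means(train_cat, train_y, m=2.0)

# The test set is transformed with the mapping learned on the training set only.
test_encoded = transform(["a", "b", "c"], mapping, prior)
print(test_encoded)
```

Because the mapping is computed from the training target alone, the test target never influences the encoded features, which is the reason the example above calls fit_transform on the training set and only transform on the test set.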