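Python sklearn: a detailed guide to LabelEncoder, covering an introduction (encoding and decoding), usage, and worked examples
Contents
2. Applying LabelEncoder when data are missing and the test set contains new values that never appeared in the training set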
The LabelEncoder class is defined in sklearn.preprocessing._label:

class LabelEncoder(TransformerMixin, BaseEstimator):
    """Encode target labels with value between 0 and n_classes-1.

    This transformer should be used to encode target values, *i.e.* `y`, and
    not the input `X`.

    Read more in the User Guide.

    .. versionadded:: 0.12

    Attributes
    ----------
    classes_ : array of shape (n_class,)
        Holds the label for each class.

    Examples
    --------
    `LabelEncoder` can be used to normalize labels.

    >>> from sklearn import preprocessing
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])
    array([0, 0, 1, 2]...)
    >>> le.inverse_transform([0, 0, 1, 2])
    array([1, 1, 2, 6])

    It can also be used to transform non-numerical labels (as long as they are
    hashable and comparable) to numerical labels.

    >>> le = preprocessing.LabelEncoder()
    >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
    LabelEncoder()
    >>> list(le.classes_)
    ['amsterdam', 'paris', 'tokyo']
    >>> le.transform(["tokyo", "tokyo", "paris"])
    array([2, 2, 1]...)
    >>> list(le.inverse_transform([2, 2, 1]))
    ['tokyo', 'tokyo', 'paris']

    See also
    --------
    sklearn.preprocessing.OrdinalEncoder : Encode categorical features
        using an ordinal encoding scheme.
    sklearn.preprocessing.OneHotEncoder : Encode categorical features
        as a one-hot numeric array.
    """
Methods:
fit(y): Fit label encoder.
fit_transform(y): Fit label encoder and return encoded labels.
get_params(deep=True): Get parameters for this estimator.
inverse_transform(y): Transform labels back to original encoding.
set_params(**params): Set the parameters of this estimator.
transform(y): Transform labels to normalized encoding.
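To make the method summary concrete, here is a minimal sketch that runs fit_transform, inverse_transform, and get_params on a small list of city names. The sample labels are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["paris", "tokyo", "paris", "amsterdam"]

le = LabelEncoder()
# fit_transform learns the sorted unique classes and encodes in a single call
codes = le.fit_transform(labels)
print(le.classes_.tolist())
print(codes.tolist())

# inverse_transform maps the integer codes back to the original labels
restored = le.inverse_transform(codes)
print(restored.tolist())

# LabelEncoder takes no constructor arguments, so there are no parameters to report
print(le.get_params())
```

Because classes_ is stored in sorted order, the integer assigned to a label depends on how it sorts among the other labels, not on the order in which it first appears.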
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# DataScienceNYY is the author's own helper package, not part of scikit-learn
from DataScienceNYY.DataAnalysis import dataframe_fillAnyNull, Dataframe2LabelEncoder


# Build the sample data (note the None values, blank strings and 'un' entries)
train_data_dict = {'Name': ['张三', '李四', '王五', '赵六', '张七', '李八', '王十', 'un'],
                   'Age': [22, 23, 24, 25, 22, 22, 22, None],
                   'District': ['北京', '上海', '广东', '深圳', '山东', '河南', '浙江', ' '],
                   'Job': ['CEO', 'CTO', 'CFO', 'COO', 'CEO', 'CTO', 'CEO', '']}
test_data_dict = {'Name': ['张三', '李四', '王十一', None],
                  'Age': [22, 23, 22, 'un'],
                  'District': ['北京', '上海', '广东', ''],
                  'Job': ['CEO', 'CTO', 'UFO', ' ']}
train_data_df = pd.DataFrame(train_data_dict)
test_data_df = pd.DataFrame(test_data_dict)
print(train_data_df, '\n', test_data_df)


# Fill in the missing data, column by column
for col in train_data_df.columns:
    train_data_df[col] = dataframe_fillAnyNull(train_data_df, col)
    test_data_df[col] = dataframe_fillAnyNull(test_data_df, col)
print(train_data_df, '\n', test_data_df)


# Label-encode the train and test data
train_data, test_data = Dataframe2LabelEncoder(train_data_df, test_data_df)
print(train_data, '\n', test_data)
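The two helpers imported from DataScienceNYY are not shown in this article, so their exact behaviour is unknown. The sketch below is only a guess at what such helpers might do, built from pandas and scikit-learn alone. The names fill_any_null and encode_train_test, the placeholder strings "Missing" and "Unknown", and the small sample frames are all hypothetical. The placeholder class is included while fitting, so classes_ stays sorted.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fill_any_null(df, col, placeholder="Missing"):
    """Hypothetical helper: replace None/NaN and blank strings with a placeholder."""
    s = df[col].astype(object)
    is_blank = s.map(lambda v: isinstance(v, str) and v.strip() == "")
    return s.mask(s.isna() | is_blank, placeholder).astype(str)


def encode_train_test(train_df, test_df, unknown="Unknown"):
    """Hypothetical helper: fit one LabelEncoder per column on the training
    data and map test values never seen in training to a shared placeholder."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for col in train_df.columns:
        le = LabelEncoder()
        # Fit on the training values plus the placeholder class
        le.fit(list(train_df[col]) + [unknown])
        known = set(le.classes_)
        test_col = test_df[col].map(lambda v: v if v in known else unknown)
        train_out[col] = le.transform(train_df[col])
        test_out[col] = le.transform(test_col)
    return train_out, test_out


# Small made-up frames with a missing value, blank strings and unseen test values
train = pd.DataFrame({"District": ["北京", "上海", None], "Job": ["CEO", "CTO", " "]})
test = pd.DataFrame({"District": ["北京", "广东", ""], "Job": ["UFO", "CEO", None]})

for col in train.columns:
    train[col] = fill_any_null(train, col)
    test[col] = fill_any_null(test, col)

train_enc, test_enc = encode_train_test(train, test)
print(train_enc)
print(test_enc)
```

In this sketch every unseen test value in a column shares one integer code, so the model cannot tell different new categories apart. Whether that is acceptable depends on the task.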
LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
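The examples above only transform labels that were present during fit. The short sketch below, with a made-up unseen label "london", shows what happens otherwise: transform raises a ValueError. This is why test data containing new values needs extra handling.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "tokyo", "amsterdam"])

try:
    # "london" was never seen during fit, so transform cannot encode it
    le.transform(["tokyo", "london"])
    raised = False
except ValueError as exc:
    raised = True
    print("transform failed:", exc)
```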
Reference article: Python sklearn: using LabelEncoder, and the step required before applying it
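import numpy as np
from sklearn.preprocessing import LabelEncoder

# train_df and test_df (pandas DataFrames) and col (a column name) are assumed to exist already

# Fit the encoder on the training data
LE = LabelEncoder()
LE.fit(train_df[col])

# Map test values that never appeared in training to 'Unknown',
# then add 'Unknown' to LE.classes_ so that it can be encoded
test_df[col] = test_df[col].map(lambda s: 'Unknown' if s not in LE.classes_ else s)
LE.classes_ = np.append(LE.classes_, 'Unknown')

# Encode the train and test data with the same encoder
train_df[col] = LE.transform(train_df[col])
test_df[col] = LE.transform(test_df[col])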
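The docstring quoted earlier says LabelEncoder is meant for the target y and points to OrdinalEncoder for categorical features. As an alternative to editing classes_ by hand, here is a minimal sketch that uses OrdinalEncoder's handle_unknown="use_encoded_value" option, which to my knowledge requires scikit-learn 0.24 or later. The sample job titles are taken from the data above, and the choice of -1 as the code for unseen categories is an assumption made for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

# One feature column, given as a list of single-element rows
train_jobs = [["CEO"], ["CTO"], ["CFO"], ["COO"]]
test_jobs = [["CEO"], ["UFO"]]  # "UFO" never appears in the training data

# Unseen categories are encoded as -1 instead of raising an error
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = enc.fit_transform(train_jobs)
test_codes = enc.transform(test_jobs)

print(enc.categories_)
print(train_codes.ravel().tolist())
print(test_codes.ravel().tolist())
```

Unlike the approach of appending 'Unknown' to classes_, this leaves the fitted categories untouched and keeps unseen values visibly separate from the known codes.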