ML Feature Engineering: automated feature generation and feature derivation with featuretools from a single CSV dataset (automatically split into two DataFrames)
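Contents
Automated feature generation and feature derivation with featuretools from a single CSV dataset (automatically split into two DataFrames)
Recommended articles
featuretools in Python: a detailed guide to the library's introduction, installation, and usage
ML Feature Engineering: automated feature generation and feature derivation with featuretools from a single CSV dataset (automatically split into two DataFrames)
ML Feature Engineering: automated feature generation and feature derivation with featuretools from a single CSV dataset (automatically split into two DataFrames), implementation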
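import numpy as np
import pandas as pd

# Sample records. '' is an empty string standing in for a missing value.
# '男' = male, '女' = female; the "hobbey" values are basketball, badminton, and table tennis.
contents = {"name": ['Bob', 'LiSa', 'Mary', 'Alan'],
            "ID": [1, 2, 3, 4],
            "age": [np.nan, 28, 38, ''],  # np.nan prints as NaN
            "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],  # pd.NaT prints as NaT
            "sex": ['男', '女', '女', '男'],
            "hobbey": ['打篮球', '打羽毛球', '打乒乓球', ''],
            "money": [200.0, 240.0, 290.0, 300.0],
            "weight": [140.5, 120.8, 169.4, 155.6],
            }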
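The title says the single table is split automatically into two DataFrames, but the splitting code itself is not shown. The following is a minimal sketch of one way to do it, assuming the split is by dtype: numeric columns go to `nums_df` (with `name` and `ID`), everything else goes to `cats_df` (keyed by `ID`). The helper name `split_by_dtype` is hypothetical, and the sketch does not try to reproduce the three-row subset or row order of the `cats_df` shown in the log.

```python
import numpy as np
import pandas as pd

contents = {
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28, 38, ""],
    "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ""],
    "sex": ["男", "女", "女", "男"],
    "hobbey": ["打篮球", "打羽毛球", "打乒乓球", ""],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
}


def split_by_dtype(df, key="ID", label="name"):
    """Hypothetical helper: numeric columns -> nums_df, the rest -> cats_df."""
    df = df.copy()
    # Normalise the empty-string placeholders so each column gets a proper dtype.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    df["born"] = pd.to_datetime(df["born"], errors="coerce")
    df["hobbey"] = df["hobbey"].replace("", np.nan)

    numeric_cols = [c for c in df.select_dtypes(include="number").columns if c != key]
    other_cols = [c for c in df.columns if c not in numeric_cols and c not in (key, label)]

    nums_df = df[[label, key] + numeric_cols]
    cats_df = df[[key] + other_cols]
    return nums_df, cats_df


nums_df, cats_df = split_by_dtype(pd.DataFrame(contents))
print(nums_df)
print(cats_df)
```

Both tables keep the `ID` column, which is what later allows them to be related to each other.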
-    name  ID  age  born        sex  hobbey    money  weight
- 0  Bob   1   NaN  NaT         男   打篮球    200.0  140.5
- 1  LiSa  2   28   1990-01-01  女   打羽毛球  240.0  120.8
- 2  Mary  3   38   1980-01-01  女   打乒乓球  290.0  169.4
- 3  Alan  4        NaT         男             300.0  155.6
- -------------------------------------------
- nums_df:----------------------------------
- name ID age money weight
- 0 Bob 1 NaN 200.0 140.5
- 1 LiSa 2 28.0 240.0 120.8
- 2 Mary 3 38.0 290.0 169.4
- 3 Alan 4 NaN 300.0 155.6
- cats_df:----------------------------------
- ID hobbey sex born
- 0 4 NaN 男 NaN
- 1 1 打篮球 男 NaN
- 2 2 打羽毛球 女 1990-01-01
- ---------------------------------DFS design:-----------------------------------
- feature_matrix_nums
- ID age money weight cats.hobbey cats.sex cats.COUNT(nums) \
- name
- Bob 1 NaN 200.0 140.5 打篮球 男 1.0
- LiSa 2 28.0 240.0 120.8 打羽毛球 女 1.0
- Mary 3 38.0 290.0 169.4 NaN NaN NaN
-
- cats.MAX(nums.age) cats.MAX(nums.money) cats.MAX(nums.weight) \
- name
- Bob NaN 200.0 140.5
- LiSa 28.0 240.0 120.8
- Mary NaN NaN NaN
-
- cats.MEAN(nums.age) cats.MEAN(nums.money) cats.MEAN(nums.weight) \
- name
- Bob NaN 200.0 140.5
- LiSa 28.0 240.0 120.8
- Mary NaN NaN NaN
-
- cats.MIN(nums.age) cats.MIN(nums.money) cats.MIN(nums.weight) \
- name
- Bob NaN 200.0 140.5
- LiSa 28.0 240.0 120.8
- Mary NaN NaN NaN
-
- cats.SKEW(nums.age) cats.SKEW(nums.money) cats.SKEW(nums.weight) \
- name
- Bob NaN NaN NaN
- LiSa NaN NaN NaN
- Mary NaN NaN NaN
-
- cats.STD(nums.age) cats.STD(nums.money) cats.STD(nums.weight) \
- name
- Bob NaN NaN NaN
- LiSa NaN NaN NaN
- Mary NaN NaN NaN
-
- cats.SUM(nums.age) cats.SUM(nums.money) cats.SUM(nums.weight) \
- name
- Bob 0.0 200.0 140.5
- LiSa 28.0 240.0 120.8
- Mary NaN NaN NaN
-
- cats.DAY(born) cats.MONTH(born) cats.WEEKDAY(born) cats.YEAR(born)
- name
- Bob NaN NaN NaN NaN
- LiSa 1.0 1.0 0.0 1990.0
- Mary NaN NaN NaN NaN
- features_defs_nums: 29 [<Feature: ID>, <Feature: age>, <Feature: money>, <Feature: weight>, <Feature: cats.hobbey>, <Feature: cats.sex>, <Feature: cats.COUNT(nums)>, <Feature: cats.MAX(nums.age)>, <Feature: cats.MAX(nums.money)>, <Feature: cats.MAX(nums.weight)>, <Feature: cats.MEAN(nums.age)>, <Feature: cats.MEAN(nums.money)>, <Feature: cats.MEAN(nums.weight)>, <Feature: cats.MIN(nums.age)>, <Feature: cats.MIN(nums.money)>, <Feature: cats.MIN(nums.weight)>, <Feature: cats.SKEW(nums.age)>, <Feature: cats.SKEW(nums.money)>, <Feature: cats.SKEW(nums.weight)>, <Feature: cats.STD(nums.age)>, <Feature: cats.STD(nums.money)>, <Feature: cats.STD(nums.weight)>, <Feature: cats.SUM(nums.age)>, <Feature: cats.SUM(nums.money)>, <Feature: cats.SUM(nums.weight)>, <Feature: cats.DAY(born)>, <Feature: cats.MONTH(born)>, <Feature: cats.WEEKDAY(born)>, <Feature: cats.YEAR(born)>]
- feature_matrix_cats_df
- hobbey sex COUNT(nums) MAX(nums.age) MAX(nums.money) MAX(nums.weight) \
- ID
- 4 NaN 男 1 NaN 300.0 155.6
- 1 打篮球 男 1 NaN 200.0 140.5
- 2 打羽毛球 女 1 28.0 240.0 120.8
-
- MEAN(nums.age) MEAN(nums.money) MEAN(nums.weight) MIN(nums.age) \
- ID
- 4 NaN 300.0 155.6 NaN
- 1 NaN 200.0 140.5 NaN
- 2 28.0 240.0 120.8 28.0
-
- MIN(nums.money) MIN(nums.weight) SKEW(nums.age) SKEW(nums.money) \
- ID
- 4 300.0 155.6 NaN NaN
- 1 200.0 140.5 NaN NaN
- 2 240.0 120.8 NaN NaN
-
- SKEW(nums.weight) STD(nums.age) STD(nums.money) STD(nums.weight) \
- ID
- 4 NaN NaN NaN NaN
- 1 NaN NaN NaN NaN
- 2 NaN NaN NaN NaN
-
- SUM(nums.age) SUM(nums.money) SUM(nums.weight) DAY(born) MONTH(born) \
- ID
- 4 0.0 300.0 155.6 NaN NaN
- 1 0.0 200.0 140.5 NaN NaN
- 2 28.0 240.0 120.8 1.0 1.0
-
- WEEKDAY(born) YEAR(born)
- ID
- 4 NaN NaN
- 1 NaN NaN
- 2 0.0 1990.0
- features_defs_cats_df: 25 [<Feature: hobbey>, <Feature: sex>, <Feature: COUNT(nums)>, <Feature: MAX(nums.age)>, <Feature: MAX(nums.money)>, <Feature: MAX(nums.weight)>, <Feature: MEAN(nums.age)>, <Feature: MEAN(nums.money)>, <Feature: MEAN(nums.weight)>, <Feature: MIN(nums.age)>, <Feature: MIN(nums.money)>, <Feature: MIN(nums.weight)>, <Feature: SKEW(nums.age)>, <Feature: SKEW(nums.money)>, <Feature: SKEW(nums.weight)>, <Feature: STD(nums.age)>, <Feature: STD(nums.money)>, <Feature: STD(nums.weight)>, <Feature: SUM(nums.age)>, <Feature: SUM(nums.money)>, <Feature: SUM(nums.weight)>, <Feature: DAY(born)>, <Feature: MONTH(born)>, <Feature: WEEKDAY(born)>, <Feature: YEAR(born)>]
- <Feature: SUM(nums.age)>
- The sum of the "age" of all instances of "nums" for each "ID" in "cats".
features_defs_cats_df: 25
ID | hobbey | sex | COUNT(nums) |
4 | NaN | 男 | 1 |
1 | 打篮球 | 男 | 1 |
2 | 打羽毛球 | 女 | 1 |

ID | MAX(nums.age) | MAX(nums.money) | MAX(nums.weight) | MEAN(nums.age) | MEAN(nums.money) | MEAN(nums.weight) | MIN(nums.age) | MIN(nums.money) | MIN(nums.weight) |
4 | NaN | 300 | 155.6 | NaN | 300 | 155.6 | NaN | 300 | 155.6 |
1 | NaN | 200 | 140.5 | NaN | 200 | 140.5 | NaN | 200 | 140.5 |
2 | 28 | 240 | 120.8 | 28 | 240 | 120.8 | 28 | 240 | 120.8 |

ID | SKEW(nums.age) | SKEW(nums.money) | SKEW(nums.weight) | STD(nums.age) | STD(nums.money) | STD(nums.weight) | SUM(nums.age) | SUM(nums.money) | SUM(nums.weight) |
4 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 300 | 155.6 |
1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 200 | 140.5 |
2 | NaN | NaN | NaN | NaN | NaN | NaN | 28 | 240 | 120.8 |

ID | DAY(born) | MONTH(born) | WEEKDAY(born) | YEAR(born) |
4 | NaN | NaN | NaN | NaN |
1 | NaN | NaN | NaN | NaN |
2 | 1 | 1 | 0 | 1990 |
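The four date columns can be cross-checked with the pandas `.dt` accessor. This is only a sketch for verifying the logged values, assuming the primitives follow the pandas convention that Monday is weekday 0; it is not how featuretools computes them internally.

```python
import pandas as pd

# "born" values from cats_df, indexed by ID as in the feature matrix.
born = pd.Series(pd.to_datetime([None, None, "1990-01-01"]), name="born")
born.index = [4, 1, 2]

parts = pd.DataFrame({
    "DAY(born)": born.dt.day,
    "MONTH(born)": born.dt.month,
    "WEEKDAY(born)": born.dt.weekday,  # Monday == 0
    "YEAR(born)": born.dt.year,
})
print(parts)
```

Because two of the three dates are missing, pandas stores the parts as floats, which is why the log shows `1.0` and `1990.0` rather than integers.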
Field explanations:
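Several patterns in the aggregated columns follow from each `ID` having exactly one row in `nums`: MAX, MEAN, and MIN all equal that row's value, STD and SKEW are NaN because they need more than one observation, and SUM(nums.age) is 0.0 rather than NaN where the age is missing. The sketch below reproduces the same pattern with a plain pandas `groupby`; it is a stand-in for illustration, not the featuretools implementation.

```python
import numpy as np
import pandas as pd

# One nums row per ID, as in the logged data (IDs present in cats only).
nums_df = pd.DataFrame({
    "ID": [1, 2, 4],
    "age": [np.nan, 28.0, np.nan],
})

grouped = nums_df.groupby("ID")["age"]
agg = grouped.agg(["max", "mean", "min", "skew", "std", "sum"])
# COUNT(nums) counts rows, not non-null ages, so use size() for it.
agg.insert(0, "count_rows", nums_df.groupby("ID").size())
print(agg)
```

The sum of an all-NaN group is 0.0 because pandas skips missing values by default and an empty sum is zero, while max, mean, and min of an empty set are undefined.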
features_defs_nums: 29
name | ID | age | money | weight | cats.hobbey | cats.sex | cats.COUNT(nums) | cats.MAX(nums.age) | cats.MAX(nums.money) | cats.MAX(nums.weight) | cats.MEAN(nums.age) | cats.MEAN(nums.money) | cats.MEAN(nums.weight) | cats.MIN(nums.age) | cats.MIN(nums.money) | cats.MIN(nums.weight) | cats.SKEW(nums.age) | cats.SKEW(nums.money) |
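The `cats.`-prefixed columns on the `nums` table are features of the parent `cats` row looked up through `ID`. That also explains why Mary's row is NaN for every `cats.` column in the log: her `ID` is 3, and the logged `cats_df` has no row with that `ID`. The following pandas sketch mimics the lookup with a left join on a reduced set of columns; it illustrates the idea only and is not the featuretools code path.

```python
import pandas as pd

nums_df = pd.DataFrame({
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "money": [200.0, 240.0, 290.0, 300.0],
})
cats_df = pd.DataFrame({"ID": [4, 1, 2], "sex": ["男", "男", "女"]})

# Step 1: aggregate the child table per parent ID, as in MAX(nums.money).
max_money = nums_df.groupby("ID")["money"].max().rename("MAX(nums.money)")
cats_feats = cats_df.set_index("ID").join(max_money)

# Step 2: bring the parent's features back onto each child row.
merged = nums_df.merge(
    cats_feats.add_prefix("cats."),
    left_on="ID",
    right_index=True,
    how="left",  # keep child rows whose parent is missing
).set_index("name")
print(merged)
```

With a left join, rows without a matching parent are kept and filled with NaN, which matches Mary's row in the logged `feature_matrix_nums`.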