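Learning Python: using pandas to find the set difference between two columns (comparing all elements of the two columns for matches and differences)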
- parallel_match_res:
- col01 col02
- 11 12.0 NaN
- 12 13.0 NaN
- 13 14.0 NaN
- 14 NaN 15.0
- 15 NaN 16.0
- 16 NaN 17.0
- cross_match_res01: [nan, nan, nan, 15.0, 16.0, 17.0]
- cross_match_res02: [nan, nan, nan, nan, nan, nan, 15.0, 16.0, 17.0, 12.0, 13.0, 14.0]
- only_list_prod: 6 [12.0, 13.0, 14.0, nan, nan, nan]
- only_list_dev: 6 [nan, nan, nan, 15.0, 16.0, 17.0]
- # Learning Python: using pandas to find the set difference between two columns
-
- import pandas as pd
-
-
- data_path = 'data/demo_data_find_difference.xls'
- df = pd.read_excel(data_path)
-
-
-
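- # Find the differences
- # (1) Row-wise (parallel) comparison: keep the rows where col01 and col02 differ.
- # Note: NaN != NaN evaluates to True, so rows with a missing value are always kept.
- parallel_match_res = df[df["col01"] != df["col02"]]
- print('parallel_match_res: \n', parallel_match_res)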
-
-
- # (2) Cross comparison: check every element of one column against all elements of the other
- list_prod = df["col01"].tolist()
- list_dev = df["col02"].tolist()
-
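- # T1: using set difference operations
- # difference() returns the elements found only in list_dev; ^ returns the symmetric difference.
- # NaN values never compare equal to each other, so every NaN survives both operations.
- cross_match_res01 = list(set(list_dev).difference(set(list_prod)))
- cross_match_res02 = list(set(list_prod) ^ set(list_dev))
- print('cross_match_res01:', sorted(cross_match_res01))
- print('cross_match_res02:', cross_match_res02)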
-
- # T2: using list comprehensions (loop-based membership test)
- only_list_prod = [x for x in list_prod if x not in list_dev]  # in list_prod but not in list_dev
- only_list_dev = [y for y in list_dev if y not in list_prod]  # in list_dev but not in list_prod
- print('only_list_prod:', len(only_list_prod), sorted(only_list_prod))
- print('only_list_dev:', len(only_list_dev), sorted(only_list_dev))
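
The printed results above contain `nan` entries because a missing value never compares equal to another missing value, so every empty cell is reported as a difference. If that is not wanted, one possible variant is to drop the missing values before comparing. The following is only a minimal sketch under that assumption: the small in-memory DataFrame is hypothetical sample data standing in for the Excel file, and the names `both_missing`, `parallel_diff`, `set_prod`, `set_dev`, `only_prod`, `only_dev` and `either` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the contents of the Excel file).
# None becomes NaN once pandas stores the columns as float64.
df = pd.DataFrame({
    "col01": [10.0, 11.0, 12.0, None, None],
    "col02": [10.0, 11.0, None, 15.0, None],
})

# (1) Row-wise comparison that treats two missing cells as equal.
both_missing = df["col01"].isna() & df["col02"].isna()
parallel_diff = df[(df["col01"] != df["col02"]) & ~both_missing]
print(parallel_diff.index.tolist())  # [2, 3]

# (2) Cross comparison on the non-missing values only.
set_prod = set(df["col01"].dropna().tolist())
set_dev = set(df["col02"].dropna().tolist())

only_prod = sorted(set_prod - set_dev)  # in col01 but not in col02
only_dev = sorted(set_dev - set_prod)   # in col02 but not in col01
either = sorted(set_prod ^ set_dev)     # in exactly one of the two columns

print(only_prod)  # [12.0]
print(only_dev)   # [15.0]
print(either)     # [12.0, 15.0]
```

Dropping the missing values first also makes `sorted()` safe to use, since sorting a list that contains NaN does not give a reliable order. Whether empty cells should count as differences depends on the data, so the original behaviour may still be the right choice in some cases.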