Counting and removing duplicate rows in pandas
程序员文章站
2022-04-14 10:39:25
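Full code:
import numpy as np
import pandas as pd
# Original data
df = pd.DataFrame({'key1':['a','a','b','b','a','a'],
'key2':['one','one','one','two','one','one'],
'data1':[1,1,2,2,3,3],
# 'data2':np.random.randn(6)
})
df
df.duplicated()  # boolean Series: True for each row that repeats an earlier row
~df.duplicated()  # inverted mask (use ~, not unary minus, to negate a boolean Series)
dup = df[df.duplicated()]  # the duplicated rows
df[df.duplicated()].count()  # number of duplicated rows, reported per column
nodup = df[~df.duplicated()]  # the data with duplicates removed
nodup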
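The boolean-mask approach above can also be written with pandas' built-in helpers. The following is a minimal sketch, not part of the original post: it reuses the same sample data, and the variable names (`n_dup`, `nodup_last`, `no_repeats`, `n_dup_keys`) are introduced here only for illustration.

```python
import pandas as pd

# Same sample data as in the post
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a', 'a'],
    'key2': ['one', 'one', 'one', 'two', 'one', 'one'],
    'data1': [1, 1, 2, 2, 3, 3],
})

# Summing the boolean mask gives the number of duplicated rows as one scalar
n_dup = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row,
# which matches df[~df.duplicated()]
nodup = df.drop_duplicates()

# keep='last' keeps the last occurrence instead of the first
nodup_last = df.drop_duplicates(keep='last')

# keep=False drops every row that has a duplicate anywhere in the frame
no_repeats = df.drop_duplicates(keep=False)

# subset= restricts the comparison to the listed columns
n_dup_keys = int(df.duplicated(subset=['key1', 'key2']).sum())

print(n_dup)       # 2
print(n_dup_keys)  # 3
print(nodup)
```

Summing the mask returns a single number, whereas `count()` on the filtered frame returns one count per column, so the scalar form may be more convenient when only the total is needed.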
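Step by step:
import numpy as np
import pandas as pd
# Original data: rows 1 and 5 repeat rows 0 and 4
df = pd.DataFrame({'key1':['a','a','b','b','a','a'],
'key2':['one','one','one','two','one','one'],
'data1':[1,1,2,2,3,3],
# 'data2':np.random.randn(6)
})
df
# duplicated() marks each row that repeats an earlier row; here rows 1 and 5 are True
df.duplicated()
# ~ inverts the mask, so every row that is not a repeat is True
~df.duplicated()
# Select the duplicated rows (rows 1 and 5)
dup = df[df.duplicated()]
# Count the duplicated rows; count() reports 2 for each column
df[df.duplicated()].count()
# Keep only the first occurrence of each row (rows 0, 2, 3 and 4)
nodup = df[~df.duplicated()]
nodup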
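The steps above count how many rows are duplicates, but not how many times each distinct row occurs. One possible way to get per-row occurrence counts is to group on all columns. This is a sketch under the assumption that no column contains missing values (by default `groupby` drops groups whose keys are NaN); the names `counts` and `repeated` are hypothetical and do not appear in the original post.

```python
import pandas as pd

# Same sample data as in the post
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a', 'a'],
    'key2': ['one', 'one', 'one', 'two', 'one', 'one'],
    'data1': [1, 1, 2, 2, 3, 3],
})

# Group on every column and count the rows in each group,
# giving one line per distinct row plus its number of occurrences
counts = (
    df.groupby(list(df.columns))
      .size()
      .reset_index(name='count')
)

# Distinct rows that occur more than once
repeated = counts[counts['count'] > 1]

print(counts)
print(repeated)
```

With this data, the rows `('a', 'one', 1)` and `('a', 'one', 3)` each occur twice, and the two `'b'` rows occur once each.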