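I have a DataFrame with 3 columns: two are categorical and one is float16. When I run a groupby and pass a lambda to agg that handles each column differently depending on its dtype, the categorical column gets dropped.
If I do it this way, it works: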
import numpy as np
import pandas as pd

i = pd.DataFrame({"A": ["a","a","a","b","c","c"], "B": [1,2,3,4,5,6], "C": ["NaN","b","NaN","b","c","c"]})
i['A'] = i['A'].astype('category')
i['B'] = i['B'].astype('float16')
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: x.mean() if np.dtype(x)=='float16' else x.value_counts().index[0])
The output, which is what I want, is:
   A    B    C
0  a  2.0  NaN
1  b  4.0    b
2  c  5.5    c
However, whenever I declare column C as categorical, column C is automatically dropped from the result.
i = pd.DataFrame({"A": ["a","a","a","b","c","c"], "B": [1,2,3,4,5,6], "C": ["NaN","b","NaN","b","c","c"]})
i['A'] = i['A'].astype('category')
i['B'] = i['B'].astype('float16')
i['C'] = i['C'].astype('category')
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: x.mean() if np.dtype(x)=='float16' else x.value_counts().index[0])
The result is:
['C'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
   A    B
0  a  2.0
1  b  4.0
2  c  5.5
Does anyone know why agg in groupby cannot handle categorical columns?
Posted on 2022-06-08 23:47:58
Note that you are not retrieving the dtype correctly:
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: print(np.dtype(x)))
gives "None". Use x.dtype=='float16' instead, because x is a pd.Series. You can verify this with .agg(lambda x: print(type(x))).
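As a minimal sketch of that suggestion, the question's example can be rewritten with the x.dtype check. The helper name summarize and the observed=True argument are additions for illustration and are not part of the original post; observed=True is only there to make the handling of the categorical grouper explicit. One plausible reading of the warning is that np.dtype(x) fails on a categorical column because its dtype is not a NumPy dtype, whereas comparing x.dtype with a string simply evaluates to False.

```python
import pandas as pd

i = pd.DataFrame({
    "A": ["a", "a", "a", "b", "c", "c"],
    "B": [1, 2, 3, 4, 5, 6],
    "C": ["NaN", "b", "NaN", "b", "c", "c"],
})
i["A"] = i["A"].astype("category")
i["B"] = i["B"].astype("float16")
i["C"] = i["C"].astype("category")


def summarize(x):
    # Hypothetical helper: x is the pd.Series for one column of one group.
    if x.dtype == "float16":
        return x.mean()
    # Otherwise return the most frequent value in the group.
    return x.value_counts().index[0]


# observed=True is an assumption added here; all categories of A occur in the data.
result = i.groupby("A", as_index=False, observed=True)[["B", "C"]].agg(summarize)
print(result)
```

With this check, column C should stay in the result alongside A and B instead of being dropped.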