Pandas GroupBy.agg() throws TypeError: aggregate() missing 1 required positional argument: 'arg'


I’m trying to compute multiple aggregations of the same column. I’m using pandas on Python 3.7. Based on the documentation, the syntax seems pretty straightforward:

https://pandas-docs.github.io/pandas-docs-travis/user_guide/groupby.html#named-aggregation

I do not see why I’m getting the error below. Could someone please point out the issue and tell me how to fix it?

code:

qt_dy.groupby('date').agg(std_qty=('qty','std'),mean_qty=('qty','mean'),)

error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-6bb3aabf313f> in <module>
      6 qt_dy.groupby('date')\
----> 7 .agg(std_qty=('qty','std'),mean_qty=('qty','mean'))
TypeError: aggregate() missing 1 required positional argument: 'arg'

Looks like you're trying to use agg with named aggregation. This feature is supported only in pandas v0.25 and above.

For older versions, you will need to use the list of tuples format:

qt_dy.groupby('date')['qty'].agg([('std_qty','std'), ('mean_qty','mean')])

Or, to aggregate multiple columns, a dictionary:

qt_dy.groupby('date').agg({'qty': [('std_qty','std'), ('mean_qty','mean')]})
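As a rough sketch of how the two syntaxes relate, the snippet below picks one or the other based on the installed pandas version. The `qt_dy` frame here is a hypothetical stand-in (the asker's real data is not shown), and the version parsing is a simple assumption that only looks at the first two numbers of `pd.__version__`.

```python
import pandas as pd

# Hypothetical stand-in for the asker's qt_dy frame; the real data is not shown.
qt_dy = pd.DataFrame({
    "date": ["2019-06-01", "2019-06-01", "2019-06-02", "2019-06-02"],
    "qty": [1.0, 3.0, 2.0, 6.0],
})

# Compare only the leading (major, minor) numbers of the version string.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])

if (major, minor) >= (0, 25):
    # Named aggregation: output_name=(input_column, function).
    result = qt_dy.groupby("date").agg(
        std_qty=("qty", "std"),
        mean_qty=("qty", "mean"),
    )
else:
    # Older pandas: select the column, then pass (output_name, function) tuples.
    result = qt_dy.groupby("date")["qty"].agg(
        [("std_qty", "std"), ("mean_qty", "mean")]
    )

print(result)
```

Either branch should give one row per date with the columns std_qty and mean_qty.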

For more information, take a look at my answer here.

Thanks that did the trick. I'm running python3.7 with anaconda. If I upgraded conda and all my packages would my original code work? I just installed python on this machine like a month ago, figured it was a pretty new version of python. – user3476463 Jun 30, 2019 at 0:35

@user3476463 no, this is a pandas version issue. 0.25 is currently in development and won't be out for a few more weeks. – cs95 Jun 30, 2019 at 0:48

I just wanted to add to the above answer.

If you are getting this error because your pandas version is older than 0.25 (check with print(pd.__version__)), and you want to aggregate across multiple columns while avoiding the two-level (MultiIndex) column structure that pandas generates, here is the code.

Let us first create a sample pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'key1' : ['a','a','a','b','a'],
                   'key2' : ['c','c','d','d','e'],
                   'value1' : [1,2,2,3,3],
                   'value2' : [9,8,7,6,5]})
df.head(5)

Here is what the table we created looks like:

|----------------|-------------|------------|------------|
|      key1      |     key2    |    value1  |    value2  |
|----------------|-------------|------------|------------|
|       a        |       c     |      1     |       9    |
|       a        |       c     |      2     |       8    |
|       a        |       d     |      2     |       7    |
|       b        |       d     |      3     |       6    |
|       a        |       e     |      3     |       5    |
|----------------|-------------|------------|------------|

Now, to aggregate both value1 and value2, run this code:

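# A dict with a list of functions returns two-level (MultiIndex) columns.
df_agg = df.groupby(['key1','key2'],as_index=False).agg({'value1':['mean','count'],'value2':'sum'})
# Flatten each (column, function) pair to 'column_function'; rstrip('_') drops the
# trailing underscore left on the group keys, whose second level is empty.
df_agg.columns = ['_'.join(col).rstrip('_') for col in df_agg.columns.values]
df_agg.head(5)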

The resulting table will look like this:

|----------------|-------------|--------------------|-------------------|---------------------|
|      key1      |     key2    |     value1_mean    |   value1_count    |      value2_sum     |
|----------------|-------------|--------------------|-------------------|---------------------|
|       a        |      c      |         1.5        |         2         |          17         |
|       a        |      d      |         2.0        |         1         |           7         |   
|       a        |      e      |         3.0        |         1         |           5         |        
|       b        |      d      |         3.0        |         1         |           6         |     
|----------------|-------------|--------------------|-------------------|---------------------|

If you want different column names, just rename them like this:

df_agg.rename(columns={"value1_mean" : "mean of value1", 
                   "value1_count" : "count of value1",
                   "value2_sum" : "sum of value2"      
                   }, inplace=True)
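For readers who are on pandas 0.25 or newer, named aggregation can produce flat column names in one step, with no join or rename afterwards. This is a minimal sketch reusing the sample frame from this answer; the output names are simply chosen to match the result table above.

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'key2': ['c', 'c', 'd', 'd', 'e'],
                   'value1': [1, 2, 2, 3, 3],
                   'value2': [9, 8, 7, 6, 5]})

# Requires pandas >= 0.25: each keyword is output_name=(input_column, function).
df_named = df.groupby(['key1', 'key2'], as_index=False).agg(
    value1_mean=('value1', 'mean'),
    value1_count=('value1', 'count'),
    value2_sum=('value2', 'sum'),
)

print(df_named)
```

Because the output names are given up front, the columns come back as a flat index and the group keys keep their original names.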

Hope this helps.

I get underscore after key1 and key2 as well; is there a way to avoid that? pandas version 0.24.2 – discipulus Jun 6, 2021 at 11:50

You just have to rename the column names. That is the only way that I know of right now. I have updated my answer to include how to do that. – Abhishek R Jun 7, 2021 at 5:13
