I’m trying to create multiple aggregations of the same field. I’m working in pandas, in python3.7. The syntax seems pretty straightforward based on the documentation:
https://pandas-docs.github.io/pandas-docs-travis/user_guide/groupby.html#named-aggregation
I do not see why I’m getting the error below. Could someone please point out the issue and tell me how to fix it?
code:
qt_dy.groupby('date').agg(std_qty=('qty','std'),mean_qty=('qty','mean'),)
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-6bb3aabf313f> in <module>
6 qt_dy.groupby('date')\
----> 7 .agg(std_qty=('qty','std'),mean_qty=('qty','mean'))
TypeError: aggregate() missing 1 required positional argument: 'arg'
It looks like you're trying to use agg with named aggregation. This feature is supported only in pandas v0.25 and above.
For older versions, you will need to use the list of tuples format:
qt_dy.groupby('date')['qty'].agg([('std_qty','std'), ('mean_qty','mean')])
Or, to aggregate multiple columns, a dictionary:
qt_dy.groupby('date').agg({'qty': [('std_qty','std'), ('mean_qty','mean')]})
For more information, take a look at my answer here.
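As a minimal sketch of the two forms side by side: the frame below is a made-up stand-in for qt_dy (the original data is not shown), and the version check is just one simple way to guard the named-aggregation call.

```python
import pandas as pd

# Hypothetical stand-in for the asker's qt_dy frame; the values are made up.
qt_dy = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "qty": [1.0, 3.0, 2.0, 6.0],
})

# List-of-tuples form on a single column: (output_name, function) pairs.
# This form does not depend on named aggregation, so it also runs before 0.25.
result = qt_dy.groupby("date")["qty"].agg([("std_qty", "std"), ("mean_qty", "mean")])
print(result)

# Named aggregation: only attempt it when the installed pandas is 0.25 or newer.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
if (major, minor) >= (0, 25):
    named = qt_dy.groupby("date").agg(std_qty=("qty", "std"), mean_qty=("qty", "mean"))
    print(named)
```

Both calls should give one row per date with the columns std_qty and mean_qty.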
I just wanted to add to the above answer.
If you are getting this error because your pandas version is older than 0.25 (check with print(pd.__version__)), and you want to aggregate across multiple columns while avoiding the nested, pivot-style column structure that pandas generates, here is the code.
Let us first create a sample pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'key1' : ['a','a','a','b','a'],
'key2' : ['c','c','d','d','e'],
'value1' : [1,2,2,3,3],
'value2' : [9,8,7,6,5]})
df.head(5)
Here is what the table we created looks like:
|----------------|-------------|------------|------------|
| key1 | key2 | value1 | value2 |
|----------------|-------------|------------|------------|
| a | c | 1 | 9 |
| a | c | 2 | 8 |
| a | d | 2 | 7 |
| b | d | 3 | 6 |
| a | e | 3 | 5 |
|----------------|-------------|------------|------------|
Now to do the aggregation for both value1 and value2 you will run this code:
df_agg = df.groupby(['key1','key2'],as_index=False).agg({'value1':['mean','count'],'value2':'sum'})
# The key columns have an empty second level, so strip('_') removes their trailing underscore
df_agg.columns = ['_'.join(col).strip('_') for col in df_agg.columns.values]
df_agg.head(5)
The resulting table will look like this:
|----------------|-------------|--------------------|-------------------|---------------------|
| key1 | key2 | value1_mean | value1_count | value2_sum |
|----------------|-------------|--------------------|-------------------|---------------------|
| a | c | 1.5 | 2 | 17 |
| a | d | 2.0 | 1 | 7 |
| a | e | 3.0 | 1 | 5 |
| b | d | 3.0 | 1 | 6 |
|----------------|-------------|--------------------|-------------------|---------------------|
If you want different column names, rename them as shown below:
df_agg.rename(columns={"value1_mean" : "mean of value1",
"value1_count" : "count of value1",
"value2_sum" : "sum of value2"
}, inplace=True)
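For comparison, once pandas 0.25 or newer is available, the same flat result can probably be produced in one step with named aggregation, with no column flattening or renaming afterwards. This is a sketch on the same sample frame; the output names are simply chosen to match the table above.

```python
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'key2': ['c', 'c', 'd', 'd', 'e'],
                   'value1': [1, 2, 2, 3, 3],
                   'value2': [9, 8, 7, 6, 5]})

# Named aggregation (pandas >= 0.25): each keyword is an output column name,
# each value is a (source_column, function) pair. The columns come out flat.
df_agg = df.groupby(['key1', 'key2'], as_index=False).agg(
    value1_mean=('value1', 'mean'),
    value1_count=('value1', 'count'),
    value2_sum=('value2', 'sum'),
)
print(df_agg)
```

On older versions this call fails with the TypeError from the question, so the dictionary approach above remains the fallback there.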
Hope this helps.