I have a dataframe, df, which is as follows:
| date | Revenue |
|-----------|---------|
| 6/2/2017 | 100 |
| 5/23/2017 | 200 |
| 5/20/2017 | 300 |
| 6/22/2017 | 400 |
| 6/21/2017 | 500 |
I need to group the above data by month to get output as:
| date | SUM(Revenue) |
|------|--------------|
| May | 500 |
| June | 1000 |
I tried this code, but it did not work:
df.groupby(month('date')).agg({'Revenue': 'sum'})
I want to only use Pandas or NumPy and no additional libraries.
–
In [59]: df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum().sort_values()
Out[59]:
May 500
June 1000
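The one-liner above assumes that df['date'] already has a datetime dtype; with the string dates from the question, the .dt accessor raises an AttributeError. A minimal self-contained sketch using the question's sample data (the variable name monthly is my own, and the explicit format string is an assumption about how the dates are written):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['6/2/2017', '5/23/2017', '5/20/2017', '6/22/2017', '6/21/2017'],
    'Revenue': [100, 200, 300, 400, 500],
})

# The .dt accessor only works on datetime-like columns, so convert first.
# Assumption: the dates are written month/day/year.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Group by the full month name and sum the revenue per group.
monthly = df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum()
print(monthly)
```

Note that groupby sorts the month names alphabetically, so June comes before May here. The sort_values() call in the answer orders by revenue, which matches calendar order only by coincidence for this data.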
–
Try a groupby using a pandas Grouper:
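import pandas as pd

df = pd.DataFrame({'date':['6/2/2017','5/23/2017','5/20/2017','6/22/2017','6/21/2017'],'Revenue':[100,200,300,400,500]})
df['date'] = pd.to_datetime(df['date'])
# groupby each 1 month (the month-end alias 'M' is spelled 'ME' in pandas 2.2+)
dg = df.groupby(pd.Grouper(key='date', freq='1M')).sum()
# replace the month-end timestamps with month names
dg.index = dg.index.strftime('%B')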
Output:
Revenue
May 500
June 1000
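Since pandas 2.2 the month-end offset alias 'M' is deprecated in favor of 'ME'. A hedged sketch of the same Grouper approach that picks the alias from the installed version; the helper month_end_alias is hypothetical (it is not part of pandas), and the version check is just one possible way to do this:

```python
import pandas as pd

def month_end_alias():
    """Return 'ME' on pandas >= 2.2, else the older 'M' (hypothetical helper)."""
    major, minor = (int(part) for part in pd.__version__.split('.')[:2])
    return 'ME' if (major, minor) >= (2, 2) else 'M'

df = pd.DataFrame({
    'date': ['6/2/2017', '5/23/2017', '5/20/2017', '6/22/2017', '6/21/2017'],
    'Revenue': [100, 200, 300, 400, 500],
})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# One group per calendar month, labelled by the month-end timestamp.
dg = df.groupby(pd.Grouper(key='date', freq=month_end_alias())).sum()

# Replace the timestamps with month names; chronological order is preserved.
dg.index = dg.index.strftime('%B')
print(dg)
```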
–
For a DataFrame with many rows, strftime is slow. If the date column already has dtype datetime64[ns] (use pd.to_datetime() to convert, or specify parse_dates when importing a CSV), you can use the datetime properties directly as groupby labels (Method 3). The speedup is substantial.
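As a small illustration of the parse_dates route mentioned above, here is a sketch that reads the question's data from CSV text. The io.StringIO buffer stands in for a real file, and the names csv_text and by_month are my own:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents mirror the question's data.
csv_text = """date,Revenue
6/2/2017,100
5/23/2017,200
5/20/2017,300
6/22/2017,400
6/21/2017,500
"""

# parse_dates asks read_csv to convert the named column to datetime on import.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(df['date'].dtype)

# With a datetime column, the month number is available as a property.
by_month = df.groupby(df['date'].dt.month)['Revenue'].sum()
print(by_month)
```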
import numpy as np
import pandas as pd
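# One row per day from the Unix epoch until now, in a column named 'date'
T = pd.date_range(pd.Timestamp(0), pd.Timestamp.now()).to_frame(index=False, name='date')
# Stack 9 copies and renumber the rows so the index stays unique
T = pd.concat([T] * 9, ignore_index=True)
# Random revenue in [0, 1000) for every row
T['revenue'] = np.random.randint(1000, size=T.shape[0])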
print(T.shape) #(159336, 2)
print(T.dtypes) #date: datetime64[ns], revenue: int32
Method 1: strftime
%timeit -n 10 -r 7 T.groupby(T['date'].dt.strftime('%B'))['revenue'].sum()
1.47 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method 2: Grouper
%timeit -n 10 -r 7 T.groupby(pd.Grouper(key='date', freq='1M')).sum()
#NOTE Labels are month-end timestamps, one group per year-month (not per month of year); format with strftime if names are needed
56.9 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method 3: datetime properties
%timeit -n 10 -r 7 T.groupby(T['date'].dt.month)['revenue'].sum()
#NOTE Labels are month integers {1..12}; map them to month names manually if needed
34 ms ± 3.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
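The manual mapping from month integers to names that Method 3 needs can be sketched as follows, here on the question's data. Using the standard library's calendar.month_name for the lookup is my choice, not something the answer prescribes, and the variable name totals is my own:

```python
import calendar
import pandas as pd

df = pd.DataFrame({
    'date': ['6/2/2017', '5/23/2017', '5/20/2017', '6/22/2017', '6/21/2017'],
    'Revenue': [100, 200, 300, 400, 500],
})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Group by month number; groupby sorts the integer keys, so calendar order is kept.
totals = df.groupby(df['date'].dt.month)['Revenue'].sum()

# Map each month number (1..12) to its name after the aggregation is done.
totals.index = totals.index.map(lambda m: calendar.month_name[m])
print(totals)
```

Because the names are attached only after grouping, the expensive string formatting runs once per group instead of once per row. Note that calendar.month_name follows the current locale.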
–
# Explicitly convert the column to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Set the date column as the index
df.set_index('Date', inplace=True)
# 'M' resamples by month end (spelled 'ME' in pandas 2.2+); change the alias for other frequencies
df['Revenue'].resample('M').sum()
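This code gives the same totals as shivsn's answer above, as long as each month name occurs in only one year of the data.
The advantage is that the result keeps a datetime index, so further time-series operations remain available.
For that reason I recommend this approach: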
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df.set_index('Date',inplace=True)
>>> df['withdrawal'].resample('M').sum().sort_values()
2019-10-31 28710.00
2019-04-30 31437.00
2019-07-31 39728.00
2019-11-30 40121.00
2019-05-31 46495.00
2020-02-29 57751.10
2019-12-31 72469.13
2020-01-31 76115.78
2019-06-30 76947.00
2019-09-30 79847.04
2020-03-31 97920.18
2019-08-31 205279.45
Name: withdrawal, dtype: float64
shivsn's code produces the same totals, labelled by month name:
>>> df.groupby(df['Date'].dt.strftime('%B'))['withdrawal'].sum().sort_values()
October 28710.00
April 31437.00
July 39728.00
November 40121.00
May 46495.00
February 57751.10
December 72469.13
January 76115.78
June 76947.00
September 79847.04
March 97920.18
August 205279.45
Name: withdrawal, dtype: float64
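The two approaches stop agreeing once the data contains the same calendar month in more than one year. A minimal sketch with made-up data (the values below are hypothetical, not from the answer); it groups by dt.to_period('M') as a stand-in for resample, since both keep the year in the label:

```python
import pandas as pd

# Hypothetical data: May appears in both 2019 and 2020.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-05-10', '2019-05-20', '2020-05-15']),
    'withdrawal': [10.0, 20.0, 40.0],
})

# Grouping by month name merges May 2019 and May 2020 into one row.
by_name = df.groupby(df['Date'].dt.strftime('%B'))['withdrawal'].sum()

# Grouping by year-month period keeps the two years apart.
by_period = df.groupby(df['Date'].dt.to_period('M'))['withdrawal'].sum()

print(by_name)
print(by_period)
```

Which behavior is wanted depends on the question being asked: totals per month of the year, or totals per month on a timeline.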
–
Convert the date column to datetime first:
df['date'] = pd.to_datetime(df['date'])
Insert a new column in the data frame that holds the month name, such as 'May' or 'June' (here x is each value taken from the date column):
df['months'] = df['date'].apply(lambda x: x.strftime('%B'))
Now group by the months column and sum the revenue:
response_data_frame = df.groupby('months')['Revenue'].sum()
print(response_data_frame)
Output: