How can I group by month from a date field using Python and Pandas?

Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I have a dataframe, df , which is as follows:
| date      | Revenue |
|-----------|---------|
| 6/2/2017  | 100     |
| 5/23/2017 | 200     |
| 5/20/2017 | 300     |
| 6/22/2017 | 400     |
| 6/21/2017 | 500     |
I need to group the above data by month to get output as:
| date | SUM(Revenue) |
|------|--------------|
| May  | 500          |
| June | 1000         |
I tried this code, but it did not work:
df.groupby(month('date')).agg({'Revenue': 'sum'})
I want to only use Pandas or NumPy and no additional libraries.
                df.groupby(pd.Grouper(key='Date',freq='M')).agg({'Revenue':'sum'}), this assumes the data type of the date column is datetime
– gold_cy
                Jul 4, 2017 at 14:25
In [59]: df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum().sort_values()
Out[59]:
May      500
June    1000
                FYI this gives u a string column for the date which is not as performant nor useful (as real resamplimg / time grouping)
– Jeff
                Jul 4, 2017 at 15:30
Try a groupby using a pandas Grouper:
df = pd.DataFrame({'date':['6/2/2017','5/23/2017','5/20/2017','6/22/2017','6/21/2017'],'Revenue':[100,200,300,400,500]})
df.date = pd.to_datetime(df.date)
dg = df.groupby(pd.Grouper(key='date', freq='1M')).sum() # groupby each 1 month
dg.index = dg.index.strftime('%B')
Output:
     Revenue
 May    500
June    1000
                this is the most usable since the months are aggregated in order. This should be the top answer.
– JJSanDiego
                Feb 2, 2021 at 3:43
                This doesn't work if your dates are across years. Then instead of just one row for month (say May and June above), it will have multiple May's and June's for each year.
– Meet
                Jun 18, 2021 at 10:36
For DataFrame with many rows, using strftime takes up more time. If the date column already has dtype of datetime64[ns] (can use pd.to_datetime() to convert, or specify parse_dates during csv import, etc.), one can directly access datetime property for groupby labels (Method 3). The speedup is substantial.
import numpy as np
import pandas as pd
T = pd.date_range(pd.Timestamp(0), pd.Timestamp.now()).to_frame(index=False)
T = pd.concat([T for i in range(1,10)])
T['revenue'] = pd.Series(np.random.randint(1000, size=T.shape[0]))
T.columns.values[0] = 'date'
print(T.shape) #(159336, 2)
print(T.dtypes) #date: datetime64[ns], revenue: int32
Method 1: strftime
%timeit -n 10 -r 7 T.groupby(T['date'].dt.strftime('%B'))['revenue'].sum()
1.47 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method 2: Grouper
%timeit -n 10 -r 7 T.groupby(pd.Grouper(key='date', freq='1M')).sum()
#NOTE Manually map months as integer {01..12} to strings
56.9 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method 3: datetime properties
%timeit -n 10 -r 7 T.groupby(T['date'].dt.month)['revenue'].sum()
#NOTE Manually map months as integer {01..12} to strings
34 ms ± 3.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
                Note that if you have data from more than 1 year, methods 1 and 3 aggregate over them whereas method 2 does not. Also, the result from method 1 is sorted alphabetically.
– HenriV
                Sep 12, 2018 at 14:08
# Explicitly convert to date
df['Date'] = pd.to_datetime(df['Date'])
# Set your date column as index 
df.set_index('Date',inplace=True) 
# For monthly use 'M', If needed for other freq you can change.
df[revenue].resample('M').sum()
This code gives the same result as shivsn's answer on the first post.
But the thing is we can do a lot more operations in this mentioned code.
It is recommended to use this:
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df.set_index('Date',inplace=True)
>>> df['withdrawal'].resample('M').sum().sort_values()
2019-10-31     28710.00
2019-04-30     31437.00
2019-07-31     39728.00
2019-11-30     40121.00
2019-05-31     46495.00
2020-02-29     57751.10
2019-12-31     72469.13
2020-01-31     76115.78
2019-06-30     76947.00
2019-09-30     79847.04
2020-03-31     97920.18
2019-08-31    205279.45
Name: withdrawal, dtype: float64
where shivsn's code does the same.
>>> df.groupby(df['Date'].dt.strftime('%B'))['withdrawal'].sum().sort_values()
October       28710.00
April         31437.00
July          39728.00
November      40121.00
May           46495.00
February      57751.10
December      72469.13
January       76115.78
June          76947.00
September     79847.04
March         97920.18
August       205279.45
Name: withdrawal, dtype: float64
                Why would you have to set the date as the DataFrame index when groupby works just fine? What happens if you have same date repeated? I would also consider the date formatting as first step, the highest voted answer, because it is not clear from the question if that column will be properly formatted.
– user7440787
                Apr 6, 2020 at 17:50
                Either it is a duplicate or unique , it is going to sum up all those values. It acts as same as groupby.And apart from groupby options , By setting the date column as index you can perform lot more basic operations. Thanks.
– Jeywanth Kannan
                Apr 7, 2020 at 20:14
---> df['Date'] = pd.to_datetime(df['Date'])
Insert a new row in the data frame which has month like [May, 'June']
---> df['months'] = df['date'].apply(lambda x:x.strftime('%B'))
---> here x is date which take from date column in data frame.
Now aggregate the aggregate data in the month column and sum the revenue.
--->response_data_frame = df.groupby('months')['Revenue'].sum()
---->print(response_data_frame)
Output:
        Thanks for contributing an answer to Stack Overflow!
Please be sure to answer the question. Provide details and share your research!
But avoid …
Asking for help, clarification, or responding to other answers.
Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.