>>> import pandas as pd
>>> df = pd.DataFrame({'user_id':['a','a','s','s','s'],
                    'session':[4,5,4,5,5],
                    'revenue':[-1,0,1,2,1]})
   revenue  session user_id
0       -1        4       a
1        0        5       a
2        1        4       s
3        2        5       s
4        1        5       s

Each value of session and revenue represents a kind of type, and I want to count the number of occurrences of each kind. For example, the count for revenue=-1 and session=4 with user_id=a is 1.

I found that simply calling count() after groupby() doesn't output the result I want:

>>> df.groupby('user_id').count()
         revenue  session
user_id
a              2        2
s              3        3

How can I do that?

From pandas 1.1 this is made possible with df.value_counts, allowing you to avoid the grouping step. – cs95 Jul 8, 2020 at 19:51

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

should give you what you want.
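
For the example data, the result looks something like this (each row of the index is one unique combination, and the value is its count; exact formatting may vary slightly across pandas versions):

df.groupby(['revenue','session','user_id'])['user_id'].count()
revenue  session  user_id
-1       4        a          1
 0       5        a          1
 1       4        s          1
         5        s          1
 2       5        s          1
Name: user_id, dtype: int64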

pandas >= 1.1: df.value_counts is available!

From pandas 1.1, this will be my recommended method for counting the number of rows in groups (i.e., the group size). To count the number of non-null rows in a group for a specific column, check out the accepted answer.

Old

df.groupby(['A', 'B']).size()   # df.groupby(['A', 'B'])['C'].count()

New [✓]

df.value_counts(subset=['A', 'B']) 

Note that size and count are not identical: the former counts all rows per group, while the latter counts only non-null rows. See this other answer of mine for more.
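
A minimal sketch of the difference, using hypothetical data (the frame df2 and its values are my own, chosen so that one missing value exposes the gap):

import pandas as pd
import numpy as np

# One missing value in column 'B' to expose the difference
df2 = pd.DataFrame({'A': ['x', 'x', 'y'],
                    'B': [1.0, np.nan, 3.0]})

df2.groupby('A').size()        # counts every row:    x -> 2, y -> 1
df2.groupby('A')['B'].count()  # counts non-null 'B': x -> 1, y -> 1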

Minimal Example

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'
df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
                   'num_wings': [2, 0, 0, 0]},
                  index=['falcon', 'dog', 'cat', 'ant'])
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
df.value_counts(subset=['num_legs', 'num_wings'], sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
dtype: int64

Compare this output with

df.groupby(['num_legs', 'num_wings'])['num_legs'].size()
num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: num_legs, dtype: int64

Performance

It's also faster if you don't sort the result:

%timeit df.groupby(['num_legs', 'num_wings'])['num_legs'].count()
%timeit df.value_counts(subset=['num_legs', 'num_wings'], sort=False)
640 µs ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
568 µs ± 6.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
When I use subset with value_counts and do a print, I lose the column titles; any idea why this could be? – FMFF Apr 27, 2021 at 6:48

Having the same problem as @FMFF. I'm losing the columns I grouped by; they are instead functioning as the index in the form 'columnA/columnB'. – petzholt Sep 16, 2021 at 8:13

@FMFF I simply call .to_frame() when I want the index to create a new df. In this case it would be: df.value_counts(subset=['num_legs', 'num_wings'], sort=False).to_frame() – Filippo Vitale Jul 21, 2022 at 1:05
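
For completeness, here is roughly what that looks like; chaining reset_index() afterwards turns the grouping keys back into ordinary columns (the column name 'count' is my own choice, not a pandas default):

df.value_counts(subset=['num_legs', 'num_wings'], sort=False).to_frame('count').reset_index()
   num_legs  num_wings  count
0         2          2      1
1         4          0      2
2         6          0      1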

I struggled with the same issue and made use of the solution provided above. You can actually designate any of the columns to count:

df.groupby(['revenue','session','user_id'])['revenue'].count()
df.groupby(['revenue','session','user_id'])['session'].count()

would give the same answer.

This answer is incorrect: you cannot designate just any column for the count, since count() only counts non-null values and different columns can have different NaN counts. If you are looking to count the number of rows in each group, use df.value_counts(subset=[...]) instead (see here). – cs95 Jul 9, 2020 at 5:05
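
A quick sketch of that pitfall, with hypothetical data where 'revenue' contains a missing value (grouping by fewer columns here than in the answer above, to keep the point visible):

import pandas as pd
import numpy as np

df3 = pd.DataFrame({'user_id': ['a', 'a'],
                    'session': [4, 4],
                    'revenue': [np.nan, 0]})

# Counting 'revenue' drops the NaN row; counting 'session' does not
df3.groupby(['user_id', 'session'])['revenue'].count()  # ('a', 4) -> 1
df3.groupby(['user_id', 'session'])['session'].count()  # ('a', 4) -> 2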