I am trying to get unique counts column-wise but my array has categorical variables (dtype object)
val, count = np.unique(x, axis=1, return_counts=True)
Though I am getting an error like this:
TypeError: The axis argument to unique is not supported for dtype object
How do I solve this problem?
Sample x:
array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
Need the following counts:
for x_T in x.T:
    val, count = np.unique(x_T, return_counts=True)
    print(val, count)
[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
' Never-married'] [1 6 1 1]
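You could use scipy.stats.itemfreq. Its output does not look like yours, because it counts over the whole array rather than per column, but since no value appears in more than one column here it still delivers the desired counts. Note that itemfreq was deprecated in SciPy 1.0 and removed in SciPy 1.3, so this only works with older SciPy versions: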
import numpy as np
from scipy.stats import itemfreq
x = np.array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
itemfreq(x)
Output:
array([[' 11th', 1],
[' 9th', 1],
[' Bachelors', 2],
[' Divorced', 1],
[' HS-grad', 2],
[' Married-civ-spouse', 6],
[' Married-spouse-absent', 1],
[' Masters', 2],
[' Never-married', 1],
[' Private', 8],
[' Self-emp-not-inc', 1],
[' Some-college', 1]], dtype=object)
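On newer SciPy versions, where itemfreq is no longer available, the same flattened tally can be sketched with np.unique alone. This is a minimal sketch, assuming every entry is a string; the shortened sample array and the name freq are only for illustration:

```python
import numpy as np

# Shortened sample, assumed to contain only strings.
x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse']],
             dtype=object)

# Convert to a string dtype and flatten, then count every distinct value
# across the whole array (not per column).
values, counts = np.unique(x.astype(str).ravel(), return_counts=True)

# Pair each value with its count in a plain dict.
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)
```

Like itemfreq, this ignores which column a value came from, so it only matches per-column counts when no value occurs in more than one column.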
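Otherwise, you could try to specify another dtype, such as:
val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)
Note, however, that this does not give the per-column counts from the question: with axis=1, np.unique compares whole columns and returns the unique columns, not the unique values within each column.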
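If the goal is the per-column counts from the question, one possible sketch is to convert to a string dtype and call np.unique on each column separately. The helper name column_value_counts is hypothetical, and the shortened sample assumes all entries are strings:

```python
import numpy as np

def column_value_counts(arr):
    """Return one (values, counts) pair per column of a 2-D array.

    Hypothetical helper: assumes every entry can be treated as a string.
    """
    as_text = np.asarray(arr).astype(str)
    # Iterating over the transpose yields one 1-D array per column.
    return [np.unique(col, return_counts=True) for col in as_text.T]

# Shortened sample taken from the question's data.
x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
              [' Private', ' Masters', ' Never-married']], dtype=object)

result = column_value_counts(x)
for values, counts in result:
    print(values, counts)
```

This is the same loop over x.T shown in the question, wrapped in a function; the astype(str) call is only there so the result has a fixed-width string dtype instead of object.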