For the country column, there are 17040 entries. Each entry is one of the 163 categories.
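The dataset itself is not reproduced here. As a hypothetical stand-in, the sketch below builds a tiny DataFrame (the country names and years are illustrative, not taken from the original data), converts the country column to the category dtype, and counts its entries and categories:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "country": ["Albania", "Brazil", "Chad", "Denmark", "Egypt",
                "Brazil", "Chad", "Egypt"],
    "year": [2002] * 5 + [2007] * 3,
})
# Store the country column with the memory-efficient category dtype
df["country"] = df["country"].astype("category")

print(len(df["country"]))                 # number of entries: 8
print(len(df["country"].cat.categories))  # number of categories: 5
```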
Let’s keep only the 3 countries that interest us:
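A minimal sketch of this filtering step, again on hypothetical toy data with illustrative country names. Filtering the rows with isin() keeps the categorical dtype, and with it the full list of categories:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "country": ["Albania", "Brazil", "Chad", "Denmark", "Egypt",
                "Brazil", "Chad", "Egypt"],
    "year": [2002] * 5 + [2007] * 3,
})
df["country"] = df["country"].astype("category")

# Keep only the rows whose country is one of the (hypothetical) countries of interest
wanted = ["Brazil", "Chad", "Egypt"]
df2 = df[df["country"].isin(wanted)]

print(len(df2))                             # 6 rows remain
print(list(df2["country"].cat.categories))  # ['Albania', 'Brazil', 'Chad', 'Denmark', 'Egypt']
```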
Although only 42 entries remain, the column still has all 163 categories.
To solve this, we can do the following:
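A sketch of the fix on the same hypothetical toy data: call remove_unused_categories() through the .cat accessor and assign the result back to the column. The .copy() after filtering is an addition here; it makes df2 independent of df so that assigning to one of its columns does not trigger pandas' SettingWithCopyWarning:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "country": ["Albania", "Brazil", "Chad", "Denmark", "Egypt",
                "Brazil", "Chad", "Egypt"],
    "year": [2002] * 5 + [2007] * 3,
})
df["country"] = df["country"].astype("category")

# Filter, then copy so that df2 is a standalone DataFrame
df2 = df[df["country"].isin(["Brazil", "Chad", "Egypt"])].copy()

# remove_unused_categories() returns a new Series; assign it back to the column
df2["country"] = df2["country"].cat.remove_unused_categories()

print(list(df2["country"].cat.categories))  # ['Brazil', 'Chad', 'Egypt']
```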
Let’s see what’s going on.
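df2["country"].cat is a pandas.core.arrays.categorical.CategoricalAccessor object. Its method remove_unused_categories() (called as Series.cat.remove_unused_categories()) drops the categories that no longer appear in the data and returns a new Series, leaving the original untouched. To update the DataFrame column, the result therefore has to be assigned back to it.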
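To make this concrete, here is a small sketch on a hypothetical toy Series with one unused category. It checks the accessor's type and shows that remove_unused_categories() returns a new Series while the original keeps all of its categories:

```python
import pandas as pd

# Hypothetical toy Series: "Albania" is a category but never appears as a value
s = pd.Series(["Brazil", "Chad", "Brazil"],
              dtype=pd.CategoricalDtype(["Albania", "Brazil", "Chad"]))

accessor = s.cat
print(type(accessor).__name__)        # CategoricalAccessor

trimmed = s.cat.remove_unused_categories()
print(type(trimmed).__name__)         # Series
print(list(trimmed.cat.categories))   # ['Brazil', 'Chad']
print(list(s.cat.categories))         # ['Albania', 'Brazil', 'Chad'] (original unchanged)
```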
The same result can also be obtained with:
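One alternative with the same effect, offered as a sketch on hypothetical toy data: convert the column back to plain Python objects and re-create the categorical, so that pandas infers the categories only from the values actually present. Going through object rather than str keeps missing values as NaN instead of turning them into the string "nan":

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "country": ["Albania", "Brazil", "Chad", "Denmark", "Egypt",
                "Brazil", "Chad", "Egypt"],
    "year": [2002] * 5 + [2007] * 3,
})
df["country"] = df["country"].astype("category")
df2 = df[df["country"].isin(["Brazil", "Chad", "Egypt"])].copy()

# Drop the categorical dtype, then rebuild it from the remaining values only
df2["country"] = df2["country"].astype(object).astype("category")

print(list(df2["country"].cat.categories))  # ['Brazil', 'Chad', 'Egypt']
```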
or
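Another sketch, assuming the list of kept countries is known (here the hypothetical wanted list): set the categories explicitly with Series.cat.set_categories(). Any value not in the new list would become NaN, which cannot happen here because the rows were filtered on the same list:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "country": ["Albania", "Brazil", "Chad", "Denmark", "Egypt",
                "Brazil", "Chad", "Egypt"],
    "year": [2002] * 5 + [2007] * 3,
})
df["country"] = df["country"].astype("category")

wanted = ["Brazil", "Chad", "Egypt"]
df2 = df[df["country"].isin(wanted)].copy()

# Replace the category list with exactly the countries that were kept
df2["country"] = df2["country"].cat.set_categories(wanted)

print(list(df2["country"].cat.categories))  # ['Brazil', 'Chad', 'Egypt']
print(df2["country"].isna().sum())          # 0: no value was lost
```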