For the country column, there are 17040 entries. Each entry belongs to one of 163 categories.
Let’s keep only the 3 categories that interest us:
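A minimal sketch of this kind of filtering, on a small made-up DataFrame rather than the dataset used in this post. The names `df`, `df2` and `wanted`, and the country values, are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in data: 6 rows, 5 distinct countries.
df = pd.DataFrame({
    "country": pd.Categorical(
        ["France", "Japan", "Brazil", "Kenya", "Japan", "Peru"]
    ),
    "value": [1, 2, 3, 4, 5, 6],
})

# Keep only the rows whose country is one of the three we care about.
wanted = ["France", "Japan", "Brazil"]  # illustrative choice
df2 = df[df["country"].isin(wanted)].copy()

n_rows = len(df2)                                # rows left after filtering
n_categories = len(df2["country"].cat.categories)  # categories still registered

print(n_rows, n_categories)  # 4 5
```

In this toy example, filtering drops rows but the categorical dtype still remembers all five original categories.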
Now we observe that even though only 42 entries remain, there are still 163 categories, as before.
To solve this, we can do the following:
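One way to do this, sketched on hypothetical data (the DataFrame below is a stand-in, not the dataset from this post), is to overwrite the column with the result of `remove_unused_categories()`:

```python
import pandas as pd

# Hypothetical stand-in data, not the dataset used in this post.
df = pd.DataFrame(
    {"country": pd.Categorical(["France", "Japan", "Brazil", "Kenya", "Peru"])}
)
df2 = df[df["country"].isin(["France", "Japan", "Brazil"])].copy()

# Replace the column with a version that keeps only the categories in use.
df2["country"] = df2["country"].cat.remove_unused_categories()

remaining = list(df2["country"].cat.categories)
print(remaining)  # ['Brazil', 'France', 'Japan']
```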
Let’s see what’s going on. df2["country"].cat is a pandas.core.arrays.categorical.CategoricalAccessor object. The method Series.cat.remove_unused_categories() removes the categories that are not used and returns a new Series (a DataFrame column); the original column is left unchanged unless the result is assigned back.
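The behaviour described above can be checked on a small example. The Series `s` below is made up for illustration; it declares three categories but uses only two of them.

```python
import pandas as pd

# Hypothetical Series: "Brazil" is a declared category that never appears.
s = pd.Series(
    pd.Categorical(["France", "Japan"], categories=["Brazil", "France", "Japan"])
)

accessor_name = type(s.cat).__name__  # class of the .cat accessor
trimmed = s.cat.remove_unused_categories()

print(accessor_name)                # CategoricalAccessor
print(type(trimmed).__name__)       # Series
print(len(s.cat.categories))        # 3: the original Series is untouched
print(len(trimmed.cat.categories))  # 2: only categories in use remain
```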
The above can also be done by:
or
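As an illustration only, here are two other ways to end up with just the categories in use. These are assumptions about what such alternatives might look like, not necessarily the ones intended above, and the data is again a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in data, not the dataset used in this post.
df = pd.DataFrame(
    {"country": pd.Categorical(["France", "Japan", "Brazil", "Kenya", "Peru"])}
)
wanted = ["France", "Japan", "Brazil"]
df2 = df[df["country"].isin(wanted)].copy()

# Alternative A: state explicitly which categories should remain.
via_set = df2["country"].cat.set_categories(wanted)

# Alternative B: round-trip through strings so the categories are
# re-inferred from the values that are actually present.
via_roundtrip = df2["country"].astype(str).astype("category")

print(list(via_set.cat.categories))
print(list(via_roundtrip.cat.categories))
```

Alternative A keeps the category order given in `wanted`, while alternative B lets pandas infer the categories again; which one fits depends on whether the category order matters downstream.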