Python & Pandas: How to query if a list-type column contains something?

Question

I have a DataFrame that contains information about movies. It has a column called genre, which holds the list of genres each movie belongs to. For example:

df['genre']

## returns 

0       ['comedy', 'sci-fi']
1       ['action', 'romance', 'comedy']
2       ['documentary']
3       ['crime','horror']
...

How can I query the df so that it returns the movies that belong to a certain genre?

For example, something like df['genre'].contains('comedy') that returns rows 0 and 1.

I know that for a plain list I can do something like

'comedy' in ['comedy', 'sci-fi']

but I didn't find anything similar in pandas. The only thing I know of is df['genre'].str.contains(), but it didn't work for the list type.


| pandas   | python   2017-01-07 08:01 3 Answers

Answers to Python & Pandas: How to query if a list-type column contains something? ( 3 )

  1. 2017-01-07 08:01

    You can use apply to create a mask and then use boolean indexing:

    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print(df1)
                           genre
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
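
For anyone who wants to try the apply approach on its own, here is a minimal self-contained sketch. The sample data, the title column, and the has_genre helper are hypothetical and only mirror the shape of the question's DataFrame.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's genre column.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ],
})


def has_genre(frame, genre):
    """Return the rows of `frame` whose genre list contains `genre`."""
    # Build a boolean mask with a plain Python membership test per row.
    mask = frame["genre"].apply(lambda genres: genre in genres)
    return frame[mask]


comedies = has_genre(df, "comedy")
print(comedies.index.tolist())  # [0, 1]
```

The membership test is exact, so 'comedy' only matches a list element equal to 'comedy'.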
    
  2. 2017-01-07 09:01

    Using sets:

    df.genre.map(set(['comedy']).issubset)
    
    0     True
    1     True
    2    False
    3    False
    dtype: bool
    

    df.genre[df.genre.map(set(['comedy']).issubset)]
    
    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object
    

    The same thing, presented in a way I like better:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]
    

    More efficient, using a list comprehension instead of map:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]
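
The set-based approach also extends naturally to more than one genre. The sketch below is an illustration under assumed sample data, not part of the original answer: issubset requires every listed genre to be present, while isdisjoint can be negated to require at least one.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's genre column.
genre = pd.Series([
    ["comedy", "sci-fi"],
    ["action", "romance", "comedy"],
    ["documentary"],
    ["crime", "horror"],
])

# Rows must contain every genre in `required`.
required = {"comedy", "romance"}
has_all = genre.map(required.issubset)

# Rows must contain at least one genre in `wanted`.
wanted = {"horror", "sci-fi"}
has_any = genre.map(lambda genres: not wanted.isdisjoint(genres))

print(has_all.tolist())  # [False, True, False, False]
print(has_any.tolist())  # [True, False, False, True]
```

Both set methods accept any iterable, so the lists in the column do not need to be converted to sets first.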
    

    Using str in two passes. This is slow and not perfectly accurate, because str.contains matches substrings of the joined text:

    df[df.genre.str.join(' ').str.contains('comedy')]
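
To make the accuracy caveat concrete, here is a small sketch comparing the join-and-search approach with an exact membership test. The 'dark-comedy' label is a hypothetical genre invented for the illustration.

```python
import pandas as pd

genre = pd.Series([
    ["comedy", "sci-fi"],
    ["dark-comedy", "crime"],  # hypothetical label that merely contains "comedy"
    ["documentary"],
])

# Join each list into one string, then search that string for a substring.
by_substring = genre.str.join(" ").str.contains("comedy")

# Exact membership test on the original lists.
by_membership = genre.apply(lambda genres: "comedy" in genres)

print(by_substring.tolist())   # [True, True, False]
print(by_membership.tolist())  # [True, False, False]
```

The second row is a false positive for the substring search, which is why the exact test is usually the safer choice.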
    
  3. 2017-01-07 09:01

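    According to the pandas source code, you can use .str.contains('comedy', regex=False). With regex=False each element is tested with Python's in operator, so it also works when the elements are lists.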
