Python & Pandas: How to query if a list-type column contains something?


I have a dataframe that contains info about movies. It has a column called genre, where each entry is a list of the genres the movie belongs to. For example

df['genre']

## returns 

0       ['comedy', 'sci-fi']
1       ['action', 'romance', 'comedy']
2       ['documentary']
3       ['crime','horror']

How can I query the df so that it returns the movies that belong to a certain genre?

For example, something like df['genre'].contains('comedy') would return rows 0 and 1.

I know that for a plain list, I can do things like

'comedy' in ['comedy', 'sci-fi']

but I didn't find anything similar in pandas. The only thing I know is df['genre'].str.contains(), but it didn't work for the list type.

| pandas   | python   2017-01-07 08:01 3 Answers

Answers ( 3 )

  1. 2017-01-07 08:01

    You can use apply to create a boolean mask and then filter with boolean indexing:

    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print(df1)
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
  2. 2017-01-07 09:01

    using sets

    df.genre.map(set(['comedy']).issubset)

    0     True
    1     True
    2    False
    3    False
    dtype: bool

    df.genre[df.genre.map(set(['comedy']).issubset)]

    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object

    presented in a way I like better

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]

    more efficient

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]

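    using str in two passes
    slow, and not perfectly accurate: this is a substring match on the joined text, so any genre whose name merely contains 'comedy' also matches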

    df[df.genre.str.join(' ').str.contains('comedy')]
  3. 2017-01-07 09:01

    According to the pandas source code, you can use df['genre'].str.contains('comedy', regex=False).
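
For anyone who wants to try the approaches from the answers side by side, here is a minimal sketch. It assumes each cell of genre holds a real Python list (not the string representation of one). The title column and the extra 'tragicomedy' row are hypothetical additions, there only to show where the string-based variant gives a different result.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; each cell is a real list.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],  # made-up titles
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
        ["tragicomedy"],  # extra row: contains the substring "comedy"
    ],
})

# Answer 1: element-wise membership test with apply.
mask_apply = df["genre"].apply(lambda genres: "comedy" in genres)

# Answer 2: set-based test; {"comedy"}.issubset(genres) is True
# when "comedy" is one of the list's elements.
is_comedy = {"comedy"}.issubset
mask_set = df["genre"].map(is_comedy)

# Answer 2, list-comprehension variant: build the mask in plain Python.
mask_list = [is_comedy(genres) for genres in df["genre"].tolist()]

# Answer 2, string variant: join each list, then search for a substring.
# This also flags the "tragicomedy" row, which is why it is less accurate.
mask_str = df["genre"].str.join(" ").str.contains("comedy")

print(df[mask_apply])  # rows 0 and 1
print(df[mask_str])    # rows 0, 1 and 4
```

The first three masks select the same rows. Which one is fastest probably depends on the size of the frame, so it is worth timing them on your own data.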
