cudf.core.column.string.StringMethods.character_ngrams
- StringMethods.character_ngrams(n: int = 2, as_list: bool = False) -> SeriesOrIndex
Generate the n-grams from characters in a column of strings.
- Parameters
- n : int
The degree of the n-gram (number of consecutive characters). Default of 2 for bigrams.
- as_list : bool
If True, return a list column in which each row holds the n-grams of the corresponding input string. Default is False, which returns all n-grams in a single flat column.
Examples
>>> import cudf
>>> str_series = cudf.Series(['abcd','efgh','xyz'])
>>> str_series.str.character_ngrams(2)
0    ab
1    bc
2    cd
3    ef
4    fg
5    gh
6    xy
7    yz
dtype: object
>>> str_series.str.character_ngrams(3)
0    abc
1    bcd
2    efg
3    fgh
4    xyz
dtype: object
>>> str_series.str.character_ngrams(3,True)
0    [abc, bcd]
1    [efg, fgh]
2         [xyz]
dtype: list
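
To make the sliding-window behaviour in the examples concrete, here is a minimal pure-Python sketch that reproduces the same values on plain lists. It is not the cuDF implementation and does not run on the GPU. The name `character_ngrams_sketch` is hypothetical, and yielding nothing for strings shorter than `n` is an assumption of this sketch, since the text above does not say how cuDF treats short strings or nulls.

```python
from typing import List, Sequence, Union


def character_ngrams_sketch(
    strings: Sequence[str], n: int = 2, as_list: bool = False
) -> Union[List[str], List[List[str]]]:
    """Hypothetical CPU reference for the behaviour shown in the examples."""
    if n < 1:
        raise ValueError("n must be a positive integer")

    # Slide a window of width n over each string, one character at a time.
    # A string shorter than n gives an empty range and so no n-grams
    # (an assumption of this sketch, not documented cuDF behaviour).
    per_string = [
        [s[i:i + n] for i in range(len(s) - n + 1)]
        for s in strings
    ]

    if as_list:
        # One list of n-grams per input string.
        return per_string

    # Flatten: all n-grams in input order, one per output row.
    return [gram for grams in per_string for gram in grams]


words = ["abcd", "efgh", "xyz"]
bigrams = character_ngrams_sketch(words, 2)
trigram_lists = character_ngrams_sketch(words, 3, as_list=True)
```

With the inputs above, `bigrams` holds the same eight values as the first example and `trigram_lists` matches the rows of the `as_list=True` example. A string of length `k` contributes `k - n + 1` n-grams when `k >= n`, which is why the flat output has a different number of rows from the input.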