cudf.core.column.string.StringMethods.tokenize

StringMethods.tokenize(delimiter: str = ' ') → SeriesOrIndex

Each string is split into tokens using the provided delimiter(s). The sequence returned contains the tokens in the order they were found.

Parameters
delimiter : str or list of strs, default is whitespace.

The string(s) used to locate the split points of each string.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> data = ["hello world", "goodbye world", "hello goodbye"]
>>> ser = cudf.Series(data)
>>> ser.str.tokenize()
0      hello
1      world
2    goodbye
3      world
4      hello
5    goodbye
dtype: object
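
A minimal additional sketch using a custom single-character delimiter such as "-" (the index of the returned Series may vary across cuDF versions; the token values and their order are the point here):

>>> dates = cudf.Series(["2023-01-15", "2024-12-31"])
>>> dates.str.tokenize(delimiter="-")
0    2023
1      01
2      15
3    2024
4      12
5      31
dtype: object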