cudf.core.column.string.StringMethods.tokenize

StringMethods.tokenize(delimiter: str = ' ') → SeriesOrIndex

Each string is split into tokens using the provided delimiter(s). The sequence returned contains the tokens in the order they were found.

Parameters
delimiter : str or list of strs, default is whitespace.

The string(s) used to locate the split points of each string.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> data = ["hello world", "goodbye world", "hello goodbye"]
>>> ser = cudf.Series(data)
>>> ser.str.tokenize()
0      hello
1      world
2    goodbye
3      world
4      hello
5    goodbye
dtype: object
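
A minimal additional sketch using a custom single-character delimiter such as "-" (the index of the returned Series may vary across cuDF versions; the token values and their order are the point here):

>>> dates = cudf.Series(["2023-01-15", "2024-12-31"])
>>> dates.str.tokenize(delimiter="-")
0    2023
1      01
2      15
3    2024
4      12
5      31
dtype: object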