libcudf  24.04.00

Files

file  capitalize.hpp
 
file  case.hpp
 

Functions

std::unique_ptr< column > cudf::strings::capitalize (strings_column_view const &input, string_scalar const &delimiters=string_scalar("", true, cudf::get_default_stream()), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of capitalized strings.

std::unique_ptr< column > cudf::strings::title (strings_column_view const &input, string_character_types sequence_type=string_character_types::ALPHA, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Modifies first character of each word to upper-case and lower-cases the rest.

std::unique_ptr< column > cudf::strings::is_title (strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Checks if the strings in the input column are title formatted.

std::unique_ptr< column > cudf::strings::to_lower (strings_column_view const &strings, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts a column of strings to lower case.

std::unique_ptr< column > cudf::strings::to_upper (strings_column_view const &strings, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Converts a column of strings to upper case.

std::unique_ptr< column > cudf::strings::swapcase (strings_column_view const &strings, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a column of strings converting lower case characters to upper case and vice versa.
 

Function Documentation

◆ capitalize()

std::unique_ptr<column> cudf::strings::capitalize ( strings_column_view const &  input,
string_scalar const &  delimiters = string_scalar("", true, cudf::get_default_stream()),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a column of capitalized strings.

If delimiters is an empty string, only the first character of each row is capitalized. Otherwise, the first non-delimiter character after any delimiter character is also capitalized. As the example shows, all other characters are converted to lower case.

Example:
input = ["tesT1", "a Test", "Another Test", "a\tb"];
output = capitalize(input)
output is ["Test1", "A test", "Another test", "A\tb"]
output = capitalize(input, " ")
output is ["Test1", "A Test", "Another Test", "A\tb"]
output = capitalize(input, " \t")
output is ["Test1", "A Test", "Another Test", "A\tB"]

Any null string entries return corresponding null output column entries.

Exceptions
cudf::logic_error  if delimiters.is_valid() is false
Parameters
input  String column
delimiters  Characters for identifying words to capitalize
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
Column of strings capitalized from the input column
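
The delimiter rule can be illustrated on the host with plain std::string. This is a minimal ASCII-only sketch, not libcudf's implementation: capitalize_ascii is a hypothetical helper, it runs on the CPU, and it ignores nulls and multi-byte UTF-8 characters.

```cpp
#include <cctype>
#include <string>

// Hypothetical host-side helper (not part of libcudf), ASCII only.
// Capitalizes the first character of the row and, when delimiters is
// non-empty, the first non-delimiter character after any delimiter.
// Every other character is lower-cased.
std::string capitalize_ascii(std::string const& input, std::string const& delimiters = "")
{
  std::string out;
  out.reserve(input.size());
  bool capitalize_next = true;  // the start of the row is always a candidate
  for (unsigned char ch : input) {
    bool const is_delimiter = delimiters.find(static_cast<char>(ch)) != std::string::npos;
    if (is_delimiter) {
      out.push_back(static_cast<char>(ch));  // delimiters are copied unchanged
      capitalize_next = true;
    } else {
      int const converted = capitalize_next ? std::toupper(ch) : std::tolower(ch);
      out.push_back(static_cast<char>(converted));
      capitalize_next = false;
    }
  }
  return out;
}
```

With the rows from the example, capitalize_ascii("a\tb", " ") leaves the character after the tab in lower case because the tab is not in the delimiter set, while capitalize_ascii("a\tb", " \t") capitalizes it.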

◆ is_title()

std::unique_ptr<column> cudf::strings::is_title ( strings_column_view const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Checks if the strings in the input column are title formatted.

The first character of each word should be upper-case while all other characters should be lower-case. A word is a sequence of upper-case and lower-case characters.

This function returns a column of booleans indicating true if the string in the input row is in title format and false if not.

Example:
input = [" Test1", "A Test", " Another test ", "N2Vidia Corp", "!Abc"];
output = is_title(input)
output is [true, true, false, true, true]

Any null string entries result in corresponding null output column entries.

Parameters
input  String column
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
Column of type BOOL8
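
The title-format check can be sketched on the host for a single row. The helper below is hypothetical and ASCII-only; it is not libcudf's implementation. One assumption goes beyond the text above: a row that contains no alphabetic characters is reported as false, which the description does not specify.

```cpp
#include <cctype>
#include <string>

// Hypothetical host-side helper (not part of libcudf), ASCII only.
// A word is a run of alphabetic characters; its first character must be
// upper case and every following character must be lower case.
bool is_title_ascii(std::string const& input)
{
  bool in_word  = false;  // true while inside a run of alphabetic characters
  bool saw_word = false;  // assumption: rows without any word are not title formatted
  for (unsigned char ch : input) {
    if (std::isalpha(ch)) {
      bool const is_upper = std::isupper(ch) != 0;
      // At a word start (in_word == false) the letter must be upper case;
      // inside a word (in_word == true) it must be lower case.
      if (is_upper == in_word) { return false; }
      in_word  = true;
      saw_word = true;
    } else {
      in_word = false;  // any non-alphabetic character ends the current word
    }
  }
  return saw_word;
}
```

Applied to "N2Vidia Corp", the digit ends the word "N", so "Vidia" starts a new word and the row is accepted, matching the example.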

◆ swapcase()

std::unique_ptr<column> cudf::strings::swapcase ( strings_column_view const &  strings,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a column of strings converting lower case characters to upper case and vice versa.

Only upper or lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters
strings  Strings instance for this operation.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New column of strings with characters converted.
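
As an illustration of the conversion rule, the following host-only sketch swaps the case of ASCII letters in one row and copies everything else. swapcase_ascii is a hypothetical name; the real function operates on device columns and UTF-8 characters, which this sketch does not attempt.

```cpp
#include <cctype>
#include <string>

// Hypothetical host-side helper (not part of libcudf), ASCII only.
// Lower-case letters become upper case and vice versa; digits,
// punctuation and whitespace are copied unchanged.
std::string swapcase_ascii(std::string const& input)
{
  std::string out = input;
  for (char& c : out) {
    auto const ch = static_cast<unsigned char>(c);
    if (std::islower(ch)) {
      c = static_cast<char>(std::toupper(ch));
    } else if (std::isupper(ch)) {
      c = static_cast<char>(std::tolower(ch));
    }
  }
  return out;
}
```

For example, swapcase_ascii("Hello World 123") yields "hELLO wORLD 123".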

◆ title()

std::unique_ptr<column> cudf::strings::title ( strings_column_view const &  input,
string_character_types  sequence_type = string_character_types::ALPHA,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Modifies first character of each word to upper-case and lower-cases the rest.

A word here is a sequence of characters of sequence_type delimited by any characters not part of the sequence_type character set.

This function returns a column of strings where, for each string row in the input, the first character of each word is converted to upper-case, while all the remaining characters in a word are converted to lower-case.

Example:
input = [" teST1", "a Test", " Another test ", "n2vidia"];
output = title(input)
output is [" Test1", "A Test", " Another Test ", "N2Vidia"]
output = title(input, ALPHANUM)
output is [" Test1", "A Test", " Another Test ", "N2vidia"]

Any null string entries return corresponding null output column entries.

Parameters
input  String column
sequence_type  The character type that is used when identifying words
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
Column of titled strings
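
The effect of sequence_type on word boundaries can be sketched on the host. Everything below is illustrative and ASCII-only: word_chars is a hypothetical two-value stand-in for string_character_types, and title_ascii is not a libcudf function.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for string_character_types (only two values modeled).
enum class word_chars { ALPHA, ALPHANUM };

// Hypothetical host-side helper (not part of libcudf), ASCII only.
// A word is a run of characters of the selected type. The first character
// of each word is upper-cased, the rest of the word is lower-cased, and
// characters outside the selected type are copied unchanged.
std::string title_ascii(std::string const& input, word_chars type = word_chars::ALPHA)
{
  std::string out;
  out.reserve(input.size());
  bool in_word = false;
  for (unsigned char ch : input) {
    bool const is_word_char =
      (type == word_chars::ALPHA) ? std::isalpha(ch) != 0 : std::isalnum(ch) != 0;
    if (!is_word_char) {
      out.push_back(static_cast<char>(ch));
      in_word = false;  // a non-word character ends the current word
    } else {
      int const converted = in_word ? std::tolower(ch) : std::toupper(ch);
      out.push_back(static_cast<char>(converted));
      in_word = true;
    }
  }
  return out;
}
```

With ALPHA, the digit in "n2vidia" splits the row into two words and gives "N2Vidia"; with ALPHANUM the digit is part of a single word and the result is "N2vidia".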

◆ to_lower()

std::unique_ptr<column> cudf::strings::to_lower ( strings_column_view const &  strings,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Converts a column of strings to lower case.

Only upper case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters
strings  Strings instance for this operation.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New column of strings with characters converted.
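
The note about byte length follows from UTF-8 encoding: a character and its case-converted counterpart can occupy a different number of bytes. The sketch below measures this for two Unicode simple lower-case mappings, U+212A (KELVIN SIGN) to U+006B and U+023A to U+2C65. Both helpers are hypothetical, and whether libcudf's case tables cover these particular code points is an assumption that the text above does not confirm.

```cpp
#include <string>

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8.
std::string utf8_encode(char32_t cp)
{
  std::string out;
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  return out;
}

// Change in UTF-8 byte count when code point `from` is replaced by `to`.
// Negative means the converted string is shorter, positive means longer.
int byte_length_change(char32_t from, char32_t to)
{
  return static_cast<int>(utf8_encode(to).size()) - static_cast<int>(utf8_encode(from).size());
}
```

Here byte_length_change(0x212A, 0x006B) is -2 (three bytes become one) and byte_length_change(0x023A, 0x2C65) is 1 (two bytes become three), so an output row cannot in general reuse the size of its input row.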

◆ to_upper()

std::unique_ptr<column> cudf::strings::to_upper ( strings_column_view const &  strings,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Converts a column of strings to upper case.

Only lower case alphabetical characters are converted. All other characters are copied. Case conversion may result in strings that are longer or shorter than the original string in bytes.

Any null entries create null entries in the output column.

Parameters
strings  Strings instance for this operation.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New column of strings with characters converted.
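
Null propagation can be sketched on the host by modeling a nullable strings column as a vector of std::optional<std::string> (C++17). This is an illustrative analogy only: nullable_strings and to_upper_ascii are hypothetical names, the conversion is ASCII-only, and libcudf stores nulls in a validity mask rather than in optionals.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side model of a nullable strings column.
using nullable_strings = std::vector<std::optional<std::string>>;

// Hypothetical host-side helper (not part of libcudf), ASCII only.
// Upper-cases every non-null row; a null row stays null in the output.
nullable_strings to_upper_ascii(nullable_strings const& rows)
{
  nullable_strings out;
  out.reserve(rows.size());
  for (auto const& row : rows) {
    if (!row.has_value()) {
      out.push_back(std::nullopt);  // null in, null out
      continue;
    }
    std::string converted = *row;
    for (char& c : converted) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    out.push_back(std::move(converted));
  }
  return out;
}
```

The output always has the same number of rows as the input, and only lower-case letters change; digits and punctuation are copied as they are.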