libcudf  24.04.00
Combining

Files

file  strings/combine.hpp
 Strings APIs for concatenate and join.
 

Enumerations

enum class  cudf::strings::separator_on_nulls { cudf::strings::YES , cudf::strings::NO }
 Setting for specifying how separators are added with null strings elements.
 
enum class  cudf::strings::output_if_empty_list { cudf::strings::EMPTY_STRING , cudf::strings::NULL_ELEMENT }
 Setting for specifying what will be output from join_list_elements when an input list is empty.
 

Functions

std::unique_ptr< column > cudf::strings::join_strings (strings_column_view const &input, string_scalar const &separator=string_scalar(""), string_scalar const &narep=string_scalar("", false), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Concatenates all strings in the column into one new string delimited by an optional separator string.
 
std::unique_ptr< column > cudf::strings::concatenate (table_view const &strings_columns, strings_column_view const &separators, string_scalar const &separator_narep=string_scalar("", false), string_scalar const &col_narep=string_scalar("", false), separator_on_nulls separate_nulls=separator_on_nulls::YES, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Concatenates a list of strings columns using separators for each row and returns the result as a strings column.
 
std::unique_ptr< column > cudf::strings::concatenate (table_view const &strings_columns, string_scalar const &separator=string_scalar(""), string_scalar const &narep=string_scalar("", false), separator_on_nulls separate_nulls=separator_on_nulls::YES, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Row-wise concatenates the given list of strings columns and returns a single strings column result.
 
std::unique_ptr< column > cudf::strings::join_list_elements (lists_column_view const &lists_strings_column, strings_column_view const &separators, string_scalar const &separator_narep=string_scalar("", false), string_scalar const &string_narep=string_scalar("", false), separator_on_nulls separate_nulls=separator_on_nulls::YES, output_if_empty_list empty_list_policy=output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
 
std::unique_ptr< column > cudf::strings::join_list_elements (lists_column_view const &lists_strings_column, string_scalar const &separator=string_scalar(""), string_scalar const &narep=string_scalar("", false), separator_on_nulls separate_nulls=separator_on_nulls::YES, output_if_empty_list empty_list_policy=output_if_empty_list::EMPTY_STRING, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.
 

Detailed Description

Enumeration Type Documentation

◆ output_if_empty_list

Setting for specifying what will be output from join_list_elements when an input list is empty.

Enumerator
EMPTY_STRING 

Empty list will result in empty string.

NULL_ELEMENT 

Empty list will result in a null.

Definition at line 48 of file strings/combine.hpp.

◆ separator_on_nulls

Setting for specifying how separators are added with null strings elements.

Enumerator
YES 

Always add separators between elements.

NO 

Do not add separators if an element is null.

Definition at line 39 of file strings/combine.hpp.

Function Documentation

◆ concatenate() [1/2]

std::unique_ptr<column> cudf::strings::concatenate ( table_view const &  strings_columns,
string_scalar const &  separator = string_scalar(""),
string_scalar const &  narep = string_scalar("", false),
separator_on_nulls  separate_nulls = separator_on_nulls::YES,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Row-wise concatenates the given list of strings columns and returns a single strings column result.

Each new string is created by concatenating the strings from the same row delimited by the separator provided.

Any row with a null entry produces a null in the corresponding output row unless a narep string is specified to be used in its place.

If separate_nulls is set to NO and narep is valid then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.

More than one column must be specified in the input strings_columns table.

Example:
s1 = ['aa', null, '', 'dd']
s2 = ['', 'bb', 'cc', null]
out = concatenate({s1, s2})
out is ['aa', null, 'cc', null]
out = concatenate({s1, s2}, ':', '_')
out is ['aa:', '_:bb', ':cc', 'dd:_']
out = concatenate({s1, s2}, ':', '', separator_on_nulls::NO)
out is ['aa:', 'bb', ':cc', 'dd']
Exceptions
cudf::logic_error  if input columns are not all strings columns.
cudf::logic_error  if separator is not valid.
cudf::logic_error  if only one column is specified
Parameters
strings_columns  List of strings columns to concatenate
separator  String that should be inserted between each string from each row. Default is an empty string.
narep  String to replace any null strings found in any column. Default of invalid-scalar means any null entry in any column will produce a null result for that row.
separate_nulls  If YES, then the separator is included for null rows if narep is valid
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column with concatenated results
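
The null and separator rules above can be easier to follow as plain host code. The following is a minimal CPU-side sketch of the documented semantics, not libcudf code: the function name concatenate_rows, the OptStr/Column aliases, and the SeparatorOnNulls enum are hypothetical stand-ins, std::nullopt models both a null entry and an invalid narep scalar, and an assert stands in for the cudf::logic_error thrown for a single-column input.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using OptStr = std::optional<std::string>;  // std::nullopt models a null entry
using Column = std::vector<OptStr>;         // hypothetical stand-in for a strings column

enum class SeparatorOnNulls { YES, NO };    // mirrors separator_on_nulls

// Host-side model of the documented rules; assumes all columns have equal length.
Column concatenate_rows(std::vector<Column> const& columns,
                        std::string const& separator    = "",
                        OptStr const& narep             = std::nullopt,
                        SeparatorOnNulls separate_nulls = SeparatorOnNulls::YES)
{
  assert(columns.size() > 1);  // the documented API rejects a single column
  std::size_t const num_rows = columns.front().size();
  Column out;
  out.reserve(num_rows);
  for (std::size_t row = 0; row < num_rows; ++row) {
    std::string joined;
    bool row_is_null     = false;
    bool write_separator = false;  // becomes true once a counted element was written
    for (auto const& col : columns) {
      bool const is_null = !col[row].has_value();
      if (is_null && !narep.has_value()) {
        row_is_null = true;  // null entry and no replacement: the whole row is null
        break;
      }
      // With NO, a null element neither receives nor triggers a separator.
      bool const counts = (separate_nulls == SeparatorOnNulls::YES) || !is_null;
      if (write_separator && counts) { joined += separator; }
      joined += is_null ? *narep : *col[row];
      write_separator = write_separator || counts;
    }
    if (row_is_null) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(joined);
    }
  }
  return out;
}
```

With s1 and s2 from the example above, concatenate_rows({s1, s2}, ":", "_") in this model yields the same four strings as the second documented call.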

◆ concatenate() [2/2]

std::unique_ptr<column> cudf::strings::concatenate ( table_view const &  strings_columns,
strings_column_view const &  separators,
string_scalar const &  separator_narep = string_scalar("", false),
string_scalar const &  col_narep = string_scalar("", false),
separator_on_nulls  separate_nulls = separator_on_nulls::YES,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Concatenates a list of strings columns using separators for each row and returns the result as a strings column.

Each new string is created by concatenating the strings from the same row delimited by the row separator provided for that row. The following rules are applicable:

  • If the row separator for a given row is null, the output for that row is null unless a valid separator_narep is provided.
  • The separator is applied between every pair of adjacent row values if separate_nulls is YES, or only between non-null values if separate_nulls is NO.
  • If separator_narep and col_narep are both valid, the output column is always non-nullable.
Example:
c0 = ['aa', null, '', 'ee', null, 'ff']
c1 = [null, 'cc', 'dd', null, null, 'gg']
c2 = ['bb', '', null, null, null, 'hh']
sep = ['::', '%%', '^^', '!', '*', null]
out = concatenate({c0, c1, c2}, sep)
// all rows have at least one null or sep[i]==null
out is [null, null, null, null, null, null]
sep_rep = '+'
out = concatenate({c0, c1, c2}, sep, sep_rep)
// all rows with at least one null output as null
out is [null, null, null, null, null, 'ff+gg+hh']
col_narep = '-'
sep_na = non-valid scalar
out = concatenate({c0, c1, c2}, sep, sep_na, col_narep)
// only the null entry in the sep column produces a null row
out is ['aa::-::bb', '-%%cc%%', '^^dd^^-', 'ee!-!-', '-*-*-', null]
col_narep = ''
out = concatenate({c0, c1, c2}, sep, sep_rep, col_narep, separator_on_nulls::NO)
// parameter suppresses separator for null rows
out is ['aa::bb', 'cc%%', '^^dd', 'ee', '', 'ff+gg+hh']
Exceptions
cudf::logic_error  if no input columns are specified - table view is empty
cudf::logic_error  if input columns are not all strings columns.
cudf::logic_error  if the number of rows from separators and strings_columns do not match
Parameters
strings_columns  List of strings columns to concatenate
separators  Strings column that provides the separator for a given row
separator_narep  String to replace a null separator for a given row. Default of invalid-scalar means no row separator value replacements.
col_narep  String that should be used in place of any null strings found in any column. Default of invalid-scalar means no null column value replacements.
separate_nulls  If YES, then the separator is included for null rows if col_narep is valid.
stream  CUDA stream used for device memory operations and kernel launches
mr  Resource for allocating device memory
Returns
New column with concatenated results
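
The interaction between separators, separator_narep, and col_narep can be modelled on the host as below. This is only a sketch of the documented rules under stated assumptions: concatenate_with_row_separators, OptStr, Column, and SeparatorOnNulls are hypothetical names that are not part of libcudf, std::nullopt models a null entry or an invalid scalar, and asserts stand in for the documented cudf::logic_error conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using OptStr = std::optional<std::string>;  // std::nullopt models a null entry
using Column = std::vector<OptStr>;         // hypothetical stand-in for a strings column

enum class SeparatorOnNulls { YES, NO };    // mirrors separator_on_nulls

// Host-side model only; assumes every column has the same length as separators.
Column concatenate_with_row_separators(
  std::vector<Column> const& columns,
  Column const& separators,
  OptStr const& separator_narep   = std::nullopt,
  OptStr const& col_narep         = std::nullopt,
  SeparatorOnNulls separate_nulls = SeparatorOnNulls::YES)
{
  assert(!columns.empty());                             // empty table is rejected
  assert(separators.size() == columns.front().size());  // row counts must match
  Column out;
  out.reserve(separators.size());
  for (std::size_t row = 0; row < separators.size(); ++row) {
    // A null separator falls back to separator_narep; if both are missing the row is null.
    OptStr const& sep = separators[row].has_value() ? separators[row] : separator_narep;
    bool row_is_null  = !sep.has_value();
    std::string joined;
    bool write_separator = false;
    for (std::size_t c = 0; !row_is_null && c < columns.size(); ++c) {
      OptStr const& entry = columns[c][row];
      bool const is_null  = !entry.has_value();
      if (is_null && !col_narep.has_value()) {
        row_is_null = true;  // null entry without a replacement nulls the row
        break;
      }
      // With NO, a null element neither receives nor triggers a separator.
      bool const counts = (separate_nulls == SeparatorOnNulls::YES) || !is_null;
      if (write_separator && counts) { joined += *sep; }
      joined += is_null ? *col_narep : *entry;
      write_separator = write_separator || counts;
    }
    if (row_is_null) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(joined);
    }
  }
  return out;
}
```

Feeding c0, c1, c2, and sep from the example above into this model reproduces the four documented outputs, including the all-null result when neither replacement scalar is valid.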

◆ join_list_elements() [1/2]

std::unique_ptr<column> cudf::strings::join_list_elements ( lists_column_view const &  lists_strings_column,
string_scalar const &  separator = string_scalar(""),
string_scalar const &  narep = string_scalar("", false),
separator_on_nulls  separate_nulls = separator_on_nulls::YES,
output_if_empty_list  empty_list_policy = output_if_empty_list::EMPTY_STRING,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.

Each new string is created by concatenating the strings from the same row (same list element) delimited by the separator provided.

A null list row always produces a null string in the output row. Any non-null list row that contains a null element produces a null output row unless a narep string is specified to be used in its place.

If separate_nulls is set to NO and narep is valid then separators are not added to the output between null elements. Otherwise, separators are always added if narep is valid.

If empty_list_policy is set to EMPTY_STRING, any row that is an empty list will result in an empty output string. Otherwise, the output will be a null.

In the special case where every element of an input list row is null, the output is the same as for an empty input list, regardless of the narep and separate_nulls values.

Example:
s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff'] ]
out = join_list_elements(s)
out is ['aabbcc', null, 'dd', null, 'ff']
out = join_list_elements(s, ':', '_')
out is ['aa:bb:cc', null, ':dd', 'ee:_', 'ff']
out = join_list_elements(s, ':', '', separator_on_nulls::NO)
out is ['aa:bb:cc', null, ':dd', 'ee', 'ff']
Exceptions
cudf::logic_error  if input column is not lists of strings column.
cudf::logic_error  if separator is not valid.
Parameters
lists_strings_column  Column containing lists of strings to concatenate
separator  String to insert between strings of each list row. Default is an empty string.
narep  String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls  If YES, then the separator is included for null rows if narep is valid
empty_list_policy  If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column with concatenated results
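
The per-row rules for lists (null list, empty list, all-null list, and partially null list) can be sketched on the host as follows. This is an illustrative model of the documented behavior, not libcudf code: join_list_rows and the type aliases are hypothetical, std::nullopt models a null entry or an invalid narep scalar, and the order in which the rules are checked is an assumption based on the description above.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

using OptStr  = std::optional<std::string>;  // std::nullopt models a null string
using Strings = std::vector<OptStr>;         // one list row, or an output strings column
using OptList = std::optional<Strings>;      // std::nullopt models a null list row

enum class SeparatorOnNulls { YES, NO };                    // mirrors separator_on_nulls
enum class OutputIfEmptyList { EMPTY_STRING, NULL_ELEMENT };  // mirrors output_if_empty_list

// Host-side model only; hypothetical names, not libcudf API.
Strings join_list_rows(std::vector<OptList> const& lists,
                       std::string const& separator        = "",
                       OptStr const& narep                 = std::nullopt,
                       SeparatorOnNulls separate_nulls     = SeparatorOnNulls::YES,
                       OutputIfEmptyList empty_list_policy = OutputIfEmptyList::EMPTY_STRING)
{
  Strings out;
  out.reserve(lists.size());
  for (auto const& list : lists) {
    if (!list.has_value()) {  // a null list row is always a null output row
      out.push_back(std::nullopt);
      continue;
    }
    bool const any_valid = std::any_of(
      list->begin(), list->end(), [](OptStr const& s) { return s.has_value(); });
    if (!any_valid) {  // empty list, or every element null: apply the empty-list policy
      if (empty_list_policy == OutputIfEmptyList::EMPTY_STRING) {
        out.push_back(std::string{});
      } else {
        out.push_back(std::nullopt);
      }
      continue;
    }
    std::string joined;
    bool row_is_null     = false;
    bool write_separator = false;
    for (auto const& entry : *list) {
      bool const is_null = !entry.has_value();
      if (is_null && !narep.has_value()) {
        row_is_null = true;  // null element without a replacement nulls the row
        break;
      }
      // With NO, a null element neither receives nor triggers a separator.
      bool const counts = (separate_nulls == SeparatorOnNulls::YES) || !is_null;
      if (write_separator && counts) { joined += separator; }
      joined += is_null ? *narep : *entry;
      write_separator = write_separator || counts;
    }
    if (row_is_null) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(joined);
    }
  }
  return out;
}
```

Passing the lists column s from the example above reproduces the three documented outputs. A row holding an empty list maps to an empty string by default and to a null when NULL_ELEMENT is selected.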

◆ join_list_elements() [2/2]

std::unique_ptr<column> cudf::strings::join_list_elements ( lists_column_view const &  lists_strings_column,
strings_column_view const &  separators,
string_scalar const &  separator_narep = string_scalar("", false),
string_scalar const &  string_narep = string_scalar("", false),
separator_on_nulls  separate_nulls = separator_on_nulls::YES,
output_if_empty_list  empty_list_policy = output_if_empty_list::EMPTY_STRING,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Given a lists column of strings (each row is a list of strings), concatenates the strings within each row and returns a single strings column result.

Each new string is created by concatenating the strings from the same row (same list element) delimited by the row separator provided in the separators strings column.

A null list row always produces a null string in the output row. Any non-null list row that contains a null element produces a null output row unless a valid string_narep scalar is provided to be used in its place. Any null row in the separators column also produces a null output row unless a valid separator_narep scalar is provided to be used in place of the null separators.

If separate_nulls is set to NO and string_narep is valid then separators are not added to the output between null elements. Otherwise, separators are always added if string_narep is valid.

If empty_list_policy is set to EMPTY_STRING, any row that is an empty list will result in an empty output string. Otherwise, the output will be a null.

In the special case where every element of an input list row is null, the output is the same as for an empty input list, regardless of the string_narep and separate_nulls values.

Example:
s = [ ['aa', 'bb', 'cc'], null, ['', 'dd'], ['ee', null], ['ff', 'gg'] ]
sep = ['::', '%%', '!', '*', null]
out = join_list_elements(s, sep)
out is ['aa::bb::cc', null, '!dd', null, null]
out = join_list_elements(s, sep, ':', '_')
out is ['aa::bb::cc', null, '!dd', 'ee*_', 'ff:gg']
out = join_list_elements(s, sep, ':', '', separator_on_nulls::NO)
out is ['aa::bb::cc', null, '!dd', 'ee', 'ff:gg']
Exceptions
cudf::logic_error  if input column is not lists of strings column.
cudf::logic_error  if the number of rows from separators and lists_strings_column do not match
Parameters
lists_strings_column  Column containing lists of strings to concatenate
separators  Strings column that provides separators for concatenation
separator_narep  String that should be used to replace a null separator. Default is an invalid-scalar denoting that rows containing null separator will result in a null string in the corresponding output rows.
string_narep  String to replace null strings in any non-null list row. Default is an invalid-scalar denoting that list rows containing null strings will result in a null string in the corresponding output rows.
separate_nulls  If YES, then the separator is included for null rows if string_narep is valid
empty_list_policy  If set to EMPTY_STRING, any input row that is an empty list will result in an empty string. Otherwise, it will result in a null.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column with concatenated results
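
The per-row separator variant adds one more way for a row to become null. The host-side sketch below models the documented rules only: join_list_rows_with_separators and the aliases are hypothetical names that are not libcudf API, std::nullopt models a null entry or an invalid scalar, the rule ordering is an assumption drawn from the description above, and an assert stands in for the documented row-count check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using OptStr  = std::optional<std::string>;  // std::nullopt models a null string
using Strings = std::vector<OptStr>;         // one list row, or a strings column
using OptList = std::optional<Strings>;      // std::nullopt models a null list row

enum class SeparatorOnNulls { YES, NO };
enum class OutputIfEmptyList { EMPTY_STRING, NULL_ELEMENT };

// Host-side model only; hypothetical names, not libcudf API.
Strings join_list_rows_with_separators(
  std::vector<OptList> const& lists,
  Strings const& separators,
  OptStr const& separator_narep       = std::nullopt,
  OptStr const& string_narep          = std::nullopt,
  SeparatorOnNulls separate_nulls     = SeparatorOnNulls::YES,
  OutputIfEmptyList empty_list_policy = OutputIfEmptyList::EMPTY_STRING)
{
  assert(lists.size() == separators.size());  // row counts must match
  Strings out;
  out.reserve(lists.size());
  for (std::size_t row = 0; row < lists.size(); ++row) {
    OptList const& list = lists[row];
    // A null separator falls back to separator_narep when that scalar is valid.
    OptStr const& sep = separators[row].has_value() ? separators[row] : separator_narep;
    if (!list.has_value() || !sep.has_value()) {  // null list or unusable separator
      out.push_back(std::nullopt);
      continue;
    }
    bool const any_valid = std::any_of(
      list->begin(), list->end(), [](OptStr const& s) { return s.has_value(); });
    if (!any_valid) {  // empty or all-null list: apply the empty-list policy
      if (empty_list_policy == OutputIfEmptyList::EMPTY_STRING) {
        out.push_back(std::string{});
      } else {
        out.push_back(std::nullopt);
      }
      continue;
    }
    std::string joined;
    bool row_is_null     = false;
    bool write_separator = false;
    for (auto const& entry : *list) {
      bool const is_null = !entry.has_value();
      if (is_null && !string_narep.has_value()) {
        row_is_null = true;  // null element without a replacement nulls the row
        break;
      }
      bool const counts = (separate_nulls == SeparatorOnNulls::YES) || !is_null;
      if (write_separator && counts) { joined += *sep; }
      joined += is_null ? *string_narep : *entry;
      write_separator = write_separator || counts;
    }
    if (row_is_null) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(joined);
    }
  }
  return out;
}
```

With s and sep from the example above, the model returns a null for the last row unless a separator_narep such as ':' is supplied, in which case that row becomes 'ff:gg'.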

◆ join_strings()

std::unique_ptr<column> cudf::strings::join_strings ( strings_column_view const &  input,
string_scalar const &  separator = string_scalar(""),
string_scalar const &  narep = string_scalar("", false),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Concatenates all strings in the column into one new string delimited by an optional separator string.

This returns a column with one string. Any null entries are ignored unless the narep parameter specifies a replacement string.

Example:
s = ['aa', null, '', 'zz' ]
r = join_strings(s,':','_')
r is ['aa:_::zz']
Exceptions
cudf::logic_error  if separator is not valid.
Parameters
input  Strings for this operation
separator  String that should be inserted between each string. Default is an empty string.
narep  String to replace any null strings found. Default of invalid-scalar will ignore any null entries.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New column containing one string.
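
The effect of narep on null entries can be illustrated with a small host-side model. This is a sketch of the documented behavior, not libcudf code: join_all, OptStr, and Column are hypothetical names, std::nullopt models a null entry or an invalid narep scalar, and the handling of edge cases the description does not spell out (for example an input whose trailing entries are null, or an empty input) is an assumption of the sketch.

```cpp
#include <optional>
#include <string>
#include <vector>

using OptStr = std::optional<std::string>;  // std::nullopt models a null entry
using Column = std::vector<OptStr>;         // hypothetical stand-in for a strings column

// Host-side model only: joins every entry into a single string.
Column join_all(Column const& input,
                std::string const& separator = "",
                OptStr const& narep          = std::nullopt)
{
  std::string joined;
  bool first = true;
  for (auto const& entry : input) {
    // Without a replacement, null entries are skipped entirely.
    if (!entry.has_value() && !narep.has_value()) { continue; }
    if (!first) { joined += separator; }
    joined += entry.has_value() ? *entry : *narep;
    first = false;
  }
  return Column{joined};  // a column holding exactly one string
}
```

For s from the example above, join_all(s, ":", "_") in this model gives the documented 'aa:_::zz', while omitting narep drops the null entry and gives 'aa::zz'.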