libcudf  23.12.00
cudf::io::parquet_reader_options Class Reference

Settings for read_parquet().

#include <parquet.hpp>

Public Member Functions

 parquet_reader_options ()=default
 Default constructor.

source_info const & get_source () const
 Returns source info.

bool is_enabled_convert_strings_to_categories () const
 Returns true/false depending on whether strings should be converted to categories or not.

bool is_enabled_use_pandas_metadata () const
 Returns true/false depending on whether to use pandas metadata or not while reading.

std::optional< std::vector< reader_column_schema > > get_column_schema () const
 Returns optional tree of metadata.

int64_t get_skip_rows () const
 Returns number of rows to skip from the start.

std::optional< size_type > const & get_num_rows () const
 Returns number of rows to read.

auto const & get_columns () const
 Returns names of columns to be read, if set.

auto const & get_row_groups () const
 Returns list of individual row groups to be read.

auto const & get_filter () const
 Returns AST based filter for predicate pushdown.

data_type get_timestamp_type () const
 Returns timestamp type used to cast timestamp columns.

void set_columns (std::vector< std::string > col_names)
 Sets names of the columns to be read.

void set_row_groups (std::vector< std::vector< size_type >> row_groups)
 Sets vector of individual row groups to read.

void set_filter (ast::expression const &filter)
 Sets AST based filter for predicate pushdown.

void enable_convert_strings_to_categories (bool val)
 Sets to enable/disable conversion of strings to categories.

void enable_use_pandas_metadata (bool val)
 Sets to enable/disable use of pandas metadata to read.

void set_column_schema (std::vector< reader_column_schema > val)
 Sets reader column schema.

void set_skip_rows (int64_t val)
 Sets number of rows to skip.

void set_num_rows (size_type val)
 Sets number of rows to read.

void set_timestamp_type (data_type type)
 Sets timestamp_type used to cast timestamp columns.


Static Public Member Functions

static parquet_reader_options_builder builder (source_info src)
 Creates a parquet_reader_options_builder which will build parquet_reader_options.
 

Detailed Description

Settings for read_parquet().

Definition at line 53 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options()

cudf::io::parquet_reader_options::parquet_reader_options ( )
explicit default

Default constructor.

This constructor exists because Cython requires a default constructor to create objects on the stack.

Member Function Documentation

◆ builder()

static parquet_reader_options_builder cudf::io::parquet_reader_options::builder ( source_info  src)
static

Creates a parquet_reader_options_builder which will build parquet_reader_options.

Parameters
src  Source information to read parquet file
Returns
Builder to build reader options
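
The builder entry point described above follows the usual fluent-builder pattern: each setter records one option and returns the builder, and a final call produces the options object. The following is a minimal sketch of that pattern using simplified stand-in types in a hypothetical `sketch` namespace. It is not the libcudf implementation: the source is modeled as a plain path string, and only three options are shown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Simplified stand-in for the options object (hypothetical, not the libcudf class).
struct reader_options {
  std::string source;                                // modeled as a file path
  std::optional<std::vector<std::string>> columns;   // nullopt: option not set
  std::int64_t skip_rows = 0;                        // rows to skip from the start
  std::optional<int> num_rows;                       // nullopt: option not set
};

// Simplified stand-in for the builder (hypothetical, not the libcudf class).
class reader_options_builder {
 public:
  explicit reader_options_builder(std::string source) { options_.source = std::move(source); }

  // Each setter stores one option and returns *this so calls can be chained.
  reader_options_builder& columns(std::vector<std::string> names)
  {
    options_.columns = std::move(names);
    return *this;
  }

  reader_options_builder& skip_rows(std::int64_t n)
  {
    options_.skip_rows = n;
    return *this;
  }

  reader_options_builder& num_rows(int n)
  {
    options_.num_rows = n;
    return *this;
  }

  // Returns a copy of the options accumulated so far.
  reader_options build() const { return options_; }

 private:
  reader_options options_;
};

}  // namespace sketch
```

With this stand-in, `sketch::reader_options_builder("data.parquet").columns({"a", "b"}).skip_rows(10).build()` yields an options object whose `num_rows` is still unset.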

◆ enable_convert_strings_to_categories()

void cudf::io::parquet_reader_options::enable_convert_strings_to_categories ( bool  val)
inline

Sets to enable/disable conversion of strings to categories.

Parameters
val  Boolean value to enable/disable conversion of string columns to categories

Definition at line 207 of file parquet.hpp.

◆ enable_use_pandas_metadata()

void cudf::io::parquet_reader_options::enable_use_pandas_metadata ( bool  val)
inline

Sets to enable/disable use of pandas metadata to read.

Parameters
val  Boolean value indicating whether to use pandas metadata

Definition at line 214 of file parquet.hpp.

◆ get_column_schema()

std::optional<std::vector<reader_column_schema> > cudf::io::parquet_reader_options::get_column_schema ( ) const
inline

Returns optional tree of metadata.

Returns
vector of reader_column_schema objects.

Definition at line 133 of file parquet.hpp.

◆ get_columns()

auto const& cudf::io::parquet_reader_options::get_columns ( ) const
inline

Returns names of columns to be read, if set.

Returns
Names of columns to be read; nullopt if the option is not set

Definition at line 158 of file parquet.hpp.

◆ get_filter()

auto const& cudf::io::parquet_reader_options::get_filter ( ) const
inline

Returns AST based filter for predicate pushdown.

Returns
AST expression to use as filter

Definition at line 172 of file parquet.hpp.

◆ get_num_rows()

std::optional<size_type> const& cudf::io::parquet_reader_options::get_num_rows ( ) const
inline

Returns number of rows to read.

Returns
Number of rows to read; nullopt if the option hasn't been set (in which case the file is read until the end)

Definition at line 151 of file parquet.hpp.
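
The skip-rows and num-rows options together describe a window of rows: skip some rows from the start, then read either a fixed count or everything that remains when the count is unset. The sketch below models that arithmetic with standard-library types only. The function name `selected_row_range` is hypothetical, and clamping out-of-range values to the file size is an assumption made for illustration; the library may handle such inputs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Returns the half-open row range [first, last) selected from a file that
// holds total_rows rows (assumed non-negative).
// Assumption: out-of-range inputs are clamped rather than rejected.
std::pair<std::int64_t, std::int64_t> selected_row_range(std::int64_t total_rows,
                                                         std::int64_t skip_rows,
                                                         std::optional<std::int64_t> num_rows)
{
  // First row to read: skip_rows, limited to the rows that exist.
  auto const first = std::clamp<std::int64_t>(skip_rows, 0, total_rows);

  // Rows left after the skip.
  auto const remaining = total_rows - first;

  // An unset num_rows means "read until the end of the file".
  auto const count =
    num_rows.has_value() ? std::clamp<std::int64_t>(*num_rows, 0, remaining) : remaining;

  return {first, first + count};
}
```

For a 100-row file, skipping 10 rows with no row count selects rows 10 through 99, while skipping 10 and reading 20 selects rows 10 through 29.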

◆ get_row_groups()

auto const& cudf::io::parquet_reader_options::get_row_groups ( ) const
inline

Returns list of individual row groups to be read.

Returns
List of individual row groups to be read

Definition at line 165 of file parquet.hpp.

◆ get_skip_rows()

int64_t cudf::io::parquet_reader_options::get_skip_rows ( ) const
inline

Returns number of rows to skip from the start.

Returns
Number of rows to skip from the start

Definition at line 143 of file parquet.hpp.

◆ get_source()

source_info const& cudf::io::parquet_reader_options::get_source ( ) const
inline

Returns source info.

Returns
Source info

Definition at line 108 of file parquet.hpp.

◆ get_timestamp_type()

data_type cudf::io::parquet_reader_options::get_timestamp_type ( ) const
inline

Returns timestamp type used to cast timestamp columns.

Returns
Timestamp type used to cast timestamp columns

Definition at line 179 of file parquet.hpp.

◆ is_enabled_convert_strings_to_categories()

bool cudf::io::parquet_reader_options::is_enabled_convert_strings_to_categories ( ) const
inline

Returns true/false depending on whether strings should be converted to categories or not.

Returns
true if strings should be converted to categories

Definition at line 116 of file parquet.hpp.

◆ is_enabled_use_pandas_metadata()

bool cudf::io::parquet_reader_options::is_enabled_use_pandas_metadata ( ) const
inline

Returns true/false depending on whether to use pandas metadata or not while reading.

Returns
true if pandas metadata is used while reading

Definition at line 126 of file parquet.hpp.

◆ set_column_schema()

void cudf::io::parquet_reader_options::set_column_schema ( std::vector< reader_column_schema >  val)
inline

Sets reader column schema.

Parameters
val  Tree of schema nodes to enable/disable conversion of binary to string columns. Note that the default is to convert to string columns.

Definition at line 222 of file parquet.hpp.

◆ set_columns()

void cudf::io::parquet_reader_options::set_columns ( std::vector< std::string >  col_names)
inline

Sets names of the columns to be read.

Parameters
col_names  Vector of column names

Definition at line 186 of file parquet.hpp.

◆ set_filter()

void cudf::io::parquet_reader_options::set_filter ( ast::expression const &  filter)
inline

Sets AST based filter for predicate pushdown.

Parameters
filter  AST expression to use as filter

Definition at line 200 of file parquet.hpp.
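
Predicate pushdown generally means evaluating a filter as early as possible, so that data that cannot match is never decoded. As a generic illustration of the idea, and not a description of how libcudf evaluates an ast::expression, the sketch below uses hypothetical per-row-group min/max statistics to decide which row groups could contain rows satisfying `value > threshold`. The names `row_group_stats` and `row_groups_to_scan` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics for one integer column in one row group.
struct row_group_stats {
  std::int64_t min_value;
  std::int64_t max_value;
};

// Returns the indices of row groups that may contain at least one row with
// value > threshold. A row group whose maximum is <= threshold cannot match
// and is skipped without being read.
std::vector<std::size_t> row_groups_to_scan(std::vector<row_group_stats> const& stats,
                                            std::int64_t threshold)
{
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].max_value > threshold) { keep.push_back(i); }
  }
  return keep;
}
```

Row groups that survive this check still need the filter applied row by row, since the statistics only show that a match is possible.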

◆ set_num_rows()

void cudf::io::parquet_reader_options::set_num_rows ( size_type  val)

Sets number of rows to read.

Parameters
val  Number of rows to read after skip

◆ set_row_groups()

void cudf::io::parquet_reader_options::set_row_groups ( std::vector< std::vector< size_type >>  row_groups)

Sets vector of individual row groups to read.

Parameters
row_groups  Vector of row groups to read

◆ set_skip_rows()

void cudf::io::parquet_reader_options::set_skip_rows ( int64_t  val)

Sets number of rows to skip.

Parameters
val  Number of rows to skip from start

◆ set_timestamp_type()

void cudf::io::parquet_reader_options::set_timestamp_type ( data_type  type)
inline

Sets timestamp_type used to cast timestamp columns.

Parameters
type  The timestamp data_type to which all timestamp columns need to be cast

Definition at line 246 of file parquet.hpp.
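
Casting every timestamp column to one type amounts to converting each stored value to a common time unit. The sketch below shows that conversion for a single case, microseconds to milliseconds, using std::chrono on plain integers. The function name `cast_us_to_ms` is hypothetical, and the truncation toward zero is the behavior of std::chrono::duration_cast; it is not a statement about how libcudf rounds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Converts timestamps stored as microsecond counts into millisecond counts.
// duration_cast truncates toward zero, so sub-millisecond precision is dropped.
std::vector<std::int64_t> cast_us_to_ms(std::vector<std::int64_t> const& micros)
{
  std::vector<std::int64_t> millis;
  millis.reserve(micros.size());
  for (auto const us : micros) {
    auto const ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::microseconds{us});
    millis.push_back(ms.count());
  }
  return millis;
}
```

Converting to a coarser unit loses precision, while converting to a finer unit can overflow for values far from the epoch, so the target type is worth choosing with the data's range in mind.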


The documentation for this class was generated from the following file: