libcudf  24.02.00
cudf::io::datasource Class Reference (abstract)

Interface class for providing input data to the readers.

#include <datasource.hpp>

Inherited by cudf::io::arrow_io_source and cudf::io::external::kafka::kafka_consumer.

Classes

class  buffer
 Interface class for buffers that the datasource returns to the caller.
 
class  non_owning_buffer
 Implementation of a non-owning buffer, where the datasource holds the data until destruction.
 
class  owning_buffer
 Derived implementation of buffer that owns the data.
 

Public Member Functions

virtual ~datasource ()
 Base class destructor.
 
virtual std::unique_ptr< datasource::buffer > host_read (size_t offset, size_t size)=0
 Returns a buffer with a subset of data from the source.
 
virtual size_t host_read (size_t offset, size_t size, uint8_t *dst)=0
 Reads a selected range into a preallocated buffer.
 
virtual bool supports_device_read () const
 Whether or not this source supports reading directly into device memory.
 
virtual bool is_device_read_preferred (size_t size) const
 Estimates whether a direct device read would be more optimal for the given size.
 
virtual std::unique_ptr< datasource::buffer > device_read (size_t offset, size_t size, rmm::cuda_stream_view stream)
 Returns a device buffer with a subset of data from the source.
 
virtual size_t device_read (size_t offset, size_t size, uint8_t *dst, rmm::cuda_stream_view stream)
 Reads a selected range into a preallocated device buffer.
 
virtual std::future< size_t > device_read_async (size_t offset, size_t size, uint8_t *dst, rmm::cuda_stream_view stream)
 Asynchronously reads a selected range into a preallocated device buffer.
 
virtual size_t size () const =0
 Returns the size of the data in the source.
 
virtual bool is_empty () const
 Returns whether the source contains any data.
 

Static Public Member Functions

static std::unique_ptr< datasource > create (std::string const &filepath, size_t offset=0, size_t size=0)
 Creates a source from a file path.
 
static std::unique_ptr< datasource > create (host_buffer const &buffer)
 Creates a source from a host memory buffer.
 
static std::unique_ptr< datasource > create (cudf::host_span< std::byte const > buffer)
 Creates a source from a host memory buffer.
 
static std::unique_ptr< datasource > create (cudf::device_span< std::byte const > buffer)
 Creates a source from a device memory buffer.
 
static std::unique_ptr< datasource > create (datasource *source)
 Creates a source from a user-implemented datasource object.
 
template<typename T >
static std::vector< std::unique_ptr< datasource > > create (std::vector< T > const &args)
 Creates a vector of datasources, one per element in the input vector.
 

Detailed Description

Interface class for providing input data to the readers.

Definition at line 41 of file datasource.hpp.
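
As a rough illustration of what implementing this interface involves, the sketch below mirrors a small part of it in a stand-in class. Everything in the `sketch` namespace (`source_interface`, `vector_source`) is hypothetical and simplified: it is not the libcudf class, it omits the buffer-returning host_read overload and all device-read members, and it assumes that a read reaching past the end of the data is truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for the interface described above; NOT cudf::io::datasource.
// Only the raw-pointer host_read overload, size() and is_empty() are mirrored.
class source_interface {
 public:
  virtual ~source_interface() = default;
  virtual std::size_t host_read(std::size_t offset, std::size_t size, std::uint8_t* dst) = 0;
  virtual std::size_t size() const = 0;
  virtual bool is_empty() const { return size() == 0; }
};

// Hypothetical implementation backed by a byte vector held in host memory.
class vector_source : public source_interface {
 public:
  explicit vector_source(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

  // Copies up to `size` bytes starting at `offset`; returns the count actually copied.
  std::size_t host_read(std::size_t offset, std::size_t size, std::uint8_t* dst) override
  {
    if (offset >= bytes_.size()) { return 0; }
    auto const count = std::min(size, bytes_.size() - offset);
    std::memcpy(dst, bytes_.data() + offset, count);
    return count;
  }

  std::size_t size() const override { return bytes_.size(); }

 private:
  std::vector<std::uint8_t> bytes_;
};

}  // namespace sketch
```

A real implementation would derive from cudf::io::datasource and override the pure virtual members listed above (both host_read overloads and size()).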

Member Function Documentation

◆ create() [1/6]

static std::unique_ptr<datasource> cudf::io::datasource::create ( cudf::device_span< std::byte const >  buffer)
static

Creates a source from a device memory buffer.

Parameters
buffer  Device buffer object
Returns
Constructed datasource object

◆ create() [2/6]

static std::unique_ptr<datasource> cudf::io::datasource::create ( cudf::host_span< std::byte const >  buffer)
static

Creates a source from a host memory buffer.

Parameters
[in]  buffer  Host buffer object
Returns
Constructed datasource object

◆ create() [3/6]

static std::unique_ptr<datasource> cudf::io::datasource::create ( datasource *  source)
static

Creates a source from a user-implemented datasource object.

Parameters
[in]  source  Non-owning pointer to the datasource object
Returns
Constructed datasource object

◆ create() [4/6]

static std::unique_ptr<datasource> cudf::io::datasource::create ( host_buffer const &  buffer)
static

Creates a source from a host memory buffer.

Deprecated: Since 23.04

Parameters
[in]  buffer  Host buffer object
Returns
Constructed datasource object

◆ create() [5/6]

static std::unique_ptr<datasource> cudf::io::datasource::create ( std::string const &  filepath,
size_t  offset = 0,
size_t  size = 0 
)
static

Creates a source from a file path.

Parameters
[in]  filepath  Path to the file to use
[in]  offset  Bytes from the start of the file (the default is zero)
[in]  size  Bytes from the offset; use zero for entire file (the default is zero)
Returns
Constructed datasource object
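
The offset and size parameters select a byte range, with a size of zero meaning "through the end of the file". The helper below sketches one way such a range could be resolved. `resolve_range` and `byte_range` are hypothetical names, and both the clamping of an oversized `size` and the exception for an out-of-range `offset` are assumptions of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

namespace sketch {

struct byte_range {
  std::size_t offset;
  std::size_t size;
};

// Hypothetical helper: turns (offset, size) into a concrete range within a
// file of `file_size` bytes. A `size` of zero selects everything from `offset`
// to the end. Clamping an oversized `size` is an assumption of this sketch.
inline byte_range resolve_range(std::size_t file_size, std::size_t offset, std::size_t size)
{
  if (offset > file_size) { throw std::out_of_range("offset is past the end of the file"); }
  auto const remaining = file_size - offset;
  if (size == 0 || size > remaining) { size = remaining; }
  return {offset, size};
}

}  // namespace sketch
```

With a 100-byte file, `resolve_range(100, 40, 0)` selects the 60 bytes starting at byte 40, which corresponds to calling create with offset 40 and the default size.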

◆ create() [6/6]

template<typename T >
static std::vector<std::unique_ptr<datasource> > cudf::io::datasource::create ( std::vector< T > const &  args)
inline static

Creates a vector of datasources, one per element in the input vector.

Parameters
[in]  args  Vector of parameters
Returns
Constructed vector of datasource objects

Definition at line 138 of file datasource.hpp.
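
The "one per element" behavior can be pictured as mapping a single-argument factory over the input vector. The sketch below does this with a stand-in type: `sketch::source` and its `origin` field are hypothetical, and the code only assumes that each element is forwarded to a matching single-argument create overload. It is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Stand-in type; NOT cudf::io::datasource. It only records what it was built from.
struct source {
  std::string origin;

  // Hypothetical single-argument factory standing in for the create() overloads.
  static std::unique_ptr<source> create(std::string const& filepath)
  {
    return std::make_unique<source>(source{filepath});
  }

  // Builds one source per input element, preserving the input order.
  template <typename T>
  static std::vector<std::unique_ptr<source>> create(std::vector<T> const& args)
  {
    std::vector<std::unique_ptr<source>> sources;
    sources.reserve(args.size());
    std::transform(args.cbegin(), args.cend(), std::back_inserter(sources),
                   [](auto const& arg) { return source::create(arg); });
    return sources;
  }
};

}  // namespace sketch
```

Passing a `std::vector<std::string>` of two paths to this sketch yields two sources in the same order as the paths.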

◆ device_read() [1/2]

virtual std::unique_ptr<datasource::buffer> cudf::io::datasource::device_read ( size_t  offset,
size_t  size,
rmm::cuda_stream_view  stream 
)
inline virtual

Returns a device buffer with a subset of data from the source.

For optimal performance, this function should only be called when is_device_read_preferred returns true. Data source implementations that don't support direct device reads don't need to override this function.

Exceptions
cudf::logic_error  when the object does not support direct device reads, i.e. supports_device_read returns false.
Parameters
offset  Number of bytes from the start
size  Number of bytes to read
stream  CUDA stream to use
Returns
The data buffer in the device memory

Definition at line 215 of file datasource.hpp.

◆ device_read() [2/2]

virtual size_t cudf::io::datasource::device_read ( size_t  offset,
size_t  size,
uint8_t *  dst,
rmm::cuda_stream_view  stream 
)
inline virtual

Reads a selected range into a preallocated device buffer.

For optimal performance, this function should only be called when is_device_read_preferred returns true. Data source implementations that don't support direct device reads don't need to override this function.

Exceptions
cudf::logic_error  when the object does not support direct device reads, i.e. supports_device_read returns false.
Parameters
offset  Number of bytes from the start
size  Number of bytes to read
dst  Address of the existing device memory
stream  CUDA stream to use
Returns
The number of bytes read (can be smaller than size)

Definition at line 239 of file datasource.hpp.

◆ device_read_async()

virtual std::future<size_t> cudf::io::datasource::device_read_async ( size_t  offset,
size_t  size,
uint8_t *  dst,
rmm::cuda_stream_view  stream 
)
inline virtual

Asynchronously reads a selected range into a preallocated device buffer.

Returns a future value that contains the number of bytes read. Calling the get() method of the returned future synchronizes the read.

For optimal performance, this function should only be called when is_device_read_preferred returns true. Data source implementations that don't support direct device reads don't need to override this function.

Exceptions
cudf::logic_error  when the object does not support direct device reads, i.e. supports_device_read returns false.
Parameters
offset  Number of bytes from the start
size  Number of bytes to read
dst  Address of the existing device memory
stream  CUDA stream to use
Returns
The number of bytes read as a future value (can be smaller than size)

Definition at line 264 of file datasource.hpp.
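
The future-based contract described above can be illustrated with standard C++ alone. In the sketch below, `sketch::async_source` and `read_async` are hypothetical stand-ins. The copy targets ordinary host memory and no CUDA stream is involved, so this only shows the "start the read, then call get() to wait for the byte count" pattern. The use of `std::async` is an assumption of the sketch, not a statement about how any datasource schedules its reads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <future>
#include <vector>

namespace sketch {

// Stand-in source, NOT the libcudf class: copies into host memory on another thread.
struct async_source {
  std::vector<std::uint8_t> bytes;

  // Starts the copy and returns immediately; the future later yields the
  // number of bytes copied, which can be smaller than `size`.
  std::future<std::size_t> read_async(std::size_t offset, std::size_t size, std::uint8_t* dst) const
  {
    return std::async(std::launch::async, [this, offset, size, dst] {
      if (offset >= bytes.size()) { return std::size_t{0}; }
      std::size_t const count = std::min(size, bytes.size() - offset);
      std::memcpy(dst, bytes.data() + offset, count);
      return count;
    });
  }
};

}  // namespace sketch
```

The caller must keep both the source and the destination memory alive until get() has returned, since the copy may still be in flight before that point.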

◆ host_read() [1/2]

virtual std::unique_ptr<datasource::buffer> cudf::io::datasource::host_read ( size_t  offset,
size_t  size 
)
pure virtual

Returns a buffer with a subset of data from the source.

Parameters
[in]  offset  Bytes from the start
[in]  size  Bytes to read
Returns
The data buffer (can be smaller than size)

Implemented in cudf::io::external::kafka::kafka_consumer, and cudf::io::arrow_io_source.

◆ host_read() [2/2]

virtual size_t cudf::io::datasource::host_read ( size_t  offset,
size_t  size,
uint8_t *  dst 
)
pure virtual

Reads a selected range into a preallocated buffer.

Parameters
[in]  offset  Bytes from the start
[in]  size  Bytes to read
[in]  dst  Address of the existing host memory
Returns
The number of bytes read (can be smaller than size)

Implemented in cudf::io::external::kafka::kafka_consumer, and cudf::io::arrow_io_source.
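
Because the returned byte count can be smaller than the requested size, callers generally need to check it. The sketch below shows one possible way to handle short reads by retrying until the request is satisfied or the source returns zero bytes. `sketch::chunked_source` and `read_fully` are hypothetical. Whether a short read from a particular datasource means "end of data" or "partial transfer" depends on that implementation, so the retry loop is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace sketch {

// Stand-in source, NOT the libcudf class: serves at most `chunk` bytes per call,
// which makes short reads easy to reproduce.
struct chunked_source {
  std::vector<std::uint8_t> bytes;
  std::size_t chunk;

  std::size_t host_read(std::size_t offset, std::size_t size, std::uint8_t* dst) const
  {
    if (offset >= bytes.size()) { return 0; }
    auto const count = std::min({size, chunk, bytes.size() - offset});
    std::memcpy(dst, bytes.data() + offset, count);
    return count;
  }
};

// Keeps calling host_read until `size` bytes have arrived or a call returns 0.
// Returns the total number of bytes placed in `dst`.
template <typename Source>
std::size_t read_fully(Source const& src, std::size_t offset, std::size_t size, std::uint8_t* dst)
{
  std::size_t total = 0;
  while (total < size) {
    auto const got = src.host_read(offset + total, size - total, dst + total);
    if (got == 0) { break; }  // nothing more available
    total += got;
  }
  return total;
}

}  // namespace sketch
```

Even with the loop, the result can be smaller than `size` when the requested range extends past the end of the data, so the return value still has to be checked.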

◆ is_device_read_preferred()

virtual bool cudf::io::datasource::is_device_read_preferred ( size_t  size) const
inline virtual

Estimates whether a direct device read would be more optimal for the given size.

Parameters
size  Number of bytes to read
Returns
Whether the device read is expected to be more performant for the given size

Definition at line 194 of file datasource.hpp.

◆ is_empty()

virtual bool cudf::io::datasource::is_empty ( ) const
inline virtual

Returns whether the source contains any data.

Returns
True if the source contains no data (its size is zero), false otherwise

Definition at line 284 of file datasource.hpp.

◆ size()

virtual size_t cudf::io::datasource::size ( ) const
pure virtual

Returns the size of the data in the source.

Returns
The size of the source data in bytes

Implemented in cudf::io::external::kafka::kafka_consumer, and cudf::io::arrow_io_source.

◆ supports_device_read()

virtual bool cudf::io::datasource::supports_device_read ( ) const
inline virtual

Whether or not this source supports reading directly into device memory.

If this function returns true, the datasource will receive calls to device_read() instead of host_read() when the reader processes the data on the device. Most readers will still make host_read() calls, for the parts of input that are processed on the host (e.g. metadata).

Data source implementations that don't support direct device reads don't need to override this function. The implementations that do should override it to return true.

Returns
Whether this source supports device_read() calls

Definition at line 186 of file datasource.hpp.
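
The two capability queries, supports_device_read and is_device_read_preferred, are meant to be consulted before choosing a read path. The sketch below shows one way a caller could combine them. All names in the `sketch` namespace are hypothetical, the 4096-byte threshold is invented for illustration, and the default for `is_device_read_preferred` (deferring to `supports_device_read`) is an assumption of this stand-in.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Stand-in exposing only the two capability queries; NOT the libcudf class.
struct capabilities {
  virtual ~capabilities() = default;
  // Default: direct device reads are not supported.
  virtual bool supports_device_read() const { return false; }
  // Assumed default for this sketch: prefer device reads whenever they are supported.
  virtual bool is_device_read_preferred(std::size_t /*size*/) const
  {
    return supports_device_read();
  }
};

// Hypothetical source that supports device reads but prefers them only for
// larger requests (the threshold is made up for this example).
struct direct_source : capabilities {
  bool supports_device_read() const override { return true; }
  bool is_device_read_preferred(std::size_t size) const override { return size >= 4096; }
};

enum class read_path { host, device };

// Picks device_read only when it is both supported and expected to be faster.
inline read_path choose_read_path(capabilities const& src, std::size_t size)
{
  return (src.supports_device_read() && src.is_device_read_preferred(size)) ? read_path::device
                                                                            : read_path::host;
}

}  // namespace sketch
```

Checking supports_device_read first matters because, as documented above, the device_read functions throw cudf::logic_error on sources that do not support direct device reads.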

