Class ParquetDataConnector

Namespace
Datafication.Connectors.ParquetConnector
Assembly
Datafication.ParquetConnector.dll

Provides functionality to connect to and retrieve data from a Parquet data source.

public class ParquetDataConnector : IDataConnector
Inheritance
object
ParquetDataConnector
Implements
IDataConnector

Remarks

This connector uses the provided configuration to connect to a local or remote Parquet data source, and then extracts data records from the retrieved Parquet file.

Constructors

ParquetDataConnector(ParquetConnectorConfiguration)

Initializes a new instance of the ParquetDataConnector class.

public ParquetDataConnector(ParquetConnectorConfiguration configuration)

Parameters

configuration ParquetConnectorConfiguration

The configuration to be used by the connector.

Exceptions

ArgumentNullException

Thrown when the provided configuration is null.

ArgumentException

Thrown when the configuration validation fails.
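
The two constructor exceptions above can be handled separately. The following is a minimal sketch, not a definitive pattern: `ParquetConnectorFactory` and `TryCreate` are hypothetical names, and the code assumes `ParquetConnectorConfiguration` lives in the same namespace as the connector, which this page does not state. Because `ArgumentNullException` derives from `ArgumentException`, the more specific catch must come first.

```csharp
using System;
using Datafication.Connectors.ParquetConnector;

// Hypothetical helper, not part of the library.
public static class ParquetConnectorFactory
{
    // Returns true and sets `connector` when construction succeeds;
    // returns false when the configuration is missing or fails validation.
    public static bool TryCreate(
        ParquetConnectorConfiguration configuration,
        out ParquetDataConnector connector)
    {
        connector = null;
        try
        {
            connector = new ParquetDataConnector(configuration);
            return true;
        }
        catch (ArgumentNullException)
        {
            // Documented case: the configuration argument was null.
            Console.Error.WriteLine("No configuration was supplied.");
        }
        catch (ArgumentException ex)
        {
            // Documented case: the configuration failed validation.
            Console.Error.WriteLine($"Configuration failed validation: {ex.Message}");
        }
        return false;
    }
}
```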

Properties

Configuration

Gets the configuration used by this Parquet data connector.

public ParquetConnectorConfiguration Configuration { get; }

Property Value

ParquetConnectorConfiguration

Methods

GetConnectorId()

Retrieves the unique identifier associated with this connector's configuration.

public string GetConnectorId()

Returns

string

The unique identifier of the configuration.

GetDataAsync()

Asynchronously retrieves data from the Parquet file specified in the configuration.

public Task<DataBlock> GetDataAsync()

Returns

Task<DataBlock>

A task that represents the asynchronous operation. The task result contains a DataBlock with the Parquet data.

Remarks

This method reads the Parquet file and converts its contents into a DataBlock. The Parquet file's schema will be used to determine the structure of the resulting DataBlock.

Exceptions

Exception

Thrown when there is an error reading the Parquet file or when the configuration is invalid.
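
A typical call might look like the sketch below. It assumes an already-constructed connector is passed in; `ParquetLoadExample` and `LoadAsync` are hypothetical names. The result is held in a `var` because this page does not show which namespace declares `DataBlock`.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Connectors.ParquetConnector;

// Hypothetical example class, not part of the library.
public static class ParquetLoadExample
{
    public static async Task LoadAsync(ParquetDataConnector connector)
    {
        try
        {
            // The task result is a DataBlock whose structure follows the Parquet schema.
            var data = await connector.GetDataAsync();

            // Work with `data` here.
            Console.WriteLine($"Loaded data via connector '{connector.GetConnectorId()}'.");
        }
        catch (Exception ex)
        {
            // The method documents a general Exception for read or configuration errors.
            Console.Error.WriteLine($"Parquet read failed: {ex.Message}");
            throw;
        }
    }
}
```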

GetStorageDataAsync(IStorageDataBlock, int)

Asynchronously retrieves data from the Parquet file and appends it in batches to the provided storage data block.

public Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)

Parameters

target IStorageDataBlock

The storage data block to append batches to.

batchSize int

The maximum number of rows to accumulate before appending a batch. Defaults to 10000.

Returns

Task<IStorageDataBlock>

A task that represents the asynchronous operation. The task result contains the provided IStorageDataBlock with all data appended.

Remarks

This method applies the same data processing logic as GetDataAsync(), but streams the data in batches to the provided storage block instead of building a single DataBlock. The Parquet file's schema determines the structure of the data appended to the storage block.

Exceptions

ArgumentNullException

Thrown when target is null.

Exception

Thrown when there is an error reading the Parquet file or when the configuration is invalid.
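
As a rough sketch of batched loading, the helper below passes an existing storage block and overrides the default batch size. `ParquetBatchExample` and `FillAsync` are hypothetical names, the batch size of 5000 is an arbitrary illustration, and the caller is assumed to supply a concrete `IStorageDataBlock`. This page does not show which namespace declares `IStorageDataBlock`, so the matching `using` directive is left as a comment.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Connectors.ParquetConnector;
// Assumption: also add a using directive for the namespace that declares
// IStorageDataBlock; it is not shown on this page.

// Hypothetical example class, not part of the library.
public static class ParquetBatchExample
{
    public static async Task<IStorageDataBlock> FillAsync(
        ParquetDataConnector connector,
        IStorageDataBlock target)
    {
        // Rows are accumulated up to batchSize before each append.
        // The returned block is the same instance passed in as `target`.
        return await connector.GetStorageDataAsync(target, batchSize: 5000);
    }
}
```

A smaller batch size should lower the number of rows held in memory at once, in exchange for more append operations on the storage block.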