Class ParquetDataConnector
- Namespace: Datafication.Connectors.ParquetConnector
- Assembly: Datafication.ParquetConnector.dll
Provides functionality to connect to and retrieve data from a Parquet data source.
public class ParquetDataConnector : IDataConnector
- Inheritance: object → ParquetDataConnector
- Implements: IDataConnector
Remarks
This connector uses the provided configuration to connect to the Parquet data source, which may be local or remote, and then extracts data records from the retrieved Parquet file.
Constructors
ParquetDataConnector(ParquetConnectorConfiguration)
Initializes a new instance of the ParquetDataConnector class.
public ParquetDataConnector(ParquetConnectorConfiguration configuration)
Parameters
- configuration ParquetConnectorConfiguration
The configuration to be used by the connector.
Exceptions
- ArgumentNullException
Thrown when configuration is null.
- ArgumentException
Thrown when configuration fails validation.
Properties
Configuration
Gets the configuration used by this Parquet data connector.
public ParquetConnectorConfiguration Configuration { get; }
Property Value
- ParquetConnectorConfiguration
Methods
GetConnectorId()
Retrieves the unique identifier associated with this connector's configuration.
public string GetConnectorId()
Returns
- string
The unique identifier of the configuration.
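Taken together, the constructor, the Configuration property, and GetConnectorId() describe a small contract: reject a null or invalid configuration up front, keep it, and derive the connector's identity from it. The following is a minimal sketch of that contract, not the library's implementation. SketchConfiguration, SketchConnector, and the Id, Source, and Validate() members are hypothetical stand-ins, because this page does not document the members of ParquetConnectorConfiguration.

```csharp
using System;
using System.Collections.Generic;

#nullable enable

// Hypothetical stand-in for ParquetConnectorConfiguration.
// ASSUMPTION: Id, Source, and Validate() are illustrative; the real members are not documented here.
public sealed class SketchConfiguration
{
    public string Id { get; init; } = Guid.NewGuid().ToString();
    public string? Source { get; init; }

    // Returns validation errors; an empty list means the configuration is valid.
    public IReadOnlyList<string> Validate()
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(Source))
            errors.Add("Source must be a non-empty path or URL.");
        return errors;
    }
}

// Hypothetical connector that mirrors the documented constructor and ID contract.
public sealed class SketchConnector
{
    public SketchConfiguration Configuration { get; }

    public SketchConnector(SketchConfiguration configuration)
    {
        // A null configuration throws ArgumentNullException, as documented.
        ArgumentNullException.ThrowIfNull(configuration);

        // A configuration that fails validation throws ArgumentException, as documented.
        var errors = configuration.Validate();
        if (errors.Count > 0)
            throw new ArgumentException(
                "Invalid configuration: " + string.Join(" ", errors),
                nameof(configuration));

        Configuration = configuration;
    }

    // Mirrors GetConnectorId(): the identifier comes from the stored configuration.
    public string GetConnectorId() => Configuration.Id;
}
```

For example, `new SketchConnector(new SketchConfiguration { Id = "orders", Source = "data/orders.parquet" }).GetConnectorId()` returns `"orders"`. Because ArgumentNullException derives from ArgumentException, callers that handle both cases should catch ArgumentNullException first.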
GetDataAsync()
Asynchronously retrieves data from the Parquet file specified in the configuration.
public Task<DataBlock> GetDataAsync()
Returns
- Task<DataBlock>
A task that represents the asynchronous operation. The task result contains a DataBlock with the Parquet data.
Remarks
This method reads the Parquet file and converts its contents into a DataBlock. The Parquet file's schema determines the structure of the resulting DataBlock.
Exceptions
- Exception
Thrown when there is an error reading the Parquet file or when the configuration is invalid.
GetStorageDataAsync(IStorageDataBlock, int)
Asynchronously retrieves data from the Parquet file and appends it in batches to the provided storage data block.
public Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)
Parameters
- target IStorageDataBlock
The storage data block to append batches to.
- batchSize int
The maximum number of rows to accumulate before appending a batch. Defaults to 10000.
Returns
- Task<IStorageDataBlock>
A task that represents the asynchronous operation. The task result contains the provided IStorageDataBlock with all data appended.
Remarks
This method applies the same data processing logic as GetDataAsync(), but instead of returning a single DataBlock it streams rows into the provided storage block in batches of up to batchSize rows. The Parquet file's schema determines the structure of the appended data.
Exceptions
- ArgumentNullException
Thrown when target is null.
- Exception
Thrown when there is an error reading the Parquet file or when the configuration is invalid.
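The batch-append behavior described above can be sketched as an accumulate-and-flush loop. This illustrates the pattern under stated assumptions and is not the connector's implementation. ISketchStorageBlock, its AppendBatchAsync method, RecordingStorageBlock, and BatchingSketch.CopyInBatchesAsync are all hypothetical, and rows arrive as plain `object?[]` arrays instead of being read from a Parquet file.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

#nullable enable

// Hypothetical stand-in for IStorageDataBlock.
// ASSUMPTION: AppendBatchAsync is illustrative; the real interface members are not documented here.
public interface ISketchStorageBlock
{
    Task AppendBatchAsync(IReadOnlyList<object?[]> rows);
}

// In-memory target that records every appended batch, for illustration only.
public sealed class RecordingStorageBlock : ISketchStorageBlock
{
    public List<int> BatchSizes { get; } = new();
    public List<object?[]> Rows { get; } = new();

    public Task AppendBatchAsync(IReadOnlyList<object?[]> rows)
    {
        BatchSizes.Add(rows.Count);
        Rows.AddRange(rows);
        return Task.CompletedTask;
    }
}

public static class BatchingSketch
{
    // Buffers rows, appends a batch to target each time batchSize rows are buffered,
    // then flushes any remainder. Returns the same target instance it was given.
    public static async Task<ISketchStorageBlock> CopyInBatchesAsync(
        IEnumerable<object?[]> rows, ISketchStorageBlock target, int batchSize = 10000)
    {
        // Null target -> ArgumentNullException (surfaced when the returned task is awaited).
        ArgumentNullException.ThrowIfNull(target);
        // ASSUMPTION: rejecting non-positive batch sizes is this sketch's choice.
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var buffer = new List<object?[]>(batchSize);
        foreach (var row in rows)
        {
            buffer.Add(row);
            if (buffer.Count == batchSize)
            {
                await target.AppendBatchAsync(buffer);
                // Start a fresh buffer; the target may keep a reference to the old one.
                buffer = new List<object?[]>(batchSize);
            }
        }

        // Final partial batch, if any rows remain.
        if (buffer.Count > 0)
            await target.AppendBatchAsync(buffer);

        return target;
    }
}
```

With 25 rows and a batchSize of 10, the target receives three appends of 10, 10, and 5 rows, and the method returns the same target instance, matching the documented return value. Allocating a new buffer after each append, rather than clearing the old one, avoids mutating a list the target might still hold.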