
Class S3DataConnector

Namespace
Datafication.Connectors.S3Connector
Assembly
Datafication.S3Connector.dll

Represents a data connector that connects to an S3 data source and retrieves data from it.

public class S3DataConnector : IDataConnector
Inheritance
object
S3DataConnector
Implements
IDataConnector

Remarks

This class connects to an S3 source, downloads objects, and retrieves their data asynchronously. It supports both authenticated and anonymous access to S3 buckets.

Constructors

S3DataConnector(S3ConnectorConfiguration)

Initializes a new instance of the S3DataConnector class.

public S3DataConnector(S3ConnectorConfiguration configuration)

Parameters

configuration S3ConnectorConfiguration

The configuration for the S3 data connector.

Remarks

The constructor validates the configuration before any operation is performed, then creates an S3 client that matches the configuration's authentication settings (authenticated or anonymous access).
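The client choice can be pictured as a small decision over the credential settings. This is a hypothetical illustration, not the connector's code: the SelectClientMode helper, its accessKeyId and secretAccessKey parameters, and the rule that both or neither credential must be supplied are all assumptions, and no AWS SDK types are used.

```csharp
using System;

// Hypothetical sketch: choose the kind of S3 client from the credential settings.
// Parameter names and the "both or neither" rule are assumptions, not the
// documented S3ConnectorConfiguration members.
string SelectClientMode(string? accessKeyId, string? secretAccessKey)
{
    bool hasKey = !string.IsNullOrEmpty(accessKeyId);
    bool hasSecret = !string.IsNullOrEmpty(secretAccessKey);

    if (hasKey && hasSecret)
        return "authenticated"; // requests would be signed with the supplied credentials
    if (!hasKey && !hasSecret)
        return "anonymous";     // unsigned requests, e.g. for public buckets

    // Assumed rule: supplying only half of the credential pair is a configuration error.
    throw new ArgumentException("Both an access key ID and a secret access key are required for authenticated access.");
}

Console.WriteLine(SelectClientMode("AKIAEXAMPLE", "example-secret")); // authenticated
Console.WriteLine(SelectClientMode(null, null));                      // anonymous
```

With the AWS SDK for .NET, an anonymous client is commonly built with AnonymousAWSCredentials; whether this connector does so is not stated on this page.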

Exceptions

ArgumentNullException

Thrown when the provided configuration is null.

ArgumentException

Thrown when the configuration fails validation.
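A minimal sketch of the guard order these two exceptions suggest: the null check runs first, then validation. The configuration is modeled as a nullable (Bucket, ObjectKey) tuple to keep the sketch self-contained; the ValidateConfiguration helper and both validation rules are hypothetical, since the actual checks on S3ConnectorConfiguration are not documented here.

```csharp
using System;

// Hypothetical sketch of the constructor's guard clauses. The tuple stands in for
// S3ConnectorConfiguration; the two validation rules below are assumptions.
void ValidateConfiguration((string Bucket, string ObjectKey)? configuration)
{
    // 1. A missing configuration is rejected outright.
    if (configuration is null)
        throw new ArgumentNullException(nameof(configuration));

    var (bucket, objectKey) = configuration.Value;

    // 2. A present but invalid configuration is rejected with ArgumentException.
    if (string.IsNullOrWhiteSpace(bucket))
        throw new ArgumentException("A bucket name is required.", nameof(configuration));
    if (string.IsNullOrWhiteSpace(objectKey))
        throw new ArgumentException("An object key or prefix is required.", nameof(configuration));
}

foreach (var candidate in new (string, string)?[] { null, ("", "data.csv"), ("my-bucket", "data.csv") })
{
    try
    {
        ValidateConfiguration(candidate);
        Console.WriteLine("valid");
    }
    catch (ArgumentException ex) // also catches ArgumentNullException, which derives from it
    {
        Console.WriteLine(ex.GetType().Name);
    }
}
// ArgumentNullException
// ArgumentException
// valid
```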

Properties

Configuration

Gets the configuration for the S3 data connector.

public S3ConnectorConfiguration Configuration { get; }

Property Value

S3ConnectorConfiguration

Methods

Dispose()

Disposes the underlying S3 client used by this connector.

public void Dispose()

GetConnectorId()

Gets the connector's unique identifier.

public string GetConnectorId()

Returns

string

The unique identifier of the connector.

Remarks

This identifier is derived from the provided S3ConnectorConfiguration.
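How the identifier is derived is not specified. The sketch below only illustrates the property that matters to callers, namely that the same configuration yields the same ID. The DeriveConnectorId helper, the choice of bucket and object key as inputs, the SHA-256 hash, and the "s3-" prefix are all hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: derive a stable ID from configuration values.
// Which fields the real connector uses, and how it combines them, is not documented.
string DeriveConnectorId(string bucket, string objectKey)
{
    // Hash a canonical string form so identical configurations give identical IDs.
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes($"s3://{bucket}/{objectKey}"));
    // Keep the first 8 bytes as 16 lowercase hex characters.
    return "s3-" + Convert.ToHexString(hash, 0, 8).ToLowerInvariant();
}

Console.WriteLine(DeriveConnectorId("my-bucket", "sales/2024.csv"));
```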

GetDataAsync()

Asynchronously retrieves data from the S3 source specified in the configuration.

public Task<DataBlock> GetDataAsync()

Returns

Task<DataBlock>

A task that represents the asynchronous operation. The task result contains a DataBlock with the S3 object data.

Remarks

This method supports only single-file mode and loads the entire object into memory.

For multi-segment sources, use GetStorageDataAsync(IStorageDataBlock, int) with a VelocityDataBlock instead; it appends data in batches rather than holding it all in memory at once.

Downloads the S3 object to a temporary file and then delegates parsing to the appropriate format-specific connector based on the file extension (CSV, JSON, Parquet, Excel).
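The extension-based dispatch can be sketched as a switch on the object key's extension. The format names come from the text above, but the exact extensions mapped to each format (for example .xls alongside .xlsx for Excel) and the NotSupportedException for unknown extensions are assumptions; the hypothetical ResolveFormat helper returns a label instead of invoking a real format-specific connector.

```csharp
using System;
using System.IO;

// Hypothetical sketch: pick a format-specific parser from the file extension.
// The extension list and the exception for unknown extensions are assumptions.
string ResolveFormat(string objectKey) =>
    Path.GetExtension(objectKey).ToLowerInvariant() switch
    {
        ".csv" => "CSV",
        ".json" => "JSON",
        ".parquet" => "Parquet",
        ".xlsx" or ".xls" => "Excel",
        var ext => throw new NotSupportedException($"Unsupported file extension '{ext}'.")
    };

Console.WriteLine(ResolveFormat("reports/Q1.XLSX"));  // Excel
Console.WriteLine(ResolveFormat("logs/events.json")); // JSON
```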

Exceptions

NotSupportedException

Thrown when ObjectKey is a prefix pattern; GetDataAsync does not support multi-segment mode.

Exception

Thrown when there is an error reading the S3 source or when the configuration is invalid.

GetStorageDataAsync(IStorageDataBlock, int)

Asynchronously retrieves data from the S3 source and appends it in batches to the provided storage data block.

public Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)

Parameters

target IStorageDataBlock

The storage data block to append batches to.

batchSize int

The maximum number of rows to accumulate before appending a batch. Defaults to 10000.
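A sketch of what batchSize controls, assuming rows are buffered and appended whenever the buffer reaches batchSize, with a final partial batch flushed at the end. The AppendInBatches helper and its appendBatch callback are hypothetical stand-ins for a format-specific connector appending to an IStorageDataBlock, and rejecting non-positive batch sizes is an assumed guard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of batch accumulation. appendBatch stands in for
// appending to an IStorageDataBlock; it is not the real API.
int AppendInBatches<T>(IEnumerable<T> rows, int batchSize, Action<IReadOnlyList<T>> appendBatch)
{
    if (batchSize <= 0) // assumed guard, not documented
        throw new ArgumentOutOfRangeException(nameof(batchSize));

    var buffer = new List<T>(batchSize);
    int batches = 0;
    foreach (T row in rows)
    {
        buffer.Add(row);
        if (buffer.Count == batchSize)
        {
            appendBatch(buffer.ToArray()); // hand over a copy, then reuse the buffer
            buffer.Clear();
            batches++;
        }
    }
    if (buffer.Count > 0) // flush the final partial batch
    {
        appendBatch(buffer.ToArray());
        batches++;
    }
    return batches;
}

var sizes = new List<int>();
int count = AppendInBatches(Enumerable.Range(1, 25), 10, batch => sizes.Add(batch.Count));
Console.WriteLine($"{count} batches: {string.Join(", ", sizes)}"); // 3 batches: 10, 10, 5
```

Under the same assumption, the default batchSize of 10000 would append a 25,000-row file as batches of 10,000, 10,000 and 5,000 rows.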

Returns

Task<IStorageDataBlock>

A task that represents the asynchronous operation. The task result contains the provided IStorageDataBlock with all data appended.

Remarks

This method supports both single-file and multi-segment modes.

Single-file mode: downloads the S3 object and delegates streaming to the appropriate format-specific connector.

Multi-segment mode: lists all objects matching the prefix, validates that they all have the same file type, and streams each one sequentially into target.

In both modes, the format-specific connector performs the batch processing, so large files are streamed into target rather than loaded fully into memory.
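The multi-segment checks can be sketched as follows: the keys returned by the listing are put in a fixed order and rejected with InvalidOperationException if they do not all share one file extension. The PlanSegments helper is hypothetical, and so are two of its choices: extensions are compared case-insensitively, and keys are sorted ordinally because the text says only that segments are streamed sequentially.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch of the multi-segment checks. Case-insensitive extension
// comparison and ordinal key ordering are assumptions.
IReadOnlyList<string> PlanSegments(IEnumerable<string> listedKeys)
{
    var keys = listedKeys.OrderBy(k => k, StringComparer.Ordinal).ToList();

    var extensions = keys
        .Select(k => Path.GetExtension(k).ToLowerInvariant())
        .Distinct()
        .ToList();

    // Mixed file types under one prefix are rejected.
    if (extensions.Count > 1)
        throw new InvalidOperationException(
            $"Mixed file types under prefix: {string.Join(", ", extensions)}.");

    return keys; // each key would then be downloaded and streamed in this order
}

foreach (string key in PlanSegments(new[] { "sales/part-2.csv", "sales/part-1.csv" }))
    Console.WriteLine(key);
// sales/part-1.csv
// sales/part-2.csv
```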

Exceptions

ArgumentNullException

Thrown when target is null.

NotSupportedException

Thrown when ObjectKey is a prefix pattern and AllowMultipleSegments is not enabled.

InvalidOperationException

Thrown when, in multi-segment mode, the objects matching the prefix have different file types.

Exception

Thrown when there is an error reading the S3 source or when the configuration is invalid.
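The null check and the prefix check documented above can be combined into one mode decision. In this hypothetical sketch, ResolveStorageMode takes a plain object in place of the IStorageDataBlock target, and treating a key that ends in '/' or '*' as a prefix pattern is an assumption, because the page does not define the prefix syntax. Mixed-type validation would then run only in the multi-segment branch.

```csharp
using System;

// Hypothetical sketch of GetStorageDataAsync's mode selection and guards.
// The prefix syntax ('/' or '*' suffix) is an assumption.
string ResolveStorageMode(object? target, string objectKey, bool allowMultipleSegments)
{
    if (target is null)
        throw new ArgumentNullException(nameof(target));

    bool isPrefix = objectKey.EndsWith('/') || objectKey.EndsWith('*');
    if (!isPrefix)
        return "single-file";

    if (!allowMultipleSegments)
        throw new NotSupportedException(
            "ObjectKey is a prefix pattern, but AllowMultipleSegments is not enabled.");

    return "multi-segment";
}

var block = new object(); // stands in for an IStorageDataBlock
Console.WriteLine(ResolveStorageMode(block, "sales/2024.csv", false)); // single-file
Console.WriteLine(ResolveStorageMode(block, "sales/", true));          // multi-segment
```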