Class S3DataConnector
- Namespace
- Datafication.Connectors.S3Connector
- Assembly
- Datafication.S3Connector.dll
Represents a data connector that connects to an S3 data source and retrieves data from it.
public class S3DataConnector : IDataConnector
- Inheritance
- object → S3DataConnector
- Implements
- IDataConnector
Remarks
This class provides functionality to connect to an S3 source, download objects, and retrieve data asynchronously. It supports both authenticated and anonymous access to S3 buckets.
Constructors
S3DataConnector(S3ConnectorConfiguration)
Initializes a new instance of the S3DataConnector class.
public S3DataConnector(S3ConnectorConfiguration configuration)
Parameters
- configuration S3ConnectorConfiguration
The configuration for the S3 data connector.
Remarks
This constructor ensures that the S3 data connector is properly configured before any operations are performed. It creates an appropriate S3 client based on the authentication settings in the configuration.
Exceptions
- ArgumentNullException
Thrown when the provided configuration is null.
- ArgumentException
Thrown when the configuration validation fails.
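The two documented exceptions are related: in .NET, ArgumentNullException derives from ArgumentException, so the more specific catch has to come first. The following is a minimal sketch of that ordering, not an official usage pattern. The helper name `S3ConnectorFactory.TryCreate` is hypothetical, and the sketch assumes S3ConnectorConfiguration lives in the same namespace as the connector. How the configuration itself is populated is left to the caller.

```csharp
using System;
using Datafication.Connectors.S3Connector; // assumption: also declares S3ConnectorConfiguration

public static class S3ConnectorFactory
{
    // Hypothetical helper: returns a connector, or null when the
    // constructor rejects the supplied configuration.
    public static S3DataConnector TryCreate(S3ConnectorConfiguration configuration)
    {
        try
        {
            return new S3DataConnector(configuration);
        }
        catch (ArgumentNullException)
        {
            // Must be caught before ArgumentException, its base class.
            Console.Error.WriteLine("No configuration was supplied.");
            return null;
        }
        catch (ArgumentException ex)
        {
            // Raised when configuration validation fails.
            Console.Error.WriteLine($"Invalid S3 configuration: {ex.Message}");
            return null;
        }
    }
}
```

A caller that receives a non-null connector is responsible for calling Dispose() on it when finished.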
Properties
Configuration
Gets the configuration for the S3 data connector.
public S3ConnectorConfiguration Configuration { get; }
Property Value
- S3ConnectorConfiguration
Methods
Dispose()
Releases the underlying S3 client held by the connector.
public void Dispose()
GetConnectorId()
Gets the connector's unique identifier.
public string GetConnectorId()
Returns
- string
The unique identifier of the connector.
Remarks
This identifier is derived from the provided S3ConnectorConfiguration.
GetDataAsync()
Asynchronously retrieves data from the S3 source specified in the configuration.
public Task<DataBlock> GetDataAsync()
Returns
- Task<DataBlock>
A task that represents the asynchronous operation. The task result contains a DataBlock with the S3 object data.
Remarks
This method only supports single file mode and loads all data into memory.
For multiple file segments, use GetStorageDataAsync() with a VelocityDataBlock to prevent memory issues.
Downloads the S3 object to a temporary file and then delegates parsing to the appropriate format-specific connector based on the file extension (CSV, JSON, Parquet, Excel).
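The extension-based dispatch described above can be pictured with a small, self-contained sketch. This is an illustration only, not the connector's actual implementation: the `S3FileFormat` enum, the `S3FormatResolver` class, and the exact extension strings (`.csv`, `.json`, `.parquet`, `.xlsx`, `.xls`) are assumptions made for the example.

```csharp
using System;
using System.IO;

// Hypothetical enum naming the four formats mentioned in the remarks.
public enum S3FileFormat { Csv, Json, Parquet, Excel }

public static class S3FormatResolver
{
    // Maps an object key to a format using its file extension (case-insensitive).
    public static S3FileFormat Resolve(string objectKey)
    {
        if (objectKey == null) throw new ArgumentNullException(nameof(objectKey));

        string extension = Path.GetExtension(objectKey).ToLowerInvariant();
        switch (extension)
        {
            case ".csv": return S3FileFormat.Csv;
            case ".json": return S3FileFormat.Json;
            case ".parquet": return S3FileFormat.Parquet;
            case ".xlsx":
            case ".xls": return S3FileFormat.Excel;
            default:
                throw new NotSupportedException(
                    $"Unsupported file extension '{extension}' for key '{objectKey}'.");
        }
    }
}
```

For instance, `S3FormatResolver.Resolve("exports/2024/sales.CSV")` resolves to `S3FileFormat.Csv`, while a key with no recognized extension throws NotSupportedException in this sketch.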
Exceptions
- NotSupportedException
Thrown when ObjectKey is a prefix pattern (multi-segment mode not supported with GetDataAsync).
- Exception
Thrown when there is an error reading the S3 source or when the configuration is invalid.
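A minimal usage sketch for the single-file case follows. It relies only on members documented on this page (the constructor, GetDataAsync(), GetConnectorId(), and Dispose()). The class name `SingleObjectExample` is hypothetical, the configuration is assumed to be built elsewhere, and S3ConnectorConfiguration is assumed to share the connector's namespace.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Connectors.S3Connector; // assumption: also declares S3ConnectorConfiguration

public static class SingleObjectExample
{
    // Loads one S3 object fully into memory; the configuration must
    // point at a single object key, not a prefix.
    public static async Task RunAsync(S3ConnectorConfiguration configuration)
    {
        var connector = new S3DataConnector(configuration);
        try
        {
            var block = await connector.GetDataAsync();
            Console.WriteLine(block is null
                ? "No data returned."
                : $"Loaded data for connector {connector.GetConnectorId()}.");
        }
        catch (NotSupportedException ex)
        {
            // Documented case: ObjectKey is a prefix pattern.
            Console.Error.WriteLine($"Use GetStorageDataAsync for prefixes: {ex.Message}");
        }
        finally
        {
            // Release the underlying S3 client.
            connector.Dispose();
        }
    }
}
```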
GetStorageDataAsync(IStorageDataBlock, int)
Asynchronously retrieves data from the S3 source and appends it in batches to the provided storage data block.
public Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)
Parameters
- target IStorageDataBlock
The storage data block to append batches to.
- batchSize int
The maximum number of rows to accumulate before appending a batch. Defaults to 10000.
Returns
- Task<IStorageDataBlock>
A task that represents the asynchronous operation. The task result contains the provided IStorageDataBlock with all data appended.
Remarks
This method supports both single file and multi-segment modes.
For single file: Downloads the S3 object and delegates streaming to the appropriate format-specific connector.
For multi-segment: Lists all objects matching the prefix, validates that they all share the same file type, and streams each one sequentially.
The format-specific connector handles efficient batch processing for large files.
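The same-type validation step in multi-segment mode can be sketched without any S3 dependency. The code below is illustrative only and is not taken from the library: `SegmentValidator` is a hypothetical name, file type is approximated by the key's extension, and throwing on an empty listing is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SegmentValidator
{
    // Returns the single extension shared by all keys, or throws
    // InvalidOperationException when the keys mix file types.
    public static string EnsureSingleFileType(IReadOnlyCollection<string> objectKeys)
    {
        if (objectKeys == null) throw new ArgumentNullException(nameof(objectKeys));

        List<string> extensions = objectKeys
            .Select(key => Path.GetExtension(key).ToLowerInvariant())
            .Distinct()
            .ToList();

        if (extensions.Count == 0)
            throw new InvalidOperationException("No objects matched the prefix."); // assumption

        if (extensions.Count > 1)
            throw new InvalidOperationException(
                "Mixed file types found: " + string.Join(", ", extensions));

        return extensions[0];
    }
}
```

With keys such as `logs/part-0001.csv` and `logs/part-0002.CSV` the sketch returns `.csv`; adding `logs/part-0003.json` makes it throw.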
Exceptions
- ArgumentNullException
Thrown when target is null.
- NotSupportedException
Thrown when a prefix is used without AllowMultipleSegments enabled.
- InvalidOperationException
Thrown when mixed file types are found in multi-segment mode.
- Exception
Thrown when there is an error reading the S3 source or when the configuration is invalid.
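A hedged usage sketch for batched loading follows. It uses only the signature shown above. The class name `BatchedLoadExample` is hypothetical, the `Datafication.Core.Data` namespace for IStorageDataBlock is an assumption, and both the configuration and the target block (for example a VelocityDataBlock) are expected to be created by the caller.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Connectors.S3Connector; // assumption: also declares S3ConnectorConfiguration
using Datafication.Core.Data;              // assumption: namespace that declares IStorageDataBlock

public static class BatchedLoadExample
{
    // Streams one object, or every object under a prefix, into the
    // supplied storage block in batches of up to 50,000 rows.
    public static async Task<IStorageDataBlock> LoadAsync(
        S3ConnectorConfiguration configuration, IStorageDataBlock target)
    {
        var connector = new S3DataConnector(configuration);
        try
        {
            return await connector.GetStorageDataAsync(target, batchSize: 50000);
        }
        catch (InvalidOperationException ex)
        {
            // Documented case: the prefix matched objects of different file types.
            Console.Error.WriteLine($"Mixed file types under the prefix: {ex.Message}");
            throw;
        }
        finally
        {
            // Release the underlying S3 client.
            connector.Dispose();
        }
    }
}
```

A larger batch size means fewer append calls at the cost of holding more rows in memory at once; the default of 10000 applies when the argument is omitted.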