Class ParquetSinkExtension

Namespace
Datafication.Sinks.Connectors.ParquetConnector
Assembly
Datafication.ParquetConnector.dll

Provides extension methods for transforming a DataBlock into Parquet format.

public static class ParquetSinkExtension
Inheritance
object
ParquetSinkExtension

Methods

ParquetSink(DataBlock, CompressionMethod)

Synchronously transforms the DataBlock into a Parquet file, returned as a byte array.

public static byte[] ParquetSink(this DataBlock dataBlock, CompressionMethod compression = CompressionMethod.Snappy)

Parameters

dataBlock DataBlock

The DataBlock to transform.

compression CompressionMethod

The compression method to use. Default is Snappy.

Returns

byte[]

A byte array representing the Parquet file.

Remarks

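Columns containing nested DataBlock values are automatically skipped. Use ParquetSink(DataBlock, out List<string>, CompressionMethod) or ParquetSinkWithSkippedColumns(DataBlock, CompressionMethod) to retrieve the list of skipped columns.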
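
As a minimal sketch of how this overload might be used, the helper below serializes a block with the default compression and writes the bytes to disk. The class name ParquetExportExample, the method SaveAsParquet, and the path parameter are hypothetical. The using directive for the namespace that defines DataBlock is left out because this page does not state it.

```csharp
using System.IO;
using Datafication.Sinks.Connectors.ParquetConnector;
// Assumption: also add a using directive for the namespace that defines
// DataBlock; it is not listed on this page.

public static class ParquetExportExample
{
    // Hypothetical helper: converts the block to Parquet bytes and saves them.
    public static void SaveAsParquet(DataBlock block, string path)
    {
        // No compression argument is passed, so the documented default (Snappy) applies.
        byte[] parquetBytes = block.ParquetSink();

        // Columns with nested DataBlock values are silently omitted from the output.
        File.WriteAllBytes(path, parquetBytes);
    }
}
```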

ParquetSink(DataBlock, out List<string>, CompressionMethod)

Synchronously transforms the DataBlock into a Parquet file, returned as a byte array, and outputs the list of columns that were skipped due to unsupported types.

public static byte[] ParquetSink(this DataBlock dataBlock, out List<string> skippedColumns, CompressionMethod compression = CompressionMethod.Snappy)

Parameters

dataBlock DataBlock

The DataBlock to transform.

skippedColumns List<string>

When this method returns, contains the names of the columns that were skipped.

compression CompressionMethod

The compression method to use. Default is Snappy.

Returns

byte[]

A byte array representing the Parquet file.
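
The out overload can be sketched as follows: the caller receives the Parquet bytes as the return value and the skipped column names through the out parameter. The class SkippedColumnsExample and the method ExportAndReport are hypothetical names, and the null check is a defensive assumption, since this page does not say whether the list can be null.

```csharp
using System;
using System.Collections.Generic;
using Datafication.Sinks.Connectors.ParquetConnector;
// Assumption: a using directive for DataBlock's namespace is also required.

public static class SkippedColumnsExample
{
    // Hypothetical helper: exports the block and reports any skipped columns.
    public static byte[] ExportAndReport(DataBlock block)
    {
        // The out argument selects the overload that reports skipped columns.
        byte[] parquetBytes = block.ParquetSink(out List<string> skipped);

        // Defensive null check; the page does not document the empty case.
        if (skipped != null && skipped.Count > 0)
        {
            Console.WriteLine($"Skipped columns: {string.Join(", ", skipped)}");
        }

        return parquetBytes;
    }
}
```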

ParquetSinkAsync(DataBlock, CompressionMethod)

Asynchronously transforms the DataBlock into a Parquet file, returned as a byte array.

public static Task<byte[]> ParquetSinkAsync(this DataBlock dataBlock, CompressionMethod compression = CompressionMethod.Snappy)

Parameters

dataBlock DataBlock

The DataBlock to transform.

compression CompressionMethod

The compression method to use. Default is Snappy.

Returns

Task<byte[]>

A task that represents the asynchronous transformation. The task result is a byte array representing the Parquet file.

Remarks

Columns containing nested DataBlock values are automatically skipped. Use ParquetSinkWithSkippedColumnsAsync(DataBlock, CompressionMethod) to retrieve the list of skipped columns.
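
A possible asynchronous usage, sketched under the same assumptions as the rest of this page: the caller awaits the conversion, then writes the result with File.WriteAllBytesAsync (available in .NET Core 2.0 and later). AsyncExportExample and SaveAsParquetAsync are hypothetical names, and the compression value is passed through from the caller rather than hard-coded.

```csharp
using System.IO;
using System.Threading.Tasks;
using Datafication.Sinks.Connectors.ParquetConnector;
// Assumption: using directives for the namespaces that define DataBlock and
// CompressionMethod are also required; neither is listed on this page.

public static class AsyncExportExample
{
    // Hypothetical helper: converts and saves without blocking the calling thread.
    public static async Task SaveAsParquetAsync(
        DataBlock block, string path, CompressionMethod compression)
    {
        // Await the conversion; the task result is the Parquet file content.
        byte[] parquetBytes = await block.ParquetSinkAsync(compression);

        // Write the bytes asynchronously as well.
        await File.WriteAllBytesAsync(path, parquetBytes);
    }
}
```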

ParquetSinkWithSkippedColumns(DataBlock, CompressionMethod)

Synchronously transforms the DataBlock into a Parquet file, returned as a byte array, and returns the list of columns that were skipped due to unsupported types.

public static (byte[] Data, List<string> SkippedColumns) ParquetSinkWithSkippedColumns(this DataBlock dataBlock, CompressionMethod compression = CompressionMethod.Snappy)

Parameters

dataBlock DataBlock

The DataBlock to transform.

compression CompressionMethod

The compression method to use. Default is Snappy.

Returns

(byte[] Data, List<string> SkippedColumns)

A tuple containing the byte array and the list of skipped column names.
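
The tuple-returning form avoids the out parameter, which can be convenient inside lambdas or expression-bodied code. The sketch below reads the named tuple elements Data and SkippedColumns from the signature above; TupleExportExample and Describe are hypothetical names.

```csharp
using Datafication.Sinks.Connectors.ParquetConnector;
// Assumption: a using directive for DataBlock's namespace is also required.

public static class TupleExportExample
{
    // Hypothetical helper: summarizes the outcome of a conversion as text.
    public static string Describe(DataBlock block)
    {
        // The result is a value tuple with the element names from the signature.
        var result = block.ParquetSinkWithSkippedColumns();

        // Defensive null handling; the page does not document the empty case.
        int skippedCount = result.SkippedColumns?.Count ?? 0;

        return $"{result.Data.Length} bytes written, {skippedCount} column(s) skipped";
    }
}
```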

ParquetSinkWithSkippedColumnsAsync(DataBlock, CompressionMethod)

Asynchronously transforms the DataBlock into a Parquet file, returned as a byte array, and returns the list of columns that were skipped due to unsupported types.

public static Task<(byte[] Data, List<string> SkippedColumns)> ParquetSinkWithSkippedColumnsAsync(this DataBlock dataBlock, CompressionMethod compression = CompressionMethod.Snappy)

Parameters

dataBlock DataBlock

The DataBlock to transform.

compression CompressionMethod

The compression method to use. Default is Snappy.

Returns

Task<(byte[] Data, List<string> SkippedColumns)>

A task that produces a tuple containing the byte array and the list of skipped column names.
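
One way to use the asynchronous tuple form is a strict export that refuses to drop data silently. The sketch below deconstructs the awaited tuple and throws if any column was skipped. The fail-fast policy and the names StrictExportExample and ExportOrThrowAsync are illustrative assumptions, not behavior of the library.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Sinks.Connectors.ParquetConnector;
// Assumption: a using directive for DataBlock's namespace is also required.

public static class StrictExportExample
{
    // Hypothetical helper: returns the Parquet bytes only if no column was skipped.
    public static async Task<byte[]> ExportOrThrowAsync(DataBlock block)
    {
        // Deconstruct the awaited tuple into its two elements.
        var (data, skippedColumns) = await block.ParquetSinkWithSkippedColumnsAsync();

        // Illustrative policy: treat any skipped column as an error.
        if (skippedColumns != null && skippedColumns.Count > 0)
        {
            throw new InvalidOperationException(
                $"Columns not written to Parquet: {string.Join(", ", skippedColumns)}");
        }

        return data;
    }
}
```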