Class VelocityDataBlock

Namespace: Datafication.Storage.Velocity

Assembly: Datafication.Storage.Velocity.dll

Enterprise-grade data block implementation with full CRUD support using DFC segmented storage. Provides O(1) updates/deletes, automatic compaction, and production-ready data lake features.

public class VelocityDataBlock : IStorageDataBlock, IDataBlock

Inheritance: object → VelocityDataBlock

Implements: IStorageDataBlock, IDataBlock

Extension Methods: VelocityDataBlockExtensions.Any(VelocityDataBlock)

VelocityDataBlockExtensions.Any(VelocityDataBlock, string, object)

VelocityDataBlockExtensions.Any(VelocityDataBlock, string, object, ComparisonOperator)

VelocityDataBlockExtensions.Count(VelocityDataBlock)

VelocityDataBlockExtensions.Count(VelocityDataBlock, string, object)

VelocityDataBlockExtensions.Count(VelocityDataBlock, string, object, ComparisonOperator)

VelocityDataBlockExtensions.FirstOrDefault(VelocityDataBlock)

VelocityDataBlockExtensions.FirstOrDefault(VelocityDataBlock, string, object)

Constructors

VelocityDataBlock(string, VelocityOptions?)

Creates a new enterprise VelocityDataBlock with full CRUD support. If the file exists, it is opened with a segmented reader. If it does not exist, the instance is ready to create it.

public VelocityDataBlock(string filePath, VelocityOptions? options = null)

Parameters

filePath string: The path to the DFC file.
options VelocityOptions: Enterprise storage options for segmented operations.
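As a minimal sketch of the constructor in use (the file name "orders.dfc" is a hypothetical path, and options is left at its default of null), the following opens or prepares a block, reads the two row-count properties documented below, and releases resources through the documented Dispose() method. An explicit try/finally is used because this page does not state whether the class implements IDisposable.

```csharp
using System;
using Datafication.Storage.Velocity;

// Hypothetical path; if the file exists it is opened, otherwise it is ready for creation.
var vdb = new VelocityDataBlock("orders.dfc");
try
{
    // ActiveRowCount excludes deleted rows; TotalRowCount includes them.
    Console.WriteLine($"{vdb.ActiveRowCount} active of {vdb.TotalRowCount} total rows");
}
finally
{
    // Release all resources held by the block.
    vdb.Dispose();
}
```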

Properties

ActiveRowCount

Gets the active (non-deleted) row count

public uint ActiveRowCount { get; }

Property Value

uint

RowCount

Gets the active (non-deleted) row count

public int RowCount { get; }

Property Value

int

Schema

Gets the schema of the data block

public DataSchema Schema { get; }

Property Value

DataSchema

SupportsBatchAppend

Supports batch append operations

public bool SupportsBatchAppend { get; }

Property Value

bool

TotalRowCount

Gets the total row count including deleted rows

public uint TotalRowCount { get; }

Property Value

uint

Methods

AddColumn(DataColumn)

Adds a column to the data block schema

public void AddColumn(DataColumn column)

Parameters

column DataColumn

AddRow(params object[])

Adds a row to the data block (appends to storage)

public void AddRow(params object[] values)

Parameters

values object[]

AppendAsync(IDataBlock)

Appends additional data to the data block (enterprise segmented approach)

public Task AppendAsync(IDataBlock additionalData)

Parameters

additionalData IDataBlock

Returns

Task

AppendBatchAsync(DataBlock)

Appends a batch of data efficiently using segmented storage (true append, no rewrites)

public Task AppendBatchAsync(DataBlock batch)

Parameters

batch DataBlock

Returns

Task
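The sketch below shows one way batched appends might be driven, assuming the caller already has a sequence of DataBlock batches. The helper name AppendAllAsync is hypothetical, and the file also needs a using directive for whichever namespace declares DataBlock, which this page does not show. Calling FlushAsync() once at the end is a design choice of the sketch, not a documented requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Datafication.Storage.Velocity;
// Also required: a using directive for the namespace that declares DataBlock.

public static class AppendExample
{
    // Hypothetical helper: appends each batch, then flushes pending changes once.
    public static async Task AppendAllAsync(VelocityDataBlock vdb, IEnumerable<DataBlock> batches)
    {
        if (!vdb.SupportsBatchAppend)
        {
            throw new NotSupportedException("This block does not support batch append.");
        }

        foreach (var batch in batches)
        {
            await vdb.AppendBatchAsync(batch);
        }

        await vdb.FlushAsync();
    }
}
```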

AsResult()

Returns a lazy VelocityResult that enables efficient row counting and streaming without full data materialization. The query plan is executed to compute qualifying row indices, but data is not read until explicitly requested.

public VelocityResult AsResult()

Returns

VelocityResult: A VelocityResult representing the query results.

Remarks

This method provides significant performance benefits for scenarios where:

Only the row count is needed (uses SIMD PopCount, ~50-100x faster)
Streaming enumeration is preferred over full materialization
Memory usage must be minimized for large result sets

Example usage:

// Efficient counting without full materialization
var result = vdb.Where("Country", "USA").AsResult();
int count = result.RowCount;  // Fast, no data read

// Explicit materialization when needed
DataBlock data = result.ToDataBlock();

// Streaming enumeration
foreach (var row in result.EnumerateRows())
{
    Console.WriteLine(row["Name"]);
}

ClearQueryPlan()

Clears any pending query operations without executing them.

public VelocityDataBlock ClearQueryPlan()

Returns

VelocityDataBlock
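A short sketch of discarding a queued plan follows. It assumes, based on the "This VelocityDataBlock instance" wording used for the fluent methods on this page, that queued operations accumulate on the instance itself. The column names are hypothetical, and SortDirection may need its own using directive.

```csharp
using Datafication.Storage.Velocity;
// May also be required: a using directive for the namespace that declares SortDirection.

public static class QueryPlanExample
{
    public static void DiscardPendingPlan(VelocityDataBlock vdb)
    {
        // Queue two operations; nothing runs until Execute() is called.
        vdb.Where("Status", "Active").Head(10);

        // Drop the queued operations without executing them, then start a new chain.
        var top = vdb.ClearQueryPlan()
            .Sort(SortDirection.Descending, "Revenue")
            .Head(5)
            .Execute();
    }
}
```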

Clone()

Creates a clone of the data block

public DataBlock Clone()

Returns

DataBlock

CompactAsync()

Compacts the storage to optimize performance and reclaim space

public Task CompactAsync()

Returns

Task

CompactAsync(VelocityCompactionStrategy)

Compacts using a specific strategy

public Task CompactAsync(VelocityCompactionStrategy strategy)

Parameters

strategy VelocityCompactionStrategy

Returns

Task
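One possible way to decide when to compact manually is to compare TotalRowCount with ActiveRowCount. The sketch below uses only the parameterless overload. The helper name and the 20% default threshold are illustrative assumptions, not library recommendations.

```csharp
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class CompactionExample
{
    // Hypothetical helper: compacts when the share of deleted rows exceeds a threshold.
    public static async Task CompactIfFragmentedAsync(VelocityDataBlock vdb, double maxDeletedFraction = 0.2)
    {
        uint total = vdb.TotalRowCount;
        if (total == 0)
        {
            return; // Nothing stored, nothing to reclaim.
        }

        // TotalRowCount includes deleted rows, so the difference is the deleted count.
        uint deleted = total - vdb.ActiveRowCount;
        if ((double)deleted / total > maxDeletedFraction)
        {
            await vdb.CompactAsync();
        }
    }
}
```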

Compute(string, string)

Adds a computed column to the query plan based on an expression. This method is part of the fluent query plan; execution is deferred until Execute() is called. When executed, it uses DFC columnar optimizations and vectorization for high performance.

public VelocityDataBlock Compute(string columnName, string expression)

Parameters

columnName string: The name for the computed column
expression string: The expression to evaluate (e.g., "Total Profit / Total Revenue")

Returns

VelocityDataBlock: The VelocityDataBlock instance for method chaining

Examples

var result = velocityDataBlock
    .Select("Total Profit", "Total Revenue", "Country")
    .Compute("Profit Margin", "Total Profit / Total Revenue")
    .Where("Profit Margin", 0.25, ComparisonOperator.GreaterThan)
    .Sort(SortDirection.Descending, "Profit Margin")
    .Execute();

ConfigureAutoCompaction(bool, VelocityCompactionTrigger, int)

Configures automatic compaction settings

public void ConfigureAutoCompaction(bool enabled, VelocityCompactionTrigger trigger = VelocityCompactionTrigger.SegmentCount, int threshold = 10)

Parameters

enabled bool
trigger VelocityCompactionTrigger
threshold int
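As an illustration, the call below enables automatic compaction with the SegmentCount trigger shown in the signature and raises the threshold from its default of 10 to 20. Reading threshold as "number of segments" is an assumption drawn from the trigger name. The sketch also assumes VelocityCompactionTrigger lives in the same namespace as the class.

```csharp
using Datafication.Storage.Velocity;

public static class AutoCompactionExample
{
    public static void Configure(VelocityDataBlock vdb)
    {
        // Named arguments mirror the documented parameter names.
        vdb.ConfigureAutoCompaction(
            enabled: true,
            trigger: VelocityCompactionTrigger.SegmentCount,
            threshold: 20);

        // Optionally let compaction run in the background (enabled defaults to true).
        vdb.EnableBackgroundCompaction();
    }
}
```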

Count(params string[])

Counts the number of non-null values for each specified column or all columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient counting.

public VelocityDataBlock Count(params string[] fields)

Parameters

fields string[]: The fields to count. If null or empty, all fields will be counted.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

CreateEnterprise(string, string?)

Creates a VelocityDataBlock with enterprise features enabled

public static VelocityDataBlock CreateEnterprise(string filePath, string? primaryKeyColumn = null)

Parameters

filePath string
primaryKeyColumn string

Returns

VelocityDataBlock

CreateHighThroughput(string, string?)

Creates a VelocityDataBlock optimized for high-throughput workloads

public static VelocityDataBlock CreateHighThroughput(string filePath, string? primaryKeyColumn = null)

Parameters

filePath string
primaryKeyColumn string

Returns

VelocityDataBlock

DeleteRowAsync(VelocityRowId)

Deletes a row by internal row ID (O(1) performance)

public Task DeleteRowAsync(VelocityRowId rowId)

Parameters

rowId VelocityRowId

Returns

Task

DeleteRowAsync(string)

Deletes a row by primary key (requires primary key configuration)

public Task DeleteRowAsync(string primaryKey)

Parameters

primaryKey string

Returns

Task

DeleteRowsAsync(IEnumerable<string>)

Deletes multiple rows by primary keys (optimized batch operation)

public Task DeleteRowsAsync(IEnumerable<string> primaryKeys)

Parameters

primaryKeys IEnumerable<string>

Returns

Task
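The following sketch combines the primary-key delete overloads. It assumes that passing primaryKeyColumn to CreateEnterprise is what provides the primary key configuration these methods require. The file name, column name, and key values are hypothetical.

```csharp
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class DeleteExample
{
    public static async Task RemoveCustomersAsync()
    {
        // Hypothetical file and primary key column.
        var vdb = VelocityDataBlock.CreateEnterprise("customers.dfc", primaryKeyColumn: "CustomerId");
        try
        {
            // Delete a single row by primary key.
            await vdb.DeleteRowAsync("C-1001");

            // Delete several rows in one batch operation.
            await vdb.DeleteRowsAsync(new[] { "C-1002", "C-1003" });

            // Persist pending changes.
            await vdb.FlushAsync();
        }
        finally
        {
            vdb.Dispose();
        }
    }
}
```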

Dispose()

Disposes the VelocityDataBlock and releases all resources

public void Dispose()

DropDuplicates(KeepDuplicateMode)

Adds a DropDuplicates operation to the query plan that removes duplicate rows based on all columns. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep = KeepDuplicateMode.First)

Parameters

keep KeepDuplicateMode: Specifies which duplicates to keep (First, Last, or None). Defaults to First.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .DropDuplicates(KeepDuplicateMode.First)
    .Where("Status", "Active")
    .Execute();

DropDuplicates(KeepDuplicateMode, params string[])

Adds a DropDuplicates operation to the query plan that removes duplicate rows based on specific columns. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep, params string[] columns)

Parameters

keep KeepDuplicateMode: Specifies which duplicates to keep (First, Last, or None).
columns string[]: The columns to consider when identifying duplicates.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Remove duplicates based on 'Name' and 'Email', keep first
var result = velocityDataBlock
    .DropDuplicates(KeepDuplicateMode.First, "Name", "Email")
    .Execute();

Exceptions

ArgumentException: Thrown when no columns are specified or columns don't exist.

DropNulls(DropNullMode)

Returns a new DataBlock with rows dropped based on null values.

public DataBlock DropNulls(DropNullMode dropMode)

Parameters

dropMode DropNullMode: Specifies the criteria for dropping rows.

Returns

DataBlock: A new DataBlock with rows dropped according to the specified criteria.

DropNulls(params string[])

Removes rows that contain null values in any of the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient null checking.

public VelocityDataBlock DropNulls(params string[] columnNames)

Parameters

columnNames string[]: The names of the columns to check for null values. If null or empty, all columns are checked.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .DropNulls("Name", "Email")
    .Where("Status", "Active")
    .Execute();

Exceptions

ArgumentException: Thrown when any of the specified column names do not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

EnableBackgroundCompaction(bool)

Enables background compaction for non-blocking optimization

public void EnableBackgroundCompaction(bool enabled = true)

Parameters

enabled bool

Execute()

Executes the accumulated query operations and returns a materialized DataBlock. Uses DFC columnar optimizations when possible for maximum performance.

public DataBlock Execute()

Returns

DataBlock

FillNulls(FillMethod, object, params string[])

Adds a FillNulls operation to the query plan that will fill null values with a constant value. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock FillNulls(FillMethod method, object constantValue, params string[] columnNames)

Parameters

method FillMethod: The fill method to use (typically FillMethod.ConstantValue).
constantValue object: The constant value to use for filling nulls.
columnNames string[]: The columns to apply the fill operation to.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Fill nulls with constant value
var result = velocityDataBlock
    .Where("Status", "Active", ComparisonOperator.Equals)
    .FillNulls(FillMethod.ConstantValue, 0.0, "sales", "revenue")
    .Execute();

Exceptions

ArgumentException: Thrown when no columns are specified.
ObjectDisposedException: Thrown when the data block has been disposed.

FillNulls(FillMethod, params string[])

Adds a FillNulls operation to the query plan that will fill null values according to the specified method. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock FillNulls(FillMethod method, params string[] columnNames)

Parameters

method FillMethod: The fill method to use.
columnNames string[]: The columns to apply the fill operation to.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Country", "USA", ComparisonOperator.Equals)
    .Select("Date", "Temperature", "Humidity")
    .FillNulls(FillMethod.ForwardFill, "Temperature")
    .FillNulls(FillMethod.Mean, "Humidity")
    .Execute();

Exceptions

ArgumentException: Thrown when no columns are specified.
ObjectDisposedException: Thrown when the data block has been disposed.

Filter(Func<Dictionary<string, object>, bool>, params string[])

Filters rows based on a predicate and projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering and projection.

public VelocityDataBlock Filter(Func<Dictionary<string, object>, bool> predicate, params string[] columnNames)

Parameters

predicate Func<Dictionary<string, object>, bool>: A function that determines whether a row should be included based on its values.
columnNames string[]: The names of the columns to include. If null or empty, all columns are included.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Filter(row => (int)row["Age"] > 25 && (string)row["City"] == "New York", "Name", "Age")
    .Where("Department", "Engineering")
    .Execute();

Exceptions

ArgumentNullException: Thrown when the predicate is null.
ArgumentException: Thrown when any of the specified column names do not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

FindRowIdAsync(string)

Finds a row ID by primary key value

public Task<VelocityRowId?> FindRowIdAsync(string primaryKey)

Parameters

primaryKey string

Returns

Task<VelocityRowId?>
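A lookup by primary key can be combined with IsRowDeleted and GetValue, as in this sketch. The helper name and the "Email" column are hypothetical. The type pattern is used because it handles the null result whether VelocityRowId is a struct or a class, which this page does not specify.

```csharp
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class LookupExample
{
    // Hypothetical helper: returns the Email value for a key, or null if the row is missing or deleted.
    public static async Task<object?> ReadEmailAsync(VelocityDataBlock vdb, string customerId)
    {
        // The pattern matches only when a non-null row ID was found.
        if (await vdb.FindRowIdAsync(customerId) is VelocityRowId rowId && !vdb.IsRowDeleted(rowId))
        {
            return vdb.GetValue(rowId, "Email");
        }

        return null;
    }
}
```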

FlushAsync()

Flushes any pending changes to storage

public Task FlushAsync()

Returns

Task

GetColumn(string)

Gets a column by name

public DataColumn? GetColumn(string columnName)

Parameters

columnName string

Returns

DataColumn

GetPrimaryKeyIndexStats()

Gets performance statistics about the primary key index for benchmarking

public (int IndexedKeys, bool IndexBuilt, int Segments) GetPrimaryKeyIndexStats()

Returns

(int IndexedKeys, bool IndexBuilt, int Segments)
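Because the return type is a named tuple, it can be deconstructed directly. A minimal sketch, with a hypothetical helper name:

```csharp
using System;
using Datafication.Storage.Velocity;

public static class IndexStatsExample
{
    public static void PrintIndexStats(VelocityDataBlock vdb)
    {
        // Deconstruct the (IndexedKeys, IndexBuilt, Segments) tuple into locals.
        var (indexedKeys, indexBuilt, segments) = vdb.GetPrimaryKeyIndexStats();

        Console.WriteLine($"Index built: {indexBuilt}, keys: {indexedKeys}, segments: {segments}");
    }
}
```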

GetRowCursor()

Gets a row cursor for iterating through active rows

public IDataRowCursor GetRowCursor()

Returns

IDataRowCursor

GetRowCursor(params string[])

Gets a row cursor for specific columns

public IDataRowCursor GetRowCursor(params string[] columnNames)

Parameters

columnNames string[]

Returns

IDataRowCursor

GetStorageStatsAsync()

Gets comprehensive storage statistics

public Task<StorageStats> GetStorageStatsAsync()

Returns

Task<StorageStats>

GetValue(VelocityRowId, int)

Gets a value by row ID and column index, automatically following any updates to the row

public object? GetValue(VelocityRowId rowId, int columnIndex)

Parameters

rowId VelocityRowId
columnIndex int

Returns

object

GetValue(VelocityRowId, string)

Gets a value by row ID and column name, automatically following any updates to the row

public object? GetValue(VelocityRowId rowId, string columnName)

Parameters

rowId VelocityRowId
columnName string

Returns

object

GetValue(int, int)

Gets a value from the data block (legacy row-based access)

public object? GetValue(int rowIndex, int columnIndex)

Parameters

rowIndex int
columnIndex int

Returns

object

GroupBy(string)

Groups the rows by the specified column name. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping.

public VelocityDataBlock GroupBy(string columnName)

Parameters

columnName string: The name of the column to group by.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .GroupBy("Department")
    .Execute();

Exceptions

ArgumentException: Thrown when the specified column does not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

GroupByAggregate(string, Dictionary<string, AggregationType>, Dictionary<string, string>)

Groups the data by the specified column and applies multiple aggregation functions. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.

public VelocityDataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations, Dictionary<string, string> resultColumnNames = null)

Parameters

groupByColumn string: The column to group by.
aggregations Dictionary<string, AggregationType>: Dictionary mapping column names to aggregation types.
resultColumnNames Dictionary<string, string>: Optional dictionary mapping aggregate columns to custom result column names.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum,
    ["user_id"] = AggregationType.Count
};
var result = velocityDataBlock.GroupByAggregate("user_type", aggregations).Execute();

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation types are invalid.
ObjectDisposedException: Thrown when the data block has been disposed.

GroupByAggregate(string, string, AggregationType, string)

Groups the data by the specified column and applies an aggregation function to another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.

public VelocityDataBlock GroupByAggregate(string groupByColumn, string aggregateColumn, AggregationType aggregationType, string resultColumnName = null)

Parameters

groupByColumn string: The column to group by.
aggregateColumn string: The column to aggregate.
aggregationType AggregationType: The type of aggregation to perform.
resultColumnName string: Optional custom name for the result column. If null, a default name following a pattern such as "avg_columnName" is used.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .GroupByAggregate("user_type", "session_duration", AggregationType.Mean, "avg_duration")
    .Execute();

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation type is invalid for the column type.
ObjectDisposedException: Thrown when the data block has been disposed.

HasColumn(string)

Checks if a column exists

public bool HasColumn(string columnName)

Parameters

columnName string

Returns

bool

Head(int)

Returns the first rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient head operations.

public VelocityDataBlock Head(int rowCount)

Parameters

rowCount int: The number of rows to return.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Head(10)
    .Execute();

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero.
ObjectDisposedException: Thrown when the data block has been disposed.

Info()

Generates and returns a new DataBlock that contains summary information similar to the Info output in Pandas and Microsoft's DataFrame. The resulting DataBlock will include columns for column names, data types, non-null counts, and memory usage. This implementation is optimized for DFC format by reading null bitmaps directly.

public DataBlock Info()

Returns

DataBlock: A new DataBlock containing summary information.

InsertRow(int, object[])

Inserts a row at a specific index (not supported efficiently in segmented storage)

public void InsertRow(int index, object[] values)

Parameters

index int
values object[]

IsRowDeleted(VelocityRowId)

Checks if a row is deleted by row ID

public bool IsRowDeleted(VelocityRowId rowId)

Parameters

rowId VelocityRowId

Returns

bool

Max(params string[])

Calculates the maximum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Max(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

Mean(params string[])

Calculates the mean value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Mean(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

Melt(IEnumerable<string>, string, string)

Unpivots the VelocityDataBlock from wide format to long format by keeping the specified fixed columns and converting the remaining columns into key-value pairs. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient melting operations.

public VelocityDataBlock Melt(IEnumerable<string> fixedColumns, string meltedColumnName, string meltedValueName)

Parameters

fixedColumns IEnumerable<string>: A collection of column names to remain fixed in the output.
meltedColumnName string: The name of the column that will hold the original column names that were melted.
meltedValueName string: The name of the column that will hold the values from the melted columns.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Melt(new[] { "ID", "Name" }, "Attribute", "Value")
    .Execute();

Exceptions

ArgumentNullException: Thrown when fixedColumns is null.
ArgumentException: Thrown when any fixed column does not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

Merge(DataBlock, string, string, MergeMode)

Merges the current VelocityDataBlock with another DataBlock based on specified key columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient join operations.

public VelocityDataBlock Merge(DataBlock other, string keyColumn, string keyColumnOther, MergeMode mergeMode)

Parameters

other DataBlock: The other DataBlock to merge with.
keyColumn string: The name of the key column in this VelocityDataBlock.
keyColumnOther string: The name of the key column in the other DataBlock.
mergeMode MergeMode: The merge mode specifying the type of join (Left, Right, Full, Inner).

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Merge(otherDataBlock, "ID", "UserID", MergeMode.Inner)
    .Select("Name", "Email", "Department")
    .Execute();

Exceptions

ArgumentException: Thrown if a specified key column is not present in the respective DataBlock.
ObjectDisposedException: Thrown when the data block has been disposed.

Min(params string[])

Calculates the minimum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Min(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

OpenAsync(string, VelocityOptions?)

Opens a DFC file as an enterprise VelocityDataBlock with full CRUD support.

public static Task<VelocityDataBlock> OpenAsync(string pathOrId, VelocityOptions? options = null)

Parameters

pathOrId string: The file path or identifier of the DFC file.
options VelocityOptions: Enterprise storage options.

Returns

Task<VelocityDataBlock>: A new VelocityDataBlock instance with segmented storage.

Pivot(IEnumerable<string>, string, string, AggregationType, string)

Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column, using multiple index columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.

public VelocityDataBlock Pivot(IEnumerable<string> indexColumns, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum, string columnNameFormat = "{pivot}_{value}")

Parameters

indexColumns IEnumerable<string>: The columns to use as row identifiers (become row keys in output).
pivotColumn string: The column whose unique values become new column names.
valueColumn string: The column containing values to aggregate.
aggregationType AggregationType: The aggregation function to apply when multiple values exist for the same index/pivot combination.
columnNameFormat string: Format string for generated column names. Use {pivot} for pivot value and {value} for value column name.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Pivot with multiple index columns
var result = velocityDataBlock
    .Pivot(new[] { "Year", "Category" }, "Region", "Sales", AggregationType.Sum)
    .Execute();

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation type is invalid for the column type.
ObjectDisposedException: Thrown when the data block has been disposed.

Pivot(string, string, string, AggregationType)

Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.

public VelocityDataBlock Pivot(string indexColumn, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum)

Parameters

indexColumn string: The column to use as row identifier (becomes row keys in output).
pivotColumn string: The column whose unique values become new column names.
valueColumn string: The column containing values to aggregate.
aggregationType AggregationType: The aggregation function to apply when multiple values exist for the same index/pivot combination.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Year", 2024)
    .Pivot("Category", "Region", "Sales", AggregationType.Sum)
    .Execute();

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation type is invalid for the column type.
ObjectDisposedException: Thrown when the data block has been disposed.

RemoveColumn(params string[])

Removes columns (not supported in segmented storage)

public void RemoveColumn(params string[] columnNames)

Parameters

columnNames string[]

RemoveRow(int)

Removes a row by index (legacy interface support)

public void RemoveRow(int rowIndex)

Parameters

rowIndex int

Sample(int, int?)

Returns a random sample of rowCount rows from the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sampling.

public VelocityDataBlock Sample(int rowCount, int? seed = null)

Parameters

rowCount int: The number of rows to include in the sample.
seed int?: Optional seed for random number generation.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sample(100, seed: 42)
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero or greater than the total number of rows.
ObjectDisposedException: Thrown when the data block has been disposed.

SaveAsync(string, IDataBlock, VelocityOptions?)

Saves a DataBlock to enterprise DFC segmented storage with full CRUD support.

public static Task<VelocityDataBlock> SaveAsync(string pathOrId, IDataBlock source, VelocityOptions? options = null)

Parameters

pathOrId string: The target file path or identifier.
source IDataBlock: The source DataBlock to save.
options VelocityOptions: Enterprise storage options.

Returns

Task<VelocityDataBlock>: A new VelocityDataBlock instance with enterprise features.
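The two static factory methods can be paired for a simple round trip, sketched below. The path "sales.dfc" and the helper name are hypothetical, and the file also needs a using directive for the namespace that declares IDataBlock, which this page does not show. Disposing the saved instance before reopening is a cautious choice of the sketch, not a documented requirement.

```csharp
using System.Threading.Tasks;
using Datafication.Storage.Velocity;
// Also required: a using directive for the namespace that declares IDataBlock.

public static class RoundTripExample
{
    // Hypothetical helper: saves a source block, reopens it, and returns the active row count.
    public static async Task<uint> SaveAndReopenAsync(IDataBlock source)
    {
        var saved = await VelocityDataBlock.SaveAsync("sales.dfc", source);
        saved.Dispose();

        var reopened = await VelocityDataBlock.OpenAsync("sales.dfc");
        try
        {
            return reopened.ActiveRowCount;
        }
        finally
        {
            reopened.Dispose();
        }
    }
}
```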

Select(params string[])

Projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient column selection.

public VelocityDataBlock Select(params string[] columnNames)

Parameters

columnNames string[]: The names of the columns to include. If null or empty, all columns are included.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Select("Name", "Age", "City")
    .Where("Age", 25, ComparisonOperator.GreaterThan)
    .Execute();

Exceptions

ArgumentException: Thrown when any of the specified column names do not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

Sort(SortDirection, string)

Sorts the data in the VelocityDataBlock based on the specified column and sort direction. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sorting.

public VelocityDataBlock Sort(SortDirection direction, string columnName)

Parameters

direction SortDirection: The direction to sort the data (Ascending or Descending).
columnName string: The name of the column to sort by.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentException: Thrown when the specified column does not exist.
ObjectDisposedException: Thrown when the data block has been disposed.

StandardDeviation(params string[])

Calculates the standard deviation for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock StandardDeviation(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
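
Examples

The following is a minimal sketch of how the aggregation might be queued and executed. The file path and the column names "Age" and "Salary" are hypothetical. The Execute() call is taken from the other examples on this page, and its return type is not shown here, so the result is held in a var.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path and column names, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/people.dfc");

// Queue the aggregation for two numeric columns, then run the query plan.
var result = velocityDataBlock
    .StandardDeviation("Age", "Salary")
    .Execute();

velocityDataBlock.Dispose();
```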

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

Sum(params string[])

Calculates the sum for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Sum(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
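
Examples

A short sketch, assuming the aggregation can follow a filter in the same fluent chain, as the other chained examples on this page suggest. The file path and the columns "Status" and "Amount" are hypothetical.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path and column names, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/orders.dfc");

// Filter with the default Equals operator, then sum one numeric column.
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sum("Amount")
    .Execute();

velocityDataBlock.Dispose();
```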

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

Tail(int)

Returns the last rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient tail operations.

public VelocityDataBlock Tail(int rowCount)

Parameters

rowCount int: The number of rows to return.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Filter to active rows, sort them by Name, then take the last 10 rows
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Tail(10)
    .Execute();

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero.
ObjectDisposedException: Thrown when the data block has been disposed.

Transpose(string)

Transposes the rows and columns of the VelocityDataBlock. Converts rows into columns and columns into rows. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient transposition.

public VelocityDataBlock Transpose(string headerColumnName = null)

Parameters

headerColumnName string: Optional. The name of the column to use as headers for the transposed data. If not provided, the first row will be used as headers.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Filter to metric rows, then transpose using MetricName values as headers
var result = velocityDataBlock
    .Where("Type", "Metric")
    .Transpose("MetricName")
    .Execute();

Exceptions

ArgumentException: Thrown when the specified header column does not exist.
ObjectDisposedException: Thrown when the data block has been disposed.

UpdateRow(int, object[])

Updates a row by its index. Provided for legacy interface support.

public void UpdateRow(int rowIndex, object[] values)

Parameters

rowIndex int: The index of the row to update.
values object[]: The new values for the row.

UpdateRowAsync(VelocityRowId, object[])

Updates a row by its internal row ID (O(1) performance).

public Task UpdateRowAsync(VelocityRowId rowId, object[] newValues)

Parameters

rowId VelocityRowId: The internal row ID of the row to update.
newValues object[]: The new values for the row.

Returns

Task

UpdateRowAsync(string, object[])

Updates a row by its primary key. Requires a primary key to be configured.

public Task UpdateRowAsync(string primaryKey, object[] newValues)

Parameters

primaryKey string: The primary key of the row to update.
newValues object[]: The new values for the row.

Returns

Task
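
Examples

A minimal sketch of a single-row update by primary key. It assumes the block was created with a primary key configured, and that newValues holds one value per column in schema order. Neither the configuration step nor the array layout is specified on this page. The path, key, and values are hypothetical.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path; a primary key is assumed to be configured already.
var velocityDataBlock = new VelocityDataBlock("data/customers.dfc");

// Assumption: one value per column, in schema order (Id, Name, Age here).
var newValues = new object[] { "CUST-001", "Ada Lovelace", 37 };

await velocityDataBlock.UpdateRowAsync("CUST-001", newValues);

velocityDataBlock.Dispose();
```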

UpdateRowsAsync(Dictionary<string, object[]>)

Updates multiple rows by their primary keys (optimized batch operation).

public Task UpdateRowsAsync(Dictionary<string, object[]> updates)

Parameters

updates Dictionary<string, object[]>: Maps each primary key to the new values for that row.

Returns

Task
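
Examples

A brief sketch of a batch update keyed by primary key. As with the single-row overload, it assumes a configured primary key and one value per column in schema order. The path, keys, and values are hypothetical.

```csharp
using System.Collections.Generic;
using Datafication.Storage.Velocity;

// Hypothetical file path; a primary key is assumed to be configured already.
var velocityDataBlock = new VelocityDataBlock("data/customers.dfc");

// Each entry maps a primary key to the replacement values for that row.
// Assumption: one value per column, in schema order (Id, Name, Age here).
var updates = new Dictionary<string, object[]>
{
    ["CUST-001"] = new object[] { "CUST-001", "Ada Lovelace", 37 },
    ["CUST-002"] = new object[] { "CUST-002", "Alan Turing", 41 },
};

await velocityDataBlock.UpdateRowsAsync(updates);

velocityDataBlock.Dispose();
```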

UpdateRowsByIndexAsync(Dictionary<int, object[]>)

Updates multiple rows by row index (optimized batch operation using internal row IDs). Ideal when no primary key is configured or when updating by position.

public Task UpdateRowsByIndexAsync(Dictionary<int, object[]> updates)

Parameters

updates Dictionary<int, object[]>: Maps each row index to the new values for that row.

Returns

Task
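
Examples

A brief sketch of a positional batch update. It assumes row indexes are zero-based and that each array holds one value per column in schema order. This page confirms neither detail. The path and values are hypothetical.

```csharp
using System.Collections.Generic;
using Datafication.Storage.Velocity;

// Hypothetical file path, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/readings.dfc");

// Assumption: indexes are zero-based, so 0 and 4 are the first and fifth rows.
var updates = new Dictionary<int, object[]>
{
    [0] = new object[] { "sensor-a", 21.5 },
    [4] = new object[] { "sensor-b", 19.0 },
};

await velocityDataBlock.UpdateRowsByIndexAsync(updates);

velocityDataBlock.Dispose();
```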

ValidateExpression(string, out string)

Validates an expression against the current VelocityDataBlock schema.

public bool ValidateExpression(string expression, out string error)

Parameters

expression string: The expression to validate.
error string: Output parameter containing the error message if validation fails.

Returns

bool: True if expression is valid, false otherwise
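
Examples

A minimal sketch of checking an expression before using it. The expression syntax shown ("Price * Quantity") is an assumption, since this page does not describe the expression grammar. The file path and column names are hypothetical.

```csharp
using System;
using Datafication.Storage.Velocity;

// Hypothetical file path, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/orders.dfc");

// Assumed expression syntax; the column names are hypothetical.
if (velocityDataBlock.ValidateExpression("Price * Quantity", out string error))
{
    Console.WriteLine("Expression is valid.");
}
else
{
    Console.WriteLine($"Invalid expression: {error}");
}

velocityDataBlock.Dispose();
```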

Variance(params string[])

Calculates the variance for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Variance(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
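
Examples

A short sketch of the no-argument form, which the parameter description says aggregates all numeric fields. The file path is hypothetical. Execute() is taken from the other examples on this page.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/measurements.dfc");

// No field names are passed, so all numeric fields are aggregated.
var result = velocityDataBlock
    .Variance()
    .Execute();

velocityDataBlock.Dispose();
```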

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.

Where(string, object, ComparisonOperator)

Filters the data block to include only rows where the specified column satisfies the comparison against the given value (equality by default). This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering.

public VelocityDataBlock Where(string columnName, object value, ComparisonOperator op = ComparisonOperator.Equals)

Parameters

columnName string: The name of the column to filter on.
value object: The value to compare against.
op ComparisonOperator: The comparison operator to use (default: Equals).

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

Examples

// Combine two filters (Age > 25 and City equals "New York"), then keep two columns
var result = velocityDataBlock
    .Where("Age", 25, ComparisonOperator.GreaterThan)
    .Where("City", "New York")
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentNullException: Thrown when columnName is null.
ArgumentException: Thrown when the specified column does not exist in the data block.
ObjectDisposedException: Thrown when the data block has been disposed.

WhereContains(string, string)

Filters the data block for string columns containing the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance. This method is part of the fluent query plan and does not execute immediately.

public VelocityDataBlock WhereContains(string columnName, string pattern)

Parameters

columnName string: The name of the string column to search in.
pattern string: The pattern to search for.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
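
Examples

A minimal sketch of a substring filter chained with a projection. The file path and column names are hypothetical. Whether matching is case-sensitive is not stated on this page, so the sketch does not rely on either behavior.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path and column names, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/customers.dfc");

// Keep rows whose Email contains the pattern, then keep two columns.
var result = velocityDataBlock
    .WhereContains("Email", "@example.com")
    .Select("Name", "Email")
    .Execute();

velocityDataBlock.Dispose();
```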

Exceptions

ArgumentNullException: Thrown when columnName or pattern is null.
ArgumentException: Thrown when the specified column does not exist.
ObjectDisposedException: Thrown when the data block has been disposed.

WhereEndsWith(string, string)

Filters the data block for string columns ending with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.

public VelocityDataBlock WhereEndsWith(string columnName, string pattern)

Parameters

columnName string: The name of the string column to search in.
pattern string: The pattern to search for.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.

WhereStartsWith(string, string)

Filters the data block for string columns starting with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.

public VelocityDataBlock WhereStartsWith(string columnName, string pattern)

Parameters

columnName string: The name of the string column to search in.
pattern string: The pattern to search for.

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
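
Examples

A short sketch combining a prefix filter with the companion suffix filter, WhereEndsWith. It assumes both methods can be chained like the other fluent methods and executed with Execute(), which this page does not state explicitly for them. The file path and column names are hypothetical.

```csharp
using Datafication.Storage.Velocity;

// Hypothetical file path and column names, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/files.dfc");

// Keep rows whose FileName starts with "report_" and ends with ".csv".
var result = velocityDataBlock
    .WhereStartsWith("FileName", "report_")
    .WhereEndsWith("FileName", ".csv")
    .Execute();

velocityDataBlock.Dispose();
```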

Window(string, WindowFunctionType, int?, string, string, string[], WindowFrame, double?, object)

Applies a window function over the dataset, computing values based on a sliding or cumulative window. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations with SIMD vectorization for maximum performance.

public VelocityDataBlock Window(string columnName, WindowFunctionType functionType, int? windowSize = null, string resultColumnName = null, string orderByColumn = null, string[] partitionByColumns = null, WindowFrame frameSpec = null, double? percentile = null, object defaultValue = null)

Parameters

columnName string: The column to apply the window function to (null for RowNumber).
functionType WindowFunctionType: The type of window function to apply.
windowSize int?: Window size for moving functions (required for Moving* functions). Ignored if frameSpec is provided.
resultColumnName string: Name for the result column (auto-generated if null).
orderByColumn string: Column to order by before applying the window function.
partitionByColumns string[]: Columns to partition by (the window function is applied within each partition).
frameSpec WindowFrame: Optional window frame specification (ROWS BETWEEN syntax). If null and windowSize is provided, a frame of the form N PRECEDING AND CURRENT ROW is created automatically.
percentile double?: Percentile value for MovingPercentile function (0.0 to 1.0). Required for MovingPercentile, ignored for other functions.
defaultValue object: Default value to use when Lag/Lead functions reference rows that don't exist. If null, these functions will return null for out-of-bounds references.
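
To make the "N PRECEDING AND CURRENT ROW" frame concrete, here is a plain C# sketch of a trailing moving average. It does not call the library and is not its implementation. It follows the usual SQL convention that the frame is truncated at the start of the data, so early rows average over fewer values. TrailingAverage is a hypothetical helper, and how the library derives N from windowSize is not stated on this page.

```csharp
using System;

// Averages each value with up to `preceding` values before it,
// mirroring a "ROWS BETWEEN preceding PRECEDING AND CURRENT ROW" frame.
static double[] TrailingAverage(double[] values, int preceding)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        // The frame is truncated at the first row.
        int start = Math.Max(0, i - preceding);
        double sum = 0;
        for (int j = start; j <= i; j++)
        {
            sum += values[j];
        }
        result[i] = sum / (i - start + 1);
    }
    return result;
}

double[] averages = TrailingAverage(new double[] { 10, 20, 30, 40 }, preceding: 2);
Console.WriteLine(string.Join(", ", averages)); // prints "10, 15, 20, 30"
```

The first two results use partial frames (one and two values), and from the third row onward each result averages the current row with the two rows before it.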

Returns

VelocityDataBlock: This VelocityDataBlock instance for method chaining.
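
Examples

A sketch of two window calls using named arguments from the signature above. The enum members WindowFunctionType.RowNumber and WindowFunctionType.MovingPercentile are assumed to match the function names mentioned in the parameter descriptions. The namespace that defines WindowFunctionType is not shown on this page. The file path and column names are hypothetical.

```csharp
using Datafication.Storage.Velocity;
// Also import the namespace that defines WindowFunctionType (not shown on this page).

// Hypothetical file path and column names, used only for illustration.
var velocityDataBlock = new VelocityDataBlock("data/prices.dfc");

var result = velocityDataBlock
    // Number the rows within each Symbol partition, ordered by Date.
    // columnName is null, as the parameter description allows for RowNumber.
    .Window(null, WindowFunctionType.RowNumber,
            resultColumnName: "RowNum",
            orderByColumn: "Date",
            partitionByColumns: new[] { "Symbol" })
    // 95th percentile of Close over a 30-row moving window.
    .Window("Close", WindowFunctionType.MovingPercentile,
            windowSize: 30,
            resultColumnName: "Close_P95",
            orderByColumn: "Date",
            partitionByColumns: new[] { "Symbol" },
            percentile: 0.95)
    .Execute();

velocityDataBlock.Dispose();
```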

Remarks

Performance characteristics:

Moving functions (Average, Sum, Min, Max): 30-100M values/sec with SIMD
Cumulative functions: 50-150M values/sec with SIMD prefix sum algorithms
Lag/Lead: Near-memory-bandwidth with vectorized copying
Ranking functions: Optimized with parallel processing for large datasets

The optimization strategy has three tiers, with a scalar fallback:

DFC stats metadata (when available) - metadata-only, 1000x+ faster
SIMD vectorization (numeric types) - 10-50x faster than scalar
Parallel processing (large datasets) - scales with core count
Scalar fallback - maintains correctness for all data types

Exceptions

ObjectDisposedException: Thrown when the data block has been disposed.
ArgumentException: Thrown when parameters are invalid for the specified function type.
