
Class VelocityDataBlock

Namespace
Datafication.Storage.Velocity
Assembly
Datafication.Storage.Velocity.dll

Enterprise-grade data block implementation with full CRUD support using DFC segmented storage. Provides O(1) updates/deletes, automatic compaction, and production-ready data lake features.

public class VelocityDataBlock : IStorageDataBlock, IDataBlock
Inheritance
object
VelocityDataBlock
Implements
IStorageDataBlock
IDataBlock

Constructors

VelocityDataBlock(string, VelocityOptions?)

Creates a new enterprise VelocityDataBlock with full CRUD support. If the file exists, it is opened with a segmented reader. If it does not exist, the instance is ready for creation.

public VelocityDataBlock(string filePath, VelocityOptions? options = null)

Parameters

filePath string

The path to the DFC file.

options VelocityOptions

Enterprise storage options for segmented operations.

Properties

ActiveRowCount

Gets the active (non-deleted) row count

public uint ActiveRowCount { get; }

Property Value

uint

RowCount

Gets the active (non-deleted) row count as an int

public int RowCount { get; }

Property Value

int

Schema

Gets the schema of the data block

public DataSchema Schema { get; }

Property Value

DataSchema

SupportsBatchAppend

Supports batch append operations

public bool SupportsBatchAppend { get; }

Property Value

bool

TotalRowCount

Gets the total row count including deleted rows

public uint TotalRowCount { get; }

Property Value

uint
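
The relationship between ActiveRowCount and TotalRowCount can be illustrated with a small stand-alone sketch. It assumes deletes are recorded as markers rather than physical removals, which is one plausible reading of the O(1) delete and compaction descriptions on this page; the bitmap below is a hypothetical stand-in, not the library's storage format.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in, not the library's implementation:
// a set bit marks a stored row as deleted.
var deleted = new BitArray(5);   // five stored rows, none deleted yet
deleted[1] = true;               // mark row 1 as deleted
deleted[3] = true;               // mark row 3 as deleted

// Total counts every stored row; active skips the marked ones.
uint totalRowCount = (uint)deleted.Length;
uint activeRowCount = 0;
foreach (bool isDeleted in deleted)
{
    if (!isDeleted) activeRowCount++;
}

Console.WriteLine($"Total: {totalRowCount}, Active: {activeRowCount}");
```

Under this assumption the two counts diverge after deletes and would converge again once compaction reclaims the marked rows.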

Methods

AddColumn(DataColumn)

Adds a column to the data block schema

public void AddColumn(DataColumn column)

Parameters

column DataColumn

AddRow(params object[])

Adds a row to the data block (appends to storage)

public void AddRow(params object[] values)

Parameters

values object[]

AppendAsync(IDataBlock)

Appends additional data to the data block (enterprise segmented approach)

public Task AppendAsync(IDataBlock additionalData)

Parameters

additionalData IDataBlock

Returns

Task

AppendBatchAsync(DataBlock)

Appends a batch of data efficiently using segmented storage (true append, no rewrites)

public Task AppendBatchAsync(DataBlock batch)

Parameters

batch DataBlock

Returns

Task

AsResult()

Returns a lazy VelocityResult that enables efficient row counting and streaming without full data materialization. The query plan is executed to compute qualifying row indices, but data is not read until explicitly requested.

public VelocityResult AsResult()

Returns

VelocityResult

A VelocityResult representing the query results.

Remarks

This method provides significant performance benefits for scenarios where:

  • Only the row count is needed (uses SIMD PopCount, roughly 50-100x faster than materializing the rows)
  • Streaming enumeration is preferred over full materialization
  • Memory usage must be minimized for large result sets

Example usage:

// Efficient counting without full materialization
var result = vdb.Where("Country", "USA").AsResult();
int count = result.RowCount;  // Fast, no data read

// Explicit materialization when needed
DataBlock data = result.ToDataBlock();

// Streaming enumeration
foreach (var row in result.EnumerateRows())
{
    Console.WriteLine(row["Name"]);
}
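
To make the counting remark concrete, here is a minimal sketch of counting qualifying rows from a selection bitmask with a population count. The mask layout is an assumption made for illustration; the page does not describe how VelocityResult stores qualifying row indices.

```csharp
using System;
using System.Numerics;

// Hypothetical selection mask: bit i set means row i qualified.
ulong[] mask = { 0b1011UL, 0b1UL };

// Counting set bits gives the row count without touching any row data.
int count = 0;
foreach (ulong word in mask)
{
    count += BitOperations.PopCount(word);
}

Console.WriteLine(count);
```

The count depends only on the mask words, which is why no column data has to be read to answer it.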

ClearQueryPlan()

Clears any pending query operations without executing them.

public VelocityDataBlock ClearQueryPlan()

Returns

VelocityDataBlock

Clone()

Creates a clone of the data block

public DataBlock Clone()

Returns

DataBlock

CompactAsync()

Compacts the storage to optimize performance and reclaim space

public Task CompactAsync()

Returns

Task

CompactAsync(VelocityCompactionStrategy)

Compacts using a specific strategy

public Task CompactAsync(VelocityCompactionStrategy strategy)

Parameters

strategy VelocityCompactionStrategy

Returns

Task

Compute(string, string)

Adds a computed column to the query plan based on an expression. This method is part of the fluent query plan; execution is deferred until Execute() is called. Uses DFC columnar optimizations and vectorization for high performance.

public VelocityDataBlock Compute(string columnName, string expression)

Parameters

columnName string

The name for the computed column

expression string

The expression to evaluate (e.g., "Total Profit / Total Revenue")

Returns

VelocityDataBlock

The VelocityDataBlock instance for method chaining

Examples

var result = velocityDataBlock
    .Select("Total Profit", "Total Revenue", "Country")
    .Compute("Profit Margin", "Total Profit / Total Revenue")
    .Where("Profit Margin", 0.25, ComparisonOperator.GreaterThan)
    .Sort(SortDirection.Descending, "Profit Margin")
    .Execute();

ConfigureAutoCompaction(bool, VelocityCompactionTrigger, int)

Configures automatic compaction settings

public void ConfigureAutoCompaction(bool enabled, VelocityCompactionTrigger trigger = VelocityCompactionTrigger.SegmentCount, int threshold = 10)

Parameters

enabled bool
trigger VelocityCompactionTrigger
threshold int

Count(params string[])

Counts the number of non-null values for each specified column or all columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient counting.

public VelocityDataBlock Count(params string[] fields)

Parameters

fields string[]

The fields to count. If null or empty, all fields will be counted.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

CreateEnterprise(string, string?)

Creates a VelocityDataBlock with enterprise features enabled

public static VelocityDataBlock CreateEnterprise(string filePath, string? primaryKeyColumn = null)

Parameters

filePath string
primaryKeyColumn string

Returns

VelocityDataBlock

CreateHighThroughput(string, string?)

Creates a VelocityDataBlock optimized for high-throughput workloads

public static VelocityDataBlock CreateHighThroughput(string filePath, string? primaryKeyColumn = null)

Parameters

filePath string
primaryKeyColumn string

Returns

VelocityDataBlock

DeleteRowAsync(VelocityRowId)

Deletes a row by internal row ID (O(1) performance)

public Task DeleteRowAsync(VelocityRowId rowId)

Parameters

rowId VelocityRowId

Returns

Task

DeleteRowAsync(string)

Deletes a row by primary key (requires primary key configuration)

public Task DeleteRowAsync(string primaryKey)

Parameters

primaryKey string

Returns

Task

DeleteRowsAsync(IEnumerable<string>)

Deletes multiple rows by primary keys (optimized batch operation)

public Task DeleteRowsAsync(IEnumerable<string> primaryKeys)

Parameters

primaryKeys IEnumerable<string>

Returns

Task

Dispose()

Disposes the VelocityDataBlock and releases all resources

public void Dispose()

DropDuplicates(KeepDuplicateMode)

Adds a DropDuplicates operation to the query plan that removes duplicate rows based on all columns. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep = KeepDuplicateMode.First)

Parameters

keep KeepDuplicateMode

Specifies which duplicates to keep (First, Last, or None). Defaults to First.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .DropDuplicates(KeepDuplicateMode.First)
    .Where("Status", "Active")
    .Execute();

DropDuplicates(KeepDuplicateMode, params string[])

Adds a DropDuplicates operation to the query plan that removes duplicate rows based on specific columns. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep, params string[] columns)

Parameters

keep KeepDuplicateMode

Specifies which duplicates to keep (First, Last, or None).

columns string[]

The columns to consider when identifying duplicates.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Remove duplicates based on 'Name' and 'Email', keep first
var result = velocityDataBlock
    .DropDuplicates(KeepDuplicateMode.First, "Name", "Email")
    .Execute();

Exceptions

ArgumentException

Thrown when no columns are specified or columns don't exist.
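
The three keep modes can be sketched with plain LINQ over in-memory rows. This is an illustration of the semantics only, and it assumes that None discards every row that has a duplicate (as the equivalent option does in Pandas); the page does not spell that out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<(string Name, string Email, int Visit)>
{
    ("Ann", "ann@example.com", 1),
    ("Bob", "bob@example.com", 2),
    ("Ann", "ann@example.com", 3),
};

// Duplicates are identified by (Name, Email) only; Visit is ignored.
var groups = rows.GroupBy(r => (r.Name, r.Email)).ToList();

// First: keep the earliest row of each group.
var keepFirst = groups.Select(g => g.First()).ToList();
// Last: keep the latest row of each group.
var keepLast = groups.Select(g => g.Last()).ToList();
// None (assumed meaning): keep only rows that have no duplicate at all.
var keepNone = groups.Where(g => g.Count() == 1).Select(g => g.Single()).ToList();

Console.WriteLine($"{keepFirst.Count} {keepLast.Count} {keepNone.Count}");
```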

DropNulls(DropNullMode)

Returns a new DataBlock with rows dropped based on null values.

public DataBlock DropNulls(DropNullMode dropMode)

Parameters

dropMode DropNullMode

Specifies the criteria for dropping rows.

Returns

DataBlock

A new DataBlock with rows dropped according to the specified criteria.

DropNulls(params string[])

Removes rows that contain null values in any of the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient null checking.

public VelocityDataBlock DropNulls(params string[] columnNames)

Parameters

columnNames string[]

The names of the columns to check for null values. If null or empty, all columns are checked.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .DropNulls("Name", "Email")
    .Where("Status", "Active")
    .Execute();

Exceptions

ArgumentException

Thrown when any of the specified column names do not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.

EnableBackgroundCompaction(bool)

Enables background compaction for non-blocking optimization

public void EnableBackgroundCompaction(bool enabled = true)

Parameters

enabled bool

Execute()

Executes the accumulated query operations and returns a materialized DataBlock. Uses DFC columnar optimizations when possible for maximum performance.

public DataBlock Execute()

Returns

DataBlock
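
The deferred-execution pattern used by the fluent methods can be sketched without the library: each call records a step, and nothing is evaluated until the plan is executed. The step list below is a hypothetical miniature, not how VelocityDataBlock represents its query plan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new[] { 1, 2, 3, 4, 5, 6 };
var steps = new List<Func<IEnumerable<int>, IEnumerable<int>>>();
int evaluations = 0;

// Recording steps only stores delegates; no row is inspected here.
steps.Add(rows => rows.Where(x => { evaluations++; return x % 2 == 0; }));
steps.Add(rows => rows.Select(x => x * 10));
int evaluationsBeforeExecute = evaluations;

// "Execute": apply the recorded steps in order and materialize the result.
IEnumerable<int> current = source;
foreach (var step in steps)
{
    current = step(current);
}
List<int> result = current.ToList();

Console.WriteLine($"{evaluationsBeforeExecute} -> {evaluations}: {string.Join(",", result)}");
```

The predicate counter stays at zero until materialization, which mirrors the behavior of methods that are described as not executing immediately.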

FillNulls(FillMethod, object, params string[])

Adds a FillNulls operation to the query plan that will fill null values with a constant value. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock FillNulls(FillMethod method, object constantValue, params string[] columnNames)

Parameters

method FillMethod

The fill method to use (typically FillMethod.ConstantValue).

constantValue object

The constant value to use for filling nulls.

columnNames string[]

The columns to apply the fill operation to.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Fill nulls with constant value
var result = velocityDataBlock
    .Where("Status", "Active", ComparisonOperator.Equals)
    .FillNulls(FillMethod.ConstantValue, 0.0, "sales", "revenue")
    .Execute();

Exceptions

ArgumentException

Thrown when no columns are specified.

ObjectDisposedException

Thrown when the data block has been disposed.

FillNulls(FillMethod, params string[])

Adds a FillNulls operation to the query plan that will fill null values according to the specified method. This operation is evaluated lazily when Execute() is called.

public VelocityDataBlock FillNulls(FillMethod method, params string[] columnNames)

Parameters

method FillMethod

The fill method to use.

columnNames string[]

The columns to apply the fill operation to.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Country", "USA", ComparisonOperator.Equals)
    .Select("Date", "Temperature", "Humidity")
    .FillNulls(FillMethod.ForwardFill, "Temperature")
    .FillNulls(FillMethod.Mean, "Humidity")
    .Execute();

Exceptions

ArgumentException

Thrown when no columns are specified.

ObjectDisposedException

Thrown when the data block has been disposed.
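
As a rough illustration of what ForwardFill and Mean do to a column, the following stand-alone sketch applies both to nullable arrays. It assumes that a leading null has nothing to carry forward and therefore stays null; the library's handling of that edge case is not documented here.

```csharp
using System;
using System.Linq;

double?[] temperature = { 20.0, null, null, 23.0, null };
double?[] humidity = { 40.0, null, 60.0 };

// Forward fill: carry the most recent non-null value into each gap.
var forwardFilled = new double?[temperature.Length];
double? last = null;
for (int i = 0; i < temperature.Length; i++)
{
    if (temperature[i].HasValue) last = temperature[i];
    forwardFilled[i] = last;
}

// Mean fill: replace each null with the mean of the non-null values.
double mean = humidity.Where(h => h.HasValue).Average(h => h.Value);
double[] meanFilled = humidity.Select(h => h ?? mean).ToArray();

Console.WriteLine(string.Join(",", forwardFilled));
Console.WriteLine(string.Join(",", meanFilled));
```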

Filter(Func<Dictionary<string, object>, bool>, params string[])

Filters rows based on a predicate and projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering and projection.

public VelocityDataBlock Filter(Func<Dictionary<string, object>, bool> predicate, params string[] columnNames)

Parameters

predicate Func<Dictionary<string, object>, bool>

A function that determines whether a row should be included based on its values.

columnNames string[]

The names of the columns to include. If null or empty, all columns are included.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Filter(row => (int)row["Age"] > 25 && (string)row["City"] == "New York", "Name", "Age")
    .Where("Department", "Engineering")
    .Execute();

Exceptions

ArgumentNullException

Thrown when the predicate is null.

ArgumentException

Thrown when any of the specified column names do not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.

FindRowIdAsync(string)

Finds a row ID by primary key value

public Task<VelocityRowId?> FindRowIdAsync(string primaryKey)

Parameters

primaryKey string

Returns

Task<VelocityRowId?>

FlushAsync()

Flushes any pending changes to storage

public Task FlushAsync()

Returns

Task

GetColumn(string)

Gets a column by name

public DataColumn? GetColumn(string columnName)

Parameters

columnName string

Returns

DataColumn

GetPrimaryKeyIndexStats()

Gets performance statistics about the primary key index for benchmarking

public (int IndexedKeys, bool IndexBuilt, int Segments) GetPrimaryKeyIndexStats()

Returns

(int IndexedKeys, bool IndexBuilt, int Segments)

GetRowCursor()

Gets a row cursor for iterating through active rows

public IDataRowCursor GetRowCursor()

Returns

IDataRowCursor

GetRowCursor(params string[])

Gets a row cursor for specific columns

public IDataRowCursor GetRowCursor(params string[] columnNames)

Parameters

columnNames string[]

Returns

IDataRowCursor

GetStorageStatsAsync()

Gets comprehensive storage statistics

public Task<StorageStats> GetStorageStatsAsync()

Returns

Task<StorageStats>

GetValue(VelocityRowId, int)

Gets a value by row ID with automatic update following

public object? GetValue(VelocityRowId rowId, int columnIndex)

Parameters

rowId VelocityRowId
columnIndex int

Returns

object

GetValue(VelocityRowId, string)

Gets a value by row ID and column name with automatic update following

public object? GetValue(VelocityRowId rowId, string columnName)

Parameters

rowId VelocityRowId
columnName string

Returns

object

GetValue(int, int)

Gets a value from the data block (legacy row-based access)

public object? GetValue(int rowIndex, int columnIndex)

Parameters

rowIndex int
columnIndex int

Returns

object

GroupBy(string)

Groups the rows by the specified column name. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping.

public VelocityDataBlock GroupBy(string columnName)

Parameters

columnName string

The name of the column to group by.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .GroupBy("Department")
    .Execute();

Exceptions

ArgumentException

Thrown when the specified column does not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.

GroupByAggregate(string, Dictionary<string, AggregationType>, Dictionary<string, string>)

Groups the data by the specified column and applies multiple aggregation functions. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.

public VelocityDataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations, Dictionary<string, string> resultColumnNames = null)

Parameters

groupByColumn string

The column to group by.

aggregations Dictionary<string, AggregationType>

Dictionary mapping column names to aggregation types.

resultColumnNames Dictionary<string, string>

Optional dictionary mapping aggregate columns to custom result column names.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum,
    ["user_id"] = AggregationType.Count
};
var result = velocityDataBlock.GroupByAggregate("user_type", aggregations).Execute();

Exceptions

ArgumentException

Thrown when columns don't exist or aggregation types are invalid.

ObjectDisposedException

Thrown when the data block has been disposed.
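
The result shape of a multi-aggregation group-by can be sketched with LINQ over in-memory tuples. The column names mirror the example above, but the data and the output field names are invented for illustration and say nothing about the library's naming of result columns.

```csharp
using System;
using System.Linq;

var sessions = new[]
{
    (UserType: "free", SessionDuration: 10.0, PageViews: 3),
    (UserType: "paid", SessionDuration: 30.0, PageViews: 8),
    (UserType: "free", SessionDuration: 20.0, PageViews: 5),
};

// One output row per group, one output column per aggregation.
var summary = sessions
    .GroupBy(s => s.UserType)
    .Select(g => (
        UserType: g.Key,
        MeanDuration: g.Average(s => s.SessionDuration),
        TotalPageViews: g.Sum(s => s.PageViews),
        Count: g.Count()))
    .ToList();

foreach (var row in summary)
{
    Console.WriteLine($"{row.UserType}: {row.MeanDuration}, {row.TotalPageViews}, {row.Count}");
}
```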

GroupByAggregate(string, string, AggregationType, string)

Groups the data by the specified column and applies an aggregation function to another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.

public VelocityDataBlock GroupByAggregate(string groupByColumn, string aggregateColumn, AggregationType aggregationType, string resultColumnName = null)

Parameters

groupByColumn string

The column to group by.

aggregateColumn string

The column to aggregate.

aggregationType AggregationType

The type of aggregation to perform.

resultColumnName string

Optional custom name for the result column. If null, a generated name following a pattern such as "avg_columnName" is used.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .GroupByAggregate("user_type", "session_duration", AggregationType.Mean, "avg_duration")
    .Execute();

Exceptions

ArgumentException

Thrown when columns don't exist or aggregation type is invalid for the column type.

ObjectDisposedException

Thrown when the data block has been disposed.

HasColumn(string)

Checks if a column exists

public bool HasColumn(string columnName)

Parameters

columnName string

Returns

bool

Head(int)

Returns the first rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient head operations.

public VelocityDataBlock Head(int rowCount)

Parameters

rowCount int

The number of rows to return.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Head(10)
    .Execute();

Exceptions

ArgumentOutOfRangeException

Thrown when rowCount is less than zero.

ObjectDisposedException

Thrown when the data block has been disposed.

Info()

Generates and returns a new DataBlock that contains summary information similar to the Info output in Pandas and Microsoft's DataFrame. The resulting DataBlock will include columns for column names, data types, non-null counts, and memory usage. This implementation is optimized for DFC format by reading null bitmaps directly.

public DataBlock Info()

Returns

DataBlock

A new DataBlock containing summary information.

InsertRow(int, object[])

Inserts a row at a specific index (not supported efficiently in segmented storage)

public void InsertRow(int index, object[] values)

Parameters

index int
values object[]

IsRowDeleted(VelocityRowId)

Checks if a row is deleted by row ID

public bool IsRowDeleted(VelocityRowId rowId)

Parameters

rowId VelocityRowId

Returns

bool

Max(params string[])

Calculates the maximum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Max(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

Mean(params string[])

Calculates the mean value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Mean(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

Melt(IEnumerable<string>, string, string)

Unpivots the VelocityDataBlock from wide format to long format by keeping the specified fixed columns and converting the remaining columns into key-value pairs. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient melting operations.

public VelocityDataBlock Melt(IEnumerable<string> fixedColumns, string meltedColumnName, string meltedValueName)

Parameters

fixedColumns IEnumerable<string>

A collection of column names to remain fixed in the output.

meltedColumnName string

The name of the column that will hold the original column names that were melted.

meltedValueName string

The name of the column that will hold the values from the melted columns.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Melt(new[] { "ID", "Name" }, "Attribute", "Value")
    .Execute();

Exceptions

ArgumentNullException

Thrown when fixedColumns is null.

ArgumentException

Thrown when any fixed column does not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.
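
The wide-to-long reshaping that Melt describes can be sketched on plain dictionaries. The sample rows and the explicit list of melted columns are assumptions for illustration; in the method itself the melted columns are whatever is not listed as fixed.

```csharp
using System;
using System.Collections.Generic;

var wide = new List<Dictionary<string, object>>
{
    new() { ["ID"] = 1, ["Name"] = "Ann", ["Height"] = 170, ["Weight"] = 60 },
    new() { ["ID"] = 2, ["Name"] = "Bob", ["Height"] = 180, ["Weight"] = 80 },
};
string[] fixedColumns = { "ID", "Name" };
string[] meltedColumns = { "Height", "Weight" };   // every non-fixed column

// Each wide row yields one long row per melted column.
var longRows = new List<Dictionary<string, object>>();
foreach (var row in wide)
{
    foreach (string column in meltedColumns)
    {
        var melted = new Dictionary<string, object>();
        foreach (string fixedColumn in fixedColumns)
        {
            melted[fixedColumn] = row[fixedColumn];
        }
        melted["Attribute"] = column;      // original column name
        melted["Value"] = row[column];     // original cell value
        longRows.Add(melted);
    }
}

Console.WriteLine(longRows.Count);
```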

Merge(DataBlock, string, string, MergeMode)

Merges the current VelocityDataBlock with another DataBlock based on specified key columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient join operations.

public VelocityDataBlock Merge(DataBlock other, string keyColumn, string keyColumnOther, MergeMode mergeMode)

Parameters

other DataBlock

The other DataBlock to merge with.

keyColumn string

The name of the key column in this VelocityDataBlock.

keyColumnOther string

The name of the key column in the other DataBlock.

mergeMode MergeMode

The merge mode specifying the type of join (Left, Right, Full, Inner).

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Merge(otherDataBlock, "ID", "UserID", MergeMode.Inner)
    .Select("Name", "Email", "Department")
    .Execute();

Exceptions

ArgumentException

Thrown if a specified key column is not present in the respective DataBlock.

ObjectDisposedException

Thrown when the data block has been disposed.
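
For readers unfamiliar with join modes, this sketch contrasts Inner and Left semantics using LINQ on in-memory tuples. It illustrates the general join concepts only; the sample data is invented and the library's null handling for unmatched rows may differ.

```csharp
using System;
using System.Linq;

var users = new[] { (ID: 1, Name: "Ann"), (ID: 2, Name: "Bob"), (ID: 3, Name: "Cy") };
var depts = new[]
{
    (UserID: 1, Department: "Eng"),
    (UserID: 3, Department: "Ops"),
    (UserID: 4, Department: "HR"),
};

// Inner: only rows whose key appears on both sides.
var inner = users
    .Join(depts, u => u.ID, d => d.UserID, (u, d) => (u.Name, d.Department))
    .ToList();

// Left: every left row; unmatched rows get a null Department.
var left = users
    .GroupJoin(depts, u => u.ID, d => d.UserID, (u, matches) => (User: u, Matches: matches))
    .SelectMany(
        x => x.Matches.Select(d => d.Department).DefaultIfEmpty(),
        (x, department) => (x.User.Name, Department: department))
    .ToList();

Console.WriteLine($"{inner.Count} {left.Count}");
```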

Min(params string[])

Calculates the minimum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Min(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

OpenAsync(string, VelocityOptions?)

Opens a DFC file as an enterprise VelocityDataBlock with full CRUD support.

public static Task<VelocityDataBlock> OpenAsync(string pathOrId, VelocityOptions? options = null)

Parameters

pathOrId string

The file path or identifier of the DFC file.

options VelocityOptions

Enterprise storage options.

Returns

Task<VelocityDataBlock>

A new VelocityDataBlock instance with segmented storage.

Pivot(IEnumerable<string>, string, string, AggregationType, string)

Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column, using multiple index columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.

public VelocityDataBlock Pivot(IEnumerable<string> indexColumns, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum, string columnNameFormat = "{pivot}_{value}")

Parameters

indexColumns IEnumerable<string>

The columns to use as row identifiers (become row keys in output).

pivotColumn string

The column whose unique values become new column names.

valueColumn string

The column containing values to aggregate.

aggregationType AggregationType

The aggregation function to apply when multiple values exist for the same index/pivot combination.

columnNameFormat string

Format string for generated column names. Use {pivot} for pivot value and {value} for value column name.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Pivot with multiple index columns
var result = velocityDataBlock
    .Pivot(new[] { "Year", "Category" }, "Region", "Sales", AggregationType.Sum)
    .Execute();

Exceptions

ArgumentException

Thrown when columns don't exist or aggregation type is invalid for the column type.

ObjectDisposedException

Thrown when the data block has been disposed.

Pivot(string, string, string, AggregationType)

Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.

public VelocityDataBlock Pivot(string indexColumn, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum)

Parameters

indexColumn string

The column to use as row identifier (becomes row keys in output).

pivotColumn string

The column whose unique values become new column names.

valueColumn string

The column containing values to aggregate.

aggregationType AggregationType

The aggregation function to apply when multiple values exist for the same index/pivot combination.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Year", 2024)
    .Pivot("Category", "Region", "Sales", AggregationType.Sum)
    .Execute();

Exceptions

ArgumentException

Thrown when columns don't exist or aggregation type is invalid for the column type.

ObjectDisposedException

Thrown when the data block has been disposed.
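
The long-to-wide reshaping with a Sum aggregation can be sketched as a nested dictionary keyed by index value and then by pivot value. The sample data is invented, and the nested dictionary is only a stand-in for the wide result; it does not reflect the generated column names.

```csharp
using System;
using System.Collections.Generic;

var sales = new[]
{
    (Category: "Books", Region: "East", Sales: 10.0),
    (Category: "Books", Region: "West", Sales: 5.0),
    (Category: "Books", Region: "East", Sales: 2.5),
    (Category: "Toys", Region: "West", Sales: 7.0),
};

// Outer key: index column value. Inner key: pivot column value.
var pivot = new SortedDictionary<string, SortedDictionary<string, double>>();
foreach (var s in sales)
{
    if (!pivot.TryGetValue(s.Category, out var row))
    {
        row = new SortedDictionary<string, double>();
        pivot[s.Category] = row;
    }
    // Sum aggregation: repeated (Category, Region) pairs accumulate.
    row.TryGetValue(s.Region, out double current);
    row[s.Region] = current + s.Sales;
}

foreach (var (category, row) in pivot)
{
    Console.WriteLine($"{category}: {string.Join(", ", row)}");
}
```

Cells with no source rows (Toys in East here) are simply absent in this sketch; how the library fills such cells is not stated on this page.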

RemoveColumn(params string[])

Removes columns (not supported in segmented storage)

public void RemoveColumn(params string[] columnNames)

Parameters

columnNames string[]

RemoveRow(int)

Removes a row by index (legacy interface support)

public void RemoveRow(int rowIndex)

Parameters

rowIndex int

Sample(int, int?)

Returns a random sample of rowCount rows from the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sampling.

public VelocityDataBlock Sample(int rowCount, int? seed = null)

Parameters

rowCount int

The number of rows to include in the sample.

seed int?

Optional seed for random number generation.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sample(100, seed: 42)
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentOutOfRangeException

Thrown when rowCount is less than zero or greater than the total number of rows.

ObjectDisposedException

Thrown when the data block has been disposed.
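
The effect of the seed parameter can be illustrated with a seeded partial shuffle over row indices. SampleIndices is a hypothetical helper written for this sketch; the library's sampling algorithm is not documented here and may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper: picks rowCount distinct indices from [0, totalRows).
int[] SampleIndices(int totalRows, int rowCount, int? seed = null)
{
    if (rowCount < 0 || rowCount > totalRows)
    {
        throw new ArgumentOutOfRangeException(nameof(rowCount));
    }
    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    int[] indices = Enumerable.Range(0, totalRows).ToArray();
    // Partial Fisher-Yates shuffle: only the first rowCount slots are settled.
    for (int i = 0; i < rowCount; i++)
    {
        int j = rng.Next(i, totalRows);
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }
    return indices.Take(rowCount).ToArray();
}

int[] first = SampleIndices(1000, 5, seed: 42);
int[] second = SampleIndices(1000, 5, seed: 42);

// The same seed yields the same sample within one runtime.
Console.WriteLine(first.SequenceEqual(second));
```

Passing a fixed seed is mainly useful for reproducible tests and benchmarks; omitting it gives a different sample on each run.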

SaveAsync(string, IDataBlock, VelocityOptions?)

Saves a DataBlock to enterprise DFC segmented storage with full CRUD support.

public static Task<VelocityDataBlock> SaveAsync(string pathOrId, IDataBlock source, VelocityOptions? options = null)

Parameters

pathOrId string

The target file path or identifier.

source IDataBlock

The source DataBlock to save.

options VelocityOptions

Enterprise storage options.

Returns

Task<VelocityDataBlock>

A new VelocityDataBlock instance with enterprise features.

Select(params string[])

Projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient column selection.

public VelocityDataBlock Select(params string[] columnNames)

Parameters

columnNames string[]

The names of the columns to include. If null or empty, all columns are included.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Select("Name", "Age", "City")
    .Where("Age", 25, ComparisonOperator.GreaterThan)
    .Execute();

Exceptions

ArgumentException

Thrown when any of the specified column names do not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.

Sort(SortDirection, string)

Sorts the data in the VelocityDataBlock based on the specified column and sort direction. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sorting.

public VelocityDataBlock Sort(SortDirection direction, string columnName)

Parameters

direction SortDirection

The direction to sort the data (Ascending or Descending).

columnName string

The name of the column to sort by.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentException

Thrown when the specified column does not exist.

ObjectDisposedException

Thrown when the data block has been disposed.

StandardDeviation(params string[])

Calculates the standard deviation for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock StandardDeviation(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

Sum(params string[])

Calculates the sum for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Sum(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.
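The "null or empty means all numeric fields" rule described for the fields parameter can be sketched on a plain in-memory table. This is an illustration of the selection rule only, under the assumption that a column counts as numeric when every value is a numeric primitive. SumSketch, SumColumns, and the dictionary layout are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper only; not part of Datafication.Storage.Velocity.
public static class SumSketch
{
    // columns: hypothetical in-memory stand-in mapping column name to column values.
    public static Dictionary<string, double> SumColumns(
        IReadOnlyDictionary<string, object[]> columns, params string[] fields)
    {
        // A null or empty 'fields' array selects every numeric column.
        IEnumerable<string> selected = (fields == null || fields.Length == 0)
            ? columns.Where(c => c.Value.All(IsNumeric)).Select(c => c.Key)
            : fields;

        var totals = new Dictionary<string, double>();
        foreach (string name in selected)
            totals[name] = columns[name].Sum(v => Convert.ToDouble(v));
        return totals;
    }

    // Assumption: these primitive types are the ones treated as numeric.
    private static bool IsNumeric(object value) =>
        value is byte || value is short || value is int || value is long ||
        value is float || value is double || value is decimal;
}
```

Calling SumColumns(columns) with no field names aggregates every numeric column, while SumColumns(columns, "Age") restricts the result to the named column.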

Tail(int)

Returns the last rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient tail operations.

public VelocityDataBlock Tail(int rowCount)

Parameters

rowCount int

The number of rows to return.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Status", "Active")
    .Sort(SortDirection.Ascending, "Name")
    .Tail(10)
    .Execute();

Exceptions

ArgumentOutOfRangeException

Thrown when rowCount is less than zero.

ObjectDisposedException

Thrown when the data block has been disposed.
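The row-selection rule of Tail can be sketched on an ordinary list. The negative-count check mirrors the documented ArgumentOutOfRangeException. Returning every row when rowCount exceeds the number of rows is an assumption, since that case is not documented here. TailSketch is a hypothetical name and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper only; not part of Datafication.Storage.Velocity.
public static class TailSketch
{
    // Returns the last 'rowCount' items of 'rows', preserving their order.
    public static List<T> Tail<T>(IReadOnlyList<T> rows, int rowCount)
    {
        if (rowCount < 0)
            throw new ArgumentOutOfRangeException(nameof(rowCount));

        // Assumption: a request for more rows than exist returns all rows.
        int start = Math.Max(0, rows.Count - rowCount);
        return rows.Skip(start).ToList();
    }
}
```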

Transpose(string)

Transposes the rows and columns of the VelocityDataBlock. Converts rows into columns and columns into rows. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient transposition.

public VelocityDataBlock Transpose(string headerColumnName = null)

Parameters

headerColumnName string

Optional. The name of the column to use as headers for the transposed data. If not provided, the first row will be used as headers.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Type", "Metric")
    .Transpose("MetricName")
    .Execute();

Exceptions

ArgumentException

Thrown when the specified header column does not exist.

ObjectDisposedException

Thrown when the data block has been disposed.

UpdateRow(int, object[])

Updates a row by index. Provided for legacy interface support.

public void UpdateRow(int rowIndex, object[] values)

Parameters

rowIndex int

The index of the row to update.

values object[]

The new values for the row.

UpdateRowAsync(VelocityRowId, object[])

Updates a row by its internal row ID (O(1) performance).

public Task UpdateRowAsync(VelocityRowId rowId, object[] newValues)

Parameters

rowId VelocityRowId

The internal row ID of the row to update.

newValues object[]

The new values for the row.

Returns

Task

UpdateRowAsync(string, object[])

Updates a row by primary key. Requires a primary key to be configured.

public Task UpdateRowAsync(string primaryKey, object[] newValues)

Parameters

primaryKey string

The primary key of the row to update.

newValues object[]

The new values for the row.

Returns

Task

UpdateRowsAsync(Dictionary<string, object[]>)

Updates multiple rows by primary key in a single optimized batch operation.

public Task UpdateRowsAsync(Dictionary<string, object[]> updates)

Parameters

updates Dictionary<string, object[]>

Returns

Task

UpdateRowsByIndexAsync(Dictionary<int, object[]>)

Updates multiple rows by row index (optimized batch operation using internal row IDs). Ideal when no primary key is configured or when updating by position.

public Task UpdateRowsByIndexAsync(Dictionary<int, object[]> updates)

Parameters

updates Dictionary<int, object[]>

Returns

Task

ValidateExpression(string, out string)

Validates an expression against the current VelocityDataBlock schema.

public bool ValidateExpression(string expression, out string error)

Parameters

expression string

The expression to validate.

error string

Output parameter containing the error message if validation fails.

Returns

bool

True if the expression is valid; otherwise, false.

Variance(params string[])

Calculates the variance for each specified column, or for all columns containing numeric primitive data types when no columns are specified. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.

public VelocityDataBlock Variance(params string[] fields)

Parameters

fields string[]

The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

Where(string, object, ComparisonOperator)

Filters the data block to include only rows where the specified column matches the given value. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering.

public VelocityDataBlock Where(string columnName, object value, ComparisonOperator op = ComparisonOperator.Equals)

Parameters

columnName string

The name of the column to filter on.

value object

The value to compare against.

op ComparisonOperator

The comparison operator to use (default: Equals).

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Examples

// Chain operations with fluent API
var result = velocityDataBlock
    .Where("Age", 25, ComparisonOperator.GreaterThan)
    .Where("City", "New York")
    .Select("Name", "Age")
    .Execute();

Exceptions

ArgumentNullException

Thrown when columnName is null.

ArgumentException

Thrown when the specified column does not exist in the data block.

ObjectDisposedException

Thrown when the data block has been disposed.

WhereContains(string, string)

Filters the data block for string columns containing the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance. This method is part of the fluent query plan and does not execute immediately.

public VelocityDataBlock WhereContains(string columnName, string pattern)

Parameters

columnName string

The name of the string column to search in.

pattern string

The pattern to search for.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Exceptions

ArgumentNullException

Thrown when columnName or pattern is null.

ArgumentException

Thrown when the specified column does not exist.

ObjectDisposedException

Thrown when the data block has been disposed.

WhereEndsWith(string, string)

Filters the data block for string columns ending with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.

public VelocityDataBlock WhereEndsWith(string columnName, string pattern)

Parameters

columnName string

The name of the string column to search in.

pattern string

The pattern to search for.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

WhereStartsWith(string, string)

Filters the data block for string columns starting with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.

public VelocityDataBlock WhereStartsWith(string columnName, string pattern)

Parameters

columnName string

The name of the string column to search in.

pattern string

The pattern to search for.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Window(string, WindowFunctionType, int?, string, string, string[], WindowFrame, double?, object)

Applies a window function over the dataset, computing values based on a sliding or cumulative window. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations with SIMD vectorization for maximum performance.

public VelocityDataBlock Window(string columnName, WindowFunctionType functionType, int? windowSize = null, string resultColumnName = null, string orderByColumn = null, string[] partitionByColumns = null, WindowFrame frameSpec = null, double? percentile = null, object defaultValue = null)

Parameters

columnName string

The column to apply the window function to. Pass null for RowNumber.

functionType WindowFunctionType

The type of window function to apply.

windowSize int?

Window size for moving functions (required for Moving* functions). Ignored if frameSpec is provided.

resultColumnName string

Name for the result column (auto-generated if null).

orderByColumn string

Column to order by before applying the window function.

partitionByColumns string[]

Columns to partition by. The window function is applied within each partition.

frameSpec WindowFrame

Optional window frame specification (ROWS BETWEEN syntax). If null and windowSize is provided, a frame is created automatically (N PRECEDING AND CURRENT ROW).

percentile double?

Percentile value for MovingPercentile function (0.0 to 1.0). Required for MovingPercentile, ignored for other functions.

defaultValue object

Default value to use when Lag/Lead functions reference rows that don't exist. If null, these functions will return null for out-of-bounds references.

Returns

VelocityDataBlock

This VelocityDataBlock instance for method chaining.

Remarks

Performance characteristics:

  • Moving functions (Average, Sum, Min, Max): 30-100M values/sec with SIMD
  • Cumulative functions: 50-150M values/sec with SIMD prefix sum algorithms
  • Lag/Lead: Near-memory-bandwidth with vectorized copying
  • Ranking functions: Optimized with parallel processing for large datasets

The optimization strategy has three tiers plus a scalar fallback:

  1. DFC stats metadata (when available) - metadata-only, 1000x+ faster
  2. SIMD vectorization (numeric types) - 10-50x faster than scalar
  3. Parallel processing (large datasets) - scales with core count
  4. Scalar fallback - maintains correctness for all data types
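Tiers 2 and 4 of the list above can be illustrated with a generic vectorized sum that falls back to a scalar loop. This sketch uses System.Numerics.Vector<T> from the .NET base class library and is not the library's internal implementation. SimdSumSketch is a hypothetical name.

```csharp
using System;
using System.Numerics;

// Illustrative helper only; not part of Datafication.Storage.Velocity.
public static class SimdSumSketch
{
    public static double Sum(double[] values)
    {
        int i = 0;
        double total = 0;

        // SIMD path: add Vector<double>.Count lanes per iteration.
        if (Vector.IsHardwareAccelerated && values.Length >= Vector<double>.Count)
        {
            int width = Vector<double>.Count;
            var acc = Vector<double>.Zero;
            for (; i <= values.Length - width; i += width)
                acc += new Vector<double>(values, i);

            // Horizontal add of the accumulator lanes.
            total = Vector.Dot(acc, Vector<double>.One);
        }

        // Scalar fallback: handles the remaining tail, or the whole array
        // when hardware acceleration is unavailable.
        for (; i < values.Length; i++)
            total += values[i];

        return total;
    }
}
```

Because the vectorized path adds values in a different order than a plain loop, results for non-integer floating-point data can differ from a scalar sum in the last few bits.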

Exceptions

ObjectDisposedException

Thrown when the data block has been disposed.

ArgumentException

Thrown when parameters are invalid for the specified function type.
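The frame and default-value semantics described for frameSpec and defaultValue can be sketched on plain arrays. This is a conceptual illustration only: the preceding and offset parameters are hypothetical, and averaging over a partial frame at the start of the data is an assumption rather than documented behavior. WindowSketch is not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper only; not part of Datafication.Storage.Velocity.
public static class WindowSketch
{
    // Moving average over a frame of 'preceding' rows before the current row
    // plus the current row (ROWS BETWEEN preceding PRECEDING AND CURRENT ROW).
    public static double[] MovingAverage(IReadOnlyList<double> values, int preceding)
    {
        if (preceding < 0)
            throw new ArgumentOutOfRangeException(nameof(preceding));

        var result = new double[values.Count];
        double windowSum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            windowSum += values[i];

            // Drop the row that just slid out of the frame, if any.
            int dropped = i - preceding - 1;
            if (dropped >= 0)
                windowSum -= values[dropped];

            // Assumption: early rows average over the rows available so far.
            int rowsInFrame = Math.Min(i, preceding) + 1;
            result[i] = windowSum / rowsInFrame;
        }
        return result;
    }

    // Lag: each row sees the value 'offset' rows earlier, or 'defaultValue'
    // when that row does not exist.
    public static T[] Lag<T>(IReadOnlyList<T> values, int offset, T defaultValue)
    {
        if (offset < 0)
            throw new ArgumentOutOfRangeException(nameof(offset));

        var result = new T[values.Count];
        for (int i = 0; i < values.Count; i++)
            result[i] = i - offset >= 0 ? values[i - offset] : defaultValue;
        return result;
    }
}
```

The running-sum approach in MovingAverage updates each output in constant time instead of re-adding the whole frame for every row.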