Class VelocityDataBlock
- Namespace
- Datafication.Storage.Velocity
- Assembly
- Datafication.Storage.Velocity.dll
Enterprise-grade data block implementation with full CRUD support using DFC segmented storage. Provides O(1) updates/deletes, automatic compaction, and production-ready data lake features.
public class VelocityDataBlock : IStorageDataBlock, IDataBlock
- Inheritance
- object → VelocityDataBlock
- Implements
- IStorageDataBlock, IDataBlock
Constructors
VelocityDataBlock(string, VelocityOptions?)
Creates a new enterprise VelocityDataBlock with full CRUD support. If the file exists, it is opened with a segmented reader; otherwise the instance is ready to create it.
public VelocityDataBlock(string filePath, VelocityOptions? options = null)
Parameters
filePath (string): The path to the DFC file.
options (VelocityOptions): Enterprise storage options for segmented operations.
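A minimal sketch of constructing a block and releasing it with Dispose(). The file name is a placeholder, and try/finally is used instead of a using statement because this page does not state whether the class implements IDisposable.

```csharp
using System;
using Datafication.Storage.Velocity;

public static class ConstructorSketch
{
    public static void Run()
    {
        // "sales.dfc" is a hypothetical path; options are left at the default (null).
        var vdb = new VelocityDataBlock("sales.dfc");
        try
        {
            // ActiveRowCount excludes deleted rows.
            Console.WriteLine($"Active rows: {vdb.ActiveRowCount}");
        }
        finally
        {
            // Dispose() releases all resources held by the block.
            vdb.Dispose();
        }
    }
}
```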
Properties
ActiveRowCount
Gets the active (non-deleted) row count
public uint ActiveRowCount { get; }
Property Value
- uint
RowCount
Gets the active (non-deleted) row count
public int RowCount { get; }
Property Value
- int
Schema
Gets the schema of the data block
public DataSchema Schema { get; }
Property Value
- DataSchema
SupportsBatchAppend
Supports batch append operations
public bool SupportsBatchAppend { get; }
Property Value
- bool
TotalRowCount
Gets the total row count including deleted rows
public uint TotalRowCount { get; }
Property Value
- uint
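Because TotalRowCount includes deleted rows and ActiveRowCount excludes them, their difference gives the number of deleted rows still held in storage. The helper below is a sketch; its name is hypothetical.

```csharp
using Datafication.Storage.Velocity;

public static class RowCountSketch
{
    // Returns how many rows are marked deleted but still counted in TotalRowCount.
    public static uint CountDeletedRows(VelocityDataBlock vdb)
    {
        return vdb.TotalRowCount - vdb.ActiveRowCount;
    }
}
```

A caller could use this figure to decide when a manual CompactAsync() is worthwhile; any threshold for that decision is application-specific.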
Methods
AddColumn(DataColumn)
Adds a column to the data block schema
public void AddColumn(DataColumn column)
Parameters
column (DataColumn)
AddRow(params object[])
Adds a row to the data block (appends to storage)
public void AddRow(params object[] values)
Parameters
values (object[])
AppendAsync(IDataBlock)
Appends additional data to the data block (enterprise segmented approach)
public Task AppendAsync(IDataBlock additionalData)
Parameters
additionalData (IDataBlock)
Returns
- Task
AppendBatchAsync(DataBlock)
Appends a batch of data efficiently using segmented storage (true append, no rewrites)
public Task AppendBatchAsync(DataBlock batch)
Parameters
batch (DataBlock)
Returns
- Task
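A sketch of appending a prepared batch and then flushing. It assumes the caller has already built the DataBlock, and that DataBlock lives in the Datafication.Core.Data namespace (not stated on this page). Whether an explicit FlushAsync() is required after an append is also not stated here; it is shown as a conservative choice.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Core.Data;          // Assumption: namespace that defines DataBlock
using Datafication.Storage.Velocity;

public static class AppendSketch
{
    public static async Task AppendAndFlushAsync(VelocityDataBlock vdb, DataBlock batch)
    {
        // Check the capability flag before using the batch path.
        if (!vdb.SupportsBatchAppend)
            throw new NotSupportedException("Batch append is not supported by this block.");

        await vdb.AppendBatchAsync(batch);   // true append, no rewrite of existing segments
        await vdb.FlushAsync();              // push pending changes to storage
    }
}
```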
AsResult()
Returns a lazy VelocityResult that enables efficient row counting and streaming without full data materialization. The query plan is executed to compute qualifying row indices, but data is not read until explicitly requested.
public VelocityResult AsResult()
Returns
- VelocityResult
A VelocityResult representing the query results.
Remarks
This method provides significant performance benefits for scenarios where:
- Only the row count is needed (uses SIMD PopCount, ~50-100x faster)
- Streaming enumeration is preferred over full materialization
- Memory usage must be minimized for large result sets
Example usage:
// Efficient counting without full materialization
var result = vdb.Where("Country", "USA").AsResult();
int count = result.RowCount; // Fast, no data read
// Explicit materialization when needed
DataBlock data = result.ToDataBlock();
// Streaming enumeration
foreach (var row in result.EnumerateRows())
{
Console.WriteLine(row["Name"]);
}
ClearQueryPlan()
Clears any pending query operations without executing them.
public VelocityDataBlock ClearQueryPlan()
Returns
- VelocityDataBlock
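Since the fluent methods return the same VelocityDataBlock instance, pending operations accumulate on that instance until Execute() runs. The sketch below queues two operations and then discards them; the column name and filter value are placeholders.

```csharp
using Datafication.Storage.Velocity;

public static class ClearPlanSketch
{
    public static void DiscardPendingQuery(VelocityDataBlock vdb)
    {
        // Queue operations without executing them.
        vdb.Where("Status", "Active").Head(10);

        // Drop the queued operations; nothing is read or materialized.
        vdb.ClearQueryPlan();
    }
}
```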
Clone()
Creates a clone of the data block
public DataBlock Clone()
Returns
- DataBlock
CompactAsync()
Compacts the storage to optimize performance and reclaim space
public Task CompactAsync()
Returns
- Task
CompactAsync(VelocityCompactionStrategy)
Compacts using a specific strategy
public Task CompactAsync(VelocityCompactionStrategy strategy)
Parameters
strategy (VelocityCompactionStrategy)
Returns
- Task
Compute(string, string)
Adds a computed column to the query plan based on an expression. Part of the fluent query plan: execution is deferred until Execute() is called. Uses DFC columnar optimizations and vectorization for high performance.
public VelocityDataBlock Compute(string columnName, string expression)
Parameters
columnName (string): The name for the computed column
expression (string): The expression to evaluate (e.g., "Total Profit / Total Revenue")
Returns
- VelocityDataBlock
The VelocityDataBlock instance for method chaining
Examples
var result = velocityDataBlock
.Select("Total Profit", "Total Revenue", "Country")
.Compute("Profit Margin", "Total Profit / Total Revenue")
.Where("Profit Margin", 0.25, ComparisonOperator.GreaterThan)
.Sort(SortDirection.Descending, "Profit Margin")
.Execute();
ConfigureAutoCompaction(bool, VelocityCompactionTrigger, int)
Configures automatic compaction settings
public void ConfigureAutoCompaction(bool enabled, VelocityCompactionTrigger trigger = VelocityCompactionTrigger.SegmentCount, int threshold = 10)
Parameters
enabled (bool)
trigger (VelocityCompactionTrigger)
threshold (int)
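A sketch that combines the compaction-related members on this page. The threshold of 20 is an arbitrary illustrative value, and the exact meaning of the threshold for each trigger is not described here.

```csharp
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class CompactionSketch
{
    public static async Task ConfigureAndCompactAsync(VelocityDataBlock vdb)
    {
        // Enable automatic compaction using the SegmentCount trigger.
        vdb.ConfigureAutoCompaction(
            enabled: true,
            trigger: VelocityCompactionTrigger.SegmentCount,
            threshold: 20);

        // Let compaction run in the background so it does not block callers.
        vdb.EnableBackgroundCompaction();

        // A one-off manual compaction can still be requested explicitly.
        await vdb.CompactAsync();
    }
}
```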
Count(params string[])
Counts the number of non-null values for each specified column or all columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient counting.
public VelocityDataBlock Count(params string[] fields)
Parameters
fields (string[]): The fields to count. If null or empty, all fields will be counted.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
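A short sketch of Count as part of the query plan. The column names are placeholders, and the DataBlock namespace is an assumption.

```csharp
using Datafication.Core.Data;          // Assumption: namespace that defines DataBlock
using Datafication.Storage.Velocity;

public static class CountSketch
{
    public static DataBlock CountContactFields(VelocityDataBlock vdb)
    {
        // Count() only records the operation; Execute() runs it and materializes the result.
        return vdb.Count("Name", "Email").Execute();
    }
}
```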
CreateEnterprise(string, string?)
Creates a VelocityDataBlock with enterprise features enabled
public static VelocityDataBlock CreateEnterprise(string filePath, string? primaryKeyColumn = null)
Parameters
filePath (string)
primaryKeyColumn (string)
Returns
- VelocityDataBlock
CreateHighThroughput(string, string?)
Creates a VelocityDataBlock optimized for high-throughput workloads
public static VelocityDataBlock CreateHighThroughput(string filePath, string? primaryKeyColumn = null)
Parameters
filePath (string)
primaryKeyColumn (string)
Returns
- VelocityDataBlock
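A sketch of the two factory methods. File names and the "OrderId" key column are hypothetical; passing a primary key column is what later allows the key-based update and delete overloads to be used.

```csharp
using Datafication.Storage.Velocity;

public static class FactorySketch
{
    public static void Run()
    {
        // Enterprise preset with a primary key column configured.
        var orders = VelocityDataBlock.CreateEnterprise("orders.dfc", primaryKeyColumn: "OrderId");

        // High-throughput preset; the primary key column is optional and omitted here.
        var events = VelocityDataBlock.CreateHighThroughput("events.dfc");

        events.Dispose();
        orders.Dispose();
    }
}
```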
DeleteRowAsync(VelocityRowId)
Deletes a row by internal row ID (O(1) performance)
public Task DeleteRowAsync(VelocityRowId rowId)
Parameters
rowId (VelocityRowId)
Returns
- Task
DeleteRowAsync(string)
Deletes a row by primary key (requires primary key configuration)
public Task DeleteRowAsync(string primaryKey)
Parameters
primaryKey (string)
Returns
- Task
DeleteRowsAsync(IEnumerable<string>)
Deletes multiple rows by primary keys (optimized batch operation)
public Task DeleteRowsAsync(IEnumerable<string> primaryKeys)
Parameters
primaryKeys (IEnumerable<string>)
Returns
- Task
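A sketch of single and batch deletes by primary key. It assumes the block was created with a primary key column configured; the key values are placeholders.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class DeleteSketch
{
    public static async Task DeleteCustomersAsync(VelocityDataBlock vdb)
    {
        // Delete one row by its primary key.
        await vdb.DeleteRowAsync("C-1001");

        // Delete several rows in one batch call.
        await vdb.DeleteRowsAsync(new List<string> { "C-1002", "C-1003" });

        // Flush pending changes to storage.
        await vdb.FlushAsync();
    }
}
```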
Dispose()
Disposes the VelocityDataBlock and releases all resources
public void Dispose()
DropDuplicates(KeepDuplicateMode)
Adds a DropDuplicates operation to the query plan that removes duplicate rows based on all columns. This operation is evaluated lazily when Execute() is called.
public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep = KeepDuplicateMode.First)
Parameters
keep (KeepDuplicateMode): Specifies which duplicates to keep (First, Last, or None). Defaults to First.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.DropDuplicates(KeepDuplicateMode.First)
.Where("Status", "Active")
.Execute();
DropDuplicates(KeepDuplicateMode, params string[])
Adds a DropDuplicates operation to the query plan that removes duplicate rows based on specific columns. This operation is evaluated lazily when Execute() is called.
public VelocityDataBlock DropDuplicates(KeepDuplicateMode keep, params string[] columns)
Parameters
keep (KeepDuplicateMode): Specifies which duplicates to keep (First, Last, or None).
columns (string[]): The columns to consider when identifying duplicates.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Remove duplicates based on 'Name' and 'Email', keep first
var result = velocityDataBlock
.DropDuplicates(KeepDuplicateMode.First, "Name", "Email")
.Execute();
Exceptions
- ArgumentException
Thrown when no columns are specified or columns don't exist.
DropNulls(DropNullMode)
Returns a new DataBlock with rows dropped based on null values.
public DataBlock DropNulls(DropNullMode dropMode)
Parameters
dropMode (DropNullMode): Specifies the criteria for dropping rows.
Returns
- DataBlock
A new DataBlock with rows dropped according to the specified criteria.
DropNulls(params string[])
Removes rows that contain null values in any of the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient null checking.
public VelocityDataBlock DropNulls(params string[] columnNames)
Parameters
columnNames (string[]): The names of the columns to check for null values. If null or empty, all columns are checked.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.DropNulls("Name", "Email")
.Where("Status", "Active")
.Execute();
Exceptions
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
EnableBackgroundCompaction(bool)
Enables background compaction for non-blocking optimization
public void EnableBackgroundCompaction(bool enabled = true)
Parameters
enabled (bool)
Execute()
Executes the accumulated query operations and returns a materialized DataBlock. Uses DFC columnar optimizations when possible for maximum performance.
public DataBlock Execute()
Returns
- DataBlock
FillNulls(FillMethod, object, params string[])
Adds a FillNulls operation to the query plan that will fill null values with a constant value. This operation is evaluated lazily when Execute() is called.
public VelocityDataBlock FillNulls(FillMethod method, object constantValue, params string[] columnNames)
Parameters
method (FillMethod): The fill method to use (typically FillMethod.ConstantValue).
constantValue (object): The constant value to use for filling nulls.
columnNames (string[]): The columns to apply the fill operation to.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Fill nulls with constant value
var result = velocityDataBlock
.Where("Status", "Active", ComparisonOperator.Equals)
.FillNulls(FillMethod.ConstantValue, 0.0, "sales", "revenue")
.Execute();
Exceptions
- ArgumentException
Thrown when no columns are specified.
- ObjectDisposedException
Thrown when the data block has been disposed.
FillNulls(FillMethod, params string[])
Adds a FillNulls operation to the query plan that will fill null values according to the specified method. This operation is evaluated lazily when Execute() is called.
public VelocityDataBlock FillNulls(FillMethod method, params string[] columnNames)
Parameters
method (FillMethod): The fill method to use.
columnNames (string[]): The columns to apply the fill operation to.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Country", "USA", ComparisonOperator.Equals)
.Select("Date", "Temperature", "Humidity")
.FillNulls(FillMethod.ForwardFill, "Temperature")
.FillNulls(FillMethod.Mean, "Humidity")
.Execute();
Exceptions
- ArgumentException
Thrown when no columns are specified.
- ObjectDisposedException
Thrown when the data block has been disposed.
Filter(Func<Dictionary<string, object>, bool>, params string[])
Filters rows based on a predicate and projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering and projection.
public VelocityDataBlock Filter(Func<Dictionary<string, object>, bool> predicate, params string[] columnNames)
Parameters
predicate (Func<Dictionary<string, object>, bool>): A function that determines whether a row should be included based on its values.
columnNames (string[]): The names of the columns to include. If null or empty, all columns are included.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Filter(row => (int)row["Age"] > 25 && (string)row["City"] == "New York", "Name", "Age")
.Where("Department", "Engineering")
.Execute();
Exceptions
- ArgumentNullException
Thrown when the predicate is null.
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
FindRowIdAsync(string)
Finds a row ID by primary key value
public Task<VelocityRowId?> FindRowIdAsync(string primaryKey)
Parameters
primaryKey (string)
Returns
- Task<VelocityRowId?>
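A sketch that resolves a primary key to a row ID and then reads a value through it. The "Name" column is a placeholder. Pattern matching is used so the code compiles whether VelocityRowId is a struct or a class, which this page does not specify.

```csharp
#nullable enable
using System.Threading.Tasks;
using Datafication.Storage.Velocity;

public static class RowLookupSketch
{
    public static async Task<object?> ReadNameAsync(VelocityDataBlock vdb, string primaryKey)
    {
        // FindRowIdAsync yields null when no row has this key.
        if (await vdb.FindRowIdAsync(primaryKey) is VelocityRowId rowId
            && !vdb.IsRowDeleted(rowId))
        {
            // GetValue by row ID follows updates to the current version of the row.
            return vdb.GetValue(rowId, "Name");
        }

        return null;
    }
}
```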
FlushAsync()
Flushes any pending changes to storage
public Task FlushAsync()
Returns
- Task
GetColumn(string)
Gets a column by name
public DataColumn? GetColumn(string columnName)
Parameters
columnName (string)
Returns
- DataColumn
GetPrimaryKeyIndexStats()
Gets performance statistics about the primary key index for benchmarking
public (int IndexedKeys, bool IndexBuilt, int Segments) GetPrimaryKeyIndexStats()
Returns
- (int IndexedKeys, bool IndexBuilt, int Segments)
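The return value is a named tuple, so it can be deconstructed directly. A minimal sketch:

```csharp
using System;
using Datafication.Storage.Velocity;

public static class IndexStatsSketch
{
    public static void PrintIndexStats(VelocityDataBlock vdb)
    {
        // Deconstruct the (IndexedKeys, IndexBuilt, Segments) tuple into locals.
        var (indexedKeys, indexBuilt, segments) = vdb.GetPrimaryKeyIndexStats();
        Console.WriteLine($"Indexed keys: {indexedKeys}, built: {indexBuilt}, segments: {segments}");
    }
}
```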
GetRowCursor()
Gets a row cursor for iterating through active rows
public IDataRowCursor GetRowCursor()
Returns
- IDataRowCursor
GetRowCursor(params string[])
Gets a row cursor for specific columns
public IDataRowCursor GetRowCursor(params string[] columnNames)
Parameters
columnNames (string[])
Returns
- IDataRowCursor
GetStorageStatsAsync()
Gets comprehensive storage statistics
public Task<StorageStats> GetStorageStatsAsync()
Returns
- Task<StorageStats>
GetValue(VelocityRowId, int)
Gets a value by row ID with automatic update following
public object? GetValue(VelocityRowId rowId, int columnIndex)
Parameters
rowId (VelocityRowId)
columnIndex (int)
Returns
- object
GetValue(VelocityRowId, string)
Gets a value by row ID and column name with automatic update following
public object? GetValue(VelocityRowId rowId, string columnName)
Parameters
rowId (VelocityRowId)
columnName (string)
Returns
- object
GetValue(int, int)
Gets a value from the data block (legacy row-based access)
public object? GetValue(int rowIndex, int columnIndex)
Parameters
rowIndex (int)
columnIndex (int)
Returns
- object
GroupBy(string)
Groups the rows by the specified column name. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping.
public VelocityDataBlock GroupBy(string columnName)
Parameters
columnName (string): The name of the column to group by.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.GroupBy("Department")
.Execute();
Exceptions
- ArgumentException
Thrown when the specified column does not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
GroupByAggregate(string, Dictionary<string, AggregationType>, Dictionary<string, string>)
Groups the data by the specified column and applies multiple aggregation functions. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.
public VelocityDataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations, Dictionary<string, string> resultColumnNames = null)
Parameters
groupByColumn (string): The column to group by.
aggregations (Dictionary<string, AggregationType>): Dictionary mapping column names to aggregation types.
resultColumnNames (Dictionary<string, string>): Optional dictionary mapping aggregate columns to custom result column names.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
var aggregations = new Dictionary<string, AggregationType>
{
["session_duration"] = AggregationType.Mean,
["page_views"] = AggregationType.Sum,
["user_id"] = AggregationType.Count
};
var result = velocityDataBlock.GroupByAggregate("user_type", aggregations).Execute();
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation types are invalid.
- ObjectDisposedException
Thrown when the data block has been disposed.
GroupByAggregate(string, string, AggregationType, string)
Groups the data by the specified column and applies an aggregation function to another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient grouping and aggregation.
public VelocityDataBlock GroupByAggregate(string groupByColumn, string aggregateColumn, AggregationType aggregationType, string resultColumnName = null)
Parameters
groupByColumn (string): The column to group by.
aggregateColumn (string): The column to aggregate.
aggregationType (AggregationType): The type of aggregation to perform.
resultColumnName (string): Optional custom name for the result column. If null, uses a pattern like "avg_columnName".
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.GroupByAggregate("user_type", "session_duration", AggregationType.Mean, "avg_duration")
.Execute();
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation type is invalid for the column type.
- ObjectDisposedException
Thrown when the data block has been disposed.
HasColumn(string)
Checks if a column exists
public bool HasColumn(string columnName)
Parameters
columnName (string)
Returns
- bool
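Select throws ArgumentException for unknown columns and treats an empty list as "all columns", so HasColumn can be used to guard a projection. The helper below is a sketch; its name and error message are hypothetical, and the DataBlock namespace is an assumption.

```csharp
using System;
using System.Linq;
using Datafication.Core.Data;          // Assumption: namespace that defines DataBlock
using Datafication.Storage.Velocity;

public static class HasColumnSketch
{
    public static DataBlock SelectIfPresent(VelocityDataBlock vdb, params string[] wanted)
    {
        // Keep only the requested columns that exist in the schema.
        var present = wanted.Where(name => vdb.HasColumn(name)).ToArray();

        // An empty list would select every column, so fail explicitly instead.
        if (present.Length == 0)
            throw new ArgumentException("None of the requested columns exist.", nameof(wanted));

        return vdb.Select(present).Execute();
    }
}
```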
Head(int)
Returns the first rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient head operations.
public VelocityDataBlock Head(int rowCount)
Parameters
rowCount (int): The number of rows to return.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Sort(SortDirection.Ascending, "Name")
.Head(10)
.Execute();
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero.
- ObjectDisposedException
Thrown when the data block has been disposed.
Info()
Generates and returns a new DataBlock that contains summary information similar to the Info output in Pandas and Microsoft's DataFrame. The resulting DataBlock includes columns for column names, data types, non-null counts, and memory usage. This implementation is optimized for the DFC format by reading null bitmaps directly.
public DataBlock Info()
Returns
- DataBlock
A new DataBlock containing summary information.
InsertRow(int, object[])
Inserts a row at specific index (not supported efficiently in segmented storage)
public void InsertRow(int index, object[] values)
Parameters
index (int)
values (object[])
IsRowDeleted(VelocityRowId)
Checks if a row is deleted by row ID
public bool IsRowDeleted(VelocityRowId rowId)
Parameters
rowId (VelocityRowId)
Returns
- bool
Max(params string[])
Calculates the maximum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock Max(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
Mean(params string[])
Calculates the mean value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock Mean(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
Melt(IEnumerable<string>, string, string)
Unpivots the VelocityDataBlock from wide format to long format by keeping the specified fixed columns and converting the remaining columns into key-value pairs. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient melting operations.
public VelocityDataBlock Melt(IEnumerable<string> fixedColumns, string meltedColumnName, string meltedValueName)
Parameters
fixedColumns (IEnumerable<string>): A collection of column names to remain fixed in the output.
meltedColumnName (string): The name of the column that will hold the original column names that were melted.
meltedValueName (string): The name of the column that will hold the values from the melted columns.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Melt(new[] { "ID", "Name" }, "Attribute", "Value")
.Execute();
Exceptions
- ArgumentNullException
Thrown when fixedColumns is null.
- ArgumentException
Thrown when any fixed column does not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
Merge(DataBlock, string, string, MergeMode)
Merges the current VelocityDataBlock with another DataBlock based on specified key columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient join operations.
public VelocityDataBlock Merge(DataBlock other, string keyColumn, string keyColumnOther, MergeMode mergeMode)
Parameters
other (DataBlock): The other DataBlock to merge with.
keyColumn (string): The name of the key column in this VelocityDataBlock.
keyColumnOther (string): The name of the key column in the other DataBlock.
mergeMode (MergeMode): The merge mode specifying the type of join (Left, Right, Full, Inner).
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Merge(otherDataBlock, "ID", "UserID", MergeMode.Inner)
.Select("Name", "Email", "Department")
.Execute();
Exceptions
- ArgumentException
Thrown if a specified key column is not present in the respective DataBlock.
- ObjectDisposedException
Thrown when the data block has been disposed.
Min(params string[])
Calculates the minimum value for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock Min(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
OpenAsync(string, VelocityOptions?)
Opens a DFC file as an enterprise VelocityDataBlock with full CRUD support.
public static Task<VelocityDataBlock> OpenAsync(string pathOrId, VelocityOptions? options = null)
Parameters
pathOrId (string): The file path or identifier of the DFC file.
options (VelocityOptions): Enterprise storage options.
Returns
- Task<VelocityDataBlock>
A new VelocityDataBlock instance with segmented storage.
Pivot(IEnumerable<string>, string, string, AggregationType, string)
Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column, using multiple index columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.
public VelocityDataBlock Pivot(IEnumerable<string> indexColumns, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum, string columnNameFormat = "{pivot}_{value}")
Parameters
indexColumns (IEnumerable<string>): The columns to use as row identifiers (become row keys in output).
pivotColumn (string): The column whose unique values become new column names.
valueColumn (string): The column containing values to aggregate.
aggregationType (AggregationType): The aggregation function to apply when multiple values exist for the same index/pivot combination.
columnNameFormat (string): Format string for generated column names. Use {pivot} for pivot value and {value} for value column name.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Pivot with multiple index columns
var result = velocityDataBlock
.Pivot(new[] { "Year", "Category" }, "Region", "Sales", AggregationType.Sum)
.Execute();
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation type is invalid for the column type.
- ObjectDisposedException
Thrown when the data block has been disposed.
Pivot(string, string, string, AggregationType)
Transforms the VelocityDataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient pivot operations.
public VelocityDataBlock Pivot(string indexColumn, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum)
Parameters
indexColumn (string): The column to use as row identifier (becomes row keys in output).
pivotColumn (string): The column whose unique values become new column names.
valueColumn (string): The column containing values to aggregate.
aggregationType (AggregationType): The aggregation function to apply when multiple values exist for the same index/pivot combination.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Year", 2024)
.Pivot("Category", "Region", "Sales", AggregationType.Sum)
.Execute();
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation type is invalid for the column type.
- ObjectDisposedException
Thrown when the data block has been disposed.
RemoveColumn(params string[])
Removes columns (not supported in segmented storage)
public void RemoveColumn(params string[] columnNames)
Parameters
columnNames (string[])
RemoveRow(int)
Removes a row by index (legacy interface support)
public void RemoveRow(int rowIndex)
Parameters
rowIndex (int)
Sample(int, int?)
Returns a random sample of rowCount rows from the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sampling.
public VelocityDataBlock Sample(int rowCount, int? seed = null)
Parameters
rowCount (int): The number of rows to include in the sample.
seed (int?): Optional seed for random number generation.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Sample(100, seed: 42)
.Select("Name", "Age")
.Execute();
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero or greater than the total number of rows.
- ObjectDisposedException
Thrown when the data block has been disposed.
SaveAsync(string, IDataBlock, VelocityOptions?)
Saves a DataBlock to enterprise DFC segmented storage with full CRUD support.
public static Task<VelocityDataBlock> SaveAsync(string pathOrId, IDataBlock source, VelocityOptions? options = null)
Parameters
pathOrId (string): The target file path or identifier.
source (IDataBlock): The source DataBlock to save.
options (VelocityOptions): Enterprise storage options.
Returns
- Task<VelocityDataBlock>
A new VelocityDataBlock instance with enterprise features.
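A sketch of a save-and-reopen round trip using the two static methods on this page. The path is a placeholder, the IDataBlock namespace is an assumption, and disposing the saved instance before reopening is a cautious choice rather than a documented requirement.

```csharp
using System.Threading.Tasks;
using Datafication.Core.Data;          // Assumption: namespace that defines IDataBlock
using Datafication.Storage.Velocity;

public static class SaveSketch
{
    public static async Task<uint> SaveAndReopenAsync(IDataBlock source)
    {
        const string path = "customers.dfc";   // hypothetical target file

        // Write the source block to segmented DFC storage.
        var saved = await VelocityDataBlock.SaveAsync(path, source);
        saved.Dispose();

        // Reopen the same file and report its active row count.
        var reopened = await VelocityDataBlock.OpenAsync(path);
        try
        {
            return reopened.ActiveRowCount;
        }
        finally
        {
            reopened.Dispose();
        }
    }
}
```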
Select(params string[])
Projects the data block to include only the specified columns. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient column selection.
public VelocityDataBlock Select(params string[] columnNames)
Parameters
columnNames (string[]): The names of the columns to include. If null or empty, all columns are included.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Select("Name", "Age", "City")
.Where("Age", 25, ComparisonOperator.GreaterThan)
.Execute();
Exceptions
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
Sort(SortDirection, string)
Sorts the data in the VelocityDataBlock based on the specified column and sort direction. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient sorting.
public VelocityDataBlock Sort(SortDirection direction, string columnName)
Parameters
direction (SortDirection): The direction to sort the data (Ascending or Descending).
columnName (string): The name of the column to sort by.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Sort(SortDirection.Ascending, "Name")
.Select("Name", "Age")
.Execute();
Exceptions
- ArgumentException
Thrown when the specified column does not exist.
- ObjectDisposedException
Thrown when the data block has been disposed.
StandardDeviation(params string[])
Calculates the standard deviation for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock StandardDeviation(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
Sum(params string[])
Calculates the sum for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock Sum(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
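The aggregation methods (Min, Max, Mean, StandardDeviation, Sum) share the same shape: they take optional field names and join the query plan. A sketch using Sum after a filter, with placeholder column names and an assumed DataBlock namespace:

```csharp
using Datafication.Core.Data;          // Assumption: namespace that defines DataBlock
using Datafication.Storage.Velocity;

public static class AggregationSketch
{
    public static DataBlock SumActiveRevenue(VelocityDataBlock vdb)
    {
        // Filter first, then sum two numeric columns; nothing runs until Execute().
        return vdb
            .Where("Status", "Active")
            .Sum("Revenue", "Cost")
            .Execute();
    }
}
```

Replacing Sum with Mean, Min, Max, or StandardDeviation gives the corresponding aggregate under the same signature.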
Tail(int)
Returns the last rowCount rows of the VelocityDataBlock. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient tail operations.
public VelocityDataBlock Tail(int rowCount)
Parameters
rowCount (int): The number of rows to return.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Chain operations with fluent API
var result = velocityDataBlock
.Where("Status", "Active")
.Sort(SortDirection.Ascending, "Name")
.Tail(10)
.Execute();
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero.
- ObjectDisposedException
Thrown when the data block has been disposed.
Transpose(string)
Transposes the rows and columns of the VelocityDataBlock. Converts rows into columns and columns into rows. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient transposition.
public VelocityDataBlock Transpose(string headerColumnName = null)
Parameters
headerColumnName string Optional. The name of the column to use as headers for the transposed data. If not provided, the first row will be used as headers.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Keep metric rows, then transpose using MetricName values as column headers
var result = velocityDataBlock
    .Where("Type", "Metric")
    .Transpose("MetricName")
    .Execute();
Exceptions
- ArgumentException
Thrown when the specified header column does not exist.
- ObjectDisposedException
Thrown when the data block has been disposed.
UpdateRow(int, object[])
Updates a row by index (legacy interface support)
public void UpdateRow(int rowIndex, object[] values)
Parameters
rowIndex int
values object[]
UpdateRowAsync(VelocityRowId, object[])
Updates a row by internal row ID (O(1) performance)
public Task UpdateRowAsync(VelocityRowId rowId, object[] newValues)
Parameters
rowId VelocityRowId
newValues object[]
Returns
- Task
UpdateRowAsync(string, object[])
Updates a row by primary key (requires primary key configuration)
public Task UpdateRowAsync(string primaryKey, object[] newValues)
Parameters
primaryKey string
newValues object[]
Returns
- Task
UpdateRowsAsync(Dictionary<string, object[]>)
Updates multiple rows by primary keys (optimized batch operation)
public Task UpdateRowsAsync(Dictionary<string, object[]> updates)
Parameters
updates Dictionary<string, object[]>
Returns
- Task
UpdateRowsByIndexAsync(Dictionary<int, object[]>)
Updates multiple rows by row index (optimized batch operation using internal row IDs). Ideal when no primary key is configured or when updating by position.
public Task UpdateRowsByIndexAsync(Dictionary<int, object[]> updates)
Parameters
updates Dictionary<int, object[]>
Returns
- Task
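The shape of the updates argument can be sketched as follows. To stay compilable without the library, the sketch receives the update call as a delegate matching the documented signature Task UpdateRowsByIndexAsync(Dictionary<int, object[]>). The names BatchUpdateSketch, BuildUpdates, and ApplyAsync are hypothetical. The sketch also assumes each object[] holds one value per column in schema order, which this reference does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper; only the delegate signature comes from the reference.
public static class BatchUpdateSketch
{
    // Collects replacement rows keyed by row index. If the same index
    // appears twice, the later entry replaces the earlier one.
    public static Dictionary<int, object[]> BuildUpdates(
        IEnumerable<(int RowIndex, object[] Values)> changes)
    {
        var updates = new Dictionary<int, object[]>();
        foreach (var (rowIndex, values) in changes)
        {
            if (rowIndex < 0)
                throw new ArgumentOutOfRangeException(nameof(changes), "Row index must be non-negative.");
            updates[rowIndex] = values ?? throw new ArgumentNullException(nameof(changes));
        }
        return updates;
    }

    // Sends the whole batch in one call; skips the call when there is nothing to update.
    public static Task ApplyAsync(
        Func<Dictionary<int, object[]>, Task> updateRowsByIndex,
        Dictionary<int, object[]> updates)
    {
        return updates.Count == 0 ? Task.CompletedTask : updateRowsByIndex(updates);
    }
}
```

With a VelocityDataBlock instance named block, the call would be await BatchUpdateSketch.ApplyAsync(block.UpdateRowsByIndexAsync, updates), since the method group converts to the delegate type.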
ValidateExpression(string, out string)
Validates an expression against the current VelocityDataBlock schema.
public bool ValidateExpression(string expression, out string error)
Parameters
expression string The expression to validate
error string Output parameter containing error message if validation fails
Returns
- bool
True if expression is valid, false otherwise
Variance(params string[])
Calculates the variance for each specified column or all columns containing numeric primitive data types. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient aggregation.
public VelocityDataBlock Variance(params string[] fields)
Parameters
fields string[] The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
Where(string, object, ComparisonOperator)
Filters the data block to include only rows where the specified column satisfies the comparison against the given value (equality by default). This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations for efficient filtering.
public VelocityDataBlock Where(string columnName, object value, ComparisonOperator op = ComparisonOperator.Equals)
Parameters
columnName string The name of the column to filter on.
value object The value to compare against.
op ComparisonOperator The comparison operator to use (default: Equals).
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Examples
// Keep rows with Age > 25 and City == "New York", then project two columns
var result = velocityDataBlock
    .Where("Age", 25, ComparisonOperator.GreaterThan)
    .Where("City", "New York")
    .Select("Name", "Age")
    .Execute();
Exceptions
- ArgumentNullException
Thrown when columnName is null.
- ArgumentException
Thrown when the specified column does not exist in the data block.
- ObjectDisposedException
Thrown when the data block has been disposed.
WhereContains(string, string)
Filters the data block for string columns containing the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance. This method is part of the fluent query plan and does not execute immediately.
public VelocityDataBlock WhereContains(string columnName, string pattern)
Parameters
columnName string The name of the string column to search in.
pattern string The pattern to search for.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Exceptions
- ArgumentNullException
Thrown when columnName or pattern is null.
- ArgumentException
Thrown when the specified column does not exist.
- ObjectDisposedException
Thrown when the data block has been disposed.
WhereEndsWith(string, string)
Filters the data block for string columns ending with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.
public VelocityDataBlock WhereEndsWith(string columnName, string pattern)
Parameters
columnName string The name of the string column to search in.
pattern string The pattern to search for.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
WhereStartsWith(string, string)
Filters the data block for string columns starting with the specified pattern. Uses Intel AVX2-optimized pattern matching for maximum performance.
public VelocityDataBlock WhereStartsWith(string columnName, string pattern)
Parameters
columnName string The name of the string column to search in.
pattern string The pattern to search for.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Window(string, WindowFunctionType, int?, string, string, string[], WindowFrame, double?, object)
Applies a window function over the dataset, computing values based on a sliding or cumulative window. This method is part of the fluent query plan and does not execute immediately. When executed, it will use DFC columnar optimizations with SIMD vectorization for maximum performance.
public VelocityDataBlock Window(string columnName, WindowFunctionType functionType, int? windowSize = null, string resultColumnName = null, string orderByColumn = null, string[] partitionByColumns = null, WindowFrame frameSpec = null, double? percentile = null, object defaultValue = null)
Parameters
columnName string The column to apply the window function to (null for RowNumber)
functionType WindowFunctionType The type of window function to apply
windowSize int? Window size for moving functions (required for Moving* functions). Ignored if frameSpec is provided.
resultColumnName string Name for the result column (auto-generated if null)
orderByColumn string Column to order by before applying window function
partitionByColumns string[] Columns to partition by (applies window function within each partition)
frameSpec WindowFrame Optional window frame specification (ROWS BETWEEN syntax). If null and windowSize is provided, auto-creates frame (N PRECEDING AND CURRENT ROW).
percentile double? Percentile value for MovingPercentile function (0.0 to 1.0). Required for MovingPercentile, ignored for other functions.
defaultValue object Default value to use when Lag/Lead functions reference rows that don't exist. If null, these functions will return null for out-of-bounds references.
Returns
- VelocityDataBlock
This VelocityDataBlock instance for method chaining.
Remarks
Performance characteristics:
- Moving functions (Average, Sum, Min, Max): 30-100M values/sec with SIMD
- Cumulative functions: 50-150M values/sec with SIMD prefix sum algorithms
- Lag/Lead: Near-memory-bandwidth with vectorized copying
- Ranking functions: Optimized with parallel processing for large datasets
The optimization strategy has three tiers, plus a scalar fallback:
- DFC stats metadata (when available): metadata-only, 1000x+ faster
- SIMD vectorization (numeric types): 10-50x faster than scalar
- Parallel processing (large datasets): scales with core count
- Scalar fallback: maintains correctness for all data types
Exceptions
- ObjectDisposedException
Thrown when the data block has been disposed.
- ArgumentException
Thrown when parameters are invalid for the specified function type.
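To make the frameSpec and defaultValue descriptions concrete, the following plain C# sketch models two of the documented behaviours without calling the library: a Lag that falls back to defaultValue (or null) for rows with no predecessor, and a trailing moving average over a frame of N PRECEDING AND CURRENT ROW. The names WindowSketch, Lag, and MovingAverage are hypothetical. This reference does not specify how windowSize maps to N or how partial frames at the start of a partition are handled, so taking N directly and averaging over the rows available are assumptions of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of window semantics; not the library's implementation.
public static class WindowSketch
{
    // Lag: row i receives the value from row i - offset. Rows without such a
    // predecessor receive defaultValue, which is null unless the caller sets it.
    public static double?[] Lag(IReadOnlyList<double> values, int offset, double? defaultValue = null)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        var result = new double?[values.Count];
        for (int i = 0; i < values.Count; i++)
            result[i] = i >= offset ? values[i - offset] : defaultValue;
        return result;
    }

    // Moving average over the frame "preceding PRECEDING AND CURRENT ROW".
    // Assumption: leading rows average over however many rows exist so far.
    public static double[] MovingAverage(IReadOnlyList<double> values, int preceding)
    {
        if (preceding < 0) throw new ArgumentOutOfRangeException(nameof(preceding));
        var result = new double[values.Count];
        double runningSum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            runningSum += values[i];
            int dropped = i - preceding - 1;   // index of the row that just left the frame
            if (dropped >= 0) runningSum -= values[dropped];
            int rowsInFrame = Math.Min(i, preceding) + 1;
            result[i] = runningSum / rowsInFrame;
        }
        return result;
    }
}
```

For the input 1, 2, 3, 4, 5 with two preceding rows, the sketch yields 1, 1.5, 2, 3, 4. Lagging 10, 20, 30 by one row yields null, 10, 20, or 0, 10, 20 when the default value is 0.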