Class DataBlock

Namespace: Datafication.Core.Data

Assembly: Datafication.Core.dll

Represents a block of data with rows and columns.

public class DataBlock : IDataBlock

Inheritance: object → DataBlock

Implements: IDataBlock

Extension Methods:

HtmlTableSinkExtension.HtmlTable(DataBlock, int, bool)

HtmlTableSinkExtension.HtmlTableAsync(DataBlock, int, bool)

TextTableSinkExtension.TextTableAsync(DataBlock, int)

CsvStringSinkExtension.CsvStringSink(DataBlock)

CsvStringSinkExtension.CsvStringSinkAsync(DataBlock)

JsonStringSinkExtension.JsonStringSink(DataBlock)

JsonStringSinkExtension.JsonStringSinkAsync(DataBlock)

ParquetSinkExtension.ParquetSink(DataBlock, CompressionMethod)

ParquetSinkExtension.ParquetSink(DataBlock, out List<string>, CompressionMethod)

ParquetSinkExtension.ParquetSinkAsync(DataBlock, CompressionMethod)

ParquetSinkExtension.ParquetSinkWithSkippedColumns(DataBlock, CompressionMethod)

ParquetSinkExtension.ParquetSinkWithSkippedColumnsAsync(DataBlock, CompressionMethod)

ScreenshotSinkExtension.Screenshot(DataBlock, int, string?)

ScreenshotSinkExtension.ScreenshotAsync(DataBlock, int, string?)

ScreenshotSinkExtension.ScreenshotHighResAsync(DataBlock, int, string?)

ScreenshotSinkExtension.ScreenshotToFile(DataBlock, string, int, string?, ScreenshotFormat, int?)

ScreenshotSinkExtension.ScreenshotToFileAsync(DataBlock, string, int, string?, ScreenshotFormat, int?)

Constructors

DataBlock()

Initializes a new instance of the DataBlock class.

public DataBlock()

DataBlock(DataBlockSnapshot)

public DataBlock(DataBlockSnapshot snapshot)

Parameters

snapshot DataBlockSnapshot

Properties

Connector

public static ConnectorExtensions Connector { get; }

Property Value

ConnectorExtensions

IsDisposed

Gets a value indicating whether this DataBlock has been disposed.

public bool IsDisposed { get; }

Property Value

bool

this[int, string]

Gets or sets the value for a specified row and column.

public object this[int row, string columnName] { get; set; }

Parameters

row int: The row index.
columnName string: The column name.

Property Value

object: The value at the specified row and column.
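As a rough illustration of cell access through this indexer, the sketch below uses only members documented on this page. It assumes that row indexes are zero-based and that DataColumn resolves from the Datafication.Core.Data namespace; neither is stated above.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 25 });

// Read a single cell (row index assumed to be zero-based).
object age = block[0, "Age"];
Console.WriteLine(age);

// Overwrite the same cell in place, then read it back.
block[0, "Age"] = 26;
Console.WriteLine(block[0, "Age"]);
```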

this[string]

Gets a column by its name.

public DataColumn this[string columnName] { get; }

Parameters

columnName string: The name of the column to retrieve.

Property Value

DataColumn: The DataColumn instance representing the column.

RowCount

Gets the number of rows in the data block.

public int RowCount { get; }

Property Value

int

Schema

Gets the schema of the data block.

public DataSchema Schema { get; }

Property Value

DataSchema

Methods

AddColumn(DataColumn)

Adds a column to the data block.

public void AddColumn(DataColumn column)

Parameters

column DataColumn: The column to add.

AddRow(object[])

Adds a row to the data block by updating the values in each column.

public void AddRow(object[] values)

Parameters

values object[]: The values to add as a new row.

AppendRowsBatch(DataBlock)

Efficiently appends rows from another DataBlock using batch operations.

public void AppendRowsBatch(DataBlock source)

Parameters

source DataBlock: The source DataBlock to append from.

Exceptions

InvalidOperationException: Thrown when schemas don't match.
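The following sketch shows one way to append a batch while handling the documented schema-mismatch exception. NewPeopleBlock is a hypothetical helper introduced only for this example, and DataColumn is assumed to resolve from the namespace shown.

```csharp
using System;
using Datafication.Core.Data;

var target = NewPeopleBlock();
target.AddRow(new object[] { "John", 25 });

var source = NewPeopleBlock();
source.AddRow(new object[] { "Jane", 30 });
source.AddRow(new object[] { "Bob", 22 });

try
{
    // Both blocks share the same column names and types, so this should succeed.
    target.AppendRowsBatch(source);
    Console.WriteLine(target.RowCount);
}
catch (InvalidOperationException ex)
{
    // Documented failure mode: the two schemas do not match.
    Console.WriteLine($"Schema mismatch: {ex.Message}");
}

// Hypothetical helper: builds an empty block with a fixed two-column schema.
static DataBlock NewPeopleBlock()
{
    var block = new DataBlock();
    block.AddColumn(new DataColumn("Name", typeof(string)));
    block.AddColumn(new DataColumn("Age", typeof(int)));
    return block;
}
```

Because AppendRowsBatch returns void, the rows are added to the target block itself rather than to a copy.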

Clone()

Creates a clone of the current data block.

public DataBlock Clone()

Returns

DataBlock: A new DataBlock instance that is a copy of the current block.

Compute(string, string)

Adds a computed column to the DataBlock based on an expression and returns a new DataBlock with all existing columns plus the computed column. Performance note: column references are shared rather than copied, which avoids redundant copying when calls are chained.

public DataBlock Compute(string columnName, string expression)

Parameters

columnName string: The name for the computed column
expression string: The expression to evaluate (e.g., "Total Profit / Total Revenue")

Returns

DataBlock: A new DataBlock with the computed column added

Examples

var result = dataBlock
    .Select("Total Profit", "Total Revenue", "Country")
    .Compute("Profit Margin", "Total Profit / Total Revenue")
    .Where("Profit Margin", 0.25, ComparisonOperator.GreaterThan)
    .Execute();

CopyRowRange(int, int)

Efficiently copies a range of rows using direct column operations.

public DataBlock CopyRowRange(int startRow, int rowCount)

Parameters

startRow int: The starting row index.
rowCount int: The number of rows to copy.

Returns

DataBlock: A new DataBlock containing the specified row range.

Exceptions

ArgumentOutOfRangeException: Thrown when startRow or rowCount are invalid.
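One plausible use of CopyRowRange is paging through a block in fixed-size chunks, sketched below. The example assumes startRow is zero-based, which the description above does not state, and clamps the final page so the requested range never exceeds RowCount.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Id", typeof(int)));
for (int i = 1; i <= 5; i++)
{
    block.AddRow(new object[] { i });
}

const int pageSize = 2;
for (int start = 0; start < block.RowCount; start += pageSize)
{
    // Clamp the last page so startRow + rowCount stays within the block.
    int count = Math.Min(pageSize, block.RowCount - start);
    DataBlock page = block.CopyRowRange(start, count);
    Console.WriteLine($"start={start}, rows={page.RowCount}");
}
```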

Dispose()

Releases all resources used by the DataBlock.

public void Dispose()

Dispose(bool)

Releases the unmanaged resources used by the DataBlock and optionally releases the managed resources.

protected virtual void Dispose(bool disposing)

Parameters

disposing bool: true to release both managed and unmanaged resources; false to release only unmanaged resources.
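A minimal sketch of deterministic cleanup with the members documented here. It calls Dispose() in a finally block instead of a using statement, because this page lists only IDataBlock as an implemented interface and does not confirm IDisposable.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
try
{
    block.AddColumn(new DataColumn("Name", typeof(string)));
    block.AddRow(new object[] { "John" });
    Console.WriteLine(block.RowCount);
}
finally
{
    // Release the block's resources even if the work above throws.
    block.Dispose();
}

// IsDisposed reports whether Dispose has already run.
Console.WriteLine(block.IsDisposed);
```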

DropDuplicates(KeepDuplicateMode)

Returns a new DataBlock with duplicate rows removed based on all columns.

public DataBlock DropDuplicates(KeepDuplicateMode keep = KeepDuplicateMode.First)

Parameters

keep KeepDuplicateMode: Specifies which duplicates to keep (First, Last, or None). Defaults to First.

Returns

DataBlock: A new DataBlock with duplicates removed.

Examples

// Keep the first occurrence of each duplicated row
var keepFirst = dataBlock.DropDuplicates();

// Keep the last occurrence of each duplicated row
var keepLast = dataBlock.DropDuplicates(KeepDuplicateMode.Last);

// Drop every row that has a duplicate (keep only unique rows)
var uniqueOnly = dataBlock.DropDuplicates(KeepDuplicateMode.None);

DropDuplicates(KeepDuplicateMode, params string[])

Returns a new DataBlock with duplicate rows removed based on specific columns.

public DataBlock DropDuplicates(KeepDuplicateMode keep, params string[] columns)

Parameters

keep KeepDuplicateMode: Specifies which duplicates to keep (First, Last, or None).
columns string[]: The columns to consider when identifying duplicates.

Returns

DataBlock: A new DataBlock with duplicates removed.

Examples

// Remove duplicates based on the 'Name' column, keep first
var byName = dataBlock.DropDuplicates(KeepDuplicateMode.First, "Name");

// Remove duplicates based on 'Name' and 'Email', keep last
var byNameAndEmail = dataBlock.DropDuplicates(KeepDuplicateMode.Last, "Name", "Email");

Exceptions

ArgumentException: Thrown when no columns are specified or columns don't exist.

DropNulls(DropNullMode)

Returns a new DataBlock with rows dropped based on null values.

public DataBlock DropNulls(DropNullMode dropMode)

Parameters

dropMode DropNullMode: Specifies the criteria for dropping rows.

Returns

DataBlock: A new DataBlock with rows dropped according to the specified criteria.

FillNulls(FillMethod, object, params string[])

Returns a new DataBlock with null values filled with a constant value.

public DataBlock FillNulls(FillMethod method, object constantValue, params string[] columnNames)

Parameters

method FillMethod: The fill method to use (should be FillMethod.ConstantValue).
constantValue object: The constant value to use for filling nulls.
columnNames string[]: The columns to apply the fill operation to.

Returns

DataBlock: A new DataBlock with filled values.

Exceptions

ArgumentException: Thrown when no columns are specified or columns don't exist.
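The sketch below fills missing values in one column with a constant. It assumes that AddRow accepts null for a string column and that FillMethod resolves from the namespace shown; neither is confirmed on this page.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("City", typeof(string)));
block.AddRow(new object[] { "John", "London" });
block.AddRow(new object[] { "Jane", null }); // assumed: null is accepted here

// Replace nulls in "City" only; the original block is left unchanged.
DataBlock filled = block.FillNulls(FillMethod.ConstantValue, "Unknown", "City");
Console.WriteLine(filled[1, "City"]);
```

At least one column name must be supplied, since the method is documented to throw ArgumentException when none are specified.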

FillNulls(FillMethod, params string[])

Returns a new DataBlock with null values filled according to the specified method.

public DataBlock FillNulls(FillMethod method, params string[] columnNames)

Parameters

method FillMethod: The fill method to use.
columnNames string[]: The columns to apply the fill operation to.

Returns

DataBlock: A new DataBlock with filled values.

Exceptions

ArgumentException: Thrown when no columns are specified or columns don't exist.

Filter(Func<Dictionary<string, object>, bool>, params string[])

Filters rows based on a predicate and projects the data block to include only the specified columns.

public DataBlock Filter(Func<Dictionary<string, object>, bool> predicate, params string[] columnNames)

Parameters

predicate Func<Dictionary<string, object>, bool>: A function that determines whether a row should be included based on its values.
columnNames string[]: The names of the columns to include. If null or empty, all columns are included.

Returns

DataBlock: A new DataBlock containing only the filtered rows and specified columns.

Examples

// Create a data block with multiple columns
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));

// Add some rows
dataBlock.AddRow(new object[] { "John", 25, "New York" });
dataBlock.AddRow(new object[] { "Jane", 30, "London" });
dataBlock.AddRow(new object[] { "Bob", 22, "Paris" });

// Filter rows where Age > 25 and project only Name and Age columns
var filteredBlock = dataBlock.Filter(
    row => (int)row["Age"] > 25,
    "Name", "Age"
);

Exceptions

ArgumentNullException: Thrown when the predicate is null.
ArgumentException: Thrown when any of the specified column names do not exist in the data block.

FilterWithCursor(Func<IDataRowCursor, bool>, params string[])

Filters rows using a cursor-based predicate and projects the specified columns.

public DataBlock FilterWithCursor(Func<IDataRowCursor, bool> predicate, params string[] columnNames)

Parameters

predicate Func<IDataRowCursor, bool>: A function to test each row using a cursor. The cursor provides access to all column values for the current row.
columnNames string[]: The names of the columns to include in the result. If null or empty, all columns are included.

Returns

DataBlock: A new DataBlock containing only the rows that satisfy the predicate and the specified columns.

Examples

// Create a data block with sample data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));
dataBlock.AddRow(new object[] { "John", 30, "London" });
dataBlock.AddRow(new object[] { "Jane", 25, "Paris" });

// Filter rows where Age > 25 and City starts with 'L', project all columns
var filteredBlock = dataBlock.FilterWithCursor(
    cursor => (int)cursor.GetValue("Age") > 25 && ((string)cursor.GetValue("City")).StartsWith("L")
);

Exceptions

ArgumentNullException: Thrown when the predicate is null.
ArgumentException: Thrown when any of the specified column names do not exist in the data block.

GetColumn(string)

Gets a column by its name.

public DataColumn GetColumn(string columnName)

Parameters

columnName string: The name of the column to retrieve.

Returns

DataColumn: The DataColumn instance representing the column.

GetRowCursor(params string[])

Gets a row cursor for iterating over rows with specified columns.

public IDataRowCursor GetRowCursor(params string[] columnNames)

Parameters

columnNames string[]: The names of the columns to include in the cursor.

Returns

IDataRowCursor: An IDataRowCursor that allows iteration over the rows.

GroupBy(string)

Groups the rows by the specified column name. Uses index-based grouping for 3-5x better performance by avoiding per-row array allocations.

public DataBlockGroup GroupBy(string columnName)

Parameters

columnName string: The name of the column to group by.

Returns

DataBlockGroup: A DataBlockGroup containing the grouped data.

GroupByAggregate(string, Dictionary<string, AggregationType>)

Groups the data by the specified column and applies multiple aggregation functions. This method provides SQL-style GROUP BY functionality with multiple aggregations in a single operation.

public DataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations)

Parameters

groupByColumn string: The column to group by.
aggregations Dictionary<string, AggregationType>: Dictionary mapping column names to aggregation types.

Returns

DataBlock: A new DataBlock with group keys and multiple aggregated columns.

Examples

var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum,
    ["user_id"] = AggregationType.Count
};
var result = dataBlock.GroupByAggregate("user_type", aggregations);

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation types are invalid.

GroupByAggregate(string, Dictionary<string, AggregationType>, Dictionary<string, string>)

Groups the data by the specified column and applies multiple aggregation functions with custom result column names. This method provides SQL-style GROUP BY functionality with multiple aggregations in a single operation.

public DataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations, Dictionary<string, string> resultColumnNames)

Parameters

groupByColumn string: The column to group by.
aggregations Dictionary<string, AggregationType>: Dictionary mapping column names to aggregation types.
resultColumnNames Dictionary<string, string>: Dictionary mapping aggregate columns to custom result column names.

Returns

DataBlock: A new DataBlock with group keys and multiple aggregated columns.

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation types are invalid.
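To illustrate the resultColumnNames parameter, here is a sketch that renames both aggregate outputs. The data and result names are made up for the example, and it assumes the rename dictionary is keyed by the aggregated column name, as the parameter description suggests.

```csharp
using System;
using System.Collections.Generic;
using Datafication.Core.Data;

var sessions = new DataBlock();
sessions.AddColumn(new DataColumn("user_type", typeof(string)));
sessions.AddColumn(new DataColumn("session_duration", typeof(double)));
sessions.AddColumn(new DataColumn("page_views", typeof(int)));
sessions.AddRow(new object[] { "free", 10.0, 3 });
sessions.AddRow(new object[] { "free", 20.0, 5 });
sessions.AddRow(new object[] { "paid", 40.0, 9 });

var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum
};

// Assumed: keys are the aggregated column names, values are the output names.
var resultColumnNames = new Dictionary<string, string>
{
    ["session_duration"] = "avg_duration",
    ["page_views"] = "total_views"
};

DataBlock summary = sessions.GroupByAggregate("user_type", aggregations, resultColumnNames);
Console.WriteLine(summary.RowCount);
```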

GroupByAggregate(string, string, AggregationType, string)

Groups the data by the specified column and applies an aggregation function to another column. This method provides SQL-style GROUP BY functionality with aggregation in a single operation.

public DataBlock GroupByAggregate(string groupByColumn, string aggregateColumn, AggregationType aggregationType, string resultColumnName = null)

Parameters

groupByColumn string: The column to group by.
aggregateColumn string: The column to aggregate.
aggregationType AggregationType: The type of aggregation to perform.
resultColumnName string: Optional custom name for the result column. If null, uses pattern like "avg_columnName".

Returns

DataBlock: A new DataBlock with group keys and aggregated values.

Examples

// SQL: SELECT user_type, AVG(session_duration) AS avg_duration FROM user_sessions GROUP BY user_type
var result = dataBlock.GroupByAggregate("user_type", "session_duration", AggregationType.Mean, "avg_duration");

Exceptions

ArgumentException: Thrown when columns don't exist or aggregation type is invalid for the column type.

HasColumn(string)

Determines whether the data block contains a column with the specified name.

public bool HasColumn(string columnName)

Parameters

columnName string: The name of the column to check.

Returns

bool: true if the column exists; otherwise, false.

Head(int)

Returns a new DataBlock containing the first rowCount rows of the current DataBlock.

public DataBlock Head(int rowCount)

Parameters

rowCount int: The number of rows to return.

Returns

DataBlock: A new DataBlock containing the first rowCount rows.

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero.

Info()

Returns a new DataBlock that contains summary information, similar to DataFrame.info() in pandas and DataFrame.Info() in Microsoft.Data.Analysis. The resulting DataBlock includes columns for column names, data types, non-null counts, and memory usage.

public DataBlock Info()

Returns

DataBlock: A new DataBlock containing summary information.

InsertRow(int, object[])

Inserts a row at a specific index by updating the values in each column.

public void InsertRow(int index, object[] values)

Parameters

index int: The index to insert the row at.
values object[]: The values for the row.

Max(params string[])

Calculates the maximum value for each specified column or all columns containing numeric primitive data types.

public DataBlock Max(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the maximum values.

Mean(params string[])

Calculates the mean value for each specified column or all columns containing numeric primitive data types.

public DataBlock Mean(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the mean values.

Melt(IEnumerable<string>, string, string)

Unpivots the DataBlock from wide format to long format by keeping the specified fixed columns and converting the remaining columns into key-value pairs.

public DataBlock Melt(IEnumerable<string> fixedColumns, string meltedColumnName, string meltedValueName)

Parameters

fixedColumns IEnumerable<string>: A collection of column names to remain fixed in the output.
meltedColumnName string: The name of the column that will hold the original column names that were melted.
meltedValueName string: The name of the column that will hold the values from the melted columns.

Returns

DataBlock: A new DataBlock that is the result of the melt operation.
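The wide-to-long reshaping described above might look like the following sketch. The column names and scores are invented for the example.

```csharp
using System;
using Datafication.Core.Data;

// Wide format: one row per student, one column per subject.
var wide = new DataBlock();
wide.AddColumn(new DataColumn("Name", typeof(string)));
wide.AddColumn(new DataColumn("Math", typeof(int)));
wide.AddColumn(new DataColumn("Science", typeof(int)));
wide.AddRow(new object[] { "John", 90, 85 });
wide.AddRow(new object[] { "Jane", 75, 95 });

// Keep "Name" fixed; "Math" and "Science" become (Subject, Score) pairs.
DataBlock longForm = wide.Melt(new[] { "Name" }, "Subject", "Score");
Console.WriteLine(longForm.RowCount);
```

Every column not listed in fixedColumns is melted, so list all identifier columns explicitly.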

Merge(DataBlock, string, MergeMode)

Merges the current DataBlock with another DataBlock using a single key column for both.

public DataBlock Merge(DataBlock other, string keyColumn, MergeMode mergeMode)

Parameters

other DataBlock: The other DataBlock to merge with.
keyColumn string: The name of the key column to join on.
mergeMode MergeMode: The merge mode specifying the type of join (Left, Right, Full, Inner).

Returns

DataBlock: A new DataBlock containing the result of the merge operation.

Merge(DataBlock, string, string, MergeMode)

Merges the current DataBlock with another DataBlock based on specified key columns.

public DataBlock Merge(DataBlock other, string keyColumn, string keyColumnOther, MergeMode mergeMode)

Parameters

other DataBlock: The other DataBlock to merge with.
keyColumn string: The name of the key column in this DataBlock.
keyColumnOther string: The name of the key column in the other DataBlock.
mergeMode MergeMode: The merge mode specifying the type of join (Left, Right, Full, Inner).

Returns

DataBlock: A new DataBlock containing the result of the merge operation.

Exceptions

ArgumentException: Thrown if a specified key column is not present in the respective DataBlock.
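A sketch of a merge where the key column has a different name on each side. The data is invented, and MergeMode is assumed to resolve from the namespace shown.

```csharp
using System;
using Datafication.Core.Data;

var orders = new DataBlock();
orders.AddColumn(new DataColumn("CustomerId", typeof(int)));
orders.AddColumn(new DataColumn("Amount", typeof(double)));
orders.AddRow(new object[] { 1, 50.0 });
orders.AddRow(new object[] { 2, 75.0 });
orders.AddRow(new object[] { 3, 20.0 }); // no matching customer

var customers = new DataBlock();
customers.AddColumn(new DataColumn("Id", typeof(int)));
customers.AddColumn(new DataColumn("Name", typeof(string)));
customers.AddRow(new object[] { 1, "John" });
customers.AddRow(new object[] { 2, "Jane" });

// Inner join: only orders whose CustomerId matches a customer Id are kept.
DataBlock merged = orders.Merge(customers, "CustomerId", "Id", MergeMode.Inner);
Console.WriteLine(merged.RowCount);
```

Switching to MergeMode.Left would be expected to keep the unmatched order as well, following the usual left-join meaning.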

Min(params string[])

Calculates the minimum value for each specified column or all columns containing numeric primitive data types.

public DataBlock Min(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the minimum values.

Percentile(double, params string[])

Calculates the specified percentile for each specified column or all columns containing numeric primitive data types.

public DataBlock Percentile(double percentile, params string[] fields)

Parameters

percentile double: The percentile to calculate (e.g., 0.5 for median, 0.95 for 95th percentile).
fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the percentile values.
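The sketch below computes a median for one column and a 95th percentile across all numeric columns. It only shows the call shape; the layout of the returned DataBlock is not described on this page, so the example does not read values back out of it.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "Bob", 22 });
block.AddRow(new object[] { "Jane", 25 });
block.AddRow(new object[] { "John", 30 });
block.AddRow(new object[] { "Alice", 35 });

// The percentile is a fraction between 0 and 1, so 0.5 requests the median.
DataBlock median = block.Percentile(0.5, "Age");

// With no field names, all numeric columns are aggregated.
DataBlock p95 = block.Percentile(0.95);

Console.WriteLine(median.RowCount);
Console.WriteLine(p95.RowCount);
```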

Pivot(IEnumerable<string>, string, string, AggregationType, string)

Transforms the DataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column, using multiple index columns.

public DataBlock Pivot(IEnumerable<string> indexColumns, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum, string columnNameFormat = "{pivot}_{value}")

Parameters

indexColumns IEnumerable<string>: The columns to use as row identifiers (become row keys in output).
pivotColumn string: The column whose unique values become new column names.
valueColumn string: The column containing values to aggregate.
aggregationType AggregationType: The aggregation function to apply when multiple values exist for the same index/pivot combination.
columnNameFormat string: Format string for generated column names. Use {pivot} for pivot value and {value} for value column name.

Returns

DataBlock: A new DataBlock with pivoted data.

Examples

// Pivot with multiple index columns
var pivoted = dataBlock.Pivot(
    new[] { "Year", "Category" },
    "Region",
    "Sales",
    AggregationType.Sum
);

Exceptions

ArgumentException: Thrown when columns don't exist or value column is non-numeric for numeric aggregations.

Pivot(string, string, string, AggregationType)

Transforms the DataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column.

public DataBlock Pivot(string indexColumn, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum)

Parameters

indexColumn string: The column to use as row identifier (becomes row keys in output).
pivotColumn string: The column whose unique values become new column names.
valueColumn string: The column containing values to aggregate.
aggregationType AggregationType: The aggregation function to apply when multiple values exist for the same index/pivot combination.

Returns

DataBlock: A new DataBlock with pivoted data.

Examples

// Input:
// Category | Region | Sales
// A        | East   | 100
// A        | West   | 150
// B        | East   | 200
//
// Output (Sum aggregation):
// Category | East_Sales | West_Sales
// A        | 100        | 150
// B        | 200        | null

var pivoted = dataBlock.Pivot("Category", "Region", "Sales", AggregationType.Sum);

Exceptions

ArgumentException: Thrown when columns don't exist or value column is non-numeric for numeric aggregations.

RegisterDataBlockFormatter()

public static void RegisterDataBlockFormatter()

RemoveColumn(params string[])

Removes one or more columns by their names.

public void RemoveColumn(params string[] columnNames)

Parameters

columnNames string[]: An array of column names to remove.

RemoveRow(int)

Removes a row by index.

public void RemoveRow(int index)

Parameters

index int: The index of the row to remove.

Sample(int, int?)

Returns a new DataBlock containing a random sample of rowCount rows from the current DataBlock.

public DataBlock Sample(int rowCount, int? seed = null)

Parameters

rowCount int: The number of rows to include in the sample.
seed int?: Optional seed for random number generation.

Returns

DataBlock: A new DataBlock containing a random sample of rowCount rows.

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero or greater than the total number of rows.
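Head, Tail, and Sample can be compared side by side, as in the sketch below. Passing a fixed seed is presumably meant to make the sample repeatable, although this page does not say so explicitly.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Id", typeof(int)));
for (int i = 1; i <= 5; i++)
{
    block.AddRow(new object[] { i });
}

DataBlock firstTwo = block.Head(2);            // first two rows
DataBlock lastTwo = block.Tail(2);             // last two rows
DataBlock sample = block.Sample(3, seed: 42);  // three randomly chosen rows

Console.WriteLine($"{firstTwo.RowCount}, {lastTwo.RowCount}, {sample.RowCount}");
```

Sample is documented to throw when rowCount exceeds the number of rows, so clamp the request against RowCount when the block size is not known in advance.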

Select(params string[])

Projects the data block to include only the specified columns.

public DataBlock Select(params string[] columnNames)

Parameters

columnNames string[]: The names of the columns to include. If null or empty, all columns are included.

Returns

DataBlock: A new DataBlock containing only the specified columns.

Examples

// Create a data block with multiple columns
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));

// Add some rows
dataBlock.AddRow(new object[] { "John", 25, "New York" });
dataBlock.AddRow(new object[] { "Jane", 30, "London" });

// Project only Name and Age columns
var projectedBlock = dataBlock.Select("Name", "Age");

Exceptions

ArgumentException: Thrown when any of the specified column names do not exist in the data block.

Size(params string[])

Calculates the size (count of elements) for each specified column or all columns.

public DataBlock Size(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all columns will be counted.

Returns

DataBlock: A new DataBlock with the count values.

Sort(SortDirection, string)

Sorts the data in the DataBlock based on the specified column and sort direction. Uses index-based sorting with Array.Sort for 2-4x better performance.

public DataBlock Sort(SortDirection direction, string columnName)

Parameters

direction SortDirection: The direction to sort the data (Ascending or Descending).
columnName string: The name of the column to sort by.

Returns

DataBlock: A new DataBlock instance with the sorted data.

Exceptions

ArgumentException: Thrown when the specified column does not exist.
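A short sketch of sorting by a single column. Note the argument order: the direction comes first, then the column name. SortDirection is assumed to resolve from the namespace shown.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 30 });
block.AddRow(new object[] { "Jane", 25 });
block.AddRow(new object[] { "Bob", 35 });

// Sort returns a new block; the original row order is left untouched.
DataBlock oldestFirst = block.Sort(SortDirection.Descending, "Age");
Console.WriteLine(oldestFirst[0, "Name"]);
```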

StandardDeviation(params string[])

Calculates the standard deviation for each specified column or all columns containing numeric primitive data types.

public DataBlock StandardDeviation(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the standard deviation values.

Sum(params string[])

Calculates the sum for each specified column or all columns containing numeric primitive data types.

public DataBlock Sum(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the sum values.

Tail(int)

Returns a new DataBlock containing the last rowCount rows of the current DataBlock.

public DataBlock Tail(int rowCount)

Parameters

rowCount int: The number of rows to return.

Returns

DataBlock: A new DataBlock containing the last rowCount rows.

Exceptions

ArgumentOutOfRangeException: Thrown when rowCount is less than zero.

Transpose(string)

Transposes the rows and columns of the data block, converting rows into columns and columns into rows.

public DataBlock Transpose(string headerColumnName = null)

Parameters

headerColumnName string: Optional. The name of the column to use as headers for the transposed data. If not provided, the first row will be used as headers.

Returns

DataBlock: The transposed DataBlock. If data types within a row are consistent, returns this instance. Otherwise, returns a new DataBlock with columns of type object.

Remarks

If the data within a row has mixed types, the method will return a new DataBlock with columns of type object. Otherwise, the method modifies and returns the current DataBlock instance.

UpdateRow(int, object[])

Updates a row at a specific index.

public void UpdateRow(int index, object[] values)

Parameters

index int: The index of the row to update.
values object[]: The new values for the row.

ValidateExpression(string, out string)

Validates an expression against the current DataBlock schema.

public bool ValidateExpression(string expression, out string error)

Parameters

expression string: The expression to validate
error string: Output parameter containing error message if validation fails

Returns

bool: true if the expression is valid; otherwise, false.
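One natural use of ValidateExpression is to check an expression before passing it to Compute, as sketched here. The expression is the one used in the Compute example on this page; the sample data is invented.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Total Profit", typeof(double)));
block.AddColumn(new DataColumn("Total Revenue", typeof(double)));
block.AddRow(new object[] { 25.0, 100.0 });

const string expression = "Total Profit / Total Revenue";

if (block.ValidateExpression(expression, out string error))
{
    // Only evaluate the expression once it has passed validation.
    DataBlock withMargin = block.Compute("Profit Margin", expression);
    Console.WriteLine(withMargin.HasColumn("Profit Margin"));
}
else
{
    Console.WriteLine($"Invalid expression: {error}");
}
```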

Variance(params string[])

Calculates the variance for each specified column or all columns containing numeric primitive data types.

public DataBlock Variance(params string[] fields)

Parameters

fields string[]: The fields to aggregate. If null or empty, all numeric fields will be aggregated.

Returns

DataBlock: A new DataBlock with the variance values.

Where(string, object, ComparisonOperator)

Filters the data block to include only rows where the specified column matches the given value using the specified comparison operator.

public DataBlock Where(string columnName, object value, ComparisonOperator op = ComparisonOperator.Equals)

Parameters

columnName string: The name of the column to filter on.
value object: The value to compare against.
op ComparisonOperator: The comparison operator to use. Defaults to Equals.

Returns

DataBlock: A new DataBlock containing only the rows that satisfy the condition.

Examples

// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));

// Add some rows
dataBlock.AddRow(new object[] { "John", 30, "Engineering" });
dataBlock.AddRow(new object[] { "Jane", 25, "Marketing" });
dataBlock.AddRow(new object[] { "Bob", 35, "Engineering" });

// Filter for employees in Engineering department
var engineeringEmployees = dataBlock.Where("Department", "Engineering");

// Filter for employees older than 28
var seniorEmployees = dataBlock.Where("Age", 28, ComparisonOperator.GreaterThan);

// Filter for names starting with 'J'
var jNames = dataBlock.Where("Name", "J", ComparisonOperator.StartsWith);

Exceptions

ArgumentException: Thrown when the specified column name does not exist in the data block.
ArgumentNullException: Thrown when the column name is null or empty.

WhereIn(string, IEnumerable<object>)

Filters the data block to include only rows where the specified column value is contained in the given collection of values.

public DataBlock WhereIn(string columnName, IEnumerable<object> values)

Parameters

columnName string: The name of the column to filter on.
values IEnumerable<object>: The collection of values to check against.

Returns

DataBlock: A new DataBlock containing only the rows where the column value is in the specified collection.

Examples

// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));

// Add some rows
dataBlock.AddRow(new object[] { "John", "Engineering", 30 });
dataBlock.AddRow(new object[] { "Jane", "Marketing", 25 });
dataBlock.AddRow(new object[] { "Bob", "Sales", 35 });
dataBlock.AddRow(new object[] { "Alice", "Engineering", 28 });

// Filter for employees in specific departments
var techEmployees = dataBlock.WhereIn("Department", new[] { "Engineering", "IT", "Data Science" });

// Filter for employees with specific ages
var targetAges = dataBlock.WhereIn("Age", new object[] { 25, 30, 35 });

Exceptions

ArgumentException: Thrown when the specified column name does not exist in the data block.
ArgumentNullException: Thrown when the column name or values collection is null.

WhereNot(string, object)

Filters the data block to exclude rows where the specified column matches the given value. This is equivalent to using Where(string, object, ComparisonOperator) with NotEquals.

public DataBlock WhereNot(string columnName, object value)

Parameters

columnName string: The name of the column to filter on.
value object: The value to exclude.

Returns

DataBlock: A new DataBlock containing only the rows where the column value does not equal the specified value.

Examples

// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));
dataBlock.AddColumn(new DataColumn("IsActive", typeof(bool)));

// Add some rows
dataBlock.AddRow(new object[] { "John", "Engineering", true });
dataBlock.AddRow(new object[] { "Jane", "Marketing", false });
dataBlock.AddRow(new object[] { "Bob", "Engineering", true });

// Filter to exclude Marketing department
var nonMarketingEmployees = dataBlock.WhereNot("Department", "Marketing");

// Filter to exclude inactive employees
var activeEmployees = dataBlock.WhereNot("IsActive", false);

Exceptions

ArgumentException: Thrown when the specified column name does not exist in the data block.
ArgumentNullException: Thrown when the column name is null or empty.

Window(string, WindowFunctionType, int?, string, string, string[], WindowFrame, double?, object)

Applies a window function to the specified column. Window functions compute values over a set of table rows that are related to the current row.

public DataBlock Window(string columnName, WindowFunctionType functionType, int? windowSize = null, string resultColumnName = null, string orderByColumn = null, string[] partitionByColumns = null, WindowFrame frameSpec = null, double? percentile = null, object defaultValue = null)

Parameters

columnName string: The column to apply window function to (null for RowNumber)
functionType WindowFunctionType: The type of window function to apply
windowSize int?: Window size for moving functions, or offset for Lag/Lead/NthValue. Use null for unbounded windows (e.g., cumulative functions). Ignored if frameSpec is provided.
resultColumnName string: Optional name for result column (auto-generated if null)
orderByColumn string: Optional column to order by for ranking and cumulative functions. If null, uses row order.
partitionByColumns string[]: Optional columns to partition by (null for no partitioning)
frameSpec WindowFrame: Optional window frame specification (ROWS BETWEEN syntax). If null and windowSize is provided, auto-creates frame (N PRECEDING AND CURRENT ROW).
percentile double?: Percentile value for MovingPercentile function (0.0 to 1.0). Required for MovingPercentile, ignored for other functions.
defaultValue object: Default value to use when Lag/Lead functions reference rows that don't exist. If null, these functions will return null for out-of-bounds references.

Returns

DataBlock: A new DataBlock with the window function result column added.

Exceptions

ArgumentException: Thrown when parameters are invalid
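The sketch below adds a "previous value" column with a lag of one row. The enum member name WindowFunctionType.Lag is an assumption inferred from the parameter descriptions above, which mention Lag and Lead but do not list the enum's members.

```csharp
using System;
using Datafication.Core.Data;

var block = new DataBlock();
block.AddColumn(new DataColumn("Day", typeof(int)));
block.AddColumn(new DataColumn("Sales", typeof(double)));
block.AddRow(new object[] { 1, 100.0 });
block.AddRow(new object[] { 2, 150.0 });
block.AddRow(new object[] { 3, 120.0 });

DataBlock withPrevious = block.Window(
    "Sales",
    WindowFunctionType.Lag,        // assumed enum member name
    windowSize: 1,                 // for Lag, this is the row offset
    resultColumnName: "PrevSales",
    orderByColumn: "Day",
    defaultValue: 0.0);            // used where no earlier row exists

Console.WriteLine(withPrevious.HasColumn("PrevSales"));
```

Named arguments keep the call readable, since most of the nine parameters are optional and default to null.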
