Class DataBlock
- Namespace
- Datafication.Core.Data
- Assembly
- Datafication.Core.dll
Represents a block of data with rows and columns.
public class DataBlock : IDataBlock
- Inheritance
- object → DataBlock
- Implements
- IDataBlock
- Extension Methods
Constructors
DataBlock()
Initializes a new instance of the DataBlock class.
public DataBlock()
DataBlock(DataBlockSnapshot)
public DataBlock(DataBlockSnapshot snapshot)
Parameters
snapshot (DataBlockSnapshot)
Properties
Connector
public static ConnectorExtensions Connector { get; }
Property Value
- ConnectorExtensions
IsDisposed
Gets a value indicating whether this DataBlock has been disposed.
public bool IsDisposed { get; }
Property Value
- bool
this[int, string]
Gets or sets the value for a specified row and column.
public object this[int row, string columnName] { get; set; }
Parameters
row (int): The row index.
columnName (string): The column name.
Property Value
- object
The value at the specified row and column.
this[string]
Gets a column by its name.
public DataColumn this[string columnName] { get; }
Parameters
columnName (string): The name of the column to retrieve.
Property Value
- DataColumn
The IDataColumn instance representing the column.
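Examples
A minimal sketch of both indexers, using only members documented on this page. Row indices are assumed to be zero-based, and the cell indexer is assumed to return the boxed column value.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 25 });

// Cell indexer: read a single value, then overwrite it (row 0 assumed to be the first row).
int age = (int)block[0, "Age"];
block[0, "Age"] = age + 1;

// Column indexer: fetch the whole column by name.
DataColumn ageColumn = block["Age"];

Console.WriteLine(block[0, "Age"]);
```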
RowCount
Gets the number of rows in the data block.
public int RowCount { get; }
Property Value
- int
Schema
Gets the schema of the data block.
public DataSchema Schema { get; }
Property Value
- DataSchema
Methods
AddColumn(DataColumn)
Adds a column to the data block.
public void AddColumn(DataColumn column)
Parameters
column (DataColumn): The column to add.
AddRow(object[])
Adds a row to the data block by updating the values in each column.
public void AddRow(object[] values)
Parameters
values (object[]): The values to add as a new row.
AppendRowsBatch(DataBlock)
Efficiently appends rows from another DataBlock using batch operations.
public void AppendRowsBatch(DataBlock source)
Parameters
source (DataBlock): The source DataBlock to append from.
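Examples
A short sketch of appending one block to another. Both blocks are built with identical columns in the same order, which is assumed to be what "matching schemas" requires.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var target = new DataBlock();
target.AddColumn(new DataColumn("Name", typeof(string)));
target.AddColumn(new DataColumn("Age", typeof(int)));
target.AddRow(new object[] { "John", 25 });

// Same column names and types as the target.
var source = new DataBlock();
source.AddColumn(new DataColumn("Name", typeof(string)));
source.AddColumn(new DataColumn("Age", typeof(int)));
source.AddRow(new object[] { "Jane", 30 });
source.AddRow(new object[] { "Bob", 22 });

// Appends in place: the target grows by the number of source rows.
target.AppendRowsBatch(source);

Console.WriteLine(target.RowCount);
```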
Exceptions
- InvalidOperationException
Thrown when schemas don't match.
Clone()
Creates a clone of the current data block.
public DataBlock Clone()
Returns
- DataBlock
Compute(string, string)
Adds a computed column to the DataBlock based on an expression. Returns a new DataBlock with all existing columns plus the computed column. PERFORMANCE: Uses column reference sharing to avoid redundant copying when chaining.
public DataBlock Compute(string columnName, string expression)
Parameters
columnName (string): The name for the computed column.
expression (string): The expression to evaluate (e.g., "Total Profit / Total Revenue").
Returns
- DataBlock
A new DataBlock with the computed column added.
Examples
var result = dataBlock
    .Select("Total Profit", "Total Revenue", "Country")
    .Compute("Profit Margin", "Total Profit / Total Revenue")
    .Where("Profit Margin", 0.25, ComparisonOperator.GreaterThan)
    .Execute();
CopyRowRange(int, int)
Efficiently copies a range of rows using direct column operations.
public DataBlock CopyRowRange(int startRow, int rowCount)
Parameters
startRow (int): The starting row index.
rowCount (int): The number of rows to copy.
Returns
- DataBlock
A new DataBlock containing the specified row range.
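Examples
A brief sketch of copying a slice of rows. The start index is assumed to be zero-based, so the call below is expected to pick the second and third rows.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 25 });
block.AddRow(new object[] { "Jane", 30 });
block.AddRow(new object[] { "Bob", 22 });
block.AddRow(new object[] { "Alice", 28 });

// Copy two rows starting at index 1 (zero-based indexing is an assumption).
var middle = block.CopyRowRange(1, 2);

Console.WriteLine(middle.RowCount);
```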
Exceptions
- ArgumentOutOfRangeException
Thrown when startRow or rowCount are invalid.
Dispose()
Releases all resources used by the DataBlock.
public void Dispose()
Dispose(bool)
Releases the unmanaged resources used by the DataBlock and optionally releases the managed resources.
protected virtual void Dispose(bool disposing)
Parameters
disposing (bool): true to release both managed and unmanaged resources; false to release only unmanaged resources.
DropDuplicates(KeepDuplicateMode)
Returns a new DataBlock with duplicate rows removed based on all columns.
public DataBlock DropDuplicates(KeepDuplicateMode keep = KeepDuplicateMode.First)
Parameters
keep (KeepDuplicateMode): Specifies which duplicates to keep (First, Last, or None). Defaults to First.
Returns
- DataBlock
A new DataBlock with duplicates removed.
Examples
// Keep the first occurrence of each duplicate
var keepFirst = dataBlock.DropDuplicates();
// Keep the last occurrence of each duplicate
var keepLast = dataBlock.DropDuplicates(KeepDuplicateMode.Last);
// Remove all duplicates (keep only unique rows)
var uniqueOnly = dataBlock.DropDuplicates(KeepDuplicateMode.None);
DropDuplicates(KeepDuplicateMode, params string[])
Returns a new DataBlock with duplicate rows removed based on specific columns.
public DataBlock DropDuplicates(KeepDuplicateMode keep, params string[] columns)
Parameters
keep (KeepDuplicateMode): Specifies which duplicates to keep (First, Last, or None).
columns (string[]): The columns to consider when identifying duplicates.
Returns
- DataBlock
A new DataBlock with duplicates removed.
Examples
// Remove duplicates based on the 'Name' column, keep first
var byName = dataBlock.DropDuplicates(KeepDuplicateMode.First, "Name");
// Remove duplicates based on 'Name' and 'Email', keep last
var byNameAndEmail = dataBlock.DropDuplicates(KeepDuplicateMode.Last, "Name", "Email");
Exceptions
- ArgumentException
Thrown when no columns are specified or columns don't exist.
DropNulls(DropNullMode)
Returns a new DataBlock with rows dropped based on null values.
public DataBlock DropNulls(DropNullMode dropMode)
Parameters
dropMode (DropNullMode): Specifies the criteria for dropping rows.
Returns
- DataBlock
A new DataBlock with rows dropped according to the specified criteria.
FillNulls(FillMethod, object, params string[])
Returns a new DataBlock with null values filled with a constant value.
public DataBlock FillNulls(FillMethod method, object constantValue, params string[] columnNames)
Parameters
method (FillMethod): The fill method to use (should be FillMethod.ConstantValue).
constantValue (object): The constant value to use for filling nulls.
columnNames (string[]): The columns to apply the fill operation to.
Returns
- DataBlock
A new DataBlock with filled values.
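Examples
A hedged sketch of filling nulls with a constant. It assumes AddRow accepts null cells. Note the explicit object cast: when the constant is a string, a plain call such as FillNulls(FillMethod.ConstantValue, "Unknown", "City") would bind, under C# overload resolution, to the FillNulls(FillMethod, params string[]) overload and treat "Unknown" as a column name.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page; FillMethod is assumed to be reachable from it

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("City", typeof(string)));
block.AddRow(new object[] { "John", "London" });
block.AddRow(new object[] { "Jane", null }); // assumes null cells are accepted

// The (object) cast selects the constant-value overload rather than the params string[] one.
var filled = block.FillNulls(FillMethod.ConstantValue, (object)"Unknown", "City");

Console.WriteLine(filled[1, "City"]);
```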
Exceptions
- ArgumentException
Thrown when no columns are specified or columns don't exist.
FillNulls(FillMethod, params string[])
Returns a new DataBlock with null values filled according to the specified method.
public DataBlock FillNulls(FillMethod method, params string[] columnNames)
Parameters
method (FillMethod): The fill method to use.
columnNames (string[]): The columns to apply the fill operation to.
Returns
- DataBlock
A new DataBlock with filled values.
Exceptions
- ArgumentException
Thrown when no columns are specified or columns don't exist.
Filter(Func<Dictionary<string, object>, bool>, params string[])
Filters rows based on a predicate and projects the data block to include only the specified columns.
public DataBlock Filter(Func<Dictionary<string, object>, bool> predicate, params string[] columnNames)
Parameters
predicate (Func<Dictionary<string, object>, bool>): A function that determines whether a row should be included based on its values.
columnNames (string[]): The names of the columns to include. If null or empty, all columns are included.
Returns
- DataBlock
Examples
// Create a data block with multiple columns
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));
// Add some rows
dataBlock.AddRow(new object[] { "John", 25, "New York" });
dataBlock.AddRow(new object[] { "Jane", 30, "London" });
dataBlock.AddRow(new object[] { "Bob", 22, "Paris" });
// Filter rows where Age > 25 and project only Name and Age columns
var filteredBlock = dataBlock.Filter(
    row => (int)row["Age"] > 25,
    "Name", "Age"
);
Exceptions
- ArgumentNullException
Thrown when the predicate is null.
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
FilterWithCursor(Func<IDataRowCursor, bool>, params string[])
Filters rows using a cursor-based predicate and projects the specified columns.
public DataBlock FilterWithCursor(Func<IDataRowCursor, bool> predicate, params string[] columnNames)
Parameters
predicate (Func<IDataRowCursor, bool>): A function to test each row using a cursor. The cursor provides access to all column values for the current row.
columnNames (string[]): The names of the columns to include in the result. If null or empty, all columns are included.
Returns
- DataBlock
A new DataBlock containing only the rows that satisfy the predicate and the specified columns.
Examples
// Create a data block with sample data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));
dataBlock.AddRow(new object[] { "John", 30, "London" });
dataBlock.AddRow(new object[] { "Jane", 25, "Paris" });
// Filter rows where Age > 25 and City starts with 'L', project all columns
var filteredBlock = dataBlock.FilterWithCursor(
    cursor => (int)cursor.GetValue("Age") > 25 && ((string)cursor.GetValue("City")).StartsWith("L")
);
Exceptions
- ArgumentNullException
Thrown when the predicate is null.
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
GetColumn(string)
Gets a column by its name.
public DataColumn GetColumn(string columnName)
Parameters
columnName (string): The name of the column to retrieve.
Returns
- DataColumn
The IDataColumn instance representing the column.
GetRowCursor(params string[])
Gets a row cursor for iterating over rows with specified columns.
public IDataRowCursor GetRowCursor(params string[] columnNames)
Parameters
columnNames (string[]): The names of the columns to include in the cursor.
Returns
- IDataRowCursor
An IDataRowCursor that allows iteration over the rows.
GroupBy(string)
Groups the rows by the specified column name.
public DataBlockGroup GroupBy(string columnName)
Parameters
columnName (string): The name of the column to group by.
Returns
- DataBlockGroup
A DataBlockGroup containing the grouped data.
GroupByAggregate(string, Dictionary<string, AggregationType>)
Groups the data by the specified column and applies multiple aggregation functions. This method provides SQL-style GROUP BY functionality with multiple aggregations in a single operation.
public DataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations)
Parameters
groupByColumn (string): The column to group by.
aggregations (Dictionary<string, AggregationType>): Dictionary mapping column names to aggregation types.
Returns
- DataBlock
A new DataBlock with group keys and multiple aggregated columns.
Examples
var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum,
    ["user_id"] = AggregationType.Count
};
var result = dataBlock.GroupByAggregate("user_type", aggregations);
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation types are invalid.
GroupByAggregate(string, Dictionary<string, AggregationType>, Dictionary<string, string>)
Groups the data by the specified column and applies multiple aggregation functions with custom result column names. This method provides SQL-style GROUP BY functionality with multiple aggregations in a single operation.
public DataBlock GroupByAggregate(string groupByColumn, Dictionary<string, AggregationType> aggregations, Dictionary<string, string> resultColumnNames)
Parameters
groupByColumn (string): The column to group by.
aggregations (Dictionary<string, AggregationType>): Dictionary mapping column names to aggregation types.
resultColumnNames (Dictionary<string, string>): Dictionary mapping aggregate columns to custom result column names.
Returns
- DataBlock
A new DataBlock with group keys and multiple aggregated columns.
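Examples
A sketch of naming the aggregated columns explicitly. It assumes the keys of resultColumnNames are the source column names being aggregated, which is how the parameter description reads; the sample data is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using Datafication.Core.Data; // namespace taken from this page; AggregationType is assumed to be reachable from it

var sessions = new DataBlock();
sessions.AddColumn(new DataColumn("user_type", typeof(string)));
sessions.AddColumn(new DataColumn("session_duration", typeof(double)));
sessions.AddColumn(new DataColumn("page_views", typeof(int)));
sessions.AddRow(new object[] { "free", 120.0, 3 });
sessions.AddRow(new object[] { "free", 60.0, 1 });
sessions.AddRow(new object[] { "paid", 300.0, 9 });

var aggregations = new Dictionary<string, AggregationType>
{
    ["session_duration"] = AggregationType.Mean,
    ["page_views"] = AggregationType.Sum
};

// Assumed mapping: aggregated source column -> name wanted in the result.
var resultColumnNames = new Dictionary<string, string>
{
    ["session_duration"] = "avg_duration",
    ["page_views"] = "total_views"
};

var summary = sessions.GroupByAggregate("user_type", aggregations, resultColumnNames);

Console.WriteLine(summary.HasColumn("avg_duration"));
```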
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation types are invalid.
GroupByAggregate(string, string, AggregationType, string)
Groups the data by the specified column and applies an aggregation function to another column. This method provides SQL-style GROUP BY functionality with aggregation in a single operation.
public DataBlock GroupByAggregate(string groupByColumn, string aggregateColumn, AggregationType aggregationType, string resultColumnName = null)
Parameters
groupByColumn (string): The column to group by.
aggregateColumn (string): The column to aggregate.
aggregationType (AggregationType): The type of aggregation to perform.
resultColumnName (string): Optional custom name for the result column. If null, a name following a pattern like "avg_columnName" is used.
Returns
- DataBlock
A new DataBlock with group keys and aggregated values.
Examples
// SQL: SELECT user_type, AVG(session_duration) AS avg_duration FROM user_sessions GROUP BY user_type
var result = dataBlock.GroupByAggregate("user_type", "session_duration", AggregationType.Mean, "avg_duration");
Exceptions
- ArgumentException
Thrown when columns don't exist or aggregation type is invalid for the column type.
HasColumn(string)
Determines whether the data block contains a column with the specified name.
public bool HasColumn(string columnName)
Parameters
columnName (string): The name of the column to check.
Returns
- bool
true if the column exists; otherwise, false.
Head(int)
Returns a new DataBlock containing the first rowCount rows of the current DataBlock.
public DataBlock Head(int rowCount)
Parameters
rowCount (int): The number of rows to return.
Returns
- DataBlock
A new DataBlock containing the first rowCount rows.
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero.
Info()
Generates and returns a new DataBlock that contains summary information similar to the Info output in Pandas and Microsoft's DataFrame. The resulting DataBlock will include columns for column names, data types, non-null counts, and memory usage.
public DataBlock Info()
Returns
- DataBlock
A new DataBlock containing summary information.
InsertRow(int, object[])
Inserts a row at a specific index by updating the values in each column.
public void InsertRow(int index, object[] values)
Parameters
index (int): The index to insert the row at.
values (object[]): The values for the row.
Max(params string[])
Calculates the maximum value for each specified column or all columns containing numeric primitive data types.
public DataBlock Max(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the maximum values.
Mean(params string[])
Calculates the mean value for each specified column or all columns containing numeric primitive data types.
public DataBlock Mean(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the mean values.
Melt(IEnumerable<string>, string, string)
Unpivots the DataBlock from wide format to long format by keeping the specified fixed columns and converting the remaining columns into key-value pairs.
public DataBlock Melt(IEnumerable<string> fixedColumns, string meltedColumnName, string meltedValueName)
Parameters
fixedColumns (IEnumerable<string>): A collection of column names to remain fixed in the output.
meltedColumnName (string): The name of the column that will hold the original column names that were melted.
meltedValueName (string): The name of the column that will hold the values from the melted columns.
Returns
- DataBlock
A new DataBlock that is the result of the melt operation.
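Examples
A small sketch of a wide-to-long reshape. It assumes the usual melt semantics: each input row yields one output row per non-fixed column, so two rows with two melted columns produce four rows.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var wide = new DataBlock();
wide.AddColumn(new DataColumn("Name", typeof(string)));
wide.AddColumn(new DataColumn("Q1", typeof(int)));
wide.AddColumn(new DataColumn("Q2", typeof(int)));
wide.AddRow(new object[] { "John", 10, 20 });
wide.AddRow(new object[] { "Jane", 30, 40 });

// Keep "Name" fixed; "Q1" and "Q2" become (Quarter, Sales) key-value pairs.
var longForm = wide.Melt(new[] { "Name" }, "Quarter", "Sales");

Console.WriteLine(longForm.RowCount);
```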
Merge(DataBlock, string, MergeMode)
Merges the current DataBlock with another DataBlock using a single key column for both.
public DataBlock Merge(DataBlock other, string keyColumn, MergeMode mergeMode)
Parameters
other (DataBlock): The other DataBlock to merge with.
keyColumn (string): The name of the key column to join on.
mergeMode (MergeMode): The merge mode specifying the type of join (Left, Right, Full, Inner).
Returns
- DataBlock
A new DataBlock containing the result of the merge operation.
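Examples
A sketch of an inner join on a shared key column. MergeMode.Inner is taken from the parameter description; the expected row count assumes conventional inner-join semantics (one output row per matching pair of rows).

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page; MergeMode is assumed to be reachable from it

var customers = new DataBlock();
customers.AddColumn(new DataColumn("CustomerId", typeof(int)));
customers.AddColumn(new DataColumn("Name", typeof(string)));
customers.AddRow(new object[] { 1, "John" });
customers.AddRow(new object[] { 2, "Jane" });

var orders = new DataBlock();
orders.AddColumn(new DataColumn("CustomerId", typeof(int)));
orders.AddColumn(new DataColumn("Total", typeof(double)));
orders.AddRow(new object[] { 1, 20.0 });
orders.AddRow(new object[] { 1, 35.5 });
orders.AddRow(new object[] { 3, 12.0 }); // no matching customer

// Only customer 1 has orders, and it has two of them.
var joined = customers.Merge(orders, "CustomerId", MergeMode.Inner);

Console.WriteLine(joined.RowCount);
```

When the key column is named differently in each block, the four-argument overload takes both names.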
Merge(DataBlock, string, string, MergeMode)
Merges the current DataBlock with another DataBlock based on specified key columns.
public DataBlock Merge(DataBlock other, string keyColumn, string keyColumnOther, MergeMode mergeMode)
Parameters
other (DataBlock): The other DataBlock to merge with.
keyColumn (string): The name of the key column in this DataBlock.
keyColumnOther (string): The name of the key column in the other DataBlock.
mergeMode (MergeMode): The merge mode specifying the type of join (Left, Right, Full, Inner).
Returns
- DataBlock
A new DataBlock containing the result of the merge operation.
Exceptions
- ArgumentException
Thrown if a specified key column is not present in the respective DataBlock.
Min(params string[])
Calculates the minimum value for each specified column or all columns containing numeric primitive data types.
public DataBlock Min(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the minimum values.
Percentile(double, params string[])
Calculates the specified percentile for each specified column or all columns containing numeric primitive data types.
public DataBlock Percentile(double percentile, params string[] fields)
Parameters
percentile (double): The percentile to calculate (e.g., 0.5 for median, 0.95 for 95th percentile).
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the percentile values.
Pivot(IEnumerable<string>, string, string, AggregationType, string)
Transforms the DataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column, using multiple index columns.
public DataBlock Pivot(IEnumerable<string> indexColumns, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum, string columnNameFormat = "{pivot}_{value}")
Parameters
indexColumns (IEnumerable<string>): The columns to use as row identifiers (become row keys in output).
pivotColumn (string): The column whose unique values become new column names.
valueColumn (string): The column containing values to aggregate.
aggregationType (AggregationType): The aggregation function to apply when multiple values exist for the same index/pivot combination.
columnNameFormat (string): Format string for generated column names. Use {pivot} for pivot value and {value} for value column name.
Returns
- DataBlock
A new DataBlock with pivoted data.
Examples
// Pivot with multiple index columns
var pivoted = dataBlock.Pivot(
    new[] { "Year", "Category" },
    "Region",
    "Sales",
    AggregationType.Sum
);
Exceptions
- ArgumentException
Thrown when columns don't exist or value column is non-numeric for numeric aggregations.
Pivot(string, string, string, AggregationType)
Transforms the DataBlock from long format to wide format by pivoting values from a column into new columns based on unique values in another column.
public DataBlock Pivot(string indexColumn, string pivotColumn, string valueColumn, AggregationType aggregationType = AggregationType.Sum)
Parameters
indexColumn (string): The column to use as row identifier (becomes row keys in output).
pivotColumn (string): The column whose unique values become new column names.
valueColumn (string): The column containing values to aggregate.
aggregationType (AggregationType): The aggregation function to apply when multiple values exist for the same index/pivot combination.
Returns
- DataBlock
A new DataBlock with pivoted data.
Examples
// Input:
// Category | Region | Sales
// A | East | 100
// A | West | 150
// B | East | 200
//
// Output (Sum aggregation):
// Category | East_Sales | West_Sales
// A | 100 | 150
// B | 200 | null
var pivoted = dataBlock.Pivot("Category", "Region", "Sales", AggregationType.Sum);
Exceptions
- ArgumentException
Thrown when columns don't exist or value column is non-numeric for numeric aggregations.
RegisterDataBlockFormatter()
public static void RegisterDataBlockFormatter()
RemoveColumn(params string[])
Removes one or more columns by their names.
public void RemoveColumn(params string[] columnNames)
Parameters
columnNames (string[]): An array of column names to remove.
RemoveRow(int)
Removes a row by index.
public void RemoveRow(int index)
Parameters
index (int): The index of the row to remove.
Sample(int, int?)
Returns a new DataBlock containing a random sample of rowCount rows from the current DataBlock.
public DataBlock Sample(int rowCount, int? seed = null)
Parameters
rowCount (int): The number of rows to include in the sample.
seed (int?): Optional seed for random number generation.
Returns
- DataBlock
A new DataBlock containing a random sample of rowCount rows.
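Examples
A minimal sketch of drawing a fixed-size sample. Passing a seed is assumed to make the selection repeatable across runs; the page does not state this explicitly.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 25 });
block.AddRow(new object[] { "Jane", 30 });
block.AddRow(new object[] { "Bob", 22 });
block.AddRow(new object[] { "Alice", 28 });

// Draw two of the four rows; the seed value 42 is arbitrary.
var sample = block.Sample(2, seed: 42);

Console.WriteLine(sample.RowCount);
```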
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero or greater than the total number of rows.
Select(params string[])
Projects the data block to include only the specified columns.
public DataBlock Select(params string[] columnNames)
Parameters
columnNames (string[]): The names of the columns to include. If null or empty, all columns are included.
Returns
- DataBlock
Examples
// Create a data block with multiple columns
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("City", typeof(string)));
// Add some rows
dataBlock.AddRow(new object[] { "John", 25, "New York" });
dataBlock.AddRow(new object[] { "Jane", 30, "London" });
// Project only Name and Age columns
var projectedBlock = dataBlock.Select("Name", "Age");
Exceptions
- ArgumentException
Thrown when any of the specified column names do not exist in the data block.
Size(params string[])
Calculates the size (count of elements) for each specified column or all columns.
public DataBlock Size(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all columns will be counted.
Returns
- DataBlock
A new DataBlock with the count values.
Sort(SortDirection, string)
Sorts the data in the DataBlock based on the specified column and sort direction.
public DataBlock Sort(SortDirection direction, string columnName)
Parameters
direction (SortDirection): The direction to sort the data (Ascending or Descending).
columnName (string): The name of the column to sort by.
Returns
- DataBlock
A new DataBlock instance with the sorted data.
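Examples
A short sketch of a descending sort. Note that the direction comes first in the argument list. Row 0 of the result is assumed to be the first row in sorted order.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page; SortDirection is assumed to be reachable from it

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 30 });
block.AddRow(new object[] { "Jane", 25 });
block.AddRow(new object[] { "Bob", 35 });

// Oldest first; the original block keeps its insertion order.
var oldestFirst = block.Sort(SortDirection.Descending, "Age");

Console.WriteLine(oldestFirst[0, "Name"]);
```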
Exceptions
- ArgumentException
Thrown when the specified column does not exist.
StandardDeviation(params string[])
Calculates the standard deviation for each specified column or all columns containing numeric primitive data types.
public DataBlock StandardDeviation(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the standard deviation values.
Sum(params string[])
Calculates the sum for each specified column or all columns containing numeric primitive data types.
public DataBlock Sum(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the sum values.
Tail(int)
Returns a new DataBlock containing the last rowCount rows of the current DataBlock.
public DataBlock Tail(int rowCount)
Parameters
rowCount (int): The number of rows to return.
Returns
- DataBlock
A new DataBlock containing the last rowCount rows.
Exceptions
- ArgumentOutOfRangeException
Thrown when rowCount is less than zero.
Transpose(string)
Transposes the rows and columns of the data block. Converts rows into columns and columns into rows using the internal _columns collection.
public DataBlock Transpose(string headerColumnName = null)
Parameters
headerColumnName (string): Optional. The name of the column to use as headers for the transposed data. If not provided, the first row will be used as headers.
Returns
- DataBlock
The transposed DataBlock. If data types within a row are consistent, returns this instance. Otherwise, returns a new DataBlock with columns of type object.
Remarks
If the data within a row has mixed types, the method will return a new DataBlock with columns of type object. Otherwise, the method modifies and returns the current DataBlock instance.
UpdateRow(int, object[])
Updates a row at a specific index.
public void UpdateRow(int index, object[] values)
Parameters
index (int): The index of the row to update.
values (object[]): The new values for the row.
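Examples
A sketch of the in-place row operations (InsertRow, UpdateRow, and RemoveRow) used together. Indices are assumed to be zero-based, and each values array follows the column order of the block.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var block = new DataBlock();
block.AddColumn(new DataColumn("Name", typeof(string)));
block.AddColumn(new DataColumn("Age", typeof(int)));
block.AddRow(new object[] { "John", 25 });
block.AddRow(new object[] { "Jane", 30 });

// Insert at the front: Bob, John, Jane.
block.InsertRow(0, new object[] { "Bob", 22 });

// Replace the second row: Bob, Johnny, Jane.
block.UpdateRow(1, new object[] { "Johnny", 26 });

// Drop the last row: Bob, Johnny.
block.RemoveRow(2);

Console.WriteLine(block.RowCount);
```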
ValidateExpression(string, out string)
Validates an expression against the current DataBlock schema.
public bool ValidateExpression(string expression, out string error)
Parameters
expression (string): The expression to validate.
error (string): Output parameter containing the error message if validation fails.
Returns
- bool
true if the expression is valid; otherwise, false.
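Examples
A sketch of checking an expression before handing it to Compute. The expression is the one used in the Compute example on this page; the column types and values are illustrative assumptions.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page

var sales = new DataBlock();
sales.AddColumn(new DataColumn("Total Profit", typeof(double)));
sales.AddColumn(new DataColumn("Total Revenue", typeof(double)));
sales.AddRow(new object[] { 25.0, 100.0 });

string expression = "Total Profit / Total Revenue";

if (sales.ValidateExpression(expression, out string error))
{
    // Safe to compute: the expression is valid against this schema.
    var withMargin = sales.Compute("Profit Margin", expression);
    Console.WriteLine(withMargin.HasColumn("Profit Margin"));
}
else
{
    // error carries the validation message.
    Console.Error.WriteLine($"Invalid expression: {error}");
}
```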
Variance(params string[])
Calculates the variance for each specified column or all columns containing numeric primitive data types.
public DataBlock Variance(params string[] fields)
Parameters
fields (string[]): The fields to aggregate. If null or empty, all numeric fields will be aggregated.
Returns
- DataBlock
A new DataBlock with the variance values.
Where(string, object, ComparisonOperator)
Filters the data block to include only rows where the specified column matches the given value using the specified comparison operator.
public DataBlock Where(string columnName, object value, ComparisonOperator op = ComparisonOperator.Equals)
Parameters
columnName (string): The name of the column to filter on.
value (object): The value to compare against.
op (ComparisonOperator): The comparison operator to use. Defaults to Equals.
Returns
- DataBlock
Examples
// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));
// Add some rows
dataBlock.AddRow(new object[] { "John", 30, "Engineering" });
dataBlock.AddRow(new object[] { "Jane", 25, "Marketing" });
dataBlock.AddRow(new object[] { "Bob", 35, "Engineering" });
// Filter for employees in Engineering department
var engineeringEmployees = dataBlock.Where("Department", "Engineering");
// Filter for employees older than 28
var seniorEmployees = dataBlock.Where("Age", 28, ComparisonOperator.GreaterThan);
// Filter for names starting with 'J'
var jNames = dataBlock.Where("Name", "J", ComparisonOperator.StartsWith);
Exceptions
- ArgumentException
Thrown when the specified column name does not exist in the data block.
- ArgumentNullException
Thrown when the column name is null or empty.
WhereIn(string, IEnumerable<object>)
Filters the data block to include only rows where the specified column value is contained in the given collection of values.
public DataBlock WhereIn(string columnName, IEnumerable<object> values)
Parameters
columnName (string): The name of the column to filter on.
values (IEnumerable<object>): The collection of values to check against.
Returns
- DataBlock
A new DataBlock containing only the rows where the column value is in the specified collection.
Examples
// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));
dataBlock.AddColumn(new DataColumn("Age", typeof(int)));
// Add some rows
dataBlock.AddRow(new object[] { "John", "Engineering", 30 });
dataBlock.AddRow(new object[] { "Jane", "Marketing", 25 });
dataBlock.AddRow(new object[] { "Bob", "Sales", 35 });
dataBlock.AddRow(new object[] { "Alice", "Engineering", 28 });
// Filter for employees in specific departments
var techEmployees = dataBlock.WhereIn("Department", new[] { "Engineering", "IT", "Data Science" });
// Filter for employees with specific ages
// (object[] is required here: an int[] does not convert to IEnumerable<object>)
var targetAges = dataBlock.WhereIn("Age", new object[] { 25, 30, 35 });
Exceptions
- ArgumentException
Thrown when the specified column name does not exist in the data block.
- ArgumentNullException
Thrown when the column name or values collection is null.
WhereNot(string, object)
Filters the data block to exclude rows where the specified column matches the given value. This is equivalent to using Where(string, object, ComparisonOperator) with NotEquals.
public DataBlock WhereNot(string columnName, object value)
Parameters
columnName (string): The name of the column to filter on.
value (object): The value to exclude.
Returns
- DataBlock
A new DataBlock containing only the rows where the column value does not equal the specified value.
Examples
// Create a data block with employee data
var dataBlock = new DataBlock();
dataBlock.AddColumn(new DataColumn("Name", typeof(string)));
dataBlock.AddColumn(new DataColumn("Department", typeof(string)));
dataBlock.AddColumn(new DataColumn("IsActive", typeof(bool)));
// Add some rows
dataBlock.AddRow(new object[] { "John", "Engineering", true });
dataBlock.AddRow(new object[] { "Jane", "Marketing", false });
dataBlock.AddRow(new object[] { "Bob", "Engineering", true });
// Filter to exclude Marketing department
var nonMarketingEmployees = dataBlock.WhereNot("Department", "Marketing");
// Filter to exclude inactive employees
var activeEmployees = dataBlock.WhereNot("IsActive", false);
Exceptions
- ArgumentException
Thrown when the specified column name does not exist in the data block.
- ArgumentNullException
Thrown when the column name is null or empty.
Window(string, WindowFunctionType, int?, string, string, string[], WindowFrame, double?, object)
Applies a window function to the specified column. Window functions compute values over a set of table rows that are related to the current row.
public DataBlock Window(string columnName, WindowFunctionType functionType, int? windowSize = null, string resultColumnName = null, string orderByColumn = null, string[] partitionByColumns = null, WindowFrame frameSpec = null, double? percentile = null, object defaultValue = null)
Parameters
columnName (string): The column to apply the window function to (null for RowNumber).
functionType (WindowFunctionType): The type of window function to apply.
windowSize (int?): Window size for moving functions, or offset for Lag/Lead/NthValue. Use null for unbounded windows (e.g., cumulative functions). Ignored if frameSpec is provided.
resultColumnName (string): Optional name for the result column (auto-generated if null).
orderByColumn (string): Optional column to order by for ranking and cumulative functions. If null, uses row order.
partitionByColumns (string[]): Optional columns to partition by (null for no partitioning).
frameSpec (WindowFrame): Optional window frame specification (ROWS BETWEEN syntax). If null and windowSize is provided, a frame is created automatically (N PRECEDING AND CURRENT ROW).
percentile (double?): Percentile value for the MovingPercentile function (0.0 to 1.0). Required for MovingPercentile, ignored for other functions.
defaultValue (object): Default value to use when Lag/Lead functions reference rows that don't exist. If null, these functions will return null for out-of-bounds references.
Returns
- DataBlock
A new DataBlock with the window function result column added.
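Examples
A hedged sketch of a one-row lag. The member name WindowFunctionType.Lag is inferred from the parameter descriptions above rather than from a documented enum listing, so treat it as an assumption; the data is illustrative.

```csharp
using System;
using Datafication.Core.Data; // namespace taken from this page; WindowFunctionType is assumed to be reachable from it

var daily = new DataBlock();
daily.AddColumn(new DataColumn("Day", typeof(int)));
daily.AddColumn(new DataColumn("Sales", typeof(double)));
daily.AddRow(new object[] { 1, 100.0 });
daily.AddRow(new object[] { 2, 150.0 });
daily.AddRow(new object[] { 3, 120.0 });

// Add the previous day's sales next to each row.
var withPrevious = daily.Window(
    "Sales",
    WindowFunctionType.Lag,        // assumed member name
    windowSize: 1,                 // for Lag, windowSize acts as the row offset
    resultColumnName: "PrevSales",
    orderByColumn: "Day",
    defaultValue: 0.0);            // used where no previous row exists

Console.WriteLine(withPrevious.HasColumn("PrevSales"));
```

Named arguments keep the call readable, since most of the optional parameters are left at their defaults.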
Exceptions
- ArgumentException
Thrown when parameters are invalid.