Class HtmlTableConnectorConfiguration

Namespace
Datafication.Connectors.WebConnector.Connectors
Assembly
Datafication.WebConnector.dll

Configuration for the HTML table connector.

public class HtmlTableConnectorConfiguration : WebConnectorConfigurationBase, IDataConnectorConfiguration
Inheritance
object
WebConnectorConfigurationBase
HtmlTableConnectorConfiguration
Implements
IDataConnectorConfiguration
Inherited Members

Remarks

This configuration controls how HTML tables are extracted from web pages. The connector can extract one or more tables and convert them to DataBlocks.
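
As an illustration, the sketch below builds a configuration with every documented default written out explicitly. It assumes the class has a public parameterless constructor. The page to read is presumably set through members inherited from WebConnectorConfigurationBase, which this page does not list, so they are omitted here.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
// The source page is configured through inherited members not shown on this page.
var configuration = new HtmlTableConnectorConfiguration
{
    TableSelector = "table",      // match every <table> element
    TableIndex = null,            // consider all matching tables
    MergeTables = false,          // return only the first matching table
    FirstRowIsHeader = true,      // first row supplies the column names
    UseTheadForHeaders = true,    // prefer <thead> when the table has one
    SkipEmptyRows = true,         // drop rows that are empty or whitespace only
    TrimCellValues = true,        // strip leading and trailing whitespace
    IncludeTableMetadata = true   // add TableIndex, TableId, TableClass, RowIndex
};
```

Because each value above is the documented default, this should behave the same as a configuration that sets none of these properties.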

Properties

FirstRowIsHeader

Gets or sets whether to treat the first row as headers.

public bool FirstRowIsHeader { get; set; }

Property Value

bool

Remarks

When true (default), the first row of each table (typically <th> elements or the first <tr>) is used as column headers. When false, columns are named Column_0, Column_1, etc.

IncludeTableMetadata

Gets or sets whether to include table metadata columns.

public bool IncludeTableMetadata { get; set; }

Property Value

bool

Remarks

When true (default), additional columns are added: TableIndex, TableId, TableClass, and RowIndex. When false, only the table data columns are included.

MergeTables

Gets or sets whether to merge all tables into a single DataBlock.

public bool MergeTables { get; set; }

Property Value

bool

Remarks

When false (default), only the first matching table is returned. When true, all matching tables are merged into a single DataBlock. Tables with different column counts will have missing values filled with empty strings.
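
A hedged sketch of a merge configuration follows. It pairs MergeTables with IncludeTableMetadata, on the assumption that the TableIndex metadata column is what lets rows be traced back to their source table after merging. The public parameterless constructor is also an assumption.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var mergeConfiguration = new HtmlTableConnectorConfiguration
{
    MergeTables = true,          // combine all matching tables into one DataBlock
    IncludeTableMetadata = true  // keep TableIndex, TableId, TableClass, RowIndex
};
```

Per the remarks above, tables with fewer columns than the widest one would have their missing values filled with empty strings, so downstream code should treat an empty string as "no value" rather than expect null.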

SkipEmptyRows

Gets or sets whether to skip empty rows.

public bool SkipEmptyRows { get; set; }

Property Value

bool

Remarks

When true (default), rows where all cells are empty or whitespace are skipped.

TableIndex

Gets or sets the index of a specific table to extract (0-based).

public int? TableIndex { get; set; }

Property Value

int?

Remarks

When null (default), all matching tables are considered. When set, only the table at the specified index is extracted.

When MergeTables is false and TableIndex is null, only the first matching table is returned, as described under MergeTables.

TableSelector

Gets or sets the CSS selector to target specific tables.

public string TableSelector { get; set; }

Property Value

string

Remarks

The default is "table", which matches all table elements. Use a more specific selector such as "table.data-table" or "#results-table" to target particular tables.
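
The two targeting options can be sketched as follows: a selector that narrows the candidates, optionally combined with the TableIndex property to pick one of them. Whether TableIndex counts only the tables matched by the selector is an assumption based on the remarks on this page, as is the public parameterless constructor.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Target a single table by its id attribute.
var byId = new HtmlTableConnectorConfiguration
{
    TableSelector = "#results-table"
};

// Narrow to tables with a given class, then take the second match.
// Assumption: TableIndex is applied to the tables matched by TableSelector.
var secondDataTable = new HtmlTableConnectorConfiguration
{
    TableSelector = "table.data-table",
    TableIndex = 1  // 0-based, so 1 selects the second table
};
```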

TrimCellValues

Gets or sets whether to trim whitespace from cell values.

public bool TrimCellValues { get; set; }

Property Value

bool

Remarks

When true (default), leading and trailing whitespace is removed from all cell values.

UseTheadForHeaders

Gets or sets whether to use <thead> for headers when available.

public bool UseTheadForHeaders { get; set; }

Property Value

bool

Remarks

When true (default), headers are taken from <thead> if present; otherwise they are taken from the first <tr>. When false, headers are always taken from the first row.
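
The header-related properties can be combined as in the sketch below. It shows two hypothetical cases: ignoring <thead> in favor of the first row, and reading a table that has no header row at all. How UseTheadForHeaders behaves when FirstRowIsHeader is false is not described on this page, so the second configuration sets both to false as a cautious assumption.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Always read headers from the first row, even when a <thead> exists.
var firstRowHeaders = new HtmlTableConnectorConfiguration
{
    FirstRowIsHeader = true,
    UseTheadForHeaders = false
};

// Table without a header row: columns are named Column_0, Column_1, and so on.
// Assumption: disabling both flags avoids any <thead> lookup.
var headerless = new HtmlTableConnectorConfiguration
{
    FirstRowIsHeader = false,
    UseTheadForHeaders = false
};
```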