Class HtmlTableConnectorConfiguration
- Namespace
- Datafication.Connectors.WebConnector.Connectors
- Assembly
- Datafication.WebConnector.dll
Configuration for the HTML table connector.
public class HtmlTableConnectorConfiguration : WebConnectorConfigurationBase, IDataConnectorConfiguration
- Inheritance
- object
- WebConnectorConfigurationBase
- HtmlTableConnectorConfiguration
- Implements
- IDataConnectorConfiguration
Remarks
This configuration controls how HTML tables are extracted from web pages. The connector can extract one or more tables and convert them to DataBlocks.
Properties
FirstRowIsHeader
Gets or sets whether to treat the first row as headers.
public bool FirstRowIsHeader { get; set; }
Property Value
- bool
Remarks
When true (default), the first row of each table (typically <th> elements or the first <tr>) is used as column headers. When false, columns are named Column_0, Column_1, etc.
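The naming rule above can be illustrated with a small standalone sketch. It does not use the connector: ColumnNames is a hypothetical helper written for this example, and it ignores how <th> or <thead> detection is handled internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the Datafication API): picks column names
// for a table given as a list of rows, following the rule in the remarks.
static string[] ColumnNames(IReadOnlyList<string[]> rows, bool firstRowIsHeader)
{
    if (rows.Count == 0) return Array.Empty<string>();
    return firstRowIsHeader
        ? rows[0]
        : Enumerable.Range(0, rows[0].Length).Select(i => $"Column_{i}").ToArray();
}

var rows = new List<string[]>
{
    new[] { "Name", "Price" },
    new[] { "Widget", "9.99" },
};

Console.WriteLine(string.Join(", ", ColumnNames(rows, firstRowIsHeader: true)));  // Name, Price
Console.WriteLine(string.Join(", ", ColumnNames(rows, firstRowIsHeader: false))); // Column_0, Column_1
```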
IncludeTableMetadata
Gets or sets whether to include table metadata columns.
public bool IncludeTableMetadata { get; set; }
Property Value
- bool
Remarks
When true (default), additional columns are added: TableIndex, TableId, TableClass, and RowIndex. When false, only the table data columns are included.
MergeTables
Gets or sets whether to merge all tables into a single DataBlock.
public bool MergeTables { get; set; }
Property Value
- bool
Remarks
When false (default), only the first matching table is returned. When true, all matching tables are merged into a single DataBlock. Tables with different column counts will have missing values filled with empty strings.
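As a rough illustration of the padding behavior, the sketch below merges two tables with different column counts. MergeRows is a hypothetical helper, not the connector's implementation, and it assumes columns are aligned by position; the remarks do not say whether the connector aligns by position or by header name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: concatenates the rows of several tables and pads
// short rows with empty strings up to the width of the widest row.
static List<string[]> MergeRows(IEnumerable<IEnumerable<string[]>> tables)
{
    var all = tables.SelectMany(t => t).ToList();
    int width = all.Count == 0 ? 0 : all.Max(r => r.Length);
    return all
        .Select(r => r.Concat(Enumerable.Repeat("", width - r.Length)).ToArray())
        .ToList();
}

var first = new List<string[]> { new[] { "a", "b", "c" } };
var second = new List<string[]> { new[] { "x", "y" } };

foreach (var row in MergeRows(new[] { first, second }))
    Console.WriteLine(string.Join("|", row));
// a|b|c
// x|y|
```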
SkipEmptyRows
Gets or sets whether to skip empty rows.
public bool SkipEmptyRows { get; set; }
Property Value
- bool
Remarks
When true (default), rows where all cells are empty or whitespace are skipped.
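A minimal sketch of the "empty row" test described above, written without the connector. IsEmptyRow is a hypothetical name, and the sketch assumes a row is represented as an array of cell strings.

```csharp
using System;
using System.Linq;

// Hypothetical predicate mirroring the documented rule: a row is "empty"
// when every cell is null, empty, or whitespace only.
static bool IsEmptyRow(string[] cells) => cells.All(string.IsNullOrWhiteSpace);

var rows = new[]
{
    new[] { "Widget", "9.99" },
    new[] { "", "   " },      // skipped: every cell is blank
    new[] { "Gadget", "" },   // kept: at least one cell has content
};

var kept = rows.Where(r => !IsEmptyRow(r)).ToArray();
Console.WriteLine(kept.Length); // 2
```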
TableIndex
Gets or sets the index of a specific table to extract (0-based).
public int? TableIndex { get; set; }
Property Value
- int?
Remarks
When null (default), all matching tables are considered. When set, only the table at the specified index is extracted.
TableSelector
Gets or sets the CSS selector to target specific tables.
public string TableSelector { get; set; }
Property Value
- string
Remarks
Default is "table", which matches all table elements. Use a more specific selector such as "table.data-table" or "#results-table" to target particular tables.
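TableSelector and TableIndex can be combined to narrow extraction to one table. The fragment below is a sketch that uses only properties documented on this page. It assumes the class has a public parameterless constructor, and it leaves out the page source settings, which come from inherited members not listed here. The index presumably counts among the tables matched by the selector, but the remarks do not state this explicitly.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new HtmlTableConnectorConfiguration
{
    TableSelector = "table.data-table", // only tables with class "data-table"
    TableIndex = 1,                     // table at index 1 (indices are 0-based)
};
```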
TrimCellValues
Gets or sets whether to trim whitespace from cell values.
public bool TrimCellValues { get; set; }
Property Value
- bool
Remarks
When true (default), leading and trailing whitespace is removed from all cell values.
UseTheadForHeaders
Gets or sets whether to use <thead> for headers when available.
public bool UseTheadForHeaders { get; set; }
Property Value
- bool
Remarks
When true (default), headers are taken from <thead> if present, otherwise from the first <tr>. When false, the first row is always used.
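Taken together, the defaults stated in the remarks on this page correspond to the initializer below. It is a sketch for orientation, assuming a public parameterless constructor; setting these values explicitly should be equivalent to leaving them untouched.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Every value here restates a default documented in the property remarks.
var config = new HtmlTableConnectorConfiguration
{
    TableSelector = "table",     // match all table elements
    TableIndex = null,           // consider all matching tables
    FirstRowIsHeader = true,     // first row supplies column headers
    UseTheadForHeaders = true,   // prefer <thead> when present
    SkipEmptyRows = true,        // drop rows whose cells are all blank
    TrimCellValues = true,       // strip leading and trailing whitespace
    IncludeTableMetadata = true, // add TableIndex, TableId, TableClass, RowIndex
    MergeTables = false,         // return only the first matching table
};
```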