Class CssSelectorConnectorConfiguration

Namespace
Datafication.Connectors.WebConnector.Connectors
Assembly
Datafication.WebConnector.dll

Configuration for the CSS selector connector.

public class CssSelectorConnectorConfiguration : WebConnectorConfigurationBase, IDataConnectorConfiguration
Inheritance
object
WebConnectorConfigurationBase
CssSelectorConnectorConfiguration
Implements
IDataConnectorConfiguration

Remarks

This flexible connector extracts data from web pages using CSS selectors. It is well suited to scraping structured content such as product listings, articles, or other repeating elements.

Properties

AttributeSubSelectors

Gets or sets sub-selectors for extracting attribute values.

public Dictionary<string, string> AttributeSubSelectors { get; set; }

Property Value

Dictionary<string, string>

Remarks

Similar to SubSelectors, but extracts an attribute value instead of text content.

Format: { "ColumnName": "selector|attribute" }

Example: { "ImageUrl": "img|src", "ProductLink": "a.details|href" }
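To make the "selector|attribute" format concrete, the sketch below splits such a value into its two parts. It is not the library's implementation: the helper name AttributeSubSelectorFormat is hypothetical, and splitting at the last '|' is an assumption made here so that selectors containing '|' themselves (for example [lang|=en]) stay intact.

```csharp
using System;

// Hypothetical helper, not part of Datafication.WebConnector. It only
// illustrates the documented "selector|attribute" value format.
public static class AttributeSubSelectorFormat
{
    // Splits at the LAST '|' (an assumption), so a selector such as
    // "a[lang|=en]" keeps its own '|' and only the trailing part is
    // treated as the attribute name.
    public static (string Selector, string Attribute) Parse(string value)
    {
        if (value is null) throw new ArgumentNullException(nameof(value));

        int separator = value.LastIndexOf('|');
        string selector = separator < 0 ? "" : value.Substring(0, separator).Trim();
        string attribute = separator < 0 ? "" : value.Substring(separator + 1).Trim();

        // Both halves must be present for the value to be usable.
        if (selector.Length == 0 || attribute.Length == 0)
        {
            throw new FormatException(
                $"Expected \"selector|attribute\" but got \"{value}\".");
        }

        return (selector, attribute);
    }
}
```

With this sketch, AttributeSubSelectorFormat.Parse("a.details|href") yields the selector "a.details" and the attribute "href", while a value with no '|' is rejected with a FormatException.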

Attributes

Gets or sets the list of attribute names to extract from each element.

public List<string> Attributes { get; set; }

Property Value

List<string>

Remarks

For each attribute name, a column is created containing that attribute's value. Common attributes include "id", "class", "href", "src", and custom "data-*" attributes.
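A minimal configuration fragment showing how the attribute list might be set. It assumes the class has a public parameterless constructor, which this page does not show. The selector and the "data-sku" attribute are hypothetical values chosen for illustration.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumes a public parameterless constructor (not shown on this page).
var config = new CssSelectorConnectorConfiguration
{
    // Hypothetical selector: one row per matching link.
    Selector = "a.details",

    // One column per listed attribute; "data-sku" is an illustrative
    // custom data attribute, not something the library defines.
    Attributes = new List<string> { "id", "href", "data-sku" },
};
```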

IncludeElementIndex

Gets or sets whether to include the ElementIndex column.

public bool IncludeElementIndex { get; set; }

Property Value

bool

Remarks

When true (default), includes a column with the 0-based index of each element.

IncludeInnerHtml

Gets or sets whether to include the InnerHtml column.

public bool IncludeInnerHtml { get; set; }

Property Value

bool

Remarks

When true, includes a column containing the HTML content inside each element.

IncludeInnerText

Gets or sets whether to include the InnerText column.

public bool IncludeInnerText { get; set; }

Property Value

bool

Remarks

When true (default), includes a column containing the text content of each element.

IncludeOuterHtml

Gets or sets whether to include the OuterHtml column.

public bool IncludeOuterHtml { get; set; }

Property Value

bool

Remarks

When true, includes a column containing the full HTML of each element.

IncludeTagName

Gets or sets whether to include the TagName column.

public bool IncludeTagName { get; set; }

Property Value

bool

Remarks

When true (default), includes a column with the HTML tag name of each element.
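The Include* flags together determine which built-in columns appear in the result. The fragment below is a sketch of setting them explicitly rather than relying on defaults. It assumes a public parameterless constructor, and the "article" selector is a hypothetical value.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumes a public parameterless constructor (not shown on this page).
var config = new CssSelectorConnectorConfiguration
{
    Selector = "article",          // hypothetical selector

    IncludeElementIndex = true,    // keep the 0-based ElementIndex column
    IncludeTagName = false,        // drop the TagName column
    IncludeInnerText = true,       // keep the InnerText column
    IncludeInnerHtml = false,      // no InnerHtml column
    IncludeOuterHtml = true,       // add the OuterHtml column
};
```

Setting every flag explicitly keeps the resulting column set independent of the defaults, which this page states only for ElementIndex, InnerText, and TagName.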

MaxElements

Gets or sets the maximum number of elements to return.

public int? MaxElements { get; set; }

Property Value

int?

Remarks

When null (default), all matching elements are returned. Set to a positive number to limit results.

Selector

Gets or sets the primary CSS selector to match elements.

public string Selector { get; set; }

Property Value

string

Remarks

Each matched element becomes a row in the resulting DataBlock.

Example: ".product-card" matches all product cards on a page.

SubSelectors

Gets or sets custom sub-selectors relative to matched elements.

public Dictionary<string, string> SubSelectors { get; set; }

Property Value

Dictionary<string, string>

Remarks

The key is the column name; the value is a CSS selector evaluated relative to each matched element. The text content of the first matching sub-element is used.

Example: { "Title": "h2.title", "Price": ".price-value" }
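The fragment below sketches how Selector, SubSelectors, AttributeSubSelectors, and MaxElements might be combined for a product listing, reusing the example values from this page. It assumes a public parameterless constructor. Members inherited from WebConnectorConfigurationBase are not documented on this page and are left out, and the limit of 50 is an arbitrary illustrative value.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumes a public parameterless constructor; inherited members from
// WebConnectorConfigurationBase are omitted because this page does not
// document them.
var config = new CssSelectorConnectorConfiguration
{
    // One row per element matched by the primary selector.
    Selector = ".product-card",

    // Text columns: text content of the first sub-element that matches.
    SubSelectors = new Dictionary<string, string>
    {
        ["Title"] = "h2.title",
        ["Price"] = ".price-value",
    },

    // Attribute columns, using the "selector|attribute" format.
    AttributeSubSelectors = new Dictionary<string, string>
    {
        ["ImageUrl"] = "img|src",
        ["ProductLink"] = "a.details|href",
    },

    // Illustrative cap; null (the default) returns every match.
    MaxElements = 50,
};
```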

TrimValues

Gets or sets whether to trim whitespace from text values.

public bool TrimValues { get; set; }

Property Value

bool