Class LinkExtractorConnector

Namespace
Datafication.Connectors.WebConnector.Connectors
Assembly
Datafication.WebConnector.dll

Connector for extracting links from web pages.

public class LinkExtractorConnector : WebConnectorBase<LinkExtractorConnectorConfiguration>, IDataConnector
Inheritance
object
WebConnectorBase<LinkExtractorConnectorConfiguration>
LinkExtractorConnector
Implements
IDataConnector

Remarks

This connector extracts anchor elements from HTML pages and converts them to a DataBlock with columns for the URL, the link text, and various other attributes. It supports filtering links by scope (internal or external) and by URL pattern, and it can deduplicate the results.
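
The behavior described above can be illustrated with a small standalone sketch. It is not the connector's implementation and uses none of its types: ExtractedLink, LinkSketch, and the internalOnly and urlPattern parameters are hypothetical names chosen for this example, and a regular expression stands in for a real HTML parser.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record for one extracted link; not a type from the library.
public sealed record ExtractedLink(string Url, string Text, bool IsInternal);

public static class LinkSketch
{
    // Matches <a ... href="...">text</a>. Good enough for a sketch;
    // a production extractor would use an HTML parser instead.
    private static readonly Regex Anchor = new Regex(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<ExtractedLink> Extract(
        string html, Uri pageUrl, bool internalOnly = false, string? urlPattern = null)
    {
        var seen = new HashSet<string>();          // URLs already emitted
        var links = new List<ExtractedLink>();

        foreach (Match m in Anchor.Matches(html))
        {
            // Resolve relative hrefs against the page URL; skip unparseable ones.
            if (!Uri.TryCreate(pageUrl, m.Groups[1].Value, out Uri? absolute))
                continue;

            // Ignore non-web schemes such as mailto: or javascript:.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
                continue;

            // A link is internal when it points at the same host as the page.
            bool isInternal = string.Equals(
                absolute.Host, pageUrl.Host, StringComparison.OrdinalIgnoreCase);
            if (internalOnly && !isInternal)
                continue;

            // Optional URL pattern filter.
            if (urlPattern != null && !Regex.IsMatch(absolute.AbsoluteUri, urlPattern))
                continue;

            // Deduplicate on the resolved absolute URL.
            if (!seen.Add(absolute.AbsoluteUri))
                continue;

            // Strip nested tags from the anchor body to get the visible text.
            string text = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim();
            links.Add(new ExtractedLink(absolute.AbsoluteUri, text, isInternal));
        }

        return links;
    }
}
```

Called with a page at https://example.com/index.html that contains two anchors to /docs and one to another host, Extract with internalOnly set to true would keep only the first /docs link: the external link is filtered out and the second /docs anchor is dropped as a duplicate.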

Constructors

LinkExtractorConnector(LinkExtractorConnectorConfiguration)

Initializes a new instance of the LinkExtractorConnector class.

public LinkExtractorConnector(LinkExtractorConnectorConfiguration configuration)

Parameters

configuration LinkExtractorConnectorConfiguration

The configuration for this connector.

Exceptions

ArgumentNullException

Thrown when configuration is null.

ArgumentException

Thrown when configuration is invalid.

Methods

GetDataAsync()

Extracts links from the configured URL and returns them as a DataBlock.

public override Task<DataBlock> GetDataAsync()

Returns

Task<DataBlock>

A task whose result is a DataBlock containing the extracted link data.
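
A minimal usage sketch, relying only on the members documented on this page, might look like the following. The configuration is passed in by the caller because the properties of LinkExtractorConnectorConfiguration are not listed here. The namespaces that define LinkExtractorConnectorConfiguration and DataBlock are also not shown on this page, so the matching using directives are left for the reader to add. LinkExtractionExample and RunAsync are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Datafication.Connectors.WebConnector.Connectors;
// Assumption: add using directives for the namespaces that define
// LinkExtractorConnectorConfiguration and DataBlock; this page does not list them.

public static class LinkExtractionExample
{
    public static async Task RunAsync(LinkExtractorConnectorConfiguration configuration)
    {
        LinkExtractorConnector connector;
        try
        {
            connector = new LinkExtractorConnector(configuration);
        }
        catch (ArgumentNullException)
        {
            // Caught first because ArgumentNullException derives from ArgumentException.
            Console.Error.WriteLine("No configuration was supplied.");
            return;
        }
        catch (ArgumentException ex)
        {
            Console.Error.WriteLine($"Invalid configuration: {ex.Message}");
            return;
        }

        // Fetches the configured URL and returns the extracted links.
        var links = await connector.GetDataAsync();

        // Inspect `links` with the DataBlock API, which is documented separately.
    }
}
```

The two catch blocks are ordered from most to least specific. Reversing them would not compile, since the ArgumentException handler would already cover ArgumentNullException.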