Class LinkExtractorConnector
- Namespace
- Datafication.Connectors.WebConnector.Connectors
- Assembly
- Datafication.WebConnector.dll
Connector for extracting links from web pages.
public class LinkExtractorConnector : WebConnectorBase<LinkExtractorConnectorConfiguration>, IDataConnector
- Inheritance
- object → LinkExtractorConnector
- Implements
- IDataConnector
Remarks
This connector extracts anchor elements from HTML pages and converts them to a DataBlock with columns for the URL, the link text, and various attributes. It supports filtering by link scope (internal or external) and by URL pattern, as well as deduplication.
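To make the remarks concrete, the following is a minimal, self-contained sketch of the general technique: collect anchors, resolve each href against the page URL, classify it as internal or external, filter, and deduplicate. It is not the connector's actual implementation. The names ExtractedLink, LinkExtractionSketch, and the parameters internalOnly, urlPattern, and deduplicate are hypothetical, and a regular expression stands in for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical row type for illustration; the real connector returns a DataBlock.
public sealed record ExtractedLink(string Url, string Text, bool IsInternal);

public static class LinkExtractionSketch
{
    // Simplified anchor matcher (assumption: quoted href values, no nested anchors).
    private static readonly Regex AnchorPattern = new Regex(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"'](?<href>[^\"']*)[\"'][^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<ExtractedLink> Extract(
        string html,
        Uri pageUrl,
        bool internalOnly = false,
        string? urlPattern = null,
        bool deduplicate = true)
    {
        var results = new List<ExtractedLink>();
        var seen = new HashSet<string>();

        foreach (Match match in AnchorPattern.Matches(html))
        {
            // Resolve relative hrefs against the page URL; skip values that cannot be parsed.
            if (!Uri.TryCreate(pageUrl, match.Groups["href"].Value, out Uri? resolved))
                continue;

            // A link counts as internal when its host matches the page's host.
            bool isInternal = string.Equals(
                resolved.Host, pageUrl.Host, StringComparison.OrdinalIgnoreCase);

            if (internalOnly && !isInternal) continue;
            if (urlPattern != null && !Regex.IsMatch(resolved.AbsoluteUri, urlPattern)) continue;

            // HashSet.Add returns false for a URL that was already recorded.
            if (deduplicate && !seen.Add(resolved.AbsoluteUri)) continue;

            // Strip nested markup from the anchor body to obtain the visible text.
            string text = Regex.Replace(match.Groups["text"].Value, "<[^>]+>", string.Empty).Trim();
            results.Add(new ExtractedLink(resolved.AbsoluteUri, text, isInternal));
        }

        return results;
    }
}
```

Each filter is applied before the row is built, so a link excluded by scope or pattern never occupies a slot in the deduplication set.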
Constructors
LinkExtractorConnector(LinkExtractorConnectorConfiguration)
Initializes a new instance of the LinkExtractorConnector class.
public LinkExtractorConnector(LinkExtractorConnectorConfiguration configuration)
Parameters
- configuration LinkExtractorConnectorConfiguration
The configuration for this connector.
Exceptions
- ArgumentNullException
Thrown when configuration is null.
- ArgumentException
Thrown when configuration is invalid.
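The two documented exceptions can be handled separately at construction time. The sketch below shows one way to do that. ConnectorFactory and TryCreate are hypothetical helpers, not part of the library, and the using directive for the configuration type is left as a comment because its namespace is not stated on this page.

```csharp
using System;
using Datafication.Connectors.WebConnector.Connectors;
// Assumption: add a using directive for the namespace that contains
// LinkExtractorConnectorConfiguration if it differs from the one above.

public static class ConnectorFactory
{
    // Hypothetical helper: returns null and an error message instead of throwing.
    public static LinkExtractorConnector? TryCreate(
        LinkExtractorConnectorConfiguration? configuration,
        out string? error)
    {
        try
        {
            error = null;
            return new LinkExtractorConnector(configuration!);
        }
        // ArgumentNullException derives from ArgumentException,
        // so the more specific catch must come first.
        catch (ArgumentNullException ex)
        {
            error = $"Configuration was null: {ex.Message}";
            return null;
        }
        catch (ArgumentException ex)
        {
            error = $"Configuration was invalid: {ex.Message}";
            return null;
        }
    }
}
```

Catching the exceptions in this order keeps a missing configuration distinguishable from one that was supplied but failed validation.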
Methods
GetDataAsync()
Extracts links from the configured URL and returns them as a DataBlock.
public override Task<DataBlock> GetDataAsync()
Returns
- Task<DataBlock>
A DataBlock containing the extracted link data.
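Because GetDataAsync() returns a Task<DataBlock>, callers would typically await it. A minimal usage sketch follows. It relies only on the constructor and method documented on this page; LinkExtractionExample and ExtractLinksAsync are hypothetical names, the configuration is assumed to be built elsewhere, and the namespace shown for DataBlock is an assumption that may need adjusting.

```csharp
using System.Threading.Tasks;
using Datafication.Connectors.WebConnector.Connectors;
using Datafication.Core.Data; // Assumption: namespace that provides DataBlock.

public static class LinkExtractionExample
{
    // Hypothetical wrapper: builds the connector and awaits the extracted links.
    public static async Task<DataBlock> ExtractLinksAsync(
        LinkExtractorConnectorConfiguration configuration)
    {
        // The constructor validates the configuration and may throw
        // ArgumentNullException or ArgumentException.
        var connector = new LinkExtractorConnector(configuration);

        // Awaiting the task yields the DataBlock of extracted link data.
        DataBlock links = await connector.GetDataAsync();
        return links;
    }
}
```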