Class WebConnectorBase<TConfig>

Namespace
Datafication.Connectors.WebConnector.Connectors
Assembly
Datafication.WebConnector.dll

Abstract base class providing shared HTML fetching logic for web connectors.

public abstract class WebConnectorBase<TConfig> : IDataConnector where TConfig : WebConnectorConfigurationBase

Type Parameters

TConfig

The configuration type for this connector.

Inheritance
object
WebConnectorBase<TConfig>
Implements
IDataConnector

Remarks

This class handles both plain HTTP and browser-based page fetching; the UseBrowser configuration setting selects between them. Derived classes implement the parsing logic specific to their data type.
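
The split between shared fetching and connector-specific parsing can be illustrated with a minimal, self-contained sketch. It does not use the real Datafication types: SketchConfigBase, SketchConnectorBase<TConfig>, TitleConfig, TitleConnector, and the Source property are hypothetical stand-ins, the fetch step returns canned HTML instead of making a request, and GetDataAsync returns a string rather than a DataBlock.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for WebConnectorConfigurationBase (hypothetical, simplified).
public abstract class SketchConfigBase
{
    public string Source { get; set; } = "";   // hypothetical property name
    public bool UseBrowser { get; set; }       // setting named in the remarks
}

// Stand-in for WebConnectorBase<TConfig>: owns the shared fetch step.
public abstract class SketchConnectorBase<TConfig> where TConfig : SketchConfigBase
{
    protected SketchConnectorBase(TConfig configuration)
    {
        // Mirrors the documented ArgumentNullException on a null configuration.
        Configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));
    }

    protected TConfig Configuration { get; }

    // Canned HTML keeps the sketch offline; the real class fetches over HTTP or a browser.
    protected Task<string> FetchHtmlAsync()
    {
        string mode = Configuration.UseBrowser ? "browser" : "http";
        return Task.FromResult($"<html><title>{mode}:{Configuration.Source}</title></html>");
    }

    // Derived classes supply the parsing step.
    public abstract Task<string> GetDataAsync();
}

public sealed class TitleConfig : SketchConfigBase { }

// A derived connector that only knows how to pull the <title> text out of the page.
public sealed class TitleConnector : SketchConnectorBase<TitleConfig>
{
    public TitleConnector(TitleConfig configuration) : base(configuration) { }

    public override async Task<string> GetDataAsync()
    {
        string html = await FetchHtmlAsync();
        int start = html.IndexOf("<title>", StringComparison.Ordinal) + "<title>".Length;
        int end = html.IndexOf("</title>", StringComparison.Ordinal);
        return html.Substring(start, end - start);
    }
}
```

In this sketch, the derived class never decides how the page is retrieved; it only calls the inherited fetch method and parses the result, which is the division of responsibility the remarks describe.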

Constructors

WebConnectorBase(TConfig)

Initializes a new instance of the WebConnectorBase<TConfig> class.

protected WebConnectorBase(TConfig configuration)

Parameters

configuration TConfig

The configuration for this connector.

Exceptions

ArgumentNullException

Thrown when configuration is null.

Properties

Configuration

Gets the configuration for this connector.

protected TConfig Configuration { get; }

Property Value

TConfig

Methods

FetchDocumentAsync()

Fetches and parses HTML from the configured source.

protected Task<IDocument> FetchDocumentAsync()

Returns

Task<IDocument>

A parsed HTML document.

Exceptions

HttpRequestException

Thrown when the HTTP request fails.

TimeoutException

Thrown when the request times out.
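
Because two exception types are documented here, a derived connector may want to handle them in one place. The following is a minimal sketch of one way to do that; FetchGuard and TryFetchAsync are hypothetical helper names, not part of the library, and returning null on failure is an assumption made for illustration.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of Datafication.
public static class FetchGuard
{
    // Runs the supplied fetch delegate and converts the two documented
    // failure modes into a null result after logging them.
    public static async Task<T?> TryFetchAsync<T>(Func<Task<T>> fetch) where T : class
    {
        try
        {
            return await fetch();
        }
        catch (HttpRequestException ex)
        {
            Console.Error.WriteLine($"Request failed: {ex.Message}");
            return null;
        }
        catch (TimeoutException ex)
        {
            Console.Error.WriteLine($"Request timed out: {ex.Message}");
            return null;
        }
    }
}
```

Since FetchDocumentAsync is protected, a call such as `await FetchGuard.TryFetchAsync(() => FetchDocumentAsync())` would sit inside a derived class, for example in its GetDataAsync override.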

FetchHtmlAsync()

Fetches the raw HTML content from the configured source.

protected Task<string> FetchHtmlAsync()

Returns

Task<string>

The HTML content as a string.

GetConnectorId()

Gets the connector identifier from the configuration.

public string GetConnectorId()

Returns

string

The unique identifier for this connector instance.

GetDataAsync()

Gets data from the web source as a DataBlock.

public abstract Task<DataBlock> GetDataAsync()

Returns

Task<DataBlock>

A task that represents the asynchronous operation, containing the resulting DataBlock.

GetStorageDataAsync(IStorageDataBlock, int)

Gets data from the web source and appends it to a storage DataBlock.

public virtual Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)

Parameters

target IStorageDataBlock

The target storage DataBlock to append data to.

batchSize int

The batch size for processing (not typically used for web connectors).

Returns

Task<IStorageDataBlock>

A task that represents the asynchronous operation, containing the storage DataBlock.

IsExternalUrl(string?)

Determines whether a URL is external, that is, whether its host differs from the host of the source URL.

protected bool IsExternalUrl(string? url)

Parameters

url string?

The URL to check.

Returns

bool

True if the URL is external; otherwise, false.
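
The host comparison described above can be sketched with System.Uri. This is an illustration only, not the library's implementation: ExternalUrlSketch is a hypothetical class, the source URL is passed as an explicit parameter instead of being read from the configuration, and treating null, empty, or unparseable input as not external is an assumption.

```csharp
#nullable enable
using System;

// Hypothetical stand-alone version of the check; not the library's code.
public static class ExternalUrlSketch
{
    public static bool IsExternalUrl(string sourceUrl, string? url)
    {
        // Assumption: missing input is treated as "not external".
        if (string.IsNullOrWhiteSpace(url)) return false;

        var source = new Uri(sourceUrl, UriKind.Absolute);

        // Resolve relative links against the source first, so "/about" keeps the source host.
        if (!Uri.TryCreate(source, url, out Uri? resolved)) return false;

        // Host names are case-insensitive, so compare them that way.
        return !string.Equals(source.Host, resolved.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```

With a source of `https://example.com/a/`, this sketch reports `/about` as internal and `https://other.org/x` as external.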

ResolveUrl(string?)

Resolves a relative URL against the source URL.

protected string ResolveUrl(string? url)

Parameters

url string?

The URL to resolve.

Returns

string

The absolute URL.
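
Relative-to-absolute resolution of this kind can be sketched with System.Uri. The code below is an illustration under stated assumptions, not the library's implementation: ResolveUrlSketch is a hypothetical class, the source URL is an explicit parameter, and returning the source URL for null or empty input is an assumed behavior.

```csharp
#nullable enable
using System;

// Hypothetical stand-alone version of the resolution step; not the library's code.
public static class ResolveUrlSketch
{
    public static string ResolveUrl(string sourceUrl, string? url)
    {
        // Assumption: with nothing to resolve, fall back to the source URL itself.
        if (string.IsNullOrWhiteSpace(url)) return sourceUrl;

        var source = new Uri(sourceUrl, UriKind.Absolute);

        // Uri.TryCreate applies standard relative-reference resolution;
        // an already absolute URL passes through with its own scheme and host.
        return Uri.TryCreate(source, url, out Uri? resolved)
            ? resolved.AbsoluteUri
            : url;
    }
}
```

With a source of `https://example.com/docs/index.html`, this sketch resolves `page2.html` to `https://example.com/docs/page2.html` and `../img/logo.png` to `https://example.com/img/logo.png`.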