Class WebConnectorBase<TConfig>
Namespace: Datafication.Connectors.WebConnector.Connectors
Assembly: Datafication.WebConnector.dll
Abstract base class providing shared HTML fetching logic for web connectors.
public abstract class WebConnectorBase<TConfig> : IDataConnector where TConfig : WebConnectorConfigurationBase
Type Parameters
TConfig: The configuration type for this connector. It must derive from WebConnectorConfigurationBase.
Inheritance: object → WebConnectorBase<TConfig>
Implements: IDataConnector
Remarks
This class handles both HTTP-only and browser-based page fetching; the UseBrowser configuration setting selects which one is used. Derived classes implement the parsing logic specific to their data type.
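The remarks describe a template-method split: the base class decides how a page is fetched, and each derived connector decides how the result is parsed. The following is a self-contained sketch of that split, not the library's code. SketchConfig, SketchConnectorBase, TitleConnector, and the canned HTML returned by the fetch stubs are all hypothetical, and GetDataAsync returns a string instead of a DataBlock so the example compiles without the Datafication assembly.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for WebConnectorConfigurationBase; only the UseBrowser
// flag mentioned in the remarks is modeled.
public class SketchConfig
{
    public bool UseBrowser { get; set; }
}

// Hypothetical miniature of the base class: it owns configuration handling and the
// choice of fetch strategy, while derived classes own parsing.
public abstract class SketchConnectorBase<TConfig> where TConfig : SketchConfig
{
    protected SketchConnectorBase(TConfig configuration)
    {
        // Mirrors the documented ArgumentNullException for a null configuration.
        Configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));
    }

    protected TConfig Configuration { get; }

    // Picks a fetch strategy from UseBrowser. Both strategies are canned stubs here;
    // a real implementation would issue an HTTP request or drive a headless browser.
    protected Task<string> FetchHtmlAsync() =>
        Configuration.UseBrowser ? FetchWithBrowserAsync() : FetchWithHttpAsync();

    protected virtual Task<string> FetchWithHttpAsync() =>
        Task.FromResult("<html><head><title>via http</title></head></html>");

    protected virtual Task<string> FetchWithBrowserAsync() =>
        Task.FromResult("<html><head><title>via browser</title></head></html>");

    // The real class returns Task<DataBlock>; a string keeps the sketch self-contained.
    public abstract Task<string> GetDataAsync();
}

// A derived connector supplies only the parsing step (here: extract the <title> text).
public sealed class TitleConnector : SketchConnectorBase<SketchConfig>
{
    public TitleConnector(SketchConfig configuration) : base(configuration) { }

    public override async Task<string> GetDataAsync()
    {
        string html = await FetchHtmlAsync();
        int open = html.IndexOf("<title>", StringComparison.Ordinal);
        int close = html.IndexOf("</title>", StringComparison.Ordinal);
        if (open < 0 || close < open) return string.Empty;
        open += "<title>".Length;
        return html.Substring(open, close - open);
    }
}
```

In this arrangement a derived connector never branches on UseBrowser itself; it calls the shared fetch helper and parses whatever HTML comes back.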
Constructors
WebConnectorBase(TConfig)
Initializes a new instance of the WebConnectorBase<TConfig> class.
protected WebConnectorBase(TConfig configuration)
Parameters
configuration (TConfig): The configuration for this connector.
Exceptions
- ArgumentNullException
Thrown when configuration is null.
Properties
Configuration
Gets the configuration for this connector.
protected TConfig Configuration { get; }
Property Value
- TConfig
Methods
FetchDocumentAsync()
Fetches and parses HTML from the configured source.
protected Task<IDocument> FetchDocumentAsync()
Returns
- Task<IDocument>
A parsed HTML document.
Exceptions
- HttpRequestException
Thrown when the HTTP request fails.
- TimeoutException
Thrown when the request times out.
FetchHtmlAsync()
Fetches the raw HTML content from the configured source.
protected Task<string> FetchHtmlAsync()
Returns
- Task<string>
The HTML content as a string.
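The exceptions listed for FetchDocumentAsync suggest how the underlying HTTP fetch reports failures. The sketch below shows one way an HttpClient-based fetch can satisfy that contract: a non-success status raises HttpRequestException through EnsureSuccessStatusCode, and an elapsed timeout is rethrown as TimeoutException. HtmlFetchSketch, StubHandler, and the explicit timeout parameter are hypothetical names chosen for illustration; the real connector's timeout source and its browser path (UseBrowser) are not modeled.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class HtmlFetchSketch
{
    // Hypothetical helper: fetches a page body as a string.
    // Non-success status codes surface as HttpRequestException (EnsureSuccessStatusCode);
    // cancellation caused by our own timeout is rethrown as TimeoutException.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            using HttpResponseMessage response = await client.GetAsync(url, cts.Token);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync(cts.Token);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            throw new TimeoutException($"Request to {url} timed out after {timeout}.");
        }
    }
}

// Test double so the sketch runs without network access.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _body;
    private readonly TimeSpan _delay;

    public StubHandler(HttpStatusCode status, string body, TimeSpan delay = default)
    {
        _status = status;
        _body = body;
        _delay = delay;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Honors cancellation so a short timeout can interrupt a slow "server".
        await Task.Delay(_delay, cancellationToken);
        return new HttpResponseMessage(_status) { Content = new StringContent(_body) };
    }
}
```

HttpClient's own Timeout property (100 seconds by default) is left unchanged here; if it fired first, it would surface as TaskCanceledException rather than the mapped TimeoutException.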
GetConnectorId()
Gets the connector identifier from the configuration.
public string GetConnectorId()
Returns
- string
The unique identifier for this connector instance.
GetDataAsync()
Gets data from the web source as a DataBlock.
public abstract Task<DataBlock> GetDataAsync()
Returns
- Task<DataBlock>
A task that represents the asynchronous operation, containing the resulting DataBlock.
GetStorageDataAsync(IStorageDataBlock, int)
Gets data from the web source and appends it to a storage DataBlock.
public virtual Task<IStorageDataBlock> GetStorageDataAsync(IStorageDataBlock target, int batchSize = 10000)
Parameters
target (IStorageDataBlock): The target storage DataBlock to append data to.
batchSize (int): The batch size for processing. Defaults to 10000; web connectors typically do not use it.
Returns
- Task<IStorageDataBlock>
A task that represents the asynchronous operation, containing the storage DataBlock.
IsExternalUrl(string?)
Determines whether a URL is external, that is, whether it points to a different host than the source URL.
protected bool IsExternalUrl(string? url)
Parameters
url (string?): The URL to check.
Returns
- bool
True if the URL is external; otherwise, false.
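A minimal sketch of the host comparison described above, using System.Uri. It assumes the URL is first resolved against the source URL, so relative links inherit the source host and count as internal, and that null or blank input is not external. ExternalUrlSketch and the explicit sourceUrl parameter are hypothetical; the real method presumably reads the source URL from Configuration.

```csharp
#nullable enable
using System;

public static class ExternalUrlSketch
{
    // Hypothetical helper: true when url, resolved against sourceUrl, has a different host.
    // Assumptions: null/blank input is treated as not external, and relative links
    // ("/about", "#top") resolve to the source host and therefore count as internal.
    public static bool IsExternalUrl(string sourceUrl, string? url)
    {
        if (string.IsNullOrWhiteSpace(url)) return false;

        var source = new Uri(sourceUrl, UriKind.Absolute);
        if (!Uri.TryCreate(source, url.Trim(), out Uri? target)) return false;

        // Host names are case-insensitive, so compare them ignoring case.
        return !string.Equals(source.Host, target.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```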
ResolveUrl(string?)
Resolves a relative URL against the source URL.
protected string ResolveUrl(string? url)
Parameters
url (string?): The URL to resolve.
Returns
- string
The absolute URL.
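One plausible reading of ResolveUrl is standard relative-reference resolution, which System.Uri already implements through its base-plus-relative overloads. The sketch assumes that an already-absolute URL passes through unchanged and that null or blank input yields an empty string; ResolveUrlSketch and the explicit sourceUrl parameter are hypothetical, and the library's handling of null input is not documented here.

```csharp
#nullable enable
using System;

public static class ResolveUrlSketch
{
    // Hypothetical helper: resolves url against sourceUrl and returns an absolute URL string.
    public static string ResolveUrl(string sourceUrl, string? url)
    {
        // Assumption: null/blank input maps to an empty string.
        if (string.IsNullOrWhiteSpace(url)) return string.Empty;

        var baseUri = new Uri(sourceUrl, UriKind.Absolute);

        // Uri.TryCreate(base, relative, out result) resolves references such as
        // "img/a.png", "../x", and "/x" against baseUri; absolute URLs are kept as-is.
        return Uri.TryCreate(baseUri, url.Trim(), out Uri? resolved)
            ? resolved.AbsoluteUri
            : url.Trim(); // Assumption: unparseable input is returned unchanged.
    }
}
```

Delegating to System.Uri avoids hand-written path joining, which tends to get "..", leading slashes, and query-only references wrong.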