Class ImageExtractorConnectorConfiguration

Namespace
Datafication.Connectors.WebConnector.Connectors
Assembly
Datafication.WebConnector.dll

Configuration for the image extractor connector.

public class ImageExtractorConnectorConfiguration : WebConnectorConfigurationBase, IDataConnectorConfiguration
Inheritance
object
WebConnectorConfigurationBase
ImageExtractorConnectorConfiguration
Implements
IDataConnectorConfiguration

Remarks

This connector extracts image elements and their metadata from web pages. It supports filtering by size and file extension, and can extract lazy-loaded images.
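
As a rough illustration of how these options fit together, the following sketch builds a configuration with an object initializer. It assumes the class exposes a public parameterless constructor, which this page does not confirm, and the chosen values are purely illustrative. Only properties documented on this page are set; any source URL or connector factory needed to run the extraction is omitted because it is not described here.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new ImageExtractorConnectorConfiguration
{
    // Keep only common photo formats (extensions include the dot).
    AllowedExtensions = new List<string> { ".jpg", ".jpeg", ".png" },

    // Also look at lazy-loading attributes and srcset.
    IncludeDataSrc = true,
    IncludeSrcset = true,

    // Return absolute URLs and drop repeated ones.
    ResolveUrls = true,
    RemoveDuplicates = true,
};
```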

Properties

AllowedExtensions

Gets or sets file extensions to include.

public List<string> AllowedExtensions { get; set; }

Property Value

List<string>

Remarks

When not empty, only images with these extensions are included. Extensions should include the dot (e.g., ".jpg", ".png"). Matching is case-insensitive.
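
A minimal sketch of an allow-list, assuming a public parameterless constructor (not confirmed on this page). Because matching is described as case-insensitive, listing ".jpg" should also cover files ending in ".JPG".

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new ImageExtractorConnectorConfiguration
{
    // Each entry includes the leading dot, as the remarks require.
    AllowedExtensions = new List<string> { ".jpg", ".png", ".webp" },
};
```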

BackgroundImageSelector

Gets or sets the CSS selector to scope background image extraction.

public string BackgroundImageSelector { get; set; }

Property Value

string

Remarks

Only used when IncludeBackgroundImages is true. The default is "*", which checks all elements.

ExcludedExtensions

Gets or sets file extensions to exclude.

public List<string> ExcludedExtensions { get; set; }

Property Value

List<string>

Remarks

Images with these extensions are excluded from results. By default, SVG and ICO files are excluded, since they are typically icons.
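
The sketch below extends the exclusion list to also skip GIF files. It makes two assumptions that this page does not state: that the class has a public parameterless constructor, and that excluded extensions use the same dotted form as AllowedExtensions. Since assigning a new list replaces whatever the default contains, the SVG and ICO entries are restated explicitly.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new ImageExtractorConnectorConfiguration
{
    // Assumption: entries use the dotted form, e.g. ".svg".
    // The documented defaults (SVG, ICO) are repeated because
    // this assignment replaces the default list.
    ExcludedExtensions = new List<string> { ".svg", ".ico", ".gif" },
};
```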

ImageSelector

Gets or sets the CSS selector for images.

public string ImageSelector { get; set; }

Property Value

string

Remarks

The default selector matches img elements and source elements inside picture elements.

IncludeBackgroundImages

Gets or sets whether to extract CSS background images.

public bool IncludeBackgroundImages { get; set; }

Property Value

bool

Remarks

When true, elements are scanned for the background-image CSS property. Reading computed styles requires UseBrowser = true; when UseBrowser is false, only inline styles are checked.
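
A possible configuration for background image extraction is sketched below. It assumes a public parameterless constructor and that UseBrowser is a settable bool property, presumably inherited from WebConnectorConfigurationBase; neither is documented on this page. The selector value is a hypothetical example.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new ImageExtractorConnectorConfiguration
{
    // Assumption: UseBrowser is a settable bool; it is needed so
    // that computed styles, not only inline styles, are checked.
    UseBrowser = true,

    IncludeBackgroundImages = true,

    // Hypothetical selector: limit the scan to likely banner
    // containers instead of the default "*" (all elements).
    BackgroundImageSelector = ".hero, .banner",
};
```

Narrowing BackgroundImageSelector is a design choice rather than a requirement; the default "*" checks every element.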

IncludeDataSrc

Gets or sets whether to extract data-src for lazy-loaded images.

public bool IncludeDataSrc { get; set; }

Property Value

bool

Remarks

When true (default), checks for data-src, data-lazy, data-original, and other common lazy-loading attributes.

IncludeParentInfo

Gets or sets whether to include parent element information.

public bool IncludeParentInfo { get; set; }

Property Value

bool

IncludeSrcset

Gets or sets whether to extract srcset attribute.

public bool IncludeSrcset { get; set; }

Property Value

bool

MinHeight

Gets or sets the minimum height to include.

public int? MinHeight { get; set; }

Property Value

int?

Remarks

Images with height less than this value are excluded. Only applies to images with explicit height attributes. Null (default) means no minimum.

MinWidth

Gets or sets the minimum width to include.

public int? MinWidth { get; set; }

Property Value

int?

Remarks

Images with width less than this value are excluded. Only applies to images with explicit width attributes. Null (default) means no minimum.
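
The size thresholds can be combined, as in this sketch, which assumes a public parameterless constructor (not confirmed on this page) and uses illustrative pixel values. The remarks state that each threshold only applies to images with an explicit width or height attribute, which suggests that images lacking those attributes are not filtered out by size.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor is available.
var config = new ImageExtractorConnectorConfiguration
{
    // Exclude images whose explicit width attribute is below 200.
    MinWidth = 200,

    // Exclude images whose explicit height attribute is below 150.
    MinHeight = 150,
};

// Setting either property back to null removes that minimum.
config.MinHeight = null;
```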

RemoveDuplicates

Gets or sets whether to remove duplicate image URLs.

public bool RemoveDuplicates { get; set; }

Property Value

bool

ResolveUrls

Gets or sets whether to resolve relative URLs to absolute.

public bool ResolveUrls { get; set; }

Property Value

bool