Class ImageExtractorConnectorConfiguration
- Namespace
- Datafication.Connectors.WebConnector.Connectors
- Assembly
- Datafication.WebConnector.dll
Configuration for the image extractor connector.
public class ImageExtractorConnectorConfiguration : WebConnectorConfigurationBase, IDataConnectorConfiguration
- Inheritance
- object
- WebConnectorConfigurationBase
- ImageExtractorConnectorConfiguration
- Implements
- IDataConnectorConfiguration
Remarks
This connector extracts image elements and their metadata from web pages. It supports filtering by size and file extension, and can extract lazy-loaded images.
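As a rough illustration of the remarks above, a configuration might be built with an object initializer. This is a minimal sketch that assumes the class has a public parameterless constructor (not stated on this page) and uses only properties documented below. How the target page is specified is not covered here and is omitted.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor exists.
var config = new ImageExtractorConnectorConfiguration
{
    // Filter by extension: keep only these image types.
    AllowedExtensions = new List<string> { ".jpg", ".png" },

    // Filter by size: skip images narrower than 100 pixels
    // (applies only to images with an explicit width attribute).
    MinWidth = 100,

    // Also pick up lazy-loaded images (data-src and similar attributes).
    IncludeDataSrc = true
};
```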
Properties
AllowedExtensions
Gets or sets file extensions to include.
public List<string> AllowedExtensions { get; set; }
Property Value
- List<string>
Remarks
When not empty, only images with these extensions are included. Extensions should include the leading dot (e.g., ".jpg", ".png"). Matching is case-insensitive.
BackgroundImageSelector
Gets or sets the CSS selector to scope background image extraction.
public string BackgroundImageSelector { get; set; }
Property Value
- string
Remarks
Only used when IncludeBackgroundImages is true. The default is "*", which checks all elements.
ExcludedExtensions
Gets or sets file extensions to exclude.
public List<string> ExcludedExtensions { get; set; }
Property Value
- List<string>
Remarks
Images with these extensions are excluded from results. By default, SVG and ICO files, which are typically icons, are excluded.
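The two extension lists can be set together. The following is a sketch only: the constructor is assumed to be public and parameterless, and the page does not say how an extension present in both lists is handled, so the example keeps the lists disjoint.

```csharp
using System.Collections.Generic;
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor exists.
var config = new ImageExtractorConnectorConfiguration
{
    // Include the leading dot; matching is documented as case-insensitive,
    // so ".JPG" and ".jpg" should be treated the same.
    AllowedExtensions = new List<string> { ".jpg", ".jpeg", ".png", ".webp" },

    // Assigning a new list replaces the documented default (SVG and ICO),
    // so those are repeated here and GIF is added.
    ExcludedExtensions = new List<string> { ".svg", ".ico", ".gif" }
};
```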
ImageSelector
Gets or sets the CSS selector for images.
public string ImageSelector { get; set; }
Property Value
- string
Remarks
The default matches img elements and source elements inside picture elements.
IncludeBackgroundImages
Gets or sets whether to extract CSS background images.
public bool IncludeBackgroundImages { get; set; }
Property Value
- bool
Remarks
When true, elements are scanned for the background-image CSS property. Reading computed styles requires UseBrowser = true; when UseBrowser is false, only inline styles are checked.
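A possible configuration for background image extraction is sketched below. UseBrowser is named in the remarks but not documented on this page; the example assumes it is a settable bool property, presumably inherited from WebConnectorConfigurationBase. The selector value is purely illustrative.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor exists.
var config = new ImageExtractorConnectorConfiguration
{
    IncludeBackgroundImages = true,

    // Hypothetical selector: limit the scan to hero and banner elements
    // instead of the default "*", which checks every element.
    BackgroundImageSelector = ".hero, .banner",

    // Assumption: UseBrowser is a settable bool. Without it, only inline
    // style attributes are checked for background-image.
    UseBrowser = true
};
```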
IncludeDataSrc
Gets or sets whether to extract data-src for lazy-loaded images.
public bool IncludeDataSrc { get; set; }
Property Value
- bool
Remarks
When true (the default), the connector checks data-src, data-lazy, data-original, and other common lazy-loading attributes.
IncludeParentInfo
Gets or sets whether to include parent element information.
public bool IncludeParentInfo { get; set; }
Property Value
- bool
IncludeSrcset
Gets or sets whether to extract srcset attribute.
public bool IncludeSrcset { get; set; }
Property Value
- bool
MinHeight
Gets or sets the minimum height to include.
public int? MinHeight { get; set; }
Property Value
- int?
Remarks
Images whose height is less than this value are excluded. This applies only to images with explicit height attributes. Null (the default) means no minimum.
MinWidth
Gets or sets the minimum width to include.
public int? MinWidth { get; set; }
Property Value
- int?
Remarks
Images whose width is less than this value are excluded. This applies only to images with explicit width attributes. Null (the default) means no minimum.
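The size filters are nullable integers, so a minimum can be set on one dimension, both, or neither. A brief sketch, again assuming a public parameterless constructor; the pixel values are arbitrary.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor exists.
var config = new ImageExtractorConnectorConfiguration
{
    // Exclude images whose explicit width attribute is below 200.
    MinWidth = 200,

    // Leave height unrestricted; null is the documented default.
    MinHeight = null
};
```

Because the filters are documented as applying only to images with explicit width or height attributes, they may not screen out small images whose dimensions are set purely through CSS.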
RemoveDuplicates
Gets or sets whether to remove duplicate image URLs.
public bool RemoveDuplicates { get; set; }
Property Value
- bool
ResolveUrls
Gets or sets whether to resolve relative URLs to absolute.
public bool ResolveUrls { get; set; }
Property Value
- bool
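The remaining boolean options can be set explicitly rather than relying on defaults, which this page does not document for most of them. The sketch below assumes a public parameterless constructor, and the comments describe each flag only as far as its summary states.

```csharp
using Datafication.Connectors.WebConnector.Connectors;

// Assumption: a public parameterless constructor exists.
var config = new ImageExtractorConnectorConfiguration
{
    IncludeSrcset = true,      // extract the srcset attribute
    IncludeParentInfo = false, // skip parent element information
    RemoveDuplicates = true,   // remove duplicate image URLs
    ResolveUrls = true         // resolve relative URLs to absolute
};
```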