Plugin API
Plugins are the site-specific half of Ladon. A plugin bundles a Source
(discovers top-level refs), one or more Expanders (fan out through the
URL tree), and a Sink (fetches each leaf and returns a record). All
three are structural protocols — no inheritance from Ladon is required.
Protocols
typing.Protocol definitions for Ladon crawl plugins.
Adapters implement these protocols by structural subtyping — no inheritance from this module is required. This keeps third-party plugins decoupled from Ladon internals.
All adapters receive a configured HttpClient instance. They must not
construct their own HTTP sessions or import requests directly.
The three-layer pipeline is:
Source → [Expander, ...] → Sink
Source produces top-level refs. Each Expander takes a ref and
returns an Expansion (record + child refs). Sink takes a leaf
ref and returns a final record. CrawlPlugin bundles all three.
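The flow above can be sketched as a small driver loop. This is an illustration only, not Ladon's runner: `Ref` and `Expansion` are local stand-ins, the `record` and `child_refs` field names and the `discover()` method name are assumptions, and `client` is simply passed through.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:  # stand-in; only url and raw are named in this reference
    url: str
    raw: Any = None


@dataclass(frozen=True)
class Expansion:  # stand-in; field names are assumptions
    record: Any
    child_refs: tuple = ()


class DemoSource:
    def discover(self, client):  # method name is an assumption
        return [Ref("https://example.test/sales/1")]


class DemoExpander:
    def expand(self, ref, client):
        # Fan out: one sale page yields two lot refs.
        children = tuple(Ref(f"{ref.url}/lots/{n}") for n in (1, 2))
        return Expansion(record={"sale": ref.url}, child_refs=children)


class DemoSink:
    def consume(self, ref, client):
        return {"lot": ref.url}


def walk(source, expanders, sink, client=None):
    """Push refs through each expander level in order, then through the sink."""
    refs = list(source.discover(client))
    for expander in expanders:
        refs = [child for ref in refs
                for child in expander.expand(ref, client).child_refs]
    return [sink.consume(ref, client) for ref in refs]


records = walk(DemoSource(), [DemoExpander()], DemoSink())
```

A real runner would presumably also persist each `Expansion.record` and handle the plugin errors described later; the sketch discards intermediate records for brevity.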
Source
Bases: Protocol
Discover top-level refs from an external source.
Source code in src/ladon/plugins/protocol.py
Expander
Bases: Protocol
Expand one ref into a record plus child refs.
Source code in src/ladon/plugins/protocol.py
expand(ref, client)
Fetch ref, return its record and the child refs to process next.
Raises:
| Type | Description |
|---|---|
| ExpansionNotReadyError | ref is not yet ready to be expanded. |
| PartialExpansionError | child list is incomplete. |
| ChildListUnavailableError | child list could not be retrieved. |
Source code in src/ladon/plugins/protocol.py
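As a rough sketch of an adapter that satisfies `Expander` by structural typing, the class below defines `expand(ref, client)` without inheriting from anything in Ladon. The `StubClient.get_json` method, the page shape (`live`, `id`, `lots`), and the `Expansion` field names are all assumptions made for the example; the model and error classes are local stand-ins.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:  # stand-in for Ladon's Ref
    url: str
    raw: Any = None


@dataclass(frozen=True)
class Expansion:  # stand-in; field names are assumptions
    record: Any
    child_refs: tuple = ()


class PluginError(Exception):  # stand-in for the errors module
    pass


class ExpansionNotReadyError(PluginError):
    pass


class StubClient:
    """Hypothetical client; the real HttpClient interface is not shown here."""

    def __init__(self, pages):
        self._pages = pages

    def get_json(self, url):
        return self._pages[url]


class SaleExpander:
    """Satisfies the Expander protocol structurally: no Ladon base class."""

    def expand(self, ref, client):
        page = client.get_json(ref.url)
        if not page.get("live", False):
            # Content not published yet: signal the runner to skip this ref.
            raise ExpansionNotReadyError(ref.url)
        # Pass parent context down to the leaves through raw.
        children = tuple(
            Ref(url=lot_url, raw={"sale_id": page["id"]})
            for lot_url in page["lots"]
        )
        return Expansion(record={"id": page["id"]}, child_refs=children)


client = StubClient({
    "https://example.test/sale/7": {"id": 7, "live": True,
                                    "lots": ["https://example.test/lot/a"]},
    "https://example.test/sale/8": {"id": 8, "live": False, "lots": []},
})
expansion = SaleExpander().expand(Ref("https://example.test/sale/7"), client)
```

Expanding the second URL in this stub raises `ExpansionNotReadyError`, because its page is marked as not live.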
Sink
Bases: Protocol
Consume a leaf ref and return its final record.
Source code in src/ladon/plugins/protocol.py
consume(ref, client)
Fetch and parse one leaf ref, returning a complete record.
Context for the leaf (e.g. parent data) flows through
ref.raw — no parent-record parameter is needed here.
Raises:
| Type | Description |
|---|---|
| LeafUnavailableError | ref could not be fetched or parsed. |
Source code in src/ladon/plugins/protocol.py
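A minimal sketch of a Sink that reads parent context from `ref.raw` might look like the following. The `StubClient.get_json` method and the `sale_id` and `title` keys are hypothetical; `Ref` and the error classes are local stand-ins rather than imports from Ladon.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:  # stand-in for Ladon's Ref
    url: str
    raw: Any = None


class PluginError(Exception):  # stand-in for the errors module
    pass


class LeafUnavailableError(PluginError):
    pass


class StubClient:
    """Hypothetical client; the real HttpClient interface is not shown here."""

    def __init__(self, pages):
        self._pages = pages

    def get_json(self, url):
        return self._pages.get(url)  # None models a page that cannot be fetched


class LotSink:
    def consume(self, ref, client):
        page = client.get_json(ref.url)
        if page is None or "title" not in page:
            raise LeafUnavailableError(f"cannot fetch or parse {ref.url}")
        # Parent data arrives on the ref itself, not as an extra parameter.
        parent = ref.raw or {}
        return {"title": page["title"], "sale_id": parent.get("sale_id")}


client = StubClient({"https://example.test/lot/a": {"title": "Vase"}})
record = LotSink().consume(
    Ref("https://example.test/lot/a", raw={"sale_id": 7}), client
)
```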
CrawlPlugin
Bases: Protocol
Bundle of all adapters for one crawl domain.
name is a short identifier used in log lines and error messages
(e.g. "christies_online", "sothebys"). source produces
top-level refs. expanders is an ordered list of expansion steps
(one per tree level above the leaves). sink consumes the leaf
refs produced by the last expander.
CLI convention
When loaded via ladon run --plugin module:Class, the CLI
instantiates the plugin as plugin_cls(client=client). Adapters
intended for CLI use must accept client as a keyword argument
in __init__. This constraint is not enforced by the Protocol
check — it is a CLI convention only and not part of this protocol.
Source code in src/ladon/plugins/protocol.py
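The CLI convention can be illustrated with a plugin class whose `__init__` accepts `client` as a keyword argument. Everything below is a hypothetical stand-in: the adapter classes are empty shells, `discover()` is an assumed method name, and making `client` keyword-only is stricter than the convention requires (a plain `client` parameter would also satisfy it).

```python
class StubClient:
    """Placeholder for the configured HttpClient that the CLI passes in."""


class DemoSource:
    def discover(self, client):  # method name is an assumption
        return []


class DemoExpander:
    def expand(self, ref, client):
        raise NotImplementedError


class DemoSink:
    def consume(self, ref, client):
        raise NotImplementedError


class DemoPlugin:
    """Hypothetical plugin exposing name, source, expanders and sink."""

    name = "demo_house"

    def __init__(self, *, client):
        # Keyword-only, so plugin_cls(client=client) is the only valid call.
        self.client = client
        self.source = DemoSource()
        self.expanders = [DemoExpander()]  # one entry per level above the leaves
        self.sink = DemoSink()


# What the CLI is described as doing after resolving module:Class.
plugin_cls = DemoPlugin
plugin = plugin_cls(client=StubClient())
```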
Data models
Immutable data models for Ladon plugin adapters.
All models are frozen dataclasses. Adapters produce them; the runner
consumes them. The raw field on Ref carries house-specific data
that does not fit the shared schema.
Expansion is returned by an Expander and carries the record for
the current node plus the child refs to be expanded or consumed next.
Ref
dataclass
Generic reference to any crawlable resource.
url is the canonical URL of the resource. raw carries any
house-specific data discovered alongside the URL (e.g. an ID or
code needed by the expander).
Source code in src/ladon/plugins/models.py
Expansion
dataclass
Result of an Expander.expand() call.
Carries the record for the expanded node plus the child refs to be processed next (either expanded further or consumed by a Sink).
Source code in src/ladon/plugins/models.py
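To make the frozen-dataclass behaviour concrete, here is a sketch using local stand-ins for `Ref` and `Expansion`. Only `url` and `raw` are named in this reference; the `record` and `child_refs` field names, the defaults, and the `sale_code` key are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:  # stand-in; url and raw come from the description above
    url: str
    raw: Any = None


@dataclass(frozen=True)
class Expansion:  # stand-in; field names are assumptions
    record: Any
    child_refs: tuple = ()


ref = Ref(url="https://example.test/sale/7", raw={"sale_code": "NY-7"})
expansion = Expansion(
    record={"title": "Spring sale"},
    # Children can carry the parent's house-specific data forward via raw.
    child_refs=(Ref(url="https://example.test/lot/a", raw=ref.raw),),
)

try:
    ref.url = "https://example.test/other"  # frozen: assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing only blocks attribute rebinding. A mutable object stored in `raw`, such as the dict here, can still be changed in place, so adapters may prefer to treat it as read-only.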
Errors
Error taxonomy for Ladon house plugins.
Each exception maps to a specific runner behaviour. Keeping these distinct prevents the catch-all except-Exception pattern that masked real failures in pre-Ladon crawlers.
PluginError
ExpansionNotReadyError
Bases: PluginError
The ref is not yet ready to be expanded (e.g. content not live).
The runner should skip this ref without writing to DB or disk. Do not retry during the same run; the ref will be discovered again on the next scheduled run.
Source code in src/ladon/plugins/errors.py
PartialExpansionError
Bases: PluginError
The child list was fetched but is incomplete (e.g. a paginated response returned fewer items than the declared total).
Runner behaviour: when raised by any expander after the first, this error is
non-fatal: the affected branch is isolated and recorded in RunResult.errors.
When raised by the first expander, it propagates unchanged.
Raise this instead of ChildListUnavailableError when the HTTP
response was valid but the payload signals an incomplete result.
Raise ChildListUnavailableError when the response could not be
parsed or the request itself failed.
Source code in src/ladon/plugins/errors.py
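The choice between the two errors can be sketched as a small parsing helper. The JSON shape (`total`, `items`, `url`) and the helper name `parse_child_urls` are hypothetical, and the exception classes are local stand-ins for the ones documented here.

```python
import json


class PluginError(Exception):  # stand-in for the errors module
    pass


class PartialExpansionError(PluginError):
    pass


class ChildListUnavailableError(PluginError):
    pass


def parse_child_urls(body):
    """Return child URLs from a body shaped like {"total": n, "items": [...]}."""
    try:
        payload = json.loads(body)
        items = payload["items"]
        total = payload["total"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # The response could not be parsed into a child list at all.
        raise ChildListUnavailableError("unparseable child list") from exc
    if len(items) < total:
        # Valid response, but the payload itself signals missing children.
        raise PartialExpansionError(f"got {len(items)} of {total} children")
    return [item["url"] for item in items]


urls = parse_child_urls('{"total": 2, "items": [{"url": "a"}, {"url": "b"}]}')
```

A failed request would be mapped to `ChildListUnavailableError` in the same way, at the point where the adapter calls the client.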
ChildListUnavailableError
Bases: PluginError
The child list could not be retrieved.
Fatal for this ref's run. Raised when the request itself failed or the response could not be parsed into a usable child list.
LeafUnavailableError
Bases: PluginError
A single leaf ref could not be fetched or parsed.
Non-fatal. The runner logs the failure, increments leaves_failed, and continues to the next leaf.
AssetDownloadError
Bases: PluginError
An asset download failed.
Not caught by the runner — propagates as a fatal error that aborts the run. Plugins requiring non-fatal handling must catch this exception internally before returning from the Sink or Expander.
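One way a plugin might keep asset failures non-fatal is to catch the exception inside the Sink and record the failure on the returned record. This is a sketch under stated assumptions: `fetch_asset`, the `image_urls` key in `raw`, and the `failed_images` field are hypothetical, and the classes are local stand-ins.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:  # stand-in for Ladon's Ref
    url: str
    raw: Any = None


class PluginError(Exception):  # stand-in for the errors module
    pass


class AssetDownloadError(PluginError):
    pass


def fetch_asset(url, client):
    """Hypothetical helper that fails for one URL to simulate a broken asset."""
    if url.endswith("broken.jpg"):
        raise AssetDownloadError(url)
    return b"image-bytes"


class LotSink:
    def consume(self, ref, client):
        record = {"url": ref.url, "images": [], "failed_images": []}
        for image_url in ref.raw["image_urls"]:
            try:
                record["images"].append(fetch_asset(image_url, client))
            except AssetDownloadError:
                # Caught here so the error never reaches the runner.
                record["failed_images"].append(image_url)
        return record


ref = Ref(
    "https://example.test/lot/a",
    raw={"image_urls": ["https://example.test/ok.jpg",
                        "https://example.test/broken.jpg"]},
)
record = LotSink().consume(ref, client=None)
```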