Plugin API

Plugins are the site-specific half of Ladon. A plugin bundles a Source (discovers top-level refs), one or more Expanders (fan out through the URL tree), and a Sink (fetches each leaf and returns a record). All three are structural protocols — no inheritance from Ladon is required.

Protocols

typing.Protocol definitions for Ladon crawl plugins.

Adapters implement these protocols by structural subtyping — no inheritance from this module is required. This keeps third-party plugins decoupled from Ladon internals.

All adapters receive a configured HttpClient instance. They must not construct their own HTTP sessions or import requests directly.

The three-layer pipeline is:

Source  →  [Expander, ...]  →  Sink

Source produces top-level refs. Each Expander takes a ref and returns an Expansion (record + child refs). Sink takes a leaf ref and returns a final record. CrawlPlugin bundles all three.
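The flow above can be sketched as a small level-by-level walk. This is an illustration only, not Ladon's runner: `walk`, the toy adapters, and the local `Expansion` stand-in are hypothetical, and `client` is passed as `None` where a configured HttpClient would normally go.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Expansion:
    """Local stand-in mirroring the Expansion model on this page."""

    record: object
    child_refs: Sequence[object]


class ToySource:
    def discover(self, client):
        return ["sale-1", "sale-2"]


class ToyExpander:
    def expand(self, ref, client):
        # Each top-level ref fans out into two leaf refs.
        children = [f"{ref}/lot-1", f"{ref}/lot-2"]
        return Expansion(record={"sale": ref}, child_refs=children)


class ToySink:
    def consume(self, ref, client):
        return {"lot": ref}


def walk(source, expanders, sink, client=None):
    """Hypothetical runner: expand one level per expander, then consume leaves."""
    refs = list(source.discover(client))
    node_records = []
    for expander in expanders:
        next_refs = []
        for ref in refs:
            expansion = expander.expand(ref, client)
            node_records.append(expansion.record)
            next_refs.extend(expansion.child_refs)
        refs = next_refs
    leaf_records = [sink.consume(ref, client) for ref in refs]
    return node_records, leaf_records


nodes, leaves = walk(ToySource(), [ToyExpander()], ToySink())
```

With one expander the tree has two levels: two sale records come back from the expander and four lot records from the sink.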

Source

Bases: Protocol

Discover top-level refs from an external source.

Source code in src/ladon/plugins/protocol.py
@runtime_checkable
class Source(Protocol):
    """Discover top-level refs from an external source."""

    def discover(self, client: HttpClient) -> Sequence[object]:
        """Return all discoverable top-level references."""
        ...

discover(client)

Return all discoverable top-level references.

Source code in src/ladon/plugins/protocol.py
def discover(self, client: HttpClient) -> Sequence[object]:
    """Return all discoverable top-level references."""
    ...
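A minimal sketch of a class that satisfies Source structurally. The protocol is copied locally so the snippet runs without Ladon installed; `FakeClient` and its `get_json` method are invented for illustration, since the HttpClient interface is not shown on this page, and `CalendarSource` and the URLs are hypothetical.

```python
from typing import Protocol, Sequence, runtime_checkable


class FakeClient:
    """Hypothetical stand-in for HttpClient; `get_json` is an assumption."""

    def get_json(self, url: str) -> dict:
        return {
            "sales": [
                "https://example.com/sale/1",
                "https://example.com/sale/2",
            ]
        }


@runtime_checkable
class Source(Protocol):
    """Local copy of the protocol shown above."""

    def discover(self, client) -> Sequence[object]: ...


class CalendarSource:
    """No Ladon base class: matching the method name is enough."""

    def discover(self, client) -> Sequence[object]:
        payload = client.get_json("https://example.com/calendar")
        return list(payload["sales"])


source = CalendarSource()
refs = source.discover(FakeClient())
is_source = isinstance(source, Source)
```

Because the protocol is `@runtime_checkable`, `isinstance` confirms that a `discover` attribute exists. It does not verify the signature or the return type.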

Expander

Bases: Protocol

Expand one ref into a record plus child refs.

Source code in src/ladon/plugins/protocol.py
@runtime_checkable
class Expander(Protocol):
    """Expand one ref into a record plus child refs."""

    def expand(self, ref: object, client: HttpClient) -> Expansion:
        """Fetch ref, return its record and the child refs to process next.

        Raises:
            ExpansionNotReadyError: ref is not yet ready to be expanded.
            PartialExpansionError: child list is incomplete.
            ChildListUnavailableError: child list could not be retrieved.
        """
        ...

expand(ref, client)

Fetch ref, return its record and the child refs to process next.

Raises:

ExpansionNotReadyError: ref is not yet ready to be expanded.

PartialExpansionError: child list is incomplete.

ChildListUnavailableError: child list could not be retrieved.

Source code in src/ladon/plugins/protocol.py
def expand(self, ref: object, client: HttpClient) -> Expansion:
    """Fetch ref, return its record and the child refs to process next.

    Raises:
        ExpansionNotReadyError: ref is not yet ready to be expanded.
        PartialExpansionError: child list is incomplete.
        ChildListUnavailableError: child list could not be retrieved.
    """
    ...
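A hedged sketch of an Expander that returns an Expansion for a live page and raises ExpansionNotReadyError otherwise. The error classes and `Expansion` are local stand-ins for the ones documented on this page; `FakeClient.get_json` and the payload fields (`status`, `title`, `lots`) are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Sequence


class PluginError(Exception):
    """Stand-in for Ladon's base plugin error."""


class ExpansionNotReadyError(PluginError):
    """Stand-in for the error listed under Raises."""


@dataclass(frozen=True)
class Expansion:
    record: object
    child_refs: Sequence[object]


class FakeClient:
    """Hypothetical client that returns a canned payload."""

    def __init__(self, payload):
        self.payload = payload

    def get_json(self, url):
        return self.payload


class SaleExpander:
    def expand(self, ref, client):
        payload = client.get_json(ref)
        if payload["status"] != "live":
            # Content is not published yet: signal a skip, not a failure.
            raise ExpansionNotReadyError(ref)
        record = {"title": payload["title"]}
        return Expansion(record=record, child_refs=list(payload["lots"]))


live = FakeClient({"status": "live", "title": "Spring Sale", "lots": ["lot-1", "lot-2"]})
expansion = SaleExpander().expand("https://example.com/sale/1", live)

draft = FakeClient({"status": "draft", "title": "Autumn Sale", "lots": []})
try:
    SaleExpander().expand("https://example.com/sale/2", draft)
    skipped = False
except ExpansionNotReadyError:
    skipped = True
```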

Sink

Bases: Protocol

Consume a leaf ref and return its final record.

Source code in src/ladon/plugins/protocol.py
@runtime_checkable
class Sink(Protocol):
    """Consume a leaf ref and return its final record."""

    def consume(self, ref: object, client: HttpClient) -> object:
        """Fetch and parse one leaf ref, returning a complete record.

        Context for the leaf (e.g. parent data) flows through
        ``ref.raw`` — no parent-record parameter is needed here.

        Raises:
            LeafUnavailableError: ref could not be fetched or parsed.
        """
        ...

consume(ref, client)

Fetch and parse one leaf ref, returning a complete record.

Context for the leaf (e.g. parent data) flows through ref.raw — no parent-record parameter is needed here.

Raises:

LeafUnavailableError: ref could not be fetched or parsed.

Source code in src/ladon/plugins/protocol.py
def consume(self, ref: object, client: HttpClient) -> object:
    """Fetch and parse one leaf ref, returning a complete record.

    Context for the leaf (e.g. parent data) flows through
    ``ref.raw`` — no parent-record parameter is needed here.

    Raises:
        LeafUnavailableError: ref could not be fetched or parsed.
    """
    ...
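The note about `ref.raw` can be illustrated with a short sketch. `Ref` here is a simplified local stand-in for the model on this page, and `LotSink`, `FakeClient.get_json`, and the `sale_id` key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Ref:
    """Simplified stand-in for the Ref model."""

    url: str
    raw: Mapping[str, object] = field(default_factory=dict)


class FakeClient:
    """Hypothetical client; `get_json` is invented for this sketch."""

    def get_json(self, url):
        return {"title": "Lot 12: Bronze figure"}


class LotSink:
    def consume(self, ref, client):
        page = client.get_json(ref.url)
        # Parent context arrives through ref.raw, not an extra parameter.
        return {
            "url": ref.url,
            "sale_id": ref.raw["sale_id"],
            "title": page["title"],
        }


leaf = Ref(url="https://example.com/lot/12", raw={"sale_id": "S1"})
record = LotSink().consume(leaf, FakeClient())
```

The sale identifier was attached to the ref when it was created, so the sink needs no second lookup to link the lot to its sale.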

CrawlPlugin

Bases: Protocol

Bundle of all adapters for one crawl domain.

name is a short identifier used in log lines and error messages (e.g. "christies_online", "sothebys"). source produces top-level refs. expanders is an ordered list of expansion steps (one per tree level above the leaves). sink consumes the leaf refs produced by the last expander.

CLI convention

When loaded via ladon run --plugin module:Class, the CLI instantiates the plugin as plugin_cls(client=client). Adapters intended for CLI use must accept client as a keyword argument in __init__. The Protocol check does not enforce this constraint; it is a CLI convention, not part of this protocol.

Source code in src/ladon/plugins/protocol.py
@runtime_checkable
class CrawlPlugin(Protocol):
    """Bundle of all adapters for one crawl domain.

    ``name`` is a short identifier used in log lines and error messages
    (e.g. ``"christies_online"``, ``"sothebys"``). ``source`` produces
    top-level refs. ``expanders`` is an ordered list of expansion steps
    (one per tree level above the leaves). ``sink`` consumes the leaf
    refs produced by the last expander.

    CLI convention
    --------------
    When loaded via ``ladon run --plugin module:Class``, the CLI
    instantiates the plugin as ``plugin_cls(client=client)``.  Adapters
    intended for CLI use **must** accept ``client`` as a keyword argument
    in ``__init__``.  This constraint is not enforced by the Protocol
    check — it is a CLI convention only and not part of this protocol.
    """

    @property
    def name(self) -> str: ...

    @property
    def source(self) -> Source: ...

    @property
    def expanders(self) -> Sequence[Expander]: ...

    @property
    def sink(self) -> Sink: ...
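A minimal sketch of a plugin class that follows the CLI convention and passes the structural check. The protocol is copied locally with the adapter types loosened to `object` so the snippet stands alone; `ExamplePlugin` and the stub adapters are hypothetical, and `client=None` stands in for a configured HttpClient.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class CrawlPlugin(Protocol):
    """Local copy of the protocol, adapter types loosened for brevity."""

    @property
    def name(self) -> str: ...

    @property
    def source(self) -> object: ...

    @property
    def expanders(self) -> Sequence[object]: ...

    @property
    def sink(self) -> object: ...


class StubSource:
    def discover(self, client):
        raise NotImplementedError


class StubExpander:
    def expand(self, ref, client):
        raise NotImplementedError


class StubSink:
    def consume(self, ref, client):
        raise NotImplementedError


class ExamplePlugin:
    """Hypothetical plugin; takes `client` as a keyword for CLI loading."""

    def __init__(self, *, client):
        self._client = client
        self._source = StubSource()
        self._expanders = (StubExpander(),)
        self._sink = StubSink()

    @property
    def name(self):
        return "example_house"

    @property
    def source(self):
        return self._source

    @property
    def expanders(self):
        return self._expanders

    @property
    def sink(self):
        return self._sink


plugin = ExamplePlugin(client=None)  # mirrors plugin_cls(client=client)
conforms = isinstance(plugin, CrawlPlugin)
```

The `isinstance` check only confirms that the four attributes exist. Nothing in it looks at `__init__`, which is why the keyword-argument requirement remains a convention.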

Data models

Immutable data models for Ladon plugin adapters.

All models are frozen dataclasses. Adapters produce them; the runner consumes them. The raw field on Ref carries house-specific data that does not fit the shared schema.

Expansion is returned by an Expander and carries the record for the current node plus the child refs to be expanded or consumed next.

Ref dataclass

Generic reference to any crawlable resource.

url is the canonical URL of the resource. raw carries any house-specific data discovered alongside the URL (e.g. an ID or code needed by the expander).

Source code in src/ladon/plugins/models.py
@dataclass(frozen=True)
class Ref:
    """Generic reference to any crawlable resource.

    ``url`` is the canonical URL of the resource. ``raw`` carries any
    house-specific data discovered alongside the URL (e.g. an ID or
    code needed by the expander).
    """

    url: str
    raw: Mapping[str, object] = field(default_factory=_empty_raw)
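A short sketch of how a frozen Ref behaves. The class is re-declared locally so the snippet runs on its own, and `_empty_raw` is an assumed implementation that returns an empty dict, since the real helper is not shown on this page.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Mapping


def _empty_raw() -> Mapping[str, object]:
    # Assumption: the real helper is not shown; an empty dict is used here.
    return {}


@dataclass(frozen=True)
class Ref:
    url: str
    raw: Mapping[str, object] = field(default_factory=_empty_raw)


ref = Ref(url="https://example.com/sale/1", raw={"sale_id": "S1"})
bare = Ref(url="https://example.com/sale/2")  # raw falls back to the default

try:
    ref.url = "https://example.com/other"
    mutated = True
except FrozenInstanceError:
    # Frozen dataclasses reject attribute assignment after construction.
    mutated = False
```

Freezing blocks reassignment of `url` and `raw`, but a plain dict passed as `raw` can still be mutated in place. Wrapping it in `types.MappingProxyType` is one way to make it read-only.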

Expansion dataclass

Result of an Expander.expand() call.

Carries the record for the expanded node plus the child refs to be processed next (either expanded further or consumed by a Sink).

Source code in src/ladon/plugins/models.py
@dataclass(frozen=True)
class Expansion:
    """Result of an Expander.expand() call.

    Carries the record for the expanded node plus the child refs to be
    processed next (either expanded further or consumed by a Sink).
    """

    record: object
    child_refs: Sequence[object]
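As an illustration of how the two models fit together, the sketch below builds an Expansion whose child refs carry parent data in `raw`. The classes are simplified local stand-ins, and `build_expansion` and the `sale_url` and `sale_title` keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Ref:
    url: str
    raw: Mapping[str, object] = field(default_factory=dict)


@dataclass(frozen=True)
class Expansion:
    record: object
    child_refs: Sequence[object]


def build_expansion(sale_ref: Ref, title: str, lot_urls: Sequence[str]) -> Expansion:
    """Hypothetical helper an Expander might call after parsing a sale page."""
    children = [
        # Copy the parent data each leaf will need into its raw mapping.
        Ref(url=lot_url, raw={"sale_url": sale_ref.url, "sale_title": title})
        for lot_url in lot_urls
    ]
    return Expansion(record={"url": sale_ref.url, "title": title}, child_refs=children)


sale = Ref(url="https://example.com/sale/1")
expansion = build_expansion(
    sale,
    "Spring Sale",
    ["https://example.com/lot/1", "https://example.com/lot/2"],
)
```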

Errors

Error taxonomy for Ladon house plugins.

Each exception maps to a specific runner behaviour. Keeping them distinct avoids the catch-all except Exception pattern that masked real failures in pre-Ladon crawlers.

PluginError

Bases: Exception

Base class for all plugin-level errors.

Source code in src/ladon/plugins/errors.py
class PluginError(Exception):
    """Base class for all plugin-level errors."""

ExpansionNotReadyError

Bases: PluginError

The ref is not yet ready to be expanded (e.g. content not live).

The runner should skip this ref without writing to DB or disk. Do not retry during the same run; the ref will be discovered again on the next scheduled run.

Source code in src/ladon/plugins/errors.py
class ExpansionNotReadyError(PluginError):
    """The ref is not yet ready to be expanded (e.g. content not live).

    The runner should skip this ref without writing to DB or disk.
    Do not retry during the same run; the ref will be discovered again
    on the next scheduled run.
    """

PartialExpansionError

Bases: PluginError

The child list was fetched but is incomplete (e.g. a paginated response returned fewer items than the declared total).

Runner behaviour: non-fatal for non-first expanders — the affected branch is isolated and recorded in RunResult.errors. Propagates unchanged from the first expander.

Raise this instead of ChildListUnavailableError when the HTTP response was valid but the payload signals an incomplete result. Raise ChildListUnavailableError when the response could not be parsed or the request itself failed.
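The choice between the two errors might look like the sketch below. It is an assumption-laden example: the error classes are local stand-ins, and `parse_child_list`, `error_for`, and the `lots` and `total` fields are hypothetical names for a paginated JSON payload.

```python
import json


class PluginError(Exception):
    """Stand-in for Ladon's base plugin error."""


class PartialExpansionError(PluginError):
    """Stand-in: response was valid but incomplete."""


class ChildListUnavailableError(PluginError):
    """Stand-in: no usable child list could be produced."""


def parse_child_list(body: str) -> list:
    """Return the child URLs in `body`, or raise the matching error."""
    try:
        payload = json.loads(body)
        lots = list(payload["lots"])
        total = int(payload["total"])
    except (ValueError, KeyError, TypeError) as exc:
        # The body cannot be turned into a child list at all.
        raise ChildListUnavailableError("unparseable child list") from exc
    if len(lots) < total:
        # Valid response, but the payload itself reports missing items.
        raise PartialExpansionError(f"got {len(lots)} of {total} children")
    return lots


def error_for(body: str):
    """Demo helper: name of the error raised for `body`, or None."""
    try:
        parse_child_list(body)
    except PluginError as exc:
        return type(exc).__name__
    return None
```

For example, `error_for('{"total": 3, "lots": ["a"]}')` reports a partial expansion, while a body such as `'<html>'` is classified as unavailable.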

Source code in src/ladon/plugins/errors.py
class PartialExpansionError(PluginError):
    """The child list was fetched but is incomplete (e.g. a paginated
    response returned fewer items than the declared total).

    Runner behaviour: non-fatal for non-first expanders — the affected
    branch is isolated and recorded in ``RunResult.errors``. Propagates
    unchanged from the first expander.

    Raise this instead of ``ChildListUnavailableError`` when the HTTP
    response was valid but the payload signals an incomplete result.
    Raise ``ChildListUnavailableError`` when the response could not be
    parsed or the request itself failed.
    """

ChildListUnavailableError

Bases: PluginError

The child list could not be retrieved.

Fatal for this ref's run. Raised when the network request succeeded but the response cannot be parsed into a usable child list.

Source code in src/ladon/plugins/errors.py
class ChildListUnavailableError(PluginError):
    """The child list could not be retrieved.

    Fatal for this ref's run. Raised when the network request succeeded
    but the response cannot be parsed into a usable child list.
    """

LeafUnavailableError

Bases: PluginError

A single leaf ref could not be fetched or parsed.

Non-fatal. The runner logs the failure, increments leaves_failed, and continues to the next leaf.
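The non-fatal behaviour described above can be sketched as a loop that counts failures and moves on. This is not the Ladon runner: `consume_all` and `FlakySink` are hypothetical, the error classes are local stand-ins, and logging is omitted.

```python
class PluginError(Exception):
    """Stand-in for Ladon's base plugin error."""


class LeafUnavailableError(PluginError):
    """Stand-in for the error described above."""


class FlakySink:
    """Hypothetical sink that fails for refs ending in 'bad'."""

    def consume(self, ref, client):
        if ref.endswith("bad"):
            raise LeafUnavailableError(ref)
        return {"ref": ref}


def consume_all(sink, refs, client=None):
    """Consume every leaf, counting failures instead of stopping."""
    records = []
    leaves_failed = 0
    for ref in refs:
        try:
            records.append(sink.consume(ref, client))
        except LeafUnavailableError:
            # One broken leaf must not stop the remaining leaves.
            leaves_failed += 1
    return records, leaves_failed


records, leaves_failed = consume_all(FlakySink(), ["lot-1", "lot-bad", "lot-3"])
```

Only LeafUnavailableError is caught here, so any other exception still propagates, which keeps the handling narrow rather than catch-all.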

Source code in src/ladon/plugins/errors.py
class LeafUnavailableError(PluginError):
    """A single leaf ref could not be fetched or parsed.

    Non-fatal. The runner logs the failure, increments leaves_failed,
    and continues to the next leaf.
    """

AssetDownloadError

Bases: PluginError

An asset download failed.

Not caught by the runner — propagates as a fatal error that aborts the run. Plugins requiring non-fatal handling must catch this exception internally before returning from the Sink or Expander.
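A plugin that prefers to keep going when an image is missing could catch the error inside its Sink, roughly as sketched here. `download_image`, `TolerantSink`, and the record fields are hypothetical, and the error classes are local stand-ins.

```python
class PluginError(Exception):
    """Stand-in for Ladon's base plugin error."""


class AssetDownloadError(PluginError):
    """Stand-in for the error described above."""


def download_image(url):
    """Hypothetical download helper that always fails in this sketch."""
    raise AssetDownloadError(url)


class TolerantSink:
    def consume(self, ref, client):
        record = {"ref": ref, "image": None, "image_failed": False}
        try:
            record["image"] = download_image(f"{ref}/image.jpg")
        except AssetDownloadError:
            # Handled here so the error never reaches the runner.
            record["image_failed"] = True
        return record


record = TolerantSink().consume("https://example.com/lot/1", client=None)
```

Recording the failure on the record, instead of dropping it silently, leaves a trace that a later run or a report can act on.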

Source code in src/ladon/plugins/errors.py
class AssetDownloadError(PluginError):
    """An asset download failed.

    **Not caught by the runner** — propagates as a fatal error that aborts
    the run.  Plugins requiring non-fatal handling must catch this exception
    internally before returning from the Sink or Expander.
    """