Provenance Data Model and Query Patterns

Reference material for search_provenance. Consult this page when you need to interpret raw provenance text, understand the data model, or construct specialised query patterns.


Provenance Text Format (AAM Standard)

The Rijksmuseum's provenance text follows the AAM (American Alliance of Museums) standard for provenance notation.

The Rijksmuseum extends the AAM standard with Dutch-language keywords (schenking = gift, bruikleen = loan, verwerving = acquisition, aangekocht = purchased) and institutional conventions (Inv. = inventory reference, L. = Lugt collector mark number).


Transfer Types (CMOA/PLOD-aligned Vocabulary)

Transfer types follow the CMOA Art Tracks thesaurus (Carnegie Museum of Art), aligned with the PLOD framework (Art Institute of Chicago).

| Type | Category | Description |
| --- | --- | --- |
| sale | ownership | Sale, purchase, or auction (includes "purchased by", "acquired from"). Includes unsold lots; check the unsold flag. |
| by_descent | ownership | Inheritance to a named relative (son, daughter, nephew, heir, etc.) |
| widowhood | ownership | Inheritance specifically to a widow or widower |
| inheritance | ownership | Generic inheritance (no specific relationship identified) |
| bequest | ownership | Testamentary gift |
| gift | ownership | Donation, gift, or presentation |
| commission | ownership | Commissioned creation |
| exchange | ownership | Exchange or swap |
| confiscation | ownership | Seized by authority |
| theft | ownership | Stolen |
| looting | ownership | Looted (wartime) |
| recuperation | ownership | Recovered by Allied forces (post-WWII, distinct from restitution) |
| restitution | ownership | Legally returned to original owner |
| collection | ownership | Bare-name ownership period (AAM convention, no transfer keyword) |
| inventory | ownership | Documented in an estate inventory or attestation |
| loan | custody | On loan (temporary custody, no ownership change) |
| deposit | custody | On deposit or in storage |
| transfer | ambiguous | Administrative or intra-organisational transfer |
| unknown | ambiguous | Parser could not classify (includes cross-references) |

Extensions beyond CMOA: recuperation (physical recovery vs legal return), collection (AAM bare-name convention), inventory (attestation events).

Event Flags

| Flag | On type | Meaning |
| --- | --- | --- |
| unsold: true | sale | Auction lot was unsold, bought in, or withdrawn. No ownership transfer occurred. Filter these when analysing actual sales. |
| batchPrice: true | any with price | The price is an en bloc / batch total for multiple artworks, not an individual price. Always filter these when ranking or comparing prices; they massively distort rankings (e.g. fl. 6,350,000 for 393 Mannheimer objects). |

Inheritance granularity: use by_descent for works inherited by named relatives, widowhood for widow/widower inheritance specifically, or inheritance for the generic case. To catch all inheritance-related transfers: transferType: ["by_descent", "widowhood", "inheritance"].
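
The flag and inheritance rules above can be applied client-side once results are in hand. The sketch below is illustrative only: the record shape (a flat list of dicts with transferType, price, unsold and batchPrice keys) and the EX-n object numbers are assumptions, not the documented response format of search_provenance.

```python
# Hypothetical event records; field names mirror the flags described above,
# but the actual response shape of search_provenance is an assumption.
events = [
    {"objectNumber": "EX-1", "transferType": "sale", "price": 1200.0,
     "unsold": False, "batchPrice": False},
    {"objectNumber": "EX-2", "transferType": "sale", "price": 5000.0,
     "unsold": True, "batchPrice": False},       # bought in: no transfer occurred
    {"objectNumber": "EX-3", "transferType": "sale", "price": 6350000.0,
     "unsold": False, "batchPrice": True},       # en bloc total, not an item price
    {"objectNumber": "EX-4", "transferType": "by_descent", "price": None,
     "unsold": False, "batchPrice": False},
]

# The three inheritance-related transfer types listed in this section.
INHERITANCE_TYPES = {"by_descent", "widowhood", "inheritance"}


def actual_sales(records):
    """Keep sale events that transferred ownership at an individual price."""
    return [
        e for e in records
        if e["transferType"] == "sale"
        and not e.get("unsold", False)
        and not e.get("batchPrice", False)
    ]


def inheritance_events(records):
    """Keep every inheritance-related event, whatever its granularity."""
    return [e for e in records if e["transferType"] in INHERITANCE_TYPES]


# Rank only the prices that are safe to compare.
ranked = sorted(actual_sales(events), key=lambda e: e["price"], reverse=True)
```

With the sample data, only EX-1 survives the sale filter, which is the point: the unsold lot and the batch total would otherwise dominate a price ranking.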


Party Roles and Positions

Each party in a provenance event has a role (what they did) and a position (their side of the transfer):

| Role | Position | Context |
| --- | --- | --- |
| buyer | receiver | Sale events |
| seller | sender | Sale events |
| donor | sender | Gift events |
| recipient | receiver | Gift, transfer, deposit events |
| heir | receiver | Bequest, inheritance events |
| lender | sender | Loan events |
| borrower | receiver | Loan events |
| patron | receiver | Commission events |
| collector | receiver | Collection events |
| dealer / intermediary / auctioneer | agent | Facilitated without owning |

Positions (sender/receiver/agent) are derived from roles via deterministic mapping, with LLM enrichment for ambiguous cases. The positionMethod field tracks how each party's position was determined: role_mapping (deterministic), llm_enrichment (LLM-classified), or llm_disambiguation (LLM-resolved merged party text).
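
The deterministic part of that mapping can be pictured as a plain lookup built from the role table. This is a minimal sketch, not the pipeline's actual code: the function name position_for and the choice to return None for unmapped roles (leaving them to LLM enrichment) are assumptions.

```python
# Role -> position lookup, transcribed from the role table in this section.
ROLE_TO_POSITION = {
    "buyer": "receiver",
    "seller": "sender",
    "donor": "sender",
    "recipient": "receiver",
    "heir": "receiver",
    "lender": "sender",
    "borrower": "receiver",
    "patron": "receiver",
    "collector": "receiver",
    "dealer": "agent",
    "intermediary": "agent",
    "auctioneer": "agent",
}


def position_for(role):
    """Return (position, positionMethod) for a role.

    Roles outside the table yield (None, None); in this sketch those are
    the cases assumed to be left for LLM enrichment.
    """
    position = ROLE_TO_POSITION.get(role)
    if position is None:
        return None, None
    return position, "role_mapping"
```

For example, position_for("seller") gives ("sender", "role_mapping"), while an unrecognised role falls through with no position assigned.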

Coverage: ~86K parties extracted across ~101K events. Not all events have named parties — bare-name collection events and cross-references often lack structured party data.


Date Representation

Dates use qualified single years with temporal bounds, similar to the EDTF (Extended Date/Time Format) approach:

| Expression | dateYear | dateQualifier | Interpretation |
| --- | --- | --- | --- |
| 1808 | 1808 | | Exact year |
| c. 1700 | 1700 | circa | Approximate (±10 years in Layer 2) |
| before 1800 / by 1800 | 1800 | before | Terminal bound only |
| after 1945 | 1945 | after | Start bound only |
| 1560-70 | 1565 | | Midpoint of range |
| possibly 1767 | 1767 | circa | Uncertain attribution |
| 1858 or earlier | 1858 | before | Terminal bound |
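
To make the table concrete, here is a small sketch that maps each expression form to a (dateYear, dateQualifier) pair. It covers only the forms listed above and is an illustration, not the parser used to build the dataset; the function name parse_date_expression is hypothetical.

```python
import re

# Leading keywords and the qualifier each one maps to, per the table above.
QUALIFIERS = {
    "c.": "circa",
    "circa": "circa",
    "possibly": "circa",
    "before": "before",
    "by": "before",
    "after": "after",
}


def parse_date_expression(text):
    """Return (dateYear, dateQualifier) for the expression forms in the table."""
    s = text.strip().lower()

    # Range such as "1560-70": expand the abbreviated end year, take the midpoint.
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{2,4})", s)
    if m:
        start_text, end_digits = m.group(1), m.group(2)
        end = int(start_text[: 4 - len(end_digits)] + end_digits)
        return (int(start_text) + end) // 2, None

    # Trailing form: "1858 or earlier".
    m = re.fullmatch(r"(\d{4}) or earlier", s)
    if m:
        return int(m.group(1)), "before"

    # Leading keyword: "c. 1700", "before 1800", "by 1800", "after 1945", ...
    m = re.fullmatch(r"(c\.|circa|possibly|before|by|after)\s*(\d{4})", s)
    if m:
        return int(m.group(2)), QUALIFIERS[m.group(1)]

    # Bare year: exact, no qualifier.
    if re.fullmatch(r"\d{4}", s):
        return int(s), None

    raise ValueError(f"unrecognised date expression: {text!r}")
```

Under these rules "1560-70" yields (1565, None) and "by 1800" yields (1800, "before"), matching the rows of the table.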

Historical Currencies

Prices are stored in their original historical currency — no inflation adjustment or cross-currency conversion is performed. Supported currency values:

guilders, pounds, francs, livres, napoléons, guineas, belgian_francs, deutschmarks, reichsmarks, swiss_francs, euros, dollars, yen, marks, louis_d_or.

Pre-decimal notations (£.s.d, fl. X:Y:-) are converted to decimal equivalents of the base currency unit.
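
As an illustration of what a decimal equivalent means here, the sketch below converts the two notations using the historical subdivisions (20 shillings to the pound and 12 pence to the shilling; 20 stuivers to the guilder). Reading the second field of fl. X:Y:- as stuivers, and the absence of any rounding, are assumptions; the actual conversion code is not documented on this page.

```python
from fractions import Fraction


def lsd_to_pounds(pounds, shillings=0, pence=0):
    """Convert a pounds/shillings/pence amount to decimal pounds.

    1 pound = 20 shillings = 240 pence.
    """
    total = Fraction(pounds) + Fraction(shillings, 20) + Fraction(pence, 240)
    return float(total)


def guilders_to_decimal(guilders, stuivers=0):
    """Convert "fl. X:Y:-" to decimal guilders.

    Assumes Y counts stuivers (20 per guilder) and the third field is empty.
    """
    return float(Fraction(guilders) + Fraction(stuivers, 20))
```

For instance, 10 pounds 10 shillings becomes 10.5 pounds, and fl. 12:10:- becomes 12.5 guilders.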

Note on batch prices: en bloc prices (a batch sold together, e.g. the entire Mannheimer collection for fl. 6,350,000) are attributed to every individual item in the batch. These events are flagged with batchPrice: true — always filter them out when ranking or comparing prices.


Enrichment Provenance

Every record carries provenance-of-provenance metadata tracking how it was determined.

When results contain LLM-enriched records, search_provenance provides a URL to an enrichment review page showing the full methodology and reasoning for each decision. Always show this URL to the user.

Querying by enrichment method

# Artworks where transfer type was classified by LLM (170 events)
search_provenance(categoryMethod="llm_enrichment", maxResults=10)

# Artworks where party position was assigned by LLM
search_provenance(positionMethod="llm_enrichment", maxResults=10)

# Artworks where merged party text was decomposed by LLM (204 splits)
search_provenance(positionMethod="llm_disambiguation", maxResults=10)

For collection-wide distribution of methods:

collection_stats(dimension="categoryMethod")
collection_stats(dimension="positionMethod")
collection_stats(dimension="parseMethod")

Parse Method Values

The parseMethod field records how each provenance record was processed:

| Value | Share | Description |
| --- | --- | --- |
| peg | ~80% | PEG grammar parser; highest confidence |
| cross_ref | ~20% | Cross-reference links |
| credit_line | ~0.1% | Inferred from the museum's credit line field when the provenance chain lacked acquisition information |
| regex_fallback | | Legacy, currently unused |

credit_line events are particularly useful: they recover acquisition context (donor name, purchase fund) that the provenance text omits.


Provenance Facets

search_provenance supports facets: true, returning 5 facet dimensions alongside chain results.

All entries include count and percentage.
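
As a rough picture of what a facet entry with count and percentage looks like, the sketch below tallies one dimension over a list of values. The entry keys (value, count, percentage), the one-decimal rounding, and the use of transfer type as the example dimension are all assumptions made for illustration.

```python
from collections import Counter


def facet(values):
    """Return facet entries with count and percentage, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        {"value": value, "count": count,
         "percentage": round(100 * count / total, 1)}
        for value, count in counts.most_common()
    ]


# Made-up transfer types standing in for one facet dimension.
transfer_types = ["sale", "sale", "gift", "confiscation"]
entries = facet(transfer_types)
```

Here the first entry is sale with a count of 2 and a percentage of 50.0; an empty input simply produces an empty list.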

# Faceted overview of wartime confiscations
search_provenance(transferType="confiscation", dateFrom=1933, dateTo=1945, facets=true)

# Faceted overview of a dealer's activity
search_provenance(party="Goudstikker", facets=true)

# Faceted audit of LLM-classified events
search_provenance(categoryMethod="llm_enrichment", facets=true)

Tested Query Patterns

Collector profiling

# All works associated with a collector across all event types
search_provenance(party="Mannheimer", maxResults=50)

# Collector as seller specifically
search_provenance(party="Goupil", maxResults=20)
# Then filter results: role="seller" + position="sender" to map direction of trade

# Ownership durations for a family name
search_provenance(layer="periods", ownerName="Six",
                  sortBy="duration", sortOrder="desc", maxResults=20)
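
The client-side filter mentioned for the Goupil query (role="seller" plus position="sender") could look like the sketch below. The nested result shape, the party names and the EX-n object numbers are invented for illustration; only the role and position values come from this page.

```python
# Hypothetical result shape: artworks -> events -> parties.
results = [
    {"objectNumber": "EX-1", "events": [
        {"transferType": "sale", "dateYear": 1885, "parties": [
            {"name": "Goupil & Cie", "role": "seller", "position": "sender"},
            {"name": "Example Buyer", "role": "buyer", "position": "receiver"},
        ]},
    ]},
    {"objectNumber": "EX-2", "events": [
        {"transferType": "sale", "dateYear": 1890, "parties": [
            {"name": "Goupil & Cie", "role": "buyer", "position": "receiver"},
        ]},
    ]},
]


def events_where_party_is(results, name_fragment, role, position):
    """List (objectNumber, dateYear) for events where the named party
    appears with the given role and position."""
    needle = name_fragment.lower()
    matches = []
    for artwork in results:
        for event in artwork["events"]:
            for party in event["parties"]:
                if (needle in party["name"].lower()
                        and party["role"] == role
                        and party["position"] == position):
                    matches.append((artwork["objectNumber"], event["dateYear"]))
                    break  # one match per event is enough
    return matches


sold_by_goupil = events_where_party_is(results, "Goupil", "seller", "sender")
```

Swapping the arguments to "buyer" and "receiver" gives the opposite direction of trade, which is how the two lists together map a dealer's flow of stock.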

Acquisition channel analysis

# creditLine covers ~358K artworks — far more than parsed provenance (~48K)
search_artwork(creditLine="Drucker-Fraser", compact=true)
search_artwork(creditLine="Vereniging Rembrandt", type="painting", compact=true)

Wartime provenance

# Anti-join: confiscated but never restituted
search_provenance(transferType="confiscation",
                  excludeTransferType="restitution", maxResults=20)

# Works with documented gaps + events in the wartime period
search_provenance(hasGap=true, creator="Rembrandt",
                  dateFrom=1933, dateTo=1945, maxResults=20)

# Recuperation events (Allied recovery, distinct from legal restitution)
search_provenance(transferType="recuperation", maxResults=20)
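
The anti-join in the first wartime query can be read as "chains that contain one transfer type and lack another". The sketch below shows that logic over made-up chains; it assumes the exclusion is evaluated per artwork chain, which this page does not state explicitly.

```python
# Made-up provenance chains: object number -> ordered transfer types.
chains = {
    "EX-1": ["sale", "confiscation", "restitution", "gift"],
    "EX-2": ["collection", "confiscation", "recuperation", "transfer"],
    "EX-3": ["sale", "bequest"],
}


def anti_join(chains, include, exclude):
    """Object numbers whose chain has an `include` event and no `exclude` event."""
    return sorted(
        object_number
        for object_number, transfer_types in chains.items()
        if include in transfer_types and exclude not in transfer_types
    )


never_restituted = anti_join(chains, "confiscation", "restitution")
```

EX-2 is the only match: it was confiscated and later recuperated, but recuperation is a distinct type, so the chain still counts as never restituted.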

Price and market history

# Most expensive recorded transactions in guilders
# Check batchPrice on results — true means en bloc total, not individual price
search_provenance(hasPrice=true, currency="guilders",
                  sortBy="price", sortOrder="desc", maxResults=20)

# Price history for a specific work
search_provenance(objectNumber="SK-A-2344", layer="events")
# Check unsold flag: unsold lots have prices but no sale occurred

Multi-generation family collections

search_provenance(transferType=["by_descent", "widowhood"],
                  layer="periods", minDuration=50,
                  sortBy="duration", sortOrder="desc", maxResults=20)

Chronological exploration

# Earliest documented provenance events in the collection
search_provenance(dateFrom=1400, dateTo=1500, sortBy="dateYear",
                  sortOrder="asc", maxResults=20)

Decade-level time series

Use collection_stats for single-call time series — no manual pagination loop needed:

# Sale events per decade 1600–1900
collection_stats(dimension="provenanceDecade", transferType="sale",
                 dateFrom=1600, dateTo=1900)

# Half-century bins
collection_stats(dimension="provenanceDecade", transferType="sale",
                 dateFrom=1600, dateTo=1900, binWidth=50)

# Confiscation events by decade (wartime distribution)
collection_stats(dimension="provenanceDecade", transferType="confiscation")
# Reveals 1790s + 1940s bimodal pattern (French Revolution + WWII)
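
For intuition about what the binned series contains, the sketch below groups event years into fixed-width bins by flooring each year to a multiple of the bin width. Whether collection_stats aligns its bins exactly this way is an assumption, and the years are invented sample data.

```python
from collections import Counter


def bin_start(year, bin_width=10):
    """Floor a year to the start of its bin (1795 -> 1790 for width 10)."""
    return (year // bin_width) * bin_width


def time_series(years, bin_width=10):
    """Count events per bin, ordered chronologically."""
    counts = Counter(bin_start(year, bin_width) for year in years)
    return dict(sorted(counts.items()))


# Invented event years clustered around two periods.
years = [1795, 1798, 1801, 1940, 1942, 1943]
by_decade = time_series(years)
by_half_century = time_series(years, bin_width=50)
```

With this sample, by_decade is {1790: 2, 1800: 1, 1940: 3}, and widening the bins to 50 years merges the counts into {1750: 2, 1800: 1, 1900: 3}.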