The Rijksmuseum's collection metadata references over 32,000 distinct places — production sites, depicted locations, and artist birthplaces spanning five continents and five centuries. Most arrive as bare names without coordinates. rijksmuseum-mcp+ geocodes these places by cascading through multiple authority databases, enabling proximity search, geographic analysis, and spatial reasoning over the entire collection.
Places appear in four roles across the collection, linking artworks to geography through
the vocabulary database's mappings table. But the Rijksmuseum's source data provides
coordinates for only a fraction of them. Most arrive as text labels with an authority URI
(Wikidata, Getty TGN, or GeoNames) but no latitude or longitude.
A single place like "Amsterdam" may appear in all four roles across thousands of artworks.
Production places come from OAI-PMH dcterms:spatial; depicted places from
dc:subject; birth/death places from Linked Art person records.
Each place in the vocabulary database carries an external_id linking it to one or more
authority databases. The geocoding pipeline resolves coordinates by querying these authorities
in order of reliability, falling through to the next source when a query returns no result.
Places with no authority link are sent by name to the Wikidata and WHG reconciliation APIs,
which return fuzzy-matched candidates; a local scoring layer then combines string similarity,
geographic type, and coordinate availability to accept, flag for review, or reject each match.
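The fall-through behaviour described above can be illustrated with a minimal sketch. It assumes each authority is wrapped in a lookup function that returns coordinates or `None`; the function name `resolve_with_cascade`, the source keys, and the placeholder IDs are hypothetical and not taken from the project.

```python
from typing import Callable, Optional

Coords = tuple[float, float]
Resolver = Callable[[str], Optional[Coords]]


def resolve_with_cascade(
    external_ids: dict[str, str],
    resolvers: list[tuple[str, Resolver]],
) -> Optional[tuple[str, Coords]]:
    """Return (source, coords) from the first authority that yields a result."""
    for source, lookup in resolvers:
        identifier = external_ids.get(source)
        if identifier is None:
            continue  # this place has no link to that authority
        coords = lookup(identifier)
        if coords is not None:
            return source, coords
    return None  # nothing resolved: hand the place to name reconciliation


# Stub lookups standing in for the real SPARQL/REST calls (placeholder IDs).
tgn_lookup = {"tgn-1": (52.37, 4.89)}.get
wikidata_lookup = {"Q-example": (52.3728, 4.8936)}.get
resolvers = [("tgn", tgn_lookup), ("wikidata", wikidata_lookup)]

# The TGN stub has no entry for "tgn-2", so the cascade falls through to Wikidata.
hit = resolve_with_cascade({"tgn": "tgn-2", "wikidata": "Q-example"}, resolvers)
print(hit)
```

Keeping the order in a plain list makes the priority explicit and easy to change without touching the loop.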
Getty TGN: SPARQL batch queries against the Getty Thesaurus of Geographic Names, 200 IDs per request.
Wikidata: SPARQL queries for P625 (coordinate location), 400 QIDs per batch, with fallback to P159, P131, and P276.
GeoNames: REST API lookups by GeoNames ID, one at a time at 1 request per second.
World Historical Gazetteer: fuzzy name matching that resolves historical variants.
Hierarchy inheritance: places without coordinates inherit them from their nearest geocoded ancestor via broader_id.
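As an illustration of the batching described for Wikidata, the sketch below splits a list of QIDs into groups of 400 and builds one SPARQL query per group using a `VALUES` clause. The helper names are hypothetical, and the query shape is an assumption about how such a batch could be written; it relies only on the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines. Sending the query is left out.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    it = iter(ids)
    while chunk := list(islice(it, size)):
        yield chunk


def wikidata_p625_query(qids: list[str]) -> str:
    """Build one SPARQL query asking for P625 coordinates of all given QIDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?place ?coord WHERE { "
        f"VALUES ?place {{ {values} }} "
        "?place wdt:P625 ?coord . }"
    )


# 1,000 made-up QIDs split into batches of 400, the batch size given in the text.
qids = [f"Q{i}" for i in range(1, 1001)]
batches = list(batched(qids, 400))
queries = [wikidata_p625_query(batch) for batch in batches]
print(len(queries), queries[0][:60])
```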
Each source contributes coordinates with different strengths. The pipeline queries them in order of precision and batch efficiency.
| Source | Method | Places | Geocoded | Strengths |
|---|---|---|---|---|
| Getty TGN | SPARQL endpoint (foaf:focus → wgs84:lat/long) | 8,884 | 8,882 (99.98%) | Art-world standard. Curated by Getty. Includes historical place names and hierarchies. |
| Wikidata | SPARQL endpoint (P625 + P159/P131/P276 fallbacks) | 10,308 | 10,264 (99.6%) | Largest coverage. Fallback properties (headquarters, admin territory) resolve institutions and districts. |
| GeoNames | REST JSON API (by numeric ID) | 1,215 | 1,201 (98.8%) | Good for modern administrative units. Free tier rate-limited (1,000 req/hour). |
| World Historical Gazetteer | Reconciliation API (fuzzy name matching) | ~2,283 | ~2,283 (accepted) | Resolves historical variants (Batavia→Jakarta, Leyden→Leiden). Bridges to other authorities. |
| Hierarchy inheritance | Recursive CTE (broader_id → parent coords) | ~9,460 | ~9,460 | Streets, buildings, landmarks inherit city-level coordinates from parent place. 79% of places have a parent. |
The geocoding script (geocode_places.py) runs post-harvest in six sequential phases,
each targeting a different category of unresolved places.
Batch SPARQL queries to Getty TGN and Wikidata using existing authority IDs from the harvest. GeoNames IDs resolved via REST API. Handles ~20,000 places in minutes.
When P625 (coordinates) is missing, follow indirect paths: P159 (headquarters location), P131 (located in admin territory), P276 (location). Catches institutions and districts.
Map Getty TGN IDs to Wikidata via P1667 (TGN ID property), then query Wikidata for coordinates. Recovers places where TGN itself lacks coordinates but Wikidata has them.
Some vocabulary entries reference other entries that have already been geocoded. Copy coordinates from the resolved entry. Handles aliases and duplicate records.
Places with no authority ID are matched by name against Wikidata and the World Historical Gazetteer. Confidence scoring: auto-accept ≥0.85, flag 0.50–0.85 for review, reject <0.50.
Systematic checks for hemisphere errors, null island false positives (0°N, 0°E), latitude/longitude swaps, and authority misidentifications detected via production-place vs depicted-place distance outliers.
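The accept/review/reject decision in the reconciliation phase can be sketched as follows. The 0.85 and 0.50 thresholds come from the text above; the way the three signals are blended (the 0.7/0.15/0.15 weights and the use of `difflib` for string similarity) is purely an assumption for illustration, and both function names are hypothetical.

```python
from difflib import SequenceMatcher

ACCEPT_AT = 0.85  # from the text: auto-accept at or above this score
REVIEW_AT = 0.50  # from the text: flag for review at or above this score


def score_candidate(query: str, name: str, is_place: bool, has_coords: bool) -> float:
    """Blend string similarity, geographic type, and coordinate availability.

    The weights are illustrative assumptions, not the project's values.
    """
    similarity = SequenceMatcher(None, query.casefold(), name.casefold()).ratio()
    return 0.7 * similarity + 0.15 * is_place + 0.15 * has_coords


def classify(score: float) -> str:
    """Map a score onto the three outcomes described in the text."""
    if score >= ACCEPT_AT:
        return "accept"
    if score >= REVIEW_AT:
        return "review"
    return "reject"


# An exact name match that is a place with coordinates scores at the top.
print(classify(score_candidate("Leiden", "Leiden", True, True)))
# The same name match without type or coordinate support only reaches review.
print(classify(score_candidate("Leiden", "Leiden", False, False)))
```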
The vocabulary database links places into a parent–child tree via broader_id,
derived from the Rijksmuseum's place hierarchy (P89_falls_within in CIDOC-CRM).
79% of places have a parent. This hierarchy serves two purposes.
Streets, buildings, and landmarks that no authority geocodes directly instead inherit coordinates from their nearest geocoded ancestor. Precision drops to city level, but coverage jumps from 64% to 96%.
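Inheritance through broader_id lends itself to a recursive CTE. The sketch below is one way it might look in SQLite; the `places` table layout (`id`, `name`, `lat`, `lon`, `broader_id`), the sample rows, and the depth cap are assumptions made for the example, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE places (
        id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL, broader_id INTEGER
    );
    -- Illustrative rows with approximate coordinates.
    INSERT INTO places VALUES
        (1, 'Country', 52.25, 5.75, NULL),
        (2, 'City', 52.37, 4.89, 1),
        (3, 'District', NULL, NULL, 2),
        (4, 'Street', NULL, NULL, 3);
    """
)

# Walk upward from every ungeocoded place; stop climbing as soon as the
# current ancestor has coordinates, so each place gets its nearest one.
INHERIT_SQL = """
WITH RECURSIVE chain(place_id, ancestor_id, depth) AS (
    SELECT id, broader_id, 1
    FROM places
    WHERE lat IS NULL AND broader_id IS NOT NULL
    UNION ALL
    SELECT chain.place_id, p.broader_id, chain.depth + 1
    FROM chain
    JOIN places AS p ON p.id = chain.ancestor_id
    WHERE p.lat IS NULL AND p.broader_id IS NOT NULL
      AND chain.depth < 20  -- guard against accidental cycles
)
SELECT chain.place_id, a.lat, a.lon, chain.depth
FROM chain
JOIN places AS a ON a.id = chain.ancestor_id
WHERE a.lat IS NOT NULL
"""

inherited = sorted(conn.execute(INHERIT_SQL).fetchall())
for place_id, lat, lon, depth in inherited:
    print(place_id, lat, lon, f"inherited from {depth} level(s) up")
```

Returning the depth alongside the coordinates keeps a record of how much precision was lost for each inherited place.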
The expandPlaceHierarchy flag on search_artwork recursively
expands a place to include all descendants — so "Netherlands" also returns
artworks produced in Amsterdam, Delft, Haarlem, and every other Dutch location.
search_artwork(productionPlace: "Netherlands",
expandPlaceHierarchy: true)

A collection spanning the 15th to 21st century uses place names from many eras. The World Historical Gazetteer is particularly valuable here: its datasets include Dutch colonial names, VOC-era toponyms, and medieval variants that modern gazetteers do not cover.
| Historical name | Modern name | Resolved via | Context |
|---|---|---|---|
| Batavia | Jakarta | WHG | VOC headquarters, 1619–1942 |
| Leyden | Leiden | WHG | Historical English/Latin spelling |
| 's-Gravenhage | Den Haag / The Hague | Wikidata | Formal Dutch name |
| Cochin | Kochi | WHG | VOC trading post, Kerala |
| Elmina | Elmina | WHG | WIC fort, Gold Coast (Ghana) |
| Amstelledamme | Amsterdam | WHG | Medieval Dutch |
With coordinates in the database, rijksmuseum-mcp+ supports spatial queries that would be impossible with place names alone.
Find artworks produced in, depicting, or connected to places within a radius of a named location. Uses a custom Haversine distance function in SQLite with a bounding-box pre-filter for performance.
search_artwork(nearPlace: "Oude Kerk Amsterdam",
nearPlaceRadius: 0.5)

Ambiguous queries like "Paleis van Justitie Den Haag" are resolved via progressive token splitting: try the full string, then progressively drop right-side tokens, using the remainder as geographic context for disambiguation.
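Progressive token splitting can be sketched in a few lines. The version below only generates the (name, context) pairs in the order described and tries them against a caller-supplied lookup; `token_splits`, `resolve_place`, and the stub lookup are hypothetical names, and how the context is actually used for disambiguation is not specified in the text.

```python
from typing import Callable, Iterator, Optional


def token_splits(query: str) -> Iterator[tuple[str, str]]:
    """Yield (name, context) pairs, dropping one right-side token at a time."""
    tokens = query.split()
    for cut in range(len(tokens), 0, -1):
        yield " ".join(tokens[:cut]), " ".join(tokens[cut:])


def resolve_place(
    query: str, lookup: Callable[[str, str], Optional[str]]
) -> Optional[str]:
    """Return the first match found while walking through the splits."""
    for name, context in token_splits(query):
        match = lookup(name, context)
        if match is not None:
            return match
    return None


# Stub lookup: knows one building, and only when the city is given as context.
known = {("Paleis van Justitie", "Den Haag"): "place-123"}
result = resolve_place(
    "Paleis van Justitie Den Haag", lambda name, ctx: known.get((name, ctx))
)
splits = list(token_splits("Paleis van Justitie Den Haag"))
print(result, splits[:3])
```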
Proximity queries double as a data quality tool: outlier distances between an artwork's production place and depicted place can surface authority misidentifications that would otherwise be invisible in text-only metadata.
Radii from 100 metres to 500 km. Useful for both fine-grained queries ("artworks from this street") and regional surveys ("production in the Rhine valley").
nearPlaceRadius: 0.1 → 100m (a building)
nearPlaceRadius: 25 → 25km (a city region)
nearPlaceRadius: 200 → 200km (a province)
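A radius filter of this kind, using a Haversine function in SQLite behind a bounding-box pre-filter, might look like the sketch below. It registers a Python function with `sqlite3`'s `create_function`; the table layout, the function names, and the approximate sample coordinates are assumptions, and the box ignores wrap-around at the antimeridian for brevity.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, radius_km):
    """Lat/lon box that contains the whole search circle (no antimeridian wrap)."""
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    cos_lat = math.cos(math.radians(lat))
    ratio = math.sin(ang) / cos_lat if cos_lat > 0 else 2.0
    # If the circle reaches a pole, every longitude is a candidate.
    dlon = math.degrees(math.asin(ratio)) if ratio < 1.0 else 180.0
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def places_within(conn, lat, lon, radius_km):
    """Cheap box filter first, exact Haversine distance second."""
    south, north, west, east = bounding_box(lat, lon, radius_km)
    return conn.execute(
        """
        SELECT name, dist FROM (
            SELECT name, haversine_km(?, ?, lat, lon) AS dist
            FROM places
            WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
        )
        WHERE dist <= ?
        ORDER BY dist
        """,
        (lat, lon, south, north, west, east, radius_km),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("haversine_km", 4, haversine_km)
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [  # approximate city-centre coordinates, for illustration only
        ("Amsterdam", 52.3728, 4.8936),
        ("Haarlem", 52.3874, 4.6462),
        ("Rotterdam", 51.9225, 4.4792),
    ],
)
nearby = [name for name, _ in places_within(conn, 52.3728, 4.8936, 25)]
print(nearby)
```

The box comparison can use ordinary indexes on the latitude and longitude columns, so the more expensive trigonometry only runs on the few rows that survive it.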
Each phase of the pipeline adds coordinates to a new tranche of places. Authority ID resolution handles the bulk; hierarchy inheritance closes the long tail.
The 1,314 places that remain ungeocoded are primarily orphan vocabulary entries (not linked to any artwork), extremely local landmarks, and a small number of places with authority IDs but no coordinates in any source.
Geocoding at this scale surfaces errors that are invisible in text-only metadata. Phase 4 runs systematic validation checks; several caught real errors in the source data.
Tewkesbury (England) linked to Getty TGN 7821058 — which is Tewkesbury, Tasmania. Invisible without coordinates; detected by distance outlier analysis (16,000 km from co-occurring English places).
Six Caribbean/Dutch locations had latitude/longitude swaps placing them near the Falkland Islands. Two Northern Hemisphere locations had inverted latitude signs that placed them in the Southern Hemisphere. All eight were caught by systematic hemisphere checks.
Places geocoded to 0°N, 0°E (a point in the Gulf of Guinea) indicate a missing-data default, not a real location. Flagged and excluded.
A vocabulary entry for "Moon" was geocoded to Moon Township, Pennsylvania. Manually corrected.
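Checks of this kind can be approximated with a small validator. The sketch below is an assumption about how such rules could be expressed, not the project's code: it compares each point with a reference location (for example a parent place), and the `validate` function, the flag names, and the 500 km threshold are all hypothetical.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Haversine distance between two (lat, lon) points."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlmb = math.radians(b[1] - a[1])
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def validate(
    lat: float,
    lon: float,
    reference: Optional[tuple[float, float]] = None,
    max_km: float = 500.0,  # assumed outlier threshold
) -> list[str]:
    """Return a list of flags; an empty list means no check fired."""
    if abs(lat) > 90 or abs(lon) > 180:
        return ["out_of_range"]
    if lat == 0.0 and lon == 0.0:
        return ["null_island"]  # missing-data default, not a real location
    if reference is None or distance_km((lat, lon), reference) <= max_km:
        return []
    # The point is far from its reference: test whether a simple repair fixes it.
    if abs(lon) <= 90 and distance_km((lon, lat), reference) <= max_km:
        return ["lat_lon_swap"]
    if distance_km((-lat, lon), reference) <= max_km:
        return ["latitude_sign"]
    return ["distance_outlier"]


# Illustrative inputs with made-up coordinates.
print(validate(0.0, 0.0))
print(validate(-68.93, 12.11, reference=(12.17, -68.99)))
print(validate(-52.37, 4.89, reference=(52.25, 5.75)))
```

Flags are returned rather than applied, so a reviewer can decide whether to repair, exclude, or keep each record.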