The Daily Barcelona

Barcelona news, every day


Barcelona Bets on AI Screening to Kill Duplicate Images in Its Booming Digital Archive — and It's Ahead of the Pack

As cities from Amsterdam to Seoul race to clean up bloated visual databases, Barcelona's municipal archive and tourism bodies are deploying automated tools that other European capitals are still planning on paper.

By Barcelona News Desk · Published 4 July 2026, 8:40 pm


Photo: Masi on Pexels

Barcelona's Institut de Cultura de Barcelona confirmed earlier this year that its digitisation drive, which covers more than 400,000 images held across the Arxiu Fotogràfic de Barcelona in the Gòtic district and satellite collections in Poblenou, had produced a secondary problem nobody had fully anticipated: tens of thousands of duplicate or near-duplicate image files clogging servers and complicating public search tools. The city is now midway through a phased clean-up programme using perceptual-hash detection software, which identifies redundant copies by comparing compact fingerprints computed from each image's pixels rather than by comparing file names.
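For readers curious what a "pixel fingerprint" looks like in practice, the following is a minimal sketch of one simple perceptual hash, the average hash. The city has not said which algorithm its software uses, so this is an illustration rather than a description of its system. The function names and the tiny 4x4 grayscale thumbnails are made up for the example, and a real tool would first shrink and desaturate each image.

```python
from typing import List

Grid = List[List[int]]  # grayscale values 0-255, assumed already downscaled


def average_hash(grid: Grid) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical thumbnails: a scan, a slightly brighter re-export of the
# same scan, and an unrelated checkerboard-like image.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
brighter_copy = [[p + 5 for p in row] for row in original]
unrelated = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

same_image = hamming_distance(average_hash(original), average_hash(brighter_copy))
different_image = hamming_distance(average_hash(original), average_hash(unrelated))
print(same_image, different_image)
```

The brighter re-export produces the same fingerprint as the original even though every pixel value changed, while the unrelated image differs in half of its 16 bits. That tolerance to small edits is what lets this family of techniques find copies that a byte-for-byte or file-name comparison would miss.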

The timing matters. Barcelona's digital archive push accelerated after 2022, when the municipal government began migrating paper records to support Mayor Jaume Collboni's open-data commitments. That expansion, combined with image contributions from Turisme de Barcelona and a parallel project run by the Consorci de l'Auditori, flooded internal servers with overlapping files sourced from multiple departments that had never been coordinated. The result was a duplication rate that archivists internally estimated at roughly one in five image files — a ratio that, left unaddressed, directly inflates storage costs and degrades the accuracy of the public-facing search portal on bcn.cat.

How Barcelona Compares With Amsterdam and Seoul

The city is not alone in grappling with this, but it is moving faster than many peers. Amsterdam's Stadsarchief — one of Europe's most celebrated municipal archives, housed in the Bazel building on Vijzelstraat — began a comparable deduplication audit in 2024 but has not yet publicly reported completion of even its first collection tier. The Seoul Metropolitan Archives launched a pilot programme in the first quarter of 2025 covering its post-2000 digital holdings, but the project covers a narrower scope than Barcelona's, which extends back to digitised analogue material from the 1880s held at the Arxiu Nacional de Catalunya on Carrer dels Almogàvers in Poblenou.

Barcelona's approach borrows from a framework developed partly in collaboration with the Universitat Politècnica de Catalunya, whose researchers have published on perceptual-hashing accuracy in cultural heritage contexts. The software flags potential duplicates for human review rather than auto-deleting, a safeguard that distinguishes the city's method from more aggressive automated pipelines used in commercial contexts. That conservative approach adds time — the current phase is scheduled to run through the fourth quarter of 2026 — but reduces the risk of accidentally purging images that are similar but not identical, such as different exposures from the same photographic session.
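The flag-then-review safeguard can be sketched in a few lines. This is an illustration under stated assumptions, not the city's actual pipeline: the catalogue identifiers, the 16-bit fingerprints and the distance threshold of 2 are all hypothetical, and the function only builds a list for an archivist to inspect.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def flag_for_review(
    hashes: Dict[str, int], max_distance: int
) -> List[Tuple[str, str, int]]:
    """Return candidate duplicate pairs for a human to check.

    Nothing is deleted here: the output is only a review queue.
    """
    queue = []
    for (id_a, h_a), (id_b, h_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(h_a, h_b)
        if distance <= max_distance:
            queue.append((id_a, id_b, distance))
    return queue


# Hypothetical catalogue numbers and precomputed fingerprints.
catalogue = {
    "AFB-001": 0b1111000011110000,
    "AFB-002": 0b1111000011110000,  # exact re-upload of AFB-001
    "AFB-003": 0b1111000011110010,  # e.g. another exposure, 1 bit apart
    "AFB-004": 0b0000111100001111,  # unrelated image
}

review_queue = flag_for_review(catalogue, max_distance=2)
for pair in review_queue:
    print(pair)
```

In this toy data the near-identical "different exposure" lands in the queue alongside the exact re-upload. An auto-deleting pipeline would treat both the same way, whereas a reviewer can keep the former and discard the latter.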

Madrid's Archivo Regional de la Comunidad de Madrid has taken a different route, relying primarily on metadata-matching — comparing file names, creation dates and descriptive tags — rather than pixel-level analysis. Archivists familiar with both systems note that metadata matching is faster but catches fewer duplicates when files have been renamed or reformatted across departments, a common occurrence in large municipal bureaucracies. Barcelona's pixel-based method is slower but expected to recover a higher proportion of genuinely redundant files.
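The trade-off between the two methods can be shown with a toy comparison. The records, file names, dates and fingerprint values below are invented for illustration. For simplicity the sketch treats copies as having identical fingerprints, whereas a reformatted file would usually differ by a few bits and need a distance threshold.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Tuple

# (file_name, creation_date, fingerprint); the fingerprint stands in for a
# precomputed perceptual hash and is a hypothetical value.
Record = Tuple[str, str, int]


def duplicate_groups(
    records: List[Record], key: Callable[[Record], Hashable]
) -> List[List[str]]:
    """Group file names by a matching key and keep groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[0])
    return [names for names in groups.values() if len(names) > 1]


records: List[Record] = [
    ("barceloneta_1920.tif", "2023-03-01", 0xA5A5),
    ("barceloneta_1920.tif", "2023-03-01", 0xA5A5),  # straight copy
    ("IMG_0042.jpg", "2024-06-12", 0xA5A5),  # same image, renamed and reformatted
    ("port_vell.tif", "2023-03-01", 0x3333),
]

# Metadata matching: same file name and creation date.
by_metadata = duplicate_groups(records, lambda r: (r[0], r[1]))
# Pixel-based matching: same fingerprint.
by_pixels = duplicate_groups(records, lambda r: r[2])

print(by_metadata)
print(by_pixels)
```

Metadata matching finds only the straight copy, while the fingerprint comparison also catches the renamed file. The cost is that fingerprints have to be computed from the image data itself, which is the slower step.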

What Residents and Researchers Should Know

The practical consequence for anyone who uses the Arxiu Fotogràfic's public portal — accessible from its reading room on Plaça de Pons i Clerch, just off Via Laietana — is that search results should become more coherent by early 2027. Currently, the same historical image of, say, La Barceloneta beach can appear three or four times in results under different catalogue numbers, a frustration reported by academic researchers and journalists alike.

Turisme de Barcelona, which maintains its own image bank used by travel media and commercial licensees, expects the deduplication to cut its active library from roughly 85,000 files to around 60,000, a reduction of nearly 30 per cent that should also trim licensing administration costs. Licence fees for images in that library currently start at €95 per image for editorial use, a rate that has not changed since 2023.

For researchers planning to access collections at either the Arxiu Fotogràfic or the Arxiu Nacional before the clean-up is complete, archivists advise cross-referencing catalogue numbers manually and flagging suspected duplicates through the online contact form. The deduplicated database is expected to go live in its first public version in February 2027, assuming the current project timeline holds.


About this article

Published by The Daily Barcelona

This article was produced by The Daily Barcelona editorial desk and covers news in Barcelona. See our editorial standards for how we use AI.
