The Daily Barcelona

Barcelona news, every day


Barcelona's Municipal Archive Moves to Fix Duplicate Image Problem That Has Plagued Digital Records for Years

A systematic overhaul of the city's photo catalogue is under way after thousands of mislabelled and repeated files were found clogging the public digital archive.

By Barcelona News Desk · Published 4 July 2026, 9:00 pm


Photo: Lawson, W. R. (William Ramage) / Public domain (Wikimedia Commons)

The Arxiu Municipal de Barcelona confirmed this week that it has launched a dedicated cleanup operation targeting thousands of duplicate and mislabelled image files that have accumulated across its digital collections since its first mass digitisation push began in 2009. The problem, long acknowledged internally, has now reached a scale that actively hampers public access to historical records.

The issue matters now because the archive is in the middle of a broader digitisation drive tied to the city's 2025–2028 Digital Barcelona Plan, which commits the Ajuntament de Barcelona to open, searchable public data. Duplicate images inflate storage costs, produce false results in public search tools, and in some cases have caused the same photograph to be attributed to different dates or locations. That is a particular problem for records relating to the Eixample district and the Gothic Quarter, the two areas with the heaviest photographic documentation going back to the late nineteenth century.

The operational hub for the project sits at the Arxiu Municipal Contemporani, on Carrer de Sant Pau in the Raval neighbourhood, which holds the largest single collection of twentieth-century municipal photographs. Staff there are working alongside technicians from the Institut Municipal d'Informàtica, the city's own IT agency, to run automated deduplication software across an estimated 340,000 digitised items. A parallel manual review is focused on approximately 12,000 files flagged as high-priority — images tied to legal records, urban planning permits, and cultural heritage listings where a cataloguing error carries real administrative consequences.

How Duplicates Built Up

The root cause is straightforward: successive digitisation campaigns over nearly two decades used different file-naming conventions, metadata standards, and scanning resolutions. When collections were merged into a single content management system — a process that accelerated between 2018 and 2022 — matching logic failed to catch near-identical images that had been scanned twice at different resolutions or cropped slightly differently. The result was a catalogue where a single 1960s photograph of the Mercat de Sant Antoni renovation could appear under three separate reference numbers with three different credited dates.
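Why would a rescan slip past the matching logic? The sketch below is an illustration only, not the archive's software: it assumes images are plain grayscale grids and uses a simple "average hash", one common form of perceptual hashing. The function names, the synthetic test image, and the distance threshold of 5 are all hypothetical choices made for the example. It shows that an exact byte-level fingerprint treats two scans of the same scene at different resolutions as unrelated files, while a perceptual fingerprint places them close together.

```python
import hashlib


def resize_to_grid(pixels, size=8):
    """Shrink a grayscale image (list of rows) to size x size by block averaging."""
    height, width = len(pixels), len(pixels[0])
    grid = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        row = []
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in resize_to_grid(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def exact_digest(pixels):
    """Byte-level fingerprint: changes whenever any pixel or the size changes."""
    return hashlib.sha256(bytes(int(v) for row in pixels for v in row)).hexdigest()


def is_near_duplicate(a, b, max_distance=5):
    """Hypothetical rule: small hash distance means 'send to a reviewer as a likely duplicate'."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


def make_scene(side):
    """Synthetic stand-in for a photograph: a left-to-right brightness gradient."""
    return [[x * 255 // (side - 1) for x in range(side)] for _ in range(side)]


high_res = make_scene(64)   # first scan
low_res = make_scene(32)    # same scene, scanned again at lower resolution

print("exact digests match:", exact_digest(high_res) == exact_digest(low_res))
print("perceptual distance:", hamming(average_hash(high_res), average_hash(low_res)))
print("flag as near-duplicate:", is_near_duplicate(high_res, low_res))
```

In a real workflow the threshold would need tuning against known duplicates, and borderline pairs would still go to a human reviewer, which is consistent with the hybrid approach the archive describes.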

The Sant Antoni market case is not hypothetical. It has been cited internally as a textbook example of how the problem compounds: researchers requesting images for academic publication, including from institutions such as the Universitat de Barcelona's history faculty on Gran Via de les Corts Catalanes, have on at least two occasions in the past three years received duplicate files without realising it, leading to corrections in published work.

Storage is a measurable cost. Municipal IT procurement documents from 2024 put the annual bill for archive cloud storage at just under €280,000. Officials have indicated, without giving a precise figure, that eliminating confirmed duplicates could meaningfully reduce that bill before the next budget cycle, which begins in January 2027.

What Comes Next for Researchers and the Public

The deduplication project is expected to run through the end of October 2026. Once the automated phase is complete, the Arxiu Municipal plans to update its public search portal — accessible at arxiu.barcelona.cat — with corrected metadata and consolidated file entries. Users who have saved direct links to specific archive images should expect some URLs to change when duplicate records are merged and a single canonical entry is kept.
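For readers who maintain their own lists of archive references, the effect of merging can be pictured with a small sketch. It is purely illustrative: the archive has not said how it will choose which entry to keep or whether it will publish a mapping, so the rule used here (keep the lowest reference number) and the `REF-` identifiers are assumptions made up for the example.

```python
def consolidate(groups):
    """Collapse each group of duplicate reference numbers into one canonical entry.

    groups: list of lists, each holding references judged to be the same image.
    Returns (canonical, redirects), where redirects maps a retired reference
    to the one that was kept.
    """
    canonical = []
    redirects = {}
    for refs in groups:
        keep = min(refs)  # assumption: the lowest reference number survives
        canonical.append(keep)
        for ref in refs:
            if ref != keep:
                redirects[ref] = keep
    return canonical, redirects


def resolve(ref, redirects):
    """Return the current reference for a saved one, following a redirect if any."""
    return redirects.get(ref, ref)


# Hypothetical identifiers: three entries for one photograph, plus one unique image.
groups = [["REF-0003", "REF-0001", "REF-0002"], ["REF-0010"]]
canonical, redirects = consolidate(groups)

print(canonical)
print(resolve("REF-0002", redirects))
```

A saved link to a retired entry would only keep working if a mapping like `redirects` exists somewhere, which is why noting current reference numbers before the merge is useful.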

For anyone with ongoing research projects dependent on the archive, the practical advice is to download and locally save any images already in use, noting the current reference numbers. The archive's reading room on Carrer de Sant Pau will remain open during the process, and staff are handling queries about specific collections on a case-by-case basis.

The wider lesson is one that other European city archives have been wrestling with for years. Madrid's Archivo de Villa completed a similar deduplication exercise in 2023. Barcelona's effort is more complex by volume, but the methodology being applied — a hybrid of perceptual hashing algorithms and human review — is now considered standard practice. The goal is a cleaner, faster public catalogue by the time Barcelona hosts its next major wave of visitors. That deadline is self-imposed, and the archive team is not getting more staff to meet it.


About this article

Published by The Daily Barcelona

This article was produced by The Daily Barcelona editorial desk and covers news in Barcelona. See our editorial standards for how we use AI.
