The Daily Barcelona

Barcelona news, every day

Barcelona's Municipal Archives Tackle Duplicate Image Crisis With New AI Scanning Push

The city's digitisation programme hit a turning point this week as archivists and tech partners moved to clear a backlog of thousands of redundant photographs clogging public collections.

By Barcelona News Desk · Published 4 July 2026, 8:57 pm

Photo: AXP Photography on Pexels

Barcelona's municipal digitisation effort took a concrete step forward this week when the Arxiu Municipal de Barcelona announced it had completed the first phase of an automated duplicate-detection sweep across more than 400,000 scanned photographs in its online catalogue. The sweep, which ran through late June, provisionally flagged roughly 38,000 images as duplicates or near-duplicates: redundant scans that for years have occupied server space, confused researchers and slowed public access to the city's visual history.

The timing matters. The Ajuntament de Barcelona has been pushing hard to overhaul its digital public services ahead of the 2029 centenary of the 1929 International Exposition, which the city plans to commemorate with an expanded online archive open to schools, journalists and international researchers. Duplicate imagery isn't a trivial housekeeping problem: cluttered catalogues degrade search results, inflate storage costs and make it harder for educators in places like the Eixample or Gràcia to pull usable material for local history projects. Every redundant file that sits unresolved is, in effect, a barrier between residents and their own civic memory.

What the Sweep Actually Found

The detection work was carried out using open-source perceptual hashing tools adapted by a team at the Institut Municipal d'Informàtica, the city's in-house technology body based on Carrer de Balmes. Rather than simply deleting flagged files, archivists are running a secondary human review on a sample of roughly 5,000 images before any permanent removal takes place — a precaution that reflects hard lessons learned when the Biblioteca de Catalunya, on Carrer de l'Hospital in the Raval neighbourhood, lost contextual metadata during an earlier batch-deletion exercise in 2019.
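For readers curious how perceptual hashing works in principle, here is a minimal sketch of one simple variant, the average hash. The article does not say which open-source tools or algorithm the team adapted, so this is an illustration only. The function names and the tiny 4x4 grayscale grids are hypothetical; real tools first shrink each photograph to a small thumbnail before hashing.

```python
def average_hash(pixels):
    """Build an integer hash with one bit per pixel.

    A bit is 1 when the pixel is brighter than the image's mean brightness.
    `pixels` is a list of rows of grayscale values (0-255).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 4x4 thumbnails: dark on the left, bright on the right.
scan = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# The same picture scanned slightly brighter.
rescan = [[value + 8 for value in row] for row in scan]
# A different picture: the mirror image of the first.
other = [list(reversed(row)) for row in scan]

print(hamming_distance(average_hash(scan), average_hash(rescan)))  # 0
print(hamming_distance(average_hash(scan), average_hash(other)))   # 16
```

A uniform brightness shift leaves the hash unchanged, which is why a re-scan can be matched to its original even when the two files differ byte for byte.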

Of the 38,000 flagged files, early review suggests around 60 percent, or roughly 22,800, are true exact duplicates: the same scan uploaded more than once during migration from legacy systems between 2014 and 2021. The remaining 40 percent, roughly 15,200, are near-duplicates: slightly different crops, resolution variants or successive frames from the same photographic session. Those require more careful handling, because in some cases minor differences carry documentary value. A slightly wider crop might show a street sign or a crowd edge that matters to historians working on, say, the urban transformation of Poblenou or the demolition-era Sant Antoni market.
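The split between exact and near-duplicates can be sketched in a few lines. This is an assumption about how such a check might look, not the archive's actual pipeline. It treats byte-identical files as exact duplicates, using a SHA-256 comparison, and falls back to comparing precomputed perceptual hashes otherwise. The function name, the integer hash inputs and the threshold of 4 differing bits are all hypothetical.

```python
import hashlib


def classify_pair(bytes_a, bytes_b, phash_a, phash_b, max_distance=4):
    """Label two files as 'exact', 'near' or 'distinct'.

    bytes_a / bytes_b: raw file contents.
    phash_a / phash_b: perceptual hashes as integers (computed elsewhere).
    max_distance: hypothetical cut-off for how many hash bits may differ.
    """
    # Byte-identical files are exact duplicates regardless of their pixels.
    if hashlib.sha256(bytes_a).digest() == hashlib.sha256(bytes_b).digest():
        return "exact"
    # Otherwise compare the perceptual hashes bit by bit.
    distance = bin(phash_a ^ phash_b).count("1")
    if distance <= max_distance:
        return "near"  # candidates for human review, not automatic removal
    return "distinct"


print(classify_pair(b"same scan", b"same scan", 0b1010, 0b1010))
print(classify_pair(b"crop A", b"crop B", 0b1111, 0b1110))
print(classify_pair(b"photo A", b"photo B", 0x0000, 0xFFFF))
```

Only the "exact" label is safe to act on mechanically. A "near" result says two images look alike, not that one of them is expendable, which is the case the human review is meant to settle.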

The Institut Municipal d'Informàtica has not published a final cost figure for the project, but the municipality's 2026 digital infrastructure budget, approved by the Consell Municipal in February, allocated €2.3 million to archive modernisation overall. The duplicate-image work sits within that envelope.

Practical Consequences for Researchers and the Public

For anyone who uses the Arxiu Municipal's public portal — accessible free of charge at arxiu.bcn.cat — the most immediate change will come in search quality. Once the review phase closes, expected by September, the archive plans to retire confirmed duplicates and consolidate metadata into single canonical records. That should cut retrieval noise significantly on high-demand search terms like "Via Laietana" or "Barceloneta," where duplicate records currently return multiple identical results and force users to scroll past redundant hits.
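What consolidating metadata into a single canonical record could involve is sketched below. The archive has not published its merge rules, so everything here is an assumption: the record layout, the identifiers, the sample metadata and the choice of the lowest identifier as the canonical one are invented for illustration. The sketch keeps every distinct value for each field rather than overwriting, so no description is silently lost.

```python
def consolidate(records):
    """Merge confirmed duplicate records into one canonical record.

    records: list of dicts with an 'id' and a 'metadata' dict (field -> value).
    The record with the lowest id becomes canonical (a hypothetical rule).
    """
    ordered = sorted(records, key=lambda record: record["id"])
    canonical = {
        "id": ordered[0]["id"],
        "metadata": {},
        "merged_from": [record["id"] for record in ordered[1:]],
    }
    for record in ordered:
        for field, value in record["metadata"].items():
            values = canonical["metadata"].setdefault(field, [])
            if value not in values:  # keep each distinct value once
                values.append(value)
    return canonical


# Hypothetical duplicate records for the same photograph.
duplicates = [
    {"id": "B-002", "metadata": {"title": "Via Laietana", "photographer": "unknown"}},
    {"id": "B-001", "metadata": {"title": "Via Laietana", "date": "1929"}},
]
merged = consolidate(duplicates)
print(merged["id"], merged["merged_from"])
print(merged["metadata"])
```

Recording the retired identifiers in `merged_from` preserves a trace of where each record came from, which matters for provenance.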

The Escola Superior d'Arxivística i Gestió de Documents, which trains professional archivists and is affiliated with the Universitat Autònoma de Barcelona, has been involved in advising on review protocols. The institution has long argued that automated deduplication without human oversight risks destroying provenance chains — the documentary thread that tells you not just what an image shows but when, by whom and under what circumstances it entered a public collection.

Researchers and educators who rely on the archive are advised to check their saved search links after September, since canonical record identifiers may change when duplicates are merged. The Arxiu Municipal has said it will maintain redirect URLs for at least 18 months to prevent broken citations in academic work. Anyone with active research projects drawing on the photographic collections should export their reference lists before the September migration window opens — and flag any cases where they believe two apparently identical images are, on closer inspection, meaningfully distinct.
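The redirect promise amounts to a lookup from retired identifiers to their canonical replacements. A minimal sketch follows, assuming a simple in-memory mapping. The identifiers and the function name are hypothetical, and the archive's real redirect mechanism is not described in its announcement.

```python
def resolve(record_id, redirects):
    """Follow redirects from a retired identifier to its canonical record.

    redirects: dict mapping a retired id to the id that replaced it.
    Chains are followed in case a record was merged more than once.
    """
    seen = set()
    while record_id in redirects:
        if record_id in seen:
            raise ValueError(f"redirect loop at {record_id}")
        seen.add(record_id)
        record_id = redirects[record_id]
    return record_id


# Hypothetical mapping: B-003 was merged into B-002, which later merged into B-001.
redirects = {"B-002": "B-001", "B-003": "B-002"}

print(resolve("B-003", redirects))  # B-001
print(resolve("B-001", redirects))  # B-001
```

An exported reference list that stores the old identifiers can be run through a lookup like this after the migration to find which citations now point at a merged record.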

About this article

Published by The Daily Barcelona

This article was produced by The Daily Barcelona editorial desk and covers news in Barcelona. See our editorial standards for how we use AI.