Barcelona's municipal archive, the Arxiu Municipal de Barcelona, confirmed this week that an internal audit launched in late June has identified more than 14,000 duplicate image files across its publicly accessible digital collections — a backlog that has slowed researcher access and inflated storage costs at the institution's Sant Pau repository for at least two years.
The problem matters now because the city is midway through a broader digitisation push tied to the 2026 municipal cultural budget, which allocated funds specifically for expanding online access to historical photographic collections covering the Eixample district's modernista heritage and the 1992 Olympic legacy archive. When duplicate files sit unresolved in a database, catalogue searches return redundant results, metadata tags conflict, and the underlying storage bill grows. Those costs ultimately fall on public accounts already stretched by competing municipal priorities under Mayor Jaume Collboni's administration.
What the Audit Found in the Gràcia and Poblenou Collections
The duplicate problem is concentrated in two collections. The first is the Gràcia neighbourhood street photography series, digitised between 2019 and 2022, where scanning contractors uploaded batches without cross-referencing existing entries. The second is the Poblenou industrial heritage set, assembled partly from donations by the 22@ technology district's urban development office. In both cases, images were ingested multiple times under slightly different file names — a common error when several digitisation teams work in parallel without a shared deduplication protocol.
The Consorci de Serveis Universitaris de Catalunya, which provides shared digital infrastructure to several Catalan public institutions, has been brought in to run automated hash-matching software across the affected servers. Hash matching computes a numerical fingerprint from the contents of each image file and compares the results. Because the fingerprint depends on the file's data rather than its name, two copies uploaded under different file names produce the same value, and one of them is flagged for removal or merging. The process is expected to run until mid-September 2026 across all affected collections.
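A minimal sketch of the hash-matching idea, assuming SHA-256 as the fingerprint. The software the Consorci actually runs is not named, so the function names (`file_digest`, `flag_duplicates`) and the rule of keeping the first path in sorted order are hypothetical. The sketch groups files by the hash of their contents and reports every group with more than one member.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large image files never need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def flag_duplicates(paths):
    """Group files by content digest and return (keep, duplicates) pairs.

    Hypothetical rule: within each group of identical files, the first path in
    sorted order is kept and the rest are flagged for removal or merging.
    File names play no part in the match, only the bytes inside each file.
    """
    groups = defaultdict(list)
    for p in map(Path, paths):
        groups[file_digest(p)].append(p)
    result = []
    for members in groups.values():
        if len(members) > 1:
            keep, *dupes = sorted(members)
            result.append((keep, dupes))
    return result
```

Because a cryptographic hash such as SHA-256 changes with any difference in the bytes, this approach only catches exact copies. A photograph that was rescanned or re-encoded by a second contractor would produce a different fingerprint; catching those requires perceptual hashing, which compares visual similarity rather than identical data.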
For researchers using the archive's public portal on Carrer de Santa Llúcia, the practical effect has been a slower search interface and, in some cases, catalogue entries listing the same 1970s photograph of the Mercat de Sant Antoni three or four times under different accession numbers. Librarians at the Biblioteca de Catalunya on Carrer de l'Hospital have fielded complaints from academic users who found cross-referencing between the two institutions' systems unreliable as a result.
Broader Lessons for the City's Digital Infrastructure
The episode has prompted a wider conversation about procurement standards. In June, the Ajuntament de Barcelona's digital services directorate circulated an internal note, whose existence is confirmed by the archive's published meeting minutes of 23 June 2026, recommending that future digitisation contracts require vendors to submit a deduplication compliance report before final payment is released. The note cited the Poblenou case specifically as the reason for the proposed change.
Barcelona is not alone in confronting the problem. Madrid's Archivo Regional de la Comunidad de Madrid completed a similar cleanup in 2024, removing roughly 9,000 redundant files from its Civil War photography collection. Relative to total collection size, the Barcelona audit's findings are larger.
Storage is a recurring expense. Commercial cloud archiving at the scale the Arxiu Municipal operates, estimated at several hundred terabytes across all collections, carries significant annual fees, and duplicate files mean institutions pay to store and serve the same data more than once. Industry benchmarks suggest that deduplication exercises at comparable European municipal archives have cut active storage requirements by between 8 and 15 percent.
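To give the benchmark range a concrete scale, the short calculation here applies it to an assumed collection size. The 300-terabyte figure is a hypothetical stand-in for "several hundred terabytes", not a published number, and the helper name `dedup_saving_tb` is illustrative.

```python
def dedup_saving_tb(total_tb: float, low_pct: float, high_pct: float) -> tuple[float, float]:
    """Return the (low, high) storage saving in terabytes for a percentage range.

    Illustrative helper only: the inputs are assumptions, not audit figures.
    """
    return total_tb * low_pct / 100, total_tb * high_pct / 100


# Hypothetical collection size standing in for "several hundred terabytes".
ASSUMED_TOTAL_TB = 300.0

low, high = dedup_saving_tb(ASSUMED_TOTAL_TB, 8, 15)
print(f"Estimated saving: {low:.0f} to {high:.0f} TB")  # Estimated saving: 24 to 45 TB
```

At an assumed 300 TB, the 8 to 15 percent range works out to roughly 24 to 45 TB that would no longer be stored twice; the saving scales linearly with the true collection size.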
Researchers and institutions that rely on the public portal should expect intermittent reduced functionality on Tuesday and Wednesday mornings through August, when the hash-matching jobs are scheduled to run. The archive's Sant Pau reading room remains open during normal hours. Anyone with an active image loan request filed before 1 July 2026 has been told that their materials will not be affected by the catalogue restructuring now underway.