The Arxiu Municipal de Barcelona confirmed this week that a long-overdue deduplication project is now in full operation, targeting a backlog of duplicate and near-duplicate images in its public-facing digital catalogue. The sweep, which began in earnest on Monday 30 June, is the largest systematic image audit the archive has undertaken since its online portal was expanded in 2019.
The timing matters. Barcelona City Council has been pushing hard to open its municipal records infrastructure to researchers, journalists and residents — part of a broader transparency drive championed under Mayor Jaume Collboni. But bloated digital archives stuffed with duplicate scans slow search tools, inflate storage costs and erode trust in the institution's data quality. The deduplication push is a direct response to those concerns.
What the Week's Work Actually Involved
Technicians working out of the Arxiu Municipal's main facility on Carrer de Sant Pacià, in the Sant Antoni neighbourhood, deployed perceptual hashing software to flag near-identical images across the catalogue's urban photography collections. These collections span everything from early twentieth-century streetscapes of the Eixample grid to construction documentation from the post-Olympic port redevelopment of the 1990s.
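The archive has not said which perceptual hashing tool or algorithm it uses. As a rough illustration of the general idea, the sketch below implements one of the simplest variants, an average hash, and compares two hashes by counting the bits that differ (the Hamming distance). The function names, the toy 4x4 pixel grids and the `max_distance` threshold are all assumptions made for this example; a production tool would first shrink each scan to a small grayscale grid and would tune the threshold against real data.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # small grayscale image, values 0-255


def average_hash(pixels: Grid) -> int:
    """Return an integer whose bits mark pixels brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a: Grid, pixels_b: Grid, max_distance: int = 2) -> bool:
    """Flag two images as near-duplicates if their hashes are close.

    max_distance is a hypothetical threshold chosen for this toy example.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Toy data: a dark/light split image, a slightly brighter rescan of it,
# and an unrelated image with a different layout.
scan_a = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
scan_b = [[min(255, value + 12) for value in row] for row in scan_a]
other = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

print(is_near_duplicate(scan_a, scan_b))  # brighter rescan of the same picture
print(is_near_duplicate(scan_a, other))   # different picture
```

Because the hash depends on each pixel's brightness relative to the image's own mean, a uniformly brighter rescan produces the same bits, while a file-level checksum of the two scans would differ. That is why this family of techniques can catch near-identical images that an exact byte comparison would miss.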
The Palau de la Virreina, on La Rambla, which houses one of the city's key photographic collections under the Institut de Cultura de Barcelona (ICUB), is also involved. Coordinators there confirmed this week that its digitised holdings, which run to more than 120,000 images of public festivals and civic events, are being cross-referenced against the Arxiu Municipal's own database to eliminate duplication between the two institutions. Before this week's work, certain iconic images of Mercè festival processions appeared in both catalogues under different file names and metadata tags, creating confusion for researchers trying to license or cite material.
The practical trigger was an internal report circulated in late May, which found that roughly 18 percent of images in one major urban planning subcollection were either exact or near-duplicate files. That figure came from a pilot scan of around 40,000 images from the 1975–2005 period, when the archive was digitising film negatives and physical prints in bulk and quality-control processes were less rigorous. Duplicate entries in that tranche alone were consuming an estimated 2.3 terabytes of redundant server space, according to the internal assessment.
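To put those figures in proportion, the short calculation below works out what they imply. It rests on two assumptions that the report summary does not confirm: that the 18 percent share applies to the roughly 40,000 images in the pilot scan, and that the full 2.3 terabytes is attributable to the flagged files. It also uses decimal units (1 TB = 1,000,000 MB). Since the source figures are themselves approximate, the results are ballpark numbers only.

```python
# Figures quoted from the internal assessment (all approximate).
pilot_images = 40_000   # images in the pilot scan
flagged_share = 0.18    # share flagged as exact or near-duplicate
redundant_tb = 2.3      # redundant storage, in terabytes

# Assumption: the 18 percent share applies to the pilot scan itself.
flagged_images = round(pilot_images * flagged_share)

# Assumption: all redundant storage belongs to the flagged files.
avg_mb_per_flagged = redundant_tb * 1_000_000 / flagged_images

print(flagged_images)             # 7200
print(round(avg_mb_per_flagged))  # 319
```

On those assumptions, about 7,200 files were flagged, at an average of roughly 320 MB each, a size consistent with large uncompressed master scans rather than web-resolution copies.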
Why the Rental and Tourism Economy Sharpened the Pressure
There is a more immediate commercial dimension to the clean-up. With the city's tourist tax expansion under active review — the current rate for cruise passengers arriving at the Port de Barcelona stands at €7 per person per day — city agencies have been under pressure to ensure publicly funded digital assets are clearly documented and licensable. Researchers, publishers and film production companies regularly request rights to historical Barcelona street images. Duplicate entries with conflicting metadata complicate that licensing process and have, in at least two documented cases this year, led to the same image being licensed separately to two different parties.
The ICUB charges a base fee starting at €30 for non-commercial single-use reproduction of archival photographs. For commercial use, rates rise significantly depending on circulation. Duplicate records undermine that revenue model because the use of a single asset ends up split across separate catalogue entries and cannot be tracked as one.
The deduplication project is scheduled to complete its first full pass by 31 July. After that, a second phase will address metadata standardisation — giving each surviving image a single, consistent record linking photographer, date, neighbourhood and rights status. The Eixample and Gràcia neighbourhood collections are first in line for that second phase, given their high search traffic.
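The archive has not published a schema for the standardised records. As a hypothetical sketch of what consolidating duplicate entries could look like, the example below merges several records for the same image into one, using the four fields named above. The class name, field names, sample values and merge rule (keep a value when the duplicates agree or only one supplies it, and report a conflict otherwise) are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class ImageRecord:
    """Hypothetical catalogue record; field names are illustrative only."""
    photographer: Optional[str] = None
    date: Optional[str] = None
    neighbourhood: Optional[str] = None
    rights_status: Optional[str] = None


def merge_records(records: List[ImageRecord]) -> Tuple[ImageRecord, List[str]]:
    """Merge duplicate records into one and list fields that disagree."""
    merged = ImageRecord()
    conflicts: List[str] = []
    for field in fields(ImageRecord):
        # Collect the distinct non-empty values supplied for this field.
        values = {getattr(record, field.name) for record in records} - {None}
        if len(values) == 1:
            setattr(merged, field.name, values.pop())
        elif len(values) > 1:
            conflicts.append(field.name)  # left empty for manual review
    return merged, conflicts


# Two invented entries describing the same photograph.
entry_a = ImageRecord(photographer="Photographer A", date="1992",
                      neighbourhood="Eixample")
entry_b = ImageRecord(date="1992", neighbourhood="Gràcia",
                      rights_status="cleared")

merged, conflicts = merge_records([entry_a, entry_b])
print(merged)
print(conflicts)  # ['neighbourhood']
```

Leaving a conflicting field empty and flagging it, rather than picking one value automatically, reflects the licensing risk described earlier: a wrong neighbourhood or rights status in the surviving record would be harder to spot than a gap.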
Researchers and institutions seeking access to Barcelona's urban photographic collections can follow the project's progress through the Arxiu Municipal's online portal. The archive advises anyone who has downloaded catalogue exports in the past six months to recheck file references after 1 August, when the cleaned database goes live.