Barcelona's municipal digital archive, managed through the Arxiu Municipal de Barcelona, confirmed this week that it has begun a formal deduplication sweep across its publicly accessible image database — a collection that spans more than 1.2 million photographs, maps, and urban planning documents. The move follows an internal audit completed in late June that flagged a significant volume of duplicate or near-duplicate files slowing search response times and complicating access for researchers and city planners alike.
The timing is deliberate. The archive, headquartered on Carrer de Santa Llúcia in the Gothic Quarter, is in the middle of a five-year digitisation push that accelerated after the COVID-19 pandemic disrupted in-person access to physical records. As more material has been scanned and uploaded — including entire runs of historical urban photography from the Eixample expansion and documentation of waterfront redevelopment around the Port Olímpic zone — file redundancy has multiplied faster than staff could manually address it.
What the Review Found This Week
The audit identified duplicate rates running as high as 18 percent in certain thematic collections, particularly those related to public infrastructure projects along the Diagonal and in Poblenou's 22@ innovation district. Multiple departments had uploaded the same project photography to those folders independently, and no shared metadata standard existed until the Ajuntament de Barcelona adopted a protocol in March 2025.
The deduplication process involves more than deleting files. Archive technicians are cross-referencing duplicates against the original acquisition records to determine which version carries the most complete metadata (geolocation tags, date stamps, photographer attribution) before replacing lower-quality copies with a canonical version. Where duplicates differ meaningfully in resolution or crop, both may be retained under a merged record. The review is expected to free up roughly 340 gigabytes of server storage once complete, according to a project brief published on the archive's institutional portal on July 2.
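For readers curious how such a selection rule might look in practice, the following is a minimal Python sketch, not the archive's actual tooling. It assumes duplicates have already been grouped upstream under a hypothetical `group_id`, scores each copy by how many of the three metadata fields named above are filled in, and keeps a second copy only when its pixel count differs from the canonical one by an assumed threshold (`resolution_ratio`). All names and the threshold are illustrative, and differences in crop are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# The three metadata fields mentioned in the project description.
METADATA_FIELDS = ("geolocation", "date", "photographer")


@dataclass
class ImageRecord:
    catalogue_id: str
    group_id: str  # hypothetical: marks files already judged to show the same image
    width: int     # pixel dimensions, assumed to be positive
    height: int
    geolocation: Optional[str] = None
    date: Optional[str] = None
    photographer: Optional[str] = None


def completeness(rec: ImageRecord) -> int:
    """Count how many of the metadata fields are filled in."""
    return sum(getattr(rec, name) is not None for name in METADATA_FIELDS)


def consolidate(records, resolution_ratio=1.5):
    """Pick one canonical record per group; retain copies whose resolution differs a lot.

    resolution_ratio is an assumed threshold, not a figure from the archive.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.group_id].append(rec)

    merged = {}
    for group_id, members in groups.items():
        # Most complete metadata wins; pixel count breaks ties.
        canonical = max(members, key=lambda r: (completeness(r), r.width * r.height))
        canonical_pixels = canonical.width * canonical.height
        retained = [canonical.catalogue_id]
        for rec in members:
            if rec is canonical:
                continue
            pixels = rec.width * rec.height
            ratio = max(pixels, canonical_pixels) / min(pixels, canonical_pixels)
            if ratio >= resolution_ratio:
                # Meaningfully different resolution: keep under the merged record.
                retained.append(rec.catalogue_id)
        merged[group_id] = {"canonical": canonical.catalogue_id, "retained": retained}
    return merged


sample = [
    ImageRecord("A", "g1", 1000, 1000, "41.38,2.16", "1931-05-02", "Unknown studio"),
    ImageRecord("B", "g1", 1000, 1000),
    ImageRecord("C", "g1", 4000, 4000, date="1931-05-02"),
]
result = consolidate(sample)
```

In this made-up sample, record A becomes canonical because all three fields are present, B is dropped as a same-size copy with no metadata, and C is retained alongside A because it is a much higher-resolution scan.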
The practical impact reaches beyond internal housekeeping. The archive's public search tool, used by journalists, urban researchers, architects, and students at institutions including the Universitat Politècnica de Catalunya, returns cluttered results when duplicates carry conflicting tags. A search for historical images of the Mercat de Sant Antoni, for instance, currently surfaces the same photograph under at least four different catalogue entries with inconsistent dates — a problem the deduplication project is specifically designed to correct.
Broader Context: Tourism Pressure and Urban Documentation
The archive's workload has grown considerably as the Collboni administration has prioritised documentation of the city's built environment alongside its crackdown on short-term tourist rentals. Legal challenges related to rental licence revocations in the Barceloneta neighbourhood and in parts of Gràcia have generated substantial photographic and survey evidence that must be catalogued alongside historic material. The sheer volume of new intake has strained a system that was built for a slower pace of acquisition.
City archivists have also flagged a related problem: images sourced from external contractors during public works projects in recent years were delivered in multiple formats and resolutions without standardisation, creating cascading duplication when files were ingested automatically. A new vendor protocol, effective from September 2026, will require all contractors delivering visual documentation to the Ajuntament to submit a single master file per subject, tagged to an agreed metadata schema. That requirement applies to projects across all ten districts.
For researchers and journalists who use the archive's online portal, the most immediate change will arrive in August, when the updated catalogue is expected to go live. Searches will return consolidated records rather than scattered duplicates, and each image will link to its acquisition source. The archive's reading room on Carrer de Santa Llúcia will continue operating normal hours — Monday to Friday, 9am to 2pm — throughout the transition, with staff available to assist with manual catalogue queries while the automated review runs in the background.