Barcelona's Digital Archive Crisis: The Numbers Behind the City's Duplicate Image Problem
Municipal databases and cultural institutions are drowning in redundant visual data — and the cost of doing nothing is rising fast.

Barcelona's network of public digital archives holds an estimated 4.2 million catalogued images across city-managed platforms, and administrators are now confronting an uncomfortable reality: a significant share of that inventory is duplicated, mislabelled, or stored in incompatible formats that make retrieval slow and expensive. The problem, long treated as a bureaucratic footnote, has moved to the top of the agenda at the Institut de Cultura de Barcelona (ICUB) after an internal audit cycle completed this spring flagged the scale of the redundancy.
The timing matters. Mayor Jaume Collboni's administration has committed to a sweeping digitisation push through the city's Barcelona Digital 2030 strategy, which earmarks public infrastructure investment in civic data management. Pouring new resources into systems clogged with duplicate content would undermine that programme before it gains momentum. The audit's findings, circulated internally ahead of a scheduled review in September 2026, put the question of data hygiene at the centre of an otherwise ambitious modernisation effort.
The problem is not evenly distributed. The Arxiu Fotogràfic de Barcelona, housed in the Convent de Sant Agustí in the Sant Pere neighbourhood, manages one of the largest municipal photographic collections in southern Europe, with more than 4 million physical and digitised items. Staff there have acknowledged in public presentations that the migration of analogue holdings to digital formats over the past decade produced overlapping records, with the same image sometimes appearing under three or four separate catalogue entries. The Museu d'Història de Barcelona (MUHBA), which manages sites from the Plaça del Rei to the Via Sepulcral Romana, faces a parallel issue in its visual documentation of archaeological excavations, where field photographs shot across multiple seasons were uploaded in batches without deduplication protocols in place.
Across both institutions, technical teams estimate that duplicate or near-duplicate images may account for between 15 and 22 per cent of total stored digital volume. At current cloud storage pricing of roughly €0.023 per gigabyte per month on the city's contracted infrastructure, that range translates into tens of thousands of euros in avoidable annual expenditure. Storage cost alone does not capture the full burden: the staff time spent resolving conflicting metadata records, answering public access requests that return duplicate results, and manually verifying image rights before external licensing pushes the real operational price higher.
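The storage arithmetic behind that estimate can be made explicit. The sketch below assumes that the quoted €0.023 per gigabyte per month applies as a flat rate to all stored volume and that duplicates are billed like any other data. The article does not publish the archives' total volume, so that figure is left as an input. The function names are illustrative and do not come from any city system.

```python
"""Rough estimate of avoidable storage spend caused by duplicate images.

Assumes a flat per-GB monthly price; real contracts may use tiers,
egress fees or minimum commitments that this sketch ignores.
"""

PRICE_EUR_PER_GB_MONTH = 0.023  # rate quoted in the article
MONTHS_PER_YEAR = 12


def avoidable_storage_cost(total_gb: float, duplicate_share: float,
                           price: float = PRICE_EUR_PER_GB_MONTH) -> float:
    """Annual euros spent storing the duplicated fraction of total_gb."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    duplicate_gb = total_gb * duplicate_share
    return duplicate_gb * price * MONTHS_PER_YEAR


def volume_for_annual_cost(annual_eur: float, duplicate_share: float,
                           price: float = PRICE_EUR_PER_GB_MONTH) -> float:
    """Total archive size (GB) at which duplicates would cost annual_eur per year."""
    if not 0.0 < duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be in (0, 1]")
    return annual_eur / (duplicate_share * price * MONTHS_PER_YEAR)


if __name__ == "__main__":
    # Invert the formula: how large must the archive be for duplicates
    # at the low and high ends of the estimated range to cost EUR 10,000/year?
    for share in (0.15, 0.22):
        gb = volume_for_annual_cost(10_000, share)
        print(f"{share:.0%} duplicates: ~{gb / 1000:,.0f} TB total for EUR 10,000/year")
```

Because the cost scales linearly with volume, working backwards is informative. If the quoted rate were the only cost driver, duplicates would reach €10,000 a year only once total holdings passed roughly 165 to 240 terabytes. An annual figure in the tens of thousands of euros therefore implies an archive of several hundred terabytes.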
The issue also touches Barcelona's booming creative and tourism economy. The city received 15 million overnight tourist stays in 2024 according to figures published by the Ajuntament de Barcelona, and demand for licensed archival imagery — from publishers, documentary producers, and advertising agencies — has grown alongside it. When image searches return cluttered, redundant results, licensing deals stall and revenue is lost. The Arxiu Municipal already operates a public image licensing service, and administrators have noted informally that search quality directly affects conversion rates for paid licences.
The September review will focus on three immediate steps. The first is deploying perceptual hashing tools across the Arxiu Fotogràfic's digital catalogue; such software generates a fingerprint for each image and flags near-identical matches. The second is establishing a shared metadata standard between ICUB-managed institutions so that future uploads are checked against existing records at the point of ingestion. The third is a pilot deduplication project at MUHBA covering archaeological image sets from the Barcino excavations, scheduled to run through the end of 2026.
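The first two steps, fingerprinting images and checking new uploads against existing records, can be sketched in a few lines. The version below is a minimal average hash (aHash), one of the simplest perceptual hashes. It works on a grayscale image supplied as rows of 0 to 255 values. The article does not say which algorithm or tool the city will use. A production system would more likely rely on a maintained library, such as the Python `imagehash` package, and on stronger variants such as DCT-based pHash. The 5-bit distance threshold and the record identifier format are hypothetical.

```python
from typing import Dict, List, Optional

Pixels = List[List[int]]  # grayscale image as rows of 0-255 values


def average_hash(pixels: Pixels, size: int = 8) -> int:
    """Return a size*size-bit average hash of a grayscale image.

    The image is shrunk to size x size cells by averaging each block of
    pixels; each cell becomes a 1 bit if it is brighter than the mean cell.
    """
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_near_duplicate(new_hash: int, catalogue: Dict[str, int],
                        max_distance: int = 5) -> Optional[str]:
    """Ingestion check: return the closest existing record within
    max_distance bits of new_hash, or None if the upload looks new."""
    best_id, best_dist = None, max_distance + 1
    for record_id, existing in catalogue.items():
        dist = hamming(new_hash, existing)
        if dist < best_dist:
            best_id, best_dist = record_id, dist
    return best_id


if __name__ == "__main__":
    original = [[x * 10 + y * 5 for x in range(16)] for y in range(16)]
    rescan = [[v + 20 for v in row] for row in original]  # uniformly brighter copy
    catalogue = {"ARX-0001": average_hash(original)}  # hypothetical record ID
    print(find_near_duplicate(average_hash(rescan), catalogue))
```

Because aHash compares each cell only with the image's own mean brightness, a uniformly brighter rescan of the same photograph produces the same fingerprint and is flagged as a match. An unrelated or inverted image differs in most bits and passes as new. A full catalogue would need an index structure rather than this linear scan, but the comparison logic stays the same.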
For citizens and researchers who use the city's open data portals — particularly the portal at opendata-ajuntament.barcelona.cat — the practical advice is straightforward: cross-reference image records using the catalogue's unique item identifier rather than relying on keyword searches, which currently surface duplicate entries. Until the deduplication work is complete, professionals licensing archival images should request a metadata export from the Arxiu before finalising any agreement, to confirm that the selected item is the canonical record rather than a copy filed under a different accession number.
The September meeting will determine whether the city allocates dedicated budget in the 2027 municipal spending cycle for a full-scale deduplication programme. Given the Barcelona Digital 2030 commitments already on the table, the pressure to act is real — and the numbers behind the problem are no longer easy to ignore.
Published by The Daily Barcelona