Open data won’t save us

Open data is (or was?) lauded as key to addressing myriad socioeconomic challenges, with its advocates arguing that such data is essential for developing “data-driven” or “data-based” solutions. Accordingly, many governments, from supranational bodies such as the European Union (EU) to sub-national ones such as city councils, have implemented various open data initiatives, laws, and policies to reap its potential benefits. Notably, the EU introduced the PSI Directive in 2003 (recast in 2019 as the Open Data Directive), and US President Barack Obama issued the Executive Order “Making Open and Machine Readable the New Default for Government Information” in 2013.


The more data, the better (or is it?)

However, in recent years, new data initiatives have been introduced, ostensibly to supplement open data by facilitating new modes of data sharing. In the EU, attention currently centers on the Common European Data Spaces. The underlying premise of this initiative resembles the philosophy behind open data: the more data available, the better it is for society. This is evident from the website of the European Commission (EC), which states, “Common European Data Spaces will make more data available for access and reuse. This will be done in a trustworthy and secure environment for the benefit of European businesses and citizens.”


In one of ODECO’s interactions with officers from the EC, we were told that of the 1.8 million open datasets on the European Data Portal, only a tiny proportion are reused. Hence, according to them, the Common European Data Spaces are meant to complement open data initiatives by promoting the sharing of data that cannot be provided under an open license. I can follow this logic if the data shared within the Data Spaces is the missing puzzle piece that will enable or enhance the utility of some of the open data already published. First, is this indeed the reality? But, more fundamentally, is more data what we need as a society?


Open data amnesia?

Perhaps, over decades of developing open data (the least restricted form of data sharing), with researchers and practitioners contributing to its progress, we lost track of why open data was promoted in the first place. Perhaps we lost sight of the bigger picture: the availability (or even reusability) of data was never supposed to be an end in itself, but a means. Perhaps, too, the typical framing of sociotechnical agents within (open) data networks as either “providers” or “users” contributes to this amnesia, obscuring the broader socioeconomic ambitions beyond the data sphere. In fact, perhaps our preoccupation with “data-based” or “data-driven” solutions has led us to overlook that those “solutions” (like data or open data) are likewise means rather than ends, and that they do not necessarily address the structural problems society has to confront. Because of this preoccupation, we may also fail to recognize that data itself could be part of the problem, or even the problem itself! It is well documented that data embodies real-world systemic biases and power relations.


(Open) data won’t save us

Even though I may sound like a skeptic, I appreciate and advocate for the value of (open) data. There are many examples of open data being put to good use. For instance, open data provided by Global Forest Watch has been used by indigenous groups in Peru to track and report forest loss in the Amazon, and by international researchers to study global forest fire trends over time. Open data facilitated by the Humanitarian OpenStreetMap Team (HOT) has supported numerous disaster responses. Open data provided on the Johns Hopkins COVID-19 Dashboard helped governments and other stakeholders around the world track the spread of the pandemic.


At the same time, I’m encouraging us to rethink (1) whether more data is necessarily better for society and (2) whether data is, in the first place, what we need (or all we need) to tackle some of the grand challenges that we, as a society, have to grapple with. In other words, I’m calling on researchers and practitioners to pay serious attention to the “social” part of sociotechnical systems of data, with all its complexity, unpredictability, and ever-changing nature, and to embrace transdisciplinary views beyond a narrow technocratic one. This endeavor is certainly not easy, given all of our individual and organizational limitations.


To end, I illustrate my point with a simple analogy. We know that humans need water, and thus water is valuable to us. Imagine a sick person suffering from persistent vomiting and stomach ache. Admittedly, she would need to drink water to replenish the fluids her body has lost, but would drinking more and more water necessarily make her better? Or should we instead investigate and address the root causes of her illness? It could be that she suffers from a waterborne disease, in which case the water itself is the problem!