Interview with Data Scientist Pablo de Pedraza on data needs for growth and in crises

As in the financial market crisis, the world is again struggling in the #coronavirus crisis to obtain the data needed to respond properly to the challenges. Why are data so important, and why are social media data not a simple solution? Some insights from an interview with data scientist Pablo de Pedraza.

Some core messages of the interview:

Data is a source of economic growth and innovation. The data flow in a data economy is semicircular – from households and firms to data holders, but not back.
If knowledge extraction from data is a natural monopoly, the amount of knowledge generated is below the socially desirable amount.
Many agents that could generate valuable knowledge do not have enough access to data.
The more citizens respond to Covid-19 apps, the more data and knowledge we have about the virus.
In a data economy, the race for innovation is a race for data.

A related new research paper by Pablo de Pedraza on the topic is:

GLO Discussion Paper No. 515: The Semicircular Flow of the Data Economy and the Data Sharing Laffer Curve, by de Pedraza, Pablo & Vollbracht, Ian

The first use of social media data for policy analysis in response to data needs in a major crisis was made in the context of the 2009 financial market crisis:

Nikolaos Askitas and Klaus F. Zimmermann, Google Econometrics and Unemployment Forecasting, Applied Economics Quarterly, 55 (2009), pp. 107-120.

GLO Fellow Pablo de Pedraza (European Commission, DG Joint Research Centre) is an economist interested in the use of web data for economic research. His research interests are web data, life satisfaction and the semicircular flow of the data economy. The scientific output expressed does not imply a European Commission policy position. Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use that might be made of this publication.

Middle photo by Mika Baumeister on Unsplash

Interview

GLO: What is the “data economy” and its “semicircular flow”?

Pablo de Pedraza: The “data economy” is an economy where data plays a crucial role.
From a consumption perspective, personal data is similar to money: citizens pay for online services with their data. From a production point of view, data is like oil: a raw material used to produce digital services. Data is a source of economic growth and innovation. Theoretical economic concepts also apply to the “data economy,” which has its own market failures.
The “semicircular” flow is a theoretical model that aims to simplify the complex reality of the “data economy.” It facilitates discussion about the why and how of data policy.
The general idea is that the data flow is semicircular: from households and firms to data holders, but not in the other direction. Households and firms receive data-driven services that are the result of knowledge extracted from data. If citizens received the data itself, they would not have the capacity to process it or to extract knowledge from it. One of the objectives of data policy is to empower the individual.

GLO: Why is the level of knowledge below the socially desirable amount?

Pablo de Pedraza: The question is whether the process of extracting knowledge from data is a natural monopoly, which is an empirical question for which we have no answer yet. We can observe how data holders like large technology companies behave. They are data-hungry, in search of the perfect marketing tool. In econometric terms, they are in a race towards N=All and X=everything.
If knowledge extraction from data is a natural monopoly, data holders are profit-maximizing agents, and monopoly theory holds, then the amount of knowledge generated is below the socially desirable amount. Therefore, public intervention should focus on increasing knowledge generation. What kind of knowledge? Data is a different type of good depending on the kind of knowledge generated. Data used to generate market power is a demerit good. Data is a merit good if used to deliver nimbler public policy, to protect competitive markets, to forecast economic cycles, to protect consumers’ rights, and to study a pandemic.

GLO: Some people say we have enough data, but not the right data….

Pablo de Pedraza: The data economy has its own sources of access inequalities similar to income inequalities. Many agents that could generate merit knowledge, such as the scientific community, central banks or anti-trust authorities, do not have enough access to data.
For example, the research literature shows how online searches can improve forecasting models. One of the main conclusions from that literature is that a better understanding of the results requires the disclosure of more data. More accurate forecasting is an example of merit knowledge that benefits society as a whole, including data holders.

GLO: Can your theory help us to understand the data challenge in the coronavirus crisis?

Pablo de Pedraza: Yes. Think about mobile apps to track Covid-19. The more citizens respond to Covid-19 apps, the more data and knowledge we have about the virus. The semicircular flow of the data economy defines the data sharing Laffer curve. It explains the theoretical determinants of optimum data sharing as the point where society generates the maximum amount of data and merit knowledge. Principles that define the curve, such as trust, apply to the Covid-19 data challenge. When citizens understand the data dimension of the economy and trust the rule of law, they are more willing to contribute to a solution, install the app, and give consent to share their data. If they do not understand what they are giving their consent for, they will be hesitant to install the app; less data will therefore be generated, and knowledge will be below the socially desirable amount.
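Editorial illustration: the idea of an optimum on a Laffer-type curve can be sketched with a toy calculation. The sketch below is not the model from the discussion paper; the functional form, the parameter names (share, incentive_elasticity, base_data) and all numbers are assumptions chosen only to show how an interior optimum can arise when more sharing spreads knowledge but also weakens the incentive to generate data.

```python
# Toy Laffer-style curve for data sharing (illustrative assumption only,
# not the model from GLO Discussion Paper No. 515).

def merit_knowledge(share, incentive_elasticity=1.0, base_data=100.0):
    """Merit knowledge produced at a given sharing rate in [0, 1].

    Assumption: the data generated shrinks as the sharing rate rises
    (weaker incentives), while only the shared part feeds merit knowledge.
    """
    data_generated = base_data * (1.0 - share) ** incentive_elasticity
    return share * data_generated


def optimum_share(incentive_elasticity=1.0, steps=1000):
    """Grid search for the sharing rate that maximizes merit knowledge."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda s: merit_knowledge(s, incentive_elasticity))


if __name__ == "__main__":
    for elasticity in (1.0, 3.0):
        best = optimum_share(elasticity)
        print(elasticity, best, merit_knowledge(best, elasticity))
```

In this toy setting no sharing and full sharing both yield zero merit knowledge, and the maximum lies in between. A higher assumed incentive_elasticity moves the optimum towards less sharing, which mirrors the trade-off between access to data and incentives to generate it that is discussed in the interview.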

GLO: What are the conclusions for data sharing policies?

Pablo de Pedraza: In a data economy, the race for innovation is the race for data. Leaving data policy only in the hands of data holders will not solve antitrust concerns. The lack of competition stifles innovation, although it may initially attract investment. However, excessive intervention discourages investment from data holders and generates surveillance concerns. Countries able to empower well-informed citizens by developing their data literacy, fostering user-centric approaches, and building strong public data infrastructures and institutions will win the race. Citizens operating in a secure environment will generate more data and increase innovation.
In my opinion, data sharing policies are just as vital as fiscal and monetary policies. The semicircular flow of the data economy is a theoretical framework for data sharing. The data dimension of the Covid-19 crisis is an illustrative example of that framework.

*************
Pablo de Pedraza spoke with Klaus F. Zimmermann, GLO President.
Further activities and reports of the GLO Research Cluster on the coronavirus.

Ends;