The Economic Survey of 2018-19 has fascinating ideas on capitalising on data to accelerate India’s economic development and improve delivery of, and access to, public services.
Treating data as a key driver of economic development, together with the push to digitise every public interaction with economic agents, could make India possibly the greatest source of data, given the size of our population and economy. The survey highlights many possible benefits to different sections of society from a data-driven approach.
Data as public good
Information, data and statistics conventionally meant the study of facts systematically gathered from entities. As the Economic Survey reminds us, this is no longer so. It has a chapter describing economic policy uncertainty (EPU), measured exclusively using non-conventional data, and how it correlates with economic growth.
The EPU index is computed from textual analysis of the media’s use of selected words that indicate concern about the economy. We have also heard of sentiment indices being compiled from the words used in social media interactions that could vaguely suggest a particular sentiment towards an issue or individual.
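To make the idea concrete, here is a minimal sketch of a word-based index of this kind. It is not the Survey's method: the word list, the function names and the four sample headlines are all assumptions made up for illustration, and the index is simply the share of a month's articles that contain at least one of the chosen words.

```python
import re
from collections import defaultdict

# Hypothetical word list, for illustration only; the Survey's actual
# selection of words is not reproduced here.
CONCERN_TERMS = {"uncertainty", "uncertain", "slowdown", "deficit"}

def flags_concern(text, terms=CONCERN_TERMS):
    """Return True if the article text contains at least one concern term."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(terms)

def monthly_index(articles, terms=CONCERN_TERMS):
    """Map each month to the share of its articles containing a concern term.

    `articles` is an iterable of (month, text) pairs.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for month, text in articles:
        total[month] += 1
        if flags_concern(text, terms):
            flagged[month] += 1
    return {month: flagged[month] / total[month] for month in total}

# Made-up sample headlines, not real media data.
sample = [
    ("2019-01", "Markets steady as exports rise"),
    ("2019-01", "Policy uncertainty clouds investment plans"),
    ("2019-02", "Fiscal deficit widens amid slowdown"),
    ("2019-02", "Uncertain outlook for rural demand"),
]
index = monthly_index(sample)
```

An index built this way is only as good as the choice of words and of media sources, which is one reason such measures need careful interpretation.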
The idea that data is a public good is not new. It can be shown that data has certain economic properties that characterise public (in contrast to private) goods: non-rivalry in use (use of information by one person does not reduce what is available to others, so the same information can be used by several persons simultaneously without affecting its content) and non-excludability in consumption (once the information is available, it is difficult to prevent anyone from using it). Inherent in this formulation is the absence of property rights on data, which allows everyone to use it unrestricted.
It has always been the general assumption that it is the government that gathers data about citizens and their transactions in the economy and society, and consequently, the responsibility for providing the public good rests with the government and not the private players. So, for official data that enables objective assessment of the problems and prospects of the economy as well as society, there is a strong argument for treating it as a public good on par with physical and social infrastructure. But the current line of thinking on data appears to go off on a different tangent.
The old paradigms of data for evidence-based policymaking or data for measuring outcomes of governance or data for development are getting replaced by the new paradigm that ‘data is development’.
As India moves to a digital economy, we are expected to generate enough data to find huge markets, provided a proper data infrastructure is in place to capture every need and deed of the citizens. This infrastructure comes rather cheap now. The talk of data as the new oil hinges on the possibility of monetising data by converting it into ‘subscribed’ services.
Data of the people
The discussion on harnessing administrative data held by the government using easily available technologies rests on the premise that data can be accessed easily from diverse government schemes. While this premise is attractive, India’s history shows how far we are from this.
Centralised storage of government data was envisaged by Dr N. Seshagiri when he set up the National Informatics Centre (NIC) in the late 1970s. Because the ministries and departments of the Government of India showed no inclination to part with their data, the NIC took up e-governance applications for the government instead, a job it has done commendably.
Much later, the setting up of the national data portal by the NIC as part of the National Data Sharing and Access Policy (NDSAP) in 2012 has done little to bring government data onto a common sharable platform. The portal holds a huge amount of processed data that is already available on the owners’ websites, with almost no usable metadata and a complete absence of common data standards apart from the open data formats.
Spatial data with the government has done no better. Efforts during the last 15 years to bring together the spatial data on a single platform through the National Spatial Data Infrastructure (NSDI) are still ongoing. One would have thought this to be an easier task, given that the government organisations involved in collecting and processing spatial data in different areas are some of the best in the world. The bringing together of spatial data is even more important now in the face of serious concerns emerging on environmental issues regarding land, water, human habitat, air, weather, etc.
One area where we can be proud of data sharing is in the field of socio-economic data from surveys. The microdata from most surveys funded by the Centre and, to a limited extent, states, is now available to researchers. However, there are still large-scale data-gathering operations by several ministries not open to researchers. Some of these include the data from cost of cultivation surveys, agricultural censuses, livestock censuses, minor irrigation censuses, etc. Sadly, most of them remain stand-alone exercises with no effort to have any common data standards, making it difficult to link any of these huge databases. The mother of all such data collection operations, the decennial population census, still remains a highly underutilised source of data.
Databases maintained by the government as part of enrolment in beneficiary schemes, management information systems developed for projects, or compliance with statutory provisions are difficult to access except by the parent organisations or through special permissions. This makes it impossible to comment on the accuracy and completeness of such databases. One such database, the MCA-21 data used extensively for the new GDP series, was in the news recently for all the wrong reasons.
The discussions above are, to use the Economic Survey’s terminology, on ‘data of the people’ collected by the government, most often with a specific intent generally known to the respondents. The data ‘by the people’ will mostly originate from their day-to-day transactions or interactions with other agencies and may not require any specific consent from them, as in the case of ‘digital footprints’ left behind after such transactions.
Data by the people
As noted in the Economic Survey, the increased efficiency of gathering data, freely provided by the public in their transactions with government or private agencies, makes such data a very attractive source for data users. The marginal cost of collecting information also falls once the main infrastructure is in place.
The digital nature of these transactions, which use standard identifiers like PAN, Aadhaar, mobile number, email and social media accounts, makes the integration of such data easier than that of other kinds of data. To allay the fears of citizens regarding the overwhelming capacity of the state to harness the digital data from transactions, there is the option of not participating in payment services or in a survey. These options are becoming fewer with the insistence on digital transactions and the impending leap towards a cashless society.
One cannot even escape from surveys when these are conducted under the auspices of the Collection of Statistics Act. To that extent, these data-gathering processes clearly lack the explicit voluntary consent of the respondents. It can always be said that there are explicit agreements people need to approve before they use these digital forms of interaction, although most often the microscopic wording that accompanies such transactions or the phrase ‘I agree’ is all that is available to the citizens.
The data set provided by the people in this fashion does not form a complete population in a statistical sense. It is a partial set. From a statistician’s point of view, conclusions drawn from this data may not have statistical properties of unbiasedness and consistency that can come only from a well-defined population and proper statistical procedures for data selection and analysis. Many claims regarding savings to the government through cross-checking of multiple beneficiary databases for eliminating duplicates or ineligible beneficiaries are therefore not taken very seriously.
Data for the people
The main concern is how all the data generated from different, but interlinked, sources can be made useful to Indians and bring efficiency to their economic transactions.
One can cite the example of excluding an individual who owns a car from the below poverty line (BPL) list by linking the vehicle registration data and the BPL data. We do not, however, have the converse situation of including people in the BPL list on the basis of their absence from databases such as land or property ownership.
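The two directions of linking described above can be sketched in a few lines. This is an illustration under loud assumptions: the identifiers, the records and the function names are all invented, and real registers would need a shared, reliable identifier before any such matching is possible. A match is treated only as a flag for review, not as a decision.

```python
# All identifiers and records below are made up for illustration.
bpl_list = {"ID001", "ID002", "ID003"}
vehicle_owners = {"ID002", "ID007"}             # IDs in vehicle registration data
property_owners = {"ID001", "ID002", "ID007"}   # IDs in land/property records
all_residents = {"ID001", "ID002", "ID003", "ID004", "ID007"}

def flag_for_exclusion(bpl, vehicles):
    """BPL entries that also appear in the vehicle registration data."""
    return bpl & vehicles

def flag_for_inclusion(residents, bpl, *asset_registers):
    """Residents outside the BPL list who appear in none of the asset registers."""
    with_assets = set().union(*asset_registers)
    return residents - bpl - with_assets

# Direction in use today: a BPL entry found in the vehicle data.
review_for_exclusion = flag_for_exclusion(bpl_list, vehicle_owners)

# The converse: residents missing from every asset register and from the BPL list.
review_for_inclusion = flag_for_inclusion(
    all_residents, bpl_list, vehicle_owners, property_owners
)
```

Absence from a register can also mean that the register itself is incomplete, so the second list is at best a starting point for verification on the ground.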
The results of the last situation assessment survey of farmers by NSSO show that the percentage of farmers even aware of the minimum support price (MSP) was about 45 per cent for sugarcane and less than 25 per cent for all other crops except rice and wheat. Visualising a marginal Indian farmer using her mobile app to sell her products requires a leap in imagination. The claims on use of e-NAM and other portals need to be backed by data, objectively sampled.
Data needs to be for the people. A villager should be able to find the nearest hospital when a family member falls sick, and parents should be able to locate the nearest school with the least teacher absenteeism when admitting their child. This presupposes the availability of choices besides the data infrastructure. Currently, a small minority of citizens in metros may use technology-driven data for food delivery or hotel accommodation. Otherwise, these examples look futuristic in the given context.
The question that comes naturally then is why create multiple databases for different government services at all and then talk of linking them. With the technology available, it should be possible to provide the services by extracting data from fewer available databases.
Bank accounts or other financial instruments are linked to Aadhaar, but then why not update their databases when the Aadhaar details are changed? If 120 crore Indians have Aadhaar numbers, why not prepare the voters’ list for them using the addresses in the Aadhaar card? It is always easier to work on citizens but far more difficult to work on government systems for the people.
The same is the case with updating details in vehicle registration, ration card, PAN, pension card, bank account, passport, driving licence and in all other services that are already linked with Aadhaar. Updating is the responsibility of the citizens, although one would hope that these processes would be automatic in an information-based society. Commendable efforts in these directions have been initiated in some states like Rajasthan (through the Bhamashah scheme). Freeing the citizens from feeding data into different systems will contribute substantially to the ease of living.
The use by private players of data obtained from social media interactions, e-commerce and online searches is on the rise. Clearly, data has economic and commercial value. It is therefore not surprising that the Economic Survey concludes that “if the marginal benefit to a farmer of acquiring price information is higher than the marginal cost of that information, he would pay for that information. Consequently, the private sector would cater to his need by gathering and selling him the information he wants”.
Data – a means or an end?
As the UN Fundamental Principles of Official Statistics articulate, “Official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by statistical agencies to honour citizens’ entitlement to public information”. The principles also mandate that the data should be strictly confidential and used exclusively for analytical purposes. The current debate initiated through the Economic Survey goes beyond the concepts of data used as official statistics. It is an all-encompassing idea of information generated in every conceivable way and format.
It must be accepted that the discussion on data has to be directed towards improving the administrative architecture in the governmental system in such a way that the burden on the citizens to provide data to multiple agencies is reduced.
Providing access to analyse this data for a better understanding of the functioning of government schemes and their outcomes will be next. This will imply setting up a protected environment for data access and sharing.
Privacy of citizens’ data has to be an integral part of this system so that data does not become another instrument of coercion. Ensuring data integrity by adopting common standards can improve the economic value of data. Commercial gains from the data are an attractive proposition, but have the potential to make citizens vulnerable, not just to the state but also to the entities that have the resources to access and process the data.
The author was a member and acting Chairperson of the National Statistical Commission till he quit in January 2019. Views are personal.