Governance has emerged as an important component of improving public education in India. Critical to better governance is the manner in which education data is collected and used. Without reliable, timely and authentic data, planning and policy-making are adversely affected.
Despite considerable energy and investment in building a school-based decentralised data system, several methodological and administrative problems remain, resulting in unreliable or inadequate data.
A feast of data sources
The District Information System for Education (DISE) was set up after Sarva Shiksha Abhiyan (SSA) was launched in 2001. Its successor, Unified-DISE (U-DISE), now collects data from 15 lakh schools (government and private) and provides report cards up to the secondary stage for every state, district and school, uploaded to its website annually.
Education data is also collected from households by panchayats and compiled annually in Village Education Registers (VERs). In addition, the Ministry of Human Resource Development (MHRD) commissioned three rounds of household surveys, in 2006, 2009 and 2014, that collected information on children in the age group of 6 to 13 years. Beyond the NSS and the Census of India, several non-government agencies also provide data on education indicators and school participation in some form.
However, in the midst of this ‘feast’ of data sources, we get varied, often contradictory evidence on basic indicators such as the proportion of children out of school, the extent of improvement in retention levels, learning outcomes and quality of education. Even in areas of education finance, such as teacher appointments and salaries, we do not have an authentic database. With multiple sources of data – both governmental and non-government – data neutrality cannot be assumed either.
The anomalies point to the need for methodological as well as administrative reform in the education data regime with greater focus on decentralised management of data.
a) Definitions and methods of estimation
The methodological difficulties begin with the range of definitions and methods of estimation used for important indicators by the different agencies collecting data. For instance, the NSS asks, “how many children are currently attending school?”, while Census enumerators ask questions about the “status of attendance in an educational institution”. The MHRD survey claims to follow both the sampling and the methodology used by the NSS, and yet arrives at vastly different results.
Also, the formats for collecting data are designed centrally and do not take local specificities into account; nor are teachers – often the primary data enumerators – adequately trained to fill in the formats.
The dates and periodicity of data collection also vary across sources.
b) Validation and verification of data
Another aspect of data credibility that has proved to be a weak link is the verification and validation of data. While DISE rules dictate that 10 per cent of the sample be randomly cross-checked, DISE itself is unable to verify that this process is carried out regularly or adequately.
In addition, education departments ignore the evidence presented by other government and non-government sources, which could be used to validate and thus improve the credibility of their own data.
i) The purpose of generating data
Different agencies plan their data collection for different (and specific) purposes, not for planning or monitoring education, and hence not for policy. For example, the education rounds of the NSS are part of the survey on social consumption, which in turn is meant to assess the benefits derived by various sections of society from public expenditure incurred by the government. The population census, on the other hand, is the primary source of basic national population data required for administrative purposes. Only the Annual Status of Education Report (ASER) is dedicated solely to education, specifically learning levels. However, it does not tell us how levels of learning vary with student enrolment or attendance, or any other household factors.
School surveys focus unsurprisingly on collecting information related to a) broad indicators of infrastructure and teacher availability; and b) student enrolment and distribution of incentives. Both these sets of data showcase administrative efforts rather than education progress.
A second conundrum with the purpose and use of education data is that planning and policy-making are extremely centralised processes. Data, however collected, therefore plays a limited role in the planning and policy processes. In fact, local data management systems are virtually non-existent, putting paid to the idea of decentralised planning.
ii) Limited state capacity
A second and perhaps overarching problem facing the data regime in education is that of limited capacity to design, collect, analyse and use data throughout government structures, from the central level to the local.
Data collected from the ground up undergoes a process of simple aggregation that loses specifics, such that by the time it reaches the central level, it barely reflects ground realities and can hardly serve the needs of the people.
Further, a conflict of interest is implicit in the collection process, especially with DISE data, which depends entirely on formats filled in by teachers. It is well established that teachers may be incentivised to misrepresent information and inflate figures such as student enrolment.
Unfortunately, the personnel involved in collecting and collating information are themselves unable to gauge its importance.
The new draft National Education Policy 2019, recognising the paucity and limitations of the education data regime, has called for “a major effort” in data collection, analysis and organisation. In particular, it proposes establishing a new Central Educational Statistics Division (CESD) as an independent and autonomous entity at the National Institute of Educational Planning and Administration (NIEPA). It has also suggested maintaining a National Repository of Educational Data (NRED) within NIEPA that will include specific indicators common to under-represented groups, in an attempt to track their participation and performance.
Key issues that the CESD and related authorities will nevertheless need to deal with are:
i) Improving definitions, standardising them across sources and using improved methods of collection and estimation of basic indicators.
ii) Developing the capacities of the data regime and giving a greater role to data users, especially education officials at different levels of government, from the national to the local.
iii) Developing a local data management system, at the level of the school complex or panchayat, which would go a long way in tracking change and progress at the school level. It would also enable the community to verify data publicly and improve the system of making school plans.
iv) Validating against different sources, which can ensure that biases are factored in. Further, a single data set cannot collect information on all relevant issues, because data collection is an expensive and time-consuming process; using different data sets can be economical as well as provide a more holistic picture.
v) Making better use of data through pro-active collaboration between government and non-government agencies. For instance, if household and school data were available on the same portal, it would maximise their use.
Kiran Bhatty is a senior fellow at CPR. She researches governance issues in elementary education, working to build systems of transparency, accountability, and community monitoring.
This is the fifteenth in a series of articles titled “Policy Challenges 2019-2024” under ThePrint-Centre for Policy Research (CPR) collaboration. A longer version of this piece is available on the CPR website at www.cprindia.org. The full policy document on a range of issues addressed in this series is available on CPR’s website.