Aadhaar Inclusion and Causal Inference

According to a report published today in the Economic Times, 92% of India's population (based on 2018 population data) has an Aadhaar card, and nearly all adults, barring 3.5 million, are covered. This is a phenomenal achievement in bringing underprivileged people into the fold of social and financial inclusion. The Aadhaar card's unique identification number will soon work much like a social security number for the population. Making the Aadhaar card essential for all monetary transactions ultimately helps widen the tax base. Over time, causal inference using multivariate analysis will help measure the social impact of Aadhaar inclusion.

Multivariate analysis examines the relationships among several variables at once. On its own it measures correlation; it supports causal claims only when paired with a suitable study design. The Randomized Controlled Trial (RCT) is the gold standard for measuring causality: participants are randomly assigned to a control group and a treatment group, and the outcomes of the two groups are compared. For Aadhaar, however, this is not possible, because one cannot randomly select two groups with very similar demographic profiles and give Aadhaar cards to one group while withholding them from the other. A time series of population data covering a decade before and after Aadhaar's implementation might still lead to useful insights, but such a before-and-after analysis is observational and is not a form of RCT.
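To see why a before-and-after comparison is not an RCT, here is a minimal sketch on simulated data. Every number in it is a hypothetical assumption, not an Aadhaar statistic: an outcome (say, some financial-inclusion index) is assumed to rise by a fixed amount every year regardless of Aadhaar, and the policy is assumed to add a fixed effect from a hypothetical rollout year onward. The naive before/after difference then mixes the background trend with the true effect.

```python
import random
import statistics

# Hypothetical simulation: none of these numbers come from real Aadhaar data.
TREND_PER_YEAR = 1.0   # assumed background growth in the outcome, unrelated to Aadhaar
TRUE_EFFECT = 2.0      # assumed true effect of Aadhaar on the outcome
ROLLOUT_YEAR = 2010    # assumed year the policy starts (illustrative only)


def simulate_outcome(year: int, rng: random.Random) -> float:
    """Outcome = baseline + secular trend + policy effect (after rollout) + noise."""
    baseline = 50.0
    trend = TREND_PER_YEAR * (year - 2005)
    effect = TRUE_EFFECT if year >= ROLLOUT_YEAR else 0.0
    noise = rng.gauss(0.0, 0.5)
    return baseline + trend + effect + noise


def naive_before_after(seed: int = 0) -> float:
    """Mean outcome in the five years after rollout minus the five years before."""
    rng = random.Random(seed)
    before = [simulate_outcome(y, rng) for y in range(2005, ROLLOUT_YEAR)]
    after = [simulate_outcome(y, rng) for y in range(ROLLOUT_YEAR, 2015)]
    return statistics.mean(after) - statistics.mean(before)


if __name__ == "__main__":
    print(f"naive before/after estimate: {naive_before_after():.2f}")
    print(f"assumed true effect:         {TRUE_EFFECT:.2f}")
```

Under these assumptions the naive difference comes out at roughly 7: about 5 points of background trend plus the assumed effect of 2. Without a comparable untreated group, the trend cannot be separated from the policy effect.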

A Randomized Controlled Trial usually involves two groups selected at random. Let's say one needs to measure the effect of a vaccine on patients. The RCT methodology would randomly split patients into two groups: one receives the intervention (the vaccine) and the other does not. The difference in the health of the two groups over a period of time would support a causal inference about the treatment, i.e. whether the vaccine led to better health. There are many statistical tools and methods for measuring causality, including difference-in-differences; however, one has to account for omitted variable bias and similar pitfalls to be confident in the inference drawn from the data. Omitted variable bias means the observed effect could be caused by other factors that were not included in the analysis.
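The vaccine example and the two pitfalls above can be sketched as a small simulation. Everything here is hypothetical: patients get a random baseline health score, `random.shuffle` assigns half of them to treatment, and the treatment is assumed to add a fixed amount to health. Because assignment is random, the simple difference in group means estimates the effect. The `run_observational` function shows omitted variable bias: an assumed, unmeasured `income` variable drives both who gets treated and how healthy they are. The `diff_in_diff` helper computes the difference-in-differences estimate from four group means: (treated after - treated before) - (control after - control before). All function names and parameters are illustrative, not from any library.

```python
import random
import statistics


def run_rct(n_patients: int = 2000, true_effect: float = 5.0, seed: int = 42) -> float:
    """Hypothetical RCT: randomly split patients, treat one half, compare mean health."""
    rng = random.Random(seed)
    # Baseline health varies across patients (e.g. age, prior conditions).
    baseline = [rng.gauss(60.0, 10.0) for _ in range(n_patients)]
    ids = list(range(n_patients))
    rng.shuffle(ids)  # random assignment: the step that makes the groups comparable
    treated = set(ids[: n_patients // 2])
    treated_out, control_out = [], []
    for i in range(n_patients):
        noise = rng.gauss(0.0, 5.0)
        if i in treated:
            treated_out.append(baseline[i] + true_effect + noise)
        else:
            control_out.append(baseline[i] + noise)
    # With random assignment, the difference in means estimates the average effect.
    return statistics.mean(treated_out) - statistics.mean(control_out)


def run_observational(n_patients: int = 2000, true_effect: float = 5.0, seed: int = 42) -> float:
    """Hypothetical non-random study: an omitted variable (income) drives both
    treatment uptake and health, so the difference in means is biased upward."""
    rng = random.Random(seed)
    treated_out, control_out = [], []
    for _ in range(n_patients):
        income = rng.gauss(0.0, 1.0)  # omitted variable: never used by the estimator
        gets_treatment = income + rng.gauss(0.0, 1.0) > 0  # richer patients more likely treated
        health = 60.0 + 8.0 * income + rng.gauss(0.0, 5.0)
        if gets_treatment:
            treated_out.append(health + true_effect)
        else:
            control_out.append(health)
    return statistics.mean(treated_out) - statistics.mean(control_out)


def diff_in_diff(treat_before: float, treat_after: float,
                 control_before: float, control_after: float) -> float:
    """Difference-in-differences: change in the treated group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)


if __name__ == "__main__":
    print(f"RCT estimate:           {run_rct():.2f}")
    print(f"Observational estimate: {run_observational():.2f}")
    print(f"DiD example:            {diff_in_diff(50, 60, 48, 53)}")
```

Under these assumptions, `run_rct()` lands close to the assumed effect of 5, while `run_observational()` overshoots it substantially because the omitted `income` variable inflates the treated group's health. `diff_in_diff(50, 60, 48, 53)` returns 5: the treated group improved by 10 and the control group by 5, so 5 of the improvement is attributed to the treatment, assuming both groups would otherwise have followed parallel trends.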

This is the beginning of a series of articles on statistics…