Analysis of Aadhaar enrolment and update patterns across India to identify meaningful trends and insights.
Unlocking Societal Trends in Aadhaar Enrolment and Updates: identify meaningful patterns, trends, anomalies, or predictive indicators in Aadhaar data and translate them into clear insights or solution frameworks that support informed decision-making and system improvements.
This analysis uses three types of Aadhaar data, plus a geographic boundary file:
- Demographic updates: count of demographic information updates by age group.
- Biometric updates: count of biometric (fingerprint/iris) updates by age group.
- Enrolments: count of new Aadhaar enrolments by age group.
- Geographic data: GeoJSON file containing state boundary polygons for map visualizations.
| Column | Type | Description |
|---|---|---|
| date | datetime | Date of activity (DD-MM-YYYY format) |
| state | string | State name (lowercase) |
| district | string | District name |
| pincode | integer | 6-digit postal code |
| demo_age_5_17 | integer | Demographic updates for ages 5-17 |
| demo_age_17_ | integer | Demographic updates for ages 17+ |
| bio_age_5_17 | integer | Biometric updates for ages 5-17 |
| bio_age_17_ | integer | Biometric updates for ages 17+ |
| age_0_5 | integer | New enrolments for ages 0-5 |
| age_5_17 | integer | New enrolments for ages 5-17 |
| age_18_greater | integer | New enrolments for ages 18+ |
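As a rough illustration of the column table above, the sketch below converts one raw CSV row (all strings) into typed values. It uses only the standard library. The `parse_row` helper and the sample row are hypothetical and are not part of this repository's `utils` package, and only the two demographic count columns are handled to keep the example short.

```python
from datetime import datetime

# Hypothetical helper (not part of utils): type one raw row of the
# demographic dataset according to the column table.
COUNT_COLUMNS = ("demo_age_5_17", "demo_age_17_")


def parse_row(row):
    parsed = {
        # Dates are stored as DD-MM-YYYY
        "date": datetime.strptime(row["date"], "%d-%m-%Y").date(),
        # State names are expected in lowercase
        "state": row["state"].strip().lower(),
        "district": row["district"].strip(),
        "pincode": int(row["pincode"]),
    }
    # A 6-digit postal code lies between 100000 and 999999
    if not 100000 <= parsed["pincode"] <= 999999:
        raise ValueError(f"pincode is not 6 digits: {row['pincode']}")
    for col in COUNT_COLUMNS:
        parsed[col] = int(row[col])
    return parsed


# Made-up sample values, for illustration only
sample = {
    "date": "14-01-2026",
    "state": " Kerala ",
    "district": "Ernakulam",
    "pincode": "682001",
    "demo_age_5_17": "12",
    "demo_age_17_": "40",
}
record = parse_row(sample)
print(record["date"].isoformat(), record["state"], record["pincode"])
```

The same pattern would extend to the biometric and enrolment columns by adding their names to `COUNT_COLUMNS`.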
- Python 3.8 or higher
- pip package manager
- Clone or download this repository.
- Install dependencies:

      pip install -r requirements.txt

- Set up the data path (optional; defaults to /content/drive/MyDrive/uidai for Google Colab):

      # Linux/Mac
      export AADHAAR_DATA_PATH="/path/to/your/data"

      # Windows (PowerShell)
      $env:AADHAAR_DATA_PATH="C:\path\to\your\data"

      # Windows (CMD)
      set AADHAAR_DATA_PATH=C:\path\to\your\data

- Organize your data:

      your_data_path/
      ├── combined_dataset2/
      │   ├── demo_combined.csv
      │   ├── bio_combined.csv
      │   └── enroll_combined.csv
      └── INDIA_STATES.geojson

- Run the analysis:

      jupyter notebook
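The data path and folder layout described in the steps above could be resolved roughly as follows. This is a sketch under assumptions: the real logic lives in `config.py` and may differ, `resolve_data_paths` is a hypothetical name, and the default path is the one mentioned in the setup step.

```python
import os
from pathlib import Path

# Assumed default, taken from the setup step in this README
DEFAULT_DATA_PATH = "/content/drive/MyDrive/uidai"


def resolve_data_paths(env=None):
    """Return the expected file locations under the data root."""
    env = os.environ if env is None else env
    root = Path(env.get("AADHAAR_DATA_PATH", DEFAULT_DATA_PATH))
    combined = root / "combined_dataset2"
    return {
        "demo": combined / "demo_combined.csv",
        "bio": combined / "bio_combined.csv",
        "enroll": combined / "enroll_combined.csv",
        "geojson": root / "INDIA_STATES.geojson",
    }


# Report which expected files are absent before starting the notebook
paths = resolve_data_paths()
missing = [name for name, p in paths.items() if not p.exists()]
print("Missing files:", missing or "none")
```

Running a check like this before opening the notebook makes a misconfigured `AADHAAR_DATA_PATH` visible early, rather than as a file-not-found error partway through the analysis.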
aadhaar-analysis/
├── config.py # Configuration and paths
├── constants.py # Constants and definitions
├── requirements.txt # Python dependencies
├── README.md # This file
├── utils/ # Utility modules
│ ├── __init__.py
│ ├── data_loader.py # Data loading functions
│ ├── preprocessing.py # Data cleaning & merging
│ ├── features.py # Feature engineering
│ ├── visualizations.py # Plotting functions
│ └── ml_models.py # Machine learning models
└── analysis.ipynb # Analysis notebook
# Import utilities
from utils import (
load_aadhaar_data,
format_dates,
merge_datasets,
clean_merged_data,
add_all_features,
plot_state_map,
plot_bar
)
# Load and prepare data
demo, bio, enroll = load_aadhaar_data()
demo = format_dates(demo)
bio = format_dates(bio)
enroll = format_dates(enroll)
# Merge and clean
df = merge_datasets(demo, bio, enroll)
df = clean_merged_data(df)
# Add features
df = add_all_features(df)
# Visualize
state_totals = df.groupby('state')['total_activity'].sum().reset_index()
plot_state_map(state_totals, 'total_activity', 'Total Activity by State')

from utils import train_all_models
# Train all ML models
results = train_all_models(df)
# Access results
print(f"Regression R²: {results['regression']['metrics']['r2_test']:.4f}")
print(f"Classification Accuracy: {results['classification']['metrics']['accuracy_test']:.4f}")
print(f"Clustering Silhouette: {results['clustering']['metrics']['silhouette']:.4f}")

The analysis includes the following visualizations:
- Choropleth Maps - State-wise activity distribution
- Bar Charts - Top states by activity type
- Time Series - Daily activity trends
- Heatmaps - Temporal patterns
- Scatter Plots - District-level analysis
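The bar chart and time series views listed above both start from a simple aggregation. The sketch below shows one plausible way to prepare those inputs with pandas. The rows are made up, and the real plotting functions in `utils/visualizations.py` may aggregate differently.

```python
import pandas as pd

# Made-up rows; the real frame comes from the merge and feature steps
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["01-03-2025", "01-03-2025", "02-03-2025", "02-03-2025"],
        format="%d-%m-%Y",
    ),
    "state": ["kerala", "bihar", "kerala", "goa"],
    "total_activity": [120, 300, 80, 15],
})

# Bar chart input: states ranked by summed activity
top_states = df.groupby("state")["total_activity"].sum().nlargest(2)

# Time series input: one total per calendar day, in date order
daily = df.groupby("date")["total_activity"].sum().sort_index()

print(top_states)
print(daily)
```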
The analysis includes three ML models:
- Regression
  - Purpose: Predict total activity based on temporal features
  - Features: day, month, is_weekend
  - Target: total_activity
- Classification
  - Purpose: Classify dominant activity type
  - Features: demo_ratio, bio_ratio, enrol_ratio, day_of_week, month
  - Target: activity_type
- Clustering
  - Purpose: Group similar activity patterns
  - Features: Standardized activity ratios and temporal features
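The feature names used by the models can be derived from a date and three activity counts. The sketch below is one possible derivation and is not the code in `utils/features.py`. It assumes that `is_weekend` means Saturday or Sunday, that each ratio is a share of total activity, and that `activity_type` is the category with the largest share. The function names and label strings are hypothetical.

```python
from datetime import date


def temporal_features(d):
    # weekday(): Monday=0 ... Sunday=6
    dow = d.weekday()
    return {
        "day": d.day,
        "month": d.month,
        "day_of_week": dow,
        "is_weekend": int(dow >= 5),  # assumption: Saturday/Sunday
    }


def activity_ratios(demo, bio, enrol):
    total = demo + bio + enrol
    if total == 0:
        # Avoid division by zero on rows with no activity
        return {"demo_ratio": 0.0, "bio_ratio": 0.0,
                "enrol_ratio": 0.0, "activity_type": "none"}
    shares = {"demo": demo / total, "bio": bio / total, "enrol": enrol / total}
    features = {f"{name}_ratio": share for name, share in shares.items()}
    # Assumption: the dominant type is the one with the largest share
    features["activity_type"] = max(shares, key=shares.get)
    return features


print(temporal_features(date(2026, 1, 17)))
print(activity_ratios(30, 50, 20))
```

Note that `activity_type` here is computed directly from the three ratios, so a classifier given those ratios as inputs would have an easy task; whether the repository defines the target this way is not stated.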
To test the utility functions:
# Test imports
python -c "from utils import *; print('All imports successful')"

Last Updated: January 14, 2026