Kunal-Somani/README.md

Kunal

B.E. Robotics & Artificial Intelligence, Thapar Institute of Engineering & Technology
B.S. Data Science & Applications, Indian Institute of Technology Madras

LinkedIn Email


Pre-final year undergrad building at the intersection of robotics, computer vision, and autonomous systems. My work runs on real hardware and real infrastructure: multimodal ML pipelines, edge AI systems, agentic backends, and production-grade open-source tooling.

Open Source: Active contributor at JdeRobot/RoboticsAcademy with 16 merged PRs. Resolved an FP16 precision crash in the Object Detection pipeline, fixed deployment script bugs across run_academy.sh and develop_academy.sh, refactored the Hardware Abstraction Layer, and shipped 52 unit tests across 5 test classes.

Research at Thapar ELC (Summer 2025): Multimodal CNN and DNN for Parkinson's early detection. Fused MPU9250 tremor signals with voice recordings via late fusion, raising accuracy from 88% (voice CNN alone) to 91% (fused model).
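
As a rough illustration of late fusion weighting, the sketch below blends the class probabilities of two independently trained models with a single convex weight. It is a minimal sketch, not the project's code: the function names, the 0.6 weight, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def late_fusion(p_voice: Sequence[float], p_tremor: Sequence[float],
                w_voice: float = 0.5) -> List[float]:
    """Blend two per-class probability vectors with a convex weight."""
    if not 0.0 <= w_voice <= 1.0:
        raise ValueError("w_voice must lie in [0, 1]")
    if len(p_voice) != len(p_tremor):
        raise ValueError("both models must score the same classes")
    w_tremor = 1.0 - w_voice
    return [w_voice * v + w_tremor * t for v, t in zip(p_voice, p_tremor)]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the highest fused probability."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical scores for the classes [healthy, parkinsons].
voice_probs = [0.55, 0.45]   # voice model leans slightly "healthy"
tremor_probs = [0.20, 0.80]  # tremor model leans strongly "parkinsons"

fused_probs = late_fusion(voice_probs, tremor_probs, w_voice=0.6)
predicted_class = predict(fused_probs)
print(fused_probs, predicted_class)
```

In practice the weight would typically be tuned on a validation set; the value used here is only a placeholder.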

Computer Vision: Toll fraud detection system built on HOG features and LinearSVC. 24K+ images, 97% accuracy on multi-axle vehicle classification.
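
The TruthTag entry under Featured Projects cites 3780-dimensional HOG vectors. That figure matches the common HOG layout of a 64x128 window, 8x8-pixel cells, 2x2-cell blocks with a one-cell stride, and 9 orientation bins. Whether the project uses exactly these parameters is an assumption. The arithmetic can be sketched as:

```python
def hog_descriptor_length(win_w: int = 64, win_h: int = 128, cell: int = 8,
                          block_cells: int = 2, stride_cells: int = 1,
                          bins: int = 9) -> int:
    """Length of a HOG descriptor for one detection window.

    The defaults are the widely used 64x128 configuration; they are
    assumed here, not taken from the project.
    """
    cells_x = win_w // cell
    cells_y = win_h // cell
    # Blocks slide across the cell grid with the given stride.
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    # Each block contributes one histogram per cell it contains.
    return blocks_x * blocks_y * block_cells * block_cells * bins


hog_length = hog_descriptor_length()
print(hog_length)  # 7 * 15 blocks * 4 cells * 9 bins = 3780
```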

Robotics Research (ongoing): Audio-Visual-Thermal fusion architecture for autonomous SAR navigation in visually degraded environments, under Dr. Ankit Soni at Thapar, using Isaac Sim and ROS 2.


Featured Projects

| Project | Description | Stack |
| --- | --- | --- |
| Archon | Production-deployable instruction-to-deployment backend. Takes a natural language prompt, retrieves context via hybrid RAG (Cohere dense + BM25 sparse, RRF fusion), generates schema-validated code using Anthropic Tool Use, and pushes a live site to GitHub Pages via a GitHub App deployer with short-lived installation tokens. FastAPI and Celery orchestrate async task execution; Redis Pub/Sub streams live logs over WebSocket to a React and TypeScript dashboard; Prometheus, Grafana, and OpenTelemetry cover full observability. Full integration and unit test suite with pre-push hooks. | FastAPI, Celery, Redis, Cohere, Anthropic API, React, TypeScript, Vite, PostgreSQL, SQLAlchemy, Alembic, Prometheus, Grafana, OpenTelemetry, Docker |
| Parkinson's Early Detection | Multimodal early detection fusing MPU9250 IMU tremor signals with voice recordings. CNN on extracted voice features (88% accuracy), DNN on tremor data, combined via late fusion weighting to reach 91%. Includes a custom ESP32 hardware data collection pipeline from sensor to model inference. | TensorFlow, Keras, Librosa, Parselmouth, scikit-learn, SoundDevice |
| Axon Core | Production-deployable fully local tri-modal AI assistant. A BART-MNLI zero-shot semantic router dispatches queries across three processing paths: personal knowledge retrieval via Qdrant and local Gemma, OS-level tool execution with user confirmation, and general conversation. Hybrid RAG with MiniLM dense retrieval and BM25 sparse, reranked by a cross-encoder. GBNF grammar-constrained sampling for tool calling. Next.js frontend with FastAPI orchestrator, fully containerized. | FastAPI, LangChain, Qdrant, Ollama, Next.js, Docker, SQLAlchemy |
| Helix | Production-deployable recursive autonomous web agent built on the OODA loop. Playwright handles JS-heavy DOMs, Claude Tool Use synthesizes a Python solution just-in-time, RestrictedPython and SIGALRM sandbox execution, and HTTP submission loops until a terminal state is reached. Durable async jobs via ARQ on Redis with retry. Live run logs via SSE. Prometheus, Loki, and Grafana cover latency and throughput across worker containers. | FastAPI, Playwright, Claude API, ARQ, Redis, Prometheus, Loki, Grafana, Docker |
| TruthTag: Toll-Audit | Classical CV pipeline for cross-verifying digital RFID FASTag claims against physical vehicle geometry at toll plazas. 3780-dimensional HOG feature vectors, LinearSVC trained on 24K+ images, 97% accuracy on multi-axle vehicle classification. Includes a cross-modal centroid tracker, MOG2 virtual tripwire, misclassification error analysis, and a simulated audit pipeline with a Streamlit operator dashboard. | OpenCV, scikit-learn, HOG, LinearSVC, NumPy, Streamlit, Seaborn, Matplotlib |
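
The RRF fusion step named in the Archon entry can be illustrated with a short sketch of reciprocal rank fusion, which merges a dense and a sparse ranking by summing `1 / (k + rank)` per document. This is a generic sketch rather than Archon's code: the function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list add nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever outputs, best match first.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused_ranking = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused_ranking)
```

Because RRF only uses rank positions, it sidesteps the problem that dense similarity scores and BM25 scores live on different scales.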

Active Research

| Project | Description | Status |
| --- | --- | --- |
| Canary Rover | Autonomous mine inspection rover. PPO-trained locomotion in PyBullet (200K timesteps), ROS 2 sensor nodes for IMU, LiDAR, and BLDC encoders, full 3D visual simulation in NVIDIA Isaac Sim 5.1 with live sensor feeds, SLAM via slam_toolbox and RTAB-Map. Crack detection, gas sensing, spark-proof chassis. Team of 5. | Ongoing (capstone) |
| MRI Reconstruction | Dual-branch physics-guided deep learning framework for accelerated MRI reconstruction. A learned gating network routes k-space information adaptively, removing the need for anatomy labels at inference. Achieves +1.78% SSIM over single-branch baselines at 112 ms per inference and 302 GFLOPs on an RTX 4060. | Paper authored |
| Audio-Visual-Thermal SAR | Multimodal fusion architecture for autonomous search and rescue in visually degraded environments. Three sensing modalities fused for robust SLAM and object detection, under Dr. Ankit Soni at Thapar. Review paper authored; follow-up work in progress. | Review paper authored |

Tech Stack

Languages: Python, C++, C, SQL, Bash, JavaScript, TypeScript

Robotics & Edge: ROS 2, NVIDIA Jetson, Isaac Sim, PyBullet, Stable Baselines3, Gymnasium, Gazebo, Eclipse Zenoh, MATLAB, CUDA

AI & Computer Vision: PyTorch, PyTorch Lightning, TensorFlow, Keras, OpenCV, YOLOv8, LangChain, Hugging Face, Scikit-learn, NumPy, Pandas, SciPy, Librosa, Parselmouth, einops, Matplotlib, Seaborn, Plotly

Backend & Databases: FastAPI, Flask, Next.js, React, PostgreSQL, SQLAlchemy, Alembic, Qdrant, Playwright, Prometheus, Streamlit, WebSocket, Pydantic

DevOps & Infra: Docker, Kubernetes, GitHub Actions, Linux, AWS, Vercel, Anaconda


Currently working on: physics-guided deep learning for medical imaging, RL-based autonomous navigation, multimodal sensor fusion for SAR, and beyond-transformer sequence architectures.

Pinned

  1. trajlens (Public): The quality and synthesis layer for the open robot-learning data ecosystem. Python, 65 stars, 6 forks.

  2. MRI_Reconstruction (Public): Python, 3 stars.

  3. archon (Public): Python, 2 stars.

  4. helix-agent (Public): TypeScript, 2 stars.

  5. eshaansingla/ParkinsonsEarlyPrediction (Public): Python, 1 star.