Embedded Systems Engineer → Systems Architect | Avionics · Autonomous Systems · Safety-Critical Firmware
M.Sc. Scientific Computing — Elite Network of Bavaria, University of Bayreuth
Munich / Bavaria, Germany
I build systems where failure is not recoverable and timing is not negotiable.
Three years across radar imaging, autonomous UAV stacks, and HPC numerical solvers have converged on a single discipline: designing deterministic software architectures across the full hardware-software boundary, from register-level silicon up to distributed RTOS microkernel design.
My work lives at the intersection of control theory, real-time OS internals, and embedded compute constraints. I don't integrate systems; I architect them from the physics upward.
Apache NuttX · Bare-Metal C · STM32F411 · HIL Validation
A hard real-time flight stack proving that deterministic avionics behavior is achievable at the microcontroller level without a full flight OS.
- 707 µs hard deadline met consistently via asynchronous I/O decoupling and deliberate scheduler partitioning
- 7.5% SRAM footprint (9.9 KB) through zero-dynamic-allocation architecture
- Custom NuttX character drivers written from scratch to expose raw IMU telemetry to a 7×7 Error-State Kalman Filter with 4th-order Runge-Kutta integration
- Autonomous ToF-driven ground-proximity failsafe with deterministic trigger latency
- Full HIL validation pipeline for sensor-actuator timing verification
→ Codebase & Technical Architecture
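The 4th-order Runge-Kutta propagation named above can be illustrated with a minimal sketch. This is not the project's code: `State`, `axpy`, `rk4_step`, and the `oscillator` test dynamics are hypothetical names chosen for illustration, and the only assumption carried over from the text is that state lives in fixed-size arrays with no dynamic allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-size state vector: dimension known at compile time, storage on the stack.
template <std::size_t N>
using State = std::array<float, N>;

// Returns a + s * b, element-wise.
template <std::size_t N>
State<N> axpy(const State<N>& a, float s, const State<N>& b) {
    State<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i] + s * b[i];
    return out;
}

// One classical RK4 step for x' = f(x) with step size dt.
// f is any callable taking a State<N> and returning its time derivative.
template <std::size_t N, typename F>
State<N> rk4_step(const State<N>& x, float dt, F&& f) {
    assert(dt > 0.0f);  // debug-only precondition
    const State<N> k1 = f(x);
    const State<N> k2 = f(axpy(x, 0.5f * dt, k1));
    const State<N> k3 = f(axpy(x, 0.5f * dt, k2));
    const State<N> k4 = f(axpy(x, dt, k3));
    State<N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = x[i] + (dt / 6.0f) * (k1[i] + 2.0f * k2[i] + 2.0f * k3[i] + k4[i]);
    }
    return out;
}

// Hypothetical test dynamics (unit harmonic oscillator): x0' = x1, x1' = -x0.
inline State<2> oscillator(const State<2>& x) {
    return {x[1], -x[0]};
}
```

Calling `rk4_step(x, dt, oscillator)` repeatedly advances the state. Because every temporary is a `std::array`, the worst-case stack usage is fixed at compile time, which is the property a zero-dynamic-allocation design relies on.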
Pure C++17 · ARM Cortex-M4/M7 · Zero-Allocation · Active Development
A constrained Model Predictive Control engine built from first principles — no libraries, no heap, no compromises on WCET.
- Warm-started Fast Gradient Method (FGM) with Nesterov acceleration for a 4-state, 1-input reaction-wheel inverted pendulum
- Compile-time matrix dimension resolution via C++17 templates (`std::array`); zero runtime overhead, zero heap fragmentation
- In-place Cholesky decomposition solver operating entirely on the stack, bypassing matrix inversion in real-time
- Memory layout maps solver kernel to ITCM, state matrices to DTCM for zero-wait-state processing
- AHB bus matrix isolation of display pipeline to eliminate CPU cache contention
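As a rough illustration of the compile-time-dimension and in-place Cholesky points above, here is a minimal sketch. It is not taken from the engine itself: `Matrix`, `Vector`, `cholesky_in_place`, and `cholesky_solve` are hypothetical names, and single-precision `float` is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Dimensions are template parameters, so all storage is sized at compile time.
template <std::size_t N>
using Matrix = std::array<std::array<float, N>, N>;
template <std::size_t N>
using Vector = std::array<float, N>;

// Overwrites the lower triangle of A with L such that A = L * L^T.
// Returns false if A is not (numerically) positive definite.
template <std::size_t N>
bool cholesky_in_place(Matrix<N>& A) {
    for (std::size_t j = 0; j < N; ++j) {
        float d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= A[j][k] * A[j][k];
        if (d <= 0.0f) return false;
        A[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < N; ++i) {
            float s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= A[i][k] * A[j][k];
            A[i][j] = s / A[j][j];
        }
    }
    return true;
}

// Solves (L * L^T) x = b using the factor from cholesky_in_place.
// b is overwritten with x; no inverse matrix is ever formed.
template <std::size_t N>
void cholesky_solve(const Matrix<N>& L, Vector<N>& b) {
    // Forward substitution: L y = b.
    for (std::size_t i = 0; i < N; ++i) {
        assert(L[i][i] > 0.0f);  // debug-only: factorisation must have succeeded
        float s = b[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i][k] * b[k];
        b[i] = s / L[i][i];
    }
    // Backward substitution: L^T x = y.
    for (std::size_t i = N; i-- > 0;) {
        float s = b[i];
        for (std::size_t k = i + 1; k < N; ++k) s -= L[k][i] * b[k];
        b[i] = s / L[i][i];
    }
}
```

Both routines have loop bounds fixed by `N`, so the iteration count is known at compile time, which is what makes a WCET bound tractable for this kind of solver.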
QNX RTOS 8.0 · Raspberry Pi 4 · STM32F411 · POSIX VFS
A distributed AMP architecture bridging a microkernel RTOS host with a bare-metal edge controller; modeled after production automotive gateway patterns.
- STM32 edge node mounted directly into the QNX microkernel filesystem at `/dev/asg_sensor` via a formal POSIX VFS Resource Manager
- ASIL-aware Independent Watchdog (IWDG) system with graceful degradation state machine
- POSIX UDP telemetry link for real-time network-layer diagnostics
→ Architecture & LaTeX Technical Report
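One way a graceful-degradation state machine can gate a watchdog refresh is sketched below. Everything here is an assumption made for illustration: the three modes, the thresholds `kDegradeAfter` and `kStopAfter`, and the `supervise` function are hypothetical and do not describe the actual project logic or any STM32 or QNX API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operating modes, ordered from healthy to latched-safe.
enum class Mode : std::uint8_t { Nominal, Degraded, SafeStop };

struct Supervisor {
    Mode mode = Mode::Nominal;
    std::uint8_t missed = 0;  // consecutive missed heartbeats
};

// Assumed thresholds, in supervision periods.
constexpr std::uint8_t kDegradeAfter = 2;
constexpr std::uint8_t kStopAfter = 5;
static_assert(kDegradeAfter < kStopAfter, "degrade before stopping");

// Call once per supervision period.
// Returns true if the hardware watchdog should be refreshed this period.
inline bool supervise(Supervisor& s, bool heartbeat_ok) {
    if (heartbeat_ok) {
        s.missed = 0;
        if (s.mode == Mode::Degraded) s.mode = Mode::Nominal;  // recover
    } else if (s.missed < kStopAfter) {
        ++s.missed;
    }
    assert(s.missed <= kStopAfter);  // debug-only invariant

    if (s.mode != Mode::SafeStop) {
        if (s.missed >= kStopAfter) {
            s.mode = Mode::SafeStop;  // latched: only a reset leaves this mode
        } else if (s.missed >= kDegradeAfter) {
            s.mode = Mode::Degraded;
        }
    }
    // Withholding the refresh in SafeStop lets the watchdog reset the MCU.
    return s.mode != Mode::SafeStop;
}
```

In this sketch the caller would refresh the watchdog only when `supervise` returns true, so a persistent fault escalates from reduced functionality to a hardware-enforced reset instead of failing silently.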
Bare-Metal C · STM32F411CEU6 · DMA · No OS
A complete port of the 1992 Wolfenstein 3D raycasting engine onto bare-metal ARM: no OS, no framebuffer, 128 KB SRAM.
- "Slice & Stream" rendering pipeline: CPU computes column geometry in fixed-point trigonometry, DMA asynchronously transfers 16bpp RGB565 columns to ST7789 display
- Eliminates global framebuffer entirely — physics within silicon constraints
- Raw ADC analog joystick and timer-driven PWM audio through a piezo speaker
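The fixed-point column geometry mentioned above can be sketched as follows. The Q16.16 format, the helper names (`fix16`, `fix_mul`, `fix_div`, `column_height`), and the clamp-to-screen behaviour are assumptions for illustration; the port's actual number format and projection constants are not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Q16.16 fixed-point format: 16 integer bits, 16 fractional bits.
using fix16 = std::int32_t;
constexpr fix16 kOne = 1 << 16;

constexpr fix16 from_int(int v) { return static_cast<fix16>(v) * kOne; }

// Multiply via a 64-bit intermediate so the product does not overflow.
// Right-shifting a negative value is arithmetic on mainstream compilers
// (and guaranteed from C++20 onward).
constexpr fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<std::int64_t>(a) * b) >> 16);
}

// Divide by pre-scaling the numerator in 64 bits. The caller ensures b != 0.
constexpr fix16 fix_div(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<std::int64_t>(a) * kOne) / b);
}

// Projected wall-slice height in pixels for a perpendicular ray distance,
// clamped to the screen height. Uses only integer arithmetic.
inline int column_height(fix16 perp_dist, int screen_h) {
    assert(perp_dist > 0);  // debug-only: rays that hit a wall have positive distance
    const std::int64_t px =
        (static_cast<std::int64_t>(screen_h) * kOne) / perp_dist;
    return px > screen_h ? screen_h : static_cast<int>(px);
}
```

For example, `column_height(from_int(2), 240)` yields 120: a wall two units away fills half the screen height. Because each column depends only on its own ray, it can be computed and handed to DMA independently of the others.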
| Layer | Technologies |
|---|---|
| RTOS & Kernels | QNX 8.0, Apache NuttX, FreeRTOS (AMP/SMP), Zephyr, Embedded Linux (Yocto/Buildroot) |
| Languages | C, C++17/20 (bare-metal & STL), CUDA C++, Python, SystemVerilog |
| Silicon | STM32 Cortex-M4/M7, ESP32 (Xtensa Dual-Core), NVIDIA Jetson TX2, RISC-V, RPi 4 |
| Protocols | CAN, SPI, I2C, UART, USB, MAVLink, Micro-ROS (XRCE-DDS), UDP/TCP, LoRa |
| Control & Estimation | MPC, EKF/ESKF, Runge-Kutta Integration, Fixed-Point Math, Sensor Fusion |
| HPC | CUDA, OpenMP, MPI, H-Matrix arithmetic (AHMED library) |
| Safety Standards | ISO 26262, DO-178C, IEC 61508 — domain awareness across automotive, avionics, industrial |
| Tools | Git, CMake, Docker, Gazebo, MATLAB, Logic Analyzers, Oscilloscopes |
Research Assistant, HPC & Numerical Methods | University of Bayreuth | Nov 2023 – Present
Optimizing Hierarchical Matrix (H-Matrix) arithmetic for GPU-accelerated physics solvers (CUDA C++). Achieved 52% memory compression and 2× speedup over CPU baseline. Bridges dense and sparse linear algebra for large-scale simulations.
Embedded Software Developer, Radar / SAR Imaging | Hyspace Technologies | Oct 2022 – Oct 2023
Architected real-time radar signal pipelines on NVIDIA Jetson TX2. Delivered a 10× latency reduction (13 s → 1.3 s) via CUDA parallelization, DMA, and Network-on-Chip optimization. Sustained 680 Mbps continuous throughput across 6-channel multispectral double-precision sensor streams using zero-copy CPU/GPU memory interfaces.
Embedded Systems & Mechanical Engineer | QSON Technologies | Feb 2022 – Sept 2022
Full-cycle firmware development and hardware bring-up on STM32 and AVR platforms. Custom low-level drivers across UART, I2C, SPI.
M.Sc. Scientific Computing, University of Bayreuth, Germany (Oct 2023 – Sept 2026) | Elite Network of Bavaria | Focus: HPC, Numerical Methods, GPU-Accelerated Solvers
B.E. Mechanical Engineering, University of Mumbai (2019) | 8.82 / 10 | Control Systems, Mechatronics, Machine Design
Published research: Springer (IISc) · IEEE ICNGIS
Open to Werkstudent and full-time roles in avionics, autonomous systems, space, or defense. German: B2 | Immediate availability | Munich / Bavaria
"Correctness at 707 microseconds. Every cycle accounted for."



