- 🔬 At Meta (Menlo Park, CA), I’m in the Applied AI (AAI) org, building the data, evaluation, and tooling systems that power training and post-training of next-generation large-scale language and multimodal models.
- 🎥 Previously at Meta, I was on the Media Foundation Video team (AI), designing transformer- and diffusion-based models for video enhancement (super-resolution, restoration, denoising), compression, quality optimization, and accessibility across the Family of Apps (Facebook, Instagram, WhatsApp, Messenger). I also collaborated with the Meta Superintelligence Lab on large vision models (LVMs) for video generation, curation, and multimodal understanding.
- 🎓 I completed my Ph.D. in Computing and Information Sciences at Rochester Institute of Technology (RIT) in 2025, researching at the Document and Pattern Recognition Lab (DPRL) under Dr. Richard Zanibbi. My dissertation — “Parsing of Math Formulas and Chemical Diagrams using Graph-Based Representation and Attention Models” — designed fast, efficient, and interpretable parsers for recognizing math formulas and chemical diagrams across PDFs, typeset images, and handwritten strokes, using graph attention in a multi-task learning framework.
- 🌐 Research interests: pattern recognition, recognition of graphical structures, computer vision, video understanding and generation, speaker understanding, large language models, multimodal deep learning, and natural language processing.
- ✍️ I write blog posts about what I learn, mostly on Python and AI.
- ChemScraper — the first parser to extract molecular diagrams directly from born-digital PDF graphics, with no OCR, GPU, or vectorization. It also generates large annotated datasets to train visual parsers for raster (pixel-based) molecule images. Adopted by Pfizer R&D (Groton, CT) for internal document analysis. (IJDAR 2024 · ICDAR 2024 oral)
- Multimodal Chemical Search (ReactionMiner) — a multimodal system for searching chemical reactions, molecular structures, and text in scientific literature, linking visual and textual representations into structured “reaction cards.” (SIGIR ’25)
- MathDeck — a math-aware search system for the ACL Anthology combining formula and text queries through an intuitive formula “chip” interface. (SIGIR ’23)
- MathSeer / Math Formula Extraction — formula detection and recognition from PDFs without OCR, via a multi-task ResNet-50 with line-of-sight graph-based attention. Includes LgEval, a graph-based evaluation framework now widely used in the document recognition community. (ICDAR 2021)
🔗 See all on my projects page.
- Applied Scientist Intern, Amazon — Alexa Speaker Understanding AI (2022): improved speaker identification accuracy by 5% with semi-supervised learning while cutting training time by 80%, and built clustering-based annotation pipelines that labeled 10M+ hours of speech data.
- Machine Learning Engineer, Fusemachines (2019–2020): built handwritten text recognizers, recommendation systems, and AI education course materials.
- 📄 A. K. Shah et al. “Multimodal Search in Chemical Documents and Reactions.”
- 📄 A. K. Shah et al. “ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing.”
- 📄 A. K. Shah and R. Zanibbi. “Line-of-Sight with Graph Attention Parser (LGAP) for Math Formulas.”
- 📄 A. K. Shah, A. Dey, and R. Zanibbi. “A Math Formula Extraction and Evaluation Framework for PDF Documents.”
📖 Full list on my publications page.


