ayushkumarshah/README.md

Hi there 👋, I'm Ayush Kumar Shah — Research Scientist at Meta.

  • 🔬 At Meta (Menlo Park, CA), I’m on the Applied AI (AAI) org, building the data, evaluation, and tooling systems that power training and post-training of next-generation large-scale language and multimodal models.
  • 🎥 Previously at Meta, I was on the Media Foundation Video team (AI), designing transformer- and diffusion-based models for video enhancement (super-resolution, restoration, denoising), compression, quality optimization, and accessibility across the Family of Apps (Facebook, Instagram, WhatsApp, Messenger), and collaborating with the Meta Superintelligence Lab on large vision models (LVMs) for video generation, curation, and multimodal understanding.
  • 🎓 I completed my Ph.D. in Computing and Information Sciences at Rochester Institute of Technology (RIT) in 2025, researching at the Document and Pattern Recognition Lab (DPRL) under Dr. Richard Zanibbi. My dissertation — “Parsing of Math Formulas and Chemical Diagrams using Graph-Based Representation and Attention Models” — designed fast, efficient, and interpretable parsers for recognizing math formulas and chemical diagrams across PDFs, typeset images, and handwritten strokes, using graph attention in a multi-task learning framework.
  • 🌐 Research interests: pattern recognition, recognition of graphical structures, computer vision, video understanding & generation, speaker understanding, large language models, multimodal deep learning, and natural language processing.
  • ✍️ I write blog posts reflecting my learnings, mostly on Python and AI.

Website   Download CV

Email LinkedIn Google Scholar GitLab ORCID

🔭 Research (Ph.D. @ RIT DPRL)

  • ChemScraper — the first parser to extract molecular diagrams directly from born-digital PDF graphics, with no OCR, GPU, or vectorization. It also generates large annotated datasets to train visual parsers for raster (pixel-based) molecule images. Adopted by Pfizer R&D (Groton, CT) for internal document analysis. (IJDAR 2024 · ICDAR 2024 oral)
  • Multimodal Chemical Search (ReactionMiner) — a multimodal system for searching chemical reactions, molecular structures, and text in scientific literature, linking visual and textual representations into structured “reaction cards.” (SIGIR ’25)
  • MathDeck — a math-aware search system for the ACL Anthology combining formula and text queries through an intuitive formula “chip” interface. (SIGIR ’23)
  • MathSeer / Math Formula Extraction — formula detection and recognition from PDFs without OCR, via a multi-task ResNet-50 with line-of-sight graph-based attention. Includes LgEval, a graph-based evaluation framework now widely used in the document recognition community. (ICDAR 2021)

🔗 See all on my projects page.

💼 Previously

  • Applied Scientist Intern, Amazon — Alexa Speaker Understanding AI (2022): improved speaker identification accuracy by 5% with semi-supervised learning while cutting training time by 80%, and built clustering-based annotation pipelines that labeled 10M+ hours of speech data.
  • Machine Learning Engineer, Fusemachines (2019–2020): built handwritten text recognizers, recommendation systems, and AI education course materials.

📚 Publications

  • 📄 A. K. Shah et al. “Multimodal Search in Chemical Documents and Reactions.”  SIGIR 2025
  • 📄 A. K. Shah et al. “ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing.”  IJDAR 2024
  • 📄 A. K. Shah and R. Zanibbi. “Line-of-Sight with Graph Attention Parser (LGAP) for Math Formulas.”  ICDAR 2023
  • 📄 A. K. Shah, A. Dey, and R. Zanibbi. “A Math Formula Extraction and Evaluation Framework for PDF Documents.”  ICDAR 2021

📖 Full list on my publications page.

Pinned

  1. Guitar-Chords-recognition

    An application that predicts guitar chords by feeding mel spectrograms of guitar audio into a CNN.

    Python 139 46

  2. autocar

    A self-driving car that detects lanes, stop signs, and traffic lights and avoids collisions, built using Canny edge detection, the Hough transform, a Haar cascade classifier, and Arduino programming.

    Python 5 1

  3. AI-Plays-GTA5

    A bike-riding agent for a virtual environment (GTA5), built with a CNN and used to simulate self-driving vehicles.

    Python 7

  4. Deep-Learning-Nanodegree-Udacity

    All the projects I submitted while completing the Deep Learning Nanodegree offered by Udacity.

    Jupyter Notebook 2 1

  5. Nepali_Plagiarism_Detection

    An application that detects plagiarised Devanagari text files using a self-built, rule-based stemming algorithm and cosine similarity.

    Jupyter Notebook 6 3

  6. SLR-Parser

    An SLR parser that constructs the canonical collection of LR(0) items and the SLR parsing table, then parses a given input string.

    Python 5 2
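
For readers unfamiliar with the "canonical collection of LR(0) items" mentioned for SLR-Parser, here is a minimal sketch of the textbook construction. It is not taken from the repository: the toy grammar, the `GRAMMAR` dictionary, and the function names `closure`, `goto`, and `canonical_collection` are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# An LR(0) item is (head, body, dot_position).
Item = Tuple[str, Tuple[str, ...], int]

# Hypothetical toy grammar, already augmented with the start production S' -> S.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def closure(items: Iterable[Item]) -> FrozenSet[Item]:
    """Add the productions of every nonterminal that appears right after a dot."""
    result = set(items)
    worklist = list(result)
    while worklist:
        _, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items: FrozenSet[Item], symbol: str) -> FrozenSet[Item]:
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection() -> List[FrozenSet[Item]]:
    """Collect every item set reachable from the closure of S' -> . S."""
    start = closure({("S'", ("S",), 0)})
    states = [start]
    seen = {start}
    for state in states:  # the list grows while we iterate over it
        symbols = {b[d] for (_, b, d) in state if d < len(b)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in seen:
                seen.add(target)
                states.append(target)
    return states


if __name__ == "__main__":
    for index, state in enumerate(canonical_collection()):
        print(f"I{index}:")
        for head, body, dot in sorted(state):
            print("   ", head, "->", " ".join(body[:dot]), ".", " ".join(body[dot:]))
```

Each item set becomes one state of the parser. An SLR parsing table would then place shift and goto entries from the `goto` transitions and reduce entries using FOLLOW sets, which this sketch leaves out.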