Name
Ayush Shekhar
GitHub Handle
@ayushh0110
Tell us about yourself
Hey, I'm Ayush, an ML engineer from India. I like building things with local LLMs.
I've built ToolForge (a tool router fine-tuned with QLoRA), an autonomous AI agent that's live in production, and now ScreenMind, which kinda happened because I got curious whether Gemma 4 could do vision + audio + reasoning all at once on a cheap GPU.
I also contribute to LiteLLM, LlamaIndex, and Instructor when I run into bugs while using them. Always down to talk about local AI and open source.
Project Name
ScreenMind
Project Repo Link
https://github.com/ayushh0110/ScreenMind.git
Stream Date
Dates
No response
Twitter URL
https://x.com/ayushh_ss
LinkedIn URL
https://www.linkedin.com/in/ayushhss/
Additional Information
ScreenMind got some traction recently: it was featured on AlternativeTo as a Recall alternative, hit the front page of Hacker News, and was part of the Gemma 4 Challenge on dev.to.
Tech stack: Python, llama.cpp (via llama-cpp-python), Gemma 4, Flask, SQLite, MiniLM embeddings. Runs on Windows and Linux (X11 + Wayland).
Main features: intelligent screen capture with perceptual hash dedup, natural language search over your screen history, conversational RAG chat, meeting transcription using Gemma 4's audio encoder, a no-code agent builder (plain English or Python), an MCP server with 8 tools for Claude/Cursor/VS Code, a built-in Model Hub to swap Gemma variants based on your GPU, and Notion/Obsidian sync.
The thing I think would make for a good conversation is the "one model does everything" angle. Most similar tools stack Whisper + a vision model + a language model separately, but ScreenMind runs the entire pipeline on a single Gemma 4 model with as little as 4 GB of VRAM.
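For anyone curious what "perceptual hash dedup" from the feature list means in practice, here is a minimal sketch of the general idea. It is not ScreenMind's actual code. It assumes each screenshot has already been downscaled to a tiny grayscale grid, uses a simple average hash (one of several perceptual hashes), and the names `average_hash`, `hamming`, `dedup_frames` and the `max_distance` threshold are all made up for illustration.

```python
from typing import List, Optional, Sequence

Frame = Sequence[Sequence[int]]  # small grayscale grid, values 0-255


def average_hash(pixels: Frame) -> int:
    """Pack the grid into an int: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def dedup_frames(frames: Sequence[Frame], max_distance: int = 2) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    max_distance is a hypothetical threshold: frames whose hash is within
    this many bits of the previously kept frame are treated as duplicates.
    """
    kept: List[int] = []
    last_hash: Optional[int] = None
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if last_hash is None or hamming(h, last_hash) > max_distance:
            kept.append(i)
            last_hash = h
    return kept


# Toy 4x4 frames: a gradient, the same gradient with slight noise, and its inverse.
frame_a = [[0, 10, 20, 30], [40, 50, 60, 70], [80, 90, 100, 110], [120, 130, 140, 150]]
frame_b = [[p + 1 for p in row] for row in frame_a]
frame_c = [[150 - p for p in row] for row in frame_a]

print(dedup_frames([frame_a, frame_b, frame_c]))  # [0, 2]
```

The noisy copy hashes to the same bits as the original and gets skipped, while the inverted frame differs in every bit and is kept. A real capture loop would compute the grid from an actual screenshot and tune the threshold to taste.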