Structure Queries lets you interact with any web page via text or voice. It is built for dense web content such as research papers, articles, blog posts, and technical documentation.
Using the app requires a Samsar account created at app.samsar.one. Privacy Policy: structurequeries.samsar.one/privacy
The current implementation combines:
- Firecrawl for adaptive page crawling and extraction
- a cleanup pipeline for embedding-ready text
- `samsar-js` for page embeddings, similarity search, retrieval, and assistant completions
- ElevenLabs for transcription, voice selection, preview, and TTS playback
The repo is a monorepo with:
- `client/`: Chrome extension (Manifest V3) built with TypeScript + esbuild
- `client/public/`: public web client assets served by the backend at `/web-client`
- `server/`: Node.js API built with TypeScript + Express
End-to-end product flow from page preparation to text or voice reply.
Firecrawl crawl and cleanup.
ElevenLabs transcription and TTS loop.
Backend embedding, retrieval, and assistant flow.
- `POST /api/webpages/analyze` runs the full prepare flow.
- Firecrawl is used through `server/src/lib/url-embedding-crawl.ts`.
- The crawl is seed-page-first: the source page is scraped before deeper crawling is attempted.
- Crawl depth is capped at 5 levels by `FIRECRAWL_CRAWL_LEVELS`.
- Crawl breadth is capped by `FIRECRAWL_MAX_LINKS` and the prepare-page credit budget.
- Child-link crawling only happens when the seed page is too short to stand on its own.
- The current adaptive threshold is 1200 words for the primary page.
- The current adaptive child-link cap is 5 prioritized links.
- Firecrawl requests are constrained to main content, with external links and subdomains disabled, and query parameters ignored.
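The seed-first budgeting rule above can be sketched roughly as follows. This is an illustration, not the actual exports of `url-embedding-crawl.ts`; the function and field names are hypothetical.

```typescript
// Illustrative sketch of the seed-first crawl budgeting described above.
// Names are hypothetical; the real logic lives in server/src/lib/url-embedding-crawl.ts.
const PRIMARY_PAGE_WORD_THRESHOLD = 1200; // adaptive threshold for the seed page
const MAX_CHILD_LINKS = 5;                // adaptive cap on prioritized child links

interface CrawlPlan {
  crawlChildren: boolean;
  childLinks: string[];
}

function planChildCrawl(seedMarkdown: string, candidateLinks: string[]): CrawlPlan {
  const wordCount = seedMarkdown.split(/\s+/).filter(Boolean).length;
  // Child-link credits are only spent when the seed page is too short on its own.
  if (wordCount >= PRIMARY_PAGE_WORD_THRESHOLD) {
    return { crawlChildren: false, childLinks: [] };
  }
  return { crawlChildren: true, childLinks: candidateLinks.slice(0, MAX_CHILD_LINKS) };
}
```

A seed page with 1200 or more words is embedded on its own; a shorter one spends crawl credits on up to five prioritized child links.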
- `cleanEmbeddingSourceText()` in `server/src/lib/embedding-text-cleanup.ts` prepares crawl output for embeddings.
- It normalizes line endings, removes zero-width characters, and decodes common HTML entities.
- Markdown image blocks are removed.
- Markdown links are reduced to anchor text.
- Heading markers, list markers, quote markers, and inline markdown wrappers are stripped at the line level.
- Boilerplate lines such as `skip to content`, `table of contents`, `menu`, `navigation`, and `search` are filtered out.
- Consecutive duplicate lines are removed.
- Large blank gaps are collapsed into cleaner paragraph spacing.
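A condensed sketch of those cleanup steps is shown below. It is a simplification: the real `cleanEmbeddingSourceText()` handles more entities, markers, and edge cases than this.

```typescript
// Simplified sketch of the cleanup pipeline described above; the actual
// implementation in server/src/lib/embedding-text-cleanup.ts covers more cases.
const BOILERPLATE = new Set(["skip to content", "table of contents", "menu", "navigation", "search"]);

function cleanEmbeddingSourceTextSketch(raw: string): string {
  const text = raw
    .replace(/\r\n?/g, "\n")                          // normalize line endings
    .replace(/[\u200B-\u200D\uFEFF]/g, "")            // strip zero-width characters
    .replace(/&amp;/g, "&").replace(/&nbsp;/g, " ")   // decode common HTML entities
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")             // remove markdown image blocks
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1");         // reduce links to anchor text

  const lines = text.split("\n").map((line) =>
    line
      .replace(/^\s*(#{1,6}\s+|[-*+]\s+|>\s+)/, "")   // strip heading/list/quote markers
      .replace(/(\*\*|__|\*|_|`)/g, "")               // strip inline markdown wrappers
      .trim()
  );

  const kept: string[] = [];
  for (const line of lines) {
    if (BOILERPLATE.has(line.toLowerCase())) continue;                            // drop boilerplate lines
    if (kept.length && kept[kept.length - 1] === line && line !== "") continue;   // dedupe consecutive lines
    kept.push(line);
  }
  // collapse large blank gaps into single paragraph breaks
  return kept.join("\n").replace(/\n{3,}/g, "\n\n").trim();
}
```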
- Cleaned page records are sent to `samsarAdapter.generateEmbeddingsFromPlainText(...)`.
- The current backend iteration uses OpenAI embeddings for embedding creation and similarity comparison.
- Similarity lookup runs through `samsarAdapter.similarToEmbedding(...)`.
- Retrieval lookup runs through `samsarAdapter.searchEmbeddings(...)`.
- Retrieved chunks are converted into a grounding block before the assistant call.
- Current defaults are 8 similarity candidates and 6 retrieved chunks.
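The grounding step can be pictured roughly like this. The chunk shape and prompt formatting are hypothetical; the real assembly lives in `server/src/lib/chat-agent.ts`.

```typescript
// Hypothetical sketch of turning retrieved chunks into a grounding block.
// 8 similarity candidates are narrowed to the 6 chunks that get formatted here.
const RETRIEVED_CHUNKS = 6; // default retrieved chunk count

interface RetrievedChunk {
  sourceUrl: string;
  text: string;
}

function buildGroundingBlock(chunks: RetrievedChunk[]): string {
  const excerpts = chunks
    .slice(0, RETRIEVED_CHUNKS)
    .map((chunk, i) => `[${i + 1}] (${chunk.sourceUrl})\n${chunk.text}`);
  return ["Use only the following page excerpts to answer:", ...excerpts].join("\n\n");
}
```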
- The grounded chatbot implementation lives in `server/src/lib/chat-agent.ts`.
- After similarity and retrieval, grounded messages are sent to `samsarAdapter.createExternalAssistantCompletion(...)`.
- The current backend iteration uses GPT-5.4 internally for the RAG query engine.
- The response path supports grounded text replies and optional image generation metadata through the same assistant surface.
- The realtime voice gateway is exposed at `/ws/plugin`.
- User audio is transcribed with ElevenLabs before the grounded assistant request is made.
- Assistant replies can be synthesized back into audio with ElevenLabs TTS.
- `GET /api/voices` is used to populate remote ElevenLabs voice choices.
- The current voice list request fetches up to 100 ElevenLabs voices.
- The UI currently exposes 6 selectable non-auto TTS languages: `en`, `es`, `fr`, `de`, `hi`, `pt`.
- `GET /api/voices/preview` is used for voice-preview playback.
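The language picker can be reduced to a small helper. The fallback-to-auto behavior here is an assumption for illustration, not code from the repo:

```typescript
// The six non-auto TTS languages currently exposed in the UI.
const SUPPORTED_TTS_LANGUAGES = ["en", "es", "fr", "de", "hi", "pt"] as const;
type TtsLanguage = (typeof SUPPORTED_TTS_LANGUAGES)[number];

// Hypothetical helper: fall back to "auto" when the requested code is unsupported.
function resolveTtsLanguage(requested: string): TtsLanguage | "auto" {
  return (SUPPORTED_TTS_LANGUAGES as readonly string[]).includes(requested)
    ? (requested as TtsLanguage)
    : "auto";
}
```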
- The Chrome extension and public web client both use the same analysis and chat backend.
- The public web client is served at `GET /web-client`.
- The backend also exposes an OpenAI-compatible surface at `POST /v1/chat/completions`.
- The proxy remains stateless: browser installs keep their `externalUserApiKey`, `assistantSessionId`, and `templateId` client-side, while assistant state lives upstream.
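Because the surface is OpenAI-compatible, any client can talk to it with a standard chat-completions payload. The sketch below builds such a request; the `Authorization` scheme and the model name are illustrative assumptions, not documented behavior of the proxy.

```typescript
// Builds a standard OpenAI-style chat-completions request against the proxy.
// The header scheme and model value are illustrative assumptions.
interface ProxyChatRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildProxyChatRequest(origin: string, apiKey: string, question: string): ProxyChatRequest {
  return {
    url: `${origin}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumed auth scheme; how the client-held externalUserApiKey is actually
      // transmitted is not documented here.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "structure-queries", // illustrative model name
      messages: [{ role: "user", content: question }],
    }),
  };
}
```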
```sh
npm install
npm run dev:server
```

In a second terminal:

```sh
STRUCTUREDQUERIES_SERVER_ORIGIN=http://localhost:3000 npm run dev:client
```

Then:

- open the public web client at `http://localhost:3000/web-client`
- load the unpacked extension from `client/dist` in Chrome
The extension build defaults to https://structurequeries.samsar.one.
Set STRUCTUREDQUERIES_SERVER_ORIGIN before building or watching the client if you want a different backend origin, for example:
```sh
STRUCTUREDQUERIES_SERVER_ORIGIN=http://localhost:3000 npm run dev:client
```

For a production build:

```sh
STRUCTUREDQUERIES_SERVER_ORIGIN=https://structurequeries.samsar.one npm run build:client
```

```
client/
  public/    Static extension assets such as manifest and popup HTML
  scripts/   Build/watch tooling for the extension
  src/       Popup, background service worker, and content script
server/
  src/       Express app, routes, connectors, adapters, and RAG flow logic
assets/
  *.png      Latest splash and workflow visuals
```
- npm workspaces for `client` and `server`
- scripts: `npm run dev:server`, `npm run dev:client`, `npm run build`, `npm run check`
The client workspace currently uses:
- `@types/chrome` ^0.1.38
- `chokidar` ^4.0.3
- `esbuild` ^0.27.4
- `typescript` ^5.9.3
Runtime notes:
- the extension is bundled for `chrome120`
- the public web client is plain static HTML/CSS/JS under `client/public`
client/public - the public web client is served by the backend and does not have a separate package manifest
The server workspace currently uses:
- `@elevenlabs/elevenlabs-js` ^2.39.0
- `@mendable/firecrawl-js` ^4.16.0
- `cors` ^2.8.6
- `dotenv` ^17.3.1
- `express` ^5.2.1
- `mongodb` ^6.21.0
- `mongoose` ^8.23.0
- `samsar-js` ^0.48.12
- `ws` ^8.19.0
Server development dependencies:
- `@types/cors` ^2.8.19
- `@types/express` ^5.0.6
- `@types/node` ^25.5.0
- `@types/ws` ^8.18.1
- `tsx` ^4.21.0
- `typescript` ^5.9.3
This project currently relies on three paid third-party APIs for its core features:
- Firecrawl via `@mendable/firecrawl-js`: firecrawl.dev
- Samsar via `samsar-js`: app.samsar.one
- ElevenLabs via `@elevenlabs/elevenlabs-js`: elevenlabs.io/app/home
You will need active provider accounts and API keys for the integrations you want to run locally.
Copy server/.env.example to server/.env and fill in the required integrations.
Create a Samsar account at app.samsar.one and generate a SAMSAR_API_KEY before running the server locally.
Required local server env:
```sh
PORT=3000
SAMSAR_API_KEY=your_samsar_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
```

Optional server env already supported by the current code:

```sh
NODE_ENV=development
DOTENV_CONFIG_PATH=
ELEVENLABS_DEFAULT_VOICE_ID=
ELEVENLABS_DEFAULT_MODEL_ID=eleven_multilingual_v2
SAMSAR_PUBLIC_API_BASE_URL=https://api.samsar.one
APP_NAME=test
CURRENT_ENV=development
FIRECRAWL_API_URL=https://api.firecrawl.dev
FIRECRAWL_CRAWL_LEVELS=5
FIRECRAWL_MAX_LINKS=10
FIRECRAWL_POLL_INTERVAL_SECONDS=5
FIRECRAWL_TIMEOUT_SECONDS=120
```

Firecrawl local/self-hosted note:
- the server uses `@mendable/firecrawl-js`
- by default it targets `https://api.firecrawl.dev`
- by default the proxy allows up to 10 pages across up to 5 discovery levels
- the seed page is preferred first and child-link credits are only spent when the seed page is too short
- if you run Firecrawl locally, set `FIRECRAWL_API_URL` to your local Firecrawl base URL and set `FIRECRAWL_API_KEY` to the key expected by that instance
- webpage analysis endpoints require both `FIRECRAWL_API_KEY` and `SAMSAR_API_KEY`
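The local/self-hosted switch boils down to reading a couple of env vars with the hosted defaults. A sketch, under the assumption that config resolution looks roughly like this (the real loading lives in the server code):

```typescript
// Sketch of Firecrawl connection config resolution with the documented defaults.
interface FirecrawlConfig {
  apiUrl: string;
  apiKey: string | undefined;
  crawlLevels: number;
  maxLinks: number;
}

function resolveFirecrawlConfig(env: Record<string, string | undefined>): FirecrawlConfig {
  return {
    apiUrl: env.FIRECRAWL_API_URL ?? "https://api.firecrawl.dev", // hosted default
    apiKey: env.FIRECRAWL_API_KEY, // required for webpage analysis endpoints
    crawlLevels: Number(env.FIRECRAWL_CRAWL_LEVELS ?? 5), // discovery depth cap
    maxLinks: Number(env.FIRECRAWL_MAX_LINKS ?? 10),      // page budget cap
  };
}
```

Pointing `FIRECRAWL_API_URL` at a local instance leaves the rest of the pipeline unchanged.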
- build output: `client/dist`
- default local backend origin: `http://localhost:3000` when you set `STRUCTUREDQUERIES_SERVER_ORIGIN=http://localhost:3000`
- main flows use `POST /api/browser-sessions`, `POST /api/browser-sessions/register`, `POST /api/chat-completion`, `GET /api/webpages/status`, `POST /api/webpages/analyze`, and `GET /api/voices`
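A typical client flow is to submit a page to `POST /api/webpages/analyze` and then poll `GET /api/webpages/status` until embedding finishes. The generic helper below sketches that loop; the `"ready"`/`"pending"` status values are assumptions, not the documented response shape.

```typescript
// Generic status-polling helper; the "ready" status value is illustrative,
// not the documented response shape of GET /api/webpages/status.
async function pollUntilReady<T extends { status: string }>(
  fetchStatus: () => Promise<T>,
  intervalMs: number,
  maxAttempts: number
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchStatus();
    if (result.status === "ready") return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("webpage embedding did not become ready in time");
}
```

Injecting the fetcher keeps the helper independent of the actual endpoint, so the same loop works for the extension and the public web client.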
- served by the backend at `GET /web-client`
- static assets come from `client/public`
- websocket gateway: `/ws/plugin`
- the public client uses the same analysis, voice, and session APIs as the extension
The current server runtime uses:
- ElevenLabs via `@elevenlabs/elevenlabs-js`
- Firecrawl via `@mendable/firecrawl-js`
- Samsar via `samsar-js`
Key backend entry points:
- `server/src/connectors/`: raw client and connection factories
- `server/src/adapters/`: higher-level wrappers on top of those connectors
- `server/src/lib/url-embedding-crawl.ts`: Firecrawl crawl, scrape, cleanup, and record-building flow
- `server/src/lib/embedding-text-cleanup.ts`: markdown and HTML cleanup for embedding-ready text
- `server/src/lib/chat-agent.ts`: similarity lookup, retrieval, grounding block assembly, and assistant completion
- `server/src/stack.ts`: backend registry and manifest
- `GET /api/stack`: runtime-visible stack manifest and configuration status
- `GET /api/voices`: ElevenLabs voice list surface
- `GET /api/voices/preview`: ElevenLabs preview proxy
- `POST /api/browser-sessions`: stateless browser-session sync
- `POST /api/browser-sessions/register`: explicit external-user registration for the extension install flow
- `POST /api/chat-completion`: proxy chat endpoint with grounded retrieval metadata
- `GET /api/webpages/status`: webpage embedding status lookup
- `POST /api/webpages/analyze`: Firecrawl-backed webpage analysis and embedding ingestion
- `POST /v1/chat/completions`: OpenAI-compatible text completion surface over the same grounded backend flow
- `/ws/plugin`: realtime transcription, assistant, and TTS loop
Copy the values you need from server/.env.example into your local server env before using the provider adapters.




