Document Context Extractor

Offline Windows tool for finding context around keyword or regex matches in local documents.

The app searches a document, expands a smart context block around every hit, previews the result safely, and exports the matches to DOCX, Markdown, or TXT.

Repository: https://github.com/maxliebscher/DocumentContextExtractor

Screenshots

Writing workflow

[Screenshots: General workflow, Dark Violet writing workflow, Soft Slate writing workflow]

Features

  • Runs locally on Windows; no browser or cloud account required
  • Input formats:
    • Word .docx
    • Markdown .md, .markdown
    • Plain text .txt
    • HTML .html, .htm
    • Rich Text .rtf with basic text extraction
  • Regex or plain keyword search
  • Optional case-sensitive matching
  • Smart context expansion:
    • starts one paragraph before and after a hit
    • expands up and down until a blank paragraph or a # heading
    • optional min/max word guards, matching the original extraction behavior
  • Safe preview:
    • starts with the first matches only
    • "More matches" extends the preview in chunks
    • "All matches" shows as many matches as the preview safety limit allows
    • optional match highlighting in the preview, on by default
    • export always includes all matches
  • Larger default window with a taller preview area
  • DPI-aware Windows Forms layout with scrolling fallback for smaller screens
  • Tooltips for the less obvious controls
  • Language selection:
    • German when the system language is German
    • English otherwise
    • manual switch in the header
  • Local-only About dialog:
    • no cloud uploads
    • no telemetry
    • no user accounts
  • Export formats:
    • Word .docx
    • Markdown .md
    • Plain text .txt
  • UI themes:
    • Calm Burgundy
    • Dark Violet
    • Soft Slate
  • Burgundy app icon based on a highlighted excerpt/document motif
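
The smart context expansion listed above can be illustrated with a short Python sketch. This is not the tool's actual implementation; it assumes that the document is already split into a list of paragraphs, that a boundary paragraph (blank or starting with #) is left out of the block, and it ignores the optional min/max word guards. The names is_boundary, expand_context, and context_text are hypothetical.

```python
def is_boundary(text):
    # Assumption: a blank paragraph or a '#' heading stops the expansion.
    stripped = text.strip()
    return stripped == "" or stripped.startswith("#")


def expand_context(paragraphs, hit_index):
    """Return inclusive (start, end) paragraph indices around a hit."""
    last = len(paragraphs) - 1
    # Start with one paragraph before and one after the hit.
    start = max(hit_index - 1, 0)
    end = min(hit_index + 1, last)
    # Grow upward while the current edge and the next paragraph up
    # are both ordinary text.
    while (start > 0
           and not is_boundary(paragraphs[start])
           and not is_boundary(paragraphs[start - 1])):
        start -= 1
    # Grow downward under the same rule.
    while (end < last
           and not is_boundary(paragraphs[end])
           and not is_boundary(paragraphs[end + 1])):
        end += 1
    return start, end


def context_text(paragraphs, hit_index):
    # Join the selected paragraphs into one context block.
    start, end = expand_context(paragraphs, hit_index)
    return "\n".join(paragraphs[start:end + 1]).strip()


paragraphs = [
    "# Intro",
    "Alpha text.",
    "The deadline is Friday.",
    "Beta text.",
    "",
    "Unrelated.",
]
print(expand_context(paragraphs, 2))  # (1, 3)
```

In this sketch the heading above and the blank paragraph below the hit act as fences, so the block covers exactly the three text paragraphs between them.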

Usage

  1. Download the latest Windows ZIP from the GitHub release page.

  2. Unzip it and start DocumentContextExtractor.exe.

  3. Drop a source file onto the window or choose Browse.

  4. Enter a keyword or regex, for example:

    contract|deadline|meeting
    
  5. Choose smart context, length, case, export format, language, and theme.

  6. Click "Show preview" or "Extract and save".
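
To get a feel for how an alternation pattern such as the one in step 4 behaves, here is a rough Python sketch. It uses Python's re module, which may differ in details from the regex engine the app uses, and it assumes matching is case-insensitive unless the case-sensitive option is enabled. The function name find_hits and the sample sentence are made up for illustration.

```python
import re

PATTERN = "contract|deadline|meeting"


def find_hits(text, pattern, case_sensitive=False):
    # Assumption: case-insensitive unless the option is switched on.
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


sample = "Meetings moved; the Contract deadline stays."

# Unanchored alternatives also match inside longer words ("Meetings").
print(find_hits(sample, PATTERN))
# -> ['Meeting', 'Contract', 'deadline']

# With case-sensitive matching only the lowercase word is found.
print(find_hits(sample, PATTERN, case_sensitive=True))
# -> ['deadline']

# Word boundaries restrict the search to whole words.
WHOLE_WORDS = r"\b(?:contract|deadline|meeting)\b"
print(find_hits(sample, WHOLE_WORDS))
# -> ['Contract', 'deadline']
```

If partial-word hits are unwanted, wrapping the alternatives in \b(?:...)\b as shown is a common way to tighten the pattern.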

Command Line

The same executable can also run without the UI:

.\DocumentContextExtractor.exe --input manuscript.docx --keyword "contract|deadline" --output extract.md

Options:

--smart true|false
--min NUMBER
--max NUMBER
--ignore-length
--case-sensitive
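
For batch runs, the command line can be driven from a script. The following Python sketch only assembles the flags listed above; it assumes the executable sits in the current directory, that the export format follows the output file extension as in the example, and that a failed run returns a non-zero exit code. None of these are confirmed by this README. The helper names build_command and extract_folder and the "-extract.md" naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: the executable is in the current working directory.
EXE = r".\DocumentContextExtractor.exe"


def build_command(input_path, keyword, output_path, *, smart=None,
                  min_words=None, max_words=None,
                  ignore_length=False, case_sensitive=False):
    """Assemble an argument list from the documented options only."""
    cmd = [EXE, "--input", str(input_path),
           "--keyword", keyword, "--output", str(output_path)]
    if smart is not None:
        cmd += ["--smart", "true" if smart else "false"]
    if min_words is not None:
        cmd += ["--min", str(min_words)]
    if max_words is not None:
        cmd += ["--max", str(max_words)]
    if ignore_length:
        cmd.append("--ignore-length")
    if case_sensitive:
        cmd.append("--case-sensitive")
    return cmd


def extract_folder(folder, keyword, **options):
    """Run one extraction per .docx file in a folder (Windows only)."""
    for source in sorted(Path(folder).glob("*.docx")):
        target = source.with_name(source.stem + "-extract.md")
        # check=True raises if the tool exits with a non-zero code,
        # which is assumed here to signal failure.
        subprocess.run(build_command(source, keyword, target, **options),
                       check=True)


print(build_command("manuscript.docx", "contract|deadline", "extract.md"))
```

Passing the arguments as a list avoids shell quoting problems with the | character in the pattern.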

Privacy

Document Context Extractor runs locally. It does not upload documents, use telemetry, or require a cloud account.

Development

End users do not need the .NET SDK when using a self-contained release build. Developers need Windows and the .NET 8 SDK with Windows Desktop support.

Build from source:

dotnet build .\DocumentContextExtractor.csproj

Create a self-contained Windows x64 release build:

dotnet publish .\DocumentContextExtractor.csproj -c Release -r win-x64 --self-contained true /p:PublishSingleFile=true /p:EnableCompressionInSingleFile=true

License

MIT
