
noQuli/gitsteal


GitSteal

⚠️ DISCLAIMER

This project is for authorized security research and defensive auditing only. It is intended to help owners identify and protect their own exposed secrets. Do not use it illegally or against systems you do not own or are not explicitly permitted to assess.

The project authors are not responsible for any consequences resulting from misuse, unauthorized scanning, or other unlawful activity.

GitSteal is a small Python utility that searches GitHub for repositories matching one or more keywords and then runs TruffleHog against the discovered repository URLs. It can help identify exposed API keys in authorized environments.

Features

  • Search GitHub commits by keyword using the GitHub Search API
  • Deduplicate repository URLs before scanning
  • Run TruffleHog in parallel with configurable concurrency
  • Persist scanned repositories to avoid rescanning the same URLs
  • Save verified findings to JSON for later review
  • Prune the scan history automatically when it grows large

Requirements

  • Python 3.14 or newer
  • A GitHub personal access token with access to the search API
  • TruffleHog installed and available on your PATH

Setup

  1. Install dependencies with uv:

    uv sync
  2. Create a .env file in the project root and add your GitHub token:

    GITHUB_API_KEY=your_token_here
  3. Add search keywords to key_words.txt, one per line.

Usage

Run the scanner from the project root:

uv run main.py

If you are using the virtual environment directly, activate it first and then run:

python main.py

Command-line options

  • --start-page: first GitHub search page to fetch, default 1
  • --last-page: last GitHub search page to fetch, default 3
  • --concurrency: number of parallel TruffleHog scans, default 10
  • --keywords: path to the keyword file, default key_words.txt

Example

uv run main.py --start-page 1 --last-page 5 --concurrency 6 --keywords key_words.txt

Input format

The keyword file supports one keyword per line. Empty lines are ignored, and lines starting with # are treated as comments.

Example:

OPENAI_API_KEY
GEMINI_API_KEY

Output files

The scanner writes local state to the project root:

  • scanned_urls.txt — repository URLs that have already been processed
  • trufflehog_results.json — verified findings returned by TruffleHog

These files are ignored by Git so local scan state does not get committed.
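The scan history in scanned_urls.txt can be treated as a plain list of URLs, oldest first. The sketch below shows one way to load it and to prune it when it is written back. The pruning threshold is not documented, so `MAX_HISTORY` and both function names are hypothetical.

```python
from pathlib import Path

MAX_HISTORY = 10_000  # hypothetical cap; the real threshold is not documented


def load_history(path: Path) -> list[str]:
    """Read previously scanned URLs, oldest first; a missing file means no history."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def save_history(path: Path, urls: list[str], max_entries: int = MAX_HISTORY) -> list[str]:
    """Write the history back, keeping only the most recent max_entries URLs."""
    # Guard needed because urls[-0:] would return the whole list.
    kept = urls[-max_entries:] if max_entries > 0 else []
    path.write_text("".join(url + "\n" for url in kept), encoding="utf-8")
    return kept
```

Keeping the newest entries means that the oldest URLs become eligible for a rescan once the cap is reached.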

How it works

  1. Load keywords from key_words.txt
  2. Query GitHub for repositories related to each keyword
  3. Remove repositories that have already been scanned
  4. Run TruffleHog against each new repository URL
  5. Save verified findings and update scan history

Safety and authorization

Use this tool only on repositories and accounts you own or are explicitly authorized to assess. Do not use it to collect secrets from third-party systems or repositories without permission.

For additional legal and operational details, see DISCLAIMER.md.

Project structure

  • main.py — scanner implementation and CLI entry point
  • key_words.txt — sample keyword list
  • DISCLAIMER.md — usage and legal notice
  • LICENSE — project license text
  • pyproject.toml — project metadata and dependencies

License

This project is licensed under the terms of the LICENSE file.

Notes

  • The code currently uses GitHub’s search API and may be affected by rate limits.
  • TruffleHog must be installed separately; this project only orchestrates it.
  • Results depend on the keywords you provide and the repositories returned by GitHub search.

About

Scan GitHub for leaked API keys
