damarff/upwork-scraper

Upwork Job Scraper

Multi-platform job scraper for Upwork and Freelancer.com with automatic Cloudflare bypass. Extracts job listings via browser automation (DrissionPage) and API calls (curl_cffi), stores results in SQLite.

Features

  • Upwork scraping: DrissionPage + Chromium, bypasses Cloudflare JS challenges, extracts window.__NUXT__ state
  • Freelancer.com scraping: curl_cffi with Chrome TLS fingerprint impersonation, no browser needed
  • 20 keyword categories: auto-scrapes across AI, automation, scraping, and development niches
  • SQLite storage: deduplication, status tracking, filtering
  • CLI interface: python -m src.main <command>
  • REST API: optional FastAPI server for querying results
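
Deduplication and status tracking can be pictured with the standard library's sqlite3 module. This is a minimal sketch under assumptions, not the project's database.py: the jobs table, its columns, and the helpers open_db, save_jobs, set_status and jobs_by_status are hypothetical. The idea is that the job URL serves as the primary key, so INSERT OR IGNORE silently drops jobs already seen on an earlier run.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: the URL is the natural key used for deduplication.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    url    TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    title  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'new'
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_jobs(conn: sqlite3.Connection, jobs: list[dict]) -> int:
    """Insert jobs, skipping URLs that are already stored; return rows added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (url, source, title) VALUES (:url, :source, :title)",
        jobs,
    )
    conn.commit()
    return conn.total_changes - before


def set_status(conn: sqlite3.Connection, url: str, status: str) -> None:
    """Track workflow state, e.g. 'new' -> 'applied' or 'ignored'."""
    conn.execute("UPDATE jobs SET status = ? WHERE url = ?", (status, url))
    conn.commit()


def jobs_by_status(conn: sqlite3.Connection, status: str) -> list[tuple[str, str]]:
    """Filter stored jobs by status, ordered by URL."""
    return conn.execute(
        "SELECT url, title FROM jobs WHERE status = ? ORDER BY url", (status,)
    ).fetchall()
```

Saving the same batch twice adds its rows only once, so repeated scrapes across overlapping keywords do not create duplicates.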

Quick Start

# Install
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure: edit keywords & delay in config.yaml

# Scrape Freelancer.com (fast, no browser)
python -m src.main freelancer

# Scrape Upwork (needs Chromium + display)
python -m src.main upwork

# Scrape a single job detail
python -m src.main detail <url>

# View results
python -m src.main stats
python -m src.main jobs
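
The subcommands above map naturally onto argparse subparsers. A minimal sketch of how src/main.py could dispatch them; the handler functions are hypothetical stubs that echo the command instead of calling the real scrapers or database.

```python
from __future__ import annotations

import argparse


def _stub(args: argparse.Namespace) -> str:
    # Hypothetical placeholder: echo the chosen command instead of running it.
    return args.command


def cmd_detail(args: argparse.Namespace) -> str:
    # Hypothetical placeholder for scraping a single job page.
    return f"detail {args.url}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.main")
    sub = parser.add_subparsers(dest="command", required=True)
    # Commands without arguments: name -> help text.
    simple = {
        "freelancer": "scrape Freelancer.com (no browser)",
        "upwork": "scrape Upwork (needs Chromium + display)",
        "stats": "show database statistics",
        "jobs": "list stored jobs",
    }
    for name, help_text in simple.items():
        sub.add_parser(name, help=help_text).set_defaults(handler=_stub)
    detail = sub.add_parser("detail", help="scrape a single job detail")
    detail.add_argument("url", help="job URL")
    detail.set_defaults(handler=cmd_detail)
    return parser


def main(argv: list[str] | None = None) -> str:
    """Parse argv (sys.argv[1:] when None) and run the selected handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Attaching a handler with set_defaults keeps each command's logic in its own function, so adding a command takes one add_parser call and one handler.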

Structure

src/
├── main.py              # CLI entry point
├── config.py            # Config loader
├── database.py          # SQLite operations
├── job_filter.py        # Relevance filtering
├── scrapers/
│   ├── base.py          # Shared browser setup (DrissionPage)
│   ├── upwork.py        # Upwork Nuxt extraction
│   ├── freelancer.py    # Freelancer.com API (curl_cffi)
│   └── job_detail.py    # Individual job detail scrape
├── models/
│   └── job.py           # Job dataclass + parsers
└── utils/
    ├── logger.py
    └── clean.py

Tech Stack

  • Python 3.10+
  • DrissionPage (Chromium browser automation)
  • curl_cffi (Chrome TLS fingerprint impersonation)
  • FastAPI (optional API server)
  • SQLite
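
curl_cffi's Session(impersonate=...) sends requests with a browser-like TLS fingerprint, which is why the Freelancer.com scraper needs no browser. Below is a sketch of a keyword loop with a politeness delay. The endpoint URL, the query/limit parameters and the result.projects response shape are assumptions about Freelancer.com's public projects API, and search_projects and scrape_keywords are hypothetical names. The session is passed in as a parameter so the loop can be exercised with a stub; the delay argument stands in for the delay setting in config.yaml.

```python
from __future__ import annotations

import random
import time
from typing import Any

# Assumed endpoint: Freelancer.com's public "search active projects" API.
API_URL = "https://www.freelancer.com/api/projects/0.1/projects/active/"


def make_session() -> Any:
    """Create a curl_cffi session that presents Chrome's TLS fingerprint."""
    # Imported lazily so the functions below can be used without curl_cffi installed.
    from curl_cffi import requests

    # Recent curl_cffi releases accept "chrome" as an alias for the newest
    # supported Chrome target; older ones need a versioned name like "chrome110".
    return requests.Session(impersonate="chrome")


def search_projects(session: Any, keyword: str, limit: int = 20) -> list[dict]:
    """Fetch one page of active projects matching a keyword (assumed response shape)."""
    resp = session.get(API_URL, params={"query": keyword, "limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("result", {}).get("projects", [])


def scrape_keywords(session: Any, keywords: list[str], delay: float = 2.0) -> list[dict]:
    """Search every keyword, keeping one copy of projects that match several keywords."""
    seen: dict[int, dict] = {}
    for keyword in keywords:
        for project in search_projects(session, keyword):
            seen.setdefault(project["id"], project)  # first copy of each id wins
        # Sleep between requests, with jitter so the request rhythm is not fixed.
        time.sleep(delay + random.uniform(0, delay / 2))
    return list(seen.values())
```

In real use the call would look like scrape_keywords(make_session(), keywords, delay=...), with keywords and delay read from config.yaml.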

About

πŸ•·οΈ Autonomous web scraper bypassing Cloudflare via DrissionPage. Extracts Upwork jobs with auto-categorization, SQLite storage, and Supabase sync. Python + FastAPI.
