AUSCrawl

20 years of AUS course data, one SQLite file.

Warning

Do not run the crawler unless you know what you are doing. The crawler makes tens of thousands of requests to AUS Banner and can easily overwhelm the server if misconfigured, which can result in service disruption and get you in trouble with the university. A pre-built database (aus_courses.db) with a complete snapshot of all course data since 2005 is available as a release download. Use that instead.

What is this?

AUSCrawl is a fast, async web crawler that scrapes AUS Banner for course data across every semester since 2005 and stores it in an SQLite database. But more importantly, it ships a ready-to-use database (as a release download) so you never have to run the crawler yourself.

Written in Python. Single file. ~15 minutes for a full crawl of 74,000+ course sections, catalog descriptions, prerequisites, and more.

The Database

Every release ships aus_courses.db (gzipped, ~16 MB), a complete SQLite database containing every course, instructor, prerequisite, and catalog description from AUS Banner since Fall 2005. Just download it, gunzip, and start building. (It's distributed as a release asset rather than committed to the repo so clones stay small.)

Table	Records	Description
`courses`	75,467	Every course section ever offered
`course_dependencies`	156,512	Prerequisite/corequisite links with minimum grades
`section_details`	73,778	Prerequisites, corequisites, restrictions, waitlist, fees (+ structured JSON)
`section_instructors`	72,476	Every instructor per section (incl. co-taught), with primary flag
`catalog`	3,046	Course descriptions, credit/lecture/lab hours
`catalog_detail`	3,532	Course-level attributes (degree-requirement tags), schedule types, levels
`instructors`	1,987	All instructors with emails and first appearance
`semesters`	101	Every term from Fall 2005 to the present
`subjects`	98	All subject codes (COE, ENG, MTH, etc.)
`attributes`	231	Course attributes
`levels`	9	Academic levels (Undergraduate, Graduate, etc.)
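
To check the record counts above against the copy you downloaded, a short script along these lines may help. It is a minimal sketch: `table_counts` is a helper name made up for this example, and it relies only on the standard `sqlite_master` catalog, not on anything specific to AUSCrawl.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Names come straight from sqlite_master, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }
```

Call it as `table_counts(sqlite3.connect("aus_courses.db"))` once the database has been downloaded and decompressed. Counts in newer releases will likely differ from the table above.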

Build Something Cool

This dataset is a goldmine for AUS students. Use it to help your fellow students or sharpen your own skills:

Prerequisite visualizer — build an interactive graph of course dependencies for your major
Schedule planner — help students find open sections that fit their timetable
Instructor tracker — see which professors teach what, and how their assignments changed over the years
Course trend analysis — which courses are offered less frequently? Which departments are growing?
Grade requirement explorer — find every course that requires a minimum grade of C- or higher
Data science projects — 20 years of course data across 98 subjects is a great dataset for learning SQL, pandas, or building dashboards

If you build something with this data, open an issue and let us know — we'd love to see it.

Getting Started

# Download the latest database (compressed, ~16 MB) from Releases
curl -L -o aus_courses.db.gz \
  https://github.com/DeadPackets/AUSCrawl/releases/latest/download/aus_courses.db.gz
gunzip aus_courses.db.gz

# Open it with sqlite3
sqlite3 aus_courses.db

# Or use Python
python3 -c "
import sqlite3
conn = sqlite3.connect('aus_courses.db')
for row in conn.execute('SELECT term_name, COUNT(*) FROM courses JOIN semesters ON courses.term_id = semesters.term_id GROUP BY courses.term_id ORDER BY courses.term_id DESC LIMIT 5'):
    print(row)
"

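If `gunzip` is not available (for example on a stock Windows install), the downloaded archive can be unpacked with Python's standard library instead. This is a small sketch; `decompress_db` is a hypothetical helper name, and the file names are the ones used in the commands above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_db(src: str, dst: str) -> Path:
    """Stream-decompress a .gz file to dst without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dst)
```

For example, `decompress_db("aus_courses.db.gz", "aus_courses.db")` produces the same file as the `gunzip` step, except that it leaves the `.gz` archive in place.
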
Example Queries

-- All courses taught by a specific instructor
SELECT term_id, subject, course_number, title, days, start_time, end_time
FROM courses WHERE instructor_name LIKE '%Smith%'
ORDER BY term_id DESC;

-- Courses with prerequisites and minimum grades
-- (SQLite takes the non-grouped columns from an arbitrary row in each group)
SELECT d.subject, d.course_number, d.dep_type, d.minimum_grade,
       sd.prerequisites
FROM course_dependencies d
JOIN section_details sd ON sd.crn = d.crn AND sd.term_id = d.term_id
WHERE d.dep_type = 'prerequisite'
GROUP BY d.subject, d.course_number;

-- How many sections per semester
SELECT s.term_name, COUNT(*) as sections
FROM courses c JOIN semesters s ON c.term_id = s.term_id
GROUP BY c.term_id ORDER BY c.term_id;

-- Course catalog with hours breakdown
SELECT subject, course_number, description, credit_hours, lecture_hours, lab_hours
FROM catalog WHERE subject = 'COE';

-- Find all prerequisites for a specific course
SELECT d.subject, d.course_number, d.minimum_grade
FROM course_dependencies d
JOIN courses c ON c.crn = d.crn AND c.term_id = d.term_id
WHERE c.subject = 'COE' AND c.course_number = '390'
GROUP BY d.subject, d.course_number;
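
The last query generalizes naturally to a prerequisite graph for every course, which is the starting point for the visualizer idea. The sketch below assumes only the columns already used in the queries above (`crn`, `term_id`, `subject`, `course_number`, `dep_type`); `prerequisite_graph` is a name invented for this example.

```python
import sqlite3
from collections import defaultdict


def prerequisite_graph(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each course (e.g. 'COE 390') to the set of its prerequisite courses."""
    query = """
        SELECT DISTINCT c.subject, c.course_number, d.subject, d.course_number
        FROM course_dependencies d
        JOIN courses c ON c.crn = d.crn AND c.term_id = d.term_id
        WHERE d.dep_type = 'prerequisite'
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for subj, num, dep_subj, dep_num in conn.execute(query):
        graph[f"{subj} {num}"].add(f"{dep_subj} {dep_num}")
    return dict(graph)
```

Note that this merges prerequisites across all terms, so a requirement that was dropped years ago still shows up. Filter on `c.term_id` if you only care about a recent semester.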

Database Schema

The SQLite database contains 13 normalized tables with indexes. The 11 documented in this README are grouped below:

Core tables:

semesters — term ID and name (e.g. 202620, Spring 2026)
subjects — subject codes and full names (e.g. COE, Computer Engineering)
courses — every course section with schedule, instructor, classroom, etc.
instructors — deduplicated instructor names and emails with first_seen
levels — academic levels (Undergraduate, Graduate, etc.)
attributes — course attributes with first_seen

Extended tables:

catalog — course descriptions, credit/lecture/lab hours, department
catalog_detail — course-level attributes (degree-requirement tags), schedule types, levels, and catalog-level prerequisites/corequisites/restrictions
section_details — prerequisites, corequisites, restrictions, waitlist, fees per section, plus structured prerequisites_json / corequisites_json (boolean AND/OR expression trees) and restrictions_json (typed include/exclude groups)
section_instructors — every instructor on each section, including co-taught ones, with an is_primary flag
course_dependencies — flat prerequisite/corequisite links with minimum grade requirements
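
The boolean expression trees in prerequisites_json lend themselves to a small recursive evaluator. The exact JSON layout is not documented here, so the sketch below assumes a hypothetical shape: internal nodes look like `{"op": "AND" | "OR", "children": [...]}` and leaves look like `{"subject": ..., "course_number": ...}`. Inspect a real row and adjust the keys before relying on it.

```python
import json


def satisfied(node: dict, completed: set[str]) -> bool:
    """Evaluate an AND/OR prerequisite tree against a set of completed courses.

    ASSUMPTION: the node shape ("op", "children", "subject", "course_number")
    is hypothetical and may not match the real prerequisites_json column.
    """
    if "op" in node:
        results = (satisfied(child, completed) for child in node["children"])
        return all(results) if node["op"] == "AND" else any(results)
    return f'{node["subject"]} {node["course_number"]}' in completed


# COE 221 AND (MTH 104 OR MTH 111), written in the assumed shape.
example_tree = json.loads("""
{"op": "AND", "children": [
    {"subject": "COE", "course_number": "221"},
    {"op": "OR", "children": [
        {"subject": "MTH", "course_number": "104"},
        {"subject": "MTH", "course_number": "111"}
    ]}
]}
""")
```

With this tree, `satisfied(example_tree, {"COE 221", "MTH 111"})` is true, while a student who has only completed MTH 104 does not qualify. Minimum grades are ignored in this sketch.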

Banner Technical Details

AUS uses Ellucian Banner, a student information system widely deployed across universities. The public-facing schedule search is served at banner.aus.edu behind Cloudflare, exposing several OWA (Oracle Web Agent) endpoints:

Endpoint	Method	Purpose
`/axp3b21h/owa/bwckschd.p_disp_dyn_sched`	GET	Semester dropdown — returns all available term IDs
`/axp3b21h/owa/bwckgens.p_proc_term_date`	POST	Subject listing — returns available subjects for a given term
`/axp3b21h/owa/bwckschd.p_get_crse_unsec`	POST	Course search — returns HTML tables of all matching sections
`/axp3b21h/owa/bwckctlg.p_display_courses`	GET	Course catalog — returns descriptions, credit hours, department
`/axp3b21h/owa/bwckctlg.p_disp_course_detail`	GET	Course detail — returns course-level attributes, schedule types, prerequisites
`/axp3b21h/owa/bwckschd.p_disp_detail_sched`	GET	Section detail — returns prerequisites, corequisites, restrictions, waitlist, fees

The course search endpoint accepts all subject codes in a single POST body (up to ~4,500 bytes before the WAF rejects it), returning a large HTML page with <table class="datadisplaytable"> rows. Instructor emails are obfuscated using Cloudflare's email protection (XOR encoding with the first byte as key). The crawler paces the GET endpoints with a global token-bucket rate limiter (AIMD around ~18–25 req/s); in practice 429 responses begin around ~30 req/s, well below the documented stream limits (~10,000 HTTP/2 streams per connection). Actual enrollment/seat counts are not exposed by AUS Banner (only waitlist figures), even for completed terms.
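
The email obfuscation described above is easy to reverse once you have the hex string, which Cloudflare typically places in a `data-cfemail` attribute. The following is a minimal sketch of the XOR scheme, not the crawler's actual code; both function names are made up for this example, and `encode_cf_email` exists only to make round-trip testing possible.

```python
def decode_cf_email(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email given as a hex string.

    The first byte is the XOR key; every following byte is one
    character of the address XORed with that key.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cf_email(email: str, key: int = 0x5A) -> str:
    """Inverse of decode_cf_email, useful only for testing the decoder."""
    return bytes([key, *(b ^ key for b in email.encode("utf-8"))]).hex()
```

A quick sanity check is `decode_cf_email(encode_cf_email("someone@example.com"))`, which should return the address unchanged.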

Crawler Documentation

Caution

Only run the crawler if you need fresher data than what's in the included database. Be aware that aggressive crawling can take down AUS Banner and result in your IP being banned. The default settings are tuned to be safe, but modifying worker counts or running multiple instances simultaneously can cause problems.

Requirements

Python 3.13+ and uv.

Usage

uv run python crawl.py [options]

Flag	Description
`-o`, `--output`	SQLite output path (default: `aus_data.db`)
`-t`, `--terms`	Only crawl specific term IDs (e.g. `202620 202510`)
`-w`, `--workers`	Max concurrent requests (default: 50)
`--rate`	Target GET requests/sec; AIMD ceiling that paces the catalog/detail phases (default: 18, backs off on 429s). Raise to go faster, lower for extra safety
`--delay`	Extra seconds to pause before each request (default: 0; pacing is normally handled by `--rate`)
`--latest`	Only crawl the most recent semester
`--resume`	Skip semesters already in the database
`--force`	Drop and recreate all tables
`--no-catalog`	Skip catalog description scraping
`--no-details`	Skip section detail scraping
`-v`, `--verbose`	Debug-level logging
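
To make the `--rate` description more concrete, here is a rough sketch of an AIMD (additive increase, multiplicative decrease) controller. It is not taken from crawl.py: the class name, the floor, the step size, and the decrease factor are all assumptions chosen for illustration. Only the ceiling of 18 mirrors the documented default.

```python
from dataclasses import dataclass, field


@dataclass
class AimdRate:
    ceiling: float = 18.0  # mirrors the documented --rate default
    floor: float = 1.0     # ASSUMED lower bound, never drop below this
    step: float = 1.0      # ASSUMED additive increase after a clean window
    factor: float = 0.5    # ASSUMED multiplicative decrease after a 429
    rate: float = field(init=False)

    def __post_init__(self) -> None:
        # Start at the ceiling and only back off when the server complains.
        self.rate = self.ceiling

    def on_success(self) -> float:
        """Creep back up toward the ceiling, one step at a time."""
        self.rate = min(self.ceiling, self.rate + self.step)
        return self.rate

    def on_429(self) -> float:
        """Cut the rate sharply when the server signals overload."""
        self.rate = max(self.floor, self.rate * self.factor)
        return self.rate
```

The asymmetry is the point of the design: recovery is slow and backing off is fast, so a burst of 429 responses quickly pushes the request rate well under the server's limit.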

How It Works

The crawler runs in 5 phases:

Semester discovery — fetches the list of all available terms from Banner's dropdown
Subject catalog — fetches subject codes from every semester and deduplicates (the dropdown varies per term)
Course scraping — POSTs to the schedule search endpoint for every semester with all subjects in a single batch, then parses the HTML response with lxml (50 concurrent workers)
Catalog scraping — GETs course catalog pages for a sample of 6 evenly-spaced terms to collect descriptions, hours, and departments (10 concurrent workers)
Detail scraping — GETs the section detail page for every unique CRN/term pair to extract prerequisites, corequisites, restrictions, waitlist info, and fees (10 concurrent workers)
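
The catalog phase's "6 evenly-spaced terms" could be selected in several ways. The sketch below shows one plausible reading, picking indices spread uniformly from the first term to the last. The function name and the rounding rule are assumptions, not the crawler's confirmed behavior.

```python
def sample_evenly(terms: list[str], k: int = 6) -> list[str]:
    """Pick up to k items spread evenly across terms, including both ends."""
    if k <= 0 or not terms:
        return []
    if k == 1:
        return [terms[-1]]
    if len(terms) <= k:
        return list(terms)
    last = len(terms) - 1
    # Evenly spaced positions from index 0 to the final index.
    indices = [round(i * last / (k - 1)) for i in range(k)]
    return [terms[i] for i in indices]
```

With 101 terms this selects positions 0, 20, 40, 60, 80, and 100, so both the oldest and the newest catalog are always covered.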

Technical Details

Async HTTP/2 via httpx with connection pooling and automatic retry with exponential backoff
lxml for HTML parsing (12x faster than BeautifulSoup)
ThreadPoolExecutor offloads CPU-bound parsing from the async event loop
Catalog sampling reduces catalog requests by ~80% while maintaining full course coverage
Cloudflare email protection decoding (XOR-obfuscated instructor emails)
Crash resilience — each phase saves to DB immediately; detail phase does periodic batch saves every 5,000 entries; --resume skips completed work
Rate-limit aware — respects server 429 responses with exponential backoff; GET endpoints capped at 10 workers to avoid triggering bans
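
The ThreadPoolExecutor point above can be illustrated with a few lines of asyncio. This is a simplified sketch rather than the crawler's code: `parse_page` is a stand-in that just counts table rows (the real parser uses lxml), and `parse_all` is a hypothetical helper name.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def parse_page(html: str) -> int:
    """Stand-in for CPU-bound HTML parsing: count the table rows."""
    return html.count("<tr")


async def parse_all(pages: list[str], max_workers: int = 4) -> list[int]:
    """Parse pages in worker threads so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [loop.run_in_executor(pool, parse_page, p) for p in pages]
        # gather preserves input order, so results line up with pages.
        return await asyncio.gather(*futures)
```

Run it with `asyncio.run(parse_all(pages))`. While the threads parse, the event loop remains free to keep network requests in flight.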

Built for AUS students, by an AUS student.
MIT License
