20 years of AUS course data, one SQLite file.
Warning
Do not run the crawler unless you know what you are doing. The crawler makes tens of thousands of requests to AUS Banner and can easily overwhelm the server if misconfigured, which can cause service disruption and get you in trouble with the university. A pre-built database (aus_courses.db) with a complete snapshot of all course data since 2005 is attached to every release of this repository; use that instead.
AUSCrawl is a fast, async web crawler that scrapes AUS Banner for course data across every semester since 2005 and stores it in an SQLite database. But more importantly, it ships a ready-to-use database (as a release download) so you never have to run the crawler yourself.
Written in Python. Single file. ~15 minutes for a full crawl of 74,000+ course sections, catalog descriptions, prerequisites, and more.
Every release ships aus_courses.db (gzipped, ~16 MB), a complete SQLite database containing every course, instructor, prerequisite, and catalog description from AUS Banner since Fall 2005. Just download it, gunzip, and start building. (It's distributed as a release asset rather than committed to the repo so clones stay small.)
| Table | Records | Description |
|---|---|---|
| `courses` | 75,467 | Every course section ever offered |
| `course_dependencies` | 156,512 | Prerequisite/corequisite links with minimum grades |
| `section_details` | 73,778 | Prerequisites, corequisites, restrictions, waitlist, fees (+ structured JSON) |
| `section_instructors` | 72,476 | Every instructor per section (incl. co-taught), with primary flag |
| `catalog` | 3,046 | Course descriptions, credit/lecture/lab hours |
| `catalog_detail` | 3,532 | Course-level attributes (degree-requirement tags), schedule types, levels |
| `instructors` | 1,987 | All instructors with emails and first appearance |
| `semesters` | 101 | Every term from Fall 2005 to the present |
| `subjects` | 98 | All subject codes (COE, ENG, MTH, etc.) |
| `attributes` | 231 | Course attributes |
| `levels` | 9 | Academic levels (Undergraduate, Graduate, etc.) |
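
To check these counts against your own copy of the database, a minimal sketch along these lines should work. It relies only on SQLite's built-in `sqlite_master` catalog; the helper name `table_counts` is made up for illustration, and the numbers you see depend on which release you downloaded.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Names come straight from sqlite_master, so quoting them as identifiers is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


if __name__ == "__main__":
    # Assumes aus_courses.db was downloaded and gunzipped into the current directory.
    conn = sqlite3.connect("aus_courses.db")
    for name, count in table_counts(conn).items():
        print(f"{name:22} {count:>8,}")
    conn.close()
```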
This dataset is a goldmine for AUS students. Use it to help your fellow students or sharpen your own skills:
- Prerequisite visualizer — build an interactive graph of course dependencies for your major
- Schedule planner — help students find open sections that fit their timetable
- Instructor tracker — see which professors teach what, and how their assignments changed over the years
- Course trend analysis — which courses are offered less frequently? Which departments are growing?
- Grade requirement explorer — find every course that requires a minimum grade of C- or higher
- Data science projects — 20 years of course data across 98 subjects is a great dataset for learning SQL, pandas, or building dashboards
If you build something with this data, open an issue and let us know — we'd love to see it.
# Download the latest database (compressed, ~16 MB) from Releases
curl -L -o aus_courses.db.gz \
https://github.com/DeadPackets/AUSCrawl/releases/latest/download/aus_courses.db.gz
gunzip aus_courses.db.gz
# Open it with sqlite3
sqlite3 aus_courses.db
# Or use Python
python3 -c "
import sqlite3
conn = sqlite3.connect('aus_courses.db')
for row in conn.execute('SELECT term_name, COUNT(*) FROM courses JOIN semesters ON courses.term_id = semesters.term_id GROUP BY courses.term_id ORDER BY courses.term_id DESC LIMIT 5'):
    print(row)
"

-- All courses taught by a specific instructor
SELECT term_id, subject, course_number, title, days, start_time, end_time
FROM courses WHERE instructor_name LIKE '%Smith%'
ORDER BY term_id DESC;
-- Courses with prerequisites and minimum grades
SELECT d.subject, d.course_number, d.dep_type, d.minimum_grade,
sd.prerequisites
FROM course_dependencies d
JOIN section_details sd ON sd.crn = d.crn AND sd.term_id = d.term_id
WHERE d.dep_type = 'prerequisite'
GROUP BY d.subject, d.course_number;
-- How many sections per semester
SELECT s.term_name, COUNT(*) as sections
FROM courses c JOIN semesters s ON c.term_id = s.term_id
GROUP BY c.term_id ORDER BY c.term_id;
-- Course catalog with hours breakdown
SELECT subject, course_number, description, credit_hours, lecture_hours, lab_hours
FROM catalog WHERE subject = 'COE';
-- Find all prerequisites for a specific course
SELECT d.subject, d.course_number, d.minimum_grade
FROM course_dependencies d
JOIN courses c ON c.crn = d.crn AND c.term_id = d.term_id
WHERE c.subject = 'COE' AND c.course_number = '390'
GROUP BY d.subject, d.course_number;

The SQLite database contains 13 normalized tables with proper indexes:
Core tables:
- `semesters`: term ID and name (e.g. `202620`, `Spring 2026`)
- `subjects`: subject codes and full names (e.g. `COE`, `Computer Engineering`)
- `courses`: every course section with schedule, instructor, classroom, etc.
- `instructors`: deduplicated instructor names and emails with `first_seen`
- `levels`: academic levels (Undergraduate, Graduate, etc.)
- `attributes`: course attributes with `first_seen`
Extended tables:
- `catalog`: course descriptions, credit/lecture/lab hours, department
- `catalog_detail`: course-level attributes (degree-requirement tags), schedule types, levels, and catalog-level prerequisites/corequisites/restrictions
- `section_details`: prerequisites, corequisites, restrictions, waitlist, fees per section, plus structured `prerequisites_json` / `corequisites_json` (boolean AND/OR expression trees) and `restrictions_json` (typed include/exclude groups)
- `section_instructors`: every instructor on each section, including co-taught ones, with an `is_primary` flag
- `course_dependencies`: flat prerequisite/corequisite links with minimum grade requirements
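
As a starting point for something like a prerequisite visualizer, the flat `course_dependencies` table can be folded into a graph. The sketch below assumes only the columns that appear in the example queries (`crn`, `term_id`, `subject`, `course_number`, `dep_type`); the function names are illustrative. Because the links are flat, the result ignores AND/OR logic, which lives in the JSON expression trees instead.

```python
import sqlite3
from collections import defaultdict

# Column names are taken from the example queries; adjust if your schema differs.
QUERY = """
SELECT DISTINCT c.subject || ' ' || c.course_number AS course,
       d.subject || ' ' || d.course_number AS requirement
FROM course_dependencies d
JOIN courses c ON c.crn = d.crn AND c.term_id = d.term_id
WHERE d.dep_type = 'prerequisite'
"""


def prerequisite_graph(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each course (e.g. 'COE 390') to the set of its direct prerequisites."""
    graph: dict[str, set[str]] = defaultdict(set)
    for course, requirement in conn.execute(QUERY):
        graph[course].add(requirement)
    return dict(graph)


def all_prerequisites(graph: dict[str, set[str]], course: str) -> set[str]:
    """Collect direct and indirect prerequisites with an iterative depth-first walk."""
    seen: set[str] = set()
    stack = [course]
    while stack:
        for requirement in graph.get(stack.pop(), ()):
            if requirement not in seen:  # the seen-set also guards against cycles
                seen.add(requirement)
                stack.append(requirement)
    return seen
```

Calling `all_prerequisites(prerequisite_graph(conn), "COE 390")` on a connection to `aus_courses.db` would return the full chain of courses behind COE 390, treating every listed link as required.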
AUS uses Ellucian Banner, a student information system widely deployed across universities. The public-facing schedule search is served at banner.aus.edu behind Cloudflare, exposing several OWA (Oracle Web Agent) endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| `/axp3b21h/owa/bwckschd.p_disp_dyn_sched` | GET | Semester dropdown: returns all available term IDs |
| `/axp3b21h/owa/bwckgens.p_proc_term_date` | POST | Subject listing: returns available subjects for a given term |
| `/axp3b21h/owa/bwckschd.p_get_crse_unsec` | POST | Course search: returns HTML tables of all matching sections |
| `/axp3b21h/owa/bwckctlg.p_display_courses` | GET | Course catalog: returns descriptions, credit hours, department |
| `/axp3b21h/owa/bwckctlg.p_disp_course_detail` | GET | Course detail: returns course-level attributes, schedule types, prerequisites |
| `/axp3b21h/owa/bwckschd.p_disp_detail_sched` | GET | Section detail: returns prerequisites, corequisites, restrictions, waitlist, fees |
The course search endpoint accepts all subject codes in a single POST body (up to ~4,500 bytes before the WAF rejects it), returning a large HTML page with <table class="datadisplaytable"> rows. Instructor emails are obfuscated using Cloudflare's email protection (XOR encoding with the first byte as key). The crawler paces the GET endpoints with a global token-bucket rate limiter (AIMD around ~18–25 req/s); in practice 429 responses begin around ~30 req/s, well below the documented stream limits (~10,000 HTTP/2 streams per connection). Actual enrollment/seat counts are not exposed by AUS Banner (only waitlist figures), even for completed terms.
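
The email obfuscation described above is simple to reverse. The sketch below is not taken from the crawler's code; it follows the scheme as stated (a hex string whose first byte is the XOR key for the remaining bytes). Such strings typically show up in a `data-cfemail` attribute or after `/cdn-cgi/l/email-protection#` in a link. The encoder is included only so the round trip can be checked, and both function names are hypothetical.

```python
def decode_cf_email(encoded: str) -> str:
    """Decode a Cloudflare-protected email: first byte is the key, the rest is XORed."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cf_email(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cf_email, for testing only. The key value is arbitrary."""
    payload = bytes(b ^ key for b in email.encode("utf-8"))
    return bytes([key]).hex() + payload.hex()


if __name__ == "__main__":
    token = encode_cf_email("jdoe@aus.edu")  # made-up address
    print(token, "->", decode_cf_email(token))
```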
Caution
Only run the crawler if you need fresher data than what's in the pre-built database. Aggressive crawling can take down AUS Banner and get your IP banned. The default settings are tuned to be safe, but raising worker counts or running multiple instances at the same time can cause problems.
Requires Python 3.13+ and uv.
uv run python crawl.py [options]
| Flag | Description |
|---|---|
| `-o`, `--output` | SQLite output path (default: aus_data.db) |
| `-t`, `--terms` | Only crawl specific term IDs (e.g. 202620 202510) |
| `-w`, `--workers` | Max concurrent requests (default: 50) |
| `--rate` | Target GET requests/sec; AIMD ceiling that paces the catalog/detail phases (default: 18, backs off on 429s). Raise to go faster, lower for extra safety |
| `--delay` | Extra seconds to pause before each request (default: 0; pacing is normally handled by `--rate`) |
| `--latest` | Only crawl the most recent semester |
| `--resume` | Skip semesters already in the database |
| `--force` | Drop and recreate all tables |
| `--no-catalog` | Skip catalog description scraping |
| `--no-details` | Skip section detail scraping |
| `-v`, `--verbose` | Debug-level logging |
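
For readers unfamiliar with AIMD (additive increase, multiplicative decrease), the idea behind `--rate` can be sketched as below. This is an illustration of the general technique, not the crawler's implementation: the class name, the step size, the backoff factor, and the floor are all assumed values. Only the ceiling of 18 mirrors the documented default.

```python
from dataclasses import dataclass


@dataclass
class AIMDRate:
    ceiling: float = 18.0  # mirrors the documented --rate default
    floor: float = 1.0     # assumed lower bound, requests/sec
    step: float = 0.5      # assumed additive increase per success
    backoff: float = 0.5   # assumed multiplicative decrease on a 429
    rate: float = 0.0      # current target; 0 means "start at the ceiling"

    def __post_init__(self) -> None:
        if self.rate <= 0:
            self.rate = self.ceiling

    def on_success(self) -> None:
        """Creep back up toward the ceiling after a successful response."""
        self.rate = min(self.ceiling, self.rate + self.step)

    def on_rate_limited(self) -> None:
        """Cut the rate sharply when the server answers 429."""
        self.rate = max(self.floor, self.rate * self.backoff)

    @property
    def interval(self) -> float:
        """Seconds to wait between requests at the current rate."""
        return 1.0 / self.rate


if __name__ == "__main__":
    limiter = AIMDRate()
    limiter.on_rate_limited()
    print(f"after a 429: {limiter.rate} req/s")
    limiter.on_success()
    print(f"after one success: {limiter.rate} req/s")
```

The asymmetry is the point of the design: one 429 halves the rate immediately, while recovery takes many successful requests, so the crawler spends most of its time just under whatever the server tolerates.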
The crawler runs in 5 phases:
- Semester discovery — fetches the list of all available terms from Banner's dropdown
- Subject catalog — fetches subject codes from every semester and deduplicates (the dropdown varies per term)
- Course scraping — POSTs to the schedule search endpoint for every semester with all subjects in a single batch, then parses the HTML response with lxml (50 concurrent workers)
- Catalog scraping — GETs course catalog pages for a sample of 6 evenly-spaced terms to collect descriptions, hours, and departments (10 concurrent workers)
- Detail scraping — GETs the section detail page for every unique CRN/term pair to extract prerequisites, corequisites, restrictions, waitlist info, and fees (10 concurrent workers)
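
The catalog phase samples 6 evenly-spaced terms rather than fetching every one. How the crawler picks them is not spelled out here, so the following is just one plausible way to do it: spread `k` indices across the sorted term list, always including the first and last term (an assumption). The function name is hypothetical.

```python
def evenly_spaced(items: list, k: int) -> list:
    """Pick k items spread evenly across the list, endpoints included."""
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    if k == 1:
        return [items[-1]]  # assumption: with a single pick, prefer the newest term
    last = len(items) - 1
    # Consecutive targets are more than 1 apart when k < len(items), so no duplicates.
    return [items[round(i * last / (k - 1))] for i in range(k)]


if __name__ == "__main__":
    # Stand-in indices for the 101 terms listed in the semesters table.
    print(evenly_spaced(list(range(101)), 6))
```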
- Async HTTP/2 via `httpx` with connection pooling and automatic retry with exponential backoff
- lxml for HTML parsing (12x faster than BeautifulSoup)
- ThreadPoolExecutor offloads CPU-bound parsing from the async event loop
- Catalog sampling reduces catalog requests by ~80% while maintaining full course coverage
- Cloudflare email protection decoding (XOR-obfuscated instructor emails)
- Crash resilience: each phase saves to DB immediately; detail phase does periodic batch saves every 5,000 entries; `--resume` skips completed work
- Rate-limit aware: respects server 429 responses with exponential backoff; GET endpoints capped at 10 workers to avoid triggering bans
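
The ThreadPoolExecutor point can be illustrated with a small sketch: parsing runs in worker threads through `loop.run_in_executor`, so the event loop stays free to keep requests in flight. The `parse_page` function here is a stand-in that only counts `<tr` occurrences; the real crawler parses with lxml, and the names and thread count below are assumptions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def parse_page(html: str) -> int:
    """Stand-in for CPU-bound HTML parsing: count table rows in the page."""
    return html.count("<tr")


async def parse_all(pages: list[str], max_threads: int = 4) -> list[int]:
    """Parse pages in worker threads without blocking the event loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [loop.run_in_executor(pool, parse_page, html) for html in pages]
        # gather preserves input order, so results line up with pages.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    sample = ["<table><tr></tr><tr></tr></table>", "<table></table>"]
    print(asyncio.run(parse_all(sample)))
```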
Built for AUS students, by an AUS student.
MIT License