Data Track — Week 3 Assignment

Build a Validated Ingestion Pipeline · Total: 100 points · Passing: 60

Why no task folders?

Previous assignments split work across task-1/, task-2/, and so on. This assignment intentionally drops that structure. Many real Python projects keep related modules together at the repository root, so you navigate by reading the code, not by opening numbered folders.

Every file you need to touch is listed below, in the order you should work through them.

Where to start

Work through the files in this order. Each one maps to a task in the assignment chapter.

| Step | File | Task in the chapter | Points |
|---|---|---|---|
| 1 | `models.py` | Task 4: Pydantic Validation | graded with Step 6 |
| 2 | `ingest_api.py` | Task 1: Error Handling + Task 2: API Ingestion | graded with Step 6 |
| 3 | `ingest_files.py` | Task 3: File Reading | graded with Step 6 |
| 4 | `validate.py` | Task 4: Pydantic Validation | graded with Step 6 |
| 5 | `database.py` | Task 5: Database Storage | graded with Step 6 |
| 6 | `pipeline.py` | Task 6: Pipeline Orchestration | 70 total for Tasks 1–6 |
| 7 | `output/azure_compare.md` | Task 7: Azure CLI + Portal | 15 |
| 8 | `AI_DEBUG.md` | Task 8: AI Debug Report | 15 |

Open each file and read the docstrings and TODO comments; they explain exactly what to implement. Start with `models.py` and `ingest_api.py`. `pipeline.py` is the last piece of code you wire together, before the two write-ups in Steps 7 and 8.
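For `ingest_api.py`, the grader looks for a `time.sleep` backoff. The sketch below shows one way that idea can be expressed. The signature (a zero-argument `fetch` callable, `max_attempts`, `base_delay`) and the exception types it catches are illustrative assumptions, not the assignment's required interface; the docstrings in the file remain the authority.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0):
    """Call fetch() until it succeeds, sleeping longer after each failure.

    `fetch` is a hypothetical zero-argument callable that raises on failure;
    the real function in ingest_api.py may take different arguments.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError) as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Exponential backoff: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)
```

Passing the network call in as a callable keeps the retry logic separate from whichever HTTP library you use, and makes it easy to exercise with a fake function that fails a fixed number of times.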

Repository layout

.
├── data/
│   └── weather_stations.csv        # input dataset — do not edit
├── output/
│   ├── azure_compare.md            # Task 7: fill in your 3 comparison sentences
│   └── azure_resource_groups.json  # Task 7: generated by your Python script
├── models.py          # Step 1 — Pydantic model (Task 4)
├── ingest_api.py      # Step 2 — fetch_with_retry + API call (Tasks 1–2)
├── ingest_files.py    # Step 3 — CSV reader (Task 3)
├── validate.py        # Step 4 — batch validation (Task 4)
├── database.py        # Step 5 — SQLite tables + upsert (Task 5)
├── pipeline.py        # Step 6 — orchestrator that calls everything (Task 6)
├── AI_DEBUG.md        # Step 8 — your debugging log (Task 8)
├── requirements.txt
├── .env.example
├── .hyf/
│   └── test.sh        # auto-grader — read this to see exactly what is checked
└── .github/workflows/
    └── grade-assignment.yml

Files the pipeline generates at runtime (gitignored):

weather.db — SQLite database
output/error_report.json — invalid records from validation
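To make the link between validation and `output/error_report.json` concrete, here is a rough sketch using Pydantic v2. The model name, its fields, the temperature range, and the keys written to the report (`row`, `record`, `errors`) are all hypothetical; check `models.py`, `validate.py`, and `.hyf/test.sh` for the fields the grader actually expects.

```python
import json
from pathlib import Path

from pydantic import BaseModel, ValidationError, field_validator


class StationReading(BaseModel):
    # Hypothetical fields; the real ones are described in models.py.
    station_id: str
    temperature_c: float

    @field_validator("temperature_c")
    @classmethod
    def temperature_in_range(cls, value: float) -> float:
        # Illustrative bounds only, not taken from the assignment.
        if not -90.0 <= value <= 60.0:
            raise ValueError("temperature_c outside plausible range")
        return value


def validate_batch(records):
    """Split raw dicts into validated models and error entries."""
    valid, errors = [], []
    for index, record in enumerate(records):
        try:
            valid.append(StationReading.model_validate(record))
        except ValidationError as exc:
            # Keep the raw record next to the messages so it can be fixed later.
            errors.append({
                "row": index,
                "record": record,
                "errors": [err["msg"] for err in exc.errors()],
            })
    return valid, errors


def write_error_report(errors, path="output/error_report.json"):
    """Write the error entries as a JSON list, creating the folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(errors, indent=2), encoding="utf-8")
```

Collecting errors instead of raising on the first bad row lets the valid records continue to the database while the invalid ones end up in the report.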

Run the pipeline

python3 -m pip install -r requirements.txt
python3 -m pipeline

Check your score locally

Run the same script that the auto-grader runs on every push to your PR:

bash .hyf/test.sh
cat .hyf/score.json

Scoring ladder (Tasks 1–6)

Points are awarded incrementally so partial work earns partial credit:

| Score | What the grader checks |
|---|---|
| 10/70 | All required files exist |
| 20/70 | `python3 -m pipeline` runs without crashing |
| 40/70 | `output/error_report.json` is a valid list with the right fields; `weather.db` has rows |
| 50/70 | Pipeline is idempotent: a second run leaves the same row count (upsert working) |
| 70/70 | Code uses: `@field_validator` + `@classmethod` in `models.py`, `?` placeholders in `database.py`, `ON CONFLICT` upsert in `database.py`, `time.sleep` backoff in `ingest_api.py` |
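The idempotency check and the `?` placeholder and `ON CONFLICT` requirements fit together, as the following minimal sketch with the standard-library `sqlite3` module tries to show. The table name `readings`, its columns, and the helper names are assumptions made for illustration; the real schema is described in `database.py`.

```python
import sqlite3


def upsert_readings(conn, rows):
    """Insert (station_id, temperature_c) rows, updating rows that already exist."""
    # Hypothetical table; the assignment's schema may differ.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            station_id    TEXT PRIMARY KEY,
            temperature_c REAL NOT NULL
        )
        """
    )
    # `?` placeholders let sqlite3 bind the values safely; never build the SQL
    # string from the data itself.
    conn.executemany(
        """
        INSERT INTO readings (station_id, temperature_c)
        VALUES (?, ?)
        ON CONFLICT(station_id) DO UPDATE
            SET temperature_c = excluded.temperature_c
        """,
        rows,
    )
    conn.commit()


def row_count(conn):
    """Return the number of rows currently in the readings table."""
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Calling `upsert_readings` twice with the same rows leaves `row_count` unchanged, because the second insert hits the primary-key conflict and updates the existing row instead of adding a new one. A plain `INSERT` would either fail on the second run or, without a key, double the row count.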

For instructors / track maintainers

This repo is the upstream template. At the start of each cohort, generate a cohort repo under HackYourAssignment (Use this template → Create a new repository, owner = HackYourAssignment, name = c<NN>-data-week3). Students fork that cohort repo and open PRs back to it; the auto-grader runs on every push.

Edits to the assignment, dataset, or grader belong here on the template — not on cohort copies.

👩‍🎓 Students: if you landed here, you are in the wrong place. Go to your cohort repo under HackYourAssignment. Your teacher posts the exact link in your cohort channel.
