
HackYourFuture/data-assignment-week-3


Data Track — Week 3 Assignment

Build a Validated Ingestion Pipeline · Total: 100 points · Passing: 60


Why no task folders?

Previous assignments split work across task-1/, task-2/, etc. This assignment drops that structure on purpose. Real Python projects typically keep related modules together at the root, and you navigate by reading the code, not by opening numbered folders.

Every file you need to touch is listed below, in the order you should work through them.


Where to start

Work through the files in this order. Each one maps to a task in the assignment chapter.

| Step | File | Task in the chapter | Points |
|---|---|---|---|
| 1 | models.py | Task 4: Pydantic Validation | |
| 2 | ingest_api.py | Task 1: Error Handling + Task 2: API Ingestion | |
| 3 | ingest_files.py | Task 3: File Reading | |
| 4 | validate.py | Task 4: Pydantic Validation | |
| 5 | database.py | Task 5: Database Storage | |
| 6 | pipeline.py | Task 6: Pipeline Orchestration | 70 total (Steps 1–6) |
| 7 | output/azure_compare.md | Task 7: Azure CLI + Portal | 15 |
| 8 | AI_DEBUG.md | Task 8: AI Debug Report | 15 |
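Step 1's model is where the @field_validator + @classmethod pattern from the scoring ladder lives. A minimal sketch, assuming Pydantic v2; the class name, the fields (station_id, temperature_c, humidity_pct) and the bounds are hypothetical, so take the real columns from data/weather_stations.csv and the real rules from the TODOs in models.py.

```python
from pydantic import BaseModel, field_validator


class WeatherReading(BaseModel):
    """Hypothetical schema for one station reading; rename fields to match the CSV."""

    station_id: str
    temperature_c: float
    humidity_pct: float

    @field_validator("station_id")
    @classmethod
    def station_id_not_blank(cls, value: str) -> str:
        # Strip whitespace so "  " is rejected and " ST01 " is normalised to "ST01".
        value = value.strip()
        if not value:
            raise ValueError("station_id must not be blank")
        return value

    @field_validator("temperature_c")
    @classmethod
    def temperature_in_range(cls, value: float) -> float:
        # Hypothetical plausibility bounds for surface air temperature.
        if not -90.0 <= value <= 60.0:
            raise ValueError(f"temperature_c {value} is outside -90..60")
        return value

    @field_validator("humidity_pct")
    @classmethod
    def humidity_is_percentage(cls, value: float) -> float:
        # Relative humidity must be a percentage.
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"humidity_pct {value} is outside 0..100")
        return value
```

In Pydantic's default (lax) mode, CSV strings such as "21.5" are converted to float before these validators run. A failing check raises ValidationError, which is a ValueError subclass, so validate.py can catch it row by row.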

Open each file and read the docstrings and TODO comments — they explain exactly what to implement. Start with models.py and ingest_api.py; pipeline.py is the last thing you wire together.
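For ingest_api.py, the grader looks for time.sleep backoff in the retry logic. Here is a sketch of exponential backoff. It assumes fetch_with_retry receives a zero-argument callable that performs the request. The real signature is whatever the docstring in ingest_api.py specifies, so treat these parameters as hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    request: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
) -> T:
    """Call request(), retrying failures with exponential backoff.

    Hypothetical signature. Waits base_delay * 2**attempt seconds between
    attempts (1 s, 2 s, 4 s with the defaults) and re-raises the last error
    once max_retries retries are used up.
    """
    attempt = 0
    while True:
        try:
            return request()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: let the caller see the real error
            time.sleep(base_delay * 2**attempt)
            attempt += 1
```

In ingest_api.py, the callable would wrap the actual HTTP request. urllib's URLError and HTTPError, as well as socket timeouts, are OSError subclasses, so the default retry_on catches them. Narrow it if, for example, a 404 should fail immediately instead of being retried.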


Repository layout

.
├── data/
│   └── weather_stations.csv        # input dataset — do not edit
├── output/
│   ├── azure_compare.md            # Task 7: fill in your 3 comparison sentences
│   └── azure_resource_groups.json  # Task 7: generated by your Python script
├── models.py          # Step 1 — Pydantic model (Task 4)
├── ingest_api.py      # Step 2 — fetch_with_retry + API call (Tasks 1–2)
├── ingest_files.py    # Step 3 — CSV reader (Task 3)
├── validate.py        # Step 4 — batch validation (Task 4)
├── database.py        # Step 5 — SQLite tables + upsert (Task 5)
├── pipeline.py        # Step 6 — orchestrator that calls everything (Task 6)
├── AI_DEBUG.md        # Step 8 — your debugging log (Task 8)
├── requirements.txt
├── .env.example
├── .hyf/
│   └── test.sh        # auto-grader — read this to see exactly what is checked
└── .github/workflows/
    └── grade-assignment.yml
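The tree notes that output/azure_resource_groups.json is generated by your Python script. One way to produce it is to call `az group list --output json` through subprocess and save the parsed result. This assumes the Azure CLI is installed and you have already run `az login`. The helper names and the injectable run parameter are hypothetical, so check the Task 7 instructions for what the file must contain.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Callable


def list_resource_groups(run: Callable[..., Any] = subprocess.run) -> list[dict[str, Any]]:
    """Hypothetical helper: return `az group list` output as Python objects.

    Requires a logged-in Azure CLI; `run` is injectable so the parsing can be
    tested without calling Azure.
    """
    # check=True raises CalledProcessError if az exits non-zero (e.g. not logged in).
    result = run(
        ["az", "group", "list", "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def save_resource_groups(groups: list[dict[str, Any]], path: Path) -> None:
    # Create output/ if it is missing, then write pretty-printed JSON.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(groups, indent=2), encoding="utf-8")
```

A script would then call `save_resource_groups(list_resource_groups(), Path("output/azure_resource_groups.json"))`.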

Files the pipeline generates at runtime (gitignored):

  • weather.db — SQLite database
  • output/error_report.json — invalid records from validation
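validate.py splits the raw rows into valid records and the error list that ends up in output/error_report.json. The sketch below assumes the parser is any callable that raises ValueError on a bad row. Pydantic v2's ValidationError is a ValueError subclass, so a model class works here. The report fields row, record and error are hypothetical; .hyf/test.sh shows the field names the grader actually checks.

```python
import json
from pathlib import Path
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def validate_batch(
    rows: list[dict[str, Any]], parse: Callable[..., T]
) -> tuple[list[T], list[dict[str, Any]]]:
    """Split rows into parsed records and error entries; one bad row never stops the batch."""
    valid: list[T] = []
    errors: list[dict[str, Any]] = []
    for index, row in enumerate(rows, start=1):  # index = 1-based position in the batch
        try:
            valid.append(parse(**row))
        except (ValueError, TypeError) as exc:
            # Hypothetical report fields; keep the original row so the report
            # shows exactly what was rejected.
            errors.append({"row": index, "record": row, "error": str(exc)})
    return valid, errors


def write_error_report(errors: list[dict[str, Any]], path: Path) -> None:
    # Always write a JSON list (possibly empty), the shape the grader checks.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(errors, indent=2, default=str), encoding="utf-8")
```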

Run the pipeline

From the repository root:

python3 -m pip install -r requirements.txt
python3 -m pipeline
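`python3 -m pipeline` runs pipeline.py as the `__main__` module, so the orchestrator usually lives in a main() function called under an `if __name__ == "__main__":` guard. Below is a sketch of the orchestration step, assuming each stage is passed in as a function. The names read_csv_rows (one option for Task 3, built on csv.DictReader) and run_pipeline, as well as the summary keys, are hypothetical. The real wiring follows the TODOs in pipeline.py.

```python
import csv
from pathlib import Path
from typing import Any, Callable

Row = dict[str, Any]


def read_csv_rows(path: Path) -> list[Row]:
    # Hypothetical Task 3 helper. newline="" lets the csv module handle line
    # endings itself, as its documentation recommends.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def run_pipeline(
    extract: Callable[[], list[Row]],
    validate: Callable[[list[Row]], tuple[list[Any], list[Row]]],
    load: Callable[[list[Any]], int],
    report: Callable[[list[Row]], None],
) -> dict[str, int]:
    """Hypothetical orchestrator: extract -> validate -> report/load, returning counts."""
    raw = extract()
    valid, errors = validate(raw)
    report(errors)        # e.g. write output/error_report.json
    stored = load(valid)  # e.g. upsert into weather.db, returning rows written
    return {"read": len(raw), "valid": len(valid), "invalid": len(errors), "stored": stored}
```

Passing the stages in keeps run_pipeline testable without a network or a database. main() can then plug in the real functions from ingest_api.py, ingest_files.py, validate.py and database.py.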

Check your score locally

Run the same script the auto-grader runs on every push to your PR:

bash .hyf/test.sh
cat .hyf/score.json

Scoring ladder (Tasks 1–6)

Points are awarded incrementally so partial work earns partial credit:

| Score | What the grader checks |
|---|---|
| 10/70 | All required files exist |
| 20/70 | python3 -m pipeline runs without crashing |
| 40/70 | output/error_report.json is a valid list with the right fields; weather.db has rows |
| 50/70 | Pipeline is idempotent: a second run leaves the same row count (upsert working) |
| 70/70 | Code uses: @field_validator + @classmethod in models.py, ? placeholders in database.py, ON CONFLICT upsert in database.py, time.sleep backoff in ingest_api.py |
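The 50/70 and 70/70 checks both depend on database.py. Every value must go through ? placeholders, and the insert must be an INSERT ... ON CONFLICT ... DO UPDATE upsert, so a second run updates existing rows instead of adding duplicates. Here is a sketch using the standard-library sqlite3 module. The table name, the columns and the choice of (station_id, observed_at) as the conflict key are hypothetical. The ON CONFLICT upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3
from typing import Iterable


def create_tables(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the composite primary key is what ON CONFLICT targets.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            station_id    TEXT NOT NULL,
            observed_at   TEXT NOT NULL,
            temperature_c REAL,
            PRIMARY KEY (station_id, observed_at)
        )
        """
    )


def upsert_readings(
    conn: sqlite3.Connection, rows: Iterable[tuple[str, str, float]]
) -> None:
    # ? placeholders let sqlite3 bind values safely; never format values into the SQL.
    # On a key clash, overwrite the measurement instead of inserting a duplicate.
    conn.executemany(
        """
        INSERT INTO readings (station_id, observed_at, temperature_c)
        VALUES (?, ?, ?)
        ON CONFLICT (station_id, observed_at)
        DO UPDATE SET temperature_c = excluded.temperature_c
        """,
        rows,
    )
    conn.commit()
```

Upserting the same batch twice leaves the row count unchanged, which is the property the idempotency check compares.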

For instructors / track maintainers

This repo is the upstream template. At the start of each cohort, generate a cohort repo under HackYourAssignment (Use this template → Create a new repository, owner = HackYourAssignment, name = c<NN>-data-week3). Students fork that cohort repo and open PRs back to it; the auto-grader runs on every push.

Edits to the assignment, dataset, or grader belong here on the template — not on cohort copies.

👩‍🎓 Students: if you landed here, you are in the wrong place. Go to your cohort repo under HackYourAssignment. Your teacher posts the exact link in your cohort channel.

About

HackYourFuture data track week 3 assignment files
