This repository contains a demo and practice set for a short lesson on importing different data types into Python (primarily with Pandas).
importing-data-class/
├─ data/
│ ├─ demo/ # instructor-led
│ └─ practice/ # student exercise
├─ notebooks/
│ ├─ 01_demo.ipynb
│ └─ 02_exercises.ipynb
├─ src/
│ └─ util.py
├─ requirements.txt
└─ README.md
uv init
uv add ipykernel pandas lxml openpyxl

- CSV/TSV: `pd.read_csv` with `sep`, `usecols`, `dtype`, `nrows`, `compression`.
- Excel: `pd.read_excel`, one sheet vs `sheet_name=None` for all sheets.
- JSON: Python's `json` + `pd.json_normalize` for nested data.
- Plain text: `open()` / context manager & `.read()` vs `.readlines()`.
- HTML tables: `pd.read_html()` returns a list of DataFrames; select one by index or with `match=` / `attrs=`.
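
As a rough illustration of the CSV/TSV and JSON items above, the sketch below reads a small tab-separated string with `pd.read_csv` and flattens a nested JSON document with `pd.json_normalize`. The inline data, column names, and variable names are made up for this example and are not taken from the files in `data/`.

```python
import io
import json

import pandas as pd

# Made-up tab-separated data (not from data/); StringIO stands in for a file path.
tsv_text = "city\tpm25\tnote\nOslo\t7.5\tok\nLima\t21.0\tok\nPune\t48.2\thigh\n"

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",                    # tab-delimited instead of the default comma
    usecols=["city", "pm25"],    # keep only these columns
    dtype={"pm25": "float64"},   # set the column type explicitly
    nrows=2,                     # read only the first two data rows
)

# Made-up nested JSON: each order holds a list of items.
json_text = """
[
  {"order": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
  {"order": 2, "items": [{"sku": "c", "qty": 5}]}
]
"""
orders = json.loads(json_text)

# record_path names the nested list to expand (one row per item);
# meta copies parent-level fields onto each of those rows.
items = pd.json_normalize(orders, record_path="items", meta=["order"])

print(df)
print(items)
```

When reading from a path rather than an in-memory buffer, pandas infers gzip compression from a `.gz` extension by default, so `compression="gzip"` is only needed when the extension does not give it away.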
Open notebooks/02_exercises.ipynb and complete the TODO cells:
- Load `practice/tsv/air_quality.tsv` as a DataFrame (tab-delimited), set `date` as the index.
- Read `practice/json/events.json`, flatten attendees to one row per person.
- Count lines that contain `"ERROR"` in `practice/text/log.txt`.
- Read all tables from `practice/html/wiki_table.html` and select the one with id `awards`.
- Load `practice/csv_gz/movies.csv.gz` (gzipped) and compute the average `imdb_rating`.
- (Optional) Read `practice/excel/sales_regions.xlsx` and join both sheets on `region`.
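
For the plain-text task, the general pattern can be sketched on a throwaway file. The file contents and the `WARN` marker below are invented for the example; the practice log is not used here, so the exercise itself is left to do.

```python
import tempfile
from pathlib import Path

# Write a small invented log to a temporary directory so the sketch is self-contained.
tmp_dir = tempfile.TemporaryDirectory()
log_path = Path(tmp_dir.name) / "sample.log"
log_path.write_text(
    "INFO start\nWARN disk low\nINFO tick\nWARN retrying\n", encoding="utf-8"
)

# .read() returns the whole file as one string.
with open(log_path, encoding="utf-8") as fh:
    whole_text = fh.read()

# .readlines() returns a list with one string per line (newlines kept).
with open(log_path, encoding="utf-8") as fh:
    lines = fh.readlines()

# Iterating over the file object avoids loading every line at once.
with open(log_path, encoding="utf-8") as fh:
    warn_count = sum(1 for line in fh if "WARN" in line)

print(len(whole_text), len(lines), warn_count)

tmp_dir.cleanup()
```

The context manager (`with`) closes the file even if an error occurs partway through, which is why each read above opens the file in its own block.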
- If Excel engines are unavailable in your environment, the Excel files may be omitted; install `openpyxl` to enable reading/writing `.xlsx`.
- Everything is small and self-contained so it works without internet access.
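
One way to check ahead of time whether the optional parsers are installed is to probe for the modules without importing them. This is a sketch using the standard library's `importlib.util.find_spec`; the helper name `has_module` is hypothetical and is not part of `src/util.py`.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return find_spec(name) is not None


# openpyxl backs .xlsx reading in pandas; lxml is one parser pd.read_html can use.
for module_name in ("openpyxl", "lxml"):
    status = "available" if has_module(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If `openpyxl` is reported as missing, the optional Excel exercise can be skipped or the package added with `uv add openpyxl`.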