Four Python Exercises That Actually Teach DS/ML Fundamentals
This was a fun exploration of ML concepts. I spent the entire session studying the concept of a Confusion Matrix, which I hadn’t encountered before. The other three exercises — descriptors, context managers, and lazy pipelines — were written by Claude for me to review, and since I’m more familiar with those concepts it was easier to do a quick read-through.
Claude did a nice job in this session of being “restrictive” in the service of deeper learning — it repeatedly asked me to write code and understand concepts rather than just handing over solutions. The simple prompt I used could be a useful template, and potentially a good starting point for building a learning tool:
“This is a coding collab to learn about concepts for DS and ML in Python. Walk us through implementation of the exercises from 01 to 04. Because it’s a learning exercise the goal is process, not product. We will ask questions during implementation. Move slowly for learning purposes.”
Today’s Qwasar session covered four exercises from a graduate DS/ML Python module. The framing was deliberately slow — process over product. Each exercise targets a Python concept that shows up constantly in real ML work but often gets cargo-culted without being understood.
Technical Details
The Four Exercises
| # | Exercise | Core Python Concept | ML Relevance |
|---|---|---|---|
| 01 | Vectorized Confusion Matrix | NumPy fancy indexing, np.add.at | Model evaluation |
| 02 | Feature Schema Descriptor | Python descriptors, __set_name__ | Feature validation |
| 03 | Experiment Tracker | Context managers, __enter__/__exit__ | Training observability |
| 04 | Lazy Dataset Pipeline | Generators, lazy evaluation | Memory-efficient data loading |
All four solutions live at qwasar_mscs_25-26/04_11_26/ with accompanying tests.
01 — Vectorized Confusion Matrix
The exercise requires building a confusion_matrix(y_true, y_pred, labels) function using only NumPy vectorized operations — no Python loops, no sklearn.
The conceptual challenge here isn’t the math — it’s the mental model. A confusion matrix is a grid where C[i][j] answers: “how many times did the model predict class j when the truth was class i?” Every sample in the dataset has exactly one true label and one predicted label, which together form a (row, col) coordinate pointing to exactly one cell.
               Predicted
             0    1    2
           ┌────┬────┬────┐
    True 0 │  2 │  1 │  0 │
           ├────┼────┼────┤
         1 │  0 │  1 │  1 │
           ├────┼────┼────┤
         2 │  0 │  1 │  1 │
           └────┴────┴────┘
The key insight for vectorization: use np.searchsorted to convert label values into their positions in the labels array (all at once, no loop — note that searchsorted assumes the labels array is sorted), then use np.add.at to increment every (row, col) coordinate simultaneously.
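A minimal sketch of the vectorized approach (the function name matches the exercise; the sorted-labels assumption is mine, since np.searchsorted requires a sorted array):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true labels, columns = predicted labels.

    Sketch assumes `labels` is sorted; np.searchsorted needs that
    to map label values to positions correctly.
    """
    labels = np.asarray(labels)
    # Convert label *values* to *positions* in `labels`, all at once.
    rows = np.searchsorted(labels, y_true)
    cols = np.searchsorted(labels, y_pred)
    C = np.zeros((len(labels), len(labels)), dtype=np.int64)
    # Unbuffered add: repeated (row, col) pairs each land.
    np.add.at(C, (rows, cols), 1)
    return C
```

Feeding it the seven samples behind the grid above — true labels [0, 0, 0, 1, 1, 2, 2] against predictions [0, 0, 1, 1, 2, 1, 2] — reproduces the 3×3 grid exactly.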
Why np.add.at instead of C[row_idx, col_idx] += 1? NumPy’s += buffers writes — if the same cell appears twice, only one increment lands. np.add.at is the unbuffered version that handles repeated indices correctly.
A revealing test: passing labels [10, 20, 30] instead of [0, 1, 2] exposes implementations that conflate label values with matrix indices. A naive C[y_true[i]][y_pred[i]] += 1 would try to index row 10 of a 3×3 matrix and crash. searchsorted converts the values to positions first, so label values are decoupled from matrix indices.
02 — Feature Schema Descriptor
Python descriptors are one of the language’s more powerful and underused features. A descriptor is any class that defines __get__, __set__, or __delete__ — and when used as a class-level attribute, it intercepts reads and writes to that attribute on any instance.
The ML use case: enforce valid ranges on feature fields without writing validation code in every setter.
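A sketch of the pattern — the class names (BoundedFeature, FeatureVector) are my stand-ins for whatever the exercise calls them:

```python
class BoundedFeature:
    """Descriptor that rejects writes outside [low, high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def __set_name__(self, owner, name):
        # Called once at class creation; captures the attribute name.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class, not an instance
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if not (self.low <= value <= self.high):
            raise ValueError(f"{self.name}={value!r} outside [{self.low}, {self.high}]")
        # Store on the *instance*, not on the shared descriptor.
        obj.__dict__[self.name] = value


class FeatureVector:
    age = BoundedFeature(0, 120)
    score = BoundedFeature(0.0, 1.0)
```

One descriptor instance per class attribute, declared once, and every FeatureVector gets validated reads and writes for free.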
The classic pitfall is storing the value as self.value = ... on the descriptor instance. That works for one object, but because descriptors live at the class level, every FeatureVector instance would share the same descriptor — the last write wins. Storing in obj.__dict__[self.name] pushes the value into each instance’s own namespace.
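The failure mode is easy to reproduce. This deliberately buggy sketch stores state on the descriptor itself:

```python
class SharedFeature:
    """Buggy on purpose: state lives on the descriptor, which the
    whole class shares, so every instance sees the last write."""

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        self.value = value  # one slot shared by all instances


class Vec:
    x = SharedFeature()


a, b = Vec(), Vec()
a.x = 1
b.x = 2
print(a.x, b.x)  # 2 2 -- a's write was clobbered by b's
```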
03 — Experiment Tracker via Context Manager
Context managers formalize the pattern of “do something before, do something after, always clean up.” Python guarantees __exit__ runs even if an exception fires inside the block — which makes them ideal for ML experiment tracking.
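A sketch of the shape such a tracker can take (class and method names are mine, not necessarily the exercise's):

```python
import time


class ExperimentTracker:
    """Context manager that times a run and records its outcome."""

    def __init__(self, name):
        self.name = name
        self.metrics = {}
        self.status = None

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def log(self, **metrics):
        self.metrics.update(metrics)

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.duration = time.perf_counter() - self.start
        # exc_type is None on clean exit, the exception class on failure.
        self.status = "failed" if exc_type else "completed"
        return False  # propagate any exception after recording it
```

The duration and status get recorded whether the training block succeeds or raises, because Python guarantees `__exit__` runs either way.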
The three __exit__ parameters (exc_type, exc_val, exc_tb) are None on clean exit and populated on failure. Returning False (or None) tells Python to re-raise the exception after cleanup; returning True suppresses it entirely.
The test that matters most:
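In spirit, that test looks like this — a self-contained sketch with a minimal stand-in tracker, not the exercise's actual suite:

```python
class Tracker:
    """Minimal stand-in: just enough to show the guarantee."""

    def __enter__(self):
        self.status = None
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.status = "failed" if exc_type else "completed"
        return False  # do not swallow the exception


def test_exit_runs_even_on_failure():
    tracker = Tracker()
    try:
        with tracker:
            raise ValueError("training diverged")
    except ValueError:
        pass  # the exception propagated, because __exit__ returned False
    # __exit__ still ran and recorded the failure:
    assert tracker.status == "failed"


test_exit_runs_even_on_failure()
```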
04 — Lazy Dataset Pipeline
The final exercise captures why generators matter in ML: large datasets don’t fit in RAM. A pipeline that eagerly materializes every transformation (map, filter) wastes memory proportional to the full dataset. A lazy pipeline holds only one element (or one batch) in memory at a time.
The pattern: each pipeline method wraps the current source in a new generator and returns a fresh DataPipeline. The generator body doesn’t execute until someone iterates.
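A sketch of that pattern — the class name and method set (map, filter, batch) are my guesses at the exercise's API:

```python
class DataPipeline:
    """Lazy pipeline: each transform wraps the source in a new generator."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        # Delegate iteration to whatever source we wrap.
        return iter(self.source)

    def map(self, fn):
        def gen():
            for item in self.source:
                yield fn(item)
        return DataPipeline(gen())  # gen() creates but does not run the body

    def filter(self, pred):
        def gen():
            for item in self.source:
                if pred(item):
                    yield item
        return DataPipeline(gen())

    def batch(self, size):
        def gen():
            buf = []
            for item in self.source:
                buf.append(item)
                if len(buf) == size:
                    yield buf
                    buf = []
            if buf:
                yield buf  # final partial batch
        return DataPipeline(gen())
```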
Chaining works because each method returns a DataPipeline, and DataPipeline.__iter__ delegates to whatever generator was passed as source:
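Stripped of any class wrapper, the same chain is just nested generators, each consuming the one before it:

```python
def squares(src):
    for x in src:
        yield x * x

def keep_even(src):
    for x in src:
        if x % 2 == 0:
            yield x

# Each stage's source is the previous stage's generator object.
stage1 = squares(range(6))
stage2 = keep_even(stage1)
print(list(stage2))  # [0, 4, 16]
```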
A laziness test proves nothing runs eagerly:
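The idea of that test, as a self-contained sketch: instrument the source with a side effect, then verify the side effect has not fired until iteration begins.

```python
consumed = []

def instrumented_source():
    for i in range(5):
        consumed.append(i)   # side effect: record each pull
        yield i

def doubled(src):
    for x in src:
        yield 2 * x

pipeline = doubled(instrumented_source())
assert consumed == []        # building the pipeline pulled nothing

first = next(pipeline)
assert first == 0
assert consumed == [0]       # exactly one element was consumed
```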
Claude’s Perspective
Note: These observations are verbatim as generated and were not edited by a human.
What made this session worth writing about is the pacing. The instruction was explicit: move slowly, prioritize understanding. That changed the texture of the work considerably. Rather than shipping four solutions and moving on, we spent most of the time on Exercise 01 — specifically on building the mental model for NumPy fancy indexing before writing a single line of solution code.
The confusion matrix is a good pedagogical stress test because it’s simultaneously familiar (every ML practitioner has seen one) and opaque (most people use sklearn.metrics.confusion_matrix without thinking about what’s happening underneath). The challenge of explaining fancy indexing without loops forced a bottom-up reconstruction: what does a single index do, what does an array of indices do, what does a pair of index arrays do, and finally — what does np.add.at do that += doesn’t? Each step was necessary before the next made sense.
The grid visualization was the turning point. Explaining that each sample “votes” for exactly one cell by providing a (row, col) coordinate made np.add.at(C, (row_idx, col_idx), 1) readable as a statement of intent rather than an incantation. That’s the difference between code that’s been understood and code that’s been copied.
The three remaining exercises are technically interesting in their own right. The descriptor pattern in Exercise 02 is one of those Python features that feels like magic until you trace through __get__ and __set__ — at which point it becomes obvious and you start seeing legitimate uses everywhere (form validation, ORMs, dataclasses under the hood). The obj.__dict__[self.name] storage pattern is subtle enough that it’s worth having as a remembered fact rather than re-deriving each time.
Exercise 04’s laziness test is my favorite of the test suite. It proves a behavioral property — “nothing runs until you iterate” — by observing a side effect (the consumed list). That’s a clean testing pattern for lazy evaluation in general: if you want to prove something is lazy, instrument the source and verify the instrumentation hasn’t fired.
What I can’t know: which of these four concepts felt genuinely new versus familiar-but-fuzzy. The session transcript shows the most friction on the confusion matrix, but that could mean “this was the hardest concept” or “this was explained most carefully because it was first.” The pacing instruction suggests the goal is durable understanding, not throughput — which is the right call for foundational material that everything else builds on.
Built with Claude Code in a slow, deliberate pair programming session