Did It Actually Learn?
You trained a model last time — but is it any good? Here’s a trap: if you test it on the exact penguins it learned from, of course it does well. That’s like giving a student the test answers to study, then giving them the same test. Real testing uses questions the model hasn’t seen. Today you learn the honest way to measure a model: split your data, and check its accuracy.
💡 Working in Colab? scikit-learn is already installed.
Split: study set and test set
The trick is to hold some examples back. The model learns from a training set and is tested on a separate test set it never saw. train_test_split does this for you:
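import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
# Load the penguins and drop any rows with missing measurements
penguins = sns.load_dataset("penguins").dropna()
# X holds the measurements (features); y holds the answer (species)
X = penguins[["bill_length_mm", "flipper_length_mm", "body_mass_g"]]
y = penguins["species"]
# Hold back 20% for testing; random_state=0 gives the same split on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)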
test_size=0.2 keeps 20% of the penguins aside for testing and trains on the other 80%. Now the test is fair — the model has never seen those penguins.
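To see the 80/20 split in action, here is a small sketch. It uses a made-up list of 100 numbered "penguins" (the name penguin_ids is just for illustration, not part of the lesson's data) so the counts are easy to check by eye.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 numbered "penguins" instead of the real table.
penguin_ids = list(range(100))

# Same settings as the lesson: hold back 20%, fix the shuffle with random_state=0.
train_ids, test_ids = train_test_split(penguin_ids, test_size=0.2, random_state=0)

print("Training examples:", len(train_ids))
print("Test examples:", len(test_ids))

# Check that no penguin landed in both sets.
overlap = set(train_ids) & set(test_ids)
print("Shared between the two sets:", len(overlap))
```

With 100 examples you should see 80 for training, 20 for testing, and 0 shared. The real penguin table has a different number of rows, so its counts will differ, but the proportions work the same way.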
Train, then score
Train on the training set, then ask for the accuracy on the test set:
model = KNeighborsClassifier()
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
.score(...) returns the fraction the model got right on the unseen penguins, a number between 0 and 1. A score of 0.95 means it correctly identified 95% of the test penguins (95 out of every 100). Run it and see your number.
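If you are curious what "fraction right" looks like as arithmetic, here is a rough sketch done by hand. The two lists are made-up answers for five imaginary test penguins, not real model output, and this is only the idea behind the score, not scikit-learn's own code.

```python
# Hypothetical answers for five test penguins (made up for illustration).
true_species = ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"]
predicted_species = ["Adelie", "Gentoo", "Adelie", "Adelie", "Gentoo"]

# Count how many guesses match the true answer.
correct = sum(1 for truth, guess in zip(true_species, predicted_species) if truth == guess)

# Accuracy = number correct divided by number of test examples.
accuracy = correct / len(true_species)

print("Correct:", correct, "out of", len(true_species))
print("Accuracy:", accuracy)
```

Here the imaginary model misses one penguin (it calls a Chinstrap an Adelie), so it gets 4 out of 5 right and the accuracy is 0.8.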
What accuracy means
- 1.0 = perfect (got every test example right).
- 0.5 on a two-way choice = no better than a coin flip.
- Higher is better, but perfect is rare and a little suspicious with real data.
The honest score comes from the test set. A model that aces its training examples but flops on new ones hasn’t really learned the pattern — it just memorized. Testing on unseen data catches that.
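Here is a small sketch of a model that memorizes. It makes two assumptions that are not part of the lesson: the data is randomly generated (not the penguins), and n_neighbors=1 is chosen on purpose so the model simply copies the label of the single closest training example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Made-up data (not the penguins): two measurements per example, with noisy labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# With n_neighbors=1, every training example is its own closest neighbor,
# so the model can "look up" the answer instead of learning a pattern.
memorizer = KNeighborsClassifier(n_neighbors=1)
memorizer.fit(X_train, y_train)

train_accuracy = memorizer.score(X_train, y_train)
test_accuracy = memorizer.score(X_test, y_test)
print("Accuracy on training data:", train_accuracy)
print("Accuracy on test data:", test_accuracy)
```

The training score comes out as a perfect 1.0, because the model is being asked about examples it has stored. The test score is the honest one, and on noisy data like this you should expect it to be noticeably lower.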
Try it 🎯
- Change test_size=0.2 to 0.3 (hold back 30%). Does the accuracy change much?
- Add a fourth feature: include "bill_depth_mm" in X. Retrain and check the accuracy.
Think about it 🔮
Why is it unfair to test the model on the same penguins it trained on? (Because it could just memorize them and look perfect, without learning a pattern that works on new penguins. The test set checks real understanding, not memory.)
Fix the bug 🐞
This trains and tests, but on the same data — so the score is misleadingly high. Fix it to test on the held-out set:
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_train, y_train))
(It’s scoring on X_train — the data it studied. Score on the unseen test set instead: model.score(X_test, y_test).)
Your mission 🚀
Train the penguin model with a train/test split and print its accuracy. Then try to improve the score: experiment with different features in X (add or remove measurements) and a different test_size. Note which combination gives the best honest accuracy.
What you learned today
- Never test a model on the data it trained on — that’s cheating.
- train_test_split holds back a test set the model never sees.
- model.score(X_test, y_test) gives accuracy: the fraction right on unseen data.
- Higher is better; suspiciously perfect usually means something’s off.
Next time you build a complete “guesser” from scratch — train, test, and try it on your own examples. 🔮