Project: Build a Guesser
You now know every step of training a model. Today you put them together into a complete guesser — the standard machine-learning recipe, start to finish — and then make it your own.
💡 Work in Colab. The recipe is always the same five steps.
The five-step recipe
Every supervised machine-learning project follows the same shape. Here it is, end to end, predicting penguin species:
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
# 1. Get the data
penguins = sns.load_dataset("penguins").dropna()
# 2. Pick features (X) and the label (y)
X = penguins[["bill_length_mm", "flipper_length_mm", "body_mass_g"]]
y = penguins["species"]
# 3. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# 4. Train the model
model = KNeighborsClassifier()
model.fit(X_train, y_train)
# 5. Check how well it learned
print("Accuracy:", model.score(X_test, y_test))
Read it as a story: get data → choose clues and answer → hold some back → learn → test. That five-step shape is the backbone of almost every supervised machine-learning project.
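Step 3 is the easiest one to skim past, so here is a small sketch of what train_test_split does on its own. The ten rows are made-up placeholders, not real measurements. The point is only to show that test_size=0.2 holds back one row in five, and that a fixed random_state gives the same split every time.

```python
from sklearn.model_selection import train_test_split

# Made-up placeholder data: 10 examples with one feature each
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), "training rows,", len(X_test), "test rows")

# Same random_state, same split: the held-back rows do not change between runs
X_train_again, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
print("Same test rows both times:", X_test == X_test_again)
```

Because the model never sees the held-back rows while it learns, the score on them is an honest check of how it handles examples it has not met before.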
Use your guesser
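Once the model is trained, it can predict new penguins. Give the measurements in the same order as the training columns (bill length, flipper length, body mass), and remember to pass a list of lists, even for a single penguin: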
new_penguins = [
[39, 190, 3700],
[48, 215, 5000],
[46, 200, 4200],
]
print(model.predict(new_penguins))
It prints a species for each set of measurements. You built a working classifier!
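Here is a small sketch of why the double brackets matter. It uses scikit-learn's bundled iris flowers as a stand-in for the penguins (an assumption, chosen so the example runs without a download), and it skips the split because it only demonstrates the input shape. It also wraps the new example in a DataFrame with the training column names; recent scikit-learn versions may print a feature-names warning when a model trained on a DataFrame is given a plain list, and this is one way to avoid it.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: iris flowers instead of penguins
iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]]
y = iris.target_names[iris.target.to_numpy()]  # species names as strings

# No train/test split here: we only want to look at the shape predict() expects
model = KNeighborsClassifier().fit(X, y)

# One example still needs two sets of brackets: one row, two columns
one_flower = pd.DataFrame([[1.4, 0.2]], columns=X.columns)
prediction = model.predict(one_flower)
print(prediction)

# A flat list is rejected, because the model expects rows of examples
try:
    model.predict([1.4, 0.2])
    flat_list_rejected = False
except ValueError:
    flat_list_rejected = True
print("Flat list rejected:", flat_list_rejected)
```

The same idea carries over to the penguin model: build the DataFrame with the three penguin column names, in the order used for training.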
Make it your own 🚀
Now build a guesser of your own choosing. Options:
- Different label: predict sex instead of species (set y = penguins["sex"]). Is it easier or harder? (Check the accuracy!)
- Different features: try predicting species from just ["flipper_length_mm", "body_mass_g"]. Does accuracy drop?
- Different dataset: load tips and predict day from total_bill and size, or titanic and predict survived. (Pick number columns for the features.)
For any choice, follow the same five steps, print the accuracy, and predict a few of your own examples.
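If you want to try several choices side by side, a loop can run the recipe once per feature set. This is only a sketch: it uses scikit-learn's bundled iris flowers as a stand-in dataset (an assumption, so it runs without a download), and the names feature_sets and results are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Steps 1-2: get the data and pick the label (iris stands in for penguins)
iris = load_iris(as_frame=True)
y = iris.target

# Different choices of "clues" to compare
feature_sets = {
    "sepal only": ["sepal length (cm)", "sepal width (cm)"],
    "petal only": ["petal length (cm)", "petal width (cm)"],
    "all four": list(iris.data.columns),
}

results = {}
for name, columns in feature_sets.items():
    X = iris.data[columns]
    # Steps 3-5: split, train, score (same random_state, so the same rows are held back)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = KNeighborsClassifier().fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)
    print(f"{name}: {results[name]:.2f}")
```

Swapping in the penguins, a different label, or another dataset only changes the first two steps; the loop body stays the same.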
# your guesser — pick a dataset, features, and label, then run the 5 steps
Think about it 🔮
You try predicting sex from penguin measurements and the accuracy is much lower than predicting species. What might that tell you? (That measurements separate the species clearly, but males and females overlap a lot — the features just don’t carry that pattern as strongly. Low accuracy is a clue about the data, not a failure on your part.)
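One way to look for that kind of overlap yourself is to compare the range of each measurement within each group. The sketch below again uses scikit-learn's bundled iris flowers as a stand-in (an assumption, so it runs without a download). With penguins you would group by species or sex and pick penguin columns instead.

```python
from sklearn.datasets import load_iris

# Stand-in data: iris flowers, with species names added as a readable label column
iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = iris.target_names[iris.target.to_numpy()]

# Smallest and largest value of each measurement within each species
columns = ["sepal width (cm)", "petal length (cm)"]
ranges = df.groupby("species")[columns].agg(["min", "max"])
print(ranges)
```

When two groups' ranges for a column do not overlap at all, that column separates them well on its own. When the ranges sit on top of each other, the model has little to go on from that column. In the iris data, petal length keeps setosa and versicolor fully apart, while their sepal widths overlap.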
What you learned today
- Every supervised ML project follows the five-step recipe: data → features & label → split → fit → score.
- Once trained, model.predict([[...]]) guesses new examples.
- Changing the label, the features, or the dataset changes how well the model can do.
- A low accuracy is information: it suggests the features don’t capture that pattern well.
Next time we tackle the kind of AI that started it all: recognizing images — and you’ll discover that, to a computer, a picture is just a grid of numbers. 🔢