Test It, See the Mistakes, Make It Better
Your digit recognizer scores high — but a number isn’t the whole story. A real machine-learning engineer looks at what the model got wrong and asks how to do better. Today you’ll watch your AI make predictions, spot its mistakes, and try to improve it.
💡 In Colab. Start by training the model from last lesson:
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)
Watch it predict
Ask the model to label the test images, then look at a few — the picture, what the AI guessed, and the true answer:
predictions = model.predict(X_test)
for i in range(5):
    plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
    plt.title("AI guessed " + str(predictions[i]) + ", really " + str(y_test[i]))
    plt.show()
Each test example is 64 numbers; .reshape(8, 8) folds them back into an 8×8 picture so you can see what the AI saw.
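To see the folding on its own, here is a minimal sketch that uses the numbers 0 to 63 in place of real pixel values. The names flat_row and picture are made up for this example.

```python
import numpy as np

# A stand-in for one test example: 64 numbers in a single row.
# (Real examples hold pixel brightness; 0..63 just makes the folding easy to follow.)
flat_row = np.arange(64)

# Fold the row into 8 rows of 8, the same call the lesson uses.
picture = flat_row.reshape(8, 8)

print(picture.shape)  # (8, 8)
print(picture[0])     # the first 8 numbers become the top row
print(picture[1])     # the next 8 numbers become the second row
```

Notice that nothing is changed or lost: reshaping only rearranges the same 64 numbers into a grid.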
Hunt for the mistakes
Mostly it’s right — but find the ones it missed:
for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
        plt.title("AI said " + str(predictions[i]) + ", but it's " + str(y_test[i]))
        plt.show()
Look closely at the ones it got wrong. Often you’d struggle too — a sloppy 4 that looks like a 9, a 3 that could be an 8. The AI’s mistakes usually happen on genuinely confusing, messy digits. That’s reassuring: it’s failing where the data is hard, not randomly.
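One way to check which digits get confused with which is to count the mix-ups as pairs. This is only a sketch: count_mix_ups is a made-up helper (not part of scikit-learn), and the toy lists below are invented so the example runs by itself.

```python
from collections import Counter

def count_mix_ups(true_labels, predicted_labels):
    """Count how often each (true digit, guessed digit) mistake happens."""
    mix_ups = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            mix_ups[(truth, guess)] += 1
    return mix_ups

# Invented labels, only to show what the output looks like.
toy_truth = [4, 4, 3, 9, 8, 4, 3]
toy_guess = [9, 4, 8, 9, 8, 9, 3]

mix_ups = count_mix_ups(toy_truth, toy_guess)
for (truth, guess), count in mix_ups.most_common():
    print("A", truth, "was mistaken for a", guess, count, "time(s)")
```

In your notebook you could call count_mix_ups(y_test, predictions) instead of using the toy lists. If a few pairs show up again and again, the mistakes are clustering on look-alike digits rather than landing at random.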
Try it 🎯
How many did it get wrong out of the test set? Count them:
wrong = 0
for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        wrong = wrong + 1
print("Got", wrong, "wrong out of", len(X_test))
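The count of mistakes and the accuracy score are two views of the same thing. Here is a small sketch of the arithmetic, with a made-up helper called accuracy_from_wrong and invented numbers (6 mistakes out of 360 images) purely for illustration.

```python
def accuracy_from_wrong(wrong, total):
    """Turn a count of mistakes into an accuracy between 0 and 1."""
    return (total - wrong) / total

# Invented numbers for illustration: 6 mistakes on 360 test images.
example_accuracy = accuracy_from_wrong(6, 360)
print("Accuracy:", round(example_accuracy, 3))
```

In your notebook, accuracy_from_wrong(wrong, len(X_test)) should match what model.score(X_test, y_test) reports, since both measure the fraction of test images labeled correctly.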
Try to do better
Two simple ways to push accuracy up:
# 1. Try a different number of neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("With 3 neighbors:", model.score(X_test, y_test))
# 2. Give it MORE training data (smaller test set), still using 3 neighbors
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
model.fit(X_train, y_train)
print("With more training data:", model.score(X_test, y_test))
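More examples and the right settings usually help, though not every time. Keep in mind that a smaller test set also makes the score jumpier, because each single mistake counts for more. This tuning, asking “what makes it better?”, is a big part of real machine-learning work.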
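Instead of changing n_neighbors by hand, you could try several values in a loop and compare. This is a sketch under some assumptions: the list of values to try and the names scores and best_k are made up for this example, and the data is reloaded so the block runs by itself.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

scores = {}
for k in [1, 3, 5, 7, 9]:  # values picked only for illustration
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)
    print("n_neighbors =", k, "-> accuracy", round(scores[k], 3))

# Pick the setting with the highest score in this run.
best_k = max(scores, key=scores.get)
print("Best setting in this run:", best_k)
```

The differences between settings may be tiny, sometimes just one or two images, so treat the "best" value as a hint rather than a final answer.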
Think about it 🔮
The AI’s mistakes are mostly messy, hard-to-read digits. Why is that actually a good sign? (It means the model learned the real pattern — it only trips on genuinely ambiguous cases, the same ones that would fool a person, rather than failing on clear, easy digits.)
Your mission 🚀
Investigate your recognizer: show 5 of its mistakes as pictures, count how many it got wrong total, then try to beat your accuracy by changing n_neighbors and test_size. Record your best score and what settings produced it.
What you learned today
- Don’t stop at the score — look at the actual mistakes.
- .reshape(8, 8) turns a row of pixels back into a viewable image.
- A good model’s errors cluster on genuinely hard, messy examples.
- More data and better settings (n_neighbors) can improve accuracy. Tuning is real ML work.
Next time is the capstone — train a model of your own, and have an important conversation about using AI wisely. 🤖