Jul 20, 2026

Test It, See the Mistakes, Make It Better

Your digit recognizer scores high — but a number isn’t the whole story. A real machine-learning engineer looks at what the model got wrong and asks how to do better. Today you’ll watch your AI make predictions, spot its mistakes, and try to improve it.

💡 In Colab. Start by training the model from last lesson:

from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

Watch it predict

Ask the model to label the test images, then look at a few — the picture, what the AI guessed, and the true answer:

predictions = model.predict(X_test)

for i in range(5):
    plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
    plt.title("AI guessed " + str(predictions[i]) + " — really " + str(y_test[i]))
    plt.show()

Each test example is 64 numbers; .reshape(8, 8) folds them back into an 8×8 picture so you can see what the AI saw.
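To see the folding on its own, here is a small sketch that uses made-up pixel values (the numbers 0 through 63) instead of a real digit, so you can track where each number lands. The names flat and picture are just for this illustration.

```python
import numpy as np

# A stand-in for one test example: 64 values in a single row.
# (Made-up numbers 0..63, not a real digit image.)
flat = np.arange(64)

# Fold the row into 8 rows of 8 values each.
picture = flat.reshape(8, 8)

print(flat.shape)     # shape of the flat row
print(picture.shape)  # shape after folding
print(picture[1])     # the second row of the picture: values 8 through 15
```

The first 8 numbers become the top row, the next 8 become the second row, and so on. Nothing is changed or lost; the same 64 values are just arranged differently.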

Hunt for the mistakes

Mostly it’s right — but find the ones it missed:

for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
        plt.title("AI said " + str(predictions[i]) + ", but it's " + str(y_test[i]))
        plt.show()

Look closely at the ones it got wrong. Often you’d struggle too — a sloppy 4 that looks like a 9, a 3 that could be an 8. The AI’s mistakes usually happen on genuinely confusing, messy digits. That’s reassuring: it’s failing where the data is hard, not randomly.
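If you want to check which digits get mixed up without scrolling through every picture, one option is a confusion matrix. This is a sketch that assumes the same setup as above; confusion_matrix comes from sklearn.metrics and is not part of the lesson's own code, and the name table is just for this example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same setup as the start of the lesson.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are the true digit, columns are what the model guessed.
table = confusion_matrix(y_test, predictions, labels=list(range(10)))
print(table)

# Cells off the diagonal are the mix-ups, so print only those.
for true_digit in range(10):
    for guess in range(10):
        if true_digit != guess and table[true_digit, guess] > 0:
            print("A", true_digit, "was read as a", guess,
                  "-", table[true_digit, guess], "time(s)")
```

Numbers on the diagonal (top-left to bottom-right) are correct guesses. Everything else is a mistake, and its position tells you which pair of digits got confused.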

Try it 🎯

How many did it get wrong out of the test set? Count them:

wrong = 0
for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        wrong = wrong + 1
print("Got", wrong, "wrong out of", len(X_test))
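The same count can be done without a loop. The sketch below uses short made-up arrays in place of the real predictions and y_test so it runs on its own; with the real ones, the three middle lines work the same way. The names mismatch and wrong_positions are just for this example.

```python
import numpy as np

# Made-up stand-ins for the model's guesses and the true answers.
predictions = np.array([3, 8, 4, 9, 1, 7])
y_test = np.array([3, 8, 9, 9, 1, 1])

# True wherever the guess and the answer disagree.
mismatch = predictions != y_test

# Each True counts as 1 when summed, so this is the number of mistakes.
wrong = int(mismatch.sum())

# Positions of the mistakes, handy for showing just a few as pictures.
wrong_positions = np.flatnonzero(mismatch)

accuracy = 1 - wrong / len(y_test)
print("Got", wrong, "wrong out of", len(y_test))
print("Wrong at positions:", wrong_positions)
print("Accuracy:", accuracy)
```

Notice that accuracy is just the count turned around: the share of test images that were not mistakes.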

Try to do better

Two simple ways to push accuracy up:

# 1. Try a different number of neighbors (the default is 5)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("With 3 neighbors:", model.score(X_test, y_test))

# 2. Give it MORE training data (smaller test set)
# This re-split replaces X_train and X_test, and the model still uses 3 neighbors,
# so the score below reflects both changes together.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
model.fit(X_train, y_train)
print("With 3 neighbors and more training data:", model.score(X_test, y_test))

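More examples and well-chosen settings usually help, though not always. Keep in mind that the second score is measured on a smaller, different test set (180 images instead of 360), so a small change in the number may just be luck. This tuning, asking “what makes it better?”, is a big part of real machine-learning work.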
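Rather than trying one setting at a time, you could loop over several. This is a sketch, assuming the original 0.2 split from the start of the lesson; the list of values to try and the names best_k and best_score are just choices made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

best_k = None
best_score = 0.0

# Try a handful of neighbor counts on the SAME split so the scores are comparable.
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print("n_neighbors =", k, "-> accuracy", score)
    if score > best_score:
        best_k = k
        best_score = score

print("Best setting:", best_k, "with accuracy", best_score)
```

Because every setting is scored on the same test images, the comparison is fair. The winning score may still be a little optimistic, since the setting was picked using those same images.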

Think about it 🔮

The AI’s mistakes are mostly messy, hard-to-read digits. Why is that actually a good sign? (It suggests the model learned the real pattern: it trips mainly on genuinely ambiguous cases, the same ones that would fool a person, rather than failing on clear, easy digits.)

Your mission 🚀

Investigate your recognizer: show 5 of its mistakes as pictures, count how many it got wrong total, then try to beat your accuracy by changing n_neighbors and test_size. Record your best score and what settings produced it.

What you learned today

Don’t stop at the score — look at the actual mistakes.
.reshape(8, 8) turns a row of pixels back into a viewable image.
A good model’s errors cluster on genuinely hard, messy examples.
More data and better settings (n_neighbors) can improve accuracy — tuning is real ML work.

Next time is the capstone — train a model of your own, and have an important conversation about using AI wisely. 🤖