MTH 448/563 Data-Oriented Computing

Fall 2019

Day 25 = Day -4

Wednesday, November 20

Nice videos on neural networks from 3Blue1Brown


Ch 1 What *is* a neural network?

Ch 2 Gradient Descent

Ch 3 Back Propagation

Training our network

Last time, we saw that a single small step along the negative gradient at WT=0 takes us to a set of weights resembling the class-averages of the images, which performs quite well: 90+% accuracy.

Exercise 1: Let's see what happens if we try to make it better by more steps of gradient descent.
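One way to set up Exercise 1 is sketched below. It assumes the network is a single weight matrix W followed by a softmax, trained with cross-entropy loss, and it uses synthetic data as a stand-in for the course images. The names `softmax`, `loss_and_grad`, `lr` and `nsteps` are illustrative choices, not taken from the class code.

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, Y):
    # X: (n, d) images as rows, Y: (n, k) one-hot labels, W: (d, k) weights
    P = softmax(X @ W)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    grad = X.T @ (P - Y) / len(X)
    return loss, grad

# synthetic stand-in for the image data (assumption: 3 classes, 20 "pixels")
rng = np.random.default_rng(0)
k, d, n = 3, 20, 300
centers = rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + 0.5 * rng.normal(size=(n, d))
Y = np.eye(k)[labels]

W = np.zeros((d, k))      # start at W = 0
lr, nsteps = 0.1, 200     # hypothetical step size and number of steps
losses = []
for step in range(nsteps):
    loss, grad = loss_and_grad(W, X, Y)
    losses.append(loss)
    W -= lr * grad        # one step along the negative gradient

accuracy = np.mean(np.argmax(X @ W, axis=1) == labels)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

At W = 0 every class gets probability 1/k, so the first recorded loss is log(k); the later entries of `losses` show how much the extra steps buy.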

Starting point for today's coding

Exercise 2: Make appropriate plots of performance changes during training.

Specifically: make plots of loss and accuracy on training and test images vs step (epoch) number
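A possible shape for Exercise 2, again assuming a single-layer softmax network and synthetic data in place of the real images. The helper names (`evaluate`, `train`, `plot_history`) and the 200/100 train/test split are hypothetical. The idea is simply to record loss and accuracy on both sets before every step and plot the four curves afterwards.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def evaluate(W, X, labels):
    # mean cross-entropy loss and fraction classified correctly
    P = softmax(X @ W)
    loss = -np.mean(np.log(P[np.arange(len(X)), labels] + 1e-12))
    accuracy = np.mean(P.argmax(axis=1) == labels)
    return loss, accuracy

def train(Xtr, ytr, Xte, yte, k, lr=0.1, nsteps=100):
    W = np.zeros((Xtr.shape[1], k))
    Y = np.eye(k)[ytr]                      # one-hot training labels
    history = {name: [] for name in
               ("train_loss", "train_acc", "test_loss", "test_acc")}
    for step in range(nsteps):
        # record performance on both sets before taking the step
        for split, X, y in (("train", Xtr, ytr), ("test", Xte, yte)):
            loss, acc = evaluate(W, X, y)
            history[split + "_loss"].append(loss)
            history[split + "_acc"].append(acc)
        W -= lr * Xtr.T @ (softmax(Xtr @ W) - Y) / len(Xtr)
    return W, history

def plot_history(history):
    import matplotlib.pyplot as plt         # imported here so training runs without it
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    for split in ("train", "test"):
        ax1.plot(history[split + "_loss"], label=split)
        ax2.plot(history[split + "_acc"], label=split)
    ax1.set_xlabel("step"); ax1.set_ylabel("loss"); ax1.legend()
    ax2.set_xlabel("step"); ax2.set_ylabel("accuracy"); ax2.legend()
    return fig

# synthetic stand-in data (assumption), split into training and test sets
rng = np.random.default_rng(0)
k, d, n = 3, 20, 300
centers = rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + 1.0 * rng.normal(size=(n, d))
W, history = train(X[:200], labels[:200], X[200:], labels[200:], k)
```

Calling `plot_history(history)` in a notebook cell draws the loss panel and the accuracy panel side by side.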

Build confusion matrix?
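A confusion matrix can be built in a few lines of NumPy. In this sketch the convention (rows = true class, columns = predicted class) and the function name are choices made here, and the label arrays are made-up values for illustration.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, k):
    # C[i, j] = number of items whose true class is i and predicted class is j
    C = np.zeros((k, k), dtype=int)
    np.add.at(C, (true_labels, predicted_labels), 1)
    return C

true_labels = np.array([0, 0, 1, 1, 2, 2, 2])   # made-up example labels
predicted   = np.array([0, 1, 1, 1, 2, 0, 2])
C = confusion_matrix(true_labels, predicted, 3)
print(C)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
print("accuracy:", np.trace(C) / C.sum())
```

The diagonal counts the correct predictions, so the accuracy is the trace divided by the total; the off-diagonal entries show which classes get mistaken for which.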


The loss on our training set can continue to get smaller without actual improvement in performance. What's happening and how can we fix it?


Regularization example: L2 penalty on W
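As a sketch of what an L2 penalty does: add (lam/2) times the sum of squared weights to the loss, which adds lam * W to the gradient. The single-layer softmax model, the synthetic data and the value of `lam` are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def penalized_loss_and_grad(W, X, Y, lam):
    P = softmax(X @ W)
    data_loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    loss = data_loss + 0.5 * lam * np.sum(W**2)   # cross-entropy plus L2 penalty
    grad = X.T @ (P - Y) / len(X) + lam * W       # the penalty contributes lam * W
    return loss, grad

def train(X, Y, lam, lr=0.1, nsteps=500):
    W = np.zeros((X.shape[1], Y.shape[1]))
    for step in range(nsteps):
        loss, grad = penalized_loss_and_grad(W, X, Y, lam)
        W -= lr * grad
    return W

# synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
k, d, n = 3, 20, 300
centers = rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + 0.5 * rng.normal(size=(n, d))
Y = np.eye(k)[labels]

W_plain = train(X, Y, lam=0.0)
W_reg = train(X, Y, lam=0.1)     # hypothetical penalty strength
print("norm of W without penalty:", np.linalg.norm(W_plain))
print("norm of W with penalty:   ", np.linalg.norm(W_reg))
```

Without the penalty, the training loss can keep shrinking just by scaling W up; the penalty makes that growth cost something.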


Dropout: ignore random selections of nodes during learning
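Ignoring random nodes amounts to multiplying a layer's activations by a random 0/1 mask on each training step. The sketch below uses the "inverted" variant, which rescales the surviving activations so their expected value is unchanged; that rescaling, the function name and the drop probability are assumptions, not something specified in these notes.

```python
import numpy as np

def dropout(h, p_drop, rng, training=True):
    # h: (n, m) array of node activations
    if not training or p_drop == 0.0:
        return h                              # use all nodes when not training
    keep = rng.random(h.shape) >= p_drop      # True for nodes that stay active
    return h * keep / (1.0 - p_drop)          # rescale so the mean is preserved

rng = np.random.default_rng(0)
h = np.ones((1000, 50))                       # made-up activations
h_drop = dropout(h, 0.5, rng)
print("fraction of nodes zeroed:", np.mean(h_drop == 0))
print("mean activation after dropout:", h_drop.mean())
```

A fresh mask is drawn on every call, so a different random subset of nodes is ignored at each step.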

Stochastic gradient descent

It can be more efficient to use just a subset of the images (a mini-batch) for each step.

epoch = one cycle through all the mini-batches
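A minimal sketch of the mini-batch loop, assuming the same single-layer softmax model and synthetic data as a stand-in for the images. The batch size, number of epochs and step size are hypothetical values.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
k, d, n = 3, 20, 300
centers = rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + 0.5 * rng.normal(size=(n, d))
Y = np.eye(k)[labels]

W = np.zeros((d, k))
batch_size, nepochs, lr = 32, 5, 0.1
nsteps = 0
for epoch in range(nepochs):
    order = rng.permutation(n)                 # reshuffle the images each epoch
    for start in range(0, n, batch_size):      # one pass over all mini-batches
        idx = order[start:start + batch_size]
        Xb, Yb = X[idx], Y[idx]
        grad = Xb.T @ (softmax(Xb @ W) - Yb) / len(idx)   # gradient on this batch only
        W -= lr * grad
        nsteps += 1

accuracy = np.mean(np.argmax(X @ W, axis=1) == labels)
print(nsteps, "steps in", nepochs, "epochs; accuracy", accuracy)
```

With 300 images and batches of 32 there are 10 mini-batches per epoch (the last one is smaller), so 5 epochs take 50 steps, each costing roughly a tenth of a full-gradient step.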


Automation of network construction and computation of gradient.

If you haven't already:

conda install tensorflow

Exercise 3: Let's reproduce our "from-scratch" network code using TensorFlow (2.0).
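One possible starting point for Exercise 3 uses `tf.GradientTape`, so that TensorFlow computes the gradient we previously derived by hand. This sketch assumes the from-scratch network is a single weight matrix with softmax cross-entropy loss, and it uses synthetic data in place of the course images; the step size and number of steps are hypothetical.

```python
import numpy as np
import tensorflow as tf

# synthetic stand-in data (assumption), cast to float32 to match the weights
rng = np.random.default_rng(0)
k, d, n = 3, 20, 300
centers = rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = (centers[labels] + 0.5 * rng.normal(size=(n, d))).astype(np.float32)

W = tf.Variable(tf.zeros((d, k)))     # trainable weights, starting at W = 0
lr = 0.1
losses = []
for step in range(100):
    with tf.GradientTape() as tape:   # record the operations that produce the loss
        logits = tf.matmul(X, W)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    grad = tape.gradient(loss, W)     # TensorFlow differentiates the loss with respect to W
    W.assign_sub(lr * grad)           # gradient descent step
    losses.append(float(loss))

predictions = np.argmax(X @ W.numpy(), axis=1)
print("final loss:", losses[-1], "accuracy:", np.mean(predictions == labels))
```

The loop has the same structure as the NumPy version; only the gradient computation has been handed over to the library.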

MNIST data set

In case we want to do experiments on a larger training set of handwritten characters (digits), there is this:

Download the 4 gz files at

There are modules for reading the MNIST data, but we have had difficulty with them in the past. Here is low-level code that works; you can copy and paste it into your notebook to load the MNIST data.
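A minimal sketch of what such a low-level reader might look like, not necessarily the code used in class. It assumes the four files are in the standard MNIST IDX format: a big-endian header (magic number 2051 for images, 2049 for labels, followed by the dimensions) and then raw unsigned bytes. The function names are hypothetical, and the round trip at the end uses a tiny made-up file rather than real MNIST data.

```python
import gzip
import os
import struct
import tempfile
import numpy as np

def read_idx_images(path):
    # header: magic number, image count, rows, cols (big-endian 32-bit unsigned ints)
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"{path} does not look like an IDX image file")
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(n, rows, cols)

def read_idx_labels(path):
    # header: magic number, label count
    with gzip.open(path, "rb") as f:
        magic, n = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"{path} does not look like an IDX label file")
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data[:n]

# round trip on a tiny made-up file to check the reader
tmpdir = tempfile.mkdtemp()
img_path = os.path.join(tmpdir, "tiny-images-idx3-ubyte.gz")
lab_path = os.path.join(tmpdir, "tiny-labels-idx1-ubyte.gz")
with gzip.open(img_path, "wb") as f:
    f.write(struct.pack(">IIII", 2051, 2, 2, 3) + np.arange(12, dtype=np.uint8).tobytes())
with gzip.open(lab_path, "wb") as f:
    f.write(struct.pack(">II", 2049, 2) + bytes([7, 3]))

images = read_idx_images(img_path)
image_labels = read_idx_labels(lab_path)
print(images.shape, image_labels)
```

For the real data, assuming the downloads keep their usual names, the calls would be `read_idx_images("train-images-idx3-ubyte.gz")` and `read_idx_labels("train-labels-idx1-ubyte.gz")`, and likewise for the two `t10k-` test files.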