Fall 2019

# Day 25 = Day -4

Wednesday, November 20

# Nice videos on neural networks from 3Blue1Brown

Ch 1 What *is* a neural network?

Ch 3 Backpropagation

# Training our network

Last time, we saw that a single small step along the negative gradient, starting from W = 0, takes us to a set of weights resembling the class-averages of the images, which already performs quite well: 90+% accuracy.

Exercise 1: Let's see what happens if we try to improve it by taking more steps of gradient descent.

Starting point for today's coding
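A minimal sketch of Exercise 1, assuming the class model is a linear (softmax) classifier; the data below are made-up stand-ins, not the real images from class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 100 flattened "images" of 64 pixels
# each, in 10 classes (replace with the real dataset from class).
X = rng.normal(size=(100, 64))
y = rng.integers(0, 10, size=100)
Y = np.eye(10)[y]                      # one-hot targets

W = np.zeros((64, 10))                 # start from W = 0, as last time
eta = 0.5                              # step size (arbitrary choice here)

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

for step in range(50):                 # "more steps of gradient descent"
    P = softmax(X @ W)                 # predicted class probabilities
    grad = X.T @ (P - Y) / len(X)      # gradient of mean cross-entropy loss
    W -= eta * grad                    # step along the negative gradient

accuracy = np.mean((X @ W).argmax(axis=1) == y)
```

With the real images, `accuracy` here would be the training accuracy after 50 steps; the interesting question is how it compares with accuracy on held-out test images.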

Exercise 2: Make appropriate plots of how performance changes during training.

Specifically: plot loss and accuracy on the training and test images versus step (epoch) number.
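One way Exercise 2 could look with matplotlib; the loss/accuracy histories below are invented placeholders standing in for values recorded during training:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import matplotlib.pyplot as plt

# Hypothetical per-epoch histories (record the real ones during training).
train_loss = [2.3, 1.1, 0.7, 0.5, 0.4]
test_loss  = [2.3, 1.2, 0.9, 0.8, 0.8]
train_acc  = [0.10, 0.60, 0.80, 0.90, 0.95]
test_acc   = [0.10, 0.55, 0.75, 0.80, 0.82]
epochs = range(len(train_loss))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(epochs, train_loss, label="train")
ax1.plot(epochs, test_loss, label="test")
ax1.set(xlabel="epoch", ylabel="loss")
ax1.legend()
ax2.plot(epochs, train_acc, label="train")
ax2.plot(epochs, test_acc, label="test")
ax2.set(xlabel="epoch", ylabel="accuracy")
ax2.legend()
fig.savefig("training_curves.png")
```

A widening gap between the train and test curves is the visual signature of the over-fitting discussed below.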

Build a confusion matrix?
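A confusion matrix can be built directly with NumPy; the labels below are a toy example, not the real test set:

```python
import numpy as np

def confusion_matrix(true, pred, n_classes=10):
    # C[i, j] counts images whose true class is i and predicted class is j.
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        C[t, p] += 1
    return C

# Toy example with 3 classes (made-up labels):
C = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], n_classes=3)
# Diagonal entries are correct predictions, so accuracy = trace / total.
accuracy = C.trace() / C.sum()
```

The off-diagonal entries show which pairs of classes the network confuses most often.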

# Over-fitting

The loss on our training set can keep shrinking with no real improvement on the test set. What's happening, and how can we fix it?

## Regularization

Example: add an L2 penalty on W to the loss.
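A sketch of how the L2 penalty enters the loss and the gradient; the value of `lam` is a hypothetical choice:

```python
import numpy as np

lam = 1e-3                      # regularization strength (hypothetical value)
W = np.ones((64, 10))           # stand-in weights for illustration

# Term added to the loss, and its contribution to the gradient:
penalty = lam * np.sum(W ** 2)  # lam * ||W||^2
grad_penalty = 2 * lam * W      # d/dW of lam * ||W||^2

# In each training step the extra term just adds to the data gradient:
# W -= eta * (grad_data + grad_penalty)
```

The effect is to pull every weight toward zero a little on each step, discouraging the large weights that memorize the training images.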

## Drop-out

Randomly ignore a different subset of nodes on each training step.
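A minimal sketch of one common implementation ("inverted dropout"); the keep probability and activations below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.8                  # fraction of nodes kept (hypothetical value)

h = rng.normal(size=(5, 32))     # stand-in activations of one hidden layer

# Training: zero out a random subset of nodes, and rescale the survivors
# so the expected activation stays the same.
mask = rng.random(h.shape) < keep_prob
h_train = h * mask / keep_prob

# Test time: use all nodes, no mask.
h_test = h
```

Because no single node can be relied on, the network is pushed toward redundant features, which tends to reduce over-fitting.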

It can be more efficient to use just a subset of the images (a mini-batch) for each step.

epoch = one cycle through all the mini-batches
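The mini-batch/epoch bookkeeping can be sketched like this, with placeholder data and a hypothetical batch size:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 64))   # placeholder data (not the real images)
batch_size = 20                  # hypothetical choice

n_epochs = 3
steps = 0
for epoch in range(n_epochs):
    order = rng.permutation(len(X))              # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = X[order[start:start + batch_size]]
        # ... one gradient step using only this mini-batch ...
        steps += 1
# One epoch = one cycle through all 100/20 = 5 mini-batches here.
```

Each step uses a noisier gradient estimate, but 5 cheap steps per epoch often make faster progress than one full-batch step.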

# Tensorflow

Automates network construction and the computation of gradients.

`conda install tensorflow`
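A tiny sketch of both advertised features, once TensorFlow is installed: automatic gradients via `tf.GradientTape`, and network construction via `tf.keras` (the layer sizes below match our 64-pixel / 10-class setup but are otherwise arbitrary):

```python
import tensorflow as tf

# Automatic differentiation: TensorFlow computes the gradient for us.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
g = tape.gradient(y, x)          # dy/dx = 2x = 6

# Network construction: a one-layer softmax classifier like ours.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

From here, `model.fit(X, Y, epochs=...)` would run the whole gradient-descent loop we wrote by hand.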