MTH 448/563 Data-Oriented Computing

Fall 2019

Day 24 = Day -5

Monday, November 18

Back-propagation of gradient for simplest network, cont'd

Summary of notation:

c       number of classes
j,k     general class indices
n       number of training images
i       general image index
hw      number of pixels
q       general pixel index
$W^T$   array of weights (c by hw)
X       array of flattened images (hw by n)
S       array of image class scores (c by n), $S = W^T X$
P       array of image class "probabilities" (output of softmax) (c by n)
l       array of individual image losses (length n)
L       overall (scalar) loss
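
For reference, the quantities above can be tied together in formulas. This is a sketch under assumptions not stated in this summary: the individual loss is taken to be cross-entropy, the overall loss is the mean over images, and $y_i$ (a symbol introduced here) denotes the true class of image $i$.

$$
S = W^T X, \qquad
P_{ji} = \frac{e^{S_{ji}}}{\sum_{k=1}^{c} e^{S_{ki}}}, \qquad
l_i = -\log P_{y_i i}, \qquad
L = \frac{1}{n} \sum_{i=1}^{n} l_i
$$

$$
\frac{\partial L}{\partial S_{ji}} = \frac{1}{n}\left(P_{ji} - \delta_{j y_i}\right), \qquad
\frac{\partial L}{\partial W^T} = \frac{\partial L}{\partial S}\, X^T \quad (c \text{ by } hw)
$$

Under these assumptions the gradient with respect to $W^T$ has the same shape as $W^T$ itself, which is a quick sanity check on any implementation.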

Numpy implementation of loss and gradient


1. Write code that computes the loss and also the gradient via back-prop.

  • Let's make it as vectorized as possible.
  • Let's develop the code with small random data.

2. Check it for correctness by comparing with finite difference approximation.

3. Train the network by gradient descent.
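
The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: it assumes a cross-entropy loss averaged over images, integer labels `y` (not part of the notation list), and a made-up learning rate. The names `loss_and_grad`, `finite_difference_grad`, and `rate` are hypothetical.

```python
import numpy as np

def loss_and_grad(WT, X, y):
    """Loss L and gradient dL/dWT for scores S = WT @ X.

    WT: (c, hw) weights, X: (hw, n) flattened images, y: (n,) integer labels.
    Assumption: l_i = -log P[y_i, i] (cross-entropy), L = mean of l.
    """
    n = X.shape[1]
    cols = np.arange(n)
    S = WT @ X                                 # (c, n) class scores
    S = S - S.max(axis=0, keepdims=True)       # shift columns; softmax is unchanged
    E = np.exp(S)
    P = E / E.sum(axis=0, keepdims=True)       # (c, n) softmax "probabilities"
    l = -np.log(P[y, cols])                    # (n,) individual image losses
    L = l.mean()                               # overall scalar loss
    dS = P.copy()
    dS[y, cols] -= 1.0                         # P minus one-hot labels
    dS /= n                                    # dL/dS, shape (c, n)
    dWT = dS @ X.T                             # back-prop through S = WT @ X
    return L, dWT

def finite_difference_grad(WT, X, y, eps=1e-6):
    """Central-difference approximation of dL/dWT, one entry at a time."""
    G = np.zeros_like(WT)
    for j in range(WT.shape[0]):
        for q in range(WT.shape[1]):
            Wp = WT.copy()
            Wm = WT.copy()
            Wp[j, q] += eps
            Wm[j, q] -= eps
            G[j, q] = (loss_and_grad(Wp, X, y)[0]
                       - loss_and_grad(Wm, X, y)[0]) / (2 * eps)
    return G

# Step 1: small random data (sizes are arbitrary).
rng = np.random.default_rng(0)
c, hw, n = 3, 5, 20
WT = rng.normal(size=(c, hw))
X = rng.normal(size=(hw, n))
y = rng.integers(0, c, size=n)
L, dWT = loss_and_grad(WT, X, y)

# Step 2: compare back-prop gradient with the finite-difference one.
max_err = np.abs(dWT - finite_difference_grad(WT, X, y)).max()
print("max |backprop - finite difference| =", max_err)

# Step 3: plain gradient descent from zero weights (rate is a guess).
WT = np.zeros((c, hw))
rate = 0.1
losses = []
for step in range(200):
    L, dWT = loss_and_grad(WT, X, y)
    losses.append(L)
    WT -= rate * dWT
print("loss: first", losses[0], "last", losses[-1])
```

The finite-difference loop is deliberately unvectorized: it only needs to be trustworthy, and it is run once on tiny data. If the two gradients disagree by much more than roundoff, the back-prop code is the first suspect.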

A few guidelines:

  • invert images in X
  • normalize X by its sum, say, to avoid plugging large numbers into the exponential function
  • start with all weights 0 to avoid prejudice
  • a journey begins with a single step - see where a single step takes us
  • helpful to picture the rows of $W^T$
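
The guidelines above can be tried out on made-up data. The sketch below makes several assumptions that are not in the notes: pixel values are 8-bit (0 to 255), "invert" means `255 - X`, "its sum" means the sum over the whole array, the loss is mean cross-entropy, and `rate` is an arbitrary step size. The helper `loss_and_grad` is a hypothetical name.

```python
import numpy as np

def loss_and_grad(WT, X, y):
    """Mean cross-entropy loss (an assumption) and its gradient w.r.t. WT."""
    n = X.shape[1]
    cols = np.arange(n)
    S = WT @ X
    E = np.exp(S - S.max(axis=0, keepdims=True))
    P = E / E.sum(axis=0, keepdims=True)
    L = -np.log(P[y, cols]).mean()
    dS = P.copy()
    dS[y, cols] -= 1.0
    return L, (dS / n) @ X.T

# Made-up stand-in for real images: n tiny h-by-w pictures with 8-bit values.
rng = np.random.default_rng(1)
c, h, w, n = 3, 4, 4, 12
images = rng.integers(0, 256, size=(n, h, w))
y = rng.integers(0, c, size=n)

X = images.reshape(n, h * w).T.astype(float)   # (hw, n), one column per image
X = 255.0 - X                                  # invert (assumes 0..255 pixels)
X = X / X.sum()                                # normalize by the total sum

WT0 = np.zeros((c, h * w))                     # all weights 0: no class favored
L0, dWT0 = loss_and_grad(WT0, X, y)            # here P is uniform, so L0 = log(c)

rate = 1.0                                     # arbitrary step size
WT1 = WT0 - rate * dWT0                        # where a single step takes us

# Each row of WT is one "picture" per class; reshape to view it as an image.
row_pictures = WT1.reshape(c, h, w)
print("loss at zero weights:", L0, " log(c):", np.log(c))
```

With zero initial weights every class gets probability 1/c, so after one step row j of $W^T$ equals `rate / n` times (sum of the class-j images minus 1/c times the sum of all images). That is one reason picturing the rows is informative: on real data each row would start out resembling its class's images with the overall average removed.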


Prep for Wednesday:

conda install tensorflow