MTH 448/563 Data-Oriented Computing

Fall 2019

Day 20 = Day -9

Monday, November 4

Remarks about Project 4: Star Recognizer

  • Such a pleasure not to have the distraction of misspelled words!
  • Vectorize code with numpy as much as possible. Deal with individual components as little as possible.
  • Favor "naked" numpy arrays over pandas dataframes unless the dataframe provides some real benefit.
  • Don't search an entire table to look up one row: make the key column the index if necessary, or use a dict.
# FORBIDDEN!
for key in big_list_of_keys:
    myrow = df[df['keycolumn'] == key].iloc[0]
    # This requires a full table search for each key!
    ...
# GOOD!
df.index = df['keycolumn']   # index by key once; assumes the keys are unique
for key in big_list_of_keys:
    myrow = df.loc[key]      # label lookup through the index, no table search
    ...
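
The "use a dict" option from the last remark can be sketched as follows. The table contents and the 'value' column are made up for illustration, and the keys are assumed to be unique. The second variant drops the Python loop entirely by passing the whole list of keys to a single `.loc` call.

```python
import pandas as pd

# Hypothetical toy table; 'keycolumn' mirrors the name used above,
# 'value' is an invented column for illustration.
df = pd.DataFrame({
    'keycolumn': ['a', 'b', 'c', 'd'],
    'value': [10, 20, 30, 40],
})
big_list_of_keys = ['c', 'a', 'd']

# Option 1: build a dict in one pass over the table; each later lookup
# is then a hash lookup instead of a table search.
value_of = dict(zip(df['keycolumn'], df['value']))
values_from_dict = [value_of[key] for key in big_list_of_keys]

# Option 2: index the table by key and fetch all wanted rows in one call.
indexed = df.set_index('keycolumn')
values_from_loc = indexed.loc[big_list_of_keys, 'value'].to_numpy()

print(values_from_dict)
print(values_from_loc)
```

Either way the table is traversed once to build the lookup structure, rather than once per key.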

Interpretation of data

Basic idea: devise a function from the set of all possible instances of a particular kind of data to a relatively small set of "interpretations".

E.g. X-rays, car complaint collections, linearish features in an image

For a discrete codomain (a classification problem): feature engineering/extraction and separation are sometimes used.
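
A minimal sketch of the "features, then separation" idea, assuming each instance has already been reduced to a short feature vector. The nearest-centroid rule used here is just one simple way to separate classes and is not necessarily the method used in the course. The function names and the training data are invented for illustration.

```python
import numpy as np

def fit_centroids(features, labels):
    """Return the distinct class labels and the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(features, classes, centroids):
    """Assign each row of `features` to the class with the nearest centroid."""
    # Squared distances by broadcasting: shape (n_samples, n_classes)
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Made-up training data: two features per instance, two classes.
train_X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
train_y = np.array(['a', 'a', 'b', 'b'])

classes, centroids = fit_centroids(train_X, train_y)
predicted = classify(np.array([[0.1, 0.2], [0.8, 1.0]]), classes, centroids)
print(predicted)
```

The "function from data to interpretations" is then the composition of feature extraction with `classify`.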

2 examples ...

Recognizing handwritten characters

Sample row

handwriting_row_f19.png

some_colorpngs.png

Exercise: What features would be useful?

Recognizing tree species from a leaf

populus_deltoides_wb1164-07-2_cropped.jpg 13001931303503_ficus_carica.jpg

How LeafSnap works

Recognizing handwritten characters - let's do it

Download and unzip this zip file of your handwriting.

Let's see what we've got:

from PIL import Image
import glob
%pylab inline

# All PNG files in pngs/ whose names contain 'tong', in sorted order
pngs = sorted(glob.glob('pngs/*tong*.png'))
for png in pngs:
    print(png)
    img = Image.open(png)
    imshow(img)
    break  # look at the first file only

Exercise: Invent and compute some potentially useful features.
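
One possible starting point, offered as a sketch rather than as the intended answer: a few vectorized features computed from a boolean "ink" array. The feature names and the function `simple_features` are invented here. The synthetic letter T stands in for a real character image, which could be turned into such an array with something like `np.array(img.convert('L')) < 128`, assuming dark ink on a light background.

```python
import numpy as np

def simple_features(ink):
    """Compute a few candidate features from a 2-D boolean array (True = ink)."""
    rows, cols = np.nonzero(ink)
    top, left = rows.min(), cols.min()
    height = rows.max() - top + 1
    width = cols.max() - left + 1
    box = ink[top:top + height, left:left + width]   # tight bounding box
    return {
        'aspect_ratio': width / height,
        'ink_fraction': box.mean(),                   # share of box pixels that are ink
        # Center of mass within the box, scaled to [0, 1] (pixel centers)
        'center_row': (rows.mean() - top + 0.5) / height,
        'center_col': (cols.mean() - left + 0.5) / width,
        # Fraction of box pixels that agree with their left-right mirror image
        'lr_symmetry': (box == box[:, ::-1]).mean(),
    }

# Synthetic 7x7 "image" of a capital T, for illustration only.
ink = np.zeros((7, 7), dtype=bool)
ink[1, 1:6] = True    # horizontal bar
ink[2:6, 3] = True    # vertical stem

features = simple_features(ink)
for name, value in features.items():
    print(name, value)
```

Whether features like these actually separate the characters is something to check on the real images.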


Do these work?

%%javascript
// Prevent scrolling boxes (the %%javascript cell magic must be the first line of the cell)
IPython.OutputArea.auto_scroll_threshold = 200;

# Have Jupyter use full page width
from IPython.display import display, HTML

display(HTML("<style>.container { width:100% !important; }</style>"))

%%javascript
// Turn off automatic closing of brackets
IPython.CodeCell.options_default.cm_config.autoCloseBrackets = false;

ringland@blue:~/.jupyter/custom$ cat custom.css
div.prompt { display: none; }
.container { width:100% !important; }