Fall 2019

Wednesday, November 6

Download and unzip this cleaned-up collection of images: zip file of your handwriting.

Where we got to on Monday, with a couple of improvements:

```python
import glob
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

pngs = sorted(glob.glob('handwriting_f19/pngs/*.png'))#[:5] # new cleaned pngs folder after class
n = len(pngs)
features = ['ink','log aspect','lr-asymmetry']
d = len(features)
F = np.empty((d,n))  # one row per feature, one column per image

# image height and width, read from the first image (all images are assumed to be the same size)
h, w = np.array(Image.open(pngs[0])).shape[:2]
x = np.arange(0,w)  # linspace(0,w,w,endpoint=False)
y = np.arange(0,h)  # linspace(0,h,h,endpoint=False)
X,Y = np.meshgrid(x,y)  # column (X) and row (Y) index of every pixel, each of shape (h,w)

for k,png in enumerate( pngs ):
    #print(png)
    img = Image.open(png)
    #imshow(img)
    a = np.array(img)
    if a.ndim == 3:
        a = a[:,:,0]  # get just one layer; they are all the same
    a = 255 - a.astype(int)  # invert so character is high values

    ink = a.sum() / (h*w*255)  # scaled to [0,1]
    # maybe too extreme?
    # better alternatives
    if ink == 0:
        print('Blank image:',png)
    assert ink>0
    F[0 ,k] = ink

    # height and width of character
    xmin = X[ a>0 ].min()
    xmax = X[ a>0 ].max()
    ymin = Y[ a>0 ].min()
    ymax = Y[ a>0 ].max()
    logaspect = np.log10((ymax-ymin)/(xmax-xmin))
    F[1 ,k] = logaspect

    # left-right asymmetry
    cbbx = (xmin+xmax)/2  # center of bounding box
    cogx = (X*a).sum() / a.sum()  # x-coordinate of center of mass of ink
    lrasymmetry = (cogx-cbbx) / (xmax-xmin)
    F[2 ,k] = lrasymmetry

#print(F)

# scatter-plot matrix: feature names on the diagonal, pairwise scatter plots elsewhere
plt.figure(figsize=(12,12))
for i in range(d):
    for j in range(d):
        plt.subplot(d,d,i*d+j+1)
        if i==j:
            plt.text(.5,.5,features[i],ha='center')
            plt.xticks([])
            plt.yticks([])
        else:
            plt.scatter( F[j,:], F[i,:] , s=2, alpha=0.4 )
plt.show()
```

Exercise: let's color the points according to their character classes, which we know from the image filename.
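One possible approach, sketched under an assumption: we don't know the exact naming scheme in `handwriting_f19/pngs`, so the hypothetical helper `label_from_filename` below assumes the character class is the first character of the file name (e.g. `a_017.png`). Adjust it to the real scheme. The other names (`class_labels`, `scatter_by_class`) are illustrative too. Plotting one `scatter` call per class lets matplotlib's default color cycle give each class its own color.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

def label_from_filename(path):
    """Hypothetical: assumes the class label is the first character of the file name."""
    return os.path.basename(path)[0]

def class_labels(paths):
    """Return an array with one class label per image path (same order as the columns of F)."""
    return np.array([label_from_filename(p) for p in paths])

def scatter_by_class(F, labels, i, j, features, ax=None):
    """Scatter feature j (x-axis) against feature i (y-axis), one color per class."""
    if ax is None:
        ax = plt.gca()
    for c in np.unique(labels):
        mask = labels == c  # boolean mask selecting the columns of F in class c
        ax.scatter(F[j, mask], F[i, mask], s=2, alpha=0.4, label=c)
    ax.set_xlabel(features[j])
    ax.set_ylabel(features[i])
    return ax
```

In the subplot loop above, you could compute `labels = class_labels(pngs)` once and replace the `plt.scatter(...)` line with `scatter_by_class(F, labels, i, j, features)`; calling `plt.legend()` on one panel shows which color is which class.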

Exercise: set it up so we can include or exclude character classes for easier inspection of what we're doing.
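A minimal sketch, assuming a `labels` array with one class label per column of `F` (for instance built from the filenames). The helper name `select_classes` and its `include`/`exclude` parameters are illustrative, not from the notes. It builds a boolean mask over the columns and returns the filtered features together with the matching labels, so the plotting code can run unchanged on the subset.

```python
import numpy as np

def select_classes(F, labels, include=None, exclude=()):
    """Keep the columns of F (one per image) whose label is in `include`
    (all classes if include is None) and not in `exclude`."""
    labels = np.asarray(labels)
    keep = np.ones(labels.shape, dtype=bool)
    if include is not None:
        keep &= np.isin(labels, list(include))
    if exclude:
        keep &= ~np.isin(labels, list(exclude))
    return F[:, keep], labels[keep]
```

For example, `Fs, ls = select_classes(F, labels, include={'o', 'l'})` would keep only two classes for a quick look, while `exclude=['i']` would drop one.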

k-means clustering is a way of separating a cloud of points into k clusters by their proximity to k points called "means": each point belongs to the cluster of its nearest mean.
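In symbols (standard k-means notation, not taken from these notes): with points $\mathbf{x}_1,\dots,\mathbf{x}_n$ (here the columns of `F`) and means $\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_k$, k-means tries to make the total squared distance from each point to its nearest mean small. The usual (Lloyd's) algorithm alternates an assignment step and an update step, and neither step can increase $J$:

```latex
J = \sum_{i=1}^{n} \min_{1 \le c \le k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_c \rVert^2

\text{assign:}\quad c_i \leftarrow \arg\min_{1 \le c \le k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_c \rVert^2

\text{update:}\quad \boldsymbol{\mu}_c \leftarrow \frac{1}{\lvert S_c \rvert} \sum_{i \in S_c} \mathbf{x}_i,
\qquad S_c = \{\, i : c_i = c \,\}
```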

Let's implement it.
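A minimal NumPy sketch of Lloyd's iteration, one possible implementation rather than necessarily the one we'll write together. It takes points in the same (d, n) layout as `F` above. The function name `kmeans`, initializing the means at k distinct random data points, and the `n_iter`/`seed` parameters are illustrative choices.

```python
import numpy as np

def kmeans(F, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means.

    F : (d, n) array, one column per point (same layout as the feature matrix F).
    Returns (means, labels): means has shape (d, k), labels has shape (n,) with values 0..k-1.
    """
    rng = np.random.default_rng(seed)
    d, n = F.shape
    # initialize the means at k distinct, randomly chosen data points
    means = F[:, rng.choice(n, size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # assignment step: squared distance from every mean to every point, shape (k, n)
        dist2 = ((F[:, None, :] - means[:, :, None]) ** 2).sum(axis=0)
        new_labels = dist2.argmin(axis=0)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each mean to the centroid of the points assigned to it
        for c in range(k):
            members = F[:, labels == c]
            if members.shape[1] > 0:  # leave a mean where it is if its cluster is empty
                means[:, c] = members.mean(axis=1)
    return means, labels
```

For example, `means, cluster = kmeans(F, 3)` gives a cluster index per image that could color the scatter plots in place of the class labels. Because the distance is Euclidean, features with a larger spread dominate it, so standardizing each row of `F` first may be worth trying.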