Day 26, Thursday, May 3, 2018

Computing with text

Word frequency analysis, cont'd

Log-log plot and Zipf's Law: If the plot of frequency against rank is a straight line, what does that mean? And what is its slope?
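A straight line on a log-log plot means a power law: if frequency is proportional to rank^(-s), then log(frequency) = log(C) - s*log(rank), so the slope is -s. Zipf's Law in its classic form has s close to 1. Here is a minimal sketch of measuring the slope with a least-squares fit. The function name loglog_slope and the synthetic data are made up for illustration; with a real text you would pass in your own word counts.

```python
import numpy as np

def loglog_slope(frequencies):
    """Least-squares slope and intercept of log(frequency) vs log(rank)."""
    # Sort so that rank 1 is the most frequent word.
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope, intercept

# Synthetic check (not real data): an exact power law f = 1000 / rank
# should give a slope of -1 and an intercept of log(1000).
ideal = 1000.0 / np.arange(1, 201)
slope, intercept = loglog_slope(ideal)
print(slope, np.exp(intercept))
```

Real word counts will not fall exactly on a line, especially at the low-frequency end where many words tie, so treat the fitted slope as a rough summary.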

George_Kingsley_Zipf_1917.jpg

Caution: as always, use only your own words in your report.

Exercise: List the words in your text in order of decreasing length.
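One possible way to approach this, as a sketch only: collect the distinct words and sort them with a key that puts longer words first. The tokenizing regular expression and the function name words_by_decreasing_length are assumptions for illustration; your own word-splitting code may differ.

```python
import re

def words_by_decreasing_length(text):
    """Distinct lowercase words, longest first, ties in alphabetical order."""
    # Assumed tokenization: runs of letters, with an optional internal apostrophe.
    words = set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return sorted(words, key=lambda w: (-len(w), w))

sample = "The quick brown fox jumps over the lazy dog"
ordered = words_by_decreasing_length(sample)
print(ordered)
```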

Exercise: Examine the relationship between word frequency and frequency rank.
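A starting point for this exercise might look like the sketch below, which builds a table of (rank, word, count) using collections.Counter. The function name rank_frequency and the tokenizing pattern are illustrative assumptions, and the sample string is a toy input rather than a real text.

```python
from collections import Counter
import re

def rank_frequency(text):
    """List of (rank, word, count), with rank 1 for the most frequent word."""
    # Assumed tokenization: runs of letters, with an optional internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]

sample = "a a a a b b c a b d"
table = rank_frequency(sample)
for rank, word, n in table:
    print(rank, word, n, rank * n)  # under Zipf's Law, rank * count stays roughly constant
```

Words with equal counts get consecutive ranks here, so the order among ties is arbitrary.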

Generate a random text
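The notes do not say which method is intended, so the following is only one simple possibility: draw words independently, each with probability proportional to its frequency in the source text. The function name random_text and the toy word list are assumptions for illustration.

```python
import random
from collections import Counter

def random_text(words, n_words, seed=None):
    """Join n_words words sampled independently, weighted by their counts."""
    rng = random.Random(seed)          # private generator, so a seed gives repeatable output
    counts = Counter(words)
    vocab = list(counts)
    weights = [counts[w] for w in vocab]
    return " ".join(rng.choices(vocab, weights=weights, k=n_words))

source_words = "the cat sat on the mat and the dog sat too".split()
generated = random_text(source_words, 20, seed=0)
print(generated)
```

Because each word is drawn independently, the output reproduces the word frequencies of the source but none of its word order.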

Other texts of possible interest: 1st_debate_clinton.txt, 1st_debate_trump.txt, 2nd_debate_clinton.txt, 2nd_debate_trump.txt

Germ avoidance problem, cont'd

elliptical_machine_uses_a.jpg

Code to start from: germ_avoidance_starter_code_rev1.ipynb

A class for making pictures

tracerecorder.py

infection_rev1.png

At the end of each trial, you will have 3D boolean arrays for users and machines. For example, UV[i,j,k] says whether user j has virus k on day i.

We can aggregate this in a number of ways to obtain quantities of interest.
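For instance, summing or taking "any" along one axis of UV gives different summaries. The sketch below uses a tiny hand-filled array, not output from the starter code; the sizes and the variable names other than UV are made up for illustration.

```python
import numpy as np

# Toy array with axes (day, user, virus), filled by hand for illustration.
n_days, n_users, n_viruses = 4, 3, 2
UV = np.zeros((n_days, n_users, n_viruses), dtype=bool)
UV[1:, 0, 0] = True   # user 0 has virus 0 from day 1 on
UV[2:, 1, 0] = True   # user 1 has virus 0 from day 2 on
UV[3, 1, 1] = True    # user 1 also has virus 1 on day 3

# Number of users carrying each virus on each day: shape (days, viruses).
users_per_virus = UV.sum(axis=1)

# Number of users carrying at least one virus on each day: shape (days,).
infected_users = UV.any(axis=2).sum(axis=1)

# Number of viruses each user carries on the last day: shape (users,).
viruses_per_user_final = UV[-1].sum(axis=1)

print(users_per_virus)
print(infected_users)
print(viruses_per_user_final)
```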

Actually, I'm tempted to make a 4D array with an additional index for the trial number, so that we can do any kind of aggregation by summing along the appropriate indices. Would that take too much memory?
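A quick way to answer the memory question is to multiply out the shape: a NumPy boolean array uses one byte per element. The sizes below are hypothetical placeholders, not the numbers from the assignment, and the name UVT is made up for this sketch.

```python
import numpy as np

# Hypothetical sizes; substitute the actual numbers for the problem.
n_trials, n_days, n_users, n_viruses = 100, 100, 50, 20
shape = (n_trials, n_days, n_users, n_viruses)

# One byte per boolean element.
n_bytes = int(np.prod(shape)) * np.dtype(bool).itemsize
print(n_bytes / 1e6, "MB")

# With the trial index first, averaging over trials is a mean along axis 0.
UVT = np.zeros(shape, dtype=bool)
mean_infected = UVT.any(axis=3).sum(axis=2).mean(axis=0)  # shape (days,)
```

With these placeholder sizes the array takes 10 MB, which is small. The product grows quickly if any one dimension gets large, so it is worth redoing the arithmetic with the real sizes.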