MTH 448/563 Data-Oriented Computing

Fall 2019

Day 3: Visualization of data

Greenscreen collage

greenscreen_collage.png

Visualization of data

names_on_a_board.jpg

Context: first names given to babies in the US, 1880-2018

Concepts: logarithms, ratios

Tools: matplotlib, altair, and choosing between them

Plotting libraries

matplotlib.pyplot - the most widely used historically

altair - putatively "declarative" (what) instead of "imperative" (how)
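
To make the imperative/declarative contrast concrete, here is a minimal sketch that draws the same line chart both ways. The years and counts are made-up toy numbers, the function names plot_imperative and plot_declarative are hypothetical, and the sketch assumes both matplotlib and altair are installed.

```python
# Toy data: made-up counts for one hypothetical name, NOT real SSA figures.
years = [2000, 2001, 2002, 2003]
counts = [120, 150, 90, 200]

# Altair works with one record per data point ("long" form).
records = [{"year": y, "count": c} for y, c in zip(years, counts)]


def plot_imperative(years, counts):
    """matplotlib: issue drawing commands one step at a time (the "how")."""
    import matplotlib.pyplot as plt  # lazy import; assumes matplotlib is installed

    fig, ax = plt.subplots()
    ax.plot(years, counts)
    ax.set_xlabel("year")
    ax.set_ylabel("count")
    return fig


def plot_declarative(records):
    """altair: declare which field maps to which visual channel (the "what")."""
    import altair as alt  # lazy import; assumes altair is installed

    return (
        alt.Chart(alt.Data(values=records))
        .mark_line()
        .encode(x="year:Q", y="count:Q")
    )
```

In the matplotlib version the axes are created and labeled explicitly; in the altair version the axis labels follow from the field names in the encoding.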

A data structure for the names data

My suggestion: a dictionary of dictionaries { name:{'F':array,'M':array} }

Useful features: glob.glob(), split()
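
One way the suggested structure could be built with glob.glob() and split() is sketched below. It rests on assumptions not stated above: one file per year named yobYYYY.txt, each line of the form Name,Sex,Count, and one array slot per year. Plain lists stand in for the arrays, and load_names is a hypothetical helper. The two sample files hold made-up toy numbers, not real data.

```python
import glob
import os
import tempfile


def load_names(folder, first_year=1880, last_year=2018):
    """Build { name: {'F': [...], 'M': [...]} } with one count per year."""
    n_years = last_year - first_year + 1
    names = {}
    # Assumed file naming: yobYYYY.txt, so characters 3..6 are the year.
    for path in sorted(glob.glob(os.path.join(folder, "yob*.txt"))):
        year = int(os.path.basename(path)[3:7])
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                name, sex, count = line.strip().split(",")
                if name not in names:
                    names[name] = {"F": [0] * n_years, "M": [0] * n_years}
                names[name][sex][year - first_year] = int(count)
    return names


# Usage with two tiny made-up files (toy numbers, not real SSA data).
with tempfile.TemporaryDirectory() as folder:
    samples = [(1880, "Ann,F,10\nLee,M,4\n"), (1881, "Ann,F,12\nLee,F,1\n")]
    for year, rows in samples:
        with open(os.path.join(folder, f"yob{year}.txt"), "w") as f:
            f.write(rows)
    names = load_names(folder, first_year=1880, last_year=1881)

print(names["Lee"])  # {'F': [0, 1], 'M': [4, 0]}
```

Filling every name with zeros for all years up front means a name that is absent in some year simply keeps a zero there, which keeps all the arrays the same length for plotting.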

Visually representing aspects of the data: frequency

Exercise: plot the frequency of your own name

True or false?: My name experienced a surge in popularity at the start of World War I and again during World War II.
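
A rough sketch of one possible reading of "frequency": the fraction of each year's recorded births that received a given name. The dictionary below follows the suggested name/sex layout but holds made-up toy counts; yearly_totals, relative_frequency, and plot_frequency are hypothetical helpers, and the plotting step assumes matplotlib is installed.

```python
# Toy structure in the suggested shape; counts are made up, not SSA figures.
years = [1913, 1914, 1915]
names = {
    "Ann": {"F": [10, 30, 20], "M": [0, 0, 0]},
    "Lee": {"F": [0, 10, 5], "M": [30, 60, 75]},
}


def yearly_totals(names, n_years):
    """Total recorded births per year, summed over all names and both sexes."""
    totals = [0] * n_years
    for by_sex in names.values():
        for counts in by_sex.values():
            for i, c in enumerate(counts):
                totals[i] += c
    return totals


def relative_frequency(names, name, sex, totals):
    """Fraction of each year's recorded births given this name and sex."""
    return [c / t if t else 0.0 for c, t in zip(names[name][sex], totals)]


def plot_frequency(years, freq, label):
    """Line plot of frequency against year."""
    import matplotlib.pyplot as plt  # lazy import; assumes matplotlib is installed

    plt.plot(years, freq, label=label)
    plt.xlabel("year")
    plt.ylabel("fraction of recorded births")
    plt.legend()
    plt.show()


totals = yearly_totals(names, len(years))
freq = relative_frequency(names, "Ann", "F", totals)
print(freq)  # [0.25, 0.3, 0.2]
```

Dividing by the yearly total separates a change in a name's share from a change in the overall number of births, which matters when judging a claimed surge.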

Scaling

Exercise: Find a name that had a sudden surge or drop in popularity, and suggest an explanation.
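
A sketch of how logarithms and ratios might help here: the log of the ratio between consecutive years treats a doubling and a halving as equally large steps, so surges and drops can be ranked on one scale. The counts are made-up toy numbers for a hypothetical name, and log_ratios and biggest_jump are illustrative helpers, not a prescribed method.

```python
import math

# Made-up toy counts for one hypothetical name; not real SSA figures.
years = [1990, 1991, 1992, 1993, 1994]
counts = [100, 110, 880, 900, 450]


def log_ratios(counts):
    """log2(count[i+1] / count[i]); None where either count is zero."""
    out = []
    for prev, cur in zip(counts, counts[1:]):
        out.append(math.log2(cur / prev) if prev > 0 and cur > 0 else None)
    return out


def biggest_jump(years, counts):
    """Return (year, log2 ratio) of the largest year-over-year change, up or down."""
    steps = [
        (abs(r), years[i + 1], r)
        for i, r in enumerate(log_ratios(counts))
        if r is not None
    ]
    _, year, ratio = max(steps)
    return year, ratio


year, ratio = biggest_jump(years, counts)
print(year, ratio)  # 1992 3.0
```

A log2 ratio of 3.0 means the count grew eightfold in one year. When plotting with matplotlib, plt.yscale("log") gives the matching view: equal ratios appear as equal vertical distances.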

Another aspect of the data: gender specificity

Exercise: Illustrate the gender specificity of a few names, then of a large number of names.

Discussion: How can we explain the features of the gender-specificity plot?
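
One possible measure of gender specificity, sketched under assumptions: the fraction of all recorded births with a given name that were female, so 1.0 means exclusively female and 0.0 exclusively male. The counts are made-up toy numbers, and female_fraction and plot_specificity_histogram are hypothetical helpers; other measures, such as the log of the F/M ratio, would serve as well.

```python
# Toy structure in the suggested shape; counts are made up, not SSA figures.
names = {
    "Ann": {"F": [90, 100], "M": [0, 0]},
    "Lee": {"F": [10, 15], "M": [40, 35]},
    "Sam": {"F": [5, 5], "M": [45, 45]},
}


def female_fraction(by_sex):
    """F / (F + M) summed over all years; None if the name never occurs."""
    f, m = sum(by_sex["F"]), sum(by_sex["M"])
    return f / (f + m) if f + m else None


def plot_specificity_histogram(values):
    """Histogram of female fractions, for looking at many names at once."""
    import matplotlib.pyplot as plt  # lazy import; assumes matplotlib is installed

    plt.hist(values, bins=20)
    plt.xlabel("female fraction")
    plt.ylabel("number of names")
    plt.show()


spec = {name: female_fraction(by_sex) for name, by_sex in names.items()}
print(spec)  # {'Ann': 1.0, 'Lee': 0.25, 'Sam': 0.1}
```

For a few names the fractions can be read directly; for many names a histogram of the same quantity shows how the values are distributed between 0 and 1.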

Exercise: Tabulate name frequencies by last letter and gender.
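
A minimal sketch of one way to tally by last letter and gender, using collections.Counter keyed on (last letter, sex) pairs. The names dictionary holds made-up toy counts, and last_letter_counts is a hypothetical helper.

```python
from collections import Counter

# Toy structure in the suggested shape; counts are made up, not SSA figures.
names = {
    "Anna": {"F": [50, 60], "M": [0, 0]},
    "Ella": {"F": [20, 30], "M": [0, 0]},
    "John": {"F": [0, 1], "M": [70, 80]},
    "Evan": {"F": [0, 0], "M": [10, 15]},
}


def last_letter_counts(names):
    """Map (last letter, sex) -> total births, summed over all years."""
    tally = Counter()
    for name, by_sex in names.items():
        for sex, counts in by_sex.items():
            tally[(name[-1].lower(), sex)] += sum(counts)
    return tally


tally = last_letter_counts(names)
print(tally[("a", "F")], tally[("n", "M")])  # 160 175
```

This version weights each name by its number of births; counting each distinct name once instead would answer a slightly different question.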

Assignment for next Monday, Sep 9

Carefully study the Report Guide. There will be a quiz on it on Monday.