Day 08

Thursday, Feb 23, 2017


Cobalt scandal

Upload your picture to UBlearns.

Here are your pictures and mine.

Here is my own code for downloading and visualizing the complaints. (It caches the complaints so they are downloaded only once.)

Escape from Jupyter homework - comments

Do not include cell output in the file you write.

Triple-quoted strings as comments

"""
This is
a multi-line string
which could be used as a comment.
"""

Using "with"

with open('in.txt') as f, open('out.txt', 'w') as g:
    # read from f and write to g here, for example:
    for line in f:
        g.write(line)

No need to close the files: "with" takes care of that for you.
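A minimal sketch of the same pattern that can be run anywhere. The temporary directory, the file contents, and the upper-casing step are assumptions made up for illustration; only the two-file "with" statement comes from the notes.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'in.txt')    # hypothetical file names
out_path = os.path.join(workdir, 'out.txt')

# Create a small input file to read from.
with open(in_path, 'w') as f:
    f.write('first line\nsecond line\n')

# Both files are opened together, and both are closed when the block ends,
# even if an exception is raised inside it.
with open(in_path) as f, open(out_path, 'w') as g:
    for line in f:
        g.write(line.upper())

# Record whether "with" really closed both files.
files_closed = f.closed and g.closed

with open(out_path) as f:
    result = f.read()

print(files_closed)
print(result)
```

Checking f.closed and g.closed after the block is a quick way to convince yourself that no explicit close() call is needed.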

Thoughts/Discussion on Reports

Embedding data: yes or no?

Yes, give the reader a feel for the data you're dealing with:


The following is bigger but still fits on one screen, and is very helpful in conveying to the reader how you have processed the data:


But no, do NOT include long lists or tables that extend over multiple screens and that no one is going to read. Put this kind of material in an Appendix if you think it's vital to have for reference.

Scaling things out

number of steering complaints / number of complaints

number of complaints / number of cars

number of Elizabeths / total number of births



number of distinct names / total number of births ?
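A small sketch of why these ratios matter, using the Elizabeths example. All the counts below are invented for illustration (they are not from the baby names data); the point is only that a raw count and a share of the total can tell opposite stories.

```python
# Hypothetical counts, invented purely for illustration.
births = {1950: 1000, 1980: 4000}       # total births per year
elizabeths = {1950: 20, 1980: 40}       # births named Elizabeth per year

# Scale each count by the total for that year.
share = {year: elizabeths[year] / births[year] for year in births}

for year in sorted(share):
    print(year, elizabeths[year], share[year])

# The raw count doubles between the two years, but the share is cut in half.
count_ratio = elizabeths[1980] / elizabeths[1950]
share_ratio = share[1980] / share[1950]
print(count_ratio, share_ratio)
```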


Shifting axes

Life-cycle of names, pulses of popularity


Perhaps we could shift and rescale every name to place their peaks of popularity all at the same point, and see how the distribution looks.
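One possible way to do the shifting and rescaling, as a sketch: measure time in years from the peak and divide each count by the peak count. The function name align_to_peak and the two name series are hypothetical, invented for illustration.

```python
def align_to_peak(counts_by_year):
    """Return {years_from_peak: count / peak_count} for one name."""
    peak_year = max(counts_by_year, key=counts_by_year.get)
    peak_count = counts_by_year[peak_year]
    return {year - peak_year: count / peak_count
            for year, count in counts_by_year.items()}

# Hypothetical yearly counts for two made-up names.
names = {
    'NameA': {1950: 10, 1951: 40, 1952: 20},
    'NameB': {1990: 300, 1991: 600, 1992: 900, 1993: 450},
}

# After alignment every name peaks at offset 0 with height 1.0.
aligned = {name: align_to_peak(series) for name, series in names.items()}

for name, series in aligned.items():
    print(name, sorted(series.items()))
```

With every curve on the same axes, the shapes of the rise and fall can be compared directly, regardless of when a name was popular or how popular it was.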

Taking logarithms

big, small, very small

zero must be a special value! (not an arbitrary origin)
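A sketch of what taking logarithms does to big, small, and very small values. It reads the note about zero as saying that logs only make sense for quantities with a true zero (counts, shares), and that a value of exactly zero has to be handled separately because its logarithm is undefined. The shares below are invented for illustration.

```python
import math

# Hypothetical shares of all births, invented for illustration.
shares = {'big': 2e-2, 'small': 2e-4, 'very small': 2e-6, 'absent': 0.0}

log_shares = {}
for label, value in shares.items():
    if value > 0:
        log_shares[label] = math.log10(value)
    else:
        # math.log10(0) raises ValueError, so zero is kept as a special case.
        log_shares[label] = None

for label, value in log_shares.items():
    print(label, value)
```

On the log scale the three nonzero values are evenly spaced (two units apart), whereas on a linear scale the two smaller ones would be indistinguishable from zero.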


Stacked histograms BAD! (imho)


Obscures what's going on for every layer except the lowest.

How many curves in a plot?

A few is good. Very many can be good. Intermediate not so much.

Combining anecdote with entire population statistics

I think this picture is striking:


Rotating axis labels

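import numpy as np
import matplotlib.pyplot as plt
plt.xticks(np.arange(1940, 1986, 1), rotation=60)  # one tick per year, labels tilted 60 degrees
plt.xlim(1940, 1985)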


Legend outside the box

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)  # gives an external legend (credit: Hui Duan)


What is the distribution of 1-year jumps? (Perhaps broken down by popularity level.)
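A sketch of how the 1-year jumps might be computed for a single name. It assumes a "jump" means the ratio of one year's count to the previous year's count, which is only one possible definition (a difference would be another). The function name and the counts are hypothetical.

```python
def one_year_jumps(counts_by_year):
    """Ratios count[y+1] / count[y] for consecutive years present in the data."""
    jumps = []
    for year in sorted(counts_by_year):
        next_count = counts_by_year.get(year + 1)
        # Skip gaps in the record and years with a zero count.
        if next_count is not None and counts_by_year[year] > 0:
            jumps.append(next_count / counts_by_year[year])
    return jumps

# Hypothetical counts for one name; note that 2003 is missing.
counts = {2000: 100, 2001: 150, 2002: 75, 2004: 80}
jumps = one_year_jumps(counts)
print(jumps)
```

Collecting these ratios over all names, perhaps separately for popular and rare names, would give the distribution asked about above.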

XML (eXtensible Markup Language)

What is XML? Wikipedia says: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.

Examples of XML standards

GPX: an XML format for exchanging GPS data (Example: Winnipeg.gpx recorded using MyTracks on my phone, Dec 31, 2015.)


KML for plotting things on Google Earth (Example: NCEDC earthquakes)


SVG for vector graphics (Example: simple.svg)


And last but not least, HTML - the language of the web!

Many sources provide data in their own ad-hoc XML format. Example: real-time Chicago bus information
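To give a feel for reading one of the standards listed above, here is a sketch that pulls track points out of a tiny GPX fragment. The fragment and its coordinates are made up for illustration and are not taken from Winnipeg.gpx. It uses the standard library's xml.etree.ElementTree rather than lxml, which is a substitution made so the sketch runs without extra installs.

```python
import xml.etree.ElementTree as ET

# A made-up GPX 1.1 fragment with two track points.
gpx_text = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="49.8951" lon="-97.1384"><ele>232.0</ele></trkpt>
      <trkpt lat="49.8960" lon="-97.1390"><ele>233.5</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""

# GPX elements live in a namespace, so searches must name it.
ns = {'gpx': 'http://www.topografix.com/GPX/1/1'}

root = ET.fromstring(gpx_text)
points = []
for pt in root.findall('.//gpx:trkpt', ns):
    lat = float(pt.attrib['lat'])                 # latitude is an attribute
    lon = float(pt.attrib['lon'])                 # longitude is an attribute
    ele = float(pt.find('gpx:ele', ns).text)      # elevation is a nested element
    points.append((lat, lon, ele))

print(points)
```

Forgetting the namespace is a common stumbling block: without it, a search for 'trkpt' finds nothing.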

Exercise XML 1a: your own XML dialect and data

Type an XML document of your own invention from scratch by hand. Let's say you want to record a stream of notes from you to yourself. Or a set of any other kind of objects - you can choose anything you like. Invent an appropriate set of tags for the task.

Controversy?: attributes vs. nested elements
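The two styles can be compared side by side. In this sketch the same note is written once with attributes and once with nested elements; the tag names (note, date, topic, body) are invented for illustration, and the standard library's xml.etree.ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET

# Style 1: metadata carried as attributes on the element.
as_attributes = '<note date="2017-02-23" topic="xml">Try lxml</note>'

# Style 2: the same information carried as nested child elements.
as_elements = ('<note><date>2017-02-23</date><topic>xml</topic>'
               '<body>Try lxml</body></note>')

a = ET.fromstring(as_attributes)
e = ET.fromstring(as_elements)

# Attributes are read through .attrib; nested elements through find() and .text.
from_attributes = (a.attrib['date'], a.attrib['topic'], a.text)
from_elements = (e.find('date').text, e.find('topic').text, e.find('body').text)

print(from_attributes)
print(from_elements)
```

Both carry the same information. Attributes are more compact, but each name can appear only once per element and the value is always a flat string; nested elements can repeat and can have structure of their own.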

Exercise XML 1b: Write an XML parser

to extract information from your XML document.

Useful things:

from lxml import etree
fromstring(), tostring()
 .tag, .text, .attrib[name]
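A short sketch of how these pieces fit together, applied to a hypothetical notes document (the tags and contents are invented). It tries the lxml import from the notes and, as an assumption for machines without lxml, falls back to the standard library's ElementTree, which offers the same calls used here.

```python
try:
    from lxml import etree                      # the import the notes use
except ImportError:                             # fallback: stdlib has the same calls
    import xml.etree.ElementTree as etree

# A hypothetical hand-typed XML document.
doc = '''<notes>
  <note priority="high">Upload picture</note>
  <note priority="low">Read about JSON</note>
</notes>'''

root = etree.fromstring(doc)                    # parse a string into an element tree

# Iterating over an element yields its children.
records = [(child.tag, child.attrib['priority'], child.text) for child in root]

# tostring() serializes the tree again; encoding='unicode' asks for a str.
round_trip = etree.tostring(root, encoding='unicode')

print(root.tag)
print(records)
print(round_trip)
```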

Exercise XML 1c: How would this look in JSON?

How would the above all work using JSON instead of XML? Replicate Exercises XML 1a and 1b with JSON (manually type the analogous JSON document, and parse it with Python).
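As a starting point, here is a sketch of what a JSON version might look like for a hypothetical stream of notes. The keys (notes, priority, text) are invented for illustration; json.loads is from Python's standard library.

```python
import json

# A hypothetical hand-typed JSON document.
doc = '''
{"notes": [
    {"priority": "high", "text": "Upload picture"},
    {"priority": "low",  "text": "Read about JSON"}
]}
'''

# json.loads turns the text directly into Python dicts, lists, and strings.
data = json.loads(doc)
records = [(note['priority'], note['text']) for note in data['notes']]

print(records)
```

JSON has no distinction between attributes and nested elements: everything is a key, a value, or a list, so that design question does not arise.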