MTH 448/563 Data-Oriented Computing

Fall 2019

Day 9: JSON

Let's socialize a bit first: introduce yourself to 2 people you haven't yet spoken to.

Todays topics

  • recaps: regex, color maps and color scales for SVG map
  • JSON
  • escape from Jupyter

JSON

The main current competitor of XML for self-describing human-readable data formats.

Supplementary reference: Jennifer Widom database lectures.

Examples of use

Example of use: Thunderbird email client's bookkeeping:

$ ls *.json

addons.json
blocklist-addons.json
blocklist-gfx.json
blocklist-plugins.json
directoryTree.json
downloads.json
extensions.json
folderTree-1.json
folderTree-2.json
folderTree-3.json
folderTree-4.json
folderTree-5.json
folderTree-6.json
folderTree-7.json
folderTree.json
logins.json
search.json
sessionCheckpoints.json
session.json
times.json
xulstore.json

Another example: Jupyter notebooks!

Exercise: Make a copy of any of your Jupyter notebook (.ipynb) files as something.json. Then view with your browser or a text editor.

Special appeal for us: JSON text happens to look almost like Python code with data structures we use all the time (lists, dicts). (It is Javascript code.)

Browser plugins that render JSON nicely are available.

Correspondence with Python data structures

  • object = dictionary (string:value)
  • array = list
  • value = string/number/boolean/null/object/array

How to read and write JSON in Python

json module has methods to write and read JSON files: dump, load (and dumps, loads to and from strings)

import json
with open('foo.json') as f:
        myobj = json.load(f)

with open('bar.json','w') as f:
        json.dump(myobj,f,indent=3,sortkeys=True)

Cautions:

  • JSON keys are always strings (not required in Python dictionaries)
  • JSON text is almost pasteable as Python code, but JSON "true/false" map to Python "True/False", and JSON "null" maps to Python "None".
  • numpy arrays can't be stored as JSON without conversion to lists.

Escape from Jupyter

Exercise: Write an "Escape from Jupyter" code. Imagine the Jupyter project languishes, and 10 years from now you want to run the code in Jupyter notebooks you wrote this year. Write a plain text python program that extracts all the python code and annotation from a Jupyter Notebook and writes a corresponding executable plain Python script. Specifically:

def jup2py(jupfile):
        # writes plain text python file [jupfile].py

NHTSA complaints database

The National Highway Traffic Safety Administration (part of the US Department of Transportation) maintains a database of vehicle safety complaints: you can file a complaint here.

There is an web API to access the complaints:

http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

import requests
import json
url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'
year,make,model = '2005','Chevrolet','Cobalt'
url = url0.format(year,make,model)
s = requests.get(url).text  # a JSON string
complaints = json.loads(s)

Exercise: Suppose you are thinking of buying a used Hyundai Sonata [substitute your own preferred make and model]. Are there any model years you should avoid? Make a chart of the number of complaints versus model year (using altair).

Chevrolet Cobalt ignition switch

From Wikipedia article on Chevrolet Cobalt: Faulty ignition switches in the Cobalts, which cut power to the car while in motion, were eventually linked to many crashes resulting in fatalities, starting with a teenager in 2005 who drove her new Cobalt into a tree. The switch continued to be used in the manufacture of the vehicles even after the problem was known to GM. On February 21, 2014, GM recalled over 700,000 Cobalts for issues traceable to the defective ignition switches. In May 2014 the NHTSA fined the company $35 million for failing to recall cars with faulty ignition switches for a decade, despite knowing there was a problem with the switches. Thirteen deaths were linked to the faulty switches during the time the company failed to recall the cars.

Exercise: Was this problem evident from the NHTSA complaint database long before the 2014 recall? How would you go about searching the database for evidence of other serious problems?

Report 3

For Report 3, you will dig into the NHTSA complaint database to find something you feel in interesting or important.

Cautions:

  • Without data from somewhere else, you do not know how many new cars of a given year, make, model were sold in the US.
  • Likewise, you do not know how many of a given year, make, model were still on the road in any given year.

So you may need to do comparisons that effectively divide those unknowns out.

Strange date format (elaborated Unix timestamp)

Times are measured in seconds since the beginning of 1970.

from datetime import datetime
t0 = datetime.fromtimestamp(1266296400)
2010-02-16 00:00:00

datetime.datetime objects can be subtracted from each other

datetime.strptime('2019/09/25 14:41','%Y/%m/%d %H:%M')
datetime.datetime(2019, 9, 25, 14, 41)