MTH 448/563 Data-Oriented Computing

Fall 2019

Day 10: JSON, cont'd

Let's socialize a bit first: introduce yourself to 2 people you haven't yet spoken to.

Today's topics

  • correction
  • Report 2
  • escape from Jupyter

Correction about intensive

Tried to trick you, but tricked myself.

Review of Report 2

These ranged from "you can do much better" to amazing!

Things you must NOT do

  • repeat code blocks
comments_on_report2_f19/code_repetition.png
  • hard-code indices (This is what lists and dictionaries are for!)
comments_on_report2_f19/index_hardcoding.png comments_on_report2_f19/spelling_error.png
  • search one giant table for every element of another giant table
comments_on_report2_f19/inefficient_1.png comments_on_report2_f19/inefficient_2.png
  • contradict what the reader can see in your own results
skull_and_crossbones.png

Note: Testing membership in a set is MUCH faster than testing presence in a list: demo.
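
The note above can be checked with a small timing experiment. What follows is a minimal sketch using only the standard library; the names (`big_list`, `big_set`, `targets`, `count_hits`) and the sizes are made up for illustration, and the actual timings will differ from machine to machine.

```python
import timeit

n = 20_000
big_list = list(range(n))      # hypothetical "giant table" stored as a list
big_set = set(big_list)        # the same values stored as a set
targets = [n - 1, n // 2, -1]  # the last element, a middle element, an absent value

def count_hits(container):
    """Count how many of the targets are present in the container."""
    return sum(1 for t in targets if t in container)

# A membership test on a list scans the elements one by one;
# a set uses hashing, so the cost does not grow with the size of the set.
list_time = timeit.timeit(lambda: count_hits(big_list), number=200)
set_time = timeit.timeit(lambda: count_hits(big_set), number=200)

print(f'list: {list_time:.4f} s   set: {set_time:.4f} s')
```

The same idea applies when matching two tables: build a set (or a dictionary keyed by the matching column) from one table once, then look up each element of the other table in it, rather than scanning the whole table every time.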

Thing you must always do

  • sanity-check your results

Average temperatures in January (cannot possibly be correct):

comments_on_report2_f19/texas_newmexico.png

Obesity rates (suspect, but ...):

comments_on_report2_f19/obesity_also_has_state_boundary_jumps.png
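
One simple kind of sanity check is to test whether every value falls in a physically plausible range before plotting or interpreting it. The sketch below uses made-up temperatures and assumed bounds (`LOW_F`, `HIGH_F`); it is only an illustration, and it would not catch every problem (for example, suspicious jumps at state boundaries need a look at the map itself).

```python
# Hypothetical January average temperatures in degrees Fahrenheit (made-up values).
january_avg_f = {'Texas': 47.0, 'New Mexico': 35.0, 'Oklahoma': 380.0}

# Generous plausibility bounds for a statewide January average (an assumption).
LOW_F, HIGH_F = -30.0, 90.0

# Collect every entry that lies outside the plausible range.
suspicious = {state: t for state, t in january_avg_f.items()
              if not (LOW_F <= t <= HIGH_F)}

print(suspicious)
```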

Index for website

quiz name "please index"

Matt Nagowski exhibit

mattnagowski/index.html

NHTSA complaints database, cont'd

The National Highway Traffic Safety Administration (part of the US Department of Transportation) maintains a database of vehicle safety complaints: you can file a complaint here.

There is a web API to access the complaints:

http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

import requests
import json
url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'
year,make,model = '2005','Chevrolet','Cobalt'
url = url0.format(year,make,model)
s = requests.get(url).text  # a JSON string
complaints = json.loads(s)
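
Before using the parsed response, it helps to look at its structure. The sketch below works on a tiny made-up JSON string standing in for `requests.get(url).text`, so it runs without network access. Only the `Count` key appears in these notes; `Results` and `Summary` are assumed names, so check them against a real response.

```python
import json

# A tiny, made-up stand-in for the text returned by requests.get(url).text.
# Only 'Count' appears in these notes; 'Results' and 'Summary' are assumed names.
s = '{"Count": 2, "Results": [{"Summary": "ENGINE STALLED"}, {"Summary": "KEY STUCK"}]}'

complaints = json.loads(s)         # JSON object -> Python dict
print(type(complaints).__name__)   # the top-level Python type
print(sorted(complaints.keys()))   # top-level keys
print(complaints['Count'])         # number of complaints reported

results = complaints.get('Results', [])  # empty list if the key is absent
if results:
    print(sorted(results[0].keys()))     # fields available in one complaint record
```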

Exercise: Suppose you are thinking of buying a used Hyundai Sonata [substitute your own preferred make and model]. Are there any model years you should avoid? Make a chart of the number of complaints versus model year (using Altair).

Chevrolet Cobalt ignition switch

From Wikipedia article on Chevrolet Cobalt: Faulty ignition switches in the Cobalts, which cut power to the car while in motion, were eventually linked to many crashes resulting in fatalities, starting with a teenager in 2005 who drove her new Cobalt into a tree. The switch continued to be used in the manufacture of the vehicles even after the problem was known to GM. On February 21, 2014, GM recalled over 700,000 Cobalts for issues traceable to the defective ignition switches. In May 2014 the NHTSA fined the company $35 million for failing to recall cars with faulty ignition switches for a decade, despite knowing there was a problem with the switches. Thirteen deaths were linked to the faulty switches during the time the company failed to recall the cars.

Exercise: Was this problem evident from the NHTSA complaint database long before the 2014 recall? How would you go about searching the database for evidence of other serious problems?

Code written in class last Wednesday (Day 9), cleaned up a little:

import requests
import json
url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

make,model = 'Chevrolet','Cobalt'
d = {'number of complaints':[],'model year':[]}
for year in range(2000,2019):
        url = url0.format(year,make,model)
        print(str(year)+'\r',end='')
        s = requests.get(url).text  # a JSON string
        complaints = json.loads(s)
        d['number of complaints'].append(complaints['Count'])
        d['model year'].append(year)

import altair as alt
alt.renderers.enable('notebook')
import pandas as pd
df = pd.DataFrame.from_dict(d)
alt.Chart(df,title=make+' '+model).mark_bar().encode( x='model year:O', y='number of complaints')

Report 3

For Report 3, you will dig into the NHTSA complaint database to find something you feel is interesting or important.

Cautions:

  • Without getting data from somewhere else, you do not know how many new cars of a given year, make, model were sold in the US.
  • Likewise, you do not know how many of a given year, make, model were still on the road in any given year.

So you may need to do comparisons that effectively divide those unknowns out.
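
One possible way to divide the unknowns out is to compare a fraction rather than a raw count, for example the share of a model year's complaints that mention a particular problem. The sketch below uses made-up counts purely for illustration; the dictionary layout and the names `counts` and `share` are assumptions, not part of the NHTSA data.

```python
# Made-up counts for illustration only:
# model year -> (complaints mentioning a problem, all complaints).
counts = {
    2005: (30, 300),
    2006: (45, 900),
    2007: (10, 500),
}

# The (unknown) number of cars on the road affects both numbers in roughly
# the same way, so their ratio is less sensitive to it than either raw count.
share = {year: hits / total
         for year, (hits, total) in counts.items() if total > 0}

for year in sorted(share):
    print(year, f'{share[year]:.1%}')
```

Here 2006 has the largest raw count of problem complaints, but 2005 has the largest share, which is the kind of difference a raw count can hide.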

Strange date format (elaborated Unix timestamp)

Times are measured in seconds since the beginning of 1970 (UTC).

from datetime import datetime
t0 = datetime.fromtimestamp(1266296400)  # converted using the local time zone
print(t0)
2010-02-16 00:00:00
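
Without a `tz` argument, `datetime.fromtimestamp` converts to the local time zone, so the midnight result shown above corresponds to a UTC-5 zone such as US Eastern Standard Time. A minimal sketch of a conversion that gives the same answer on every machine follows; the millisecond variant is a hypothetical case, included only because some APIs report timestamps that way.

```python
from datetime import datetime, timezone

ts = 1266296400  # seconds since 1970-01-01 00:00:00 UTC (value from the example above)

# Passing tz=timezone.utc makes the result independent of the machine's time zone.
t_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
print(t_utc)  # 2010-02-16 05:00:00+00:00

# If a timestamp is given in milliseconds instead, divide by 1000 first.
ts_ms = ts * 1000  # hypothetical millisecond version of the same instant
t_from_ms = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(t_from_ms == t_utc)
```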

datetime.datetime objects can be subtracted from each other; the result is a datetime.timedelta.

datetime.strptime('2019/09/25 14:41','%Y/%m/%d %H:%M')
datetime.datetime(2019, 9, 25, 14, 41)
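
As a small illustration of parsing and subtracting, the sketch below parses two date strings with the same format and takes their difference. The second time is made up for the example.

```python
from datetime import datetime

fmt = '%Y/%m/%d %H:%M'
t1 = datetime.strptime('2019/09/25 14:41', fmt)
t2 = datetime.strptime('2019/10/02 16:11', fmt)  # made-up second time

dt = t2 - t1               # a datetime.timedelta
print(dt)                  # 7 days, 1:30:00
print(dt.days)             # 7
print(dt.total_seconds())  # 610200.0
```

A difference like this could be used, for instance, to measure the delay between the date of an incident and the date the complaint was filed.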

Exercise: Use Google to find out how to use Altair to make a histogram of the IncidentDate of complaints of a certain kind.