Index of daily outlines

Contents


aggregation

day11_f19.html - a few simple aggregations

day11_f19.html Grouping and aggregation

day11_f19.html The groupby object supports aggregation, using agg({column:aggregator})

day12_f19.html Grouping and aggregation

day12_f19.html The groupby object supports aggregation, using agg({column:aggregator}) where aggregator is either a ufunc, like numpy.std or

day13_f19.html Grouping and aggregation

day13_f19.html The groupby object supports aggregation, using agg({column:aggregator}) where aggregator is either a ufunc, like numpy.std or

day14_f19.html aggregation

day15_f19.html aggregation
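For reference, the agg({column:aggregator}) pattern the entries above point to, as a minimal runnable sketch. The fruit/veg table is the one quoted under several entries below; the aggregator can be a ufunc like numpy.sum or a name pandas understands, like 'count'.

```python
import pandas as pd

# the small table used in the day11-13 outlines
df = pd.DataFrame([['fruit', 'apple', 23],
                   ['fruit', 'banana', 113],
                   ['fruit', 'orange', 67],
                   ['veg', 'broccoli', 5],
                   ['veg', 'carrot', 15]], columns=['type', 'name', 'count'])

# agg({column: aggregator}) applied to the groupby object
totals = df.groupby('type').agg({'count': 'sum'})
```

Here totals is a DataFrame indexed by type, with one aggregated 'count' per group.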

alt.chart

day10_f19.html alt.Chart(df,title=make+' '+model).mark_bar().encode( x='model year:O', y='number of complaints')

day13_f19.html alt.Chart(table).mark_bar().encode(x=dd+':N',y='count',color='Age Group')

altair

day03_f19.html Tools: matplotlib, altair. Choosing.

day03_f19.html altair - putatively "declarative" (what) instead of "imperative" (how)

day04_f19.html How to make a bar chart in altair

day04_f19.html Example here

day09_f19.html Are there any model years you should avoid? Make a chart of the number of complaints versus model year (using altair).

day10_f19.html import altair as alt

day10_f19.html Exercise: Use google to find out how to use Altair to make a histogram of the IncidentDate of complaints of a certain kind.

day10_f19.html Are there any model years you should avoid? Make a chart of the number of complaints versus model year (using altair).

day11_f19.html Exercise: Make a histogram of incident dates or filing dates of complaints about 1 make, model, year.

day11_f19.html Nice to make a stacked bar chart with Altair but ...

day11_f19.html altair wants the data in "long" format while we have prepared it in "wide" format. "Melting" and "pivoting" are operations

day11_f19.html long vs. wide tables, pandas.melt(). Note altair also provides "melt" and "fold" transformations.

day12_f19.html Altair data encoding types for reference:

day13_f19.html Altair data encoding types for reference: Nominal, Ordinal, Quantitative, Temporal

amazon

day04_f19.html Grab a price from Amazon

day04_f19.html Sometimes you will find Amazon refuses to serve the page to a script (robot).

day05_f19.html Unlike the Amazon item price, it is not very easy to locate the chunks of text we want.

bar chart

day04_f19.html How to make a bar chart in altair

day11_f19.html Nice to make a stacked bar chart with Altair but ...

day11_f19.html Steering complaints and stacked bar chart

day12_f19.html - Display in a bar chart.

day12_f19.html - Elaborate the bar chart as a stacked bar chart with color denoting Age Group (use groupby with a tuple of columns).

day13_f19.html - Display in a bar chart.

day13_f19.html - Elaborate the bar chart as a stacked bar chart with color denoting Age Group (use groupby with a tuple of columns).

birth weight

day13_f19.html Exercise: Reusing code from last time, make a histogram of Birth Weight in the SPARCS dataset.

day13_f19.html - Is there a relationship between birth weight and the age of mothers? (!)

broccoli

day11_f19.html ['veg','broccoli',5],

day12_f19.html ['veg','broccoli',5],

day13_f19.html ['veg','broccoli',5],

bs4

day05_f19.html b = bs4.BeautifulSoup(s,'lxml')

day05_f19.html import bs4

day05_f19.html HTML parsing with bs4

day05_f19.html More general solution: bs4 module or lxml module

day05_f19.html bs4 is a bit more tolerant of badly formed XML.

day06_f19.html b = bs4.BeautifulSoup(s,'lxml')

day06_f19.html import bs4

bubbles

day06_f19.html Exercise: Make a picture with a large collection of "bubbles" of random size, position, and color.

day06_f19.html - writing SVG from scratch (random bubbles)

day07_f19.html Bubble check: Show your picture of a large collection of "bubbles" of random size, position, and color.
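The "bubbles" exercise indexed above amounts to writing SVG from scratch; a minimal sketch using only the standard library (the canvas size, bubble count, and size range are illustrative choices, not from the outlines):

```python
import random

random.seed(0)
circles = []
for _ in range(50):
    # random position, size, and color for each bubble
    cx, cy = random.uniform(0, 400), random.uniform(0, 300)
    r = random.uniform(2, 25)
    color = '#{:06x}'.format(random.randrange(0x1000000))
    circles.append('<circle cx="{:.1f}" cy="{:.1f}" r="{:.1f}" fill="{}"/>'
                   .format(cx, cy, r, color))

svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">'
       + ''.join(circles) + '</svg>')
```

Saving the string to a .svg file and opening it in a browser shows the picture.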

buffalo

448_563_policies_f19.html Course website: http://blue.math.buffalo.edu/448 This website provides all the basic information you need for the class,

448_563_policies_f19.html Email: ringland at buffalo.edu. Be sure to put “448” or “563” in the Subject line.

448_563_policies_f19.html Any violation of this policy will be pursued to the fullest extent of university policy.

day01_f19.html - Open Data Buffalo (json)

day05_f19.html url = 'http://www.buffalo.edu/class-schedule?switch=showcourses&semester=fall&division=UGRD&dept=ASL'

day05_f19.html For each ASL course at UB this semester, get the timeslot, the number of students registered, and the number of empty seats.

day06_f19.html url = 'http://www.buffalo.edu/class-schedule?switch=showcourses&semester=fall&division=UGRD&dept=ASL'

day08_f19.html "http://blue.math.buffalo.edu/463/mycourses"

day08_f19.html targetNamespace ="http://blue.math.buffalo.edu/463/mycourses"

day08_f19.html xmlns="http://blue.math.buffalo.edu/463/mycourses"

day08_f19.html Reason: Unexpected child with tag '{http://blue.math.buffalo.edu/463/mycourses}nombre' at position 1. Tag (number | name | semester) expected.

day08_f19.html XMLSchemaChildrenValidationError: failed validating <Element '{http://blue.math.buffalo.edu/463/mycourses}course' at 0x7fc9687ab3b8> with XsdGroup(model='all', occurs=[1, 1]):

not_used_day01_f19.html Download this list of English words: http://blue.math.buffalo.edu/448/words.txt

chicago

day01_f19.html - Chicago real-time bus info (XML)

day05_f19.html Many sources provide data in their own ad-hoc XML format. Example: real-time Chicago bus information

color map

day06_f19.html - hex color specs and color maps

day06_f19.html Color maps

day07_f19.html - color maps

day07_f19.html 2D color maps can be contemplated, but their comprehensibility is debatable.

day07_f19.html Although you are welcome to consider elaborate color maps (even 2D maps representing two quantities),

day07_f19.html Color maps

day07_f19.html Discrete color maps (i.e. with a small finite set of colors) are also widely used.

day07_f19.html Note: You don't have to use a linear color map: consider if a logarithmic one would be helpful.

day08_f19.html Although you are welcome to consider elaborate color maps (even 2D maps representing two quantities),

day08_f19.html Note: You don't have to use a linear color map: consider if a logarithmic one would be helpful.

day09_f19.html - recaps: regex, color maps and color scales for SVG map
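A minimal example of the "hex color specs and color maps" idea indexed above: a hypothetical linear map from a value in [0,1] to a blue-to-red hex color (for the logarithmic variant mentioned in the outlines, transform x before mapping):

```python
def colormap(x):
    """Map x in [0,1] linearly to a hex color from blue to red."""
    x = min(max(x, 0.0), 1.0)   # clamp out-of-range values
    r = int(255 * x)
    b = int(255 * (1 - x))
    return '#{:02x}00{:02x}'.format(r, b)
```

Such a function can supply the "fill" attribute for each path element when coloring an SVG county map from data.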

count

448_563_policies_f19.html traced to you (cannot be your real name or an abbreviation of it, not your person number, not your gmail account name, etc.).

day03_f19.html Exercise: Make a count of frequencies by last letter and gender.

day05_f19.html count = 0

day06_f19.html with open('USA_Counties_with_FIPS_and_names.svg') as f:

day06_f19.html Exercise: Use a text editor to color the county where you were born with some bright color of your choice, by adding a "fill" attribute to the appropriate path element, and check it in your browser. If you were not born in the USA, consider yourself an honorary Erie County native.

day06_f19.html Exercise: Write code to randomly color the counties of this map.

day06_f19.html Here is a map of the United States with outlines for every county: USA_Counties_with_FIPS_and_names.svg

day07_f19.html with open('USA_Counties_with_FIPS_and_names.svg') as f:

day07_f19.html Exercise: Use a text editor to color the county where you were born with some bright color of your choice, by adding a "fill" attribute to the appropriate path element, and check it in your browser. If you were not born in the USA, consider yourself an honorary Erie County native.

day07_f19.html Exercise: Write code to color the counties of this map randomly.

day07_f19.html - US county FIPS codes

day07_f19.html Color the counties on the map according to your data

day07_f19.html Find some interesting data for every county in the US

day07_f19.html Here is a map of the United States with outlines for every county: USA_Counties_with_FIPS_and_names.svg

day07_f19.html Search online for some data given for every county in the nation.

day07_f19.html You may want to consult this page about the FIPS codes for counties.

day08_f19.html Quiz "are they intensive?": Of US counties: (i) maximum distance from state capital (ii) average distance from state capital.

day08_f19.html Color the counties on the map according to your data

day08_f19.html Find some interesting data for every county in the US

day08_f19.html Search online for some data given for every county in the nation.

day08_f19.html You may want to consult this page about the FIPS codes for counties.

day10_f19.html d['number of complaints'].append(complaints['Count'])

day11_f19.html ['veg','carrot',15]], columns=['type','name','count'])

day11_f19.html # count complaints whose Summary contains search string

day11_f19.html d['other'].append(complaints['Count']-nc)

day11_f19.html - value_counts

day11_f19.html My code that counts the complaints that mention something specific as well as all the others:

day12_f19.html ['veg','carrot',15]], columns=['type','name','count'])

day12_f19.html Exercise: Explore variation in charges from county to county. Be careful not to draw unwarranted conclusions.

day12_f19.html - List diagnoses with counts in order of decreasing frequency.

day12_f19.html - mean and/or median charges by hospital county

day12_f19.html Download this county map of New York State and this function I wrote: colorny.py to color it from a Series.

day12_f19.html the name of an operation pandas understands, like 'count'.

day13_f19.html ['veg','carrot',15]], columns=['type','name','count'])

day13_f19.html Exercise: Explore variation in charges from county to county. Be careful not to draw unwarranted conclusions.

day13_f19.html - List diagnoses with counts in order of decreasing frequency.

day13_f19.html Download this county map of New York State and this function I wrote: colorny.py to color it from a Series.

day13_f19.html alt.Chart(table).mark_bar().encode(x=dd+':N',y='count',color='Age Group')

day13_f19.html the name of an operation pandas understands, like 'count'.

day14_f19.html To aggregate (e.g. sum, average, count, min, max) by groups, use "GROUP BY"

day14_f19.html where AGG stands for one of SUM, COUNT, MAX, AVG, etc.

day15_f19.html To aggregate (e.g. sum, average, count, min, max) by groups, use "GROUP BY"

day15_f19.html where AGG stands for one of SUM, COUNT, MAX, AVG, STDEV, etc.

day16_f19.html Galaxy count

day17_f19.html Galaxy count

not_used_day01_f19.html Count and display word-pair frequencies in tweets of RealDonaldTrump and BarackObama.

not_used_day01_f19.html Help: here is a list of countries (that needs cleaning).

not_used_day01_f19.html Which countries are mentioned in the Wikipedia article on the potato?

dataframe

day02_f19.html A dataframe is quite like a 2D numpy array except the rows and columns can have arbitrary labels instead of just successive integers.

day02_f19.html Exercise: add columns to the dataframe containing

day02_f19.html Load the data into a pandas "dataframe" as follows:

day10_f19.html df = pd.DataFrame.from_dict(d)

day11_f19.html df = pd.DataFrame.from_dict(d)

day11_f19.html df = pd.DataFrame([['fruit','apple',23],

day11_f19.html groupby gives an iterable of (group name, group dataframe) tuples

day11_f19.html Pandas DataFrame and Series features

day12_f19.html df = pd.DataFrame([['fruit','apple',23],

day12_f19.html groupby gives an iterable of (group name, group dataframe) tuples

day13_f19.html df = pd.DataFrame([['fruit','apple',23],

day13_f19.html groupby gives an iterable of (group name, group dataframe) tuples

dictionaries

day03_f19.html My suggestion: a dictionary of dictionaries { name:{'F':array,'M':array} }

day09_f19.html - JSON keys are always strings (not required in Python dictionaries)

day10_f19.html - hard-code indices (This is what lists and dictionaries are for!)

dictionary

day03_f19.html My suggestion: a dictionary of dictionaries { name:{'F':array,'M':array} }

day05_f19.html Attributes of elements can be accessed using dictionary-style syntax.

day09_f19.html - object = dictionary (string:value)

extensive

448_563_policies_f19.html - Extensive writing of formal reports - 7 of them.

day07_f19.html Quiz "extensive": give me one example of an extensive property.

day07_f19.html - intensive vs. extensive properties

day07_f19.html .. image:: choropleth_extensive.png

day07_f19.html For future reference, if you do want to represent an extensive quantity, possible options (for another report) are:

day07_f19.html Intensive vs. extensive properties

day07_f19.html of regions with the same value. In contrast, "extensive" properties are absolute amounts

day16_f19.html An extensive moderate (0.5 arcsecond) resolution survey of the sky: an example of big data with a SQL API.

day17_f19.html An extensive moderate (0.5 arcsecond) resolution survey of the sky: an example of big data with a SQL API.

footprint

day16_f19.html somewhere in the DR14 footprint,

day17_f19.html somewhere in the DR14 footprint,

fruit

day11_f19.html ['fruit','banana',113],

day11_f19.html ['fruit','orange',67],

day11_f19.html df = pd.DataFrame([['fruit','apple',23],

day12_f19.html ['fruit','banana',113],

day12_f19.html ['fruit','orange',67],

day12_f19.html df = pd.DataFrame([['fruit','apple',23],

day13_f19.html ['fruit','banana',113],

day13_f19.html ['fruit','orange',67],

day13_f19.html df = pd.DataFrame([['fruit','apple',23],

groupby

day02_f19_sketch.html If I do it with Pandas, I have to introduce groupby etc which is a bit much

day11_f19.html groupby gives an iterable of (group name, group dataframe) tuples

day11_f19.html More generally, we can use the groupby method

day11_f19.html The groupby object supports aggregation, using agg({column:aggregator})

day12_f19.html groupby gives an iterable of (group name, group dataframe) tuples

day12_f19.html - Elaborate the bar chart as a stacked bar chart with color denoting Age Group (use groupby with a tuple of columns).

day12_f19.html Pandas groupby method

day12_f19.html The groupby object supports aggregation, using agg({column:aggregator}) where aggregator is either a ufunc, like numpy.std or

day13_f19.html groupby gives an iterable of (group name, group dataframe) tuples

day13_f19.html - Elaborate the bar chart as a stacked bar chart with color denoting Age Group (use groupby with a tuple of columns).

day13_f19.html Pandas groupby method

day13_f19.html The groupby object supports aggregation, using agg({column:aggregator}) where aggregator is either a ufunc, like numpy.std or

day15_f19.html class.groupby( teamname from last time ).agg( discuss and work together )

heatmap

day13_f19.html - Make a heatmap or scatter plot of charges/day vs. length of stay.

histogram

day10_f19.html Exercise: Use google to find out how to use Altair to make a histogram of the IncidentDate of complaints of a certain kind.

day11_f19.html Exercise: Make a histogram of incident dates or filing dates of complaints about 1 make, model, year.

day11_f19.html - Can we understand this distribution by making a histogram? (histogram1d.py, matplotlib.pyplot.bar(barcenters, heights, barwidth))

day12_f19.html Exercise: Make a histogram of the 'Total Charges' in the SPARCS data. (log charges, normal comparison?)

day12_f19.html - Can we understand this distribution by making a histogram? (histogram.py, matplotlib.pyplot.bar(barcenters, heights, barwidth))

day12_f19.html Histograms in more depth

day12_f19.html Let's make a histogram of this data:

day12_f19.html make our own histogram

day13_f19.html Exercise: Reusing code from last time, make a histogram of Birth Weight in the SPARCS dataset.

day13_f19.html - Create a histogram of the lengths of stay.
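A sketch of the "make our own histogram" step indexed above, computing the barcenters/heights/barwidth that matplotlib.pyplot.bar expects (the data and bin choices here are illustrative):

```python
import numpy as np

data = np.array([1.0, 1.5, 2.2, 2.8, 3.1, 3.3, 3.9, 5.0])

# bin the data: heights are counts per bin, edges delimit the bins
heights, edges = np.histogram(data, bins=4, range=(1.0, 5.0))
barwidth = edges[1] - edges[0]
barcenters = (edges[:-1] + edges[1:]) / 2
# matplotlib.pyplot.bar(barcenters, heights, barwidth) would then draw it
```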

inner

day13_f19.html inner, left (outer), right (outer), and (full) outer joins

day14_f19.html Remember the default inner join omits rows where data absent in other table.

day14_f19.html inner join

day14_f19.html natural join - inner join on common column(s)

day15_f19.html Remember the default inner join omits rows where data absent in other table.

day15_f19.html inner join

day15_f19.html natural join - inner join on common column(s)

intensive

day07_f19.html Quiz "intensive": give me one example of an intensive property.

day07_f19.html - intensive vs. extensive properties

day07_f19.html Intensive properties are things like proportions, densities, rates. They are unchanged upon merging

day07_f19.html Intensive vs. extensive properties

day07_f19.html With very rare exceptions, choropleth maps should be used only for "intensive" properties.

day08_f19.html Quiz "are they intensive?": Of US counties: (i) maximum distance from state capital (ii) average distance from state capital.

day10_f19.html Correction about intensive

join

day12_f19.html Merging (joining) tables

day13_f19.html Merging (joining) tables

day13_f19.html inner, left (outer), right (outer), and (full) outer joins

day14_f19.html select * from A join B on A.foo = B.foo;

day14_f19.html select * from A left outer join B on A.foo = B.foo where A.blah = 'Hey!';

day14_f19.html select * from A left outer join B on A.foo = B.foo where A.blah in (2018,2017,2016);

day14_f19.html select * from A left outer join B on A.foo = B.foo;

day14_f19.html select * from B left outer join A on A.foo = B.foo;

day14_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day14_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day14_f19.html select * from tableA natural join tableB;

day14_f19.html Remember the default inner join omits rows where data absent in other table.

day14_f19.html SQLite does not support right joins, but if you need one, just swap the left and right tables:

day14_f19.html inner join

day14_f19.html joins

day14_f19.html natural join - inner join on common column(s)

day14_f19.html left outer join

day15_f19.html select * from A join B on A.foo = B.foo;

day15_f19.html select * from A left outer join B on A.foo = B.foo where A.blah = 'Hey!';

day15_f19.html select * from A left outer join B on A.foo = B.foo where A.blah in (2018,2017,2016);

day15_f19.html select * from A left outer join B on A.foo = B.foo;

day15_f19.html select * from B left outer join A on A.foo = B.foo;

day15_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day15_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day15_f19.html select * from tableA natural join tableB;

day15_f19.html Remember the default inner join omits rows where data absent in other table.

day15_f19.html SQLite does not support right joins, but if you need one, just swap the left and right tables:

day15_f19.html inner join

day15_f19.html joins

day15_f19.html natural join - inner join on common column(s)

day15_f19.html left outer join
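The join queries indexed above can be tried with Python's built-in sqlite3. Tables A and B follow the outlines' naming, with invented rows to show how the default inner join drops unmatched rows while a left outer join keeps them:

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table A (foo, blah)')
cur.execute('create table B (foo, baz)')
cur.executemany('insert into A values (?,?)', [(1, 'x'), (2, 'y')])
cur.executemany('insert into B values (?,?)', [(1, 'p')])

# inner join: A's row with foo=2 has no match in B, so it is omitted
inner = cur.execute('select * from A join B on A.foo = B.foo').fetchall()

# left outer join: the unmatched row is kept, B's columns padded with NULL
left = cur.execute('select * from A left outer join B '
                   'on A.foo = B.foo').fetchall()
```

SQLite has no right join; as the outlines note, swap the left and right tables instead.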

json

448_563_policies_f19.html - Structured data formats (json, xml, ...) and validation

day01_f19.html - Structured data formats (json, xml, ...) and validation

day01_f19.html - NHTSA complaints (json)

day01_f19.html - Open Data Buffalo (json)

day05_f19.html Today we will look at XML, the 2nd of 3 major plain-text data markup languages in use today (csv, xml, json).

day09_f19.html json.dump(myobj,f,indent=3,sort_keys=True)

day09_f19.html myobj = json.load(f)

day09_f19.html $ ls *.json

day09_f19.html addons.json

day09_f19.html blocklist-addons.json

day09_f19.html blocklist-gfx.json

day09_f19.html blocklist-plugins.json

day09_f19.html complaints = json.loads(s)

day09_f19.html directoryTree.json

day09_f19.html downloads.json

day09_f19.html extensions.json

day09_f19.html folderTree-1.json

day09_f19.html folderTree-2.json

day09_f19.html folderTree-3.json

day09_f19.html folderTree-4.json

day09_f19.html folderTree-5.json

day09_f19.html folderTree-6.json

day09_f19.html folderTree-7.json

day09_f19.html folderTree.json

day09_f19.html import json

day09_f19.html logins.json

day09_f19.html s = requests.get(url).text # a JSON string

day09_f19.html search.json

day09_f19.html session.json

day09_f19.html sessionCheckpoints.json

day09_f19.html times.json

day09_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day09_f19.html with open('bar.json','w') as f:

day09_f19.html with open('foo.json') as f:

day09_f19.html xulstore.json

day09_f19.html Exercise: Make a copy of any of your Jupyter notebook (.ipynb) files as something.json. Then view with your

day09_f19.html - JSON

day09_f19.html - JSON keys are always strings (not required in Python dictionaries)

day09_f19.html - JSON text is almost pasteable as Python code, but JSON "true/false" map to Python "True/False", and JSON "null" maps to Python "None".

day09_f19.html - numpy arrays can't be stored as JSON without conversion to lists.

day09_f19.html Browser plugins that render JSON nicely are available.

day09_f19.html Day 9: JSON

day09_f19.html How to read and write JSON in Python

day09_f19.html JSON

day09_f19.html Special appeal for us: JSON text happens to look almost like Python code with data structures we use all the time (lists, dicts). (It is Javascript code.)

day09_f19.html Supplementary reference: Jennifer Widom database lectures.

day09_f19.html http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

day09_f19.html json module has methods to write and read JSON files: dump, load (and dumps, loads to and from strings)

day10_f19.html complaints = json.loads(s)

day10_f19.html s = requests.get(url).text # a JSON string

day10_f19.html complaints = json.loads(s)

day10_f19.html import json

day10_f19.html s = requests.get(url).text # a JSON string

day10_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day10_f19.html Day 10: JSON, cont'd

day10_f19.html http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

day11_f19.html complaints = json.loads(s)

day11_f19.html import json

day11_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day13_f19.html alt.data_transformers.enable('json')
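The json calls indexed above (dump/load to files, dumps/loads to and from strings) fit together as follows; the object itself is illustrative:

```python
import json

myobj = {'make': 'chevrolet', 'recalled': True, 'years': [2005, 2006]}

# write to a file, then read it back
with open('bar.json', 'w') as f:
    json.dump(myobj, f, indent=3, sort_keys=True)
with open('bar.json') as f:
    back = json.load(f)

# dumps/loads do the same to and from strings; note JSON uses
# true/false/null where Python uses True/False/None
s = json.dumps(myobj)
```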

merge

day14_f19.html Note: Can do a.merge(b) as well as pandas.merge(a,b)
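A minimal illustration of the note above, with invented tables: the method form and the module-level function give the same result (by default, an inner join on the common column):

```python
import pandas as pd

a = pd.DataFrame({'movie_id': [1, 2], 'title': ['Up', 'Heat']})
b = pd.DataFrame({'movie_id': [1, 1, 2], 'stars': [5.0, 4.0, 3.5]})

m1 = pd.merge(a, b)   # module-level function
m2 = a.merge(b)       # equivalent method form
```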

movie

day01_f19.html The Great Hack

day08_f19.html Or a set of any other kind of objects, a contacts list, movies you like, etc. - you can choose anything you like.

day11_f19.html - movie reviews

day11_f19.html In preparation for next week, please provide reviews of at least 5 movies:

day11_f19.html Your movie reviews

day11_f19.html movie table except change first Z to r

day12_f19.html Your movie reviews

day12_f19.html movie table except change first Z to r

day13_f19.html 'f19_movie.xlsx' :'1rZAUcQxNTGcwZTnUEI5kZJvDGjLE7V1LvB5Rey5rY1w', # except change first Z to r

day13_f19.html Quiz: "movie question". A question about the movie reviews you'd like answered. (Please write your question on a single line.)

day13_f19.html Your movie reviews

day13_f19.html movie table except change first Z to r

day14_f19.html 'f19_movie.xlsx' :'1rZAUcQxNTGcwZTnUEI5kZJvDGjLE7V1LvB5Rey5rY1w', # except change first Z to r

day14_f19.html .import 448f18_movie.csv movie

day14_f19.html SELECT sql FROM sqlite_master WHERE name='movie';

day14_f19.html q = 'select * from movie'

day14_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day14_f19.html select * from rating r outer left join movie m on m.movie_id = r.movie_id where stars>4.5;

day14_f19.html Quiz: "movie question". A question about the movie reviews you'd like answered. (Please write your question on a single line.)

day14_f19.html Movie ratings database

day14_f19.html Your movie questions

day14_f19.html Your movie reviews

day14_f19.html movie table except change first Z to r

day14_f19.html movie queries revisited in SQL

day15_f19.html 'f19_movie.xlsx' :'1rZAUcQxNTGcwZTnUEI5kZJvDGjLE7V1LvB5Rey5rY1w', # except change first Z to r

day15_f19.html .import f19_movie.csv movie

day15_f19.html SELECT sql FROM sqlite_master WHERE name='movie';

day15_f19.html q = 'select * from movie'

day15_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day15_f19.html select * from rating r outer left join movie m on m.movie_id = r.movie_id where stars>4.5;

day15_f19.html Exercise: Let us redo our movie database queries in SQL.

day15_f19.html Movie ratings database

day15_f19.html Your movie questions

day15_f19.html Your movie reviews

day15_f19.html movie table except change first Z to r

day15_f19.html movie queries revisited in SQL

day16_f19.html Review of queries on movie database.

day16_f19.html Your movie questions

new york

day01_f19.html - New York State hospitalizations (SPARCS) (csv)

day07_f19.html Animated choropleth on child mortality in yesterday's New York Times.

day11_f19.html - What does your average New Yorker get charged for hospital stays in one year?

day12_f19.html - What does your average New Yorker get charged for hospital stays in one year?

day12_f19.html Download this county map of New York State and this function I wrote: colorny.py to color it from a Series.

day13_f19.html Download this county map of New York State and this function I wrote: colorny.py to color it from a Series.

not_used_day01_f19.html Is Ireland mentioned in the home page of today's New York Times?

news

data_in_the_news_f19.html A selection of recent articles from the news media about applications of the tools and techniques of data collection and analysis.

data_in_the_news_f19.html Data in the news, Fall 2019

nhtsa

day01_f19.html - NHTSA complaints (json)

day09_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day09_f19.html Exercise: Was this problem evident from the NHTSA complaint database long before the 2014 recall?

day09_f19.html For Report 3, you will dig into the NHTSA complaint database to find something you feel is interesting or important.

day09_f19.html From Wikipedia article on Chevrolet Cobalt: Faulty ignition switches in the Cobalts, which cut power to the car while in motion, were eventually linked to many crashes resulting in fatalities, starting with a teenager in 2005 who drove her new Cobalt into a tree. The switch continued to be used in the manufacture of the vehicles even after the problem was known to GM. On February 21, 2014, GM recalled over 700,000 Cobalts for issues traceable to the defective ignition switches. In May 2014 the NHTSA fined the company $35 million for failing to recall cars with faulty ignition switches for a decade, despite knowing there was a problem with the switches. Thirteen deaths were linked to the faulty switches during the time the company failed to recall the cars.

day09_f19.html NHTSA complaints database

day09_f19.html http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

day09_f19.html maintains a database of vehicle safety complaints: you can file a complaint here.

day10_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day10_f19.html Exercise: Was this problem evident from the NHTSA complaint database long before the 2014 recall?

day10_f19.html For Report 3, you will dig into the NHTSA complaint database to find something you feel is interesting or important.

day10_f19.html From Wikipedia article on Chevrolet Cobalt: Faulty ignition switches in the Cobalts, which cut power to the car while in motion, were eventually linked to many crashes resulting in fatalities, starting with a teenager in 2005 who drove her new Cobalt into a tree. The switch continued to be used in the manufacture of the vehicles even after the problem was known to GM. On February 21, 2014, GM recalled over 700,000 Cobalts for issues traceable to the defective ignition switches. In May 2014 the NHTSA fined the company $35 million for failing to recall cars with faulty ignition switches for a decade, despite knowing there was a problem with the switches. Thirteen deaths were linked to the faulty switches during the time the company failed to recall the cars.

day10_f19.html NHTSA complaints database, cont'd

day10_f19.html http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

day10_f19.html maintains a database of vehicle safety complaints: you can file a complaint here.

day11_f19.html url0 = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/{}/make/{}/model/{}?format=json'

day14_f19.html NHTSA Complaints Report 3 reviews

day15_f19.html NHTSA Complaints Report 3 reviews

day16_f19.html - a serious flaw in cars of a certain make, model, year in the NHTSA complaints

day17_f19.html - a serious flaw in cars of a certain make, model, year in the NHTSA complaints
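The url0.format pattern quoted above fills the modelyear/make/model slots of the complaints API URL; no request is made here (the endpoint may no longer be live):

```python
url0 = ('http://www.nhtsa.gov/webapi/api/Complaints/vehicle/'
        'modelyear/{}/make/{}/model/{}?format=json')

# the Chevrolet Cobalt example from the day09/day10 outlines
url = url0.format(2005, 'chevrolet', 'cobalt')
```

Fetching it would look like `s = requests.get(url).text` followed by `complaints = json.loads(s)`, as in the entries above.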

outer

day13_f19.html inner, left (outer), right (outer), and (full) outer joins

day14_f19.html select * from A left outer join B on A.foo = B.foo where A.blah = 'Hey!';

day14_f19.html select * from A left outer join B on A.foo = B.foo where A.blah in (2018,2017,2016);

day14_f19.html select * from A left outer join B on A.foo = B.foo;

day14_f19.html select * from B left outer join A on A.foo = B.foo;

day14_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day14_f19.html left outer join

day15_f19.html select * from A left outer join B on A.foo = B.foo where A.blah = 'Hey!';

day15_f19.html select * from A left outer join B on A.foo = B.foo where A.blah in (2018,2017,2016);

day15_f19.html select * from A left outer join B on A.foo = B.foo;

day15_f19.html select * from B left outer join A on A.foo = B.foo;

day15_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day15_f19.html left outer join

pandas

448_563_policies_f19.html - Python Pandas data analysis library

day01_f19.html - Use of Python Pandas data analysis library

day01_f19.html and is what the data-analysis library Pandas uses "under the hood". Today we will explore using numpy directly.

day02_f19.html df = pandas.read_csv('some_words.csv')

day02_f19.html import pandas

day02_f19.html pandas is a Python library for analyzing 2D tabular data ('panel data' is supposedly the origin of the name).

day02_f19.html Load the data into a pandas "dataframe" as follows:

day02_f19.html Review of some python basics and gentle introduction to Pandas

day02_f19_sketch.html Gently introducing Pandas and reviewing more basic python.

day02_f19_sketch.html If I do it with Pandas, I have to introduce groupby etc which is a bit much

day05_f19.html - pandas.read_csv()

day05_f19.html Possible easy solution: pandas.read_html()

day10_f19.html import pandas as pd

day11_f19.html - pandas.melt()

day11_f19.html import pandas as pd

day11_f19.html - pandas for data exploration

day11_f19.html Pandas DataFrame and Series features

day11_f19.html long vs. wide tables, pandas.melt(). Note altair also provides "melt" and "fold" transformations.

day12_f19.html Pandas groupby method

day12_f19.html the name of an operation pandas understands, like 'count'.

day13_f19.html Pandas groupby method

day13_f19.html the name of an operation pandas understands, like 'count'.

day14_f19.html Note: Can do a.merge(b) as well as pandas.merge(a,b)
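The long-vs-wide point indexed above (altair wants "long" data), as a small pandas.melt sketch with an invented wide table:

```python
import pandas as pd

# "wide" format: one column per category
wide = pd.DataFrame({'year': [2004, 2005],
                     'steering': [10, 30],
                     'other': [5, 7]})

# melt to "long" format: one row per (year, category) pair
long_df = pd.melt(wide, id_vars=['year'],
                  var_name='category', value_name='complaints')
```

Pivoting is the inverse operation, turning long data back into wide.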

price

day04_f19.html Grab a price from Amazon

day05_f19.html Unlike the Amazon item price, it is not very easy to locate the chunks of text we want.

quiz

448_563_policies_f19.html - 25% Quizzes, not necessarily announced in advance, and class participation.

day02_f19.html (something you don't understand, something you want to know). Use the quiz form

day02_f19.html Let's test the quiz mechanism. Quiz name: 'test'. Content: 'hello'.

day02_f19.html Quizzes today

day02_f19.html with the quiz name 'my question day 2'.

day02_f19_sketch.html Review numpy and give some quizzes

day03_f19.html There will be a quiz on it on Monday.

day04_f19.html Quiz

day04_f19.html Quiz Name: Kira

day04_f19.html Quiz on Report Guide

day06_f19.html Quiz #1: Upload to UBlearns a plain text file with your code that does this. (Just copy and paste your code into a plain text file skej.txt.)

day07_f19.html Quiz "extensive": give me one example of an extensive property.

day07_f19.html Quiz "intensive": give me one example of an intensive property.

day08_f19.html Quiz "regex": Ask a concrete question about using regular expressions.

day08_f19.html Quiz "are they intensive?": Of US counties: (i) maximum distance from state capital (ii) average distance from state capital.

day10_f19.html quiz name "please index"

day12_f19.html Quiz: "most common diagnosis": What is the most common reason for an inpatient hospital stay?

day13_f19.html Quiz: "most common diagnosis": Guess what is the most common reason for an inpatient hospital stay?

day13_f19.html Quiz: "movie question". A question about the movie reviews you'd like answered. (Please write your question on a single line.)

day14_f19.html Quiz: "movie question". A question about the movie reviews you'd like answered. (Please write your question on a single line.)

day16_f19.html Quiz "Halley": In which year did I make those observations?

day17_f19.html Quiz "Halley": In which year did I make those observations?

day17_f19.html Quiz: 'partner'. What is the first name of your partner from Monday's class?

rating

day06_f19.html We can use Python to draw directly by generating SVG.

day11_f19.html rating table except change first Z to q

day12_f19.html rating table except change first Z to q

day13_f19.html 'f19_rating.xlsx' :'108Z2iAlSiq8AQUJCOVa5uj8lkjIpBIPLIr83C41amYw'} # except change first Z to q

day13_f19.html rating table except change first Z to q

day14_f19.html 'f19_rating.xlsx' :'108Z2iAlSiq8AQUJCOVa5uj8lkjIpBIPLIr83C41amYw'} # except change first Z to q

day14_f19.html .import 448f18_rating.csv rating

day14_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day14_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day14_f19.html Movie ratings database

day14_f19.html rating table except change first Z to q

day15_f19.html 'f19_rating.xlsx' :'108Z2iAlSiq8AQUJCOVa5uj8lkjIpBIPLIr83C41amYw'} # except change first Z to q

day15_f19.html .import f19_rating.csv rating

day15_f19.html select * from rating natural join movie where stars>4.5 order by year desc;

day15_f19.html select * from rating r left outer join movie m on m.movie_id = r.movie_id where stars>4.5;

day15_f19.html Movie ratings database

day15_f19.html rating table except change first Z to q

report_guide_f19.html and the report should be laid out in a complete, coherent and engaging manner demonstrating that you are thinking and learning.

regex

day08_f19.html Quiz "regex": Ask a concrete question about using regular expressions.

day08_f19.html Many useful "cheatsheets" available online, like this one

day09_f19.html - recaps: regex, color maps and color scales for SVG map

report

448_563_policies_f19.html Academic Integrity: Cheating in any form will not be tolerated. In particular, your reports must be entirely written by you in your own words,

448_563_policies_f19.html - 75% Formal reports (Jupyter Notebook format), due biweekly, due 7am Saturdays.

448_563_policies_f19.html - Extensive writing of formal reports - 7 of them.

data_in_the_news_f19.html "Scraping public data from a website doesn't constitute 'hacking,' according to a new court ruling that could dramatically limit abuse of the United States' primary hacking law, the Computer Fraud and Abuse Act (CFAA). From a report: In its declaration, the court ruled that to violate the CFAA, somebody would need to actually 'circumvent [a] computer's generally applicable rules regarding access permissions, such as username and password requirements,' meaning it's not really hacking if you're not bypassing some kind of meaningful authorization system."

data_in_the_news_f19.html Washington Post article

day01_f19.html Detailed Report Guide

day01_f19.html More on the biweekly reports

day01_f19.html screenshot of the beginning of one

day03_f19.html Carefully study the Report Guide.

day04_f19.html True or False?: You can turn in your report late for reduced credit.

day04_f19.html False! There is no such thing as a late report.

day04_f19.html Quiz on Report Guide

day04_f19.html Report 1 on US baby names

day05_f19.html <body>Don't forget you have to grade 448/563 reports this weekend!</body>

day07_f19.html For future reference, if you do want to represent an extensive quantity, possible options (for another report) are:

day07_f19.html Report 2: Choropleth map of US

day08_f19.html - Report #2 help

day08_f19.html Help with Report 2: Story told with choropleth map(s) of US

day09_f19.html For Report 3, you will dig into the NHTSA complaint database to find something you feel is interesting or important.

day09_f19.html Report 3

day10_f19.html - Report 2

day10_f19.html .. image:: comments_on_report2_f19/code_repetition.png

day10_f19.html .. image:: comments_on_report2_f19/index_hardcoding.png

day10_f19.html .. image:: comments_on_report2_f19/inefficient_1.png

day10_f19.html .. image:: comments_on_report2_f19/inefficient_2.png

day10_f19.html .. image:: comments_on_report2_f19/obesity_also_has_state_boundary_jumps.png

day10_f19.html .. image:: comments_on_report2_f19/spelling_error.png

day10_f19.html .. image:: comments_on_report2_f19/texas_newmexico.png

day10_f19.html For Report 3, you will dig into the NHTSA complaint database to find something you feel is interesting or important.

day10_f19.html Report 3

day10_f19.html Review of Reports 2

day14_f19.html NHTSA Complaints Report 3 reviews

day15_f19.html NHTSA Complaints Report 3 reviews

day17_f19.html This will be the subject of Report 4, due 8am Saturday, Nov 2.

report_guide_f19.html observations, and interpretations. Quality of the narrative of the report.

report_guide_f19.html .. |Code| replace:: Quality of the Python code included in the report. Relevance

report_guide_f19.html .. |Presentation| replace:: Organization of the report. Text formatting.

report_guide_f19.html A report grade with a bonus may be higher than A, and will be indicated by either

report_guide_f19.html A report must begin with an introduction section. It should explain the project

report_guide_f19.html All Python code in the report must be entered in such order that it can be read

report_guide_f19.html All code included in your report must work. Do not submit reports with execution errors.

report_guide_f19.html Have fun, explore, and put in the report what you came up with.

report_guide_f19.html If you used books, articles, websites etc. while preparing the report they should

report_guide_f19.html Project Report Guide

report_guide_f19.html Project reports will be graded as follows. Each element listed in the

report_guide_f19.html Report Style Guide

report_guide_f19.html Report conclusions

report_guide_f19.html Report grading rubrics

report_guide_f19.html Report introduction

report_guide_f19.html Report organization

report_guide_f19.html The 'XX' bonus will increase the report grade by two grades (e.g from B to A-).

report_guide_f19.html The final section of the report must summarize your results and conclusions.

report_guide_f19.html The reports are your opportunity to tell a compelling story. When I read your report, I will look for your understanding, coverage and presentation of results. So submitting project reports is not about merely carrying out the tasks laid out in the problem statements. I expect you to use those as a springboard to begin your explorations and study in depth. When you finally report your results, you must be speaking in your own voice,

report_guide_f19.html You are free to discuss projects with your classmates. However, each project report

report_guide_f19.html Your report should contain only code output that serves some purpose

report_guide_f19.html Example


report_guide_f19.html all computer code. Plagiarism will result in the zero score on a report, and

report_guide_f19.html and executed sequentially, from the beginning to the end of the report. For example,

report_guide_f19.html and the goals of the report in a way which is engaging and understandable to

report_guide_f19.html and the report should be laid out in a complete, coherent and engaging manner demonstrating that you are thinking and learning.

report_guide_f19.html be listed at the end of the report.

report_guide_f19.html letter grade for the whole report.

report_guide_f19.html must be the main part of your project report. It is fine to include some background

report_guide_f19.html report should be logically organized into sections reflecting its content.

report_guide_f19.html the overall report grade to the next higher grade (from B to B+, from B+ to A- etc.).

report_guide_f19.html you should not have in a report two consecutive code cells without any text between them.

report_guide_f19.html you should not use a Python function in your report prior to defining this function.

report_guide_f19.html | Report Content | 30% | |Content| |

reviewer

day11_f19.html reviewer table except change first Z to K

day12_f19.html reviewer table except change first Z to K

day13_f19.html 'f19_reviewer.xlsx':'1ZJpIeML2f2t6UxxGFzOTOQPM0xtyUgJVVBrW3I9nyjU', # except change first Z to K

day13_f19.html reviewer table except change first Z to K

day14_f19.html 'f19_reviewer.xlsx':'1ZJpIeML2f2t6UxxGFzOTOQPM0xtyUgJVVBrW3I9nyjU', # except change first Z to K

day14_f19.html .import 448f18_reviewer.csv reviewer

day14_f19.html reviewer table except change first Z to K

day15_f19.html 'f19_reviewer.xlsx':'1ZJpIeML2f2t6UxxGFzOTOQPM0xtyUgJVVBrW3I9nyjU', # except change first Z to K

day15_f19.html .import f19_reviewer.csv reviewer

day15_f19.html select * from reviewer;

day15_f19.html reviewer table except change first Z to K

scatter

day13_f19.html - Make a heatmap or scatter plot of charges/day vs. length of stay.

scraping

data_in_the_news_f19.html "Scraping public data from a website doesn't constitute 'hacking,' according to a new court ruling that could dramatically limit abuse of the United States' primary hacking law, the Computer Fraud and Abuse Act (CFAA). From a report: In its declaration, the court ruled that to violate the CFAA, somebody would need to actually 'circumvent [a] computer's generally applicable rules regarding access permissions, such as username and password requirements,' meaning it's not really hacking if you're not bypassing some kind of meaningful authorization system."

data_in_the_news_f19.html Sep 10: "Scraping public data isn't hacking"

day01_f19.html - Pure Python data wrangling (web-scraping, string-splitting, regular expressions)

day04_f19.html Web scraping

not_used_day01_f19.html Web scraping

sort

day09_f19.html json.dump(myobj,f,indent=3,sort_keys=True)
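Note the keyword is sort_keys (with an underscore), not sortkeys. A minimal self-contained sketch, writing to an in-memory buffer instead of a file:

```python
import io
import json

myobj = {"b": 2, "a": 1}

# sort_keys=True writes keys alphabetically; indent=3 pretty-prints
# with 3-space indentation.
f = io.StringIO()
json.dump(myobj, f, indent=3, sort_keys=True)
print(f.getvalue())
```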

day11_f19.html - sort_values(by=)

not_used_day01_f19.html - sorted()

not_used_day01_f19.html Exercise 2a: Sort words by right-to-left alphabetical order

sort_values

day11_f19.html - sort_values(by=)

teaching

day13_f19.html Download these three small tables: firstname.csv, lastname.csv, teaching.csv. The third is from UB's (Oracle) database UBInfosource.

user agent

day04_f19.html In that case we will need to fake our User Agent.

day04_f19.html Spoofing user agent

xml

448_563_policies_f19.html - Structured data formats (json, xml, ...) and validation

day01_f19.html - Structured data formats (json, xml, ...) and validation

day01_f19.html - Chicago real-time bus info (XML)

day05_f19.html xmlns="http://www.w3.org/2000/svg"

day05_f19.html b = bs4.BeautifulSoup(s,'lxml')

day05_f19.html from lxml import etree

day05_f19.html Exercise: Make or take a Word or Writer document, rename it to something.zip, unzip it, and observe the XML!

day05_f19.html .. code:: xml

day05_f19.html Ad-hoc XML formats

day05_f19.html Day 5: XML and HTML in particular

day05_f19.html Example of data in an XML format:

day05_f19.html Examples of XML formats

day05_f19.html Many sources provide data in their own ad-hoc XML format. Example: real-time Chicago bus information

day05_f19.html Microsoft Office files (.docx, .xlsx, etc.), and Libre/OpenOffice files, are (zipped) bundles of XML documents.

day05_f19.html More general solution: bs4 module or lxml module

day05_f19.html Nice Introduction at w3schools.com

day05_f19.html Quandry: attributes vs. nested elements

day05_f19.html Today we will look at XML, the 2nd of 3 major plain-text data markup languages in use today (csv, xml, json).

day05_f19.html Unfortunately all the UB class schedule pages are malformed XML!

day05_f19.html Wikipedia says: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a [self-describing] format which is both human-readable and machine-readable.

day05_f19.html XML (eXtensible Markup Language)

day05_f19.html XML documents are element trees: elements contain other elements or text:

day05_f19.html XML is a "meta-format": it provides a standard structure for data formats.

day05_f19.html XML parsing with lxml.etree
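The standard library's xml.etree.ElementTree shares much of lxml.etree's API, so a sketch of tree navigation runs without extra installs (document contents are made up for illustration):

```python
import xml.etree.ElementTree as etree  # lxml.etree offers a largely compatible API

doc = """<library>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Solaris</title></book>
</library>"""

root = etree.fromstring(doc)                       # root element of the tree
titles = [b.find("title").text for b in root.findall("book")]  # child text
ids = [b.get("id") for b in root.findall("book")]  # attributes via .get()
print(titles, ids)   # ['Dune', 'Solaris'] ['b1', 'b2']
```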

day05_f19.html GPX an XML format for exchanging GPS data

day05_f19.html HTML - the language of the web!

day05_f19.html bs4 is a bit more tolerant of badly formed XML.

day06_f19.html xmlns="http://www.w3.org/2000/svg"

day06_f19.html b = bs4.BeautifulSoup(s,'lxml')

day06_f19.html from lxml import etree

day06_f19.html - XML schemas

day06_f19.html - modifying existing SVG with lxml

day06_f19.html .. code:: xml

day06_f19.html Day 6: XML formats, cont'd

day06_f19.html We can parse this map with the lxml module, as follows. In this module, SVG element attributes are accessed

day07_f19.html from lxml import etree

day07_f19.html - XML schemas

day07_f19.html - lxml

day07_f19.html Day 6: XML formats, cont'd

day07_f19.html We can parse this map with the lxml module, as follows. In this module, SVG element attributes are accessed

day08_f19.html xmlns ="http://www.w3.org/2001/XMLSchema"

day08_f19.html xmlns="http://blue.math.buffalo.edu/463/mycourses"

day08_f19.html <!-- "http://www.w3.org/2001/XMLSchema" is a magic phrase, like "Open, Sesame".

day08_f19.html <!-- THIS IS ALMOST MY FIRST XML SCHEMA -->

day08_f19.html <!-- THIS IS MY HAND-WRITTEN XML DOCUMENT CONFORMING TO A SCHEMA -->

day08_f19.html <?xml version="1.0"?>

day08_f19.html XMLSchemaChildrenValidationError: failed validating <Element '{http://blue.math.buffalo.edu/463/mycourses}course' at 0x7fc9687ab3b8> with XsdGroup(model='all', occurs=[1, 1]):

day08_f19.html XMLSchemaDecodeError: failed validating '201909!' with XsdAtomicBuiltin(name='xs:integer'):

day08_f19.html import xmlschema

day08_f19.html xmlschema.validate('mydoc.xml','myschema.xsd')

day08_f19.html Exercise, part 1: (results will be collected). Type up an XML document of your own invention from scratch.

day08_f19.html Semantic validation: can be done with the free-standing program xmllint from libxml2-utils, and within Python using xmlschema among other modules.

day08_f19.html Syntactic validation (i.e. testing that the XML is well-formed), can be done with firefox, some text editors,

day08_f19.html - XML schemas

day08_f19.html - lxml

day08_f19.html .. code:: xml

day08_f19.html Day 8: XML schema, validation, regular expressions

day08_f19.html I will be asking you to turn in your xml and xsd documents.

day08_f19.html Imposing semantic rules with XML Schema

day08_f19.html My schema document: myxmls.xsd

day08_f19.html My simple example XML document: mydoc.xml

day08_f19.html Note that with both of the changes above, we still had well-formed XML.

day08_f19.html Note: you can even embed "regular expressions" in an XML schema: myxmlre.xml, myxmlre.xsd

day08_f19.html Now we validate the xml data document against the xsd schema document:

day08_f19.html References: my examples above, and w3schools

day08_f19.html Validation of an XML document against a schema

day08_f19.html Write a schema for your own XML document.

day08_f19.html XML Schema

day08_f19.html xmlschema.validate will tell us.

day09_f19.html The main current competitor of XML for self-describing human-readable data formats.

xml schema

day06_f19.html - XML schemas

day07_f19.html - XML schemas

day08_f19.html <!-- THIS IS ALMOST MY FIRST XML SCHEMA -->

day08_f19.html - XML schemas

day08_f19.html Day 8: XML schema, validation, regular expressions

day08_f19.html Imposing semantic rules with XML Schema

day08_f19.html Note: you can even embed "regular expressions" in an XML schema: myxmlre.xml, myxmlre.xsd

day08_f19.html XML Schema
