Day 11

Tuesday, Mar 7, 2017

Contents

Report 2: Choropleth map of US - checkup

Find some interesting data for every county in the US

Search online for some data given for every county in the nation.

You may want to consult this page about the FIPS codes for counties. Fall-back suggestion if you don't find anything else good: annual unemployment data from the US Bureau of Labor Statistics.

Color the counties on the map according to your data

This may require some thought, some research, and some work. Although you are welcome to consider elaborate color maps (even 2D maps representing two quantities), you should beware of creating graphics that are strikingly colorful but difficult for the viewer to interpret. Simple things like a white-to-red gradient can be very effective: the picture below shows the fraction of the population that is of African descent (generated by a student in this class in 2015).

african_american_percentage_in_us.png

Quiz: upload one choropleth map (a single svg or png image) you have made for your Report 2.

XML Schema, cont'd

We validate the xml data document against the xsd schema document:

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml validates

XML Schema exercise

Write a schema for your own XML document from last week. Make sure your document "validates" against your schema. Then "break" your documents in several small ways to see how the validator responds. I will be asking you to turn in your xml and xsd documents.

References: my examples above, and w3schools

Note: you can even embed regular expressions in an XML schema: myxmlre.xml, myxmlre.xsd

Assignment due 11:59pm Sunday, March 12. All non-compound elements must be typed. Use a regular expression to restrict at least one element. Use both "required" and "maxOccurs" attributes where logic dictates and at least once.

Relational databases and SQL

What a relation (a.k.a. table) is.

A relation is a set (strictly a 'multiset' or 'bag') of tuples with elements drawn respectively from a list of specified sets. For example, if we have sets R, S, T, then any collection of tuples {(ri, si, ti)}, where each ri ∈ R, si ∈ S, and ti ∈ T, is a relation on those sets.

Concrete example: two sets

name = {'Adhish', 'Anna', 'Daniel', 'Edward', 'Maggie', 'Sakar' }

age =  ≥ 0

Relation [('Adhish', 21), ('Anna', 20), ('Sakar',20), ('Maggie', 19)]

It's convenient to display these in a table, and in fact in this context, a relation is often called a "table".

import pandas
df = pandas.DataFrame([('Adhish', 21), ('Anna', 20), ('Sakar',20), ('Maggie', 19)],columns=['name','age'])
df
        name    age
0       Adhish  21
1       Anna    20
2       Sakar   20
3       Maggie  19

Relational DataBase Management Systems (RDBMSs)

SQL: a "backronym" for "Simple Query Language" or "Structured Query Language" (the latter being actually a whopping misrepresentation).

sqlite3 and some basic SQL commands

sqlite3 metacommands start with a period. Google to find a list of all of them.

SQL commands end with a semicolon (;).

463$ sqlite3 test.db
SQLite version 3.8.11.1 2015-07-29 20:00:57
Enter ".help" for usage hints.
sqlite>

Some SQLITE metacommands:

sqlite> .tables
sqlite> .mode column
sqlite> .headers on
sqlite> .read
sqlite> .import
sqlite> .mode csv
sqlite> .separator

Some SQL commands (they all end with a semicolon):

sqlite> create table ... ;
sqlite> insert into ... ;
sqlite> select * from ... ;

Ctrl-d or .quit to exit from sqlite3.

Let's review some movies

Go to here: https://docs.google.com/document/d/1C8ZcUbhZgO7Mx7NJBisUAD80qJhAuc99OLfkyqAXn8Y/edit?usp=sharing except change the first Z to an X.

Next: invent some questions you'd like to find out from the movie database.

Some more databases to learn SQL on

Tiny

Tiny

This has two tables - questions asked, and people who asked them. Download it and view it in your text editor. Then in sqlite3, execute the commands in the text file by typing

sqlite> .read tinydb.sql

Small

Our movie ratings

Bigger

2016 presidential election results

Download these csv files: 2012 presidential election results, county codes and names. After creating empty table with appropriate columns:

sqlite> .import foo.csv mytable

May need metacommand ".mode csv" and/or ".separator ,"

Next time ....

Learn the function of basic SQL commands on tinydb

Then try them out on movies and election databases.

Figure out how to use all of the following keywords and symbols:

select
from
limit
where
= < >
order by
desc
like '%foo%'
as
+-/*
count
max
min
in

Joins

"Joins" are more-or-less subsets of the cartesian product of tables.