MTH 448/563 Data-Oriented Computing

Fall 2018

Day 14, Thursday, Oct 11

Pairs for today (same as Tuesday)

Andy
Bao
 
Charles
Chase
 
Chinmayee
Deepisha
 
Dipen
Donavon
 
Dylan
Edwin
 
Eric
Jaiden
 
Jay
Jingye
 
Myonggin
Nicholas
 
Nitin
Palkit
 
Parth
Qinxin
 
Rahul
Raymond
 
Suyash
Vivek
 
William
Yang
 
Zachary
Zhiyi

Multi-table databases, joining

Exerise: Contribute to a database of movie reviews.

First add yourself to the reviewer table. Note that your reviewer ID must be different from everyone else's.

Then review some movies. If any of the movies aren't already in the movie table, add them to it. Note that no two movies may have the same ID.

Then enter your ratings in the rating table.

Exercise: Download the three tables to your computer using this code:

sheets = {
'448f18_reviewer.xlsx':'1qFVivZwo43u5xfLss5u0T_Xs3agUZl5jR1Gkmnaov-Y',
'448f18_movie.xlsx'   :'1MnXZfPCX9CHAJIP0Y5VgnF97gXg1f7m-3HHaRZWkhTM',
'448f18_rating.xlsx'  :'1vs4VUHHJ0UxAwCXl1FzBMblVzZKK6tVt2Ym0HI0DOQI'}

for sheet in sheets:
         url = 'https://docs.google.com/spreadsheets/d/' + sheets[sheet] + '/export?format=xlsx'
         r = requests.get(url)
         with open(sheet,'wb') as f:
             f.write(r.content)
         print('Downloaded '+sheet)

Check that they have arrived!

Exercise: Load the 3 movie database tables into pandas dataframes, and merge them into one dataframe

revrat = reviewer.merge( rating, on='reviewer_id' )

Discussion: Why are databases factored into multiple tables like this?

Exercise: In groups, write some questions about the movie review database that you'd like to know the answer to - on the share board except change the 8 to a 7.

merge: inner, left (outer), right (outer), and (full) outer joins

or how I messed up the first semester I was Undergrad Director.

Exercise: What is are the differences among the outputs of these variants of merge?

import pandas
biodata = pandas.DataFrame( [[900,'Jingye','Tan',21],[411,'John','Ringland','40+']], columns=['employeeid','first','last','age'])
biodata
email = pandas.DataFrame( [[900,'kira@buffalo.edu'],[900,'jt@gmail.com']], columns=['personnumber','email'])
email

movie review database queries

Let's try to answer some of the questions you've posed.

Relational databases and SQL

What a relation (a.k.a. table) is.

A relation is a set (strictly a 'multiset' or 'bag') of tuples with elements drawn respectively from a list of specified sets. For example, if we have sets R, S, T, then any collection of tuples {(ri, si, ti)}, where each ri ∈ R, si ∈ S, and ti ∈ T, is a relation on those sets.

Concrete example: two sets

name = {'Andy', 'Chinmayee', 'Chase', 'Donavon', 'Myonggin', 'Vivek' }

age =  ≥ 0

Relation [('Andy', 21), ('Chinmayee', 20), ('Donavon',20), ('Vivek', 22)]

It's convenient to display these in a table, and in fact in this context, a relation is often called a "table".

import pandas
df = pandas.DataFrame([('Andy', 21), ('Chinmayee', 20), ('Donavon',20), ('Vivek', 22)],columns=['name','age'])
df
        name            age
0       Andy            21
1       Chinmayee       20
2       Donavon         20
3       Vivek           22

Relational DataBase Management Systems (RDBMSs)

SQL: a "backronym" for "Simple Query Language" or "Structured Query Language" (the latter being actually a whopping misrepresentation).

sqlite3 and some basic SQL commands

sqlite3 metacommands start with a period. Google to find a list of all of them.

SQL commands end with a semicolon (;).

$ sqlite3 tempf18.sqlite
-- Loading resources from /home/ringland/.sqliterc

SQLite version 3.11.0 2016-02-15 17:29:24
Enter ".help" for usage hints.
sqlite> .tables
sqlite> .quit
$

Some SQLITE metacommands:

sqlite> .tables
sqlite> .mode column
sqlite> .headers on
sqlite> .width
sqlite> .mode csv
sqlite> .import
sqlite> .separator

sqlite> .read

Exercise: Let's re-download the movie review tables as csv. (Just change the destination file extensions to csv, and the export format to csv:

sheets = {
'448f18_reviewer.csv':'1qFVivZwo43u5xfLss5u0T_Xs3agUZl5jR1Gkmnaov-Y',
'448f18_movie.csv'   :'1MnXZfPCX9CHAJIP0Y5VgnF97gXg1f7m-3HHaRZWkhTM',
'448f18_rating.csv'  :'1vs4VUHHJ0UxAwCXl1FzBMblVzZKK6tVt2Ym0HI0DOQI'}

for sheet in sheets:
         url = 'https://docs.google.com/spreadsheets/d/' + sheets[sheet] + '/export?format=csv'
         r = requests.get(url)
         with open(sheet,'wb') as f:
             f.write(r.content)
         print('Downloaded '+sheet)

Check that they have arrived!

Creating a database by importing csv files

Exercise: Import csv files into a sqlite database.

$ sqlite3 mymoviedb.sqlite

sqlite> .mode csv
sqlite> .import 448f18_movie.csv movie

Any changes are immediately written to the (binary) .sqlite database file.

Some SQL commands (they all end with a semicolon):

sqlite> .mode column
sqlite> .headers on
sqlite> select * from movie;
sqlite> .width 10 10 30 30
sqlite> select * from movie;
sqlite> delete from movie where year='';

Ctrl-d or .quit to exit from sqlite3.

Dumping a database as SQL text

sqlite> .output mymoviedb.sql
sqlite> .dump

Inspect the sql file to see how a table is created from scratch with SQL.

Some more SQL commands:

sqlite> **create table** ... ;
sqlite> **insert** into ... ;
sqlite> **select** * from ... ;

Ctrl-d or .quit to exit from sqlite3.