MTH 448/563 Data-Oriented Computing

Spring 2017

Day 2

Exercise 1 recap

Finally, we can wrap everything up in a function that can retrieve the price of any product:

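import requests

def getprice(pid):
    # Send a desktop-browser user agent; the site may refuse the default one.
    ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    url = 'https://www.amazon.com/dp/' + pid
    s = requests.get(url, headers={'User-Agent': ua})
    # The price is the text between this opening tag (through the '$') and the next '</span>'.
    pattern = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$'
    # Drop thousands separators so a price like 1,299.00 still converts to a float.
    price = float(s.text.split(pattern)[-1].split('</span>')[0].replace(',', ''))
    return price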

getprice('B0027YPQEC')
29.04
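
The split-based extraction above raises an error whenever the price tag is missing from the page. A minimal sketch of a more forgiving parser follows, assuming the page still uses the priceblock_ourprice span shown above. The helper name parse_price and the sample HTML string are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical helper (not from the original notes): pull the price out of
# HTML that has already been downloaded, e.g. the s.text used in getprice.
PRICE_RE = re.compile(r'<span id="priceblock_ourprice"[^>]*>\s*\$([\d,]+\.?\d*)')

def parse_price(html):
    """Return the price as a float, or None if the price span is not found."""
    m = PRICE_RE.search(html)
    if m is None:
        return None
    # Remove thousands separators before converting.
    return float(m.group(1).replace(',', ''))

# Illustrative HTML fragment, written to mimic the tag used in getprice.
sample = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$29.04</span>'
print(parse_price(sample))            # 29.04
print(parse_price('<html></html>'))   # None
```

Returning None instead of raising lets the caller decide what to do when a product page has no listed price.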

Exercise 2: More play with text

Download this list of English words: http://blue.math.buffalo.edu/448/words.txt

Exercise 2a: Sort words by right-to-left alphabetical order

Sorting

sorted(x), sorted(x, reverse=True), sorted(x, key=f)

string replace

functions, def and lambdas
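
The tools listed above can be combined as in the following sketch. The short word list is a stand-in for the lines of words.txt, and the function name reverse_word is an assumption made for illustration. Sorting by the reversed spelling is one possible reading of "right-to-left alphabetical order".

```python
# Stand-in for lines read from words.txt; each line ends with a newline.
lines = ['banana\n', 'apple\n', 'cherry\n', 'fig\n']

# String replace: strip the trailing newline from each line.
words = [line.replace('\n', '') for line in lines]

print(sorted(words))                 # ['apple', 'banana', 'cherry', 'fig']
print(sorted(words, reverse=True))   # ['fig', 'cherry', 'banana', 'apple']
print(sorted(words, key=len))        # ['fig', 'apple', 'banana', 'cherry']

# A key function written with def ...
def reverse_word(w):
    return w[::-1]

rtl = sorted(words, key=reverse_word)

# ... and the same key written as a lambda.
rtl_lambda = sorted(words, key=lambda w: w[::-1])

print(rtl)                           # ['banana', 'apple', 'fig', 'cherry']
```

The key function is applied to each word only to decide the ordering; the words themselves come back unchanged.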

Exercise 2b: List all the palindromes
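
One possible approach, sketched on a small stand-in list rather than the full words.txt: a word is a palindrome when it equals its own reversal. The name is_palindrome is an assumption made for illustration.

```python
def is_palindrome(w):
    # w[::-1] is the word spelled backwards.
    return w == w[::-1]

# Stand-in list; in the exercise this would be the contents of words.txt.
words = ['level', 'data', 'noon', 'python', 'a']

palindromes = [w for w in words if is_palindrome(w)]
print(palindromes)   # ['level', 'noon', 'a']
```

Note that single-letter words count as palindromes under this definition; whether to keep them is a choice for the exercise.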

Exercise 2c: List all the reversible words

Note that set-membership can be tested much faster than list membership.
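
A sketch of why this matters for Exercise 2c, using a small stand-in word list. It assumes a "reversible word" is one whose reversal is a different word in the list, so palindromes are excluded; that definition is an assumption. The timing comparison uses an artificial list of number strings and prints whatever your machine measures.

```python
import timeit

# Stand-in list; in the exercise this would be the contents of words.txt.
words = ['stressed', 'desserts', 'level', 'data', 'drawer', 'reward']

# Build the set once, then test membership against it.
word_set = set(words)
reversible = [w for w in words if w[::-1] in word_set and w[::-1] != w]
print(reversible)   # ['stressed', 'desserts', 'drawer', 'reward']

# Rough timing comparison: look up the last item of a 10,000-element collection.
big_list = [str(i) for i in range(10000)]
big_set = set(big_list)
t_list = timeit.timeit(lambda: '9999' in big_list, number=200)
t_set = timeit.timeit(lambda: '9999' in big_set, number=200)
print(t_list, t_set)   # the list lookup scans every element; the set lookup uses hashing
```

With the full word list, testing each reversed word against a list means one full scan per word, while the set lookup takes roughly constant time per word.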

Exercise 3

Analyze the first Presidential debate between Hillary Clinton and Donald Trump

Some questions to answer:

  • how much did each of them speak?
  • how big is the vocabulary of each?
  • which words did each use most frequently?

Might want to remove punctuation first.
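
A minimal sketch of how the three questions could be answered for one speaker's text, once that text has been separated out of the transcript. The function name word_stats and the placeholder sentence are assumptions; the placeholder is not taken from the debate.

```python
import string
from collections import Counter

def word_stats(text, top=3):
    """Return (total words, vocabulary size, most frequent words) for a text."""
    # Remove punctuation and lowercase so 'The' and 'the.' count as the same word.
    table = str.maketrans('', '', string.punctuation)
    words = text.lower().translate(table).split()
    counts = Counter(words)
    return len(words), len(counts), counts.most_common(top)

# Placeholder text, not a quotation from either candidate.
sample = "The quick brown fox. The lazy dog; the end!"
total, vocab, common = word_stats(sample, top=1)
print(total, vocab, common)   # 9 7 [('the', 3)]
```

Removing all punctuation also turns contractions such as "don't" into "dont", which may or may not be what you want.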

Report 1 data set: History of first names in the US

This will be the subject of your Report 1. Consider the national data set from this US Social Security Administration page

https://www.ssa.gov/oact/babynames/limits.html

about names given to babies in the US from 1880 through 2015.

Download the national data.
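
A sketch of reading one year of the data, assuming each file holds comma-separated lines of the form Name,Sex,Count with no header row. Check this against the files you actually download. The function name read_names is hypothetical, and the sample rows below use invented counts purely to show the format.

```python
import csv
import io

def read_names(f):
    """Parse lines of the assumed form Name,Sex,Count into (name, sex, count) tuples."""
    return [(name, sex, int(count)) for name, sex, count in csv.reader(f)]

# Invented rows standing in for one year's file; the counts are not real data.
sample = io.StringIO("Mary,F,100\nAnna,F,40\nJohn,M,90\n")
rows = read_names(sample)

# Total recorded births per sex in this (invented) sample.
total_by_sex = {}
for name, sex, count in rows:
    total_by_sex[sex] = total_by_sex.get(sex, 0) + count

print(rows[0])         # ('Mary', 'F', 100)
print(total_by_sex)    # {'F': 140, 'M': 90}
```

With a real file, the same function can be given an open file object in place of the StringIO sample.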

Homework: Report 1 will entail extracting something interesting from this data.

Reminder: Semester-long data-collection project

Think about what you'd like to do, and let me know next week.