Spring 2017

# Day 1

## Data and society

The explosion in data collection is transforming society.

A tension ...

vs. useful tools that make life better: for example, Google Maps Traffic visualizing the East Coast storm this past weekend.

Data is in the news almost every day.

Science

CERN LHC

VLA

## Classroom setup

Every day at the beginning of class we will quietly and quickly arrange the tables and chairs like this. At the end of class we will restore them to their original state.

## Getting to know each other

Let's introduce ourselves.

## Pure Python data wrangling

### Exercise 1: Split-and-select

Extract the price from an Amazon.com product page

First with browser-downloaded page
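A minimal sketch of the split-and-select idea, assuming the saved page holds the price inside a `priceblock_ourprice` span. The `page` string below is a hypothetical stand-in for the text of the browser-downloaded file, not real Amazon output.

```python
# Hypothetical stand-in for the text of a page saved from the browser.
page = (
    '<div id="price">'
    '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$29.04</span>'
    '</div>'
)

# Split on the text that comes right before the price and keep what follows...
marker = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$'
after_marker = page.split(marker)[-1]

# ...then split on the text right after the price and keep what precedes it.
price = float(after_marker.split('</span>')[0])
print(price)
```

With a real saved file, `page` would instead come from reading that file, for example with `open(filename).read()`, where `filename` is whatever name you saved the page under.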

Then with requests

We want to eliminate the browser step ...

import requests
url = 'https://www.amazon.com/Union-61100-Outdoor-Garden-Statue/dp/B0027YPQEC'
s = requests.get(url)
'29.04' in s.text

True


Oh, it actually worked. Sometimes, though, Amazon refuses to serve the page to a script (a robot). In that case we need to fake our user agent.

Now we can "spoof" the user agent:
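One way this might look, as a sketch: `requests.get` accepts a `headers` dictionary, so we can send a browser-like `User-Agent` string along with the request. The helper names `browser_headers` and `fetch` are made up for illustration, and the user-agent string is just one example of what a desktop browser sends.

```python
# Example of a User-Agent string sent by a desktop Chrome browser.
UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36')

def browser_headers(user_agent=UA):
    """Return request headers that identify us as a browser (hypothetical helper)."""
    return {'User-Agent': user_agent}

def fetch(url, user_agent=UA):
    """Download url while presenting a browser-like User-Agent (hypothetical helper)."""
    # Imported here so browser_headers can be used without requests installed.
    import requests
    return requests.get(url, headers=browser_headers(user_agent), timeout=10)
```

Calling `s = fetch(url)` and then checking `'29.04' in s.text` repeats the earlier test with the spoofed user agent. Whether Amazon serves the page can still vary from day to day.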

Finally, we can wrap everything up in a function that can retrieve the price of any product:

import requests

def getprice(pid):
    # Present a browser-like user agent so Amazon serves the page to a script
    ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    url = 'https://www.amazon.com/dp/' + pid
    s = requests.get(url, headers={'User-Agent': ua})
    # Split-and-select: keep the text between the price marker and the closing tag
    pattern = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$'
    price = float(s.text.split(pattern)[-1].split('</span>')[0])
    return price

getprice('B0027YPQEC')

29.04


### Exercise 2: More play with text

Download this list of English words: http://blue.math.buffalo.edu/448/words.txt

Exercise 2a: Sort words by right-to-left alphabetical order

Hints:

w = 'drawer'
w[::-1]

'reward'


Note that set-membership can be tested much faster than list membership.
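To see the difference, here is a rough timing sketch using the standard `timeit` module. The data is made up (100,000 fake "words", not the real word list), and the exact numbers will vary from machine to machine.

```python
import timeit

# Stand-in data: 100,000 made-up "words" (not the real word list).
word_list = ['w' + str(i) for i in range(100_000)]
word_set = set(word_list)

# A word that is absent is the worst case for the list:
# every element has to be checked before Python can answer False.
target = 'not-a-word'

# Time 20 membership tests against each container.
list_time = timeit.timeit(lambda: target in word_list, number=20)
set_time = timeit.timeit(lambda: target in word_set, number=20)

print(f'list: {list_time:.6f} s   set: {set_time:.6f} s')
```

The list has to be scanned element by element, while the set uses hashing to jump almost directly to the answer, so converting the word list to a set once pays off when many lookups follow.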

Exercise 2b: List all the palindromes

Exercise 2c: List all the reversible words

Useful Python features:

• list element access and slicing with stride
• string replace
• list sort
• functions, def and lambdas
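A quick tour of these features on toy data, as a sketch. The fruit names and the helper `last_letter` are made up for illustration and are not part of the exercises.

```python
w = 'drawer'

# Element access and slicing: [start:stop:stride] works on strings and lists.
print(w[0], w[-1])   # first and last characters
print(w[1:4])        # characters at positions 1, 2, 3
print(w[::2])        # every second character
print(w[::-1])       # a stride of -1 walks backwards

# String replace returns a new string; the original is unchanged.
line = 'apple\n'
clean = line.replace('\n', '')

# A named function defined with def (hypothetical helper).
def last_letter(word):
    return word[-1]

words = ['pear', 'fig', 'banana', 'kiwi']

# list.sort sorts in place; key= is a function applied to each element.
words.sort(key=last_letter)

# sorted returns a new list; a lambda is a short unnamed function.
by_length = sorted(words, key=lambda x: len(x))

print(words)
print(by_length)
```

The `key=` argument is the piece that lets one sort by something other than plain alphabetical order.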
