Day 05

Tuesday, Feb 14, 2017

First names exploration report

Try to include at least one finding that says something about culture or the population as a whole.

Upload your report to UBlearns as a single Jupyter Notebook (.ipynb) file.

Embedding images in a Jupyter notebook

[Image: jupyter_how_to_embed_image.png]

Regular expressions, cont'd

There will be a (conventional paper-and-pencil, closed-book) quiz on this next Thursday.

Here's the one I gave last year, and my solutions.

Regex cheat sheet: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

Top 500 list

Why should you be familiar with Linux? Top 500 supercomputers as of Nov 2016

Basic bash shell commands

pwd, cd

ls

man

mkdir

cp, mv, rm

Caution! These cannot be undone: rm deletes files permanently, and cp and mv overwrite existing files without asking.

grep

grep is a line-based filter for plain text files: it prints each line that contains a match for the pattern. The basic syntax is

grep somestring myfile.txt

Useful options: -i (ignore case), -n (show line numbers), -v (print the lines that do not match).
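To connect grep with the regex material, here is a minimal Python sketch of what a line-based filter does. The function name grep_lines, its keyword arguments, and the sample list are made up for illustration; this is not how grep itself is implemented, and Python's re syntax differs in places from grep's default pattern syntax.

```python
import re

def grep_lines(pattern, lines, ignore_case=False, line_numbers=False, invert=False):
    """Return the lines selected by `pattern`, roughly like grep with -i, -n, -v."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    selected = []
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        if matched != invert:  # with invert=True, keep the lines that do NOT match
            selected.append("{}:{}".format(number, line) if line_numbers else line)
    return selected

# Hypothetical sample data standing in for the lines of a text file.
sample = ["Alice Smith", "bob jones", "Carol SMITH", "Dave Brown"]
print(grep_lines("smith", sample, ignore_case=True, line_numbers=True))
print(grep_lines("smith", sample, ignore_case=True, invert=True))
```

The first call behaves like grep -i -n, the second like grep -i -v.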

Executing bash shell commands from Python

import os
os.system(your_command_as_a_string)  # returns a status code, not the command's output

If you want to capture the command's output in your code, you'll need a more complicated method: Google "python subprocess.Popen".
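A minimal sketch of the subprocess.Popen approach, assuming you are happy to let the shell parse the command string the way os.system does. The helper name run_and_capture is hypothetical, and echo hello is only a stand-in command.

```python
import subprocess

def run_and_capture(command):
    """Run a shell command string and return (exit_status, output_text)."""
    process = subprocess.Popen(
        command,
        shell=True,                # let the shell parse the string, as os.system does
        stdout=subprocess.PIPE,    # capture standard output instead of printing it
        stderr=subprocess.STDOUT,  # fold error messages into the same stream
        universal_newlines=True,   # give back text (str) rather than bytes
    )
    output, _ = process.communicate()  # wait for the command to finish
    return process.returncode, output

status, text = run_and_capture("echo hello")
print(status, text.strip())
```

With shell=True the string is handed to the shell unchanged, so only use it with commands you wrote yourself, not with text that comes from a web page or a user.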

Regex Exercises

Find the names of all the PDF documents in this web page: Celebration of Student Academic Excellence 2016. Optional: Download all the PDFs (Linux: wget, Mac: curl) and make a JPG image of each (ImageMagick convert).

Find all the high and low temperature predictions in this page.
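As a starting point for the first exercise, here is a sketch of one way to pull PDF names out of HTML with re.findall. The HTML snippet is invented for illustration and the real page's markup may differ (for example, single-quoted attributes), so treat the pattern as an assumption to adapt, not as the solution.

```python
import re

# Hypothetical HTML snippet standing in for the real page source.
sample_html = '''
<ul>
  <li><a href="posters/smith_poster.pdf">Smith</a></li>
  <li><a href="posters/Jones-Lee.PDF">Jones and Lee</a></li>
  <li><a href="index.html">Home</a></li>
</ul>
'''

# Capture whatever sits between href=" and the closing quote,
# provided it ends in .pdf (upper or lower case).
pdf_pattern = re.compile(r'href="([^"]+\.pdf)"', re.IGNORECASE)
pdf_names = pdf_pattern.findall(sample_html)
print(pdf_names)
```

The parentheses make findall return only the captured file name, not the surrounding href="...".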

Long-term data-collection project

How to deal with pages with JavaScript-generated content, like this?

Selenium, which drives a real web browser so that the JavaScript runs before you read the page.