MTH 448/563 Data-Oriented Computing

Fall 2019

Day 6: XML formats, cont'd

Upcoming topics

  • UB class schedule analysis
  • writing SVG from scratch (random bubbles)
  • hex color specs and color maps
  • modifying existing SVG with lxml
  • choropleth maps
  • XML schemas

UB class schedule analysis

At the end of class last time, I asked you to complete this exercise: we want a list (or some similar collection) of objects that contain the room, meeting days, meeting times, and link to seating info for all classes.

Quiz #1: Upload to UBlearns a plain text file with your code that does this. (Just copy and paste your code into a plain text file skej.txt.)

Exercise: Let's develop and compute a measure of fullness of the classes offered by a Department. Here's a starting point:

import requests
import bs4
url = ''
s = requests.get(url).text
b = bs4.BeautifulSoup(s,'lxml')
tables =  b.find_all('table')
mytable = tables[5]  # found that we want #5 by trial and error
d = {}
for row in mytable:
    if # exclude null rows that exist for some reason
        tds = row.find_all('td')
        if len(tds)==11: # by experimentation found that true data rows have 11 td elements
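
One possible way to fill in the skeleton above, offered as a minimal sketch rather than the intended solution. Iterating directly over a BeautifulSoup table yields the whitespace strings between tags as well as the tags themselves, and those strings may be the "null rows" the comment refers to; that reading is an assumption. The HTML below, its cell contents, and the row length NCELLS = 3 are all made up for illustration (the real page has 11 cells per data row), and 'html.parser' is used only so the sketch has no parser dependency.

```python
import bs4

# Made-up HTML standing in for the real schedule page.
html = """
<table>
<tr><th>Class</th><th>Room</th><th>Days</th></tr>
<tr><td>spacer</td></tr>
<tr><td>COURSE A</td><td>ROOM 1</td><td>MWF</td></tr>
<tr><td>COURSE B</td><td>ROOM 2</td><td>TR</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')
table = soup.find('table')

NCELLS = 3  # hypothetical; the notes report 11 for the real page

rows = []
for row in table:
    # Children of a tag include plain strings (here, newlines);
    # only Tag objects support find_all, so skip everything else.
    if isinstance(row, bs4.Tag):
        tds = row.find_all('td')
        if len(tds) == NCELLS:  # keep only full data rows
            rows.append([td.get_text(strip=True) for td in tds])

print(rows)
```

The header row (th cells) and the one-cell spacer row are both dropped by the length test, so only the two data rows survive.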

Drawing and coloring with SVG

Drawing from scratch

Recall the SVG circle from last time:

<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" stroke="gray" stroke-width="8" fill="#77cc77" />
</svg>

We can use Python to draw directly by generating SVG.
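
A minimal sketch of that idea: build the SVG text with ordinary string formatting and write it to a file that a browser can open. The function name circle_svg and the output filename circle.svg are hypothetical; the circle attributes are the ones from the example above.

```python
def circle_svg(cx, cy, r, fill, size=100):
    """Return a complete SVG document (as a string) containing one circle."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
        f'  <circle cx="{cx}" cy="{cy}" r="{r}" stroke="gray" stroke-width="8" fill="{fill}" />\n'
        '</svg>\n'
    )

svg = circle_svg(50, 50, 40, '#77cc77')

# Write the text out; open circle.svg in a browser to view it.
with open('circle.svg', 'w') as f:
    f.write(svg)
```

Drawing many shapes is then a matter of generating one element string per shape inside a loop and joining them between the opening and closing svg tags.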

Exercise: Make a picture with a large collection of "bubbles" of random size, position, and color.

How to make a random color in hex format?
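
One possible answer, as a sketch: a hex color is "#" followed by three two-digit hexadecimal numbers (red, green, blue), each between 00 and ff, so it is enough to draw three integers in 0..255 and format each with two hex digits. The function name random_hex_color is hypothetical, and the standard-library random module is used here, though numpy's random functions would work equally well.

```python
import random

def random_hex_color():
    """Return a string of the form '#rrggbb' with each channel uniform on 0..255."""
    r, g, b = (random.randrange(256) for _ in range(3))
    # :02x formats an integer as two lowercase hex digits, zero-padded.
    return f'#{r:02x}{g:02x}{b:02x}'

color = random_hex_color()
print(color)
```

The zero padding matters: without it, a channel value below 16 would produce a single digit and the string would no longer be a valid six-digit color.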

Re-coloring an existing drawing

Here is a map of the United States with outlines for every county: USA_Counties_with_FIPS_and_names.svg

Exercise: Use a text editor to color the county where you were born with some bright color of your choice, by adding a "fill" attribute to the appropriate path element, and check it in your browser. If you were not born in the USA, consider yourself an honorary Erie County native.

We can parse this map with the lxml module, as follows. In this module, the attributes of each SVG element are available through the dict-like object element.attrib.

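from lxml import etree
import numpy as np
with open('USA_Counties_with_FIPS_and_names.svg') as f:
    map = etree.fromstring(f.read().encode('utf-8'))  # lxml wants bytes when the file declares an encoding
item = map[0] # second g is State lines and separators
for i,path in enumerate(item):
    if i>5: break  # only look at the first few children
    print(path.attrib)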

We can write it back out after making modifications, like this:

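with open('modifiedmap.svg','w') as f:
    f.write(etree.tostring(map).decode('utf-8'))  # tostring returns bytes, so decode before writing text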
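
The whole round trip (parse, modify an attribute, write back out) can be sketched on a tiny drawing. The SVG text, the path ids "a" and "b", and the output filename tiny_modified.svg are made up for illustration; the real county map has its own structure and ids, which you should inspect before adapting this.

```python
from lxml import etree

# A tiny made-up SVG standing in for the county map.
svg_text = '''<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <g>
    <path id="a" d="M 10 10 L 40 10 L 40 40 Z" />
    <path id="b" d="M 60 60 L 90 60 L 90 90 Z" />
  </g>
</svg>'''

root = etree.fromstring(svg_text.encode('utf-8'))
group = root[0]  # the single <g> element

for path in group:
    if path.attrib.get('id') == 'b':
        path.attrib['fill'] = '#ff0000'  # attrib supports dict-style assignment

# Serialize the modified tree and write it to a new file.
out = etree.tostring(root).decode('utf-8')
with open('tiny_modified.svg', 'w') as f:
    f.write(out)
```

Only the path with id "b" gains a fill attribute; the other path is left untouched.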

Exercise: Write code to randomly color the counties of this map.

Choropleth maps

From Wikipedia: A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.
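
To shade "in proportion to" a variable, a common first step is to rescale the data to the interval [0,1] and then turn each rescaled value into a color. The sketch below uses min-max rescaling and a gray scale; the region keys and values are made up for illustration, and the function name shade is hypothetical. It assumes the values are not all equal (otherwise the rescaling divides by zero).

```python
# Made-up measurements for three regions, for illustration only.
values = {'A': 2.0, 'B': 5.0, 'C': 8.0}

def shade(value, lo, hi):
    """Map value in [lo, hi] to a gray hex color: lo -> white, hi -> black."""
    t = (value - lo) / (hi - lo)   # rescale to [0, 1]
    level = round(255 * (1 - t))   # larger values get darker shades
    return f'#{level:02x}{level:02x}{level:02x}'

lo, hi = min(values.values()), max(values.values())
fills = {key: shade(v, lo, hi) for key, v in values.items()}
print(fills)
```

Each entry of fills could then be assigned to the "fill" attribute of the matching region's path element.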


Color maps

A color map is a function from an interval such as [0,1] to 3D color space. Let's make some.
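
One simple family, as a sketch: interpolate linearly between two RGB endpoints, so that 0 maps to the first color, 1 to the second, and values in between to a blend. The names make_colormap and blue_to_red are hypothetical, and inputs outside [0,1] are clamped here as a design choice, not a requirement.

```python
def make_colormap(c0, c1):
    """Return a function mapping t in [0, 1] to a hex color between RGB triples c0 and c1."""
    def cmap(t):
        t = min(1.0, max(0.0, t))  # clamp to [0, 1]
        # Interpolate each of the three channels separately.
        rgb = [round(a + t * (b - a)) for a, b in zip(c0, c1)]
        return '#{:02x}{:02x}{:02x}'.format(*rgb)
    return cmap

blue_to_red = make_colormap((0, 0, 255), (255, 0, 0))
print(blue_to_red(0.0), blue_to_red(0.25), blue_to_red(1.0))
```

Returning a function keeps the endpoints fixed once and lets the same map be applied to every data value.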