MTH 448/563 Data-Oriented Computing

Fall 2019

Day 6: XML formats, cont'd

Today's topics

  • bubble check
  • formatting
  • carriage return character
  • SVG in notebook
  • choropleth maps
  • intensive vs. extensive properties
  • US county FIPS codes
  • lxml
  • color maps

Probably not till next week

  • XML schemas
  • regular expressions

Drawing and coloring with SVG, cont'd

Drawing from scratch

Bubble check: Show your picture of a large collection of "bubbles" of random size, position, and color.

Re-coloring an existing drawing

Here is a map of the United States with outlines for every county: USA_Counties_with_FIPS_and_names.svg

Exercise: Use a text editor to color the county where you were born with some bright color of your choice, by adding a "fill" attribute to the appropriate path element, and check it in your browser. If you were not born in the USA, consider yourself an honorary Erie County native.

We can parse this map with the lxml module, as follows. In lxml, an element's attributes are available through element.attrib, which behaves like a dict.

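from lxml import etree
import numpy as np  # not used in this snippet; handy for the random-coloring exercise below
# Read bytes: lxml rejects str input that carries an XML encoding declaration.
with open('USA_Counties_with_FIPS_and_names.svg', 'rb') as f:
    map = etree.fromstring(f.read())  # root <svg> element (note: shadows the built-in map)
item = map[0]  # first child of the root <svg>; the second g holds the State lines and separators
print(item.tag, len(item))
for i, path in enumerate(item):
    print(path.attrib.keys())
    if i > 5:  # only show the first few paths
        break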

We can write it back out after making modifications, like this:

with open('modifiedmap.svg', 'w', encoding='utf-8') as f:
    f.write(etree.tostring(map).decode('utf-8'))
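Because the map's elements live in the SVG namespace, tags print in the form '{http://www.w3.org/2000/svg}g' rather than plain 'g'. To reach every path no matter how deeply it is nested, iter() with the namespace-qualified tag is often simpler than indexing. The sketch below uses the standard library's xml.etree.ElementTree on a tiny made-up SVG string (the two path ids are example FIPS codes, not read from the real file); lxml.etree elements support the same fromstring, iter, and get calls.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A tiny stand-in for the county map: two paths nested inside a <g>.
svg_text = f"""<svg xmlns="{SVG_NS}">
  <g id="counties">
    <path id="01001" d="M0 0 L10 0 L10 10 Z"/>
    <path id="36029" d="M20 0 L30 0 L30 10 Z"/>
  </g>
</svg>"""

root = ET.fromstring(svg_text)
print(root.tag)  # {http://www.w3.org/2000/svg}svg

# iter() walks the whole tree, so nesting depth does not matter.
# The tag is written in "Clark notation": {namespace-uri}localname.
path_ids = [p.get("id") for p in root.iter(f"{{{SVG_NS}}}path")]
print(path_ids)  # ['01001', '36029']
```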

Exercise: Write code to color the counties of this map randomly.

Choropleth maps

From Wikipedia: A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.

african_american_percentage_in_us.png

Animated choropleth on child mortality in yesterday's New York Times.

Intensive vs. extensive properties

With very rare exceptions, choropleth maps should be used only for "intensive" properties. Intensive properties are quantities such as proportions, densities, and rates: merging two regions that have the same value leaves that value unchanged. In contrast, "extensive" properties are absolute amounts, which add when regions are merged.

Quiz "intensive": give me one example of an intensive property.

Quiz "extensive": give me one example of an extensive property.

For example, a choropleth is not appropriate for total tons of CO2 emitted or for total sales.
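A small worked example of the distinction, using two made-up regions (the numbers and the helper names merge and density are illustrative only): population and area are extensive and simply add on merging, while population density, their ratio, is intensive and stays at 100 when two regions of density 100 are merged.

```python
def merge(a, b):
    """Merge two regions; population and area are extensive, so they add."""
    return {"population": a["population"] + b["population"],
            "area": a["area"] + b["area"]}


def density(region):
    """Population density: an intensive ratio of two extensive quantities."""
    return region["population"] / region["area"]


# Two made-up regions with the same density but different sizes.
a = {"population": 1000, "area": 10.0}
b = {"population": 3000, "area": 30.0}
m = merge(a, b)

print(m["population"])                     # 4000  (extensive: adds up)
print(density(a), density(b), density(m))  # 100.0 100.0 100.0  (intensive: unchanged)
```

Shading by population would make b look more intense than a merely because it covers more ground, whereas shading by density correctly shows the two as alike.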

BAD:

choropleth_extensive.png

For future reference, if you do want to represent an extensive quantity, possible options (for another report) are:

disks (or other objects) whose area is proportional to the represented quantity:

GOOD:

hisp_pies.gif

or a "cartogram" in which areas are distorted so as to be proportional to the represented quantity:

GOOD:

cartogram_PaullHennig2016WorldMap.OAha.CC-BY-4.0.jpg

(source)
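For the disk option, the key point is that the disk's area, not its radius, should be proportional to the quantity; since area is pi r^2, the radius must grow like the square root of the value. A minimal sketch that emits SVG circle elements; the function names, the 30-unit maximum radius, and the colors are arbitrary assumptions, and choosing where to center each disk (e.g. inside its county) is left out.

```python
import math


def disk_radius(value, max_value, max_radius=30.0):
    """Radius (in SVG user units) making disk *area* proportional to value.

    Area is pi * r**2, so r must grow like sqrt(value). Scaling r linearly
    would exaggerate large values: doubling the value would quadruple the area.
    """
    return max_radius * math.sqrt(value / max_value)


def svg_disk(cx, cy, value, max_value):
    """An SVG <circle> element string centered at (cx, cy)."""
    r = disk_radius(value, max_value)
    return f'<circle cx="{cx}" cy="{cy}" r="{r:.2f}" fill="steelblue" fill-opacity="0.6"/>'


print(svg_disk(100, 50, 250, 1000))
# <circle cx="100" cy="50" r="15.00" fill="steelblue" fill-opacity="0.6"/>
```

A value one quarter of the maximum gets half the maximum radius, and hence one quarter of the maximum area.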

Color maps

A color map is a function from an interval such as [0, 1] to 3D color space (for example, RGB). Let's make some.

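The simplest, and a very commonly used, kind is a linear (well, affine) map $f : [0, 1] \to [0, 1]^3$, defined by $f(h) = (1 - h)\,c_0 + h\,c_1$, where $c_0 = f(0)$ and $c_1 = f(1)$ are the colors at the two ends of the interval.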
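The affine map translates directly into code. A minimal sketch, assuming colors are RGB triples with components in [0, 1] and converting the result to the #rrggbb form that an SVG fill accepts; the white-to-dark-red endpoints are an arbitrary choice and the function names are made up for illustration.

```python
def linear_colormap(h, c0=(1.0, 1.0, 1.0), c1=(0.8, 0.0, 0.0)):
    """Affine color map f(h) = (1 - h) c0 + h c1 for h in [0, 1].

    c0 and c1 are RGB triples with components in [0, 1]; the defaults
    (white to a dark red) are only an illustrative choice.
    """
    h = min(max(h, 0.0), 1.0)  # clamp so out-of-range data stays inside the color cube
    return tuple((1 - h) * a + h * b for a, b in zip(c0, c1))


def to_hex(rgb):
    """Convert an RGB triple in [0, 1]^3 to an SVG/CSS '#rrggbb' string."""
    return "#" + "".join(f"{round(255 * x):02x}" for x in rgb)


print(to_hex(linear_colormap(0.0)))  # #ffffff
print(to_hex(linear_colormap(1.0)))  # #cc0000
```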

Discrete color maps (i.e. with a small finite set of colors) are also widely used.
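A discrete color map can be sketched as a sorted list of break points plus a list of colors, one more color than breaks, with bisect choosing the class. The five hex colors, the break points, and the function name below are illustrative assumptions, not data.

```python
import bisect

# A hypothetical 5-class light-to-dark red palette (hex strings usable as SVG fills).
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]


def discrete_colormap(value, breaks, palette=PALETTE):
    """Map value to one of len(breaks) + 1 colors.

    breaks must be sorted; values below breaks[0] get palette[0], and
    values >= breaks[-1] get the last color.
    """
    if len(palette) != len(breaks) + 1:
        raise ValueError("need exactly one more color than break points")
    return palette[bisect.bisect_right(breaks, value)]


breaks = [2.0, 4.0, 6.0, 8.0]  # e.g. class limits for a rate in percent (illustrative)
print(discrete_colormap(3.1, breaks))  # #fcae91
```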

2D color maps can be contemplated, but their comprehensibility is debatable.

Report 2: Choropleth map of US

Due 8:00am Saturday, Sep 28.

Find some interesting data for every county in the US

Search online for some data given for every county in the nation.

You may want to consult this page about the FIPS codes for counties. Fall-back suggestion if you don't find anything else you like: annual unemployment data from the US Bureau of Labor Statistics.
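A county FIPS code has five digits: a two-digit state code followed by a three-digit county code (Erie County, NY, is 36029). Spreadsheet tools and CSV readers often parse such codes as integers and drop the leading zeros (01001 becomes 1001), which breaks matching against the map. A small sketch of padding them back; the helper names are made up for illustration.

```python
def county_fips(state_code, county_code):
    """Build the 5-digit county FIPS string: 2-digit state + 3-digit county.

    Accepts ints or digit strings, so codes that lost leading zeros are padded back.
    """
    return f"{int(state_code):02d}{int(county_code):03d}"


def normalize_fips(code):
    """Restore a 5-digit FIPS string from e.g. 1001 or '1001'."""
    return f"{int(code):05d}"


print(county_fips(36, 29))   # 36029  (Erie County, NY)
print(normalize_fips(1001))  # 01001  (Autauga County, AL)
```

If you load your data with pandas, passing dtype=str for the code column in read_csv (the column name depends on your file) avoids the problem at the source.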

Note: You don't have to use a linear color map: consider whether a logarithmic one would be helpful.
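A logarithmic color map can be built by normalizing the data on a log scale and then feeding the resulting h in [0, 1] to an ordinary color map such as the affine one described above. A minimal sketch, assuming strictly positive data; the 1 to 10,000 people-per-square-mile range is a hypothetical example, not real county data.

```python
import math


def log_normalize(x, vmin, vmax):
    """Map x in [vmin, vmax] to [0, 1] on a logarithmic scale (all values > 0)."""
    if min(x, vmin, vmax) <= 0:
        raise ValueError("a logarithmic scale needs strictly positive values")
    return (math.log(x) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))


def linear_normalize(x, vmin, vmax):
    """Map x in [vmin, vmax] to [0, 1] linearly, for comparison."""
    return (x - vmin) / (vmax - vmin)


# Hypothetical range of population densities, in people per square mile.
vmin, vmax = 1.0, 10_000.0
for x in (10.0, 100.0, 1000.0):
    print(x, f"{linear_normalize(x, vmin, vmax):.3f}", f"{log_normalize(x, vmin, vmax):.3f}")
```

On the linear scale a county with 100 people per square mile lands about 1% of the way along the color map, barely distinguishable from an almost empty one, while the log scale puts it halfway.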

Color the counties on the map according to your data

This may require some thought, some research, and some work. Although you are welcome to consider elaborate color maps (even 2D maps representing two quantities), you should beware of creating graphics that are strikingly colorful but difficult for the viewer to interpret. Simple things like a white-to-red gradient can be very effective: the picture above shows the fraction of the population that is of African descent (generated by a student in a previous run of this course).
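One way to put the pieces together, as a sketch only: look up each county path by its id, normalize the value linearly to [0, 1], and set a white-to-red fill. It assumes that the county paths' id attributes are their 5-digit FIPS codes (check this in the file) and that your data is a dict from FIPS string to number; the names color_counties, set_fill, and white_to_red are hypothetical. It uses the standard library's xml.etree.ElementTree on a tiny inline SVG so it runs on its own; with lxml the same iter, get, and set calls apply, and register_namespace is unnecessary because lxml keeps the file's own prefixes. One detail worth knowing: an inline style containing a fill declaration overrides a fill attribute, so set_fill removes it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace as the default one instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def white_to_red(h):
    """Affine white (h=0) to red (h=1) color map, returned as '#rrggbb'."""
    h = min(max(h, 0.0), 1.0)   # clamp values outside [vmin, vmax]
    gb = round(255 * (1 - h))   # green and blue fade out; red stays at 255
    return f"#ff{gb:02x}{gb:02x}"


def set_fill(path, color):
    """Set path's fill, first dropping any fill declared in its inline style.

    An inline style="fill:..." overrides a fill attribute, so leaving it
    in place would hide the new color.
    """
    style = path.get("style")
    if style:
        kept = [d for d in style.split(";")
                if d.strip() and d.split(":")[0].strip() != "fill"]
        path.set("style", ";".join(kept))
    path.set("fill", color)


def color_counties(root, values, vmin, vmax):
    """Fill each <path> whose id is a key of values; leave all other paths alone."""
    for path in root.iter(f"{{{SVG_NS}}}path"):
        fips = path.get("id")
        if fips in values:
            h = (values[fips] - vmin) / (vmax - vmin)  # linear normalization
            set_fill(path, white_to_red(h))


# Tiny stand-in for the county map; the third path plays the role of a non-county border.
svg_text = f"""<svg xmlns="{SVG_NS}">
  <path id="36029" style="fill:#d0d0d0;stroke:#000000" d="M0 0 H10 V10 Z"/>
  <path id="01001" d="M20 0 H30 V10 Z"/>
  <path id="borders" style="fill:none;stroke:#000000" d="M0 0 H30"/>
</svg>"""
root = ET.fromstring(svg_text)
rates = {"36029": 5.0, "01001": 10.0}  # made-up values for illustration
color_counties(root, rates, vmin=0.0, vmax=10.0)
print(ET.tostring(root, encoding="unicode"))
```

Leaving unmatched paths untouched matters here: painting every path would also fill shapes such as state borders that are meant to stay unfilled.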