Day 10

Thursday, Mar 2, 2017


Turning off unwanted features of Jupyter notebook

On Linux (Ubuntu 16.04), I created the following two files in ~/.jupyter/custom/ (where ~ is my home folder) to turn off prompts and the automatic closing of brackets and quotes. These customizations appear to apply both to my native installation and to Anaconda.

~/.jupyter/custom$ ls
custom.css  custom.js

~/.jupyter/custom$ cat custom.js
IPython.CodeCell.options_default.cm_config.autoCloseBrackets = false;

~/.jupyter/custom$ cat custom.css
div.prompt { display: none; }


SVG Exercise 1

Generate an SVG file that represents a collection of randomly sized, placed and colored circles. Use simple.svg as a prototype.
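
One possible starting point is sketched below in Python. It does not follow simple.svg, which is not reproduced here; the canvas size, radius range, opacity, and the function name random_circles_svg are all assumptions chosen for illustration.

```python
import random

def random_circles_svg(n=20, width=400, height=300, seed=None):
    """Return SVG text containing n randomly placed, sized and colored circles."""
    rng = random.Random(seed)  # a seed makes the picture reproducible
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for _ in range(n):
        cx = rng.uniform(0, width)    # center, anywhere on the canvas
        cy = rng.uniform(0, height)
        r = rng.uniform(5, 40)        # radius range is an arbitrary choice
        color = "#{:06x}".format(rng.randrange(0x1000000))  # random RGB
        parts.append(
            f'  <circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r:.1f}" '
            f'fill="{color}" fill-opacity="0.6" />'
        )
    parts.append("</svg>")
    return "\n".join(parts) + "\n"

def write_circles(path, **kwargs):
    """Write the generated SVG text to a file."""
    with open(path, "w") as f:
        f.write(random_circles_svg(**kwargs))
```

Calling write_circles("circles.svg", n=50) should produce a file that a browser can display directly.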

Choropleth maps

From Wikipedia: A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.

SVG Exercise 3: Parse an SVG file

In Python, do a little parsing of the SVG file: see if you can read (and re-write) the attributes of the elements of the US Counties map. For example, how about coloring each county at random?
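
A minimal sketch of the read-and-rewrite step, using the standard library's xml.etree.ElementTree. The tiny two-path SVG below is a stand-in, not the counties map. This sketch assumes each region is a path element whose color lives in a fill attribute; the real map may keep its color inside a style attribute instead, in which case that string would need editing.

```python
import random
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # avoid ns0: prefixes when writing back out

# Stand-in document (an assumption): two rectangles drawn as paths.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <path id="a" d="M0 0 H50 V50 H0 Z" fill="#cccccc" />
  <path id="b" d="M50 0 H100 V50 H50 Z" fill="#cccccc" />
</svg>"""

def recolor_paths(svg_text, seed=None):
    """Give every <path> element a random fill color and return the new SVG text."""
    rng = random.Random(seed)
    root = ET.fromstring(svg_text)
    # Element names are namespace-qualified: {namespace}localname
    for path in root.iter(f"{{{SVG_NS}}}path"):
        path.set("fill", "#{:06x}".format(rng.randrange(0x1000000)))
    return ET.tostring(root, encoding="unicode")

recolored = recolor_paths(SAMPLE_SVG, seed=0)
```

For a file on disk, ET.parse(filename) followed by tree.write(new_filename) does the same job without holding the text in a string.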

Report 2: Choropleth map of US

Find some interesting data for every county in the US

Search online for some data given for every county in the nation.

You may want to consult this page about the FIPS codes for counties. Fall-back suggestion if you don't find anything else good: annual unemployment data from the US Bureau of Labor Statistics.
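
A common stumbling block when matching data to counties: a county FIPS code is five digits (two for the state, three for the county), and the leading zero is easily lost if the code is read as a number. The helpers below are a small sketch of one way to normalize codes to strings; the function names are hypothetical.

```python
def county_fips(state_code, county_code):
    """Combine a state code and a county code into a 5-character FIPS string."""
    return f"{int(state_code):02d}{int(county_code):03d}"

def normalize_fips(value):
    """Restore leading zeros lost when a FIPS code was stored as a number."""
    return str(value).strip().zfill(5)
```

Using the normalized string as a dictionary key should make it easier to match a data row to the corresponding map element, assuming the map identifies counties by FIPS code.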

Color the counties on the map according to your data

This may require some thought, some research, and some work. Although you are welcome to consider elaborate color maps (even 2D maps representing two quantities), you should beware of creating graphics that are strikingly colorful but difficult for the viewer to interpret. Simple things like a white-to-red gradient can be very effective: the picture below shows the fraction of the population that is of African descent (generated by a student in this class in 2015).
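
A white-to-red gradient can be sketched as a linear interpolation from a data value to a hex color. This is only one way to do it; the function name and the choice to clamp out-of-range values are assumptions.

```python
def white_to_red(value, vmin, vmax):
    """Map value in [vmin, vmax] to a hex color from white (low) to red (high)."""
    if vmax == vmin:
        t = 0.0                      # degenerate range: treat everything as low
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))        # clamp values outside the range
    other = round(255 * (1 - t))     # green and blue fade out as t grows
    return f"#ff{other:02x}{other:02x}"
```

The red channel stays at full intensity while green and blue shrink together, so the lowest value maps to #ffffff and the highest to #ff0000.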


XML, cont'd

Another XML use case

ArcGIS Public Garden Data Model

This data model, intended for managers of public parks and used by Harvard's Arnold Arboretum in particular, is specified in XML: ArcGIS Public Garden Data Model


XML Schemas and validation of XML against them

My simple example XML document: myxmls.xml

My data document:


      xsi:schemaLocation=" myxmls.xsd"
      version="0.0" >

              <number>MTH 463</number>
              <name>Data-Oriented Computing</name>

              <number>MTH 463</number>
              <name>Data-Oriented Computing</name>

              <number>MTH 649</number>
              <name>Partial Differential Equations</name>


Syntactic validation (i.e. testing that the XML is well-formed) can be done with Firefox, some text editors, or xmllint.
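
Python's standard library can make the same well-formedness check. The sketch below uses xml.etree.ElementTree, which raises ParseError on malformed input; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

This checks syntax only: a document with an unexpected element name is still well-formed, and catching that requires a schema.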

Semantic validation: xmllint from libxml2-utils

Imposing semantic rules with XML Schema: myxmls.xsd

My schema document:

<?xml version="1.0"?>

<!--  "" is a magic phrase, like "Open, Sesame".
              No connection is being made to that website. Other phrases that work are ...?

              is grammatically similar to a Python statement like
              import foo as xs

              The mandated use of a URL here as the namespace name is intended to ensure
              that namespace names are unique.

              "qualified" enforces format of the XML instance documents,
              requiring them to use qualified names for items in this namespace.


      xmlns:xs           =""
      targetNamespace    =""
      elementFormDefault ="qualified">

      <xs:element name="mycourses">
                              <xs:element name="course" maxOccurs="unbounded">
                                                      <xs:element name="number"   type="xs:string" />
                                                      <xs:element name="name"     type="xs:string" />
                                                      <xs:element name="semester" type="xs:integer" />
                      <xs:attribute name="version" type="xs:decimal" use="required" />


Now we validate the xml data document against the xsd schema document:

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml validates
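
The same check can be scripted. The sketch below simply wraps the xmllint command shown above using subprocess; it assumes xmllint is on the PATH and relies on xmllint exiting with status 0 when validation succeeds. The function names are hypothetical.

```python
import shutil
import subprocess

def xmllint_command(schema_path, xml_path):
    """Build the same command line as used at the shell prompt."""
    return ["xmllint", "--noout", "--schema", schema_path, xml_path]

def validates(schema_path, xml_path):
    """Return True/False for valid/invalid, or None if xmllint is not installed."""
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        xmllint_command(schema_path, xml_path),
        capture_output=True,  # xmllint writes its report to stderr
        text=True,
    )
    return result.returncode == 0
```

For example, validates("myxmls.xsd", "myxmls.xml") would be expected to return True for the files above.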

If we modify the data so that it no longer conforms to the schema, xmllint will tell us.

  <number>MTH 463</number>
  <name>Data-Oriented Computing</name>

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml:12: element semester: Schemas validity error :
Element '{}semester':
'201601a' is not a valid value of the atomic type 'xs:integer'.
myxmls.xml fails to validate


  <nombre>MTH 463</nombre>
  <name>Data-Oriented Computing</name>

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml:10: element nombre: Schemas validity error :
Element '{}nombre':
This element is not expected.
Expected is one of (
{}semester ).
myxmls.xml fails to validate

Note that with both of the changes above, we still had well-formed XML.

XML Schema exercise

Write a schema for your own XML document from last week. Make sure your document "validates" against your schema. Then "break" your document in several small ways to see how the validator responds. I will be asking you to turn in your XML and XSD documents.

References: my examples above, and w3schools

Note: you can even embed regular expressions in an XML schema: myxmlre.xml, myxmlre.xsd
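
The contents of myxmlre.xsd are not reproduced here, so the following is only an illustration of the idea in Python. In XML Schema, an xs:pattern facet must match the entire value, which corresponds to re.fullmatch rather than re.search. The pattern below, for course numbers such as "MTH 463", is a hypothetical example and not taken from the schema.

```python
import re

# Hypothetical pattern: three capital letters, a space, three digits.
COURSE_NUMBER = re.compile(r"[A-Z]{3} [0-9]{3}")

def is_course_number(text):
    """Return True if the whole string matches, as an xs:pattern facet would require."""
    return COURSE_NUMBER.fullmatch(text) is not None
```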