Day 13, Tuesday, Mar 13, 2018

Fitting a linear model to data, cont'd

Pointwise errors $e_i = \alpha x_i + \beta - y_i$

Some freedom in defining overall error $E$ in terms of pointwise errors $\{e_i\}$
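To make that freedom concrete, here is a minimal sketch of three possible overall errors built from the same pointwise errors. The names E1 and E2 follow the notation used elsewhere in these notes; E_max (the largest absolute error) and the small data set are added only for illustration and are not taken from the lecture.

```python
def pointwise_errors(alpha, beta, xs, ys):
    """Return the list of e_i = alpha*x_i + beta - y_i."""
    return [alpha * x + beta - y for x, y in zip(xs, ys)]


def E1(alpha, beta, xs, ys):
    """Sum of absolute pointwise errors."""
    return sum(abs(e) for e in pointwise_errors(alpha, beta, xs, ys))


def E2(alpha, beta, xs, ys):
    """Sum of squared pointwise errors."""
    return sum(e * e for e in pointwise_errors(alpha, beta, xs, ys))


def E_max(alpha, beta, xs, ys):
    """Largest absolute pointwise error (illustrative extra choice)."""
    return max(abs(e) for e in pointwise_errors(alpha, beta, xs, ys))


# Made-up data: three points on y = 2x + 1 and one point 3 units above it.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 10.0]
print(E1(2.0, 1.0, xs, ys), E2(2.0, 1.0, xs, ys), E_max(2.0, 1.0, xs, ys))
# prints: 3.0 9.0 3.0
```

The same line gets a different score under each definition, which is why the minimizing line can differ too.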

We found a formula for the line that minimizes the choice $E_2 = \sum_{i=1}^{n} e_i^2$

$\operatorname{argmin}_{(\alpha, \beta) \in \mathbb{R}^2} E$
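The formula itself is not restated in these notes. As a reminder, the sketch below assumes the standard closed form obtained from the normal equations for the sum-of-squares error; the function name least_squares_line and the sample data are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta) minimizing sum((alpha*x_i + beta - y_i)**2).

    Assumes the standard normal-equations solution for a line.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values coincide, so no unique non-vertical line exists.
        raise ValueError("x values must not all be equal")
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return alpha, beta


# Made-up data lying exactly on y = 2x + 1.
alpha, beta = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(alpha, beta)
```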

How to find best (α, β) for other choices?

We have a formula only for $E_2$

Brute force?

Random choices of (α, β)
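The random-choice idea can be sketched as follows. The search box [-5, 5] for both parameters, the trial count, the seed, and the name random_search are assumptions made for illustration; the sum of absolute errors is used as the example objective.

```python
import random


def E1(alpha, beta, xs, ys):
    """Sum of absolute pointwise errors |alpha*x_i + beta - y_i|."""
    return sum(abs(alpha * x + beta - y) for x, y in zip(xs, ys))


def random_search(error, xs, ys, trials=5000, lo=-5.0, hi=5.0, seed=0):
    """Try `trials` uniformly random (alpha, beta) pairs; keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        alpha = rng.uniform(lo, hi)
        beta = rng.uniform(lo, hi)
        err = error(alpha, beta, xs, ys)  # one function evaluation per trial
        if best is None or err < best[0]:
            best = (err, alpha, beta)
    return best


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
best_err, best_alpha, best_beta = random_search(E1, xs, ys)
print(best_err, best_alpha, best_beta)
```

Every trial is independent of the earlier ones, so nothing learned from a good guess is used to place the next guess.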

More deliberate random choice?

Try all pairs of data points for $E_1$

My conjecture: a line containing 2 of the data points is in argmin E when we choose $E = E_1 = \sum_{i=1}^{n} |e_i|$. (If true, this narrows down the possibilities to a finite set.)

$E_1$ is piecewise linear in $(\alpha, \beta)$:


and discontinuities in slope occur where the line contains a data point. The set of minimizers must include a point at which the piecewise-linear graph has a vertex, which corresponds to a line through two data points. There are at most n(n-1)/2 of those.
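If the conjecture holds, the search reduces to checking every line through two data points. A minimal sketch of that finite search is below; the function name best_line_through_pairs and the sample data (four collinear points plus one outlier) are illustrative.

```python
from itertools import combinations


def E1(alpha, beta, xs, ys):
    """Sum of absolute pointwise errors |alpha*x_i + beta - y_i|."""
    return sum(abs(alpha * x + beta - y) for x, y in zip(xs, ys))


def best_line_through_pairs(xs, ys):
    """Return (error, alpha, beta) for the best line through two data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = alpha*x + beta
        alpha = (y2 - y1) / (x2 - x1)
        beta = y1 - alpha * x1
        err = E1(alpha, beta, xs, ys)
        if best is None or err < best[0]:
            best = (err, alpha, beta)
    return best


# Made-up data: four points on y = 2x + 1 and one outlier at x = 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 30.0]
best = best_line_through_pairs(xs, ys)
print(best)
```

Each candidate costs one evaluation of the error, which itself loops over all n points, so the total work grows like n cubed.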

Walk (or stagger) downhill
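The notes do not spell out the downhill procedure, so the following is only one possible reading of "staggering": take a random step from the current point and keep it only if the error decreases. The step size, trial count, seed, and the name stagger_downhill are all assumptions.

```python
import random


def E1(alpha, beta, xs, ys):
    """Sum of absolute pointwise errors |alpha*x_i + beta - y_i|."""
    return sum(abs(alpha * x + beta - y) for x, y in zip(xs, ys))


def stagger_downhill(f, start, step=0.5, trials=2000, seed=0):
    """Random-step descent: accept a trial point only if it lowers f."""
    rng = random.Random(seed)
    point = tuple(start)
    value = f(*point)
    path = [(point, value)]  # accepted points, useful for plotting the walk
    for _ in range(trials):
        candidate = tuple(p + rng.uniform(-step, step) for p in point)
        cand_value = f(*candidate)  # one function evaluation per trial
        if cand_value < value:
            point, value = candidate, cand_value
            path.append((point, value))
    return point, value, path


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
point, value, path = stagger_downhill(lambda a, b: E1(a, b, xs, ys), (0.0, 0.0))
print(point, value, len(path))
```

Unlike independent random guesses, each trial here starts from the best point found so far, so the accepted values can only go down.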


Note: function evaluations are often expensive

Useful metaphor: you are trying to find the deepest point in a muddy lake, using only a boat and a plumb line.


Another one: you are trying to find the oven parameters that will produce the perfect cake.


After class, I added some lines to the plot to show the sequence of trials more fully:

[Figures: stagger_down.png, staggerdown2.png]

Project 3

A comparative study of several definitions of badness of fit of a linear function to data $\{(x_i, y_i) \in \mathbb{R}^2 : i \in \{1, 2, \ldots, n\}\}$

Look for the best-fit line according to several definitions of error, for a variety of data sets (large/small, regular/irregular), and give your own justified value judgments based on the examples you present.

It seems likely that the various definitions of overall error will disagree most in their treatment of outliers. So some of your test cases should have data that is quite linear except for a small number of outliers, like this:
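Test data of this kind could be generated along the following lines. This is a sketch only: the true line, the noise level, the number and size of the outliers, and the name linear_data_with_outliers are all illustrative assumptions, not requirements of the project.

```python
import random


def linear_data_with_outliers(n=30, alpha=2.0, beta=1.0, noise=0.2,
                              n_outliers=3, outlier_shift=15.0, seed=0):
    """Return (xs, ys, outlier_indices) for nearly linear data with outliers."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Points close to the line y = alpha*x + beta, with small Gaussian noise.
    ys = [alpha * x + beta + rng.gauss(0.0, noise) for x in xs]
    # Push a few randomly chosen points far above the line.
    outlier_indices = sorted(rng.sample(range(n), n_outliers))
    for i in outlier_indices:
        ys[i] += outlier_shift
    return xs, ys, outlier_indices


xs, ys, outlier_indices = linear_data_with_outliers()
print(outlier_indices)
```

Fixing the seed makes a test case reproducible, so the same data set can be scored under each definition of error.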


Global positioning satellites (GPS)

Another minimization problem


Spreadsheet for data entry except change the S to an R.