A Context for Classification

Posted by David Richards Sun, 08 Nov 2009 17:48:00 GMT

When I discovered systems, system archetypes, dynamic models, and these kinds of things, I fell in love. I moved to Portland and ate the stuff up. Life has happened since then, and I'm addressing other issues, but I still have this love affair with seeing the integration of the elements of a system. When the purpose of a system manifests itself from the whole of its parts, it's like a ballet to me.

An example from my boyhood is the systems that produce amusement parks. Once I could begin to see the forces at play that combined physics, engineering, economics, and the pursuit of pleasure to create an amusement park, that idea was immensely more pleasurable than the rides themselves. As a boy of around twelve years old, the moment of riding a roller coaster was just an interruption from the joy of realizing the systems all around me.

A friend saw some of this the other day. He walked into my office and wanted to talk about how he took some of my ideas from my "UTOSC talk":http://blog.tegugears.com/2009/10/08/utosc-resources to cluster the voters in the Utah Republican Party. He could start to see that there were forces at play behind the votes. These voters had purposes of their own, but as a whole, the system began to manifest a purpose and direction for itself. The structure of the elements of the Utah Republican Party guides its behavior.

I could be wrong, but I think that's why we get excited about classification. Simple linear regression can often draw a line between two classes in fairly useful ways. Neural networks, support vector machines, Gaussian processes, decision trees, k-d trees: all these wonderful inventions begin to tease out the players of a system. We can't always see these things from the data alone, and a priori information often needs to be asserted, but we live in a world where the common man can work on these things.
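To make the "draw a line between two classes" idea concrete, here is a minimal sketch. It assumes a single input variable and class labels coded as -1 and +1; the data and the function names are made up for illustration, and thresholding a least-squares fit is only one simple way to get a boundary.

```python
# Toy illustration: fit a least-squares line to class labels (-1 / +1)
# and classify by the sign of the fitted value. The data are made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def classify(x, a, b):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if a * x + b >= 0 else -1

# Hypothetical observations: small values belong to class -1, large to +1.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [-1, -1, -1, 1, 1, 1]

a, b = fit_line(xs, ys)
boundary = -b / a  # the x where the fitted line crosses zero
predictions = [classify(x, a, b) for x in xs]

print("boundary at x =", boundary)
print("predictions:", predictions)
```

With these symmetric toy points the boundary lands halfway between the two groups. Real data are rarely this tidy, which is where the other methods earn their keep.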

"Cherkassky and Mulier":http://www.amazon.com/Learning-Data-Concepts-Theory-Methods/dp/0471681822/ref=sr11?ie=UTF8&s=books&qid=1257704359&sr=8-1 are more exact when they explain that

bq. Learning is the process of estimating an unknown (input, output) dependency or structure of a System using a limited number of observations.

In other words, from observation, we learn how a system might turn inputs into outputs. The knowledge of the properties of steel and the forces of nature (physical inputs) can guide the creation of roller coasters and machines that flip us around in death-defying ways and give us the thrill of a lifetime (system outputs). For the price of a small ticket (our economic input), we can draw on the accumulation of thousands of hours and millions of dollars to share in those thrills (system outputs).

What's more important, these systems can be generalized, to a point. Disney can create the happiest place on earth in Orlando and Anaheim, yet fall quite flat in Paris. The Harvard business case suggests they didn't react to the observations available to them on that project.

"Cherkassky and Mulier":http://www.amazon.com/Learning-Data-Concepts-Theory-Methods/dp/0471681822/ref=sr11?ie=UTF8&s=books&qid=1257704359&sr=8-1 go further to explain

bq. Under [the] statistical model estimation framework, the goal of learning is accurate identification of the unknown system, whereas under predictive learning the goal is accurate imitation of a system's output.

Those are the first steps in a difficult and rewarding journey through the world of data analysis, as guided by "Cherkassky and Mulier":http://www.amazon.com/Learning-Data-Concepts-Theory-Methods/dp/0471681822/ref=sr11?ie=UTF8&s=books&qid=1257704359&sr=8-1. They pick apart classification methods and give us a great context for them. Above, they show us that the statistical model estimation framework wants to point out means and distributions and skew and kurtosis. The machine learning world is more interested in simply knowing what predictive power the observations might have. That is, it's enough to know what a system does, rather than everything about how it does it.
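A small sketch may help separate the two goals. Everything here is an assumption made for illustration: the "unknown" system is a straight line I chose, the noise level is arbitrary, and a nearest-neighbor lookup stands in for predictive learning in general. The first function tries to identify the system by estimating its parameters; the second only tries to imitate its output.

```python
import random

random.seed(0)

def system(x):
    """The 'unknown' system; hypothetical, chosen only for this sketch."""
    return 2.0 * x + 1.0

# A limited number of noisy observations of (input, output) pairs.
inputs = [i / 2 for i in range(20)]  # 0.0, 0.5, ..., 9.5
outputs = [system(x) + random.gauss(0, 0.1) for x in inputs]

def identify(xs, ys):
    """Statistical model estimation: recover the system's parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def imitate(xs, ys, query):
    """Predictive learning: copy the output seen at the nearest input."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[nearest]

slope, intercept = identify(inputs, outputs)
guess = imitate(inputs, outputs, 4.2)

print("identified slope and intercept:", slope, intercept)
print("imitated output at 4.2:", guess, "true output:", system(4.2))
```

The identification route hands back something to interpret (a slope and an intercept). The imitation route hands back only an answer, which is often all a prediction task asks for.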

I introduce this book because you may want to actually get somewhere with your work.

Another way to get somewhere is to put classification in the context of the "Laws of Simplicity":http://lawsofsimplicity.com/, a framework from "John Maeda":http://www.maedastudio.com/index.php. The models I describe above reduce a complex system to a few inputs and outputs. This is the first law of simplicity. We know that the reduced model isn't accurate, but it's more useful than a complete model. It suggests trends and decisions and distinctions, where a complete model looks complex and chaotic and undetermined.

The way things should be reduced, says Maeda, is by SHE:

  • Shrink
  • Hide
  • Embody

When classifying or learning a system, we shrink its parameters. We use "Principal Component Analysis":http://en.wikipedia.org/wiki/Principalcomponentanalysis, "Reconstructability Analysis":http://www.sysc.pdx.edu/download/papers/ldlpitfabstract.htm or other "parsimony methods":http://hunch.net/~jl/projects/reductions/reductions.html to make the problem tractable. A business person or research assistant often can't use a model that takes the coordination and harmony of 92 input variables, but they can work with one that takes three. If a three-parameter model is still useful, then it should be preferred over more complex models. The methodologies mentioned above propose ways of deciding how much to shrink a model.
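Here is a rough sketch of shrinking with Principal Component Analysis, written in plain Python so nothing is hidden behind a library. The data are invented: three input columns that mostly move together, so one direction should carry nearly all of the variance. The power-iteration routine and the function names are my own choices for the illustration, not the only way to compute principal components.

```python
import math

def principal_direction(rows, iterations=200):
    """Leading principal direction of the rows, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, cov, v

def score(row, means, direction):
    """Project one centered observation onto the principal direction."""
    return sum((x - m) * u for x, m, u in zip(row, means, direction))

# Hypothetical data: three inputs driven by one underlying factor t,
# plus small deterministic wiggles so the columns are not exact copies.
rows = [[float(t), 2.0 * t + 0.1 * (-1) ** t, -1.0 * t + 0.05 * (t % 3)]
        for t in range(10)]

means, cov, direction = principal_direction(rows)
scores = [score(r, means, direction) for r in rows]

total_variance = sum(cov[j][j] for j in range(len(cov)))
kept_variance = sum(s * s for s in scores) / (len(scores) - 1)
explained = kept_variance / total_variance

print("direction:", direction)
print("share of variance kept by one component:", explained)
```

The share of variance kept is one common yardstick for deciding how far to shrink: if a single component keeps nearly everything, the other two inputs add little.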

Classification can also do a good job of hiding some of the complexity. Consider the structure of a neural network. There are understandable inputs, and desired outputs, and one or more hidden layers in between. Support vector machines create a mapping between n-dimensional data and its model by looking only at the observations near the division between classifications.
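The point about looking only at the observations near the division can be shown with a deliberately tiny case. This is not a full support vector machine: it assumes one input variable, two classes that do not overlap, and the negative class lying below the positive class. Under those assumptions the maximum-margin boundary is simply the midpoint between the two closest opposing observations. The names and numbers are hypothetical.

```python
def max_margin_threshold(negatives, positives):
    """Maximum-margin boundary for separable one-dimensional data.

    Only the largest negative and the smallest positive observation
    matter; they play the role of support vectors in this toy setting.
    """
    low = max(negatives)
    high = min(positives)
    if low >= high:
        raise ValueError("classes overlap; this sketch assumes separable data")
    return (low + high) / 2, (low, high)

negatives = [1.0, 2.0, 3.5]
positives = [6.5, 8.0, 9.0]
threshold, support = max_margin_threshold(negatives, positives)

# Adding observations far from the boundary leaves the result unchanged.
threshold_more, _ = max_margin_threshold(negatives + [-50.0, 0.0],
                                         positives + [40.0, 100.0])

print("boundary:", threshold, "decided by:", support)
print("boundary with extra far-away points:", threshold_more)
```

Everything far from the division is effectively hidden from the model, which is the sense of hiding I mean here.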

Hiding the complexity doesn't mean we don't know that it's there. It means we don't need to see the complexity to accept the Gestalt of a system. From a "wikipedia article":http://en.wikipedia.org/wiki/Gestalt_psychology:

bq. Gestalt...is a theory of mind and brain positing that the operational principle of the brain is holistic, parallel, and analog, with self-organizing tendencies, or that the whole is different from the sum of its parts. The Gestalt effect refers to the form-forming capability of our senses, particularly with respect to the visual recognition of figures and whole forms instead of just a collection of simple lines and curves.

Whether this model of psychology is accurate is beyond debate: it isn't. There is, of course, more going on. It's a model, and is subject to the same constraints any model I'd create would have. But it embodies the purpose of a system. It describes how the complex inputs get transformed into figures and forms in our minds.

Our systems should also embody the main purpose of a system: the purpose that is manifest from observations and results, not our desired purpose for it. It is this kind of embodiment that shows us that our economic systems are machines for growth and that our health care systems pit the wills of the strong against the weak.

So shrinking, hiding, and embodying form a context for reducing a model. They are a context for working on classification systems. Learning to classify systems is a walkable path, one that I'm walking right now, one that I've been gladly walking for quite a while.
