Visual pattern recognition

Today finds me in Frankfurt working at a customer site. Between jetlag and late nights, I have less time to work on blogs. But it’s Saturday and I have some time, so here goes. Today’s blog is a brief survey of interesting visualization techniques for complex data. I have a lot of experience in this area dating back to my graduate studies, in which part of my work was to reconstruct the tracks of subatomic particles through detectors. In my experiment (E802, at the Brookhaven National Laboratory) the accelerator took silicon atoms, stripped the electrons off them and accelerated them to ultra-relativistic speeds. The accelerator would then fire the ions at a thin strip of metal, and the resulting collisions would spray particles like pions, kaons, protons and deuterons (and so on, there were many types created!) into the detector apparatus. This is essentially what the experiment setup looked like:

The collision would spray particles into the detectors T1, T2, T3 and T4. These detectors consisted of wires strung from side to side of a large plastic frame. When a particle passed through, it would ionize the pressurized gas, and the liberated charge would drift to the nearest wire and get detected as a pulse in the electronics. One of the trickiest aspects of such an experiment is knowing exactly when an interesting collision occurred and making sure that you read out the data at precisely the right time. Since collisions were happening all the time, you had only a very small window during which you could read out the data. I’m leaving out many details here. Nonetheless, for my purpose today I’ll demonstrate how the pattern recognition algorithm worked. The magnet between T2 and T3 bent the charged particles: positive in one direction, negative in the other. The amount of bend revealed the momentum of the particle: the higher the momentum, the smaller the bend, so of two particles of the same type, the faster one would bend less than the slower one. This is what a simple pattern might look like:

The purpose of my algorithm was to connect the dots, combine the result with information from other detectors after T4, and determine what type of particle it was and how fast it was going. Thing is, the detectors were not perfect. Sometimes (as in T2 here) they would not detect any ionization. Sometimes they detected other particles that were flying around the room, and you had no idea how to connect those signals up from one detector to the next.
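
To make the "connect the dots" idea concrete, here is a minimal sketch in Python. It is not the E802 reconstruction code. It assumes one clean hit in each of the four chambers (so no missing T2 hit and no stray particles), a uniform magnet, and the standard small-angle relation p ≈ 0.3·B·L/θ for a singly charged particle (p in GeV/c, B in tesla, L in metres). The function names, the field strength, the magnet length and the sign convention for the charge are all assumptions made up for illustration.

```python
import math

def slope(z1, x1, z2, x2):
    """Slope dx/dz of the straight line through two hits."""
    return (x2 - x1) / (z2 - z1)

def momentum_from_bend(theta_rad, field_tesla, length_m):
    """Momentum in GeV/c of a unit-charge particle (small-angle approximation)."""
    return 0.3 * field_tesla * length_m / abs(theta_rad)

def reconstruct(hits, field_tesla=1.0, length_m=1.0):
    """hits maps chamber name -> (z, x) in metres; all four chambers must have fired.

    The field and magnet length defaults are placeholders, not E802 values.
    """
    upstream = slope(*hits["T1"], *hits["T2"])    # straight line before the magnet
    downstream = slope(*hits["T3"], *hits["T4"])  # straight line after the magnet
    theta = math.atan(downstream) - math.atan(upstream)
    charge_sign = 1 if theta > 0 else -1          # which way counts as positive is assumed
    return charge_sign, momentum_from_bend(theta, field_tesla, length_m)

# One clean track: straight before the magnet, deflected after it.
hits = {"T1": (0.0, 0.0), "T2": (1.0, 0.0), "T3": (3.0, 0.1), "T4": (4.0, 0.2)}
sign, p = reconstruct(hits)
print(sign, round(p, 2))
```

A track that bends half as much comes out with twice the momentum, which is the "faster one bends less" rule in numbers. The hard part described above, deciding which hits belong together when chambers miss hits or pick up strays, is exactly what this sketch leaves out.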

Sometimes certain wires went haywire and fired all the time. And sometimes you got enormous “splat” events that looked like this:

In such an environment it was extremely important to review the algorithm by scanning through events by hand and trying to see if you could reconstruct any more patterns that the algorithm did not find, or reject any paths that the algorithm had reconstructed erroneously. Ultimately what we used for the data analysis were the reconstructed tracks only – we removed all the extraneous data and dealt with only the good information.
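
The wires that fired all the time are the easiest of these problems to catch automatically. The sketch below shows one common way to do it, and I am not claiming it is what we actually ran: count how often each wire fires across many events and mask any wire that shows up in more than some fraction of them. The event format (a set of wire ids per event) and the 50% threshold are assumptions for illustration.

```python
from collections import Counter

def find_hot_wires(events, max_occupancy=0.5):
    """Return wire ids that fired in more than max_occupancy of all events.

    events is a list of sets of wire ids (a hypothetical format).
    """
    n_events = len(events)
    if n_events == 0:
        return set()
    counts = Counter(wire for event in events for wire in set(event))
    return {wire for wire, c in counts.items() if c / n_events > max_occupancy}

def mask_hot_wires(events, hot):
    """Drop hits on hot wires from every event."""
    return [set(event) - hot for event in events]

# Wire 7 fires in every event, so it gets flagged and removed.
events = [{1, 7}, {2, 7}, {7}, {3, 7}]
hot = find_hot_wires(events)
print(hot, mask_hot_wires(events, hot))
```

A filter like this only removes the obvious offenders. Splat events and ambiguous hit combinations still need a person looking at the event display.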

This is not necessarily a good recipe for much of the complex data we receive every day. There is far too much ambiguity for any sort of program to know exactly what it is we want to know. Search engine results are a good example of this. They try really hard, but there is ambiguity in the way we express what we are searching for, in the possible interpretations of the results, and in the available data (that is, the web pages themselves are ambiguous). Another example is one I touched on in an earlier blog, where I proposed that programs that monitor real-time data use sound feedback to grab our attention when something has changed in the incoming data.

I came across an interesting iPhone application today that helps people visualize complex information. Initially the application was intended for medical purposes, but I could see it being used in any number of situations where you are dealing with numeric data or data that can somehow be converted to numeric. Check out the photo gallery: gory, but cool. Not sure what compelled the developers to deploy the app to the iPhone instead of to the internet generally, but I suppose if you can represent complex data in a very small visual display then you can do it for a larger display. It may even be that forcing it into a very small window helps you focus on removing as much extraneous information as possible.

The final category of visual display that I’d like to talk about today is a class of programs that help you visualize what’s on your hard disk. For instance, DiskView is a simple classifier that uses the file extension (plus other information) to classify your files. What’s especially important is that it not only presents the information in a highly readable format, it also lets you navigate quickly to the information in which you are interested. Check out the Flash demo. These folks really understand how to convey complex information (in this case a demonstration of how you would use a relatively complex product) to a reader. Such a demo is quickly becoming the only acceptable way to present this sort of complex information. The extra time invested today in setting up and recording such a demo not only generates more sales, but helps train users quickly so they get the most out of the product.
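
Classifying by extension is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how DiskView itself works: it walks a directory tree and totals the bytes used per file extension. The function name and the "(none)" label for files without an extension are my own choices.

```python
import os
from collections import defaultdict

def size_by_extension(root):
    """Total file size in bytes per lower-cased extension under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return dict(totals)

if __name__ == "__main__":
    # Print the ten extensions using the most space under the current directory.
    ranked = sorted(size_by_extension(".").items(), key=lambda kv: kv[1], reverse=True)
    for ext, size in ranked[:10]:
        print(f"{ext:10s} {size:>12d}")
```

A table like this answers "what kind of thing is eating my disk", but not "where is it", which is why the navigation part of such tools matters so much.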

For sheer visual fun, however, I really like WinDirStat. I have an animated GIF from Lifehacker that shows how it works.
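
WinDirStat draws your disk as a treemap, where each file or folder gets a rectangle whose area is proportional to its size. Its real layout is more sophisticated than what follows. As a rough sketch of the underlying idea, here is the simplest treemap layout I know of, usually called slice-and-dice, for a single level of items. The function name and the (name, x, y, width, height) output format are assumptions for illustration.

```python
def slice_and_dice(items, x, y, w, h, vertical=True):
    """Split the rectangle (x, y, w, h) into strips proportional to item size.

    items is a list of (name, size) pairs. Returns (name, x, y, w, h) tuples.
    With vertical=True the cuts are vertical, so the strips sit side by side.
    """
    total = sum(size for _name, size in items)
    rects = []
    offset = 0.0
    for name, size in items:
        frac = size / total if total else 0.0
        if vertical:
            rects.append((name, x + offset, y, w * frac, h))
            offset += w * frac
        else:
            rects.append((name, x, y + offset, w, h * frac))
            offset += h * frac
    return rects

# A 100 x 50 canvas split between two folders of size 3 and 1.
for rect in slice_and_dice([("photos", 3), ("docs", 1)], 0, 0, 100, 50):
    print(rect)
```

To handle nested folders you would call the function again inside each rectangle with the folder's children, flipping `vertical` at every level so the strips alternate direction.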
