Search algorithms in a product development environment

Recently I’ve been reading an information architecture perspective on internet search algorithms [1]. My view is that internet search is not corporate search. Many of the concepts are relevant and transferable, of course, but there are differences. Let’s start with a review, then talk about the differences.

The primary goal of internet search is to help the user find things they want to find, with particular emphasis on help. Search is not find. To find information a user may combine searching, following links, filtering, sorting, and navigating hierarchies. A given search algorithm yields results that are either high precision (that is, every hit is relevant) or high recall (where all relevant pages are returned and none are missed), but usually they are somewhere in between these two extremes. High precision algorithms may miss relevant articles because, for example, the search terms are not very close to each other in the article. High recall algorithms may find too many articles because, for example, the user wanted articles on “certificates of deposit” (CDs) but also got articles on DVDs and ripping. There are many search algorithms, and the job of the information architect is to help choose the algorithm that is most appropriate for the average user of the system. Perhaps the best-known algorithm is the one used by Google, which at its heart uses page ranking to order the search results: Pages achieve higher rank when they are linked to from other pages, in particular high quality pages.
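
To make the two extremes concrete, here is a minimal sketch of how precision and recall could be computed for a single query. The page ids and the "CD rates" query are made up for illustration; the function name is my own and does not come from any particular search library.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: ids of the pages the search engine returned
    relevant: ids of the pages that are actually relevant
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: what fraction of the returned pages is relevant?
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: what fraction of the relevant pages was returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query "CD rates": pages 1-3 are about certificates of deposit.
relevant = {1, 2, 3}
narrow = precision_recall([1, 2], relevant)              # misses page 3
broad = precision_recall([1, 2, 3, 7, 8, 9], relevant)   # also returns DVD pages
```

The narrow result set has perfect precision but misses one relevant page; the broad one finds every relevant page but half of what it returns is noise.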

That’s a good enough introduction for my purpose, which is to consider how internet search is different from corporate search. Without question, a large corporate intranet must address many of the same issues that an external website must address. But when the user is searching for project- or product-related information, such as features, requirements, defects, or documentation, there are several specific complications that must be accommodated.

First of all, in an ideal information architecture many of these artifacts will be found within the context of a number of documents, since the goal is to reuse as much information as possible throughout the corporation. Effective reuse of information will result in certain documentation snippets being reused throughout an entire product line, for instance. When you do a search, you may get only one hit, but that hit could be shared by dozens of documents. Or, in other circumstances, depending on the underlying architecture, you may get dozens of hits (one for each document) that should be reported as a single hit found in dozens of documents. When confronted with this situation, the emphasis of the search engine should be on helping the user navigate to the desired document. Often the search engine can be informed of the user’s likely (or actual) context, for example when they are working on a particular task in an IDE (integrated development environment). Often it can rely on information about projects in which the user is currently a stakeholder. The system knows more about the user than a typical internet search engine can possibly know, though I must say that’s not for lack of trying. The internet search engines would love to know the detailed minutiae of your life if you would let them.
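
As a rough sketch of the "one hit, many documents" idea, the snippet below collapses raw per-document hits into one result per reused snippet and lists the containing documents, putting the ones from the user's current context first. The tuple shape, the `preferred` parameter, and the sample document names are all assumptions of mine, not the behavior of any real product.

```python
from collections import defaultdict


def collapse_shared_hits(raw_hits, preferred=frozenset()):
    """Report each reused snippet once, with the documents that contain it.

    raw_hits: iterable of (snippet_id, document_name, score) tuples, one per
              document in which the snippet appears (hypothetical shape).
    preferred: documents tied to the user's current project or task; these
               are listed first for each snippet.
    """
    grouped = defaultdict(lambda: {"documents": [], "score": 0.0})
    for snippet_id, document, score in raw_hits:
        entry = grouped[snippet_id]
        entry["documents"].append(document)
        entry["score"] = max(entry["score"], score)

    results = []
    for snippet_id, entry in grouped.items():
        # Preferred documents sort first, then alphabetical order.
        docs = sorted(entry["documents"], key=lambda d: (d not in preferred, d))
        results.append(
            {"snippet": snippet_id, "score": entry["score"], "documents": docs}
        )
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


raw_hits = [
    ("install-note", "Model A User Guide", 0.8),
    ("install-note", "Model B User Guide", 0.8),
    ("install-note", "Service Manual", 0.7),
    ("warranty", "Model A User Guide", 0.4),
]
results = collapse_shared_hits(raw_hits, preferred={"Service Manual"})
```

Four raw hits become two results, and the user who is currently working on the service manual sees that document at the top of the list for the shared snippet.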

On the other hand, there is the situation where the system knows too much about you. Sometimes you are not wearing your “I work in quality assurance” hat or your “I am a product manager” hat when you are performing a query on the system. Sometimes you are simply doing a search, and somehow the system has to magically discern this fact. I am not sure of the best way to do this, but perhaps a special control under the search box (similar to the “search this web site” checkbox) can help the user inform the search engine about their intentions. This scenario may suggest a switch from a higher precision search engine to a higher recall search engine. When you are working within a particular context, higher precision is better because your working knowledge is more current and you will use better search terms. Outside of your “comfort zone” you may need a little extra help finding what you want, and hence higher recall is preferred.
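
One very simple way to picture that switch is sketched below: inside the user's working context every search term must match (favoring precision), and outside it any term is enough (favoring recall). This is only an illustration under my own assumptions; the `in_context` flag stands in for the checkbox idea, and real engines tune precision and recall in far more sophisticated ways.

```python
def search(documents, query, in_context):
    """Return the names of matching documents, sorted alphabetically.

    documents: dict mapping a document name to its text.
    in_context: True when the user is searching within their working context
                (the hypothetical checkbox under the search box).
    """
    terms = query.lower().split()
    if not terms:
        return []
    # All terms must match in context; any single term is enough outside it.
    match = all if in_context else any
    found = []
    for name, text in documents.items():
        words = set(text.lower().split())
        if match(term in words for term in terms):
            found.append(name)
    return sorted(found)


docs = {
    "req-12": "login timeout requirement",
    "defect-7": "timeout on export",
    "doc-3": "export file formats",
}
strict = search(docs, "login timeout", in_context=True)
loose = search(docs, "login timeout", in_context=False)
```

With these sample documents the strict search returns only the requirement, while the loose search also surfaces the defect that mentions a timeout.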

The other significant difference between internet search and project/product search is that in a corporate environment the same information may be found in articles that are related to each other through ancestry. Each time you branch an article you create a new possible search engine hit. The branches are definitely not the same, and the search engine must not suggest they are. But they are indeed related, and probably the best way to present them in the search results is to list the highest ranked result with links to all of its ancestral relations.
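
Here is a small sketch of that presentation, assuming each article records the article it was branched from. The `parent` mapping, the article ids, and the scores are hypothetical, and the sketch links only those relatives that also matched the query; a real system might link every relative regardless.

```python
def root_of(article, parent):
    """Follow 'branched from' links up to the original article.

    Assumes the parent links contain no cycles.
    """
    while parent.get(article) is not None:
        article = parent[article]
    return article


def group_by_ancestry(hits, parent):
    """Show one result per family of related articles.

    hits: dict mapping article id to its relevance score.
    parent: dict mapping article id to the article it was branched from
            (None or missing for an original).
    """
    families = {}
    for article, score in hits.items():
        families.setdefault(root_of(article, parent), []).append((score, article))

    results = []
    for members in families.values():
        # Highest score first; ties are broken by article id.
        members.sort(key=lambda m: (-m[0], m[1]))
        best_score, best = members[0]
        related = [article for _, article in members[1:]]
        results.append({"article": best, "score": best_score, "related": related})
    results.sort(key=lambda r: -r["score"])
    return results


parent = {"spec-v2": "spec-v1", "spec-v2-eu": "spec-v2", "faq": None}
hits = {"spec-v1": 0.5, "spec-v2-eu": 0.9, "faq": 0.6}
results = group_by_ancestry(hits, parent)
```

The two matching members of the spec family show up as one entry headed by the best-ranked branch, with the other listed as a related article rather than as a competing hit.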

A final aspect of searching within a product or service development environment is the ability to search for information that existed in the system at some time in the past. This applies to any kind of versionable artifact, whether it be a requirement or source code. Let’s say you wrote some code four years ago which has since been superseded, and you would like to revisit that code. It would be really nice to be able to search through the archives for a particular time period (including all history) for a certain string. The search results in that case would indicate the versions of the file in which the string was found and, even better, information about why the string was removed from the file. If your defects and feature requests are linked to individual lines of code, a feature that is available in many current source code management (SCM) systems, then you can do this.
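
The sketch below shows the shape of such a history search over an in-memory list of file versions. It is not tied to any particular SCM system: the version tuples, the change notes, and the defect and feature numbers are invented for the example, and a real implementation would read them from the repository.

```python
from datetime import date


def search_history(versions, needle, start, end):
    """Find a string across the history of one file.

    versions: list of (version_id, commit_date, text, change_note) tuples,
              ordered oldest first (hypothetical shape).
    Returns (found, removed): the versions within [start, end] that contain
    the string, and (version_id, change_note) pairs for versions within the
    range in which the string disappeared.
    """
    found, removed = [], []
    previously_present = False
    for version_id, committed, text, note in versions:
        present = needle in text
        in_range = start <= committed <= end
        if present and in_range:
            found.append(version_id)
        if previously_present and not present and in_range:
            # The linked change note explains why the string went away.
            removed.append((version_id, note))
        previously_present = present
    return found, removed


versions = [
    ("r1", date(2006, 3, 1), "def parse_legacy(x): ...", "initial import"),
    ("r2", date(2006, 9, 15), "def parse_legacy(x): ...\ndef parse(x): ...",
     "feature 41: new parser"),
    ("r3", date(2008, 1, 10), "def parse(x): ...",
     "defect 97: remove legacy parser"),
]
found, removed = search_history(
    versions, "parse_legacy", date(2006, 1, 1), date(2008, 12, 31)
)
```

Here the search reports the two versions that contained the old function and the version that dropped it, along with the linked note that explains the removal.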

Searching for product information within a product or service organization is different from internet search.

  1. The same article may appear in many documents simultaneously.
  2. Articles may be related to each other through ancestry.
  3. The system may know a lot more about the user than a regular internet search engine.
  4. The system should be able to search throughout history.

I don’t know of any products that properly accommodate these special circumstances, but my guess is that they may be coming to your organization within a few years. Do you know of any? Reader feedback is always welcome!

[1] Information Architecture for the World Wide Web by P. Morville and L. Rosenfeld, (c) 2007 O’Reilly Media.
