Tuesday, June 21, 2011

The Glory of the Hunt

People care about their files. They are attached to the personal systems they put in place to find what they need, and become anxious if those systems are taken away. The file system is guarded territory, as fiercely protected as a parking stall.

It became clear to me years ago that, as a records professional, it is not enough to build a pure filing system, however elegant in design and intuitive to use. If I have not engaged the very human customers where they work and live, the system will fail.

Quoting Google's advanced search help: "Search engines use a variety of techniques to imitate how people think and to approximate their behavior. As a result, most rules have exceptions. For example, the query [ for better or for worse ] will not be interpreted by Google as an OR query, but as a phrase that matches a (very popular) comic strip. Google will show calculator results for the query [ 34 * 87 ] rather than use the 'Fill in the blanks' operator. Both cases follow the obvious intent of the query." In other words, rather than taking a query literally, search services spend a lot of effort getting inside the heads of searchers to give them the results they want (not what they say they want).

So how do we stow away and find things? It could be that our personal classification systems and style of hunting (foraging) are as established as our hunter-gatherer brains. Web and corporate Enterprise Content Management (ECM) systems have vastly extended our reach, but our searching instincts have not changed.

I suggest that classifying and searching, tagging and recalling successful hunts, are part of our instinctive heritage. In his book The Language Instinct, Steven Pinker proposes fifteen instincts that are hard-wired into all of us. Two of these instincts relate to searching and classifying:

4. Mental maps for large territories.

11. A mental Rolodex; a database of individuals, with blanks for kinship, status or rank, history of exchange of favors, and inherent skills and strengths, plus criteria that valuate each trait.

Imagining our forebears' steps, I pictured my ancestor following a familiar trail, noting edible plants along the way. She would retrace her steps later, when she knew the harvest would be ready: wild carrots in the summer, cattail tubers in the fall, and rose hips through the winter. She would have identified and classified the edible plants, and remembered the trail to get there.

I couldn't find a comparable image in my Google search, so I scanned my own. I did learn a little about the foraging habits of water pipits, larval green lacewings, and modern human urban foragers.

So in many ways, classifying and searching are instinctive. We care about the results of the hunt, and not just for the practical purpose of getting the job done. This is personal.

You know what I am talking about. Anyone in our business has hit a tough search that evades early detection. We dig in ever harder, seeking out the obscure places where the record might have been put. Then we place that record in the boss's hands. Sweet.

The hunt is valued. A swift and successful hunt gives value to the organization. A hunter who provides consistent results is an asset, not just from an empirical, practical point of view, but at an instinctive, visceral level. I suggest again that if the GARP(C) principles were ranked, Availability would be at the top. That is not to say the rest may be discarded. Together, they complete the framework for a robust records system.

When converting to a new file structure, be respectful of people's need to find their stuff. Anticipate the anxiety that accompanies change, and prepare for it. Make sure they have time to orient themselves to the new system, and reassure them that the materials they need daily will be at hand.

Monday, June 20, 2011

Advanced Search in the New Age

I've struggled with this subject all day, and it's hard to pin down why. I enjoy running a great search, and I'm good at it. I think the trouble is that many of the tips and tools I'm highlighting are as natural to me as breathing, and it's tough not to trip over my own feet when laboriously laying out every step. The problem these days, on Google at least, is not the absence of results. The problem is too many results from a simple search.

When the internet was new, my girlfriend showed off her Google search for "superman". Her son was a comic book buff, and she and her son marvelled at the speed of the return: four hits. When she demonstrated it for me a couple of months later, we found twenty sites. And goggled. My, how the internet was growing in leaps and bounds. Today, a Google search on the same term gave me 168 million hits. My mind boggles at that number. In truth, I won't look past a couple of pages; the likelihood that I would find a significant result any deeper is just too small.

Making sense of this mass of information at our fingertips has turned search into an art: find a term significant and unique enough to bring back the result I need, but not so narrow that it filters out the gold. One way to develop this fine touch is to start with the narrowest search you can think of. Try enclosing your Google phrase in quotation marks. If you get no results, broaden your search ever so slightly. After a while, the touch becomes second nature. Here are two Google searches I conducted recently that each took several tries to find what I wanted:

  • There is an archaeological dig on the shores of Galilee, profiled by The Naked Archaeologist. There's evidence of a fishing industry, and of early Christian activity. What was the name of the dig? I'd forgotten. I searched, filtering for Naked Archaeologist results only, and found the name of the fishing village. I then ran a broader search for Bethsaida. Google corrected my spelling, of course. And there it was, in satisfying detail: the results of a dig briefly profiled on the show.
  • A student mentioned BlueCielo as an Enterprise Content Management (ECM) tool that manages engineering drawings in Computer-Aided Design (CAD) format. After checking out the official site, I wondered what the community was saying. I used advanced search to limit the results to discussions. Google found me what I wanted, but the discussions were empty. What is it with this community? Do they sit around the water cooler to chat? Is there no Twitter feed, no chatter, no casual trail for me to follow? I remind myself that this is not all bad. People talking. In person.
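The narrow-first, broaden-slowly habit described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not how Google works (Google ranks results rather than stepping through tiers): the corpus is a hypothetical three-document stand-in for an index, and the three tiers (exact phrase, all words, any word) are simply one reasonable ladder from narrow to broad.

```python
# Hypothetical mini-corpus standing in for a real search index.
CORPUS = {
    "doc1": "excavation at bethsaida fishing village on the sea of galilee",
    "doc2": "fishing industry evidence near galilee",
    "doc3": "comic book history of superman",
}

def tokens(text):
    """Lowercase and split on whitespace; real engines tokenize far more carefully."""
    return text.lower().split()

def phrase_match(query_toks, doc_toks):
    """True if the query words appear consecutively, in order, in the document."""
    n = len(query_toks)
    return any(doc_toks[i:i + n] == query_toks for i in range(len(doc_toks) - n + 1))

def narrow_to_broad(query, corpus):
    """Try the narrowest tier first and broaden only when a tier comes back empty.

    Returns (tier name, sorted list of matching document ids).
    """
    q = tokens(query)
    if not q:
        return "none", []
    tiers = [
        ("exact phrase", lambda t: phrase_match(q, t)),   # like a "quoted" search
        ("all words", lambda t: all(w in t for w in q)),  # every word, any order
        ("any word", lambda t: any(w in t for w in q)),   # at least one word
    ]
    for name, matches in tiers:
        hits = sorted(doc_id for doc_id, text in corpus.items() if matches(tokens(text)))
        if hits:
            return name, hits
    return "none", []

print(narrow_to_broad("fishing village", CORPUS))  # ('exact phrase', ['doc1'])
print(narrow_to_broad("galilee fishing", CORPUS))  # ('all words', ['doc1', 'doc2'])
```

The second query shows the point of the ladder: the exact phrase finds nothing, so the search relaxes one step and stops there, rather than jumping straight to the flood of "any word" results.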

Before I go any further, I'll briefly discuss the differences between searching a corporate electronic file system and searching the World Wide Web. Most of the time when conducting an internal search, you are looking for something you know exists. Either you put it there yourself, or it is a manual, report, or document that you have referred to in the past. You resort to search because you've forgotten where in the webonious structure you last laid it. If it is a Windows Explorer search, a panting dog may wag his way through to help you.

Failure to find the document you are looking for will likely lead to a few hours of frustration because, unlike a Google search where any good result may do, you must find the specific document you remember. The average information worker spends 8.8 hours a week searching for information (Ref. The Importance of Enterprise Search, slide 13; IDC, Hidden Costs of Information Work, 2005). It is therefore critical that the electronic information management system you select is capable of masterful (and swift) searches.

Similarly, in e-discovery (may you never be so blessed), search results must be consistent and complete. Correspondence has an annoying habit of referencing past correspondence. Does the search find both? Missing key documents will call into question the comprehensiveness of your records, and the reputation of your corporation.

Now that my little rabbit trail is done, I can go back to discussing advanced search techniques. Most of these help you narrow your search. As I've mentioned before, a dearth of answers is not our problem. If you don't believe me, try running a search for "report" (5 billion hits on Google). Advanced techniques include wildcard searches (named after the Joker in our decks), Boolean searches (AND, OR and NOT), and a few more I found during my Google search today: fuzzy, proximity and range. Though Google calls these features by other names, you can practice wildcard and Boolean searches in its advanced search. Google has a great help page for advanced searchers.

A wildcard search replaces a character or a run of characters with a symbol ("*" on most of the systems I looked at today). In a search tool that supports wildcards within words, I would have found Bethsaida sooner by typing Bet*da; I'd mistakenly looked for it as Bethseda.
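To make the wildcard idea concrete, here is a small Python sketch using the standard library's `fnmatch` module, which follows shell-style wildcard rules. It is only an illustration of the technique, not how Google or any particular ECM product implements it; the list of index terms and the `wildcard_search` function are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms; "Bethseda" is the misspelling from my search.
TERMS = ["Bethsaida", "Bethseda", "Bethlehem", "Capernaum", "Magdala"]

def wildcard_search(pattern, terms):
    """Return the terms matching a wildcard pattern, ignoring case.

    Uses fnmatch conventions: '*' matches any run of characters (including
    none), '?' matches exactly one character, and [...] is a character set.
    """
    p = pattern.lower()
    return [t for t in terms if fnmatchcase(t.lower(), p)]

print(wildcard_search("Bet*da", TERMS))    # ['Bethsaida', 'Bethseda']
print(wildcard_search("Beth?eda", TERMS))  # ['Bethseda']
```

Note how `Bet*da` catches both the correct spelling and my misspelling, while the single-character `?` is much stricter.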

The Boolean link I've referenced is a great tutorial that graphically illustrates the different sorts of results you get. Google supports these same Boolean operations, so check out the results. AND and NOT give you narrower results; OR gives you broader ones. If you care to check out my internet presence, try the Google search "jgnat -java" (jgnat NOT java).
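Under the hood, Boolean searching maps neatly onto set operations: AND is intersection, OR is union, and NOT is difference. The Python sketch below builds a tiny inverted index (each word mapped to the set of documents containing it) and runs the three operators over it. The documents and function names are hypothetical, invented for illustration, and real engines add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical documents, invented for illustration.
DOCS = {
    1: "jgnat records management blog",
    2: "jgnat and java programming",
    3: "java tutorial for beginners",
    4: "records retention schedule",
}

def build_index(docs):
    """Inverted index: map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

INDEX = build_index(DOCS)

def term(word):
    """Document ids containing word (empty set if none); .get avoids adding new keys."""
    return INDEX.get(word.lower(), set())

def search_and(*words):
    """AND: documents containing every word (set intersection). Needs at least one word."""
    return set.intersection(*(term(w) for w in words))

def search_or(*words):
    """OR: documents containing at least one word (set union). Needs at least one word."""
    return set.union(*(term(w) for w in words))

def search_not(include, exclude):
    """NOT: documents containing include but not exclude (set difference)."""
    return term(include) - term(exclude)

print(search_and("jgnat", "java"))  # {2}
print(search_or("jgnat", "java"))   # {1, 2, 3}
print(search_not("jgnat", "java"))  # {1}, the idea behind "jgnat -java"
```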

I've begun reading up on fuzzy, proximity and range searches while reading about the features of Apache Lucene, an open-source search engine library. I won't pretend to explain them fully here. Range searches can be very helpful for narrowing results to a period of time (e.g. business plans for the first three quarters of 2009), but are tricky to get right. Fuzzy search brings back words that are close to, but not spelled exactly like, what you've asked for. This might also have helped me find Bethsaida.
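Fuzzy and range searches are easier to see than to describe. Below is a minimal Python sketch, not Lucene's implementation: fuzzy matching here uses plain Levenshtein edit distance (the fewest single-letter insertions, deletions, or substitutions between two words) with a ceiling of two edits, which is my own assumption for the sketch, and the range search keeps records whose date falls between two dates, inclusive. The vocabulary, records, and function names are all hypothetical.

```python
from datetime import date

def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_search(query, vocabulary, max_edits=2):
    """Fuzzy: vocabulary words within max_edits of the query, ignoring case."""
    q = query.lower()
    return [w for w in vocabulary if edit_distance(q, w.lower()) <= max_edits]

def range_search(records, start, end):
    """Range: titles of records dated from start through end, inclusive."""
    return [title for title, filed in records if start <= filed <= end]

# Hypothetical vocabulary and records, invented for illustration.
VOCAB = ["Bethsaida", "Bethlehem", "Bethany", "Capernaum"]
RECORDS = [
    ("Business Plan Q4 2008", date(2008, 12, 15)),
    ("Business Plan Q1 2009", date(2009, 3, 31)),
    ("Business Plan Q3 2009", date(2009, 9, 30)),
    ("Business Plan Q4 2009", date(2009, 10, 1)),
]

print(fuzzy_search("Bethseda", VOCAB))  # ['Bethsaida'] (two edits away)
print(range_search(RECORDS, date(2009, 1, 1), date(2009, 9, 30)))
# ['Business Plan Q1 2009', 'Business Plan Q3 2009']
```

The boundary dates show why range searches are tricky: whether a plan filed on 30 September counts depends entirely on whether the range is inclusive or exclusive. Lucene's query syntax, for what it's worth, marks inclusive ranges with square brackets and exclusive ranges with curly braces.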

As information professionals, it is well worth our while to master these techniques. Information workers need all the help they can get to find their information swiftly and consistently. If we become the experts, we will demonstrate our worth to the organization many times over.

Friday, June 17, 2011

The Glory of the Hunt - Searchable Records

Trending topics in the records world these days are e-discovery, security, and collaboration. I propose, however, that the most valuable skill record-keepers have to offer in the 21st century is the power of search.

Search falls under "Availability" in the Generally Accepted Recordkeeping Principles (GARP(C)): "An organization shall maintain records in a manner that ensures timely, efficient, and accurate retrieval of information."

Why do I claim this skill is valued above the others? Because any records system passes or fails on its ability to deliver. If it can't promise to give your information back when you need it, why would you use it?

Over the next few days, I'll highlight the power of search: some of the advanced tools we should be familiar with as records professionals, why the hunt is its own reward, and some technical marvels on the horizon.