Semantic search that works?

I worked with IBM folks back in the ’90s who did some of the early work on latent semantic analysis, which was later implemented in the more famous latent semantic indexing. The original concepts were used to identify the meanings of words: words with multiple meanings can be “disambiguated” (as the techies say) by looking at the surrounding words. But most people are thinking of something far more complicated than that when they talk about semantic search. Recently, I came across a company that makes it simple again, and I think they are onto something.


Herb Roitblat, a Ph.D. and co-founder of OrcaTec, found a post I wrote a while back on semantic search. He complained to me that, although my post laid out some of the problems of semantic search, the biggest problem is that the folks behind the semantic Web have appropriated the term “semantic search.” Herb said it concisely:

In my opinion, semantic web tools are not much good for search because they force everything into fixed categories and then try to force people to conform to those categories. Just tonight I had the experience of trying to pick a category for my blog into a category and had a hard time figuring out which category to put it in.

Herb went on to outline how OrcaTec’s new search engine, Truevert, which focuses on green search (environment and sustainability), delivers true semantic search based on some of the same latent semantic analysis ideas that I saw at IBM ten years ago. As usual in this field, though, the implementation is worth a lot more than anyone’s idea. Once again, let’s see how Herb explains the idea behind Truevert:

A search for clothing returns pages about eco-friendly clothing, not the location of the nearest Gap store. A search for meat returns pages that talk about organic meat, wholesome baby food, and the environmental impact of meat, just what you would expect from a green point of view.

Truevert performs such magic without having built its own search engine; it actually consists of a semantic search layer built atop Yahoo! Search. Herb uses an example to illustrate how Truevert works:

If the word “lawyer” appears in a document, then it is likely that “attorney,” “judge,” “case,” and “court,” will also occur. Conversely, if one or more of these related words occurs, then the document is likely to be about a “lawyer” even if that word is not present. Similarly, the word “court” in the company of other words like “ball,” “player,” and “basket” is more likely to be about basketball than about litigation. The model computes the patterns of word usage from a population of documents and uses those patterns to predict what the documents are about. Similarly, when people understand a sentence, each word in the sentence helps to disambiguate the other words in the sentence. For example, consider the sentence, “the tree surgeon examined the young man’s palm.” By the time you get to the word “palm,” you have a pretty good idea what that word means.
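To make Herb's description concrete, here is a minimal sketch of the co-occurrence idea in Python. It is not Truevert's code: the toy corpus, the stopword list, and the function names (`build_cooccurrence`, `association`, `topic_score`) are all invented for illustration, and the overlap measure is a simple stand-in for a real latent semantic model.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only.
CORPUS = [
    "the lawyer argued the case before the judge in court",
    "the attorney filed the case and the judge set a court date",
    "the lawyer and the attorney met the judge",
    "the player dribbled the ball down the court toward the basket",
    "the player shot the ball into the basket",
]
# Hypothetical stopword list, just enough for this corpus.
STOPWORDS = {"the", "a", "and", "in", "into", "before", "down", "toward", "set"}


def tokens(text):
    """Return the set of non-stopword words in a piece of text."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def build_cooccurrence(docs):
    """Count how many documents contain each word and each pair of words."""
    word_counts, pair_counts = Counter(), Counter()
    for doc in docs:
        words = tokens(doc)
        word_counts.update(words)
        for pair in combinations(sorted(words), 2):
            pair_counts[pair] += 1
    return word_counts, pair_counts


def association(a, b, word_counts, pair_counts):
    """Share of documents containing either word that contain both."""
    if a == b:
        return 1.0
    both = pair_counts[tuple(sorted((a, b)))]
    either = word_counts[a] + word_counts[b] - both
    return both / either if either else 0.0


def topic_score(topic, text, word_counts, pair_counts):
    """Average association between a topic word and the known words in text."""
    known = [w for w in tokens(text) if word_counts[w]]
    if not known:
        return 0.0
    return sum(association(topic, w, word_counts, pair_counts) for w in known) / len(known)


word_counts, pair_counts = build_cooccurrence(CORPUS)
legal_text = "the attorney spoke to the judge in court"      # never says "lawyer"
sports_text = "the player ran across the court with the ball"
for text in (legal_text, sports_text):
    scores = {t: topic_score(t, text, word_counts, pair_counts) for t in ("lawyer", "basket")}
    print(text, "->", max(scores, key=scores.get))
```

In this toy setup, the first sentence scores higher for “lawyer” even though the word never appears, and “court” next to “player” and “ball” tips the second sentence toward “basket.” A production system would learn these patterns from a far larger population of documents.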

This approach is what distinguishes Truevert from the mainstream search engines, which can’t guess which sense of a word a person intends in their search. Herb explained how that can be helpful in a green vertical search:

For example, a search for “CFL” returns documents about compact fluorescent light bulbs, not the Canadian Football League. A search for refrigerators returns pages about solar and high-efficiency refrigerators. A search for coffee returns pages about fair-trade and organic coffee. It knows what words mean, not just in a dictionary sense, but also in the sense of what’s important to this community. People don’t have to work so hard to find the information that suits their interests.
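One plausible way to get this effect from a layer that sits on top of a general-purpose engine is to re-rank the engine's results against a vocabulary profile for the community. The Python sketch below is only an assumption about how such a layer could work, not a description of Truevert's internals; the `GREEN_PROFILE` weights, the sample result snippets, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical term weights describing what a "green" community writes about.
GREEN_PROFILE = Counter({
    "fluorescent": 3, "energy": 3, "solar": 3,
    "bulb": 2, "organic": 2, "efficient": 2, "sustainable": 2,
})


def vectorize(text):
    """Turn a result snippet into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(results, profile):
    """Order results from a general engine by closeness to the profile."""
    return sorted(results, key=lambda r: cosine(vectorize(r), profile), reverse=True)


# Stand-ins for snippets a general search engine might return for "CFL".
results = [
    "cfl football league schedule and scores",
    "cfl compact fluorescent bulb saves energy",
]
for snippet in rerank(results, GREEN_PROFILE):
    print(snippet)
```

Here the light bulb snippet moves to the top because it shares vocabulary with the profile, while the football snippet shares none. A real layer would presumably derive the profile from the community's own documents rather than from a hand-written list.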

Time will tell if Truevert is successful, and whether the technology behind Truevert will spawn a string of other subject-oriented vertical search engines. But the idea that word combinations reveal the subject of documents is a good one, and Truevert appears to be a strong implementation of that idea. I don’t expect vertical search engines to suddenly undermine Google’s dominance of the search business, but I do believe that a smart approach like Truevert’s might finally make some vertical search engines profitable.

Mike Moran

Mike Moran is with Converseon, an AI-powered consumer intelligence technology and consulting firm. He is also a senior strategist for SoloSegment, a marketing automation software solutions and services firm. Mike also served as a member of the Board of Directors of SEMPO. Mike spent 30 years at IBM, rising to Distinguished Engineer, an executive-level technical position. Mike held various roles in his IBM career, including eight years at IBM’s customer-facing website, ibm.com, most recently as the Manager of ibm.com Web Experience, where he led 65 information architects, web designers, webmasters, programmers, and technical architects around the world. Mike's newest book is Outside-In Marketing with world-renowned author James Mathewson. He is co-author of the best-selling Search Engine Marketing, Inc. (with fellow search marketing expert Bill Hunt), now in its Third Edition. Mike is also the author of the acclaimed internet marketing book, Do It Wrong Quickly: How the Web Changes the Old Marketing Rules, named one of the best business books of 2007 by the Miami Herald. Mike founded and writes for Biznology® and writes regularly for other blogs. In addition to Mike’s broad technical background, he holds an Advanced Certificate in Market Management Practice from the Royal UK Charter Institute of Marketing and is a Visiting Lecturer at the University of Virginia’s Darden School of Business. He also teaches at Rutgers Business School. He was a Senior Fellow at the Society for New Communications Research and is now a Senior Fellow of The Conference Board. A Certified Speaking Professional, Mike regularly makes speaking appearances. Mike’s previous appearances include keynote speaking appearances worldwide
