Semantic searching with Topic Maps
Lars Marius Garshol, Bouvet
There’s no link to either a bio or an abstract for this one so here is the abstract in full:
“Garbage in, garbage out applies not just to information processing in general, but also especially to search. The quality of the data being searched determines the quality of the search result, and much of the effort required to make search work is spent cleaning up data to improve the quality. This presentation compares existing Topic Maps solutions for search with other existing search approaches to show similarities and differences. It then moves on to show what more can be done to exploit the semantics and structure of Topic Maps to improve the search experience beyond what has been done so far.”
An attempt to change how search works…from the full-text model to something more semantic. Instead of looking for the words within the page fields, the topic map search looks for topics on the website. The talk is not about web-wide searches but rather site-wide searches. This isn’t a Google-killer but rather a way of handling a medium-sized website.
By tagging the documents with category/person/place/event, for example, he is able to do a pretty specific search within a wide range of terms…but all the stop words are dropped (of, and, by, etc.) and, specifically, the word ‘not’ is dropped. The important thing seems to be the hierarchy of things (Montreal is a child of Canada, and Hotel Europa is a child of Montreal). So a search for Canada would return things tagged with ‘Hotel Europa’. He doesn’t explain how, but I don’t think that is the point…or maybe my knowledge of Topic Maps is extremely limited (more likely the case).
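Since the talk didn’t go into the mechanics, here is a minimal Python sketch of how a hierarchy-aware tag search could behave. It is only a guess at the idea, not the speaker’s implementation or any Topic Maps engine’s API. The `PARENT` table, the `DOCS` entries, the file names, and the function names are all made up for illustration; only the Canada/Montreal/Hotel Europa hierarchy and the dropped stop words come from the talk.

```python
# Hypothetical child -> parent hierarchy (only these three topics come from the talk).
PARENT = {"Montreal": "Canada", "Hotel Europa": "Montreal"}

# Hypothetical documents and the topics they are tagged with.
DOCS = {
    "review.html": {"Hotel Europa"},
    "city-guide.html": {"Montreal"},
    "oslo.html": {"Oslo"},
}

# Stop words dropped from the query; the talk singled out 'not'.
STOP_WORDS = {"of", "and", "by", "not"}


def strip_stop_words(query):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


def descendants(topic):
    """Return the topic plus every topic below it in the hierarchy."""
    found = {topic}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(topic):
    """Return documents tagged with the topic or any of its descendants."""
    wanted = descendants(topic)
    return sorted(doc for doc, tags in DOCS.items() if tags & wanted)


print(search("Canada"))
print(strip_stop_words("hotels of Canada and not Montreal"))
```

In this sketch, searching for Canada picks up the document tagged only with ‘Hotel Europa’ because the query topic is expanded to its descendants before matching. Dropping ‘not’ means a query cannot exclude anything, which matches what was described.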
No relevance ranking but all hits are definitely relevant to the query as given
Very closely tied to the topic map…if the user picks the wrong terms then the approach doesn’t work. Worse still, users don’t use the terms…people normally type 1-2 words, and “how do you get someone to use a search when they have to read a 5 page document before…”?
I think that is the main thrust of my problem here…it sounds cool to me, but I’m the type of person who has 10-15 tags for each photo. I worked at a company that produces a thesaurus of terms that librarians use for search. I am used to the concept of a set of structured tags for a document…I doubt my friends who don’t use delicious, Flickr, or digg are going to be excited about this interface. But maybe the general public isn’t who this is for…but rather an internal search for a company and/or a select set of documents in a defined group. I can see how that would be useful for a growing set of documents that need to be readily accessed.