Some IBM technotes definitely go into the 'Interesting - may prove useful in the future' category.  This one on IBM Connections search ranks highly for geekiness...

Question
When you search all IBM Connections components, the results are returned based on relevance. How does Connections determine which documents are most relevant?

Answer

The simplified explanation is that the more times a search term appears in a document, wiki, blog article, etc., the more relevant that content becomes.
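As a rough illustration of that simplified explanation only, here is a minimal Python sketch that ranks a few pieces of content by how often a search term appears in them. The content items, the `term_frequency` helper, and the tokenising rule are all invented for this example; this is not how Connections or Lucene is actually implemented.

```python
import re
from collections import Counter


def term_frequency(text, term):
    """Count how many times `term` occurs as a whole word in `text`, ignoring case."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)[term.lower()]


# Hypothetical content items, made up purely for illustration.
docs = {
    "wiki": "Search tips: search by tag, or search by person.",
    "blog": "A short note about search.",
    "file": "Meeting minutes.",
}

# Rank content names by raw term count, highest first.
ranked = sorted(
    docs,
    key=lambda name: term_frequency(docs[name], "search"),
    reverse=True,
)
print(ranked)
```

The wiki entry mentions the term most often, so it sorts first; the file never mentions it, so it sorts last.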

The search is built on top of Apache Lucene, as are most of IBM's search components. Details of the Apache Lucene project can be found here: http://lucene.apache.org/java/docs/index.html

Here is the formula that is used in Connections 3.0.x: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
The linked document describing the Similarity formula might help next time you are waiting for a flight or struggling to sleep:
The score of query q for document d correlates to the cosine-distance or dot-product between document and query vectors in a Vector Space Model (VSM) of Information Retrieval. A document whose vector is closer to the query vector in that model is scored higher. The score is computed as follows:

...
;-)
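For anyone still awake, the vector space idea in that quote can be sketched in a few lines of Python. This is a plain cosine similarity over raw word counts, with made-up example strings and hypothetical helper names (`to_vector`, `cosine_similarity`); it is an assumption-laden toy, not the Lucene Similarity formula, which applies further weighting that this sketch leaves out.

```python
import math
from collections import Counter


def to_vector(text):
    """Turn text into a sparse term-count vector (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


# Hypothetical query and documents, invented for illustration.
query = to_vector("connections search")
doc_a = to_vector("connections search ranking")
doc_b = to_vector("lotus notes mail search")

# The document whose vector points in a direction closer to the query scores higher.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))
```

Here the first document shares both query terms and the second shares only one, so the first ends up closer to the query vector.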


By: Stuart McIntyre | 4 Comments | On: 12 September 2011 08:18:38 AM | Tags:  connections  search  lucene  ranking 


