Some IBM technotes definitely go into the 'Interesting - may prove useful in the future' category. This one on IBM Connections search ranks highly for geekiness...
The linked document describing the Similarity formula might help next time you are waiting for a flight or struggling to sleep:
Question
When you search all IBM Connections components, the results are returned based on relevance. How does Connections determine which documents are most relevant?
Answer
The simplified explanation is that the more times a search term appears in a document, wiki, blog article, etc., the more relevant that content becomes.
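As a rough illustration of that simplified idea, here is a minimal Python sketch that ranks documents purely by how often a single term occurs in them. It assumes case-insensitive, whole-word matching; the function names `term_count` and `rank_by_count` and the sample documents are hypothetical. Lucene's actual scoring (the Similarity formula this technote links to) goes further: it dampens raw counts, favours rarer terms and penalises longer fields.

```python
import re
from collections import Counter

def term_count(text, term):
    """Count case-insensitive whole-word occurrences of term in text."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)[term.lower()]

def rank_by_count(docs, term):
    """Return (doc_id, count) pairs, most occurrences first."""
    scored = [(doc_id, term_count(text, term)) for doc_id, text in docs.items()]
    # sorted() is stable, so documents with equal counts keep their input order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample content
docs = {
    "wiki": "Search tips: search by tag, then search by author.",
    "blog": "A post about search.",
    "forum": "Nothing relevant here.",
}
print(rank_by_count(docs, "search"))  # [('wiki', 3), ('blog', 1), ('forum', 0)]
```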
The search is built on top of Apache Lucene, as are most of IBM's search components. Details of the Apache Lucene project can be found here: http://lucene.apache.org/java/docs/index.html
Here is the formula that is used in Connections 3.0.x: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
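The score of query q for document d correlates to the cosine-distance or dot-product between document and query vectors in a Vector Space Model (VSM) of Information Retrieval. A document whose vector is closer to the query vector in that model is scored higher. The score is computed as follows:
$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)$$
;-)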
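For the curious, here is a minimal Python sketch of Lucene 2.4's practical scoring function, score(q,d) = coord(q,d) · queryNorm(q) · Σ over t in q of tf(t in d) · idf(t)² · t.getBoost() · norm(t,d), using the DefaultSimilarity definitions from the Lucene 2.4 documentation: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1/√numTerms, queryNorm = 1/√(sum of squared weights), coord = matched terms / query terms. It is a sketch under stated assumptions, not Connections' implementation: whether Connections uses DefaultSimilarity unchanged is assumed here, all boosts are taken as 1.0, documents are single fields tokenized by whitespace, query terms are distinct, and the function names and sample corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers mirroring Lucene 2.4 DefaultSimilarity (all boosts = 1.0).

def tf(freq):
    """tf(t in d): square root of the raw term frequency."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """idf(t): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    """lengthNorm: 1 / sqrt(number of terms in the field); assumes a non-empty field."""
    return 1.0 / math.sqrt(num_terms)

def score(query_terms, doc_tokens, corpus):
    """Score one tokenized document against distinct query terms.

    corpus is the full list of tokenized documents; it supplies
    numDocs and docFreq for idf.
    """
    num_docs = len(corpus)
    counts = Counter(doc_tokens)
    idfs = {t: idf(sum(1 for d in corpus if t in d), num_docs) for t in query_terms}

    # queryNorm(q) = 1 / sqrt(sum of squared term weights); each weight is idf(t) * boost (1.0)
    sum_sq = sum(idfs[t] ** 2 for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)

    # coord(q,d): fraction of the query terms that occur in the document
    matched = [t for t in query_terms if counts[t] > 0]
    coord = len(matched) / len(query_terms)

    # norm(t,d): with unit document and field boosts this is just lengthNorm
    norm = length_norm(len(doc_tokens))

    total = sum(tf(counts[t]) * idfs[t] ** 2 * norm for t in matched)
    return coord * query_norm * total

# Hypothetical sample corpus
corpus = [
    "lucene search ranking".split(),
    "search search search tips".split(),
    "connections wiki page".split(),
    "lucene scoring".split(),
]
query = ["lucene", "search"]
ranking = sorted(range(len(corpus)), key=lambda i: score(query, corpus[i], corpus), reverse=True)
print(ranking)  # [0, 1, 3, 2]
```

Document 0 wins because it matches both terms (coord = 1), while document 1's three occurrences of "search" let it edge out document 3 despite its longer length. Real Lucene also byte-encodes norm(t,d) at index time, so its actual scores differ slightly from this unquantized sketch.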
By: Stuart McIntyre | On: 12 September 2011 08:18:38 AM | Tags: connections search lucene ranking