Luhn’s Point of View: Median-Based Term Weighting Schemes

Authors

  • İlker KOCABAŞ
  • Bahar KARAOĞLAN
  • Bekir Taner DINÇER

Keywords:

Information retrieval, indexing, term importance

Abstract

In this study, we replace the TF component of the TFxIDF term weighting method with a parameter derived from Luhn’s claim about term
importance. Luhn claims that words with mid-range frequencies are the most important, and that the importance of a word falls as its frequency
moves above or below that range. We take the median frequency of the words in a document as the base and assess the importance of a word by the
difference between its frequency and the median frequency. The weighting functions are varied by two normalization approaches, one using the median
itself and the other using the standard deviation of medians, and are tested on the TREC-6 through TREC-8 ad hoc tracks. In the experiments, the
weightings normalized by the median itself achieve better retrieval performance than basic TFxIDF and BM25 with respect to the MAP and R-P measures.
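
The abstract does not give the exact weighting formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical decay function 1 / (1 + |f - median| / median) in place of TF, a plain logarithmic IDF, and illustrative names (`median_based_weights`, `doc_freq`, `n_docs`) that do not come from the paper.

```python
import math
from collections import Counter
from statistics import median


def median_based_weights(doc_tokens, doc_freq, n_docs):
    """Score each term of one document by closeness to the median term frequency.

    doc_tokens: list of tokens in the document
    doc_freq:   dict mapping term -> number of documents containing it
    n_docs:     total number of documents in the collection
    """
    tf = Counter(doc_tokens)
    if not tf:
        return {}

    # Median of the within-document term frequencies (always >= 1 here,
    # because every counted term occurs at least once).
    med = median(tf.values())

    weights = {}
    for term, freq in tf.items():
        # Distance from the median, normalized by the median itself.
        distance = abs(freq - med) / med
        # Assumed decay: 1.0 at the median, smaller as the distance grows.
        importance = 1.0 / (1.0 + distance)
        # Plain logarithmic IDF (an assumption; the paper may use a variant).
        idf = math.log(n_docs / doc_freq[term])
        weights[term] = importance * idf
    return weights


# Toy usage: "b" sits at the median frequency, "a" is above it, "c" below it.
doc = ["a"] * 5 + ["b"] * 3 + ["c"] * 1
weights = median_based_weights(doc, {"a": 10, "b": 10, "c": 10}, n_docs=100)
```

With equal document frequencies, the term at the median frequency receives the highest weight, and terms equally far above and below the median are penalized equally. This symmetry is a property of the assumed decay function, not necessarily of the schemes evaluated in the paper.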

Published

2019-06-01

How to Cite

KOCABAŞ, İ., KARAOĞLAN, B., & DINÇER, B. T. (2019). Luhn’s Point of View: Median-Based Term Weighting Schemes. International Journal of Natural and Engineering Sciences, 5(3), 31–35. Retrieved from https://ijnes.org/index.php/ijnes/article/view/63

Issue

Section

Articles