A Farsi/Arabic Word Spotting Approach for Printed Document Images
Keywords:
Farsi document image, word spotting, word searching, word image retrieval
Abstract
Word spotting is the task of finding and locating instances of a query word in a dataset of document images. There are many papers about
English (Latin) word spotting and some about Arabic, but none about Farsi; this paper is the first to address it. The proposed approach uses
characteristics of Farsi script and font-size-independent features, namely the number of subwords and their aspect ratios and the numbers of holes,
dots, ascenders, and descenders, together with a multi-level matching process, to find instances of a query word in document images. The approach
has been applied to a dataset of 400 Farsi document images in 4 font faces with font sizes from 8 to 22, and it obtained a precision of 88.7% at a
recall of 78.5%. The approach is font size independent because the features it uses are font size independent. It is also applicable to Arabic and
Urdu scripts.
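
The idea of multi-level matching on font-size-independent features can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the matching levels, so the order of the checks, the `WordFeatures` layout, the `matches` and `spot` functions, and the `ratio_tol` tolerance are all assumptions made for this example. The sketch also assumes that feature extraction from the word images has already been done.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordFeatures:
    """Hypothetical font-size-independent descriptor of one word image."""
    subword_aspect_ratios: Tuple[float, ...]  # width / height per subword, in reading order
    holes: int
    dots: int
    ascenders: int
    descenders: int


def matches(query: WordFeatures, candidate: WordFeatures, ratio_tol: float = 0.2) -> bool:
    """Coarse-to-fine comparison; cheap checks reject most candidates early."""
    # Level 1 (assumed): the number of subwords must agree.
    if len(query.subword_aspect_ratios) != len(candidate.subword_aspect_ratios):
        return False
    # Level 2 (assumed): the counts of holes, dots, ascenders and descenders must agree.
    query_counts = (query.holes, query.dots, query.ascenders, query.descenders)
    cand_counts = (candidate.holes, candidate.dots, candidate.ascenders, candidate.descenders)
    if query_counts != cand_counts:
        return False
    # Level 3 (assumed): each subword aspect ratio must agree within a relative tolerance.
    return all(
        abs(q - c) <= ratio_tol * q
        for q, c in zip(query.subword_aspect_ratios, candidate.subword_aspect_ratios)
    )


def spot(query: WordFeatures, candidates: List[WordFeatures]) -> List[int]:
    """Return the indices of the candidate words that match the query."""
    return [i for i, cand in enumerate(candidates) if matches(query, cand)]


# Illustrative values only; they are not taken from the paper's dataset.
query = WordFeatures((1.5, 0.8), holes=1, dots=2, ascenders=1, descenders=0)
candidates = [
    WordFeatures((1.6, 0.8), holes=1, dots=2, ascenders=1, descenders=0),  # near match
    WordFeatures((1.5, 0.8), holes=1, dots=3, ascenders=1, descenders=0),  # dot count differs
    WordFeatures((1.5, 1.2), holes=1, dots=2, ascenders=1, descenders=0),  # aspect ratio differs
    WordFeatures((1.5,), holes=1, dots=2, ascenders=1, descenders=0),      # subword count differs
]
hits = spot(query, candidates)
```

Counts and aspect ratios do not change when a word is rendered at a different font size, which is why a descriptor built only from them can be compared across sizes without rescaling.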