Optical English font recognition in document images using eigenfaces

Authors

  • Hasan S. M. Al-Khaffaf, Software Engineering and Embedded Systems (SEES) Research Group, Department of Computer Science, University of Duhok.
  • Nadia A. Musa, Department of Physics, University of Duhok.

DOI:

https://doi.org/10.15649/2346075X.466

Keywords:

Font Recognition; EigenFaces; EigenFonts; PCA.

Abstract

Introduction: In this paper, a system for recognizing fonts has been designed and implemented. The system is based on the Eigenfaces method. Because font recognition works in conjunction with other methods such as Optical Character Recognition (OCR), we used the Decapod and OCRopus software as a framework to present the method. Materials and Methods: In our experiments, text typeset in three English fonts (Comic Sans MS, DejaVu Sans Condensed, Times New Roman) has been used. Results and Discussion: The system was tested thoroughly using synthetic and degraded data. The experimental results show that the Eigenfaces algorithm performs very well at recognizing fonts in clean synthetic data as well as in degraded data. The correct recognition rate of Eigenfaces on synthetic data is 99% using the Euclidean distance. The overall accuracy of Eigenfaces is 97% on 6144 degraded samples, again using the Euclidean distance as the matching criterion. Conclusions: The experimental results indicate that the Eigenfaces method is suitable for font recognition in degraded documents. The remaining 3% of incorrect classifications could be mitigated by relying on intra-word font information.
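The abstract names the Eigenfaces method with Euclidean-distance matching. As a rough illustration only, the following is a minimal pure-Python sketch of that general technique: principal component analysis (PCA) on flattened glyph images using the Turk and Pentland small-matrix trick, followed by nearest-neighbour matching by Euclidean distance in the reduced "eigenfont" space. It is not the authors' implementation, which was built within the Decapod/OCRopus framework. The function names (`train_eigenfonts`, `project`, `classify`), the 2x4 toy "glyphs", the uniform-noise degradation model, and the choice of k=2 components are all hypothetical. Eigenvectors are found by power iteration with deflation so that the example needs only the standard library.

```python
import math
import random

def mean_vector(samples):
    """Element-wise mean of equally sized vectors (the 'average glyph')."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def top_eigenvectors(matrix, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = normalize([rng.uniform(0.1, 1.0) for _ in range(n)])
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
        if lam < 1e-12:
            break
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def train_eigenfonts(samples, k):
    """Return the mean glyph and k orthonormal 'eigenfonts' (Turk-Pentland trick)."""
    mu = mean_vector(samples)
    centered = [[x - m for x, m in zip(s, mu)] for s in samples]
    # Small M x M matrix L = A^T A instead of the D x D covariance A A^T.
    gram = [[dot(a, b) for b in centered] for a in centered]
    basis = []
    for _, v in top_eigenvectors(gram, k):
        # u = A v is an eigenvector of A A^T with the same eigenvalue.
        u = [sum(vi * row[d] for vi, row in zip(v, centered)) for d in range(len(mu))]
        basis.append(normalize(u))
    return mu, basis

def project(x, mu, basis):
    """Weights of x in eigenfont space: w_k = u_k . (x - mu)."""
    phi = [xi - m for xi, m in zip(x, mu)]
    return [dot(u, phi) for u in basis]

def classify(x, mu, basis, gallery):
    """Label of the nearest gallery entry by Euclidean distance in eigenfont space."""
    w = project(x, mu, basis)
    return min(gallery, key=lambda item: math.dist(w, item[0]))[1]

# Hypothetical toy data: flattened 2x4 binary "glyphs", one prototype per font.
PROTOTYPES = {
    "font_a": [1, 1, 1, 1, 0, 0, 0, 0],
    "font_b": [0, 0, 0, 0, 1, 1, 1, 1],
    "font_c": [1, 0, 1, 0, 1, 0, 1, 0],
}

def noisy(proto, rng, amount=0.05):
    """Simulate mild degradation by adding small uniform noise to each pixel."""
    return [p + rng.uniform(-amount, amount) for p in proto]

if __name__ == "__main__":
    rng = random.Random(42)
    train = [(noisy(p, rng), label) for label, p in PROTOTYPES.items() for _ in range(3)]
    mu, basis = train_eigenfonts([s for s, _ in train], k=2)
    gallery = [(project(s, mu, basis), label) for s, label in train]
    for label, proto in PROTOTYPES.items():
        print(label, "->", classify(noisy(proto, rng), mu, basis, gallery))
```

The small M x M Gram matrix is used because, with M training glyphs of D pixels each and M much smaller than D, it is far cheaper to decompose than the D x D covariance matrix. In a real system a library eigen-solver or an SVD of the centred data matrix would likely replace the power-iteration helper.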

Revista Innovaciencia Facultad de Ciencias Exactas, Físicas y Naturales

Published

2018-12-28

How to Cite

Al-Khaffaf, H. S. M., & Musa, N. A. (2018). Optical English font recognition in document images using eigenfaces. Innovaciencia, 6(1), 1–11. https://doi.org/10.15649/2346075X.466

Section

Original research and innovation article
