Optical English Font Recognition in Document Images Using Eigenfaces

  • Hasan S.M Al-Khaffat Software Engineering and Embedded Systems (SEES) Research Group, Department of Computer Science, University of Duhok https://orcid.org/0000-0002-2133-0602
  • Nadia A. Musa Department of Physics, University of Duhok

Abstract

Introduction: In this paper, a font recognition system is designed and implemented. The system is based on the Eigenfaces method. Because font recognition works in conjunction with other methods such as Optical Character Recognition (OCR), we used the Decapod and OCRopus software as a framework to present the method. Materials and Methods: In our experiments, text typeset in three English fonts (Comic Sans MS, DejaVu Sans Condensed, Times New Roman) was used. Results and Discussion: The system was tested thoroughly using synthetic and degraded data. The experimental results show that the Eigenfaces algorithm recognizes fonts accurately in clean synthetic data as well as in degraded data. With Euclidean distance as the performance criterion, the correct recognition rate of Eigenfaces is 99% on synthetic data, and its overall accuracy is 97% on 6144 degraded samples. Conclusions: The experimental results indicate that the Eigenfaces method is suitable for font recognition in degraded documents. The remaining 3% of misclassifications can be mitigated by relying on intra-word font information.
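The abstract does not give implementation details, so the following is only a minimal sketch of the general Eigenfaces technique it names: flattened training images are projected onto the leading principal components, and a query is assigned the label of the nearest training sample under Euclidean distance. The function names, the number of components, and the random vectors standing in for font images are all assumptions made for illustration; none of this is the authors' Decapod/OCRopus code.

```python
import numpy as np

def train_eigenfaces(images, n_components):
    """Fit an eigenface basis to flattened images of shape (n_samples, n_pixels)."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal axes of the centered data (the "eigenfaces").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T  # training samples in eigenface space
    return mean, basis, weights

def classify(image, mean, basis, weights, labels):
    """Return the label of the training sample nearest in Euclidean distance."""
    w = (np.asarray(image, dtype=float) - mean) @ basis.T
    distances = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(distances))]

# Hypothetical stand-in data: one random 64-pixel "prototype" per font,
# plus small noise. Real input would be normalized glyph or word images.
fonts = ["Comic Sans MS", "DejaVu Sans Condensed", "Times New Roman"]
rng = np.random.default_rng(0)
prototypes = rng.random((len(fonts), 64))
train_x = np.repeat(prototypes, 10, axis=0)
train_x = train_x + 0.05 * rng.standard_normal(train_x.shape)
train_y = [name for name in fonts for _ in range(10)]

mean, basis, weights = train_eigenfaces(train_x, n_components=5)
queries = prototypes + 0.05 * rng.standard_normal(prototypes.shape)
predicted = [classify(q, mean, basis, weights, train_y) for q in queries]
```

In this sketch the nearest-neighbour rule in the reduced space plays the role of the Euclidean distance criterion mentioned in the abstract; how the original system segments text, normalizes images, or chooses the number of components is not stated in the source.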

Published
2018-12-28
How to cite
Al-Khaffat, H. S., & Musa, N. A. (2018). Optical English Font Recognition in Document Images Using Eigenfaces. INNOVACIENCIA, 6(1), 1-11. https://doi.org/10.15649/2346075X.466
Section
Scientific and technological research article