Optical recognition of English fonts in document images using Eigenfaces
DOI: https://doi.org/10.15649/2346075X.466
Keywords: Font Recognition; EigenFaces; EigenFonts; PCA.
Abstract
Introduction: This paper presents the design and implementation of a font recognition system based on the Eigenfaces method. Because font recognition works in conjunction with other methods such as Optical Character Recognition (OCR), we used the Decapod and OCRopus software as a framework to present the method. Materials and Methods: Our experiments used text typeset in three English fonts (Comic Sans MS, DejaVu Sans Condensed, and Times New Roman). Results and Discussion: The system was tested on both synthetic and degraded data. The experimental results show that the Eigenfaces algorithm recognizes fonts well in clean synthetic data as well as in degraded data. With Euclidean distance as the performance criterion, the correct recognition rate on synthetic data is 99%, and the overall accuracy on 6144 degraded samples is 97%. Conclusions: The experimental results indicate that the Eigenfaces method is suitable for font recognition in degraded documents. The remaining 3% of incorrect classifications can be mitigated by relying on intra-word font information.
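To make the approach in the abstract concrete, the following is a minimal sketch of Eigenfaces-style font classification: glyph images are projected onto a PCA basis and a query is assigned the label of its nearest training sample by Euclidean distance. It is not the authors' implementation and does not use Decapod or OCRopus. The function names (`fit_eigenfonts`, `classify`), the 64-pixel random "glyphs", the noise level, and the choice of 5 components are all assumptions made for illustration.

```python
import numpy as np


def fit_eigenfonts(train, n_components):
    """Learn a PCA basis from flattened glyph images.

    train: array of shape (n_samples, n_pixels).
    Returns the mean image, the basis (n_components, n_pixels),
    and the training weights (n_samples, n_components).
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are the principal axes of the centered data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T
    return mean, basis, weights


def classify(sample, mean, basis, weights, labels):
    """Return the label of the nearest training sample in PCA space."""
    w = (sample - mean) @ basis.T
    distances = np.linalg.norm(weights - w, axis=1)  # Euclidean distance
    return labels[int(np.argmin(distances))]


# Hypothetical stand-in data: one random 64-pixel prototype per font,
# with 10 noisy copies each as training samples.
rng = np.random.default_rng(0)
font_names = ["Comic Sans MS", "DejaVu Sans Condensed", "Times New Roman"]
prototypes = rng.random((3, 64))
train = np.vstack([p + 0.05 * rng.standard_normal((10, 64)) for p in prototypes])
labels = np.repeat(font_names, 10)

mean, basis, weights = fit_eigenfonts(train, n_components=5)

# Classify a new noisy sample drawn from the third prototype.
query = prototypes[2] + 0.05 * rng.standard_normal(64)
predicted = classify(query, mean, basis, weights, labels)
print(predicted)
```

In a real pipeline the rows of `train` would be size-normalized character or word images extracted by the OCR front end, and the number of components would be chosen from the data rather than fixed in advance.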
License
All articles published in this journal are protected by copyright. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits sharing the work with acknowledgment of authorship and for non-commercial purposes only.
Readers may copy and distribute the material in this issue of the journal for non-commercial purposes in any medium, provided that the original work is cited and credit is given to the authors and the journal.
Any commercial use of the material in this journal is strictly prohibited without the written permission of the copyright holder.