Theoretical foundation for the creation of an academic program in data engineering and data science: a bibliometric application

Frederick Andrés Mendoza-Lozano; Jose Wilmar Quintero-Peña; Oscar Leonardo Acevedo-Pabón; Jose Félix García-Rodríguez

doi:10.15649/2346030X.2586

Authors

Frederick Andrés Mendoza-Lozano, Institución Universitaria Politécnico Grancolombiano, Bogotá, Colombia. https://orcid.org/0000-0001-5087-4476
Jose Wilmar Quintero-Peña, Institución Universitaria Politécnico Grancolombiano, Bogotá, Colombia. https://orcid.org/0000-0002-6172-0453
Oscar Leonardo Acevedo-Pabón, Institución Universitaria Politécnico Grancolombiano, Bogotá, Colombia. https://orcid.org/0000-0002-1270-4166
Jose Félix García-Rodríguez, Universidad Veracruzana, Xalapa-Enríquez, México. https://orcid.org/0000-0002-7319-1472

Keywords:

Data science, Bibliometrics, Data mining, Big data, Classical statistics, Machine learning, Curriculum

Abstract

The aim of this work was to define a theoretical approach to data science, covering its object of study and its methods, as a preliminary step toward the curricular design of an academic program. The text begins with a literature review on the evolution of the concept of data and on the epistemological foundations of statistics and of algorithm-based data analysis. It continues with a bibliometric analysis of the most relevant scientific production, following a thematic characterization approach based on keywords taken from works indexed in Scopus. Most of the relevant keywords and themes were found to refer to methods for modeling data with algorithms and to the management of technology for administering large databases. The analysis also characterized the productivity of research on data derived from textual, multimedia, and web information, and revealed themes related to business applications aimed at knowledge management and business intelligence. The concept of data, as an object of study, is broadened by the scope of algorithmic data analysis; this method is combined with the classical statistical approach, which provides formal models that are easier to interpret. It was concluded that the field of application of the new data science is quite broad, particularly when it is used in interdisciplinary contexts. These findings justify the curricular design of an academic program focused on this subject.
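The keyword-based thematic characterization mentioned in the abstract can be illustrated with a minimal sketch. It assumes each indexed work is reduced to a list of author keywords; the `records` data, the `keyword_cooccurrence` and `equivalence_index` helpers, and the choice of the equivalence index as a normalization are illustrative assumptions, not the authors' actual pipeline or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each inner list holds the keywords of one indexed work.
records = [
    ["Data science", "Machine learning", "Big data"],
    ["Big data", "Data mining", "Machine learning"],
    ["Data science", "Statistics"],
]

def keyword_cooccurrence(records):
    """Count how often each keyword and each keyword pair appears across works."""
    freq = Counter()
    pairs = Counter()
    for keywords in records:
        # Normalize case and drop duplicates within a single work.
        unique = sorted({k.strip().lower() for k in keywords})
        freq.update(unique)
        # Sorted input keeps each pair in a single canonical order.
        pairs.update(combinations(unique, 2))
    return freq, pairs

def equivalence_index(freq, pairs, a, b):
    """Equivalence index c_ab^2 / (c_a * c_b), a normalization often used in
    co-word analysis (assumed here for illustration)."""
    key = tuple(sorted((a, b)))
    return pairs[key] ** 2 / (freq[a] * freq[b])

freq, pairs = keyword_cooccurrence(records)
strength = equivalence_index(freq, pairs, "big data", "machine learning")
```

Pairs with a high index can then be grouped into themes, which is the general idea behind thematic mapping from keywords.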
