Experimental environment of distributed data processing integrating devops in the software delivery cycle
DOI:
https://doi.org/10.15649/2346030X.3011
Keywords:
data processing, big data, DevOps, Hadoop, Spark, data, cloud, AWS, cluster, infrastructure as code
Abstract
The generation of large volumes of data from diverse sources has allowed organizations to extract value and knowledge from that data. Companies therefore need specialists who can process this data and turn it into useful information. An important question is how students can put theoretical knowledge into practice in big data environments, using the cloud technologies and tools demanded by the market, while avoiding extensive configuration.
In this paper we create an experimental big data environment. We describe the concept of big data, its reference architectures, and its components; design and implement an architecture for a distributed data processing cluster; and integrate DevOps into a continuous software delivery flow through the automated deployment of big data processing infrastructure as code in the cloud.
License
The journal offers open access under a Creative Commons Attribution License.
This work is licensed under Creative Commons Attribution (CC BY 4.0).