Browsing by Subject "Database management"

  • Hartzell, Kai (2023)
    The concept of big data has gained significance as data sets keep growing. The primary challenge lies in managing this ever-expanding data effectively and drawing valuable conclusions from it, which calls for more efficient data processing frameworks. This thesis first introduces and defines big data, then explores a range of widely used open-source frameworks, some long established and others developed to further improve efficiency and particular aspects of processing. Three popular frameworks are introduced at the beginning of the thesis: MapReduce, Apache Hadoop, and Spark. The thesis then introduces popular data storage concepts and SQL engines, highlighting the growing adoption of SQL as an effective means of interaction in big data analytics. The reasons behind this adoption are explored, and the performance and characteristics of these systems are compared. The later sections focus on big data cloud services, with particular emphasis on AWS (Amazon Web Services); alternative cloud service providers are discussed briefly. The thesis culminates in a practical demonstration of data analysis on a selected dataset using three selected AWS cloud services. This involves creating scripts to gather and process data, establishing ETL pipelines, configuring databases, conducting data analysis, and documenting the experiments. The goal is to assess the advantages and disadvantages of these services and to provide a comprehensive understanding of their functionality.