Browsing by study line "Ohjelmistot"
-
(2024)Message-oriented middleware (MOM) serves as the intermediary component between the nodes of a distributed system, facilitating their communication and data exchange. By decoupling the interconnected nodes of a system, MOM technologies enable scalable and fault-tolerant messaging, supporting real-time data streams, event-driven architectures and microservices communication. Given the increasing reliance on distributed computing and data-intensive applications, understanding the performance and operational characteristics of MOM technologies is paramount. This master's thesis investigates the comparative performance and operational aspects of two prominent MOM solutions, Apache Kafka and Apache Pulsar, through a systematic literature review (SLR). The key characteristics under inspection are throughput, latency, resource utilization, fault tolerance, security and operational complexity. This study offers a comprehensive analysis to aid informed decision-making in real-world deployment scenarios and augments the existing body of literature. The results of this SLR show that consensus on throughput and latency superiority between Kafka and Pulsar remains elusive. Pulsar demonstrates advantages in resource utilization and security, whereas Kafka stands out for its maturity and operational simplicity.
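As an illustration of how such throughput and latency comparisons are typically instrumented, the following Python sketch measures per-message publish latency against a Kafka broker. It is a minimal illustration only, assuming the kafka-python client, a broker at localhost:9092, and a hypothetical topic name; it is not the benchmark setup of the reviewed studies.

    # Minimal producer-side latency probe (assumes kafka-python and a local broker).
    import time
    import statistics
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    latencies = []

    for i in range(1000):
        start = time.perf_counter()
        # send() is asynchronous; get() blocks until the broker acknowledges,
        # so the elapsed time approximates the publish round-trip latency.
        producer.send("benchmark-topic", f"msg-{i}".encode()).get(timeout=10)
        latencies.append((time.perf_counter() - start) * 1000)

    producer.flush()
    print(f"mean {statistics.mean(latencies):.2f} ms, "
          f"p95 {sorted(latencies)[int(len(latencies) * 0.95)]:.2f} ms")

An equivalent probe against a Pulsar broker, plus resource monitoring on the broker hosts, would yield the comparable throughput, latency, and utilization figures the reviewed studies report.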
-
(2024)Monolithic and microservice architectures represent two different approaches to building and organizing software systems. Monolithic architecture offers advantages such as simplicity of deployment, smaller resource requirements, and lower latency. Microservice architecture, on the other hand, provides benefits in scalability, reliability, and availability. However, the advantages of each architecture may depend on various factors, especially regarding application performance and resource consumption. This thesis provides insights into the differences in application performance and resource consumption between the two architectures by conducting a systematic literature review of the existing research and by benchmarking, with various load tests, two functionally identical applications built with the two architectures. The load tests showed that the applications delivered satisfactory outcomes under both architectures, but the microservice system outperformed the monolith by a wide margin in nearly all test cases in terms of throughput, efficiency, stability, scalability, and resource effectiveness. According to the reviewed literature, monolithic design is generally more efficient and cost-effective for simple applications with small user loads, whereas microservice architecture is more advantageous for large and complex applications targeting high traffic and deployment in cloud environments. Nevertheless, the overall results indicate that both architectures have strengths and drawbacks, and both are used in many successful applications. The differences between the two architectures in application performance and resource effectiveness depend on various factors, including application scale and complexity, traffic load, resource availability, and deployment environment.
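For illustration, a load test of the kind described above can be sketched in a few lines of Python: fire a fixed number of concurrent requests at an endpoint and report throughput and mean latency. The URL, worker count, and request count below are hypothetical placeholders, not the thesis's actual test configuration.

    # Minimal concurrent load-test sketch using only the standard library.
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://localhost:8080/orders"  # assumed endpoint

    def timed_get(_):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=50) as pool:
        t0 = time.perf_counter()
        latencies = list(pool.map(timed_get, range(1000)))
        wall = time.perf_counter() - t0

    print(f"throughput {len(latencies) / wall:.1f} req/s, "
          f"mean latency {sum(latencies) / len(latencies) * 1000:.1f} ms")

Running the same script against the monolithic and the microservice deployment under increasing worker counts gives directly comparable throughput and latency curves.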
-
(2022)In recent years, the concept of the Metaverse has become a popular buzzword in the media and in different communities. In 2021, the company behind Facebook rebranded itself as Meta Platforms, Inc. to match its new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as media, researchers, consumers, and companies, have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little to no research has been done on gray literature, i.e. non-scientific sources, to gain insight into the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people see opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, Inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most consumers are interested only in a smaller subset of features than what is being marketed. The research gives insight into how different sources see the Metaverse and can therefore serve as a starting point for more comprehensive gray literature studies on the Metaverse. While innovating on the underlying technology is important, studying people’s viewpoints is necessary for academia to understand the phenomenon and for the industry to produce a compelling product.
-
(2023)Context: The Bank of Finland, as the national monetary and central bank of Finland, possesses an extensive repository of data that fulfills both the statistical needs of international organizations and national requirements. Data scientists within the bank are increasingly interested in investing in machine learning (ML) capabilities to develop predictive models. MLOps offers a set of practices that ensure the reliable and efficient maintenance and deployment of ML models. Objective: In this thesis, we focus on how to implement an ML pipeline within an existing environment. The case study is explorative in nature, with the primary objective of gaining deeper insight into MLOps tools and their practical implementation within the organization. Method: We apply the design science research methodology to divide design and development into six tasks: problem identification, objective definition, design and development, demonstration, evaluation, and communication. Results: We select the MLOps tools based on the user requirements and the existing environment, then design and develop a simplified end-to-end ML pipeline utilizing the chosen tools. Lastly, we evaluate the alignment between the selected tools and the initial user requirements.
-
(2024)Agile software development and DevOps are both well-studied methodologies in the field of computer science. Agile software development is an iterative development approach that focuses on collaboration, customer feedback, and fast deliveries. DevOps, on the other hand, highlights the cooperation between developers and IT operations personnel, in addition to describing how to continuously deploy working software with the use of tools and automation. Even though these two methodologies share similarities, and DevOps as a concept can even be seen as a descendant of agile software development, the relationship between the two is not yet as well explored as the effects of the individual practices. In this thesis, a systematic literature review is conducted to examine the relationship between agile software development and DevOps. The aim was to find the benefits and drawbacks of their combined implementation in software development, the key similarities and differences between the two, and how the adoption of one methodology influences the implementation of the other. Results showed that agile software development and DevOps share a complex yet symbiotic relationship. Each methodology complements the other, and in unison they address a wider variety of aspects of the software development lifecycle. The combination shows a wide array of promising benefits, such as improvements in productivity, delivery speed, and collaboration. However, it also presents challenges, for example the required culture shift and a lack of knowledge, that organizations need to acknowledge and be wary of.
-
(2024)AI is becoming more and more common in everyday life, and thus setting guidelines to help create ethical AI is critical. To set such guidelines, it is necessary to understand what is thought of as ethical AI. To tackle this issue, this study attempts to answer the following questions: which ethical values are considered the most important for artificial intelligence, are there differences between personal ethical values and ethical values for artificial intelligence, and does culture influence personal ethical values or the ethical values chosen for artificial intelligence? The study uses data from the open online course Ethics of AI, where students study different ethical aspects of AI. Two exercises from this course were chosen for study. In the first exercise, students had to pick the five most important ethical values out of 21. In the second exercise, students had to rate 18 ethical values according to how important they are for AI. As the course is arranged in Finnish and English, it was possible to compare results between the languages and to create a third dataset from the English data collected after the Finnish version of the course was launched in late 2021. The English dataset contained 2650 students, the Finnish dataset 488 students, and the English dataset from 2022 onwards 1159 students. The data was studied both with the language datasets grouped together and individually. First, the grouped data was analyzed to find the most popular personal ethical values and the most popular ethical values for AI. The same analyses were then performed on the individual datasets to see whether their results differed. An exploratory factor analysis (EFA) was performed to find factors among the ethical values for artificial intelligence, followed by a K-Means cluster analysis to classify the different combinations of ethical values students chose for AI. The results indicate that when considering AI, personal ethical values shift toward safety-, fairness-, and society-oriented values. While some differences were found in the ethical values students prioritized between the course iterations, similar ethical values were the driving force in each dataset.
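The two analysis steps can be sketched with scikit-learn as below. Note that scikit-learn's FactorAnalysis is an unrotated approximation of EFA, and the rating matrix, factor count, and cluster count here are illustrative stand-ins for the course data, not the study's actual settings.

    # Sketch: factor analysis over value ratings, then clustering of students.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.integers(1, 6, size=(500, 18)).astype(float)  # 1-5 ratings of 18 values

    fa = FactorAnalysis(n_components=4, random_state=0)
    factors = fa.fit_transform(X)          # latent factor scores per student
    loadings = fa.components_              # how each value loads on each factor

    km = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = km.fit_predict(factors)       # cluster students by factor profile
    print(loadings.shape, np.bincount(labels))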
-
(2023)Machine Learning Operations (MLOps), derived from DevOps, aims to unify the development, deployment, and maintenance of machine learning (ML) models. Continuous training (CT) automatically retrains ML models, and continuous deployment (CD) automatically deploys the retrained models to production. Therefore, they are essential for maintaining ML model performance in dynamic production environments. The existing proprietary solutions suffer from drawbacks such as a lack of transparency and potential vendor lock-in. Additionally, current MLOps pipelines built using open-source tools still lack flexible CT and CD for ML models. This study proposes a cloud-agnostic and open-source MLOps pipeline that enables users to retrain and redeploy their ML models flexibly. We applied the Design Science methodology, consisting of identifying the problem, defining the solution objectives, and implementing, demonstrating, and evaluating the solution. The resulting solution is an MLOps pipeline called the CTCD-e MLOps pipeline. We formed a conceptual model of the needed functionalities of our MLOps pipeline and implemented the pipeline using only open-source tools. The CTCD-e MLOps pipeline runs atop Kubernetes. It can autonomously adapt ML models to dynamic production data by automatically retraining them when their performance degrades. It can also automatically A/B test the performance of the retrained models in production and fully deploy them only when they outperform their predecessors. Our demonstration and evaluation of the CTCD-e MLOps pipeline show that it is cloud-agnostic and can also be installed in on-premises environments. Additionally, the pipeline enables its users to flexibly configure model retraining and redeployment, as well as production A/B tests of the retrained models, based on various requirements.
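The control logic that such a pipeline automates can be condensed into a sketch like the following. The thresholds, metric, and the train/ab_test/deploy hooks are hypothetical placeholders; the actual pipeline realizes these stages with open-source tools on Kubernetes.

    # Sketch of the CT/CD decision logic: retrain on degradation, promote on A/B win.
    def monitor_and_retrain(live_error, threshold, train, ab_test, deploy):
        if live_error <= threshold:
            return "model healthy, no action"
        challenger = train()                      # continuous training (CT)
        champion_err, challenger_err = ab_test(challenger)
        if challenger_err < champion_err:
            deploy(challenger)                    # continuous deployment (CD)
            return "challenger promoted"
        return "challenger rejected, champion kept"

    # Stub demonstration with made-up error figures:
    print(monitor_and_retrain(
        live_error=0.31, threshold=0.25,
        train=lambda: "model-v2",
        ab_test=lambda m: (0.31, 0.22),           # (champion, challenger) errors
        deploy=lambda m: None,
    ))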
-
(2022)The rapid progress of communication technologies, combined with the growing competition for talent and knowledge, has made it necessary to reassess the potential of distributed development, which has significantly changed the landscape of the IT industry by introducing a variety of cooperation models and notably changing the work environment of software teams. Along with this, enterprises pay more attention to improving team performance, employing emerging management tools to build efficient software teams and trying to get the most out of understanding the factors that significantly impact a team’s overall performance. The objective of the research is to systematize the factors characterizing high-performing software teams; indicate the benefits of global software development (GSD) models that positively influence software teams’ development performance; and study how companies’ strategies can benefit from distributed development approaches in building high-performing software teams. The thesis is designed as a systematic literature review followed by qualitative research in the form of semi-structured interviews to validate the findings regarding the classification of GSD models’ benefits and their influence on the development of high-performing software teams. At the literature review stage, the research (1) introduces a model of team performance factors reflecting the aspects that impact the effectiveness of development teams; (2) suggests a classification of GSD models based on organizational, legal, and temporal characteristics; and (3) describes the benefits of GSD models that influence the performance of software development teams. In the empirical part of the study, we refine the classification of GSD models’ benefits based on a qualitative analysis of semi-structured interviews with practitioners from the IT industry, form a comparison table of GSD benefits depending on the model in question, and introduce recommendations for company and team management regarding the application of GSD in building high-performing software teams. To achieve their strategic goals, IT corporations can enrich their range of tools for managing high-performing teams by considering the peculiarities of different GSD models. Company and team management should evaluate the advantages of distributed operational models and use the potential and benefits of the available configurations to increase team performance and build high-performing software teams.
-
(2024)The MOOC Center of the University of Helsinki maintains a learning management system, primarily used in the online courses offered by the Department of Computer Science. The learning management system is being used in a growing number of courses, leading to a need for additional exercise types. To satisfy this need, we plan to use additional teams of developers to create these exercise types. However, we would like to minimize any negative effects the new exercise types may have on the overall system, specifically regarding stability and security. In this work, we propose a plugin system for creating new exercise types and implement it in a production system used by real students. The system's plugins are deployed as separate services and use sandboxed IFrames for their user interfaces. Communication with the plugins occurs through HTTP requests and message passing. The designed plugin system fulfilled its aims and worked in its production deployment. Notably, it proved challenging for plugins to disrupt the host system. The system thus serves as an example that it is possible to create a plugin system where the plugins are isolated from the host system.
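The isolation idea can be sketched from the host's side as follows: each plugin UI is embedded in an IFrame whose sandbox attribute withholds same-origin access, and the host accepts only a small, explicit message vocabulary. The service URL and message types below are illustrative, not the system's actual schema.

    # Host-side sketch of the isolation idea (illustrative schema, not the real one).
    import json

    def render_plugin_iframe(plugin_service_url: str) -> str:
        # allow-scripts but NOT allow-same-origin: the plugin's UI runs, but is
        # treated as a foreign origin and cannot reach the host page's DOM.
        return (f'<iframe src="{plugin_service_url}/iframe" '
                f'sandbox="allow-scripts"></iframe>')

    def validate_plugin_message(raw: str) -> dict:
        # The host accepts only an explicit, known message vocabulary.
        msg = json.loads(raw)
        if msg.get("type") not in {"ready", "height-changed", "answer-changed"}:
            raise ValueError(f"unknown plugin message: {msg.get('type')!r}")
        return msg

    print(render_plugin_iframe("https://example-exercise.plugin.internal"))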
-
(2023)Artificial Intelligence (AI) has revolutionized various domains of software development, promising solutions that can adapt and learn. However, the rise of AI systems has also been accompanied by ethical concerns, primarily related to the unintentional biases these systems can inherit during the development process. This thesis presents a thematic literature review aiming to identify and examine the existing methodologies and strategies for preventing bias in iterative AI software development. Methods employed for this review include a formal search strategy using defined inclusion and exclusion criteria, and a systematic process for article sourcing, quality assessment, and data collection. 29 articles were analyzed, resulting in the identification of eight major themes concerning AI bias mitigation within iterative software development, ranging from bias in data and algorithmic processes to fairness and equity in algorithmic design. Findings indicate that while various approaches for bias mitigation exist, gaps remain. These include the need for adapting strategies to agile or iterative frameworks, resolving the trade-off between effectiveness and fairness, understanding the complexities of bias for tailored solutions, and assessing the real-world applicability of these techniques. This synthesis of key trends and insights highlights these specific areas requiring further research.
-
(2022)RESTful web APIs have gained significant interest over the past decade, especially among large businesses and organizations. An important part of being able to use these public web APIs is knowing how to access, consume, and integrate them into applications. Since developers are the primary audience doing the integration, it is important to support them throughout their API adoption journey. For this, many of today's companies that are heavily invested in web APIs provide an API developer portal as part of their API management program. However, very little accessible and comprehensive information on how to build and structure API developer portals exists yet. This thesis presents an exploratory multiple-case study of the publicly available API developer portals of three different commercial businesses. The objective of the case study was to identify the developer (end-user) oriented features and capabilities present on the selected developer portals, in order to understand the kinds of information and capabilities API developer portals could provide for developers in general. The exploration was split into three key focus areas: developer onboarding, web API documentation, and developer support and engagement. Based on these, three research questions were formulated. The data consisted of field notes describing observations about the portals. These notes were grouped by location and action, and analyzed to identify a key feature or capability as well as any smaller, compounding features and capabilities. The results describe the identified features and capabilities present on the studied API developer portals, and some differences between the portals are noted. The key contribution of this thesis is the results themselves, which can be used as a checklist when building a new API developer portal. The main limitation of this study is that its data collection and analysis processes were subjective and the findings have not been properly validated; such improvements remain for future work.
-
(2023)This thesis presents a case study of the performance impact of refactoring a Node.js component from a monolithic environment into an independent service. The study measured the response time of the blocking part of the JavaScript code in the component; the non-blocking part of the code and the network overhead added by the refactoring were excluded from the performance review. A literature review did not reveal any prior research on the performance impact of refactoring a Node.js component from a monolith into microservices. Several studies were found that examined the response time and throughput of REST APIs built with Node.js, with comparisons to other programming languages, and one study related to refactoring an application from a monolith into microservices, but none of them were directly related to the studied case. The response time of the component improved by 46.5% when it was refactored from the monolith into a microservice. It is possible that as a Node.js monolith application grows, it starts to affect the throughput of the event loop, degrading performance-critical components. For the case component, it was beneficial to refactor it into an independent service in order to gain the 92.6 ms improvement in mean response time.
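A before/after comparison of this kind can be sketched as follows: measure the mean response time of the same operation in both deployments and report the relative change. The two URLs are hypothetical stand-ins for the monolith and the extracted service.

    # Sketch: compare mean response times of the same endpoint in two deployments.
    import time
    import urllib.request

    def mean_response_ms(url: str, n: int = 100) -> float:
        total = 0.0
        for _ in range(n):
            start = time.perf_counter()
            with urllib.request.urlopen(url) as resp:
                resp.read()
            total += time.perf_counter() - start
        return total / n * 1000

    before = mean_response_ms("http://monolith.local/render")
    after = mean_response_ms("http://component.local/render")
    print(f"improvement: {(before - after) / before * 100:.1f}% "
          f"({before - after:.1f} ms)")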
-
(2023)Background: BIM (Building Information Modelling) has helped the construction industry achieve better workflows and collaboration. To further integrate technology into the construction industry, research and applications are actively integrating cloud technologies into the traditional BIM design workflow. Such integration can be referred to as Cloud BIM, which is considered the second generation of BIM development. Cloud BIM involves many aspects, including technical implementation, workflow improvement, and collaboration between different roles. Aims: In this thesis, we examine the current state of Cloud BIM, identifying the benefits and challenges as well as possible technical solutions to the challenges. Methods: We conducted a literature review and analyzed eleven selected papers to gather the necessary data for this thesis. We then carried out a case study integrating two applications to understand the practical challenges of an actual implementation of a cloud-based BIM solution. Results: Cloud BIM mainly benefits collaboration and information exchange. However, many challenges remain, both technical and non-technical, that require more work. Our integration explored a more deeply cloud-based solution for a specific process in BIM projects; the main challenge we faced was inconsistent data standards. Conclusions: The results show that the industry is on its way to integrating the cloud into BIM, but more work needs to be done to overcome the challenges.
-
(2023)This Master’s Thesis describes an original user study that took place at the University of Helsinki. The study compares and evaluates the usability of three methods used in meeting rooms to share a private device’s screen on a big public screen in order to give a slideshow presentation: HDMI, VIA, and Ubicast. There were 18 participants. The study was conducted in a controlled environment replicating a typical meeting room setup. The experiment consisted of screen mirroring tasks and an interview. In a screen mirroring task, the participants were asked to share their screen using each of the three technologies. They were provided with the necessary equipment and, if needed, user guides. The participants were then given training on how to use the technologies, and they performed the tasks again. During each task, the time taken to complete the screen mirroring session was recorded, and any errors or difficulties encountered were noted. After completing the screen mirroring tasks, participants were interviewed to gather qualitative data on their experiences and preferences. They were asked about ease of use, efficiency, and any difficulties they faced while using each technology. This information was used to gain insight into user preferences and potential areas for improvement in the respective technologies. To analyze the data, System Usability Scale (SUS) scores and the time taken to complete the screen mirroring tasks were calculated for each technology. Statistical analyses were conducted to determine any significant differences in SUS scores and completion times across the three technologies. Additionally, the interview data was analyzed using thematic analysis to identify common themes and patterns in the users’ experiences. HDMI emerged on top, with Ubicast not far behind.
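For reference, SUS scoring follows a fixed formula: each of the ten 1-5 responses contributes (score - 1) for odd-numbered items and (5 - score) for even-numbered items, and the sum is scaled by 2.5 to a 0-100 range. A minimal Python version, with a made-up example participant:

    # Standard System Usability Scale scoring (Brooke's original formula).
    def sus_score(responses: list[int]) -> float:
        assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
        total = sum((r - 1) if i % 2 == 0 else (5 - r)   # i=0 is item 1 (odd)
                    for i, r in enumerate(responses))
        return total * 2.5

    print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # example participant: 85.0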
-
(2023)The concept of big data has gained immense significance due to the constant growth of data sets. The primary challenge lies in effectively managing this ever-expanding data and extracting valuable conclusions from it, which makes more efficient data processing frameworks essential. This thesis first introduces and defines the concept of big data comprehensively. It then explores a range of widely used open-source frameworks, some of which have existed for a considerable time, while others were developed to further improve efficiency or particular aspects. The thesis begins with three popular frameworks: MapReduce, Apache Hadoop, and Spark. It then introduces popular data storage concepts and SQL engines, highlighting the growing adoption of SQL as an effective way of interacting with big data analytics; the reasons behind this choice are explored, and the performance and characteristics of these systems are compared. The later sections shift focus to big data cloud services, with a particular emphasis on AWS (Amazon Web Services); alternative cloud service providers are also discussed briefly. The thesis culminates in a practical demonstration of data analysis conducted on a selected dataset within three selected AWS cloud services. This involves creating scripts to gather and process data, establishing ETL pipelines, configuring databases, conducting data analysis, and documenting the experiments. The goal is to assess the advantages and disadvantages of these services and to provide a comprehensive understanding of their functionalities.
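As a sketch of the kind of cloud-side analysis step described, the following submits a SQL query to Amazon Athena with boto3 and polls for completion. It assumes configured AWS credentials; the database, table, and output bucket names are hypothetical placeholders.

    # Sketch: run a SQL query over S3-backed data via Athena and poll its status.
    import time
    import boto3

    athena = boto3.client("athena")
    run = athena.start_query_execution(
        QueryString="SELECT category, COUNT(*) FROM events GROUP BY category",
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-out/"},
    )

    qid = run["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    print(state)  # results can then be fetched with get_query_results(qid)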
-
(2024)Accurately predicting a ship’s fuel consumption is essential for efficient shipping operations. A prediction model has to be regularly retrained to minimize drift between its predictions and the actual consumption of the ship, since a ship’s performance is constantly changing because of weather influences and continual hull fouling. Continuous Learning (CL) promises repeated retraining of an ML model while also mitigating catastrophic forgetting. Catastrophic forgetting happens when a model is trained on new data without proper measures to “remind” the model of its previous knowledge; in the context of ship performance prediction, this might be previously encountered weather or performance patterns in certain conditions. This thesis explores how CL can be adapted to set up a production-ready training pipeline that regularly retrains a model predicting a ship’s fuel consumption.
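One common CL tactic for this setting can be sketched as incremental retraining with a small replay buffer: each update mixes new voyage data with replayed past samples, so the model keeps earlier patterns while adapting. The features, model choice, and buffer sizes below are illustrative assumptions, not the thesis's setup.

    # Sketch: incremental retraining with replay to mitigate forgetting.
    import numpy as np
    from sklearn.linear_model import SGDRegressor

    rng = np.random.default_rng(0)
    model = SGDRegressor(random_state=0)
    replay_X, replay_y = [], []

    for day in range(30):                      # one incremental update per day
        X_new = rng.normal(size=(100, 5))      # e.g. speed, draft, wind, waves
        y_new = X_new @ np.array([2.0, 1.0, 0.5, 0.3, 0.1]) + rng.normal(size=100)
        if replay_X:                           # mix replayed old samples into the update
            idx = rng.choice(len(replay_X), size=min(50, len(replay_X)), replace=False)
            X_fit = np.vstack([X_new, np.array(replay_X)[idx]])
            y_fit = np.concatenate([y_new, np.array(replay_y)[idx]])
        else:
            X_fit, y_fit = X_new, y_new
        model.partial_fit(X_fit, y_fit)
        replay_X.extend(X_new.tolist())
        replay_y.extend(y_new.tolist())

    print(model.coef_.round(2))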
-
(2023)Software testing is an important part of ensuring software quality. Studies have shown that having more tests results in a lower defect count. Code coverage is a tool used in software testing to find the parts of the software that require further testing and to learn which parts have been tested; it is generated automatically by the test suites during test execution. Many types of code coverage metrics exist, the most common being line, statement, function, and branch coverage. These four common metrics are usually enough, but there are many specific coverage types for specific purposes, such as condition coverage, which tells how many Boolean conditions have been evaluated as both true and false. Each metric gives hints on how the codebase is tested. A common consensus among practitioners is that code coverage does not correlate much with software quality. The correlation of software quality with code coverage is a broadly researched topic with a long history, important both in academia and in professional practice. This thesis investigates whether code coverage correlates with software quality by performing a literature review. The review yields a surprising result: most of the included studies point towards code coverage correlating with software quality. This positive correlation emerges from 22 studies conducted between 1995 and 2021, comprising academic and industrial studies. The studies were categorized both by key finding (such as Correlation or No correlation) and by study type (such as survey studies, case studies, and open-source studies), and in each category most studies point towards a correlation. This finding contradicts the common opinion of professional practitioners.
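The difference between two of these metrics is easy to show concretely. In the sketch below, a single test with is_member=True executes every line of the function (full line coverage), yet branch coverage (for example, coverage.py's run --branch mode) reports the never-taken False outcome of the if, flagging the missing test.

    # Line coverage vs branch coverage on a tiny function.
    def discount(price, is_member):
        rate = 0.0
        if is_member:
            rate = 0.1
        return price * (1 - rate)

    assert discount(100.0, True) == 90.0    # this alone gives full LINE coverage
    assert discount(100.0, False) == 100.0  # needed for full BRANCH coverage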
-
(2022)One of the problems with the modern widespread use of cloud services pertains to geographical location. Modern services often employ location-dependent content, in some cases even data that should not end up outside a certain geographical region. A cloud service provider may however have reasons to move services to other locations. An application running in a cloud environment should have a way to verify the location of both itself and its data. This thesis describes a new solution to this problem, employing a permanently deployed hardware device that provides geolocation data to other computers in the same local network. A protocol suite for applications to check their geolocation is developed using the methodology of design science research. The protocol suite uses many tried-and-true cryptographic protocols: a secure connection is established between an application server and the geolocation device, during which the authenticity of the device is verified. The location of data is ensured by checking that a storage server indeed has access to the data. Geographical proximity is checked by measuring round-trip times and setting limits for them. The new solution, with the protocol suite and hardware, is shown to solve the problem and fulfill strict requirements, improving on the results presented in earlier work. A prototype implementation shows that the protocol suite can be feasible both in theory and in practice, though details will require further research.
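The proximity check rests on simple physics: a signal cannot travel faster than light in fiber (roughly 200 km per millisecond), so a measured round-trip time bounds how far away the responder can physically be. A minimal sketch, with illustrative thresholds rather than the thesis's actual limits:

    # Sketch: an RTT measurement yields an upper bound on physical distance.
    import socket
    import time

    def rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # completed TCP handshake = one round trip (plus overhead)
        return (time.perf_counter() - start) * 1000

    SPEED_KM_PER_MS = 200          # approx. speed of light in fiber
    MAX_DISTANCE_KM = 100          # assumed policy limit for "same region"

    rtt = rtt_ms("example.org", 443)
    bound_km = rtt / 2 * SPEED_KM_PER_MS
    print(f"RTT {rtt:.1f} ms -> responder within ~{bound_km:.0f} km")
    print("within limit" if bound_km <= MAX_DISTANCE_KM else "limit exceeded")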
-
Design, Implementation, and Validation of a Uniform Control Interface for Drawing Robots with ROS2 (2023)This master's thesis describes the development and validation of a uniform control interface for drawing robots with ROS2. The robot control software was tasked with taking SVG images as input and producing them as drawings with three different robots: the Evil Mad Scientist AxiDraw V3/A3, the UFACTORY xArm Lite6, and a virtual xArm Lite6. The intended use case for the robots and the companion control software is experiments studying human perception of the creativity of drawing robots. The control software was implemented over the course of a little over six months using a combination of C++ and Python. The design of the software utilizes ROS2 abstractions such as nodes and topics to combine the different components of the software. The control software is validated against the given requirements and found to fulfil the main objectives of the project, the most important of which are that the robots successfully draw SVG images, that they do so in a similar time frame, and that the resulting images look very similar. Drawing similarity was tested by scanning the drawn images, aligning them by minimizing error, and then comparing them visually after overlaying them. Comparing aligned images was useful in detecting subtle differences in the drawing similarity of the robots and was used to discover issues with the robot control software. MSE and SSIM were also calculated for a set of these aligned images, allowing the effect of future changes to the robot control software to be quantitatively evaluated. Drawing time was evaluated by measuring the time taken to draw a set of images. This testing showed that the AxiDraw's velocity and acceleration needed to be reduced by 56% so that the xArm Lite6 could draw in a similar time.
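Once two scans are aligned, the similarity metrics can be computed with scikit-image as sketched below; the sketch assumes equally sized grayscale images, and the file names are hypothetical placeholders.

    # Sketch: MSE and SSIM between two aligned scans of the same drawing.
    import numpy as np
    from skimage.io import imread
    from skimage.metrics import structural_similarity

    a = imread("axidraw_scan.png", as_gray=True).astype(float)
    b = imread("xarm_scan.png", as_gray=True).astype(float)

    mse = np.mean((a - b) ** 2)
    # data_range is required for float images; gray values here lie in [0, 1].
    ssim = structural_similarity(a, b, data_range=1.0)
    print(f"MSE {mse:.4f}, SSIM {ssim:.3f}")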
-
(2024)The concept of digital twins, proposed over a decade ago, has recently gathered increasing attention from both industry and academia. Digital twins are real-time or near-real-time simulations of their physical counterparts and can be implemented across various sectors. In mobile networks, digital twins are valuable for maintenance, long-term planning, and expansion by simulating the effects of new infrastructure and technology upgrades. This capability enables network operators to make informed investment and growth decisions. Challenges in implementing digital twins for mobile networks include resource limitations on mobile devices and scaling the system to a broader level. This thesis introduces a modular and flexible architecture for representing network signals from mobile devices within a digital twin environment. It also proposes a suitable platform for digital twins of mobile network signals and resource-efficient protocols for data transmission. The focus is on developing solutions that ensure scalable and resource-efficient synchronization of real or near-real-time data between digital twins and their physical counterparts. The architecture was evaluated through performance testing in two setups: one where data preprocessing occurs on the devices, and another where preprocessing is entirely offloaded to the digital twin platform. Additionally, scalability was assessed by analyzing the platform's ability to handle connections and data transfer from multiple devices simultaneously. The results demonstrate the system's effectiveness and scalability, providing insights into its practical application in real-world scenarios. These findings underscore the potential for widespread adoption and further development of digital twin technologies in mobile networks.
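The device-side synchronization can be sketched as lightweight telemetry over a publish/subscribe protocol such as MQTT. The sketch below assumes the paho-mqtt 1.x client API; the broker address, topic, and payload fields are hypothetical placeholders, not the thesis's actual protocol choices.

    # Sketch: a device publishing periodic signal readings to a twin platform.
    import json
    import time
    import paho.mqtt.client as mqtt

    client = mqtt.Client()
    client.connect("twin-platform.local", 1883)
    client.loop_start()

    for _ in range(10):                      # one reading per second
        reading = {
            "device_id": "phone-42",
            "ts": time.time(),
            "rsrp_dbm": -95,                 # stand-in for a measured value
            "cell_id": "12345",
        }
        # Compact JSON keeps per-message overhead low on the device.
        client.publish("twins/phone-42/signal", json.dumps(reading), qos=1)
        time.sleep(1)

    client.loop_stop()
    client.disconnect()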