Browsing by Subject "open source"
Now showing items 1-5 of 5
-
(2023)Machine Learning Operations (MLOps), derived from DevOps, aims to unify the development, deployment, and maintenance of machine learning (ML) models. Continuous training (CT) automatically retrains ML models, and continuous deployment (CD) automatically deploys the retrained models to production. Together, they are essential for maintaining ML model performance in dynamic production environments. Existing proprietary solutions suffer from drawbacks such as a lack of transparency and potential vendor lock-in. Additionally, current MLOps pipelines built using open-source tools still lack flexible CT and CD for ML models. This study proposes a cloud-agnostic and open-source MLOps pipeline that enables users to retrain and redeploy their ML models flexibly. We applied the Design Science methodology, consisting of identifying the problem, defining the solution objectives, and implementing, demonstrating, and evaluating the solution. The resulting solution is an MLOps pipeline called the CTCD-e MLOps pipeline. We formed a conceptual model of the needed functionalities of our MLOps pipeline and implemented the pipeline using only open-source tools. The CTCD-e MLOps pipeline runs atop Kubernetes. It can autonomously adapt ML models to dynamic production data by automatically retraining them when their performance degrades. It can also automatically A/B test the performance of the retrained models in production and fully deploy them only when they outperform their predecessors. Our demonstration and evaluation of the CTCD-e MLOps pipeline show that it is cloud-agnostic and can also be installed in on-premises environments. Additionally, the CTCD-e MLOps pipeline enables its users to flexibly configure model retraining and redeployment, as well as production A/B testing of the retrained models, based on various requirements.
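The control flow the abstract describes (retrain on degradation, then promote only after winning a production A/B test) can be sketched as follows. This is a minimal illustrative sketch, not the CTCD-e implementation: the threshold, the toy models, and all function names are assumptions for demonstration only.

```python
# Illustrative sketch of a CT/CD cycle: retrain when live accuracy
# degrades, then promote the retrained model only if it outperforms
# the current one on the same data (a stand-in for an A/B test).
ACCURACY_THRESHOLD = 0.80  # assumed degradation trigger, not from the thesis

def evaluate(model, data):
    """Fraction of (input, label) pairs the model predicts correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)

def retrain(data):
    """Stand-in for a real training job: fits a majority-class model."""
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def ct_cd_step(current_model, live_data):
    """One continuous-training / continuous-deployment cycle."""
    if evaluate(current_model, live_data) >= ACCURACY_THRESHOLD:
        return current_model          # no degradation: keep serving as-is
    candidate = retrain(live_data)    # CT: retrain on fresh production data
    # CD gate: promote the candidate only if it beats its predecessor.
    if evaluate(candidate, live_data) > evaluate(current_model, live_data):
        return candidate
    return current_model
```

In the real pipeline the evaluation and A/B test run against live production traffic inside Kubernetes; the sketch collapses that to a single dataset to keep the promotion logic visible.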
-
(2020)Software development is a massive industry today. Billions of dollars are spent and created on immaterial products, and the stakes are very high. The failure modes of commercial software projects have been extensively studied, motivated by the large amounts of money in play. Meanwhile, failures in open source projects have been studied less, despite open-source projects forming a massive part of the computing industry ecosystem today. As more and more companies depend on open-source projects, it becomes imperative to understand the motivations and problems in these volunteer-staffed projects. This thesis opens with an introduction to the history of open source, followed by an overview of the actual day-to-day minutiae of the tools and processes used to run a project. After this background context has been established, the existing body of research into open-source motivation is surveyed. The motivation of the people working on these projects has been studied extensively, as it seems illogical that highly skilled volunteers pour their efforts into supporting a trillion-dollar industry. The existing body of motivation research establishes why people work on open-source projects in general, but it does not explain which projects they choose to work on. Developers drift between projects unguided, as they are free to choose where they allocate their time and energy. Contributions to open-source projects follow a Pareto distribution, and the majority of projects never manage to attract large numbers of contributors. Others lose steam after an internal or external shock drives away the contributors. To explore the latter phenomenon, four case studies examine crises that various open-source projects have faced, and how the actions of project leadership have affected the outcome for the project. Two of the shocks were caused by illegal activities by a project member, and two by social disagreements.
-
(Helsingin yliopistoUniversity of HelsinkiHelsingfors universitet, 2006)The study examines various uses of computer technology in the acquisition of information by visually impaired people. For this study, 29 visually impaired persons took part in a survey about their experiences concerning acquisition of information and use of computers, especially with a screen magnification program, a speech synthesizer, and a braille display. According to the responses, the evolution of computer technology offers an important possibility for visually impaired people to cope with everyday activities and interact with the environment. Nevertheless, the functionality of assistive technology needs further development to become more usable and versatile. Since the challenges of independent observation of the environment were emphasized in the survey, the study led to the development of a portable text vision system called Tekstinäkö. Contrary to typical stand-alone applications, the Tekstinäkö system was constructed by combining devices and programs that are readily available on the consumer market. As the system operates, pictures are taken by a digital camera and instantly transmitted to a text recognition program on a laptop computer, which reads the text aloud using a speech synthesizer. Visually impaired test users described that even unsure interpretations of the texts in the environment given by the Tekstinäkö system are at least a welcome addition to their perception of the environment. It became clear that even with modest development work it is possible to bring new, useful, and valuable methods into the everyday life of disabled people. The unconventional production process of the system appeared to be efficient as well. The achieved results and the proposed working model offer one suggestion for giving enough attention to the easily overlooked needs of people with special abilities.
ACM Computing Classification System (1998): K.4.2 Social Issues: Assistive technologies for persons with disabilities I.4.9 Image processing and computer vision: Applications
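The Tekstinäkö processing chain described above (camera → text recognition → speech synthesis) can be sketched as a pipeline of pluggable stages. This is an illustrative sketch only: the stage functions are injected so any camera, OCR, or speech backend could be substituted, and none of the names below come from the actual system.

```python
# Sketch of a capture -> OCR -> speech cycle with injected backends.
# capture_image, recognize_text, and speak are assumed callables
# standing in for the camera, text recognition program, and speech
# synthesizer; they are hypothetical, not the thesis's components.
def text_vision_pipeline(capture_image, recognize_text, speak):
    """Return a function that runs one capture -> OCR -> speech cycle."""
    def run_once():
        image = capture_image()       # e.g. a frame from the digital camera
        text = recognize_text(image)  # OCR output, possibly an unsure guess
        if text.strip():              # voice only non-empty recognitions
            speak(text)
        return text
    return run_once
```

Keeping the stages decoupled like this mirrors the abstract's point that the system was assembled from off-the-shelf consumer components rather than built as a monolith.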
-
(2024)Machine learning operations (MLOps) is a paradigm at the intersection of machine learning (ML), software engineering, and data engineering. It focuses on the development and operation of ML software by providing principles, components, and workflows that form the MLOps operational support system (OSS) platform. The increasing use of ML, with growing data sizes and model complexity, has created a challenge: MLOps OSS platforms require cloud and high-performance computing environments to achieve flexible and efficient scalability for different workflows. Unfortunately, there are not many open-source solutions that are user-friendly or viable enough to be utilized by an MLOps OSS platform, which is why this thesis proposes a bridge solution utilized by a pipeline to address the problem. We used the Design Science methodology to define the problem, set objectives, design the implementation, demonstrate the implementation, and evaluate the solution. The resulting solutions are an environment bridge called the HTC-HPC bridge and a pipeline called the Cloud-HPC pipeline that uses it. We defined a general model for Cloud-HPC MLOps pipelines and implemented the required functions in a use-case-suitable infrastructure ecosystem and MLOps OSS platform using open-source, provided, and self-implemented software. The demonstration and evaluation showed that the HTC-HPC bridge and Cloud-HPC pipeline provide easy-to-set-up, customizable, and scalable workflow automation, which can be used for typical ML research workflows. However, they also showed that the bridge needs an improved multi-tenancy design and that the pipeline requires templates for a better user experience. These aspects, alongside testing use-case potential and finding real-world use cases, are part of future work.
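The bridging idea the abstract describes, a pipeline handing some workflow steps to an HPC environment while the rest run on the cloud side, can be illustrated with a toy router. The abstract does not specify the bridge's API, so everything below (the GPU-based routing rule, the step dictionaries, all function names) is an assumption made purely for illustration.

```python
# Toy sketch of an HTC-HPC-style bridge: a pipeline submits each step,
# and the bridge routes resource-heavy steps to an HPC backend while
# lighter steps stay in the cloud environment. The gpu_threshold rule
# is a made-up heuristic, not the thesis's actual routing criterion.
def make_bridge(submit_cloud, submit_hpc, gpu_threshold=1):
    def submit(step):
        # Steps that need accelerators cross the bridge to HPC.
        if step.get("gpus", 0) >= gpu_threshold:
            return submit_hpc(step)
        return submit_cloud(step)
    return submit

def run_pipeline(steps, submit):
    """Run workflow steps in order through the bridge's submit function."""
    return [submit(step) for step in steps]
```

The point of the sketch is the separation of concerns: the pipeline only knows a single submit function, and the bridge hides which computing environment actually executes each step.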
-
(2023)In my thesis, I explore the roles and responsibilities of software developers as data controllers under the General Data Protection Regulation (hereinafter ‘GDPR’), focusing on the complexities arising from centralised and decentralised software development processes. I address two research questions: (i) taking into account the factors and considerations specific to centralised and decentralised software development processes, how can the roles and responsibilities of software developers as data controllers be determined under the GDPR? and (ii) how may the unique features of Decentralised Applications (hereinafter ‘dApps’) influence the assignment of data controllership in the context of the GDPR? To answer my research questions, I first establish a comprehensive understanding of the relevant core concepts: data controllership, software development, and the varying levels of centralisation in software development. Thereafter, I analyse the roles of individuals within Software Development Companies, SDAs, open source projects, dApps, and smart contracts. In centralised development, assigning controllership is more straightforward, although complex situations like joint controllership may arise in certain cases. Decentralised software development processes, such as open source projects, complicate the determination of data controllership due to decision-making dispersed across various roles. Examining these roles and different project categories helps to better understand potential data controllership allocations. Furthermore, I discuss specific challenges in determining data controllership in dApps and smart contracts. The fully decentralised nature of dApps and the immutability of their source code further complicate the identification of a single entity with control over the processing of personal data. Additionally, establishing accountability (a cornerstone of data controllership) is difficult without control.
Currently, no definitive guidance on this matter exists, suggesting that additional legislation may be needed to address the intricacies of decentralised systems within the context of the GDPR. Throughout my thesis, I emphasise the importance of a case-by-case analysis for determining data controllership, and provide insights into potential assessment outcomes. Overall, my research serves as a foundation for understanding software developers’ roles and responsibilities as data controllers in various development processes under the GDPR.