Browsing by Subject "pipeline"

  • Mäkinen, Sasu (2021)
    Deploying machine learning models has proven to be a major challenge in the field. DevOps and Continuous Integration and Continuous Delivery (CI/CD) have proven to streamline and accelerate deployments in software development. Creating CI/CD pipelines for software that includes elements of Machine Learning (MLOps) poses unique problems, and trailblazers in the field solve them with proprietary tooling, often offered by cloud providers. In this thesis, we describe the elements of MLOps. We study what the requirements are for automating the CI/CD of Machine Learning systems in the MLOps methodology. We study whether it is feasible to create a state-of-the-art MLOps pipeline with existing open-source and cloud-native tooling in a cloud-provider-agnostic way. We designed an extendable and cloud-native pipeline covering most of the CI/CD needs of a Machine Learning system. We motivated why Machine Learning systems should be included in the DevOps methodology. We studied what unique challenges machine learning brings to CI/CD pipelines, production environments and monitoring. We analyzed the pipeline's design, architecture, and implementation details, as well as its applicability and value to Machine Learning projects. We evaluate our solution as a promising MLOps pipeline that manages to solve many issues of automating a reproducible Machine Learning project and its delivery to production. We designed it as a fully open-source solution that is relatively cloud-provider-agnostic. The pipeline is configured to fit the client's needs with easy-to-use declarative configuration languages (YAML, JSON) that require minimal learning overhead.
  • Almusa, Henrikki (2013)
    Next-generation sequencing (NGS) platforms produce a large amount of sequence data in a short amount of time compared to first-generation sequencers. An overview of the NGS platforms is provided, with a more in-depth look at the Illumina Genome Analyzer II, which was used to create the data for the thesis. There were two main aims in this thesis. First, to create a pipeline which can be used to analyse genomic sequencing data. Second, to use the pipeline to compare whole human exome capture methods from two manufacturers, Roche Nimblegen and Agilent. The pipeline is described in detail in the materials and methods section. All the inputs for the pipeline are described and examples are shown. In the pipeline, the given sequences are first aligned against the reference genome. Then various separate analyses are performed to retrieve variants and the coverage of the sequencing. Supplementary results include paired-end anomalies, larger insertion and deletion polymorphisms, and the assembly of non-aligned sequences. The two capture methods are also described, and changes to the manufacturers' recommended protocols are listed. Finally, the section lists the options and various inputs used in the pipeline runs of the exome data. The result of the pipeline is a basic level of analysis of the sequencing, as well as various graphs showing the quality of the run. All the output files intended for the user are described. Using the results of the pipeline, the user can do more in-depth analysis as required by the project. When comparing the two exome capture methods, the Nimblegen capture was shown to be more efficient in capturing the CCDS exome. While the Agilent capture kit provided better one-fold coverage over the exome, higher-fold coverage (over 10-fold), which is required for reliable variant calling in next-generation sequencing, was better reached using the Nimblegen capture kit. Also, significantly fewer false positive paired-end anomalies were observed in the library created using the Nimblegen capture.