
Browsing by study line "Ohjelmistot"


  • Pulkka, Robert (2022)
    In recent years, the concept of the Metaverse has become a popular buzzword in the media and different communities. In 2021, the company behind Facebook rebranded itself as Meta Platforms, Inc. to match its new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as the media, researchers, consumers and companies, have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little research has been done on gray literature, i.e. non-scientific sources, to gain insight into the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people see opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, Inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most consumers are interested only in a smaller subset of features than what is being marketed. The conducted research gives insight into how different sources view the Metaverse and can therefore serve as a starting point for more comprehensive gray literature studies on the Metaverse. While innovating on the underlying technology is important, studying people’s viewpoints is a requirement for academia to understand the phenomenon and for the industry to produce a compelling product.
  • Ratilainen, Katja-Mari (2023)
    Context: The Bank of Finland, as the national monetary authority and central bank of Finland, possesses an extensive repository of data that fulfills both the statistical needs of international organizations and national requirements. Data scientists within the bank are increasingly interested in investing in machine learning (ML) capabilities to develop predictive models. MLOps offers a set of practices that ensure the reliable and efficient maintenance and deployment of ML models. Objective: In this thesis, we focus on addressing how to implement an ML pipeline within an existing environment. The case study is explorative in nature, with the primary objective of gaining deeper insight into MLOps tools and their practical implementation within the organization. Method: We apply the design science research methodology to divide design and development into six tasks: problem identification, objective definition, design and development, demonstration, evaluation, and communication. Results: We select the tools for the MLOps pipeline based on the user requirements and the existing environment, and then we design and develop a simplified end-to-end ML pipeline utilizing the chosen tools. Lastly, we conduct an evaluation to measure the alignment between the selected tools and the initial user requirements.
  • Yumo, Luo (2023)
    Machine Learning Operations (MLOps), derived from DevOps, aims to unify the development, deployment, and maintenance of machine learning (ML) models. Continuous training (CT) automatically retrains ML models, and continuous deployment (CD) automatically deploys the retrained models to production. They are therefore essential for maintaining ML model performance in dynamic production environments. Existing proprietary solutions suffer from drawbacks such as a lack of transparency and potential vendor lock-in. Additionally, current MLOps pipelines built using open-source tools still lack flexible CT and CD for ML models. This study proposes a cloud-agnostic, open-source MLOps pipeline that enables users to retrain and redeploy their ML models flexibly. We applied the Design Science methodology, consisting of identifying the problem, defining the solution objectives, and implementing, demonstrating, and evaluating the solution. The resulting solution is an MLOps pipeline called the CTCD-e MLOps pipeline. We formed a conceptual model of the needed functionalities of our MLOps pipeline and implemented the pipeline using only open-source tools. The CTCD-e MLOps pipeline runs atop Kubernetes. It can autonomously adapt ML models to dynamic production data by automatically retraining them when their performance degrades. It can also automatically A/B test the performance of retrained models in production, deploying them fully only when they outperform their predecessors. Our demonstration and evaluation of the CTCD-e MLOps pipeline show that it is cloud-agnostic and can also be installed in on-premises environments. Additionally, the pipeline enables its users to flexibly configure model retraining and redeployment, as well as production A/B testing of retrained models, based on various requirements.
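The two decisions the abstract describes (when to trigger retraining, and whether to promote a retrained model after a production A/B test) can be sketched as follows. This is an illustrative sketch, not code from the thesis; the function names, threshold value, and the equal-traffic A/B comparison are all assumptions for illustration.

```python
# Hypothetical sketch of a CT trigger and an A/B promotion decision,
# as described for the CTCD-e pipeline (names and threshold invented).

def should_retrain(current_accuracy: float, threshold: float = 0.90) -> bool:
    """Trigger retraining once the live model's accuracy degrades below a threshold."""
    return current_accuracy < threshold

def should_promote(champion_errors: int, challenger_errors: int, requests: int) -> bool:
    """Promote the retrained (challenger) model only when its observed
    production error rate beats the current (champion) model's.
    Assumes both models served the same number of requests."""
    if requests == 0:
        return False
    return challenger_errors / requests < champion_errors / requests

# Live accuracy has dropped below the threshold, so retraining starts:
print(should_retrain(0.87))            # True
# After an A/B test over 1000 requests per variant, the challenger wins:
print(should_promote(120, 95, 1000))   # True
```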
  • Stepanenko, Artem (2022)
    The rapid progress of communication technologies combined with the growing competition for talents and knowledge has made it necessary to reassess the potential of distributed development which has significantly changed the landscape of the IT industry introducing a variety of cooperation models and making notable changes to the software team work environment. Along with this, enterprises pay more attention to teams’ performance improvement, employing emerging management tools for building up efficient software teams, and trying to get the most out of understanding factors which significantly impact a team’s overall performance. The objective of the research is to systematize factors characterizing high-performing software teams; indicate the benefits of global software development (GSD) models positively influencing software teams’ development performance; and study how companies’ strategies can benefit from distributed development approaches in building high-performing software teams. The thesis is designed as a combination of a systematic literature review followed by qualitative research in the form of semi-structured interviews to validate the findings regarding classification of GSD models’ benefits and their influence on the development of high-performing software teams. At a literature review stage, the research (1) introduces a team performance factors’ model reflecting the aspects which impact the effectiveness of development teams; (2) suggests the classification of GSD models based on organizational, legal, and temporal characteristics, and (3) describes the benefits of GSD models which influence the performance of software development teams. 
Within the empirical part of the study, we refine the classification of GSD models’ benefits based on the qualitative analysis of semi-structured interviews with practitioners from the IT industry, form a comparison table of GSD benefits depending on the model in question, and introduce recommendations for company and team management regarding the application of GSD in building high-performing software teams. To achieve their strategic goals, IT corporations can enrich their range of tools for managing high-performing teams by considering the peculiarities of different GSD models. Company and team management should evaluate the advantages of distributed operational models, and use the potential and benefits of the available configurations to increase teams’ performance and build high-performing software teams.
  • Hautala, Jaana (2023)
    Artificial Intelligence (AI) has revolutionized various domains of software development, promising solutions that can adapt and learn. However, the rise of AI systems has also been accompanied by ethical concerns, primarily related to the unintentional biases these systems can inherit during the development process. This thesis presents a thematic literature review aiming to identify and examine the existing methodologies and strategies for preventing bias in iterative AI software development. Methods employed for this review include a formal search strategy using defined inclusion and exclusion criteria, and a systematic process for article sourcing, quality assessment, and data collection. 29 articles were analyzed, resulting in the identification of eight major themes concerning AI bias mitigation within iterative software development, ranging from bias in data and algorithmic processes to fairness and equity in algorithmic design. Findings indicate that while various approaches for bias mitigation exist, gaps remain. These include the need for adapting strategies to agile or iterative frameworks, resolving the trade-off between effectiveness and fairness, understanding the complexities of bias for tailored solutions, and assessing the real-world applicability of these techniques. This synthesis of key trends and insights highlights these specific areas requiring further research.
  • Garmuyev, Pavel (2022)
    RESTful web APIs have gained significant interest over the past decade, especially among large businesses and organizations. However, an important part of being able to use these public web APIs is knowing how to access, consume, and integrate them into applications. Since developers are the primary audience doing the integration, it is important to support them throughout their API adoption journey. To this end, many of today's companies that are heavily invested in web APIs provide an API developer portal as part of their API management program. However, very little accessible and comprehensive information on how to build and structure API developer portals exists yet. This thesis presents an exploratory multiple-case study of three publicly available API developer portals of three different commercial businesses. The objective of the case study was to identify the developer (end-user) oriented features and capabilities present on the selected developer portals, in order to understand the kinds of information and capabilities API developer portals could provide for developers in general. The exploration was split into three key focus areas: developer onboarding, web API documentation, and developer support and engagement. Three research questions were formulated based on these, respectively. The data consisted of field notes that described observations about the portals. These notes were grouped by location and action, and analyzed to identify a key feature or capability as well as any smaller, compounding features and capabilities. The results describe the identified features and capabilities present on the studied API developer portals. Additionally, some differences between the portals are noted. The key contribution of this thesis is the results themselves, which can be used as a checklist when building a new API developer portal. However, the main limitation of this study is that its data collection and analysis processes were subjective and the findings have not been properly validated; such improvements remain for future work.
  • Valentine, Nicolas (2023)
    A case study on the performance impact of refactoring a Node.js component from a monolith into an independent service. The study measured the response time of the blocking part of the JavaScript code in the component; the non-blocking part of the code and the network overhead added by the refactoring were excluded from the performance review. A literature review did not find any prior research on the performance impact of refactoring a Node.js component from a monolith into microservices. Several studies were found that examined the response time and throughput of REST APIs built with Node.js, with comparisons to other programming languages, and one study addressed refactoring an application from a monolith into microservices, but none of the found studies were directly related to the studied case. The response time of the component improved by 46.5% when it was refactored from the monolith into a microservice. It is possible that as a Node.js monolith grows, it starts to affect the throughput of the event loop, degrading performance-critical components. For the case component, it was beneficial to refactor it into an independent service in order to gain the 92.6 ms improvement in mean response time.
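The two reported figures, a 46.5% improvement and a 92.6 ms gain, together imply the before and after mean response times. This is a quick arithmetic check, assuming the percentage is relative to the original monolith mean (the abstract does not state the baseline explicitly):

```python
# Recover the implied mean response times from the two reported figures.
improvement = 0.465   # 46.5% improvement, assumed relative to the original mean
gain_ms = 92.6        # absolute gain in mean response time

original_mean = gain_ms / improvement   # implied monolith mean, ~199.1 ms
new_mean = original_mean - gain_ms      # implied microservice mean, ~106.5 ms

print(round(original_mean, 1))  # 199.1
print(round(new_mean, 1))       # 106.5
```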
  • Wu, Qinglu (2023)
    Background: BIM (Building Information Modelling) has helped the construction industry achieve better workflow and collaboration. To further integrate technologies into the construction industry, research and applications are actively integrating cloud technologies into the traditional BIM design workflow. Such integration can be referred to as Cloud BIM, which is considered the second generation of BIM development. Cloud BIM involves many aspects, including technical implementation, workflow improvement, and collaboration between different roles. Aims: In this thesis, we want to establish the current state of Cloud BIM, identifying its benefits and challenges as well as possible technical solutions to the challenges. Methods: We conducted a literature review and analyzed eleven selected papers to gather the necessary data for this thesis. We then carried out a case study of an integration of two applications, to understand the real challenges in an actual implementation of a cloud-based BIM solution. Results: Cloud BIM mainly benefits collaboration and information exchange. However, many challenges still exist, both technical and non-technical, that require more work. Our integration explored a deeper, more cloud-based solution for a certain process in BIM projects. The main challenge we faced was inconsistent data standards. Conclusions: The results show that the industry is on the way to integrating the cloud into BIM. However, more work needs to be done to overcome the challenges.
  • Akkanen, Saara (2023)
    This Master’s Thesis describes an original user study that took place at the University of Helsinki. The study compares and evaluates the usability of three different methods used in meeting rooms to share a private device’s screen on a big public screen in order to give a slideshow presentation: HDMI, VIA, and Ubicast. There were 18 participants. The study was conducted in a controlled environment, replicating a typical meeting room setup. The experiment consisted of screen mirroring tasks and an interview. In a screen mirroring task, the participants were asked to share their screen using each of the three technologies. They were provided with the necessary equipment and user guides if needed. The participants were then given training on how to use the technologies, and they performed the tasks again. During each task, the time taken to complete the screen mirroring session was recorded, and any errors or difficulties encountered were noted. After completing the screen mirroring tasks, participants were interviewed to gather qualitative data on their experiences and preferences. They were asked about the ease of use, efficiency, and any difficulties they faced while using each technology. This information was used to gain insights into user preferences and potential areas for improvement in the respective technologies. To analyze the data, the System Usability Scale (SUS) scores and the time taken to complete the screen mirroring tasks were calculated for each technology. Statistical analyses were conducted to determine any significant differences in SUS scores and times across the three technologies. Additionally, the interview data was analyzed using thematic analysis to identify common themes and patterns in the experiences of the users. HDMI emerged on top, with Ubicast not far behind.
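The SUS scores mentioned above follow a fixed scoring rule: ten items answered on a 1-5 scale, with odd items positively worded and even items negatively worded, scaled to 0-100. A minimal sketch of that standard computation (not code from the thesis):

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from the ten
    questionnaire items, each answered on a 1-5 scale.

    Odd-numbered items are positively worded (contribute response - 1);
    even-numbered items are negatively worded (contribute 5 - response).
    The summed contributions (0-40) are scaled by 2.5 to give 0-100.
    """
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# A respondent who strongly agrees with every positive item and
# strongly disagrees with every negative one scores the maximum:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```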
  • Tauriainen, Juha (2023)
    Software testing is an important part of ensuring software quality. Studies have shown that having more tests results in a lower count of defects. Code coverage is a tool used in software testing to find parts of the software that require further testing and to learn which parts have been tested. Code coverage is generated automatically by the test suites during test execution. Many types of code coverage metrics exist, the most common being line coverage, statement coverage, function coverage, and branch coverage. These four common metrics are usually enough, but there are many specific coverage types for specific purposes, such as condition coverage, which tells how many boolean conditions have been evaluated as both true and false. Each metric gives hints on how the codebase is tested. A common consensus amongst practitioners is that code coverage does not correlate much with software quality. The correlation of software quality with code coverage is a broadly researched topic, with importance both in academia and professional practice. This thesis investigates whether code coverage correlates with software quality by performing a literature review. The literature review yields surprising results, as most studies included in this thesis point towards code coverage correlating with software quality. This positive correlation emerges from 22 studies conducted between 1995 and 2021, including both academic and industrial studies. The studies were placed into multiple categories: by key finding, such as Correlation or No correlation, and by study type, such as survey studies, case studies, and open-source studies. In each category, most studies point towards a correlation. This finding contradicts the opinions of professional practitioners.
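The difference between the coverage metrics listed above can be seen in a toy example (mine, not the thesis's): a single test can execute every line of a function while still exercising only one of its branches, which is why line coverage and branch coverage give different hints about how well the code is tested.

```python
def classify(x):
    """Toy function: one `if` with no `else` branch."""
    result = "non-negative"
    if x < 0:
        result = "negative"
    return result

# A single test with x = -1 executes every line of `classify`
# (100% line coverage)...
assert classify(-1) == "negative"
# ...but the False outcome of `if x < 0` is never taken, so branch
# coverage is only 50%. A second test is needed to cover it:
assert classify(1) == "non-negative"
```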
  • Hippeläinen, Sampo (2022)
    One of the problems with the modern widespread use of cloud services pertains to geographical location. Modern services often employ location-dependent content, in some cases even data that should not end up outside a certain geographical region. A cloud service provider may, however, have reasons to move services to other locations. An application running in a cloud environment should have a way to verify the location of both itself and its data. This thesis describes a new solution to this problem: a permanently deployed hardware device which provides geolocation data to other computers in the same local network. A protocol suite for applications to check their geolocation is developed using the methodology of design science research. The protocol suite thus created uses many tried-and-true cryptographic protocols. A secure connection is established between an application server and the geolocation device, during which the authenticity of the device is verified. The location of data is ensured by checking that a storage server indeed has access to the data. Geographical proximity is checked by measuring round-trip times and setting limits for them. The new solution, with the protocol suite and hardware, is shown to solve the problem and fulfill strict requirements. It improves on the results presented in earlier work. A prototype is implemented, showing that the protocol suite can be feasible both in theory and in practice. Some details will, however, require further research.
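The round-trip-time check mentioned above rests on a physical bound: a signal travels at most at the speed of light, so a measured RTT limits how far away the peer can be. A minimal sketch of that reasoning (my illustration; the thesis's actual limits and protocol details are not reproduced here):

```python
# Speed of light in km per millisecond (~299,792 km/s).
C_KM_PER_MS = 299.792

def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the one-way distance implied by a round-trip time:
    the signal covers at most c * rtt, i.e. c * rtt / 2 each way."""
    return C_KM_PER_MS * rtt_ms / 2

def within_region(rtt_ms: float, limit_km: float) -> bool:
    """Accept the peer only if the RTT proves it cannot be farther
    away than the allowed geographic limit."""
    return max_distance_km(rtt_ms) <= limit_km

# A 1 ms round trip bounds the peer to within ~150 km:
print(round(max_distance_km(1.0)))  # 150
print(within_region(1.0, 200))      # True
print(within_region(10.0, 200))     # False
```

Note that the bound only proves a peer is *near enough*; processing delays make real RTTs longer, so practical limits must be calibrated rather than taken straight from the speed of light.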
  • Hiillos, Nicolas (2023)
    This master's thesis describes the development and validation of a uniform control interface for drawing robots with ROS2. The robot control software was tasked with taking SVG images as input and reproducing them as drawings with three different robots: the Evil Mad Scientist AxiDraw V3/A3, the UFACTORY xArm Lite6, and a virtual xArm Lite6. The intended use case for the robots and the companion control software is experiments studying human perception of the creativity of drawing robots. The control software was implemented over the course of a little over six months, using a combination of C++ and Python. The design of the software utilizes ROS2 abstractions such as nodes and topics to combine the different components of the software. The control software is validated against the given requirements and found to fulfil the main objectives of the project. The most important of these are that the robots successfully draw SVG images, that they do so in a similar time frame, and that the resulting images look very similar. Drawing similarity was tested by scanning the drawn images, aligning them by minimizing error, and then comparing them visually after overlaying them. Comparing aligned images was useful for detecting subtle differences in the drawing similarity of the robots and was used to discover issues with the robot control software. MSE and SSIM were also calculated for a set of these aligned images, allowing the effect of future changes to the robot control software to be quantitatively evaluated. Drawing time was evaluated by measuring the time taken to draw a set of images. This testing showed that the AxiDraw's velocity and acceleration needed to be reduced by 56% so that the xArm Lite6 could draw in a similar time.
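Of the two similarity metrics named above, MSE is the simpler: the average squared pixel difference between two aligned images, with 0 meaning pixel-identical scans. A minimal pure-Python sketch (my illustration; the thesis presumably used library implementations, and SSIM is more involved):

```python
def mse(image_a, image_b):
    """Mean squared error between two aligned grayscale images,
    given as equally sized 2-D lists of pixel intensities.
    0.0 means the scans are pixel-identical; larger values mean
    larger differences between the drawings."""
    rows, cols = len(image_a), len(image_a[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            diff = image_a[r][c] - image_b[r][c]
            total += diff * diff
    return total / (rows * cols)

# Two 2x2 scans that differ in a single pixel by 10 intensity levels:
a = [[0, 0], [0, 0]]
b = [[0, 0], [0, 10]]
print(mse(a, a))  # 0.0
print(mse(a, b))  # 25.0
```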
  • Harhio, Säde (2022)
    The importance of software architecture design decisions has been known for almost 20 years. Knowledge vaporisation is a problem in many projects, especially in the current fast-paced culture, where developers often switch from one project to another. Documenting software architecture design decisions helps developers understand the software better and make informed decisions in the future. However, documenting architecture design decisions is highly undervalued: it does not create any revenue in itself, and it is often a disliked and therefore neglected part of the job. This literature review explores what methods, tools and practices are being suggested in the scientific literature, as well as what practitioners are recommending in the grey literature. What makes these methods good or bad is also investigated. The review covers the past five years and 36 analysed papers. The evidence gathered shows that most of the scientific literature concentrates on developing tools to aid the documentation process. Twelve out of nineteen grey literature papers concentrate on Architecture Decision Records (ADRs). ADRs are small template files which, as a collection, describe the architecture of the entire system. ADRs appear to be what practitioners have become used to over the past decade, as they were first introduced in 2011. What is seen as beneficial in a method or tool is low cost and low effort while producing concise, good-quality content. What is seen as a drawback is high cost, high effort, and producing too much or badly organised content. The suitability of a method or tool depends on the project itself and its requirements.
  • Sinikallio, Laura (2022)
    The digitization and structuring of parliamentary materials for research use is an emerging field of study, with several national projects currently underway in Europe, for example. This thesis is part of the Semantic Parliament (Semanttinen parlamentti) project, in which the plenary speeches of the Finnish Parliament are for the first time compiled into a unified, harmonized, machine-readable dataset covering the entire history of the Parliament, from its beginning in 1907 to the present day. The speeches and their rich descriptive metadata have been published in two versions: in the Parla-CLARIN XML format used for representing parliamentary materials, and as a linked open data knowledge graph that connects the dataset to the wider national data infrastructure. The unified speech dataset offers unprecedented opportunities to examine over a hundred years of Finnish parliamentarism in a nuanced and automated way. The dataset contains nearly a million individual speeches and is closely linked to the biographical information of parliamentary actors. This thesis describes the data models developed for representing the speeches and the process of collecting and transforming the speech data, and examines the challenges and opportunities of the process and the resulting dataset. To assess the usefulness of the published dataset, the Parla-CLARIN version has already been used in digital humanities research on political culture. Based on the linked data, a semantic portal, Parlamenttisampo, has been developed for publishing and studying the materials on the web.
  • Harjunpää, Jonas (2022)
    Software engineering professionals need a wide range of competencies. One of these is the ability for lifelong learning, which is essential in a broad and constantly changing field. With the growing demand for skilled professionals in the ICT sector, the role of lifelong learning has become even more pronounced. The purpose of this thesis is to increase understanding of the role of lifelong learning from the perspective of software engineering professionals. The thesis seeks to identify which forms of learning are used and for what purposes, which components of lifelong learning competence are important, and what challenges are associated with lifelong learning. The research data was collected through semi-structured interviews with software engineering professionals, and the results of these interviews were compared with the results of a literature review conducted for the thesis. Of the forms of learning, informal learning is used the most, especially for smaller learning needs, while non-formal and formal learning are used for larger needs, but less frequently. Motivation, information seeking, and meta-learning stand out as key components of lifelong learning competence. Lack of time and self-motivation are perceived as the most common challenges to lifelong learning; shortcomings in information sources and an insufficient understanding of meta-learning are also seen as hindering it. The findings of the thesis support the central role of lifelong learning competence for software engineering professionals. However, there is still room for improvement in professionals' readiness for lifelong learning, for example regarding meta-learning. The findings are based on the experiences of professionals who have worked in software engineering for a relatively short time, so further research is needed, especially among more experienced professionals.
  • Pukkila, Eero (2022)
    The goal of defined benefit pension calculation is to determine the compatibility of a pension policyholder's savings and retirement plans, while taking into account the coverages and other costs included in the contract. As a result of legislative changes to voluntary pension contracts, such calculations have become considerably more complex during the 2000s, and calculation models built for older systems do not always perform at the desired speed. The topic of this thesis is the optimization of this calculation in Profit Software Oy's Profit Life & Pension insurance management system.
  • Nieminen, Jeremi (2023)
    This thesis examines the render speeds of WebViews in React Native applications. React Native is a popular cross-platform framework for developing mobile applications, and WebViews allow embedding web content within mobile applications. While WebViews offer the advantage of bringing readily available web content into applications, the cost of using this technology in terms of application responsiveness is not well researched. The goal of this thesis is to evaluate this cost so that developers and stakeholders can make more informed decisions regarding the use of WebViews in React Native applications. A series of tests was performed using a React Native application developed for the purpose of this study. In these tests, we rendered WebViews and similar-looking views composed of React Native components, and measured their mean render times. Our analysis of the results revealed that using React Native components instead of WebViews offers significant rendering-performance benefits on both iOS and Android platforms. The use of WebViews in rendering user interfaces can bring a notable disadvantage in terms of user experience, especially on Android devices. These findings suggest that rendering native user interface components instead of WebViews should be preferred in order to maximize user experience across different devices and platforms.
  • Ahlfors, Dennis (2022)
    As the role of IT and computer science in society grows, so does interest in computer science education. Research covering study success and study paths is important both for understanding student needs and for developing educational programmes further. Using a data set covering student records from 2010 to 2020, this thesis aims to provide key insights and baseline research on computer science study success and study paths at the University of Helsinki. Using novel visualizations and descriptive statistics, this thesis builds a picture of the evolution of study paths and student success over a 10-year timeframe, providing much-needed contextual information to be used as inspiration for future focused research into the phenomena discovered. The visualizations, combined with the statistical results, show that certain student groups seem to have better study success and that there are differences in the study paths chosen by the student groups. It is also shown that graduation rates from the Bachelor's Programme in Computer Science are generally low, with some student groups showing higher-than-average graduation rates. The time from admission to graduation is longer than the suggested duration, and the sample study paths provided by the university are not generally followed, leading to the conclusion that the programme structure needs reassessment to better accommodate students with diverse academic backgrounds and differing personal study plans.
  • Tilander, Vivianna (2023)
    Context: An abundance of research exists on the productivity of software development teams and developers, identifying many factors and their effects in different contexts and concerning different aspects of productivity. Objective: This thesis aims to collect and analyse recent research results on factors that are related to or directly influence the productivity of teams or developers, and how they influence it in different contexts, and to briefly summarise the metrics used in recent studies to measure productivity. Method: The method selected to achieve these aims was a systematic literature review of relevant studies published between 2017 and 2022. Altogether, 48 studies were selected and analysed during the review. Results: The metrics used by the reviewed studies for measuring productivity range from the time used to complete a task, to self-evaluated productivity, to the number of commits contributed. Some are used by multiple studies, many by only one or a few, and they measure productivity from different angles. Various factors were found, ranging from team size to experienced emotion to working from home during the COVID-19 pandemic. The relationships found between these factors and aspects of the productivity of developers and teams range from positive to negative, and are sometimes both, depending on the context and the productivity metric in question. Conclusions: While many relationships were found between various factors and the productivity of software developers and development teams in this review, they do not cover all possible factors, relationships or measurable productivity aspects in all possible contexts. Additionally, one should keep in mind that most of the found relationships do not imply causality.
  • Steenari, Jussi (2023)
    Ship traffic is a major source of global greenhouse gas emissions, and the pressure on the maritime industry to lower its carbon footprint is constantly growing. One easy way for ships to lower their emissions would be to lower their sailing speed. Global ship traffic has long followed a practice called "sail fast, then wait": ships try to reach their destination as fast as possible and then wait at an anchorage near the harbor for a mooring place to become available. This method is easy to execute logistically, but it does not optimize sailing speeds to take emissions into account. An alternative tactic would be to calculate traffic patterns at the destination and use this information to plan the voyage so that time at anchorage is minimized. This would allow ships to sail at lower speeds without compromising the total length of the journey. To create a model that schedules arrivals at ports, traffic patterns need to be formed on how ships interact with port infrastructure. However, information on port infrastructure is not widely available in an easy-to-use form, which makes it difficult to develop models capable of predicting traffic patterns. Ship voyage information, on the other hand, is readily available from commercial Automatic Identification System (AIS) data. In this thesis, I present a novel implementation that extracts information on port infrastructure from AIS data using the DBSCAN clustering algorithm. In addition to clustering the AIS data, the implementation uses a novel optimization method to search for optimal hyperparameters for the DBSCAN algorithm. The optimization process evaluates candidate solutions using cluster validity indices (CVIs), metrics that represent the goodness of a clustering. A comparison of different CVIs is made to narrow down the most effective way to cluster AIS data for finding information on port infrastructure.
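DBSCAN, the algorithm named above, groups points that are densely packed (at least `min_pts` neighbours within radius `eps`) and marks isolated points as noise, which is what makes it suitable for finding berths and anchorages in AIS position data. A minimal pure-Python sketch of the algorithm on toy coordinates (my illustration; the thesis would use an optimized library implementation and real AIS positions):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points: returns one cluster label per
    point, with -1 meaning noise. Neighbourhoods use Euclidean
    distance and include the point itself."""
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # not dense enough: noise (for now)
            continue
        cluster += 1                  # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reachable from a core point
                                      # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also a core point: expand
    return labels

# Two dense groups of ship positions and one lone outlier:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The hyperparameter search the abstract describes would then score the labelling produced by each candidate `(eps, min_pts)` pair with a cluster validity index and keep the best-scoring pair.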