
Browsing by Subject "object detection"


  • Laitala, Julius (2021)
    Arranging products in stores according to planograms, optimized product arrangement maps, is important for keeping up with the highly competitive modern retail market. The planograms are realized into product arrangements by humans, a process which is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting – retail product detection – to automate planogram compliance evaluation. We introduce the relevant problems, the state-of-the-art approaches for solving them and the background information necessary for understanding them. We then propose a computer vision-based planogram compliance evaluation pipeline based on the current state of the art. We build our proposed models and algorithms using PyTorch, and run tests against public datasets and an internal dataset collected from a large Nordic retailer. We find that while the retail product detection performance of our proposed approach is quite good, the planogram compliance evaluation performance of our whole pipeline leaves a lot of room for improvement. Still, our approach seems promising, and we propose multiple ways of improving the performance enough to enable possible real-world utility. The code used for our experiments and the weights for our models are available at
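The core comparison in planogram compliance evaluation can be sketched in a few lines. This is a hypothetical illustration, not the thesis's published pipeline: the grid model, the function name, and the cell-wise matching rule are all assumptions. A planogram is taken as a grid of expected product IDs, and the detected arrangement is the grid of product IDs the detector recognised.

```python
# Hypothetical sketch of planogram compliance scoring (not the thesis code).
# A planogram is a grid of expected product IDs; `detected` holds the
# product IDs recognised at the corresponding grid cells.

def compliance_score(planogram, detected):
    """Fraction of planogram cells whose detected product matches the plan."""
    cells = [(r, c) for r, row in enumerate(planogram) for c in range(len(row))]
    matches = sum(
        1 for r, c in cells
        if r < len(detected) and c < len(detected[r])
        and detected[r][c] == planogram[r][c]
    )
    return matches / len(cells) if cells else 1.0

plan = [["cola", "cola", "water"],
        ["chips", "chips", "nuts"]]
seen = [["cola", "water", "water"],   # one misplaced product
        ["chips", "chips", "nuts"]]
print(compliance_score(plan, seen))   # 5 of 6 cells match the plan
```

In practice the hard part, which the thesis addresses with retail product detection, is producing the `detected` grid from shelf images in the first place.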
  • Soukainen, Arttu (2023)
    Insect pests substantially impact global agriculture, and pest control is essential for global food production. However, some pest control measures, such as intensive insecticide use, can have adverse ecological and economic effects. Consequently, there is a growing need for advanced pest management tools that can be integrated into intelligent farming strategies and precision agriculture. This study explores the potential of a machine learning tool to automatically identify and quantify fruit fly pests from images in the context of Ghanaian mango orchards in West Africa. Fruit flies pose a particular challenge for computer vision-based deep learning due to their small size and taxonomic diversity. Insects were captured using sticky traps together with attractant pheromones. The traps were then photographed in the field using regular smartphone cameras. The image data contained 1434 examples of the targeted pests, and it was used to train a convolutional neural network (CNN) model for counting and classifying the fruit flies into two genera: Bactrocera and Ceratitis. High-resolution images were used to train the YOLOv7 object detection algorithm. The training involved manual hyperparameter optimization, emphasizing pre-selected hyperparameters. The focus was on employing appropriate evaluation metrics during model training. The final model had a mean average precision (mAP) of 0.746 and was able to identify 82% of the Ceratitis and 70% of the Bactrocera examples in the validation data. The results demonstrate the advantages of a computer vision-based solution for automated multi-class insect identification and counting. Low-effort data collection using smartphones is sufficient to train a modern CNN model efficiently, even with a limited number of field images. Further research is needed to effectively integrate this technology into decision-making systems for precision agriculture in tropical Africa. Nevertheless, this work serves as a proof of concept, showcasing the serious potential of computer vision-based models in automated or semi-automated pest monitoring. Such models can enable new strategies for monitoring pest populations and targeting pest control methods. The same technology has potential not only in agriculture but in insect monitoring in general.
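The per-genus figures quoted above (82% of Ceratitis, 70% of Bactrocera) are per-class recall: the share of ground-truth instances the model managed to find. A minimal sketch of that metric, with invented counts purely for illustration (the function is not the thesis's evaluation code):

```python
# Per-class recall: matched detections / ground-truth instances per class.
# The counts below are invented for illustration only.
from collections import Counter

def per_class_recall(ground_truth, matched):
    """`matched` holds the subset of ground-truth instances the model found."""
    total = Counter(ground_truth)
    found = Counter(matched)
    return {cls: found[cls] / total[cls] for cls in total}

gt   = ["Ceratitis"] * 50 + ["Bactrocera"] * 50
hits = ["Ceratitis"] * 41 + ["Bactrocera"] * 35   # detections matched to GT
print(per_class_recall(gt, hits))
```

Mean average precision (mAP), the headline 0.746, additionally averages precision over confidence thresholds and classes, which standard tools such as the COCO evaluator compute from the full detection lists.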
  • Saukkoriipi, Mikko (2022)
    Two factors define the success of a deep neural network (DNN) based application: the training data and the model. Nowadays, many state-of-the-art DNN models are available free of charge, and training and deploying these models is easier than ever before. As a result, anyone can set up a state-of-the-art DNN algorithm within days or even hours. In the past, most of the focus was given to the model as researchers built faster and more accurate deep learning architectures. These research groups commonly use large and high-quality datasets in their work, which is not the case when one wants to train a new model for a specific use case. Training a DNN algorithm for a specific task requires collecting a vast amount of unlabelled data and then labeling the training data. To train a high-performance model, the labeled training dataset must be large and diverse enough to cover all relevant scenarios of the intended use case. This thesis presents an efficient and straightforward active learning method for sampling the most informative images to train a powerful anchor-free, Intersection over Union (IoU) predicting object detector. Our method uses only classification confidences and IoU predictions to estimate image informativeness. By collecting the most informative images, we can cover the whole diversity of the images with fewer human-annotated training images. This saves time and resources, as we avoid labeling images that would not be beneficial.
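The acquisition idea described above can be sketched compactly: score each unlabeled image by detector uncertainty, using only per-detection classification confidence and predicted IoU, then send the top-scoring images to annotators first. This is a minimal illustration under assumed names and an assumed scoring rule, not the thesis's actual criterion:

```python
# Minimal active-learning acquisition sketch (assumed scoring rule, not the
# thesis's method). Each detection is a (class_confidence, predicted_iou) pair.

def informativeness(detections):
    """Higher score = less confident detector = more informative image."""
    if not detections:
        return 1.0  # nothing detected: treat the image as maximally uncertain
    # Use the weakest detection; quality = confidence * predicted IoU.
    return 1.0 - min(conf * iou for conf, iou in detections)

images = {
    "img_a": [(0.95, 0.90), (0.88, 0.85)],   # confident, well-localized boxes
    "img_b": [(0.40, 0.55)],                 # one uncertain detection
    "img_c": [],                             # no detections at all
}
ranked = sorted(images, key=lambda k: informativeness(images[k]), reverse=True)
print(ranked)  # most informative first: ['img_c', 'img_b', 'img_a']
```

Because the score needs nothing beyond the detector's own outputs, it adds essentially no cost on top of inference, which is the appeal of confidence-based acquisition.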
  • Pelvo, Nasti (2024)
    Object detection and multi-object tracking are crucial components of computer vision systems aiming for comprehensive scene understanding and reliable autonomous decision making. While methods developed for visual input data are widely studied, they are susceptible to environmental factors such as poor lighting and weather conditions. Thermal imaging, on the other hand, is robust against most adverse environmental conditions and thus presents an intriguing alternative to visual photography. Due to the characteristics of thermal images, current state-of-the-art object detection and tracking methods perform poorly when presented with thermal input. Open source thermal data for training large neural network models is not widely available: existing datasets are small and homogeneous, and the resulting models lack the generalizability required for their application to real-world input data. The effect is especially relevant for transformer-based methods, which lack visual inductive bias and thus require large-scale training. This thesis presents the first in-depth literature review and experimental study of transformer-based object detection and tracking on challenging thermal and aerial data. Through an analysis of existing transformer-based multi-object tracking methods, we argue for the joint detection and tracking paradigm, where multi-object tracking is treated as an end-to-end problem. Our experiments on two transformer-based multi-object tracking models confirm that fully exploiting multi-frame input can increase the stability of object detection and enforce robustness against the domain issues prevalent in thermal images. Due to the high training data requirement of transformers, the methods are, however, held back by the lack of open source training data. We thus introduce two novel data augmentation techniques which aim to supplement and diversify existing training data, and thus improve the transferability of detection and tracking methods between the visual and thermal domains.
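The thesis's two augmentation techniques are not reproduced here. As a purely illustrative stand-in, one common way to push abundant visual (RGB) data toward a thermal-like appearance is to collapse colour to a single intensity channel and randomly invert polarity, since thermal cameras render hot objects as either bright ("white-hot") or dark ("black-hot"). Everything in this sketch, including the function name, is an assumption:

```python
# Illustration only: a generic RGB-to-thermal-style augmentation, NOT the
# two techniques introduced in the thesis. Images are nested lists of
# (r, g, b) tuples; output is single-channel intensities in [0, 255].
import random

def thermalize(rgb_image, invert_prob=0.5, rng=random):
    """Collapse colour to luma and optionally invert hot/cold polarity."""
    invert = rng.random() < invert_prob
    out = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            gray = int(0.299 * r + 0.587 * g + 0.114 * b)  # ITU-R BT.601 luma
            gray_row.append(255 - gray if invert else gray)
        out.append(gray_row)
    return out

img = [[(255, 0, 0), (0, 0, 0)]]
print(thermalize(img, invert_prob=0.0))  # [[76, 0]]
```

Augmentations of this kind supplement scarce thermal data with transformed visual data, which is the transferability goal the abstract describes.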
  • Wang, Ruilin (2023)
    This thesis is an integral part of an interdisciplinary research endeavor that provides computer science-driven approaches and deep learning methodologies that integrate seamlessly into the broader research conducted by scholars in digital humanities and history. Utilizing deep learning techniques, this research investigates the printers and publishers of 18th-century books, with a specific focus on the prominent family-based printing dynasty called Tonson. By identifying common visual elements, the thesis facilitates a comparative analysis of associations between different printers, providing valuable insights into the historical context and artistic characteristics of the Tonson dynasty. The thesis begins by discussing various deep learning models trained on an expert-annotated dataset, enabling the extraction of five main categories and sixteen subcategories of visual elements. Notably, the Mask R-CNN model demonstrates superior performance, particularly in detecting headpieces and initials. The study then delves into the grouping of initials and headpieces within the dataset. The SimCLR model is employed using data augmentation techniques that simulate the inherent noise present in the dataset. This enables the generation of distinct embeddings for each initial and headpiece. Various unsupervised learning methods are applied, with hierarchical clustering emerging as the most effective technique. Higher similarity scores for headpieces compared to initials indicate greater ease in identifying similar headpieces. We further discuss potential applications from a historical perspective, including book pricing, future avenues for accurately identifying related printers, and temporal research concerning the Tonson dynasty. In conclusion, this thesis presents a novel integration of computer science and deep learning methodologies within the field of digital humanities and historical studies. By focusing on the Tonson dynasty, it provides a comprehensive analysis of printers and publishers in 18th-century books, ultimately contributing to a deeper understanding of this historical period.
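The grouping step described above, clustering ornament embeddings so that visually similar initials and headpieces fall together, can be sketched with a compact agglomerative (hierarchical) clustering loop. The thesis uses SimCLR embeddings; the tiny 2-D vectors and the single-linkage merge rule here are assumed simplifications for illustration:

```python
# Pure-Python sketch of agglomerative (hierarchical) clustering: repeatedly
# merge the closest pair of clusters until the gap exceeds a threshold.
# Real pipelines would use library implementations on SimCLR embeddings.
import math

def hierarchical_cluster(vectors, threshold):
    clusters = [[i] for i in range(len(vectors))]

    def dist(a, b):  # single linkage: closest pair across the two clusters
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), dist(clusters[i], clusters[j]))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)
    return clusters

# Two "ornament" embeddings near the origin, two far away:
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(hierarchical_cluster(embeddings, threshold=1.0))  # [[0, 1], [2, 3]]
```

The threshold plays the role of the similarity cut-off: a looser threshold merges more ornaments into the same group, which matches the observation that headpieces (with higher similarity scores) group more readily than initials.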