Browsing by Subject "computer vision"


  • Sarapisto, Teemu (2022)
    In this thesis we investigate the feasibility of machine learning methods for estimating the type and the weight of individual food items from images taken of customers’ plates at a buffet-style restaurant. The images were collected in collaboration with the University of Turku and Flavoria, a public lunch-line restaurant, where a camera was mounted above the cashier to automatically take a photo of the foods chosen by the customer when they went to pay. For each image, an existing system of scales at the restaurant provided the weights of each individual food item. We describe suitable model architectures and training setups for the weight estimation and food identification tasks and explain the models’ theoretical background. Furthermore, we propose and compare two methods for utilizing a restaurant’s daily menu information to improve model performance in both tasks. We show that the models perform well in comparison to baseline methods and reach accuracy on par with other similar work. Additionally, as the images were captured automatically, in some of the images the food was occluded or blurry, or the image contained sensitive customer information. To address this, we present computer vision techniques for preprocessing and filtering the images. We publish the dataset containing the preprocessed images along with the corresponding individual food weights for use in future research. The main results of the project have been published as a peer-reviewed article at the International Conference on Pattern Recognition Systems 2022. The article received the conference’s best paper award.
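    A minimal sketch, assuming a PyTorch implementation, of one way the daily menu information could be exploited: a shared convolutional backbone with a classification head for the food type and a regression head for the weight, where today's menu masks out the logits of foods not being served. The class names and dimensions below are illustrative assumptions, not the architecture actually used in the thesis.

      import torch
      import torch.nn as nn
      from torchvision.models import resnet50

      class FoodNet(nn.Module):
          """Shared backbone; one head for food type, one for weight (grams)."""
          def __init__(self, num_classes: int):
              super().__init__()
              backbone = resnet50(weights=None)  # pretrained weights could be loaded here
              self.features = nn.Sequential(*list(backbone.children())[:-1])
              self.classifier = nn.Linear(2048, num_classes)  # food-type logits
              self.regressor = nn.Linear(2048, 1)             # weight estimate

          def forward(self, images, menu_mask=None):
              x = self.features(images).flatten(1)
              logits = self.classifier(x)
              if menu_mask is not None:
                  # Suppress foods that are not on today's menu.
                  logits = logits.masked_fill(~menu_mask, float("-inf"))
              weight = self.regressor(x).squeeze(1)
              return logits, weight

      # Usage: one boolean per class, True if the food is on today's menu.
      model = FoodNet(num_classes=120)
      menu_mask = torch.zeros(120, dtype=torch.bool)
      menu_mask[[3, 17, 42]] = True
      logits, weights = model(torch.randn(4, 3, 224, 224), menu_mask)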
  • Laitala, Julius (2021)
    Arranging products in stores according to planograms, optimized product arrangement maps, is important for keeping up with the highly competitive modern retail market. The planograms are realized into product arrangements by humans, a process which is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting, retail product detection, to automate planogram compliance evaluation. We introduce the relevant problems, the state-of-the-art approaches for solving them, and the background information necessary for understanding them. We then propose a computer vision based planogram compliance evaluation pipeline built on the current state of the art. We build our proposed models and algorithms using PyTorch, and run tests against public datasets and an internal dataset collected from a large Nordic retailer. We find that while the retail product detection performance of our proposed approach is quite good, the planogram compliance evaluation performance of the whole pipeline leaves a lot of room for improvement. Still, our approach seems promising, and we propose multiple ways to improve the performance enough to enable possible real-world utility. The code used for our experiments and the weights for our models are available at https://github.com/laitalaj/cvpce
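    The abstract leaves the compliance-evaluation step at a high level; the following is a minimal sketch, assuming PyTorch/torchvision, of one simple way detections could be compared against a planogram by matching boxes and labels with intersection over union. The function name, threshold, and scoring rule are hypothetical illustrations, not the pipeline from the thesis.

      import torch
      from torchvision.ops import box_iou

      def compliance_score(planogram_boxes, planogram_labels,
                           detected_boxes, detected_labels, iou_threshold=0.5):
          """Fraction of planogram slots covered by a detection of the right product."""
          if len(detected_boxes) == 0:
              return 0.0
          ious = box_iou(planogram_boxes, detected_boxes)   # [slots, detections]
          best_iou, best_det = ious.max(dim=1)
          label_match = planogram_labels == detected_labels[best_det]
          compliant = (best_iou >= iou_threshold) & label_match
          return compliant.float().mean().item()

      # Usage with toy boxes in (x1, y1, x2, y2) format: one of two slots is filled correctly.
      plan_boxes = torch.tensor([[0., 0., 100., 200.], [100., 0., 200., 200.]])
      plan_labels = torch.tensor([7, 7])
      det_boxes = torch.tensor([[2., 3., 98., 197.]])
      det_labels = torch.tensor([7])
      print(compliance_score(plan_boxes, plan_labels, det_boxes, det_labels))  # 0.5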
  • Vesalainen, Ari (2022)
    Digitization has changed historical research. Materials are available online, and digital archives make it easier and faster to find the right information. The remaining challenge is how to use modern digital methods to analyze the text of historical documents in more detail. This is an active research topic in digital humanities and computer science. Document layout analysis is an area where computer vision object detection methods can be applied to historical documents to identify the objects present on document pages (i.e., page elements). Recent developments in deep learning based computer vision provide excellent tools for this purpose. However, most reviewed systems focus on coarse-grained methods, where only the high-level page elements are detected (e.g., text, figures, tables). Fine-grained detection methods are required to analyze texts at a more detailed level; for example, footnotes and marginalia must be distinguished from the body text to enable proper analysis. This thesis studies how image segmentation techniques can be used for fine-grained OCR document layout analysis: how can fine-grained page segmentation and region classification systems be implemented in practice, and what are the accuracy and the main challenges of such a system? The thesis includes implementing a layout analysis model that uses an instance segmentation method (Mask R-CNN). This implementation is compared against an existing layout analysis system that uses a semantic segmentation method (the U-Net based P2PaLA implementation).
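    A minimal sketch, using the standard torchvision Mask R-CNN fine-tuning recipe, of how an instance segmentation model can be pointed at fine-grained page elements. The class list below is an illustrative assumption rather than the exact taxonomy or configuration used in the thesis.

      import torchvision
      from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
      from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

      # Hypothetical fine-grained page-element classes (index 0 is background).
      PAGE_ELEMENTS = ["background", "body_text", "footnote", "marginalia", "figure", "table"]

      def build_layout_model(num_classes=len(PAGE_ELEMENTS)):
          model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None)
          # Replace the box classification head with one for our page-element classes.
          in_features = model.roi_heads.box_predictor.cls_score.in_features
          model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
          # Replace the mask prediction head accordingly.
          in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
          model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
          return model

      model = build_layout_model()
      model.eval()  # ready for fine-tuning or inference on page images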
  • Saukkoriipi, Mikko (2022)
    Two factors define the success of a deep neural network (DNN) based application: the training data and the model. Nowadays, many state-of-the-art DNN models are available free of charge, and training and deploying these models is easier than ever before. As a result, anyone can set up a state-of-the-art DNN algorithm within days or even hours. In the past, most of the focus was given to the model, as researchers built faster and more accurate deep learning architectures. These research groups commonly use large, high-quality datasets in their work, which is not the case when one wants to train a new model for a specific use case. Training a DNN algorithm for a specific task requires collecting a vast amount of unlabeled data and then labeling the training data. To train a high-performance model, the labeled training dataset must be large and diverse enough to cover all relevant scenarios of the intended use case. This thesis presents an efficient and straightforward active learning method for sampling the most informative images to train a powerful anchor-free, Intersection over Union (IoU) predicting object detector. Our method uses only classification confidences and IoU predictions to estimate image informativeness. By collecting the most informative images, we can cover the whole diversity of the images with fewer human-annotated training images. This saves time and resources, as we avoid labeling images that would not be beneficial.
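    A minimal sketch of the sampling idea described above: score each unlabeled image from its per-detection classification confidences and predicted IoUs, then send the least certain images for annotation. The aggregation (a simple mean) and the function names are assumptions for illustration; the abstract does not specify the exact scoring rule.

      import torch

      def image_informativeness(class_confidences: torch.Tensor,
                                predicted_ious: torch.Tensor) -> float:
          """Higher value = the model is less certain about this image."""
          if class_confidences.numel() == 0:
              return 1.0  # no detections at all: treat as maximally informative
          # Certainty of a detection combines how sure the classifier is and
          # how well-localized the box is predicted to be (predicted IoU).
          certainty = class_confidences * predicted_ious
          return float(1.0 - certainty.mean())

      def select_for_labeling(scores, budget):
          """Indices of the `budget` most informative (least certain) images."""
          order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
          return order[:budget]

      # Usage: one score per unlabeled image, then pick the top of the pool for annotation.
      pool_scores = [image_informativeness(torch.rand(5), torch.rand(5)) for _ in range(100)]
      to_label = select_for_labeling(pool_scores, budget=10)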
  • Leinonen, Matti (2021)
    3D object detection and tracking are computer vision methods used in many applications. It is necessary for autonomous vehicles and robots to be able to reliably extract 3D localization information about objects in their environment to operate safely. Currently, most 3D object detection and tracking algorithms use high-quality LiDAR sensors, which are very expensive. This is why methods that use cheap monocular camera images as input are an active field of computer vision research. Most current research into monocular 3D object detection and tracking is focused on autonomous driving. This thesis investigates how well current monocular methods are suited for use in industrial settings, where the environment and especially the camera perspective can be very different from what they are in an automobile. The thesis introduces some of the most used 3D object detection and tracking methods and techniques and tests one detection method on a dataset where the environment and the point of view differ from what they would be in autonomous driving. It also analyzes the technical requirements for a detection and tracking system that could be used for autonomous robots in an industrial setting and what future research would be necessary to develop such a system.
  • Kutvonen, Konsta (2020)
    With modern computer vision algorithms, it is possible to solve many different kinds of problems, such as object detection, image classification, and image segmentation. In some cases, like that of a camera-based self-driving car, the task can't yet be adequately solved as a direct mapping from image to action with a single model. In such situations, we need more complex systems that solve multiple computer vision tasks to understand the environment and act based on it with acceptable results. Training each task on its own can be expensive in terms of the storage required for all the weights and especially in terms of inference time, as the outputs of several large models are needed. Fortunately, many state-of-the-art solutions to these problems use Convolutional Neural Networks and often feature an ImageNet backbone in their architecture. With multi-task learning, we can combine some of the tasks into a single model, sharing the convolutional weights in the network. Sharing the weights allows for training smaller models that produce outputs faster and require less computational resources, which is essential especially when the models are run on embedded devices with constrained computation capability and no ability to rely on the cloud. In this thesis, we will present some state-of-the-art models for solving image classification and object detection problems. We will define multi-task learning, describe how multi-task models can be trained, and look at various multi-task models and how they exhibit the benefits of multi-task learning. Finally, to evaluate how training multi-task models changes the basic training paradigm and to find what issues arise, we will train multiple multi-task models. The models will mainly focus on image classification and object detection using various datasets. They will combine multiple tasks into a single model, and we will observe the impact of training the tasks in a multi-task setting.
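    A minimal sketch, assuming PyTorch, of the weight-sharing idea: one ImageNet-style convolutional backbone feeds both an image-classification head and a deliberately simplified detection head. The head designs and sizes are illustrative assumptions only; real detection heads are considerably more involved than a single box per image.

      import torch
      import torch.nn as nn
      from torchvision.models import resnet18

      class MultiTaskNet(nn.Module):
          def __init__(self, num_image_classes, num_det_classes):
              super().__init__()
              backbone = resnet18(weights=None)
              # Shared convolutional trunk (everything up to the final pooling layer).
              self.backbone = nn.Sequential(*list(backbone.children())[:-2])
              self.pool = nn.AdaptiveAvgPool2d(1)
              self.cls_head = nn.Linear(512, num_image_classes)
              # Toy detection head: per-image class logits plus one box (x, y, w, h).
              self.det_cls_head = nn.Linear(512, num_det_classes)
              self.det_box_head = nn.Linear(512, 4)

          def forward(self, images):
              features = self.backbone(images)       # shared weights serve both tasks
              pooled = self.pool(features).flatten(1)
              return {
                  "classification": self.cls_head(pooled),
                  "detection_logits": self.det_cls_head(pooled),
                  "detection_boxes": self.det_box_head(pooled),
              }

      model = MultiTaskNet(num_image_classes=10, num_det_classes=5)
      outputs = model(torch.randn(2, 3, 224, 224))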
  • Laakso, Joosua (2023)
    Semantic segmentation is the computer vision problem of partitioning an image, with pixel-level precision, based on what type of object each part represents. Producing labeled datasets to train deep learning models for semantic segmentation can be laborious due to the demand for pixel-level precision. On the other hand, a deep learning model trained on one dataset might perform worse when applied to another dataset, depending on how different those datasets are. Unsupervised domain adaptation attempts to narrow this performance gap by adapting the model to the other dataset, even if ground-truth labels for that dataset are not available. In this work, we review some of the pre-existing methods for unsupervised domain adaptation in semantic segmentation. We then present our own efforts to develop novel methods for the problem. These include a new type of loss function for unsupervised output shaping, unsupervised training of the model backbone based on feature statistics, and a method for unsupervised adaptation of the model backbone using an auxiliary network that attempts to mimic the gradients of supervised training. We present empirical results on the performance of these methods. We additionally present our findings on the effects of changes in the statistics of the batch normalization layers on domain adaptation performance.
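    A minimal sketch, assuming PyTorch, of one generic way batch normalization statistics can enter into adaptation: re-estimating the BatchNorm running mean and variance on unlabeled target-domain images before evaluation. This is a common technique shown for illustration; it is not necessarily the exact procedure studied in the thesis.

      import torch
      import torch.nn as nn

      @torch.no_grad()
      def adapt_batchnorm_stats(model: nn.Module, target_loader, device="cpu"):
          """Recompute BatchNorm running mean/var using target-domain batches only."""
          model.to(device).train()  # train mode so BN layers update running statistics
          for module in model.modules():
              if isinstance(module, nn.BatchNorm2d):
                  module.reset_running_stats()
                  module.momentum = None  # None = cumulative moving average over all batches
          for images, _ in target_loader:  # any labels in the loader are ignored
              model(images.to(device))     # forward pass only; no loss, no backprop
          model.eval()
          return model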
  • Goriachev, Vladimir (2018)
    In remote inspection and maintenance operations, the quality and amount of information available to the operator on demand play a significant role. In knowledge-intensive tasks performed remotely or in a hazardous environment, augmented and virtual reality technologies are often seen as a solution capable of providing the required level of information support. The application of these technologies has faced many obstacles over the years, mostly due to the insufficient maturity of their technical implementations. This thesis describes research work on the use of augmented and virtual reality in remote inspection and maintenance operations, and aims to solve some of the most common problems associated with applying these technologies. During the project, a calibration method for optical see-through augmented reality glasses was developed, as well as a virtual reality application for robotic teleoperation. The implemented teleoperation system was tested in two different simulated scenarios, and the additional questions of immersive environment reconstruction, the spatial user interface, and the connection between the virtual and real worlds are also addressed in this report.