
Browsing by Subject "Deep learning"


  • Valjakka, Jorma (2024)
    Audio signals offer invaluable insights into system operational conditions and potential malfunctions. Proactive fault detection in machinery and other infrastructures through audio monitoring provides significant advantages in numerous sectors, such as industrial maintenance, healthcare, and urban security. Localizing anomalies within the spectral content of audio data makes it possible not only to diagnose but also to effectively address the underlying issues. This thesis addresses the challenge of comprehensively capturing the full context of anomalies detected within audio data. To achieve this, we have developed a novel unsupervised method that adapts visual anomaly localization techniques specifically for the analysis of audio data. This approach utilizes visual representations of audio signals, particularly spectrograms, to apply the Student-Teacher Feature Pyramid Matching (STFPM) method within an unsupervised learning framework. By harnessing the inherent visual patterns in audio data, our method enables precise localization of anomalies. By augmenting the MIMII dataset with synthetic anomalies and conducting extensive testing, we validated our approach’s ability to localize anomalies in audio data. The findings confirm that our model not only detects but also precisely pinpoints the location of these artificially introduced anomalies within audio spectrograms in terms of both time and frequency. This demonstrates the precision and reliability of our approach, highlighting its potential as a promising solution for accurately localizing anomalies in various audio applications.
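The scoring rule at the heart of STFPM can be sketched briefly: a student network is trained to reproduce a fixed teacher's feature pyramid on normal data, and at test time the per-location distance between their L2-normalised features yields an anomaly map over the spectrogram, with maps from different pyramid levels upsampled and combined by element-wise product. A minimal NumPy illustration of that scoring step (the array shapes, the nearest-neighbour upsampling, and integer scale factors are simplifying assumptions, not the thesis implementation):

```python
import numpy as np

def anomaly_map(teacher_feats, student_feats, out_shape):
    """Combine per-level student-teacher feature distances into one anomaly map.

    teacher_feats / student_feats: lists of (C, H, W) arrays, one per pyramid
    level. out_shape: (height, width) of the spectrogram-resolution output.
    Assumes out_shape is an integer multiple of each level's spatial shape.
    """
    amap = np.ones(out_shape)
    for t, s in zip(teacher_feats, student_feats):
        # L2-normalise features along the channel axis
        t_n = t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-8)
        s_n = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
        # per-location distance; equals 1 - cosine similarity for unit vectors
        level = 0.5 * np.sum((t_n - s_n) ** 2, axis=0)
        # nearest-neighbour upsample to the output resolution
        zy = out_shape[0] // level.shape[0]
        zx = out_shape[1] // level.shape[1]
        level = np.kron(level, np.ones((zy, zx)))
        amap *= level  # STFPM combines levels by element-wise product
    return amap
```

Where the student matches the teacher the map is near zero; time-frequency regions where the features disagree light up, which is what allows the anomaly to be localized in both time and frequency.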
  • Kotola, Mikko Markus (2021)
    Image captioning is the task of generating a natural language description of an image. The task requires techniques from two research areas, computer vision and natural language generation. This thesis investigates the architectures of leading image captioning systems. The research question is: What components and architectures are used in state-of-the-art image captioning systems, and how could image captioning systems be further improved by utilizing improved components and architectures? Five openly reported leading image captioning systems are investigated in detail: Attention on Attention, the Meshed-Memory Transformer, the X-Linear Attention Network, the Show, Edit and Tell method, and Prophet Attention. The investigated leading image captioners all rely on the same object detector, the Faster R-CNN based Bottom-Up object detection network. Four out of five also rely on the same backbone convolutional neural network, ResNet-101. Both the backbone and the object detector could be improved by using newer approaches. The best choice among CNN-based object detectors is currently EfficientDet with an EfficientNet backbone. A completely transformer-based approach with a Vision Transformer backbone and a Detection Transformer object detector is a fast-developing alternative. The main area of variation between the leading image captioners is in the types of attention blocks used in the high-level image encoder, the type of natural language decoder, and the connections between these components. The best architectures and attention approaches to implement these components are currently the Meshed-Memory Transformer and the bilinear pooling approach of the X-Linear Attention Network. Implementing the Prophet Attention approach of using the future words available in the supervised training phase to guide the decoder attention further improves performance.
Pretraining the backbone using large image datasets is essential to reach semantically correct object detections and object features. The feature richness and dense annotation of the data are equally important in training the object detector.
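As a concrete example of the attention blocks these systems build on, the Attention on Attention (AoA) module extends ordinary scaled dot-product attention with an "information vector" and a sigmoid gate, both computed from the query and the attended result, so the model can suppress attention outputs that are not actually relevant to the query. A single-query NumPy sketch (the weight shapes and variable names are illustrative assumptions, not any system's exact implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_on_attention(q, K, V, Wi, bi, Wg, bg):
    """AoA for a single query q over n region features.

    q: (d,) query; K, V: (n, d) keys and values from the image regions;
    Wi, Wg: (d, 2d) projection matrices; bi, bg: (d,) biases.
    """
    # ordinary scaled dot-product attention over the region features
    scores = K @ q / np.sqrt(q.shape[0])
    v_hat = softmax(scores) @ V                    # attended value, shape (d,)
    qv = np.concatenate([q, v_hat])                # condition on query + result
    i = Wi @ qv + bi                               # information vector
    g = 1.0 / (1.0 + np.exp(-(Wg @ qv + bg)))      # attention gate in (0, 1)
    return g * i                                   # gated attention output
```

When the gate saturates near zero, the module discards the attention result entirely, which is the mechanism AoA adds on top of a plain attention block.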
  • Viljamaa, Venla (2022)
    In bioinformatics, new genomes are sequenced at an increasing rate. To utilize this data in various bioinformatics problems, it must be annotated first. Genome annotation is a computational problem that has traditionally been approached by using statistical methods such as the hidden Markov model (HMM). However, implementing these methods is often time-consuming and requires domain knowledge. Neural network-based approaches have also been developed for the task, but they typically require a large amount of pre-labeled data. Genomes and natural language share many properties, not least the fact that they both consist of letters. Genomes also have their own grammar, semantics, and context-based meanings, just like phrases in natural language. These similarities motivate the use of natural language processing (NLP) techniques in genome annotation. In recent years, pre-trained Transformer neural networks have been widely used in NLP. This thesis shows that due to the linguistic properties of genomic data, the Transformer network architecture is also suitable for gene prediction. The model used in the experiments, DNABERT, is pre-trained using the full human genome. Using task-specific labeled data sets, the model is then trained to classify DNA sequences into genes and non-genes. The main fine-tuning dataset is the genome of the Escherichia coli bacterium, but preliminary experiments are also performed on human chromosome data. The fine-tuned models are evaluated for accuracy, F1-score and Matthews correlation coefficient (MCC). A customized evaluation method is developed, in which the predictions are compared to ground-truth labels at the nucleotide level. Based on that, the best models achieve a 90.15% accuracy and an MCC value of 0.4683 using the Escherichia coli dataset. The model correctly classifies even the minority label, and the execution times are measured in minutes rather than hours.
These results suggest that the NLP-based Transformer network is a powerful tool for learning the characteristics of gene and non-gene sequences.
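Two pieces of the pipeline described above are simple enough to illustrate directly: DNABERT tokenises DNA into overlapping k-mers (k = 6 in the pre-trained model), and the Matthews correlation coefficient can be computed from per-nucleotide confusion counts once window-level predictions have been mapped back to nucleotides. A small sketch (the function names are illustrative; the thesis pipeline is more involved):

```python
def kmers(seq, k=6):
    """DNABERT-style overlapping k-mer tokenisation of a DNA string."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from binary per-nucleotide labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike plain accuracy, MCC stays informative under the class imbalance typical of gene/non-gene data, which is why it is reported alongside accuracy and F1-score.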