Browsing by Subject "principal component analysis"

  • Holma, Paula (2011)
    Metabolomics is a rapidly growing research field that studies the response of biological systems to environmental factors, disease states and genetic modifications. It aims at measuring the complete set of endogenous metabolites, i.e. the metabolome, in a biological sample such as plasma or cells. Because metabolites are the intermediates and end products of biochemical reactions, metabolite compositions and metabolite levels in biological samples can provide a wealth of information on on-going processes in a living system. Due to the complexity of the metabolome, metabolomic analysis poses a challenge to analytical chemistry. Adequate sample preparation is critical to accurate and reproducible analysis, and the analytical techniques must have high resolution and sensitivity to allow detection of as many metabolites as possible. Furthermore, as the information contained in the metabolome is immense, the data set collected from metabolomic studies is very large. In order to extract the relevant information from such large data sets, efficient data processing and multivariate data analysis methods are needed. In the research presented in this thesis, metabolomics was used to study mechanisms of polymeric gene delivery to retinal pigment epithelial (RPE) cells. The aim of the study was to detect differences in metabolomic fingerprints between transfected cells and non-transfected controls, and thereafter to identify metabolites responsible for the discrimination. The plasmid pCMV-β was introduced into RPE cells using the vector polyethyleneimine (PEI). The samples were analyzed using high performance liquid chromatography (HPLC) and ultra performance liquid chromatography (UPLC) coupled to a triple quadrupole (QqQ) mass spectrometer (MS). The software MZmine was used for raw data processing and principal component analysis (PCA) was used in statistical data analysis. 
The results revealed differences in metabolomic fingerprints between transfected cells and non-transfected controls. However, reliable fingerprinting data could not be obtained because of low analysis repeatability. Therefore, no attempts were made to identify metabolites responsible for discrimination between sample groups. Repeatability and accuracy of analyses can be influenced by protocol optimization. However, in this study, optimization of analytical methods was hindered by the very small number of samples available for analysis. In conclusion, this study demonstrates that obtaining reliable fingerprinting data is technically demanding, and the protocols need to be thoroughly optimized in order to approach the goals of gaining information on mechanisms of gene delivery.
  • Nykänen, Venla (2022)
    Herbs are valued for culinary and health purposes, and their metabolism and chemical composition can be influenced by LED lighting. This Master’s Thesis aimed to study how different spectra (green, blue, and white light) affect the sensory properties of hydroponically grown dill (Anethum graveolens L.) and coriander (Coriandrum sativum). The hypothesis was that green light produces more soapy and musty flavours in coriander, whereas blue light produces more citrus and typical coriander-like flavours. For dill, the hypothesis was that blue and green light treatments produce stronger flavours than white light. A generic descriptive analysis method was chosen, and trained panels created sensory profiles of coriander and dill samples from three light treatments plus one commercial sample of each herb. Intensities of smell, taste and flavour attributes were evaluated on a line scale (0 = not at all to 10 = extremely) in three replicates. The study was conducted during the COVID-19 pandemic under sensory laboratory conditions (ISO 8589). One-way ANOVA showed that the light treatments had only a slight impact on the sensory profiles of coriander and dill. In coriander, blue light produced a significantly lower lemon odour intensity than the green light treatment. In dill, total odour intensity was significantly lower in the blue light sample than in the white light and commercial samples. Otherwise, one-way ANOVA did not show significant differences between samples. However, principal component analysis (PCA) implied that the samples differed. Two-way ANOVA showed that neither panel performed uniformly, and deviation among intensity scores was observed. The herb samples proved rather difficult to evaluate, and more extensive training could have improved the panels’ performance. In the future, a consumer study could be performed to examine whether the light spectrum affects the hedonic response to these herbs.
  • Kyrö, Minna (2011)
    FTIR spectroscopy (Fourier transform infrared spectroscopy) is a fast method of analysis. The use of interferometers in Fourier devices enables the whole infrared frequency region to be scanned in a couple of seconds. Elaborate sample preparation is not needed when the FTIR spectrometer is equipped with an ATR accessory, and the method is therefore easy to use. The ATR accessory facilitates the analysis of various sample types: it is possible to measure infrared spectra from samples that are not suitable for traditional sample preparation methods. Data from FTIR spectroscopy are frequently combined with statistical multivariate analysis techniques. In cluster analysis, spectral data can be grouped based on similarity. In hierarchical cluster analysis, the similarity between objects is determined by calculating the distance between them. Principal component analysis reduces the dimensionality of the data and establishes new, uncorrelated principal components, which should preserve most of the variation in the original data. The possible applications of FTIR spectroscopy combined with multivariate analysis have been widely studied. For example, in the food industry its feasibility for quality control has been evaluated. The method has also been used to identify the chemical composition of essential oils and to detect chemotypes in oil plants. In this study, the use of the method was evaluated in the classification of hog's fennel extracts. FTIR spectra of extracts from different plant parts of hog's fennel were compared with the measured FTIR spectra of standard substances. The typical absorption bands in the FTIR spectra of the standard substances were identified. The wave number regions of the intense absorption bands in the spectra of furanocoumarins were selected for multivariate analyses.
Multivariate analyses were also performed in the fingerprint region of the IR spectra, covering the wave number region 1785–725 cm⁻¹. The aim was to classify the extracts according to the habitat and coumarin concentration of the plants. Grouping according to habitat was detected, and, as the analyses of the selected absorption band regions indicated, it could mainly be explained by coumarin concentrations. In these analyses the extracts grouped and differed mainly by their total coumarin concentrations. In analyses of the 1785–725 cm⁻¹ region, grouping according to habitat was also detected, but it could not be explained by coumarin concentrations; these groupings may have been caused by similar concentrations of other compounds in the samples. Analyses using other wave number regions were also performed, but their results did not differ from the previous ones. Multivariate analyses of second-order derivative spectra in the fingerprint region did not reveal any noticeable changes either. In future studies, the method could perhaps be further developed by investigating narrower, carefully selected wave number regions of second-order derivative spectra.
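The abstract above describes PCA as reducing dimensionality by constructing new, uncorrelated principal components that retain most of the variation. A minimal NumPy sketch of that idea computes PCA from the singular value decomposition of the mean-centred data matrix. It is only an illustration, not the analysis used in the thesis (whose software and preprocessing are not named here); the function name `pca` and the toy matrix of six "spectra" at four "wave numbers" are hypothetical and synthetic.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via SVD of the column-centred data matrix (hypothetical helper).

    X: array of shape (n_samples, n_features), e.g. one row per spectrum
    and one column per selected wave number.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    # Thin SVD: Xc = U @ diag(s) @ Vt; rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # projections onto the PCs
    loadings = Vt[:n_components].T                    # shape (n_features, n_components)
    var = s**2 / (X.shape[0] - 1)                     # variance along each PC
    ratio = var[:n_components] / var.sum()
    return scores, loadings, ratio

# Synthetic toy data: one latent factor plus small noise (illustration only)
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 1))
X = latent @ np.array([[1.0, 0.8, -0.5, 0.2]]) + 0.05 * rng.normal(size=(6, 4))

scores, loadings, ratio = pca(X, 2)
print(ratio)                            # the first PC carries most of the variance
print(np.corrcoef(scores.T)[0, 1])      # close to 0: PC scores are uncorrelated
```

Centring each column before the SVD is what makes the score columns uncorrelated. In practice spectra are often also scaled or derivative-transformed first, a choice this sketch leaves out.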
  • Ikonen, Juha (2018)
    The study examines how Finnish farmers respond to risk. The outcome is that Finnish farmers are, on average, risk averse, and they weight low probabilities more heavily than high ones. A questionnaire was sent to 5 000 farmers, of whom 820 responded. The questionnaire included items that were analysed with principal component analysis to confirm reliability. The analysis yielded two principal components, which were used in regression analyses together with the farmers' background information, with the risk parameters alpha (value function parameter) and gamma (weighting function parameter) as dependent variables. Neither principal component was significant when alpha or gamma was the dependent variable. Production sector was a significant variable when the weighting function parameter gamma was the dependent variable. Age, amount of field owned and farm location had no effect on attitudes towards risk.
  • Riikonen, Juha (2023)
    Population structure refers to the patterns of genetic variation within and between populations, which arise from evolutionary processes such as genetic drift, natural selection and migration. Understanding this structure in human populations provides insights into our own evolutionary history and past migration patterns. Controlling for underlying population structure is also an essential step in genetic association analyses, to ensure that associations between genetic variants and traits of interest are not confounded by differences in ancestry. Results from such analyses are essential for the research and development of personalised medicine. Principal component analysis (PCA) is a method that has been widely used to study the patterns of genetic variability within populations. In this study, PCA is applied to a genotype data set of 38,113 samples born in Finland, using data from the Finnish study cohorts FINRISK, GeneRISK, FinHealth 2017 and Health 2000. The first ten principal components are extracted using the PLINK 2.0 software. Novel discoveries of associations between genetic variants and a disease often motivate further studies on the geographical distribution of such risk variants. Here, the genetic population structure is proposed as an alternative, higher-dimensional space for studying the distribution of genetic variants within a population. This study presents a framework for quantifying and visualising allele frequency variability across the genetic structure defined by the principal components. Using an empirical Bayes model, the posterior minor allele frequency is estimated in discrete areas of the principal component space. The variability of these estimates is visualised as heatmaps, using a colouring scheme that provides statistical guarantees for frequency differences between different colours. The framework is demonstrated on five biallelic variants known to be associated with a disease or a disorder.
The results show that visualising the pairwise components complemented with data on sample birth location reveals the major patterns of genetic variability within the Finnish population. The framework is able to distinguish areas in the genetic structure with differing levels of allele frequency, and visualise this variability as heatmaps that enable meaningful visual interpretation. The levels of allele frequency differences found in the principal component space are comparable to the differences found geographically, which suggests that studying individual variants within the genetic structure on top of geographical frequency maps can provide additional information on their distribution in a population.
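The abstract above does not state the exact empirical Bayes model, so the following is only a sketch under an assumed Beta-binomial formulation: samples are binned into square cells of PC1-PC2 space, a Beta(a, b) prior is fitted to the raw per-cell minor allele frequencies by the method of moments, and each cell's posterior mean is (a + x) / (a + b + n), where x is the minor-allele count and n the total number of alleles in the cell. All function names, the cell width and the counts are hypothetical. The moment fit is deliberately crude because it ignores binomial sampling noise, and the thesis's colouring scheme with statistical guarantees is not reproduced.

```python
import math
from statistics import mean, pvariance

def bin_counts(pc1, pc2, dosages, width):
    """Aggregate minor-allele counts per square cell of side `width` in PC1-PC2 space.

    dosages: minor-allele count (0, 1 or 2) per diploid sample.
    Returns {cell: (minor_allele_count, total_allele_count)}.
    """
    cells = {}
    for x, y, g in zip(pc1, pc2, dosages):
        key = (math.floor(x / width), math.floor(y / width))
        mac, total = cells.get(key, (0, 0))
        cells[key] = (mac + g, total + 2)   # two alleles per diploid sample
    return cells

def fit_beta_moments(freqs):
    """Crude method-of-moments Beta(a, b) prior from raw per-cell frequencies."""
    m, v = mean(freqs), pvariance(freqs)
    if not 0 < v < m * (1 - m):
        raise ValueError("method-of-moments fit undefined for these frequencies")
    k = m * (1 - m) / v - 1
    return m * k, (1 - m) * k

def posterior_maf(cells, prior):
    """Posterior mean MAF per cell under a Beta-binomial model: (a + x) / (a + b + n)."""
    a, b = prior
    return {c: (a + x) / (a + b + n) for c, (x, n) in cells.items()}

# Hypothetical per-cell counts (minor alleles, total alleles); illustration only
cells = {(0, 0): (10, 200), (0, 1): (30, 200), (1, 0): (2, 20), (1, 1): (50, 400)}
prior = fit_beta_moments([x / n for x, n in cells.values()])
post = posterior_maf(cells, prior)
print(post)
```

Each posterior mean is a weighted average of the cell's raw frequency and the overall mean a / (a + b), so cells with few samples (small n) are pulled more strongly towards the overall mean. This shrinkage is what keeps estimates stable in sparsely populated regions of the PC space.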
  • Kukkonen, Tommi (2022)
    Eutrophication and harmful substances of anthropogenic origin threaten the state of the Baltic Sea, especially its geochemistry and the oxygen levels near the seafloor. Water exchange between the Baltic Sea and the Atlantic Ocean can affect oxygen circulation and sedimentation rates, but such inflows are considered highly sporadic, and it is unclear how water circulation and flow rates affect element concentrations and sediment deposition in near-seafloor environments. One of the less studied basins is the Western Gulf of Finland and its seafloor environment. During the 2019 voyage, the seafloor south of the city of Hanko was investigated with bathymetric sounding tools and other measurements, from which element concentration and sediment deposition rate data were acquired. The sounding revealed a large channel cutting the seafloor, which was hypothesized to influence the near-bottom conditions. The data consisted of samples from 13 short (40 cm) sediment cores, which were analysed for 137Cs activity, organic content, and grain size distribution. The goal of the thesis was to determine the intensity of water exchange taking place in the seabed channels between the mid-Baltic Sea and the Western Gulf of Finland, and to investigate the effect of the seafloor channel and flow rates on sediment and element deposition, their relationships, and how they affect the overall conditions in the study area. These relationships were analysed with spatial and statistical methods: GIS tools were used to interpolate the data from the study locations with the Inverse Distance Weighting (IDW) method, and multielement analyses in the R environment, namely Principal Component Analysis (PCA) and Partial Least Squares Regression (PLS), were used to analyse grain size and element concentration correlations and to combine them with the obtained flow rate data.
The results showed a strong correlation in flow rate intensities between the Western Gulf of Finland and the mid-Baltic Sea, and these intensities are strongly linked with sedimentation and element deposition rates. However, no long-term trend was identified in the frequencies of seafloor channel velocities. The 137Cs activity shows stronger sedimentation on the western side of the seafloor channel. Overall element and sediment deposition in the study area was largely controlled by monthly and seasonal current velocity fluctuations, among other processes. The element concentration comparison showed weakened oxygen conditions in the study area, with increased eutrophication and carbon burial since the 1950s. The principal component analysis showed that smaller grain sizes (0.15–2 mm) had a stronger influence on the datasets, with Mo, N, and C providing the largest variation in the data. Interpolation showed that oxygen, pH, and H2S fluctuate more in the study area, which can indicate changes in the vertical gradients at each sample point. It could also be determined that other measured parameters, such as temperature, turbidity, and salinity, do not respond very sensitively to water inflow fluctuations or sedimentation rate changes. The results indicate that harmful substances and eutrophication are most likely going to increase in the near-bottom environment of the Western Gulf of Finland, driven by anthropogenic activity. Water exchange is likely to become increasingly uneven, thus altering the effect of flow rates on sediment deposition in the Baltic Sea. Further studies are needed to link these processes to large-scale global changes and to the general state of change in the Baltic Sea and its surrounding areas. The seafloor of the Western Gulf of Finland could also be studied further to gain a better understanding of longer-timescale changes in the seafloor channel currents and in element and sediment deposition rates.
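Inverse Distance Weighting, used above to interpolate between the core sites, estimates the value at an unsampled location as a weighted average of the sampled values, with weights w_i = 1 / d_i^p that shrink with the distance d_i to each sample. The following is a minimal pure-Python sketch assuming planar (projected) coordinates and the commonly used power p = 2; the abstract does not state which power or GIS software settings were used, and the function name, site coordinates and values are hypothetical.

```python
import math

def idw(points, query, power=2.0):
    """Inverse Distance Weighting estimate at `query` (hypothetical helper).

    points: list of (x, y, value) for the sampled sites, in planar coordinates.
    power: distance exponent p; 2 is a common default, assumed here.
    """
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return z          # exact interpolator: a sampled site keeps its own value
        w = 1.0 / d ** power  # closer sites get larger weights
        num += w * z
        den += w
    return num / den

# Hypothetical sites (x, y, measured value); illustration only
sites = [(0.0, 0.0, 120.0), (2.0, 0.0, 80.0), (0.0, 2.0, 100.0)]
print(idw(sites, (0.0, 0.0)))   # 120.0, the query coincides with a sampled site
print(idw(sites, (1.0, 1.0)))   # about 100: all three sites are equidistant
```

Because the weights are positive and normalised, every IDW estimate lies between the smallest and largest sampled values; a larger power makes the surface follow the nearest sites more closely.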
  • Smith, Dianna (2024)
    Statistician C. R. Rao made many contributions to multivariate analysis over the span of his career. Some of his earliest contributions continue to be used and built upon almost eighty years later, while his more recent contributions spur new avenues of research. This thesis discusses these contributions, how they helped shape multivariate analysis as we see it today, and what we may learn from reviewing his works. Topics include his extension of linear discriminant analysis, Rao’s perimeter test, Rao’s U statistic, his asymptotic expansion of Wilks’ Λ statistic, canonical factor analysis, functional principal component analysis, redundancy analysis, canonical coordinates, and correspondence analysis. The examination of his works shows that interdisciplinary collaboration and the utilization of real datasets were crucial in almost all of Rao’s impactful contributions.
  • Malmberg, Anni (2020)
    A population is said to be genetically structured when it can be divided into subpopulations based on genetic differences between individuals. In the case of Finland, for example, the population has been shown to consist of genetic subpopulations that correspond strongly to geographical subgroups. Such information may be of interest when seeking answers to questions about the settlement and migration history of a population. Information about genetic population structure is also required, for example, in studies looking for associations between genetic variants and an inheritable disease, to ensure that the groups with and without a diagnosis of the disease resemble each other genetically except for the genetic variants causing the disease. In my thesis, I have compared how two different mathematical models, principal component analysis (PCA) and generative topographic mapping (GTM), visualize ancestry and identify genetic structure in the Finnish population. PCA was introduced as early as 1901, and nowadays it is a standard tool for identifying genetic structure and visualizing ancestry. GTM, in contrast, was published relatively recently, in 1998, and has not yet been applied in population structure studies as widely as PCA. Both PCA and GTM transform high-dimensional data into a low-dimensional, interpretable representation in which the relationships between observations are summarized. For data containing genetic heterogeneity between individuals, this representation gives a visual approximation of the genetic structure of the population. However, Héléna A. Gaspar and Gerome Breen found in 2018 that GTM classifies the ancestry of populations from around the world more accurately than PCA: the differences recognized by PCA were mainly between the geographically most distant populations, while GTM also detected more of their subpopulations.
My aims in the thesis were to examine whether applying the methods to Finnish data would give similar results, and to give thorough presentations of the mathematical background of both methods. I also discuss how the results fit with what is currently known about the genetic population structure in Finland. The study results are based on data from the FINRISK Study Survey collected by the National Institute for Health and Welfare (THL) in 1992–2012, which include 35 499 samples. After performing quality control on the data, I analysed it with the SmartPCA program and the ugtm Python package, implementing PCA and GTM respectively. The final results are presented for the 2 010 individuals who participated in the FINRISK Study Survey in 1997 and whose parents were both born close to each other. I assigned the individuals to distinct geographical subgroups according to their mothers' birthplaces to find out whether PCA and GTM identify individuals with a similar geographical origin as genetically close to each other. Based on the results, the genetic structure in Finland is clearly geographically clustered, which fits with what is known from earlier studies. The results were also similar to those observed by Gaspar and Breen: both methods identified the genetic substructure, but GTM was able to recognize more subtle differences in ancestry between the geographically defined subgroups than PCA. For example, GTM found the group corresponding to the region of Northern Ostrobothnia to consist of four smaller, separate subgroups, while PCA interpreted the individuals of Northern Ostrobothnian origin as genetically rather homogeneous. Locating these individuals on the map of Finland according to their mothers' birthplaces reveals that they also form four geographical clusters corresponding to the genetic subpopulations detected by GTM.
As a final conclusion, I state that GTM is a noteworthy alternative to PCA for studying genetic population structure, especially for identifying substructures within a population that PCA may interpret as genetically homogeneous. I also note that the reason GTM generally seems capable of more fine-grained clustering than PCA is probably that PCA, as a linear model, may introduce more bias into the results than GTM, which also accounts for non-linear relationships when transforming the data into a more interpretable form.