Browsing by master's degree program "Matematiikan ja tilastotieteen maisteriohjelma"


  • Salow, Olga-Tuulia (2021)
    This thesis presents the theory of the logistic regression model and illustrates its applicability to health research. The purpose of the thesis is to examine the interpretation of the parameter estimates of a logistic regression model. The estimates can be interpreted in three different metrics, but in practice the examination is often restricted to only one of them. The thesis covers all three metrics, namely the probability, logit and odds-ratio metrics, and examines them through theory and an empirical example. The data used in the example consist of the responses of 8th and 9th grade pupils from Vantaa to the School Health Promotion study (Kouluterveyskysely) of the Finnish Institute for Health and Welfare (THL), and the work was carried out in collaboration with the City of Vantaa. The analyses were performed with the Stata software, and a few examples of its use are presented. The thesis begins with the theory of the logistic regression model, including the theory of generalized linear models and model fitting by maximum likelihood. After this, the choice and interpretation of the metric are discussed, and remarks on the interpretation of the model's interaction term are also raised. The end of the thesis illustrates the applicability of the logistic regression model to qualitative research questions. The analyses focus on whether there are differences in self-rated health between adolescents with a foreign background and those with a Finnish background, and whether adding variables related to family resources and lifestyle to the models changes these findings. The theoretical and empirical examination of the three metrics shows that the interpretation depends on the choice of metric, but the conclusions drawn do not necessarily depend on it. In particular, interpreting the magnitude of the associations between variables is challenging from a qualitative point of view, and differences appear in assessing statistical significance. Although the interpretation depends on the choice of metric, the qualitative conclusions of the thesis were ultimately similar: the logistic regression analyses led to similar inferences regardless of the metric used. The analyses show that in Vantaa a foreign background is not a strong explanatory factor for an adolescent's self-rated health. However, significant differences appear between generations when compared with adolescents of Finnish background. Adolescents' experience of a poor financial situation in the family and the absence of everyday lifestyle factors with a positive effect on health explained a significant part of adolescents' self-rated health.
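    The three metrics above can be illustrated with a short sketch (not from the thesis, which used Stata): the Python code below fits a logistic regression to invented data and reads the same fit on the logit, odds-ratio and probability scales. All variable names are hypothetical.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        # Hypothetical data: a binary health outcome and two background indicators.
        rng = np.random.default_rng(0)
        n = 1000
        df = pd.DataFrame({
            "foreign_background": rng.integers(0, 2, n),
            "low_family_income": rng.integers(0, 2, n),
        })
        eta = -1.0 + 0.3 * df["foreign_background"] + 0.8 * df["low_family_income"]
        df["poor_health"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

        X = sm.add_constant(df[["foreign_background", "low_family_income"]])
        fit = sm.Logit(df["poor_health"], X).fit(disp=False)

        print(fit.params)                      # logit metric: additive effects on the log-odds
        print(np.exp(fit.params))              # odds-ratio metric: multiplicative effects on the odds
        print(fit.get_margeff().summary())     # probability metric: average marginal effects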
  • Lehdonvirta, Otso (2022)
    This thesis gives a theoretical justification for the claim that the return of a listed stock is lognormally distributed, provided that it satisfies conditions of a certain type. When we assume that the stock return satisfies these conditions, we can prove with the Lindeberg-Feller limit theorem that the return approaches a lognormal distribution the more frequently the stock is traded during the period under consideration. We test empirically with the Coca-Cola and Freeport-McMoran stocks whether their returns follow a lognormal distribution, using the Kolmogorov-Smirnov test. These stocks represent different industries, so they behave differently. In addition, they are very liquid and traded frequently. The tests show that we cannot rule out that the return of the Coca-Cola stock follows a lognormal distribution, whereas for Freeport-McMoran we can. The literature often assumes that stock returns are lognormally distributed; for example, the original Black-Scholes model assumes that the stock return is lognormally distributed. How the stock return is distributed affects how the equity derivatives modelled by the Black-Scholes model are priced, and this pricing model may be used in companies' accounting. The Black-Scholes model, in which the stock return is lognormally distributed, is presented in the thesis.
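    As a rough illustration of the empirical part (not the thesis's data or code): gross returns are lognormal exactly when log-returns are normal, so one can apply the Kolmogorov-Smirnov test to log-returns, as sketched below on simulated prices. Estimating the mean and variance from the same sample makes the plain test only approximate.

        import numpy as np
        from scipy import stats

        # Simulated daily prices stand in for the Coca-Cola / Freeport-McMoran series used in the thesis.
        rng = np.random.default_rng(1)
        prices = 50.0 * np.cumprod(1.0 + 0.0005 + 0.01 * rng.standard_normal(1000))
        log_returns = np.diff(np.log(prices))

        # Gross returns are lognormal exactly when log-returns are normal.
        mu, sigma = log_returns.mean(), log_returns.std(ddof=1)
        statistic, p_value = stats.kstest(log_returns, "norm", args=(mu, sigma))
        print(statistic, p_value)   # a small p-value would speak against lognormality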
  • Nuutinen, Joonas (2021)
    This thesis treats the notion of the log-optimal portfolio in a continuous market model. A continuous market model consists of instruments whose values are modelled by continuous stochastic processes. Possible investment strategies are described by portfolios, which are multidimensional stochastic processes consisting of the quantities of the instruments held. The log-optimal portfolio is defined as the portfolio that at every instant maximizes the expected short-term change of the logarithm of the portfolio's value. A locally optimal portfolio, in turn, maximizes at every instant the expected short-term change of the portfolio's value for a chosen variance. The thesis proves that every locally optimal portfolio can be represented as a combination of the log-optimal portfolio and an instrument corresponding to a bank deposit. The same is shown to hold between the log-optimal portfolio and the market portfolio, which consists of the total quantities of the instruments, provided that every investor in the market holds some locally optimal portfolio. The thesis also treats the minimal market model, a simple model for the value of a market portfolio that is assumed to be log-optimal. In this connection, a continuous market model for the values of the individual instruments is also derived in which a market portfolio containing constant quantities of the instruments is a log-optimal portfolio consistent with the minimal market model.
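    A discrete-time, simulation-based analogue of the log-optimal portfolio (a sketch only; the thesis works in a continuous model, and the asset parameters here are invented) chooses weights that maximize the expected logarithmic growth rate, with the remainder held as a bank deposit.

        import numpy as np
        from scipy.optimize import minimize

        # Simulated one-period gross returns of two risky instruments (lognormal, hence positive).
        rng = np.random.default_rng(2)
        gross = np.exp(rng.normal([0.04, 0.01], [0.25, 0.10], size=(20_000, 2)))

        def negative_log_growth(w):
            cash = 1.0 - w.sum()                      # remainder sits in a zero-interest bank deposit
            return -np.mean(np.log(cash + gross @ w))

        res = minimize(negative_log_growth, x0=np.array([0.3, 0.3]),
                       bounds=[(0.0, 1.0), (0.0, 1.0)],
                       constraints={"type": "ineq", "fun": lambda w: 1.0 - w.sum()})
        print(res.x)   # weights of the (long-only) log-optimal portfolio in this toy market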
  • Heikkinen, Niilo (2024)
    In this thesis, we prove the existence of a generalization of the matrix product state (MPS) decomposition in infinite-dimensional separable Hilbert spaces. Matrix product states, as a specific type of tensor network, are typically applied in the context of finite-dimensional spaces. However, as quantum mechanics regularly makes use of infinite-dimensional Hilbert spaces, it is an interesting mathematical question whether certain tensor network methods can be extended to infinite dimensions. It is a well-known result that an arbitrary vector in a tensor product of finite-dimensional Hilbert spaces can be written in MPS form by applying repeated singular value or Schmidt decompositions. In this thesis, we use an analogous method in the infinite-dimensional context based on the singular value decomposition of compact operators. In order to acquire sufficient theoretical background for proving the main result, we first discuss compact operators and their spectral theory, and introduce Hilbert-Schmidt operators. We also provide a brief overview of the mathematical formulation of quantum mechanics. Additionally, we introduce the reader to tensor products of Hilbert spaces, in both finite- and infinite-dimensional contexts, and discuss their connection to Hilbert-Schmidt operators and quantum mechanics. We also prove a generalization of the Schmidt decomposition in infinite-dimensional Hilbert spaces. After establishing the required mathematical background, we provide an overview of matrix product states in finite-dimensional spaces. The thesis culminates in the proof of the existence of an MPS decomposition in infinite-dimensional Hilbert spaces.
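    The finite-dimensional construction that the thesis generalizes can be sketched in a few lines of Python: repeated singular value decompositions turn a vector on a product of finite-dimensional spaces into matrix product form. This illustrates only the finite-dimensional case, not the infinite-dimensional result.

        import numpy as np

        def mps_decompose(psi, dims):
            """Matrix product form of a vector psi with local dimensions dims,
            obtained by sweeping left to right with singular value decompositions."""
            cores, bond = [], 1
            M = np.asarray(psi).reshape(bond * dims[0], -1)
            for k, d in enumerate(dims[:-1]):
                U, S, Vh = np.linalg.svd(M, full_matrices=False)
                r = S.size
                cores.append(U.reshape(bond, d, r))      # core: (left bond, physical index, right bond)
                bond = r
                M = S[:, None] * Vh                      # carry the remaining weight to the right
                if k + 1 < len(dims) - 1:
                    M = M.reshape(bond * dims[k + 1], -1)
            cores.append(M.reshape(bond, dims[-1], 1))
            return cores

        dims = (2, 3, 2, 2)
        psi = np.random.default_rng(3).standard_normal(int(np.prod(dims)))
        cores = mps_decompose(psi, dims)

        # Contract the cores back together and check that the decomposition is exact.
        rec = cores[0]
        for c in cores[1:]:
            rec = np.tensordot(rec, c, axes=([-1], [0]))
        print(np.allclose(rec.squeeze(), psi.reshape(dims)))   # True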
  • Hackman, Axel (2024)
    The question of how much one logic can express compared to another can be measured with formula size, and important results have been reached with formula size games. These games can separate two classes of structures from each other within a given number of moves. Since formula size can also be expressed through extended syntax trees, we are interested in seeing what attributes or benefits games or trees have in different situations. First-order logic and its fragments are particularly interesting. This thesis discusses formula size games and analyses their use in known succinctness results between fragments of first-order logic and also between first-order logic and modal logic. While extended syntax trees may be preferred for results between fragments of first-order logic, the formula size game can be easily constructed for different languages. We find that both methods have advantages depending on the two logics that are compared to each other.
  • Karjalainen, Miko (2023)
    Predator-prey models are mathematical models widely used in ecology to study the dynamics of predator and prey populations, to better understand the stability of such ecosystems and to elucidate the role of various ecological factors in these dynamics. An ecologically important phenomenon studied with these models is the so-called Allee effect, which refers to populations where individuals have reduced fitness at low population densities. If an Allee effect results in a critical population threshold below which a population cannot sustain itself, it is called a strong Allee effect. Although predator-prey models with strong Allee effects have received a lot of research attention, most of the prior studies have focused on cases where the phenomenon directly impacts the prey population rather than the predator. In this thesis, the focus is placed on a particular predator-prey model where a strong Allee effect occurs in the predator population. The studied population-level dynamics are derived from a set of individual-level behaviours so that the model parameters retain their interpretation at the level of individuals. The aim of this thesis is to investigate how the specific individual-level processes affect the population dynamics and how the population-level predictions compare to other models found in the literature. Although the basic structure of the model precedes this paper, until now there has not been a comprehensive analysis of the population dynamics. In this analysis, both the mathematical and biological well-posedness of the model system are established, the feasibility and local stability of coexistence equilibria are examined and the bifurcation structure of the model is explored with the help of numerical simulations. Based on these results, the coexistence of both species is possible either in a stable equilibrium or in a stable limit cycle. Nevertheless, it is observed that the presence of the Allee effect has an overall destabilizing effect on the dynamics, often entailing catastrophic consequences for the predator population. These findings are largely in line with previous studies of predator-prey models with a strong Allee effect in the predator.
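    For readers unfamiliar with the terminology, the threshold behaviour of a strong Allee effect is already visible in the textbook single-species equation dP/dt = r P (P/A - 1)(1 - P/K); the sketch below (not the thesis's predator-prey model, and with invented parameter values) integrates it from initial densities on either side of the threshold A.

        import numpy as np
        from scipy.integrate import solve_ivp

        def allee(t, y, r=0.5, A=1.0, K=10.0):
            """Single-species growth with a strong Allee effect: negative below the threshold A."""
            P = y[0]
            return [r * P * (P / A - 1.0) * (1.0 - P / K)]

        for P0 in (0.5, 2.0):                         # initial densities below and above the threshold
            sol = solve_ivp(allee, (0.0, 60.0), [P0], rtol=1e-8)
            print(P0, "->", round(sol.y[0, -1], 3))   # 0.5 collapses towards 0, 2.0 approaches K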
  • Vuorenmaa, Elmo (2021)
    In topology, one often wishes to find ways to extract new spaces out of existing spaces. For example, the suspension of a space is a fundamental technique in homotopy theory. However, in recent years there has been a growing interest in extracting topological information out of discrete structures. In the field of topological data analysis one often considers point clouds, which are finite sets of points embedded in some R^m. The topology of these sets is trivial; however, these sets often have more structure. For example, one might consider a uniformly randomly sampled set of points from the circle S^1. Clearly, the resulting set of points has some geometry associated to it, namely the geometry of S^1. The use of certain types of topological spaces called Vietoris-Rips and Cech complexes allows one to study the "underlying topology" of point clouds by standard topological means. This in turn enables tools from algebraic topology, such as homology and cohomology, to be applied to point clouds. Vietoris-Rips and Cech complexes are often not metrizable, even though they are defined on metric spaces. The purpose of this thesis is to introduce a homotopy result of Adams and Mirth concerning Vietoris-Rips metric thickenings. In the first chapter, we introduce the necessary measure theory for the main result of the thesis. We construct the 1-Wasserstein distance, and prove that it defines a metric on Polish spaces. We also note that the 1-Wasserstein distance is a metric on general metric spaces. In the sequel, we introduce various complexes on spaces. We study simplicial complexes on R^n and introduce the concept of a realization. We then prove a theorem on the metrizability of a realization of a simplicial complex. We generalize simplicial complexes to abstract simplicial complexes and study the geometric realization of some complexes. We prove a theorem on the existence of geometric realizations for abstract simplicial complexes. Finally, we define Vietoris-Rips and Cech complexes, which are complexes that are formed on metric spaces. We introduce the nerve lemma for Cech complexes, and prove a version of it for finite CW-complexes. The third chapter introduces the concept of reach, which in a way measures the curvature of the boundary of a subset of R^n. We prove a theorem that characterizes convex, closed sets of R^n by their reach. We also introduce the nearest point projection map π, and prove its continuity. In the final chapter, we present some more measure theory, which leads to the definitions of Vietoris-Rips and Cech metric thickenings. The chapter culminates in constructing an explicit homotopy equivalence between a metric space X of positive reach and its Vietoris-Rips metric thickening.
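    A concrete finite version of the Vietoris-Rips construction is easy to write down (a sketch unrelated to the thesis's metric thickenings): a simplex is included exactly when all pairwise distances between its vertices are at most the scale parameter.

        import numpy as np
        from itertools import combinations

        def vietoris_rips(points, r, max_dim=2):
            """Simplices of the Vietoris-Rips complex of a finite point cloud at scale r, up to max_dim."""
            n = len(points)
            dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
            complex_ = [[(i,) for i in range(n)]]
            for dim in range(1, max_dim + 1):
                complex_.append([s for s in combinations(range(n), dim + 1)
                                 if all(dist[i, j] <= r for i, j in combinations(s, 2))])
            return complex_

        rng = np.random.default_rng(4)
        angles = rng.uniform(0.0, 2.0 * np.pi, 25)
        cloud = np.column_stack([np.cos(angles), np.sin(angles)])   # a finite sample of the circle S^1
        vertices, edges, triangles = vietoris_rips(cloud, r=0.6)
        print(len(vertices), len(edges), len(triangles))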
  • Metsälampi, Lilja (2021)
    The aim of this thesis is to present and prove Milnor's theorem (John Milnor, 1968) from the field of geometric group theory. Milnor's theorem is an essential part of the classification of the growth of finitely generated solvable groups. It states that a finitely generated solvable group either grows exponentially or is polycyclic. The growth of polycyclic groups is known to be either polynomial or exponential. Hence finitely generated solvable groups grow either polynomially or exponentially. The first chapter of the thesis is an introduction and the second chapter covers preliminaries. The third chapter introduces groups and alphabets; in particular, what it means to think of the elements of a group as words over an alphabet. Free groups and group presentations are also defined. In the fourth chapter, a metric called the word metric is defined on a group via its Cayley graph. It is proved that word metrics formed with respect to different generating sets are bilipschitz equivalent to each other. Finally, the growth of groups is defined, and it is proved that the growth of a group does not depend on the chosen generating set. The fifth chapter introduces solvable groups, nilpotent groups and polycyclic groups, together with a few concrete examples of these groups. Central properties of these groups and the relationships between them are also presented; for example, it is proved that every nilpotent and every polycyclic group is also a solvable group. In the sixth chapter the main result of the thesis, Milnor's theorem, is proved. The proof proceeds by induction on the length of the descending subnormal series characteristic of a solvable group. The necessary auxiliary results are also presented and proved. At the end of the chapter, Wolf's theorem (Joseph Wolf, 1968) is presented, and Milnor's and Wolf's theorems are combined into a single result, the Milnor-Wolf theorem. By the Milnor-Wolf theorem, the growth of finitely generated solvable groups is classified.
  • Andberg, Sari (2022)
    The topic of this thesis is Möbius transformations, which are an essential part of complex analysis and thus of analysis in general. Möbius transformations are usually first encountered on the advanced mathematics course Kompleksianalyysi 1 (Complex Analysis 1); in addition, the reader is expected to know the basic results of analysis. Möbius transformations are approachable and interesting first-degree rational functions. They have several useful geometric properties and can be used to solve various mapping problems conveniently, which is why they are particularly important. Chapter 1 of the thesis is a short introduction to Möbius transformations. Chapter 2 introduces the concepts of complex analysis that are essential for Möbius transformations, such as the extended complex plane, the Riemann sphere and elementary functions. The third chapter defines Möbius transformations themselves and presents examples of different Möbius transformations. The chapter also shows, among other things, that Möbius transformations are bijective and conformal, and examines their analyticity. Chapter 4 introduces the concept of the cross-ratio and proves that Möbius transformations also preserve cross-ratios. The chapter additionally defines various half-planes of the complex plane and solves different mapping problems with the help of the cross-ratio, illustrating this with figures. The fifth chapter introduces the quasihyperbolic metric and shows that Möbius transformations are hyperbolic isometries. The material of the thesis is mainly based on the content of Ritva Hurri-Syrjänen's course Kompleksianalyysi 1. In addition, Chapter 5 draws on Paula Rantanen's work Uniformisista alueista (on uniform domains) and on the work Quasiconformally homogeneous domains by F. W. Gehring and B. P. Palka.
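    The invariance of the cross-ratio under Möbius transformations, proved in Chapter 4, can be checked numerically in a few lines (an illustrative sketch with arbitrarily chosen coefficients satisfying ad - bc ≠ 0).

        def mobius(z, a, b, c, d):
            """Möbius transformation z -> (az + b) / (cz + d), with ad - bc != 0."""
            return (a * z + b) / (c * z + d)

        def cross_ratio(z1, z2, z3, z4):
            return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))

        a, b, c, d = 2 + 1j, -1.0, 1.0, 3j          # ad - bc = -2 + 6j != 0
        points = (1 + 1j, -2.0, 0.5j, 3 - 1j)
        before = cross_ratio(*points)
        after = cross_ratio(*(mobius(z, a, b, c, d) for z in points))
        print(abs(before - after) < 1e-9)           # True: the cross-ratio is preserved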
  • Mäkinen, Eetu (2023)
    In this thesis, we model the graduation of Mathematics and Statistics students at the University of Helsinki. The interest is in the graduation and drop-out times of bachelor’s and master’s degree program students. Our aim is to understand how studies lead up to graduation or drop-out, and which students are at a higher risk of dropping out. As the modeled quantity is time-to-event, the modeling is performed with survival analysis methods. Chapter 1 gives an introduction to the subject, while in Chapter 2 we explain our objectives for the research. In Chapter 3, we present the available information and the possible variables for modeling. The dataset covers a 12-year period from 2010/11 to 2021/22 and includes information for 2268 students in total. There were many limitations, and the depth of the data allowed the analysis to focus only on the post-2017/18 bachelor’s program. In Chapter 4, we summarize the data with visual presentation and some basic statistics of the follow-up population and different cohorts. The statistical methods are presented in Chapter 5. After introducing the characteristic concepts of time-to-event analysis, the main focus is on two alternative model choices: the Cox regression and the accelerated failure time models. The modeling itself was conducted with the programming language R, and the results are given in Chapter 6. In Chapter 7, we introduce the main findings of the study and discuss how the research could be continued in the future. We found that most drop-outs happen early, during the first and second study year, with the grades from early courses such as Raja-arvot providing some early indication of future success in studies. Most graduations in the post-2017/18 program occur between the end of the third study year and the end of the fourth study year, with the median graduation time being 3.2 years after enrollment. Including the known graduation times from the pre-2017/18 data, the median graduation time from the whole follow-up period was 3.8 years. Other relevant variables in modeling the graduation times were gender and whether or not a student was studying in the Econometrics study track. Female students graduated faster than male students, and students in the Econometrics study track graduated slower than students in other study tracks. In future continuation projects, the presence of more specific period-wise data is crucial, as it would allow the implementation of more complex models and a reliable validation for the results presented in this thesis. Additionally, more accuracy could be attained for the estimated drop-out times.
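    The two model families mentioned above can be illustrated with the lifelines library in Python (the thesis itself used R); the data below are simulated and the covariate effects invented, so this is only a schematic of the workflow.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter, WeibullAFTFitter

        # Simulated students: time to graduation in years, an event indicator (1 = graduated,
        # 0 = censored), and two covariates echoing those found relevant in the thesis.
        rng = np.random.default_rng(5)
        n = 300
        female = rng.integers(0, 2, n)
        econometrics_track = rng.integers(0, 2, n)
        time = 3.0 + rng.weibull(2.0, n) * np.exp(-0.15 * female + 0.25 * econometrics_track)
        graduated = (rng.random(n) < 0.8).astype(int)
        df = pd.DataFrame({"time": time, "graduated": graduated,
                           "female": female, "econometrics_track": econometrics_track})

        cox = CoxPHFitter().fit(df, duration_col="time", event_col="graduated")       # proportional hazards
        cox.print_summary()

        aft = WeibullAFTFitter().fit(df, duration_col="time", event_col="graduated")  # accelerated failure time
        aft.print_summary()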
  • Kolehmainen, Ilmari (2022)
    This thesis analyses the colonization success of lowland herbs in open tundra using Bayesian inference methods. This was done with four different models that analyse the effects of different treatments, grazing levels and environmental covariates on the probability of a seed growing into a seedling. The thesis starts traditionally with an introduction chapter. The second chapter goes through the data: where and how it was collected, the different treatments used and other relevant information. The third chapter goes through the methods needed to understand the analysis of this thesis, which are the basics of Bayesian inference, generalized linear models, generalized linear mixed models, model comparison and model assessment. The actual analysis starts in the fourth chapter, which introduces the four models used in this thesis. All of the models are binomial generalized linear mixed models that have different variables. The first model only has the different treatments and grazing levels as variables. The second model also includes interactions between these treatment and grazing variables. The third and fourth models are otherwise the same as the first and the second, but they also have some environmental covariates as additional variables. Every model also has the block in which the seeds were sown as a random effect. The fifth chapter goes through the results of the models. First it shows the comparison of the predictive accuracy of all models. Then the estimated fixed effects, random effects and draws from the posterior predictive distribution are presented for each model separately. The thesis ends with the sixth chapter, the conclusions.
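    A minimal Bayesian sketch of the kind of model described, a binomial generalized linear mixed model with a random block effect, written with PyMC on invented data; the thesis's actual priors and covariates may differ.

        import numpy as np
        import pymc as pm

        # Invented seed-sowing data: counts of seedlings out of seeds sown per plot.
        rng = np.random.default_rng(6)
        n_blocks, n_plots = 10, 120
        block = rng.integers(0, n_blocks, n_plots)
        treatment = rng.integers(0, 3, n_plots)          # three hypothetical treatments
        grazing = rng.integers(0, 2, n_plots)            # two grazing levels
        sown = np.full(n_plots, 20)
        seedlings = rng.binomial(sown, 0.2)

        with pm.Model():
            intercept = pm.Normal("intercept", 0.0, 2.0)
            b_treatment = pm.Normal("b_treatment", 0.0, 1.0, shape=3)
            b_grazing = pm.Normal("b_grazing", 0.0, 1.0)
            sigma_block = pm.HalfNormal("sigma_block", 1.0)
            u_block = pm.Normal("u_block", 0.0, sigma_block, shape=n_blocks)   # random block effect

            eta = intercept + b_treatment[treatment] + b_grazing * grazing + u_block[block]
            pm.Binomial("seedlings", n=sown, p=pm.math.invlogit(eta), observed=seedlings)

            idata = pm.sample(1000, tune=1000, chains=2, random_seed=6)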
  • Niemi, Ripsa (2022)
    Mental disorders are common during childhood and they are associated with various negative consequences later in life, such as lower educational attainment and unemployment. In addition, the reduction of socioeconomic health disparities has attracted political, research and media interest. While mental health inequalities have been found consistently in the literature and regional disparities in health have been well documented in Finland, the question of possible variation in mental disorder inequalities during childhood among Finnish regions has not been fully examined. This master's thesis addresses this gap in the research from a statistical perspective with a multilevel logistic model, which allows random variation between levels. Using register-based data, I ask whether the association between socioeconomic status and mental disorder in childhood varies by the child's municipality of residence, and which regional factors possibly explain the differences. The second objective of this thesis is to find out whether the use of a multilevel logistic model provides additional value in this context. The method used in the thesis is a multilevel logistic model, which can also be called a generalized linear mixed-effects model. In multilevel models, the observations are nested within hierarchical levels, which all have corresponding variables. Both the intercept and the slopes of independent variables can be allowed to vary between the Level 2 units. The intraclass correlation coefficient and the median odds ratio (MOR) are used to measure group-level variation. In addition, centering of variables and choosing a suitable analysis strategy are central steps in model application. High-quality Finnish register data from Statistics Finland and the Finnish Institute for Health and Welfare are utilised. The study sample consists of 815 616 individuals aged 4–17 living in Finland in the year 2018. The individuals, who are used as Level 1 units, are nested within 306 Level 2 units based on their municipality of residence. The dependent variable is a dichotomous variable indicating a mental disorder, and it is based on visits and psychiatric diagnoses given in specialised healthcare during 2018. Independent variables at Level 1 are maternal education level and household income quintile, and the models are controlled for age group, gender, family structure and parental mental disorders. At Level 2, the independent variables are urbanisation, major region, share of higher-educated population and share of at-risk-of-poverty children. In the final model, children with the lowest maternal education level are more likely (OR=1.37, SE=0.0026) to have mental disorders than children with the highest maternal education level. Odds ratios for the household income quintile mostly decline close to one when control variables are included. Interestingly, children from the poorest quintile have slightly lower odds of a mental disorder (OR=0.84, SE=0.017) compared with children from the richest quintile. Urbanisation, share of higher-educated population and share of at-risk-of-poverty children are statistically insignificant variables. Differences are found between major regions; children from Åland are more likely (OR=1.5, SE=0.209) to have a mental disorder compared with Helsinki-Uusimaa residents, whereas children from Western Finland (OR=0.71, SE=0.053) have lower odds compared with the same reference. Random slopes for maternal education are not significant, and the model fit does not improve. However, there is some variation among municipalities (MOR=1.34), and this finding supports the usefulness of the multilevel model in the context of mental disorders in childhood. The results show that mental disorder inequalities persist in childhood, but there is complexity. Although no variation in socioeconomic inequalities among municipalities is found, there are still contextual effects between municipalities. Health policies should focus on reducing overall mental health inequalities in the young population, but it is an encouraging finding that disparities in childhood mental disorders are not shown to be stronger in some municipalities than in others. Multilevel models can contribute to the methodology of future mental disorder research if societal context is assumed to affect the outcomes of individuals.
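    For readers unfamiliar with the measures, the MOR and the latent-scale intraclass correlation reported above are simple functions of the level-2 random-intercept variance; the snippet below uses a variance value back-calculated to match MOR ≈ 1.34, purely for illustration.

        import numpy as np
        from scipy.stats import norm

        def median_odds_ratio(var_u):
            """MOR = exp(sqrt(2 * var_u) * Phi^{-1}(0.75)) for a multilevel logistic model."""
            return np.exp(np.sqrt(2.0 * var_u) * norm.ppf(0.75))

        def latent_icc(var_u):
            """Latent-scale intraclass correlation: var_u / (var_u + pi^2 / 3)."""
            return var_u / (var_u + np.pi**2 / 3.0)

        var_u = 0.094                                  # illustrative municipality-level variance
        print(median_odds_ratio(var_u))                # ≈ 1.34
        print(latent_icc(var_u))                       # ≈ 0.028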
  • Aarnos, Mikko (2023)
    A major innovation in statistical mechanics has been the introduction of conformal field theory in the mid-1980s. The theory postulates the existence of conformally invariant scaling limits for many critical 2D lattice models, and then uses representation theory of a certain algebraic object that can be associated to these limits to derive exact solvability results. Providing mathematical foundations for the existence of these scaling limits has been a major ongoing project ever since, and led to the introduction of Schramm-Löwner evolution (or SLE for short) in the early 2000s. The core insight behind SLE is that if a conformally invariant random planar curve can be described by Löwner evolution and fulfills a condition known as the domain Markov property, it must be driven by a Wiener process with no drift. Furthermore, the variance of the Wiener process can be used to define a family SLE𝜅 of random curves which are simple, self-touching or space-filling depending on 𝜅 ≥ 0. This combination of flexibility and rigidity has allowed the scaling limits of various lattice models, such as the loop-erased random walk, the harmonic explorer, and the critical Ising model with a single interface, to be described by SLE. Once we move (for example) to the critical Ising model with multiple interfaces, it turns out that the standard theory of SLE is inadequate. As such, we would like to establish the existence of multiple SLE to handle these more general situations. However, conformal invariance and the domain Markov property no longer guarantee uniqueness of the object, so the situation is more complicated. This has led to two main approaches to the study of multiple SLE, known as the global and local approaches. Global methods are often simpler, but they often do not yield explicit descriptions of the curves. On the other hand, local methods are far more involved but as a result give descriptions of the laws of the curves. Both approaches have led to distinct proofs that the laws of the driving terms of the critical Ising model on a finitely-connected domain are described by multiple SLE3. The aim of this thesis is to provide a proof of this result on a simply-connected domain that is simpler than the ones found in the literature. Our idea is to take the proof by the local approach as our base, simplify it after restricting to a simply-connected domain, and bypass the hard part of dealing with a martingale observable. We do this by defining a function as a ratio of what are known as SLE3 partition functions, and use it as a Radon-Nikodym derivative with respect to chordal SLE3 to construct a new measure. A convergence theorem for fermionic observables shows that this measure is the scaling limit of the law of the driving term of the critical Ising model with multiple interfaces, and due to our knowledge of the Radon-Nikodym derivative an application of Girsanov's theorem shows that the measure we constructed is just local multiple SLE3.
  • Bernardo, Alexandre (2020)
    In insurance and reinsurance, heavy-tail analysis is used to model insurance claim sizes and frequencies in order to quantify the risk to the insurance company and to set appropriate premium rates. One of the reasons for this application comes from the fact that excess claims covered by reinsurance companies are very large, and so a natural field for heavy-tail analysis. In finance, the multivariate returns process often exhibits heavy-tail marginal distributions with little or no correlation between the components of the random vector (even though it is a highly correlated process when taking the square or the absolute values of the returns). The fact that vectors which are considered independent by conventional standards may still exhibit dependence of large realizations leads to the use of techniques from classical extreme-value theory, which contains heavy-tail analysis, in estimating an extreme quantile of the profit-and-loss density called value-at-risk (VaR). The need of the industry to understand the dependence between random vectors for very large values, as exemplified above, makes the concept of multivariate regular variation a current topic of great interest. This thesis discusses multivariate regular variation, showing that, by having multiple equivalent characterizations and by being quite easy to handle, it is an excellent tool to address the real-world issues raised previously. The thesis is structured as follows. At first, some mathematical background is covered: the notion of regular variation of a tail distribution in one dimension is introduced, as well as different concepts of convergence of probability measures, namely vague convergence and $\mathbb{M}^*$-convergence. The preference for using the latter over the former is briefly discussed. The thesis then proceeds to the main definition of this work, that of multivariate regular variation, which involves a limit measure and a scaling function. It is shown that multivariate regular variation can be expressed in polar coordinates, by replacing the limit measure with a product of a one-dimensional measure with a tail index and a spectral measure. Looking for a second source of regular variation leads to the concept of hidden regular variation, to which a new hidden limit measure is associated. Estimation of the tail index, the spectral measure and the support of the limit measure are next considered. Some examples of risk vectors are next analyzed, such as risk vectors with independent components and risk vectors with repeated components. The support estimator presented earlier is then computed in some examples with simulated data to display its efficiency. However, when the estimator is computed with real-life data (the value of stocks for different companies), it does not seem to suit the sample in an adequate way. The conclusion is drawn that, although the mathematical background for the theory is quite solid, more research needs to be done when applying it to real-life data, namely having a reliable way to check whether the data stem from a multivariate regularly varying distribution, as well as identifying the support of the limit measure.
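    Tail-index estimation, one of the estimation problems mentioned above, is commonly done with the Hill estimator; the sketch below applies it to simulated Pareto data and is a generic illustration rather than the thesis's estimator.

        import numpy as np

        def hill_estimator(sample, k):
            """Hill estimate of the tail index from the k largest order statistics."""
            x = np.sort(np.asarray(sample))
            return 1.0 / np.mean(np.log(x[-k:] / x[-k - 1]))

        rng = np.random.default_rng(7)
        data = rng.pareto(2.5, 200_000) + 1.0     # regularly varying, true tail index 2.5
        print(hill_estimator(data, 1000))         # should be close to 2.5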
  • Tiihonen, Iiro (2020)
    The topic of this thesis is the application of Gaussian processes (GPs) to the analysis of time series. In particular, I approach time series analysis from the perspective of a comparatively rare application area, the analysis of historical time series data. The Bayesian approach is an important part of the work: the parameters themselves are treated as random variables, which affects both the formulation of the modelling problems and the way new predictions are made with the models presented in the thesis. The thesis is built up in stages. First I introduce GPs at a general level, as a tool of statistical modelling. The central idea of GPs is that the finite subsets of a GP follow a multivariate normal distribution, and the relationships between observations are modelled with a kernel function, which relates observations to each other as a function of their associated covariates and its parameters. Choosing a suitable kernel function and optimizing its parameters with respect to the data make it possible to model even very complicated and poorly understood phenomena with a GP. I present the central results that make it possible to fit a GP to data, to use it for prediction, and to decompose the modelled phenomenon into subtrends. After these foundations I move on to how a GP model is formalized and fitted when the approach is Bayesian. I discuss the strengths and weaknesses of different fitting methods as well as the possibility of embedding a GP as part of a larger statistical model. The Bayesian approach makes it possible to feed prior knowledge about the modelled phenomenon into the formalism of the model in the form of prior distributions of the parameters. It also offers a systematic, probability-based way to talk both about prior assumptions and about the post-data beliefs concerning the parameters and the future values of the modelled phenomenon. The next chapter deals with GP modelling techniques specific to time series. In particular, I discuss three different modelling situations: modelling change of a GP over time, a GP composed of several subprocesses, and several mutually correlated GPs. After this treatment the theoretical part of the thesis is complete, and concrete analysis of time series with the presented tools is possible. The final chapter deals with modelling historical phenomena with the techniques presented in the earlier chapters. The primary purpose of the chapter is to briefly present several potential applications, three in total. The first possibility discussed is completing historical time series data, which often contain only fragmentary observations, with predictions obtained from GP models. The practical results highlighted the need for strong priors, since historical time series are often so sparse that the models readily discard the importance of the observations in prediction. The second example concerns historical change points; the example case is the sudden explosion in the number of printed works during the English Civil War at the beginning of the 1640s. The fitted model succeeds in inferring the starting time of the civil war. In the last example I model the number of printed works per person in early modern England, using as covariates, instead of time, other variables that evolve in time (for example the degree of urbanization), which are interpreted as subprocesses. The technical implementation of this example also succeeded, which encourages a more comprehensive analysis, both statistical and historical. As a whole, the thesis both presents and demonstrates the possibilities of the GP approach in time series analysis. The final chapter in particular encourages further development in the new application area of modelling historical phenomena.
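    A minimal Python sketch of the gap-filling idea from the final chapter, using scikit-learn's Gaussian process regression on an invented sparse series; unlike the thesis, this optimizes the kernel hyperparameters by marginal likelihood instead of placing priors on them.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        # A sparse, invented historical series: a handful of yearly observations with long gaps.
        years = np.array([1600, 1612, 1625, 1640, 1643, 1661, 1680], dtype=float).reshape(-1, 1)
        values = np.array([3.0, 3.4, 3.1, 6.5, 7.2, 5.0, 5.5])

        kernel = 1.0 * RBF(length_scale=20.0) + WhiteKernel(noise_level=0.5)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(years, values)

        grid = np.arange(1600, 1681, dtype=float).reshape(-1, 1)
        mean, std = gp.predict(grid, return_std=True)   # posterior mean and pointwise uncertainty
        print(mean[40], std[40])                        # interpolated value and its uncertainty at 1640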
  • Sanders, Julia (2022)
    In this thesis, we demonstrate the use of machine learning in numerically solving both linear and non-linear parabolic partial differential equations. By using deep learning, rather than more traditional, established numerical methods (for example, Monte Carlo sampling) to calculate numeric solutions to such problems, we can tackle even very high dimensional problems, potentially overcoming the curse of dimensionality. This happens when the computational complexity of a problem grows exponentially with the number of dimensions. In Chapter 1, we describe the derivation of the computational problem needed to apply the deep learning method in the case of the linear Kolmogorov PDE. We start with an introduction to a few core concepts in Stochastic Analysis, particularly Stochastic Differential Equations, and define the Kolmogorov Backward Equation. We describe how the Feynman-Kac theorem means that the solution to the linear Kolmogorov PDE is a conditional expectation, and therefore how we can turn the numerical approximation of solving such a PDE into a minimisation problem. Chapter 2 discusses the key ideas behind the terminology deep learning; specifically, what a neural network is and how we can apply this to solve the minimisation problem from Chapter 1. We describe the key features of a neural network, the training process, and how parameters can be learned through a gradient descent based optimisation. We summarise the numerical method in Algorithm 1. In Chapter 3, we implement a neural network and train it to solve a 100-dimensional linear Black-Scholes PDE with underlying geometric Brownian motion, and similarly with correlated Brownian motion. We also illustrate an example with a non-linear auxiliary Itô process: the Stochastic Lorenz Equation. We additionally compute a solution to the geometric Brownian motion problem in one dimension, and compare the accuracy of the solution found by the neural network with that found by two other numerical methods, Monte Carlo sampling and finite differences, as well as with the solution found using the implicit formula for the solution. In two dimensions, the solution of the geometric Brownian motion problem is compared against a solution obtained by Monte Carlo sampling, which shows that the neural network approximation falls within the 99% confidence interval of the Monte Carlo estimate. We also investigate the impact of the frequency of re-sampling training data and the batch size on the rate of convergence of the neural network. Chapter 4 describes the derivation of the equivalent minimisation problem for solving a Kolmogorov PDE with non-linear coefficients, where we discretise the PDE in time, and derive an approximate Feynman-Kac representation on each time step. Chapter 5 demonstrates the method on an example of a non-linear Black-Scholes PDE and a Hamilton-Jacobi-Bellman equation. The numerical examples are based on the code by Beck et al. in their papers "Solving the Kolmogorov PDE by means of deep learning" and "Deep splitting method for parabolic PDEs", and are written in the Julia programming language, with use of the Flux library for Machine Learning in Julia. The code used to implement the method can be found at https://github.com/julia-sand/pde_approx
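    The Feynman-Kac representation described in Chapter 1 is easy to sketch in the one-dimensional Black-Scholes case; the Monte Carlo estimate below is the kind of baseline the neural network solution is compared against (invented parameters, and Python rather than the thesis's Julia code).

        import numpy as np

        def feynman_kac_mc(x0, r, sigma, T, payoff, n_paths=200_000, seed=8):
            """Monte Carlo estimate of u(0, x0) = E[exp(-rT) payoff(X_T)] for geometric Brownian motion."""
            rng = np.random.default_rng(seed)
            z = rng.standard_normal(n_paths)
            x_T = x0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
            samples = np.exp(-r * T) * payoff(x_T)
            return samples.mean(), 1.96 * samples.std(ddof=1) / np.sqrt(n_paths)

        strike = 100.0
        estimate, half_width = feynman_kac_mc(100.0, 0.05, 0.2, 1.0,
                                              lambda x: np.maximum(x - strike, 0.0))
        print(estimate, "+/-", half_width)   # brackets the Black-Scholes call price of about 10.45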
  • Lohi, Heikki (2023)
    Stochastic homogenization consists of qualitative and quantitative homogenization. It studies the solutions of certain elliptic partial differential equations that exhibit rapid random oscillations in some heterogeneous physical system. Our aim is to homogenize these perturbations to some regular large-scale limiting function by utilizing particular corrector functions and homogenizing matrices. This thesis mainly considers elliptic qualitative homogenization and it is based on a research article by Scott Armstrong and Tuomo Kuusi. The purpose is to elaborate on the topics presented there by viewing some other notable references in the literature of stochastic homogenization written throughout the years. An effort has been made to explain further details compared to the article, especially with respect to the proofs of some important results. Hopefully, this thesis can serve as an accessible introduction to qualitative homogenization theory. In the first chapter, we will begin by establishing some notations and preliminaries, which will be utilized in the subsequent chapters. The second chapter considers the classical case, where every random coefficient field is assumed to be periodic. We will examine later the general situation that does not require periodicity. However, the periodic case still provides useful results and strategies for the general situation. Stochastic homogenization theory involves multiple random elements and hence it heavily applies probability theory to the theory of partial differential equations. For this reason, the third chapter assembles the most important probability aspects and results that will be needed. Especially, the ergodic theorems for R^d and Z^d will play a central part later on. The fourth chapter introduces the general case, which does not require periodicity anymore. The only assumption needed for the random coefficient fields is stationarity, that is, the probability measure P is translation invariant with respect to translations in Z^d. We will state and prove important results such as the homogenization for the Dirichlet problem and the qualitative homogenization theorem for stationary random coefficient fields. In the fifth chapter, we will briefly consider another approach to qualitative homogenization. This so-called variational approach was discovered in the 1970s and 1980s, when Ennio De Giorgi and Sergio Spagnolo alongside Gianni Dal Maso and Luciano Modica studied qualitative homogenization. We will provide a second proof for the qualitative homogenization theorem that is based on their work. An additional assumption regarding the symmetry of the random coefficient fields is needed. The last chapter is dedicated to the large-scale regularity theory of the solutions of the uniformly elliptic equations. We will concretely see the purpose of the stationarity assumption as it turns out that it guarantees much greater regularity properties compared to non-stationary coefficient fields. The study of large-scale regularity theory is very important, especially on the quantitative side of stochastic homogenization.
  • Luhtala, Juuso (2023)
    ''Don't put all your eggs in one basket'' is a common saying that applies particularly well to investing. Thus, the concept of portfolio diversification exists and is generally accepted to be a good principle. But is it always and in every situation preferable to diversify one's investments? This Master's thesis explores this question in a restricted mathematical setting. In particular, we will examine the profit-and-loss distribution of a portfolio of investments using such probability distributions that produce extreme values more frequently than some other probability distributions. The theoretical restriction we place for this thesis is that the random variables modelling the profits and losses of individual investments are assumed to be independent and identically distributed. The results of this Master's thesis are originally from Rustam Ibragimov's article Portfolio Diversification and Value at Risk Under Thick-Tailedness (2009). The main results concern two particular cases. The first main result concerns probability distributions which produce extreme values only moderately often. In the first case, we see that the accepted wisdom of portfolio diversification is proven to make sense. The second main result concerns probability distributions which can be considered to produce extreme values extremely often. In the second case, we see that portfolio diversification is proven to increase the overall risk of the portfolio, and therefore it is preferable not to diversify one's investments in this extreme case. In this Master's thesis we will first formally introduce and define heavy-tailed probability distributions as those probability distributions that produce extreme values much more frequently than some other probability distributions. Second, we will introduce and define particular important classes of probability distributions, most of which are heavy-tailed. Third, we will give a definition of portfolio diversification by utilizing a mathematical theory that concerns how to classify how far apart or close the components of a vector are from each other. Finally, we will use all the introduced concepts and theory to answer the question of whether portfolio diversification is always preferable. The answer is that there are extreme situations where portfolio diversification is not preferable.
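    The two regimes can be made concrete with a small simulation, sketched here with Pareto losses rather than the exact distribution classes of Ibragimov's article: for a moderately heavy tail the equally weighted portfolio has the smaller Value-at-Risk, while for an extremely heavy tail diversification increases it.

        import numpy as np

        def value_at_risk(alpha, n_assets, level=0.99, n_sim=500_000, seed=9):
            """99% VaR of a single-asset position versus an equally weighted portfolio of
            i.i.d. Pareto(alpha) losses (tail index alpha)."""
            rng = np.random.default_rng(seed)
            losses = rng.pareto(alpha, size=(n_sim, n_assets)) + 1.0
            concentrated = np.quantile(losses[:, 0], level)
            diversified = np.quantile(losses.mean(axis=1), level)
            return concentrated, diversified

        print(value_at_risk(alpha=3.0, n_assets=10))   # diversified VaR is clearly smaller
        print(value_at_risk(alpha=0.7, n_assets=10))   # diversified VaR is larger: diversification backfires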
  • Vartiainen, Pyörni (2024)
    Sums of log-normally distributed random variables arise in numerous settings in the fields of finance and insurance mathematics, typically to model the value of a portfolio of assets over time. In particular, the use of the log-normal distribution in the popular Black-Scholes model allows future asset prices to exhibit heavy tails whilst still possessing finite moments, making the log-normal distribution an attractive assumption. Despite this, the distribution function of the sum of log-normal random variables cannot be expressed analytically, and has therefore been studied extensively through Monte Carlo methods and asymptotic techniques. The asymptotic behavior of log-normal sums is of especial interest to risk managers who wish to assess how a particular asset or portfolio behaves under market stress. This motivates the study of the asymptotic behavior of the left tail of a log-normal sum, particularly when the components are dependent. In this thesis, we characterize the asymptotic behavior of the left and right tail of a sum of dependent log-normal random variables under the assumption of a Gaussian copula. In the left tail, we derive exact asymptotic expressions for both the distribution function and the density of a log-normal sum. The asymptotic behavior turns out to be closely related to Markowitz mean-variance portfolio theory, which is used to derive the subset of components that contribute to the tail asymptotics of the sum. The asymptotic formulas are then used to derive expressions for expectations conditioned on log-normal sums. These formulas have direct applications in insurance and finance, particularly for the purposes of stress testing. However, we call into question the practical validity of the assumptions required for our asymptotic results, which limits their real-world applicability.
  • Flinck, Jens (2023)
    This thesis focuses on statistical topics that proved important during a research project involving quality control in chemical forensics. This includes general observations about the goals and challenges a statistician may face when working together with a researcher. The research project involved analyzing a dataset with high dimensionality compared to the sample size in order to figure out if parts of the dataset can be considered distinct from the rest. Principal component analysis and Hotelling's T^2 statistic were used to answer this research question. Because of this, the thesis introduces the ideas behind both procedures as well as the general idea behind multivariate analysis of variance. Principal component analysis is a procedure that is used to reduce the dimension of a sample. On the other hand, Hotelling's T^2 statistic is a method for conducting multivariate hypothesis testing for a dataset consisting of one or two samples. One way of detecting outliers in a sample transformed with principal component analysis involves the use of Hotelling's T^2 statistic. However, using both procedures together breaks the theory behind Hotelling's T^2 statistic. Due to this, the resulting information is considered more of a guideline than a hard rule for the purposes of outlier detection. To figure out how the different attributes of the transformed sample influence the number of outliers detected according to Hotelling's T^2 statistic, the thesis includes a simulation experiment. The simulation experiment involves generating a large number of datasets. Each observation in a dataset contains the number of outliers according to Hotelling's T^2 statistic in a sample that is generated from a specific multivariate normal distribution and transformed with principal component analysis. The attributes that are used to create the transformed samples vary between the datasets, and in some datasets the samples are instead generated from two different multivariate normal distributions. The datasets are observed and compared against each other to find out how the specific attributes affect the frequencies of different numbers of outliers in a dataset, and to see how much the datasets differ when a part of the sample is generated from a different multivariate normal distribution. The results of the experiment indicate that the only attributes that directly influence the number of outliers are the sample size and the number of principal components used in the principal component analysis. The mean number of outliers divided by the sample size is smaller than the significance level used for the outlier detection and approaches the significance level when the sample size increases, implying that the procedure is consistent and conservative. In addition, when some part of the sample is generated from a different multivariate normal distribution than the rest, the frequency of outliers can potentially increase significantly. This indicates that the number of outliers according to Hotelling's T^2 statistic in a sample transformed with principal component analysis can potentially be used to confirm that some part of the sample is distinct from the rest.
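    A compact sketch of the outlier-detection idea (not the thesis's code; the control limit is one commonly used F-based approximation, and, as noted above, combining it with principal component analysis makes it a guideline rather than an exact test).

        import numpy as np
        from scipy import stats
        from sklearn.decomposition import PCA

        def t2_outliers(sample, n_components, alpha=0.05):
            """Flag rows whose Hotelling's T^2 in the PCA score space exceeds an F-based control limit."""
            n = sample.shape[0]
            pca = PCA(n_components=n_components).fit(sample)
            scores = pca.transform(sample)
            t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
            k = n_components
            limit = k * (n - 1) / (n - k) * stats.f.ppf(1.0 - alpha, k, n - k)
            return t2 > limit

        rng = np.random.default_rng(10)
        bulk = rng.multivariate_normal(np.zeros(20), np.eye(20), size=95)
        shifted = rng.multivariate_normal(np.full(20, 2.0), np.eye(20), size=5)   # a distinct subgroup
        flags = t2_outliers(np.vstack([bulk, shifted]), n_components=3)
        print(np.where(flags)[0])   # the shifted rows (indices 95-99) should dominate the flagged set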