Browsing by department "Matematiikan ja tilastotieteen osasto"


Adaptive algorithm in differential privacy : comparative analysis of pre-processing methods

Kalaja, Eero (2020)

Nowadays a massive amount of data is collected on individuals. Making this data more available to data scientists could be tremendously beneficial in a wide range of fields. Sharing data is not a trivial matter, as it may expose individuals to malicious attacks. The concept of differential privacy, first introduced in the seminal work by Cynthia Dwork (2006b), offers solutions for tackling this problem. Applying random noise to the shared statistics protects the individuals while allowing data analysts to use the data to improve predictions. The input perturbation technique is a simple way of privatizing data, which adds noise to the whole data set. This thesis studies an output perturbation technique, where the calculations are done with real data but only sufficient statistics are released. With this method a smaller amount of noise is required, making the analysis more accurate. Yu-Xiang Wang (2018) improves the model by introducing an adaptive AdaSSP algorithm to fix the instability issues of the previously used Sufficient Statistics Perturbation (SSP) algorithm. In this thesis we verify the results shown by Yu-Xiang Wang (2018) and look into the pre-processing steps more carefully. Yu-Xiang Wang used some unusual normalization methods, especially regarding the sensitivity bounds. We are able to show that those had little effect on the results and that the AdaSSP algorithm remains superior to the SSP algorithm also when combined with more common data standardization methods. A small adjustment to the noise levels is suggested for the algorithm to guarantee the privacy conditions set by the classical Gaussian mechanism. We combine different pre-processing mechanisms with the AdaSSP algorithm and present a comparative analysis of them. The results show that robust private linear regression by Honkela et al. (2018) makes significant improvements in predictions with half of the data sets used for testing. 
The combination of the AdaSSP algorithm with robust private linear regression often brings us closer to non-private solutions.
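
As a rough illustration of sufficient statistics perturbation, the following Python sketch fits a one-dimensional regression slope from noisy versions of the sums of x*x and x*y. It is a minimal sketch under stated assumptions, not the algorithm studied in the thesis: the function names, the bounds |x| <= 1 and |y| <= 1, the add/remove-one-record neighbourhood, the even split of the privacy budget and the fixed `ridge` term are all hypothetical choices, and the adaptive regularization that distinguishes AdaSSP is not implemented.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical Gaussian mechanism assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def ssp_slope(xs, ys, epsilon, delta, ridge=0.0, rng=None):
    """Estimate a regression slope (no intercept) from perturbed sufficient statistics.

    Assumption: every |x| <= 1 and |y| <= 1, so adding or removing one record
    changes each of the two sums by at most 1 (sensitivity 1).
    """
    rng = rng or random.Random()
    # Hypothetical budget split: half of (epsilon, delta) for each released statistic.
    sigma = gaussian_sigma(1.0, epsilon / 2.0, delta / 2.0)
    noisy_sxx = sum(x * x for x in xs) + rng.gauss(0.0, sigma)
    noisy_sxy = sum(x * y for x, y in zip(xs, ys)) + rng.gauss(0.0, sigma)
    # Only the noisy statistics are used from here on; the raw data is not touched again.
    return noisy_sxy / (noisy_sxx + ridge)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    ys = [0.5 * x for x in xs]
    print(ssp_slope(xs, ys, epsilon=0.9, delta=1e-5, rng=rng))
```

With few records the noisy denominator can end up close to zero, which is one way to see the kind of instability that a regularization term is meant to dampen.
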
Affective factors in the Faculty of Agriculture and Forestry course Matematiikka I (Affektiiviset tekijät maatalous-metsätieteellisen tiedekunnan kurssilla Matematiikka I)

Karhuvaara, Henriikka (2020)

Previous studies show that the affective factors influencing the learning of mathematics develop during comprehensive school and upper secondary school but are also visible in higher education. In particular, students who enter less mathematical fields may have negative earlier experiences of studying mathematics, which can affect their mathematics studies at university, for example. This study examined how the course Matematiikka I at the Faculty of Agriculture and Forestry of the University of Helsinki affects students' affective experiences, such as self-confidence, mathematics anxiety, motivation, the meaningfulness of studying, and the appreciation of mathematics. Based on the results, the aim is to develop the teaching of the course so that it prevents negative experiences in particular, so that students would not avoid studying and using mathematics at university and in their future careers. The research data were collected during the autumn 2019 course. Data on affective experiences were collected at both the beginning and the end of the course with a questionnaire and open-ended questions. The effects of students' skill level on affective factors were examined using the initial skills test and the final exam of the course. In addition, the exercises completed by students during the course were tracked. The development of affective factors, as well as work and performance in the course, was followed in a sample of 40 students. The study found that the Matematiikka I course affected students' affective experiences both positively and negatively. A student's initial level was associated with how the student's self-confidence and motivation developed during the course. The initial skill level also affected the experience of mathematics anxiety. The meaningfulness of studying was affected most by the practical arrangements of the course. For the development of the appreciation of mathematics, the key question was whether students came to understand the significance of mathematics in their own field through the course. 
The results show that it makes sense to develop the teaching of the course so that it prevents affective factors from developing in a negative direction. Students whose initial level is weak relative to the course should be given sufficient opportunity to supplement their skills before the course begins. On the other hand, students' differing initial levels must also be taken into account in the teaching of the course itself. Sufficient time must be reserved for course planning in the future. It is also worth investing in communication so that the goals and requirements of the course are clear to students. In planning and teaching the course, experts from different fields should continue to be involved where possible, so that the mathematical content of the course can gradually be brought closer to the students' own field.
Affine geometry (Affiini geometria)

Hirvi, Emilia (2019)

The topic of this thesis is affine geometry, which is introduced in the first chapter. The topic is approached from the viewpoint of linear algebra. To build a solid foundation for the study of affine geometry, the second chapter focuses on the basic definitions of linear algebra. The third chapter introduces the concept of an affine space, in which the action between points and vectors is defined. In an affine space, the parallelism of lines and vectors is central. On the other hand, the starting point of a vector is irrelevant. The fourth chapter presents a concept resembling a linear combination: the affine combination, or barycenter. An affine combination is defined for a family of weighted points. In addition, the sum of the weights, that is, the scalars, must be one. The next chapter deals with affine subspaces. Just as a subspace of a vector space contains all linear combinations of its spanning vectors, an affine subspace contains all affine combinations of its families of weighted points. An affine subspace is a subspace translated away from the origin. The sixth chapter focuses on affine independence and affine frames. Affine independence is defined in terms of linear independence, and an affine frame in terms of a basis of a vector space. The seventh chapter defines an affine map, which is a combination of a linear map and a translation vector. In an affine map, the linear map first rotates or stretches the point set, after which the translation vector shifts its position. Nevertheless, an affine map sends parallel lines to parallel lines. Finally, some examples of affine geometry are examined.
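
The condition that the weights sum to one can be made concrete with a short worked equation. The notation below (an auxiliary point O, weights \lambda_i, a matrix A and a vector b) is an assumption chosen for illustration and need not match the notation of the thesis.

```latex
% Affine combination (barycenter) of points P_1, ..., P_n with weights \lambda_i
\sum_{i=1}^{n} \lambda_i P_i \;:=\; O + \sum_{i=1}^{n} \lambda_i \,(P_i - O),
\qquad \sum_{i=1}^{n} \lambda_i = 1 .

% The result does not depend on the auxiliary point O:
% replacing O by O' changes the right-hand side by
(O' - O) - \sum_{i=1}^{n} \lambda_i \,(O' - O)
  \;=\; \Bigl(1 - \sum_{i=1}^{n} \lambda_i\Bigr)(O' - O) \;=\; 0 .

% Affine map in coordinates: a linear part A followed by a translation b
f(x) = A x + b .
```

If the weights did not sum to one, the last factor would not vanish and the combination would depend on the choice of O, which is why the condition appears in the definition.
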
A Generalized Form of Mercer's Theorem

Salonen, Ella (2020)

In this thesis we prove a generalized form of Mercer's theorem, and go through the underlying mathematics involved in the result. Mercer's theorem is an important result in the theory of integral equations, as it can be used as a tool in determining the trace of integral operators. With certain assumptions on a topological space X and measure space (X,dµ), the generalized theorem states that the trace of a positive and self-adjoint bounded integral operator on L^2(X,dµ) with a continuous kernel can be determined by integrating the diagonal of the kernel function. Whether the integral operator is trace class then depends on whether the value of the integral is finite. We start the thesis by introducing the general setting we have for the theorem, and provide wider background for the main assumptions. We assume that X is a locally compact Hausdorff space that is σ-compact, and µ is a Radon measure on X with support equal to X. We also need the following technical assumption. Since X is σ-compact, there exists an increasing sequence of compact subsets C_n with union equal to X. We assume that for each C_n there exists a sequence of increasingly fine partitions, compatible with the measure µ. We then go through the basics of Banach spaces, and we introduce the L^p spaces. The theory of Hilbert spaces is presented in greater detail. We introduce some classes of bounded linear operators on Hilbert spaces, including self-adjoint and positive operators. Some spectral theory is considered, first for Banach algebras in general, and then for the Banach algebra of bounded linear operators on a complex Banach space. The space of bounded linear operators on a Hilbert space can be seen as a C^*-algebra, and results for the spectrum of different kinds of Hilbert space operators are given. Compact operators are first defined on Banach spaces. We prove that they form a closed, two-sided ideal in the algebra of bounded linear operators on a Banach space. 
We also consider compact operators on a Hilbert space, and of special interest are the Hilbert-Schmidt integral operators on the space L^2, which are proven to be compact. The existence of the canonical decomposition for compact operators is proven as this property is used in several proofs of the thesis. In the final chapter we focus on the theory of Hilbert-Schmidt operators and trace class operators on Hilbert spaces. We show that operators in these classes are compact. Considering the Hilbert-Schmidt operators on the space L^2, we prove that they then correspond to the Hilbert-Schmidt integral operators. A trace is first defined for a positive operator, and then for a trace class operator. Finally, in the last section, we construct a proof for the generalized form of Mercer's theorem. As a result, we find a way to determine the trace of an integral operator that satisfies the assumptions described in the first paragraph.
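
The statement described in the abstract can be written compactly as follows. The symbols T and K are names chosen here for the integral operator and its kernel; the precise hypotheses are those listed in the abstract, and this display is only a summary sketch of the conclusion.

```latex
% Integral operator with continuous kernel K on L^2(X, d\mu)
(Tf)(x) = \int_X K(x, y)\, f(y)\, d\mu(y), \qquad f \in L^2(X, d\mu).

% Conclusion for positive, self-adjoint, bounded T:
\operatorname{tr}(T) = \int_X K(x, x)\, d\mu(x),

% and T is trace class exactly when this integral is finite.
```
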
Algebraic misconceptions in lower secondary school mathematics (Algebralliset virhekäsitykset yläkoulun matematiikassa)

Kivi, Manu (2020)

Objectives. Starting to study algebra has been found to be difficult for pupils for decades. The transition from arithmetic to algebra has been found to be one of the greatest challenges for pupils in lower secondary school mathematics. Numerous studies have been published on the topic, but they have not presented an unambiguous solution to this multidimensional problem. One possible way to reduce the leap between arithmetic and algebra may be to start teaching algebra earlier. The purpose of this study is to highlight the most common algebraic errors and misconceptions that lower secondary school pupils exhibit when solving algebra exercises. The aim is to find reasons for incorrect solutions so that attention can be paid to them in teaching, thereby improving both the teaching and the learning of algebra. Methods. A total of 65 lower secondary school pupils took part in the study: 20 from grade 7, 23 from grade 8, and 22 from grade 9. All pupils attended the same school. The data were collected in late spring 2019 and consisted of the pupils' answers in course exams. The data were analysed both quantitatively and qualitatively, with the main emphasis on the qualitative analysis. Results and conclusions. The data revealed numerous incorrect algebraic thought patterns and erroneous solutions in operating on algebraic equations in every grade. Similar errors recurred in the pupils' exam answers regardless of grade. Pupils made the most errors when an exercise involved negative numbers or bracketed expressions. These errors reflect pupils' weak understanding of what bracketed expressions and negativity represent. Based on the study, investing in the start of grade 7 mathematics may ease pupils' transition from arithmetic to algebra.
The fundamental theorem of algebra (Algebran peruslause)

Tiihonen, Leena (2020)

This thesis examines the fundamental theorem of algebra from a historical perspective. The theorem is proved following the proof presented in Lindelöf's textbook. Lindelöf's proof of the fundamental theorem of algebra is based on the idea of Argand's proof. Some complex analysis is needed as a theoretical foundation. The validity of the fundamental theorem of algebra is shown in the set of complex numbers. The proof proceeds through four theorems. First, the continuity of the modulus of a polynomial function is shown. The second theorem shows that, inside a suitably chosen circle, the modulus of a polynomial function takes at least at one point a smaller value than outside the circle or on its circumference. The third theorem shows that inside the circle there is a point at which the modulus of the polynomial function attains the infimum of its values. In the proof, the infimum of the values of the modulus is localized to a bounded region. The location of the infimum is narrowed down by dividing the bounded region into finitely many subsets, to at least one of which the infimum of the values of the modulus can be localized. By repeating the division on that subset, the desired infimum of the modulus is localized to an ever smaller region. The fourth theorem gives the exact value of the infimum of the modulus of the polynomial function. The proof uses the polar coordinate representation. The modulus of the polynomial function attains its infimum at at least one point inside the circle, and at that point the value of the infimum is zero.
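A cross-curricular study unit in biological mathematics for upper secondary school students within the 2019 upper secondary curriculum (Ämnesövergripande studiehelhet i Biologisk Matematik för gymnasiestuderande inom gymnasiets läroplan 2019)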

Kellokoski, Sanna (2020)

The topic of this master's thesis is the creation of a cross-curricular study unit in biological mathematics for upper secondary school students in their final year. The study unit in biological mathematics is an applied unit consisting of two one-credit modules, one in mathematics and one in biology, and it forms part of the school-specific curriculum of one of the Swedish-language upper secondary schools in Helsinki. The thesis first presents the pedagogical background on which the structure of the study unit is based: cross-curricular learning and integrative teaching. Pedagogically, the purpose of the study unit is to arouse students' interest, motivate them to study mathematics, and strengthen their mathematical self-confidence. To arouse interest and motivate, self-confidence must first be in order; once mathematical self-confidence is in place, one can strive for the rest. The pedagogical framework of the thesis is built around L. Dee Fink's taxonomy of significant learning and the phases of interest development by Hidi & Renninger. The curriculum of the study unit is included in the thesis, and it is designed so that it and the new national core curriculum for upper secondary schools support each other. The main body of the thesis is the chapter presenting the content of the study unit, based on the students' background knowledge in both mathematics and biology. In mathematics, both the advanced and the basic syllabus have been taken into account so that students can choose the study unit regardless of which syllabus they have chosen, especially because many of those who have studied the basic syllabus are the ones who will need to be able to apply mathematical models in various contexts in the future. The plan takes into account three major themes around which the mathematical module is built: population ecology, environmental ecology, and genetics. 
The content of the study unit forms a framework around the biological parts; this thesis concentrates on the mathematical part, and biology is included in the plan to the extent that it is needed to motivate the mathematics. The thesis concludes with the assessment and working methods of the study unit and a reflection on potential options for developing and expanding the study unit in the future.
An Application of Snow Cover Detection with Sentinel-2 Using Bayesian Networks

Simsek, Burak (2020)

In this study, a classification scheme is implemented to obtain high-resolution snow cover information from Sentinel-2 data using a very simple Bayesian network (Naive Bayes) that is trained with ground snow measurement data. The performance of Bayesian and non-Bayesian Naive Bayes, different feature sets and different discretization methods is compared. Results show that Bayesian NB performs the best, with up to 0.88 classification accuracy for snow/no-snow classification. Using the most relevant spectral bands rather than all available bands provided an improvement in some cases but performed slightly worse in others, hence not giving a clear answer. However, the effect of the discretization method was clear: ChiMerge performed better than equal-width binning, but it was so much slower that it was not practical to discretize the pixels of a full Sentinel-2 image.
Anisotropic diffusion in image processing

Sariola, Tomi (2019)

Sometimes digital images may suffer from considerable noisiness. Of course, we would like to obtain the original noiseless image. However, this may not even be possible. In this thesis we utilize diffusion equations, particularly anisotropic diffusion, to reduce the noise level of the image. Applying these kinds of methods is a trade-off between retaining information and reducing the noise level. Diffusion equations may reduce the noise level, but they may also blur the edges, and thus information is lost. We discuss the mathematics and theoretical results behind the diffusion equations. We start with continuous equations and build towards discrete equations, as digital images are fully discrete. The main focus is on the iterative method, that is, we diffuse the image step by step. As it turns out, we need certain assumptions for these equations to produce good results, one of which is a timestep restriction and the other a correct choice of diffusivity function. We construct an anisotropic diffusion algorithm to denoise images and compare it to other diffusion equations. We discuss the edge-enhancing property, the noise removal properties and the convergence of anisotropic diffusion. Results on test images show that anisotropic diffusion is capable of reducing the noise level of the image while retaining its edges and, as mentioned, may even sharpen them.
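
The step-by-step diffusion described above can be sketched in a few lines of plain Python. The sketch uses a Perona-Malik-type explicit scheme, which is an assumption made for illustration and may differ from the algorithm constructed in the thesis; the function names, the diffusivity g(s) = 1 / (1 + (s / kappa)^2) and the values of `kappa` and `dt` are hypothetical. For this 4-neighbour scheme a timestep of at most 0.25 is the usual stability restriction.

```python
def diffusivity(gradient, kappa):
    """Edge-stopping function: close to 1 for small gradients, close to 0 across strong edges."""
    return 1.0 / (1.0 + (gradient / kappa) ** 2)


def diffuse_step(image, kappa=0.1, dt=0.2):
    """One explicit diffusion step on a 2D list of floats; returns a new image."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            flux = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                # Neighbours outside the image are skipped (zero-flux boundary).
                if 0 <= ni < rows and 0 <= nj < cols:
                    diff = image[ni][nj] - image[i][j]
                    flux += diffusivity(abs(diff), kappa) * diff
            result[i][j] = image[i][j] + dt * flux
    return result


if __name__ == "__main__":
    # A vertical step edge with one slightly noisy pixel on the dark side.
    img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    img[1][0] = 0.05
    for _ in range(10):
        img = diffuse_step(img)
    for row in img:
        print(["%.3f" % value for value in row])
```

Small differences (noise) see a diffusivity near one and are smoothed quickly, while the large jump at the edge sees a diffusivity near zero and is largely preserved, which is the trade-off the abstract describes.
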
Bayesian hidden Markov model for overdiagnosis in colorectal cancer screening

Nevala, Aapeli (2020)

Thanks to modern medical advances, humans have developed tools for detecting diseases so early that a patient would be better off had the disease gone undetected. This is called overdiagnosis. Overdiagnosis is a problem especially common in acts, where the target population of an intervention consists of mostly healthy people. Colorectal cancer (CRC) is a relatively rare disease. Thus screening for CRC affects a mostly cancer-free population. In this thesis I evaluate overdiagnosis in a guaiac faecal occult blood test (gFOBT) based CRC screening programme. In gFOBT CRC screening there are two goals: to detect known predecessors of cancers called adenomas and to remove them (cancer prevention), and to detect malignant CRCs early enough to be still treatable (early detection). Overdiagnosis can happen when detecting adenomas, but also when detecting cancers. This thesis focuses on overdiagnosis due to detection of adenomas that are non-progressive in their nature. Since there is no clinical means to distinguish between progressive and non-progressive adenomas, statistical methods must be applied. Classical methods to estimate overdiagnosis fail in quantifying this type of overdiagnosis for a couple of reasons: incidence data on adenomas is not available, and adenoma removal lowers cancer incidence in the screened population. While the latter is a desired effect of screening, it makes it impossible to estimate overdiagnosis by just comparing cancer incidences among screened and control populations. In this thesis a Bayesian hidden Markov model is fitted, using the HMC NUTS algorithm via the software Stan, to simulate the natural progression of colorectal cancer. The five states included in the model are healthy (1), progressive adenoma (2), screen-detectable CRC (3), clinically apparent CRC (4) and non-progressive adenoma (5). Possible transitions are from 1 to 2, 1 to 5, 2 to 3 and 3 to 4. 
The possible observations are screen-negative (1), detected adenoma (2), screen-detected CRC (3), clinically manifested CRC (3). Three relevant estimands for evaluating this type of overdiagnosis with a natural history model are presented. Then the methods are applied to estimate the overdiagnosis proportion in the guaiac faecal occult blood test (gFOBT) based CRC screening programme conducted in Finland between 2004 and 2016. The resulting mean overdiagnosis probability for all patients who had an adenoma detected in the programme is 0.48 (95% credible interval 0.38 to 0.56). Different estimates for overdiagnosis in sex- and age-specific strata of the screened population are also provided. In addition to these findings, the natural history model can be used to gain more insight into the natural progression of colorectal cancer.
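
The state structure listed above can be illustrated with a small simulation of the hidden chain. This is a minimal sketch, assuming a discrete-time chain with made-up per-step transition probabilities; the names `TRANSITIONS`, `step` and `simulate` are hypothetical, the numbers are not estimates from the thesis, and the observation model and the Stan fit are not reproduced.

```python
import random

# States: 1 healthy, 2 progressive adenoma, 3 screen-detectable CRC,
# 4 clinically apparent CRC, 5 non-progressive adenoma.
# Hypothetical per-step transition probabilities (NOT estimates from the thesis).
TRANSITIONS = {
    1: {2: 0.02, 5: 0.03},
    2: {3: 0.05},
    3: {4: 0.20},
    4: {},  # absorbing in this sketch
    5: {},  # non-progressive adenomas never progress
}


def step(state, rng):
    """Move to a successor state with its listed probability, otherwise stay put."""
    u = rng.random()
    cumulative = 0.0
    for target, prob in TRANSITIONS[state].items():
        cumulative += prob
        if u < cumulative:
            return target
    return state


def simulate(n_steps, rng):
    """Simulate one individual's hidden state path, starting from healthy."""
    path = [1]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path


if __name__ == "__main__":
    print(simulate(30, random.Random(1)))
```

In this structure an adenoma detected while the hidden state is 5 is one that would never have progressed, which is the kind of detection the overdiagnosis estimands are concerned with.
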
Bayesian hierarchical models for housing prices in the Helsinki-Espoo-Vantaa region

Mäkinen, Ville (2020)

Objectives: The objective of this thesis is to illustrate the advantages of Bayesian hierarchical models in housing price modeling. Methods: Five Bayesian regression models are estimated for the housing prices. The models use a robust Student’s t-distribution likelihood and are estimated with Hamiltonian Monte Carlo. Four of the models are hierarchical such that the apartments’ neighborhoods are used as a grouping. Model stacking is also used to produce an ensemble model. Model checks are conducted using the posterior predictive distributions. The predictive distributions are also evaluated in terms of calibration and sharpness and using the logarithmic score with leave-one-out cross validation. The logarithmic scores are calculated using Pareto smoothed importance sampling. The R^2-statistics from the point predictions averaged from the predictive distributions are also presented. Results: The results from the models are broadly reasonable as, for the most part, the coefficients of the explanatory variables and the predictive distributions behave as expected. The results are also consistent with the existence of a submarket in central Helsinki where the price mechanism differs markedly from the rest of the Helsinki-Espoo-Vantaa region. However, model checks indicate that none of the models is well-calibrated. Additionally, the models tend to underpredict the prices of expensive apartments.
Bayesian Optimization and Classification in Likelihood-Free Inference

Kokko, Jan (2019)

In this thesis we present a new likelihood-free inference method for simulator-based models. A simulator-based model is a stochastic mechanism that specifies how data are generated. Simulator-based models can be as complex as needed, but they must allow exact sampling. One common difficulty with simulator-based models is that learning model parameters from observed data is generally challenging, because the likelihood function is typically intractable. Thus, traditional likelihood-based Bayesian inference is not applicable. Several likelihood-free inference methods have been developed to perform inference when a likelihood function is not available. One popular approach is approximate Bayesian computation (ABC), which relies on the fundamental principle of identifying parameter values for which summary statistics of simulated data are close to those of observed data. However, traditional ABC methods tend to have a high computational cost. The cost is largely due to the need to repeatedly simulate data sets, and the absence of knowledge of how to specify the discrepancy between the simulated and observed data. We consider speeding up the earlier method, likelihood-free inference by ratio estimation (LFIRE), by replacing the computationally intensive grid evaluation with Bayesian optimization. The earlier method is an alternative to ABC that relies on transforming the original likelihood-free inference problem into a classification problem that can be solved using machine learning. This method is able to overcome two traditional difficulties with ABC: it avoids using a threshold value that controls the trade-off between computational and statistical efficiency, and it combats the curse of dimensionality by offering an automatic selection of relevant summary statistics when using a large number of candidates. 
Finally, we measure the computational and statistical efficiency of the new method by applying it to three different real-world time series models with intractable likelihood functions. We demonstrate that the proposed method can reduce the computational cost by some orders of magnitude while the statistical efficiency remains comparable to the earlier method.
Bayesian optimized likelihood-free inference on genetic data

Sipola, Aleksi (2020)

Most of the standard statistical inference methods rely on evaluating so-called likelihood functions. But in some cases the phenomenon of interest is too complex or the relevant data inapplicable, and as a result the likelihood function cannot be evaluated. Such a situation blocks frequentist methods based on e.g. maximum likelihood estimation and Bayesian inference based on estimating posterior probabilities. Often, still, the phenomenon of interest can be modeled with a generative model that describes the supposed underlying processes and variables of interest. In such scenarios, likelihood-free inference, such as Approximate Bayesian Computation (ABC), can provide an option for overcoming the roadblock. Creating a simulator that implements such a generative model provides a way to explore the parameter space and approximate the likelihood function based on the similarity between real-world data and data simulated with various parameter values. ABC provides a well-defined and well-studied framework for carrying out such simulation-based inference with a Bayesian approach. ABC has been found useful, for example, in ecology, finance and astronomy, in situations where the likelihood function is not practically computable but models and simulators for generating simulated data are available. One such problem is the estimation of recombination rates of bacterial populations from genetic data, which is often unsuitable for typical statistical methods due to infeasibly massive modeling and computation requirements. Overcoming these hindrances should provide valuable insight into the evolution of bacteria and possibly aid in tackling significant challenges such as antimicrobial resistance. Still, ABC inference is not without its limitations either. Often considerable effort in defining distance functions, summary statistics and a threshold for similarity is required to make the comparison mechanism successful. 
High computational costs can also be a hindrance in ABC inference; as increasingly complex phenomena, and thus models, are studied, the computations needed for sufficient exploration of the parameter space with simulation-comparison cycles can become too time- and resource-consuming. Thus efforts have been made to improve the efficiency of ABC inference. One improvement here has been the Bayesian Optimization for Likelihood-Free Inference algorithm (BOLFI), which provides an efficient method to optimize the exploration of the parameter space, reducing the number of simulation-comparison cycles needed by up to several orders of magnitude. This thesis aims to describe some of the theoretical and applied aspects of complete likelihood-free inference pipelines using both Rejection ABC and BOLFI methods. The thesis also presents a use case where the neutral evolution recombination rate in a Streptococcus pneumoniae population is inferred from a well-studied real-world genome data set. This inference task is used to provide context and concrete examples for the theoretical aspects, and demonstrations for numerous applied aspects. The implementations, experiments and acquired results are also discussed in some detail.
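
The simulation-comparison cycle of Rejection ABC can be sketched on a toy problem. The example below is an illustration only, assuming a Gaussian simulator with unknown mean, a uniform prior on [-5, 5], the sample mean as summary statistic and an absolute-difference distance; all of these choices and the function names are hypothetical and unrelated to the recombination-rate model of the thesis. BOLFI is not implemented here.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from a normal distribution with mean theta and unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def rejection_abc(observed, n_draws, threshold, rng):
    """Keep prior draws whose simulated summary lies within `threshold` of the observed one."""
    observed_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a candidate from the (assumed) prior
        simulated = simulate(theta, len(observed), rng)
        distance = abs(statistics.fmean(simulated) - observed_summary)
        if distance <= threshold:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate(2.0, 50, rng)
    posterior_sample = rejection_abc(observed, 2000, 0.2, rng)
    print(len(posterior_sample), statistics.fmean(posterior_sample))
```

Every candidate costs one full simulation, and most candidates are rejected, which is the computational burden that motivates choosing simulation points more carefully.
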
Bayesian inference in the dynamic pricing of travel tickets (Bayesiläinen päättely matkalippujen dynaamisessa hinnoittelussa)

Palomäki, Matti (2020)

This thesis surveys the application of the Bayesian approach to the dynamic pricing of public transport ticket sales. In dynamic pricing, the price of a product is changed frequently, with the aim of attracting more customers with lower prices during quieter periods and exploiting periods of higher demand by raising prices. In the setting studied, the goal is to maximize the revenue from ticket sales for a long-distance bus service. The expected number of tickets sold is assumed to be determined by a demand function, and demand is assumed to depend on the selling price and on other variables. The pricer chooses a price for each sales period, which yields some sales revenue, and the goal is thus to maximize, for a given bus departure, the expected total revenue from ticket sales over the sales periods by finding the best prices. The following approach to pricing is proposed. Demand is assumed to follow a log-linear model. A prior distribution is used for its parameters, and maximum likelihood estimates of the prior's hyperparameters are computed from previously collected data. In each individual pricing period, a likelihood for the parameters of the demand function is formed from the realized sales in the preceding pricing periods of the sales season. The posterior distribution of the parameters is then formed from the likelihood and the prior, and it is evaluated using Markov chain Monte Carlo methods. Finally, the information about demand given by the posterior distribution is used to optimize the price for the sales period according to a certainty-equivalent pricing strategy, that is, assuming the parameter estimates to be error-free.
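
To make the certainty-equivalent pricing step concrete, the following worked equation treats a single sales period under simplifying assumptions that are not taken from the thesis: expected demand has the semi-logarithmic form below with price as the only covariate, the symbols \alpha and \beta are hypothetical parameter names, and seat capacity is ignored.

```latex
% Assumed expected demand at price p (log-linear in p), with \beta < 0
D(p) = \exp(\alpha + \beta p)

% Expected revenue for the period
R(p) = p \, D(p) = p \exp(\alpha + \beta p)

% First-order condition
R'(p) = (1 + \beta p) \exp(\alpha + \beta p) = 0
\quad\Longleftrightarrow\quad
p^{*} = -\frac{1}{\beta}

% Second-order check at p^{*}
R''(p^{*}) = \beta \exp(\alpha + \beta p^{*}) < 0
```

Under these assumptions the certainty-equivalent strategy would plug a point estimate of \beta from the posterior into p^{*} = -1/\beta, treating that estimate as if it were exact.
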
On the properties of confidence sets for binomial and multinomial distributions (Binomi- ja multinomijakaumien luottamusjoukkojen ominaisuuksista)

Matilainen, Oskari (2020)

This master's thesis discusses methods for analysing confidence sets for the binomial distribution and extends them to the study of confidence sets for the multinomial distribution. The aim of the thesis is to compare selected confidence sets for binomial and multinomial distributions, as well as comparison criteria for binomial confidence sets, generalizing the criteria to multinomial confidence sets where applicable. The confidence sets are defined using frequentist inference. In addition to the established binomial confidence sets selected for comparison, two further confidence sets are defined in the thesis. These confidence sets are compared on the basis of eight comparison criteria presented in the thesis. In studying confidence sets, the coverage probability in particular proves to be a useful tool. For the multinomial distribution, three commonly used confidence sets are presented, together with one confidence set developed for the comparison. The coverage probability is generalized to multinomial confidence sets and used to analyse them. The confidence sets presented are compared using one generalized criterion. As results, the confidence sets presented are reviewed and their suitability for different research settings with small sample sizes is assessed. The coverage probability brings out the different properties of the sets clearly. The comparison criteria assessed generalize to multinomial confidence sets well for the most part.
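
Coverage probability for a binomial confidence interval can be computed exactly by summing binomial probabilities over the outcomes whose interval contains the true parameter. The sketch below does this for the Wald interval, a common textbook interval chosen here purely for illustration; whether it is among the intervals compared in the thesis is not stated, and the function names are hypothetical.

```python
import math


def wald_interval(k, n, z=1.96):
    """Wald (normal approximation) interval for a binomial proportion, clipped to [0, 1]."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def coverage(p, n, interval=wald_interval):
    """Exact probability that the interval computed from K ~ Bin(n, p) contains p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = interval(k, n)
        if lower <= p <= upper:
            total += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return total


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        print(p, round(coverage(p, 10), 3), round(coverage(p, 20), 3))
```

For small n and p near the boundary, the Wald interval degenerates to a single point when k = 0, so its actual coverage can fall far below the nominal level; plotting `coverage` over a grid of p values makes such differences between interval constructions visible.
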
Brouwer's Theorem on the Invariance of Domain

Hallamaa, Luukas (2019)

The purpose of this thesis is to present some dimension theory of separable metric spaces and, with the theory developed, to prove Brouwer's Theorem on the Invariance of Domain. This theorem states that if we embed a subset of the n-dimensional Euclidean space into the aforementioned space, this embedding is an open map. We begin by reviewing some elementary theory of point-set topology that should be familiar to any graduate student in mathematics. Drawing from these rudiments, we move on to the concept of dimension. The dimension theory presented is based on the notion of the small inductive dimension. We define this dimension function for regular spaces and state and prove various results that hold for this function. Although this dimension function is defined on regular spaces, we mainly focus on separable metric spaces. Among other things, we prove that the small inductive dimension of the Euclidean n-space is exactly n. This proof makes use of the famous Brouwer Fixed-Point Theorem, which we naturally also prove. We give a combinatorial proof of the Fixed-Point Theorem, which relies on Sperner's lemma. We move on to develop some theory regarding the extensions of functions. These various results on extensions allow us to finally prove the theorem that lent its name to this thesis: Brouwer's Theorem on the Invariance of Domain.
CAP-malli Hilbertin avaruuksissa (The CAP model in Hilbert spaces)

Savolainen, Valtteri (2019)

This thesis presents a classical topic in mathematical finance, the Capital Asset Pricing Model. There are several versions of the model; this thesis deals with the version based on Hilbert spaces. In general, and especially in economics, the CAP model refers to a pricing model for securities. In this thesis, however, it is defined as a property of the returns of assets in equilibrium. Hilbert spaces are complete normed spaces equipped with an inner product. Hilbert spaces and other central concepts of functional analysis, such as orthogonality and the Riesz-Fréchet theorem, are introduced so that the CAP model can be constructed. In addition, securities markets and their agents are introduced. The most important concepts are the returns and prices of securities and the consumption and utility of agents. The term Capital Asset Pricing Model is used when the market return is a return on the mean-variance frontier. In that case the equation of the security market line, which is a special case of beta pricing, can be derived for every asset. In the CAP model returns are measured by expectation, and variance serves as the measure of risk. The validity of the CAP model in a market is not self-evident; it is closely related to the agents' preferences and to the distribution of the securities' payoffs. Finally, a factor model is presented, which closely resembles the CAP model.
Categories with Foundation

Forsman, David (2020)

We develop the theory of categories from the foundations up. The thesis culminates in a theorem asserting that any concrete functor between categories of models of algebraic theories, where the alphabet of the codomain category contains no relational information, has a left adjoint functor. This theorem is based on the General Adjoint Functor Theorem by Peter Freyd. The first chapter is about the set-theoretic foundations of category theory. We present the needed ideas about recursion so that we can define what is meant by first-order predicate logic. The first chapter ends with an exposition of the connection between Grothendieck universes and inaccessible cardinals. The second chapter begins the discussion of categories and functors between categories. We define properties of morphisms, subobjects, quotient objects and Cartesian closed categories. Furthermore, we discuss embedding and identification morphisms of concrete categories. Much of the third chapter is devoted to showing that the category of small categories is Cartesian closed. This leads us to natural transformations and canonical constructions relating to functors. Natural transformations are needed to define equivalences and their generalizations, adjoint functors. The fourth chapter extends the discussion to hom-functors and the closely related representable functors. The study of representable functors yields a profound result, the Yoneda lemma, which implies that the Yoneda embedding is fully faithful. The fifth chapter concentrates on limit operations in a category, which leads us to completeness. We find out how limits are preserved in constructions and how they behave under functors. The last chapter is about adjoint functors. The general and the special adjoint functor theorems, due to Peter Freyd, are proven.
Using The General Adjoint Functor Theorem, we prove the existence of a left adjoint functor for all suitable forgetful functors among algebraic categories.
Cauchyn jakauma ja sen mahdollisuus lukio-opetuksessa (The Cauchy distribution and its potential in upper secondary school teaching)

Rintala, Jasmiina (2020)

This thesis deals with the Cauchy distribution and the log-Cauchy distribution derived from it. The Cauchy distribution is continuous and very heavy-tailed, and it therefore accommodates outliers in data, which makes it a potential choice for modelling various natural phenomena. In the first chapter I go through what the standard Cauchy distribution is: which mathematical definitions are needed to derive it and how it is derived. The thesis proves that this distribution has neither an expected value nor a variance. The mode and the median, by contrast, can be computed, and they turn out to be equal for the Cauchy distribution. I briefly discuss the log-Cauchy distribution and derive its density and cumulative distribution functions. After this I look into various applications of both the Cauchy and the log-Cauchy distributions. To give the reader an idea of how the distributions are used, I briefly review several studies. In a few of these studies the Cauchy and log-Cauchy distributions are found to fit the modelling task well. In the final section I consider the possibilities of the Cauchy distribution in upper secondary school teaching on the basis of the latest upper secondary school curriculum (2019). Finally, I present my own proposal for a project assignment for the advanced mathematics course MAA12 and justify its suitability for this elective course. The project develops the student's transversal competence and forms a good basis for cross-curricular teaching.
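The claims about the mean, median and mode can be checked with a short calculation. The following is a sketch for the standard Cauchy distribution using its textbook density; it is not taken from the thesis itself, whose derivation may proceed differently.

```latex
% Density and cumulative distribution function of the standard Cauchy distribution
\[
  f(x) = \frac{1}{\pi (1 + x^{2})}, \qquad
  F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan x, \qquad x \in \mathbb{R}.
\]
% The expected value exists only if E|X| is finite, but the integral diverges:
\[
  \mathrm{E}|X|
  = \frac{2}{\pi} \int_{0}^{\infty} \frac{x}{1 + x^{2}} \, dx
  = \frac{1}{\pi} \Bigl[ \ln (1 + x^{2}) \Bigr]_{0}^{\infty}
  = \infty .
\]
% Median: F(0) = 1/2. Mode: f is maximised where 1 + x^2 is smallest, at x = 0.
\[
  F(0) = \frac{1}{2}, \qquad
  \operatorname*{arg\,max}_{x \in \mathbb{R}} f(x) = 0 .
\]
```

Since the expected value does not exist, the variance is undefined as well, while the median and the mode both equal 0.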
Change point analysis of time series data using variable selection methods

Lehtonen, Toni (2020)

Streptococcus pneumoniae is considered to be one of the most common causes of pneumonia and is known to cause a significant disease burden worldwide. During the past two decades much effort has been made globally to prevent pneumococcal illnesses through the use of vaccines. In Finland, all children under the age of five have been eligible to receive the pneumococcal conjugate vaccine as part of the national vaccination programme since 2010. The impact of the pneumococcal vaccination has been studied extensively in Finland, and a significant decrease in the incidence of pneumonia has been observed among all vaccine-age children. One research question not yet examined in previous studies is the exact point in time after which the impact of vaccination can be discerned in the incidence rates. This thesis considers a novel approach to multiple change point detection for time series data, where the change point problem is expressed in the form of a regression model. The model is specified so that potential change point positions are represented as separate explanatory variables. Relevant change points are then chosen by applying several established variable selection methods to the model. Of these methods, the lasso estimate, its Bayesian analogue and two other Gaussian scale mixture priors are considered in this work. The change point model was implemented with the selected variable selection methods for age-group-specific time series of pneumonia incidence rates in Finland between 2001 and 2016 to detect any changes that could be attributed to the introduction of the vaccine. These datasets were produced from routinely generated hospital discharge records, the operationalization of which is also discussed in the thesis. Aside from the vaccinated age group of children under five, data for both 25-44 year olds and over 65 year olds were also considered to inspect possible indirect effects of the vaccination.
The implementations with different variable selection methods all provided very similar results for each age group. For children under five, a change point in spring 2011 was selected, while for the over 65 year olds none was chosen during or after the introduction of the vaccine. For 25-44 year olds multiple change points between 2009 and 2014 were selected, but whether any of these could be attributed to the vaccination remains an open question.
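The idea of expressing change point detection as a regression problem can be sketched as follows. The abstract does not give the exact parameterisation, so the step-function encoding below is an assumption, and all names (`step_design_matrix`, `exact_jump_coefficients`, `predict`) are hypothetical. No variable selection method is implemented here; the sketch only shows why selecting nonzero coefficients amounts to selecting change points.

```python
def step_design_matrix(n):
    """Design matrix whose column j is a step that switches on at time j.

    Column 0 is constant (the intercept); column j >= 1 represents a
    potential change point at position j. Assumed encoding, for illustration.
    """
    return [[1.0 if t >= j else 0.0 for j in range(n)] for t in range(n)]


def exact_jump_coefficients(y):
    """Coefficients reproducing y exactly: the initial level, then the jumps."""
    return [y[0]] + [y[t] - y[t - 1] for t in range(1, len(y))]


def predict(X, beta):
    """Fitted values X @ beta, written with plain lists."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# A noise-free toy series with level shifts at positions 4 and 7.
y = [5.0] * 4 + [3.0] * 3 + [4.0] * 3
X = step_design_matrix(len(y))
beta = exact_jump_coefficients(y)

# Nonzero coefficients (apart from the intercept) mark the change points.
change_points = [j for j in range(1, len(beta)) if beta[j] != 0.0]

if __name__ == "__main__":
    print("change points:", change_points)
    print("fit reproduces series:", predict(X, beta) == y)
```

With noisy data almost every coefficient would be nonzero, which is where a sparsity-inducing method such as the lasso or a shrinkage prior comes in: it shrinks most jump coefficients to zero and keeps only the positions where the level plausibly changes.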

HELSINGIN YLIOPISTO