Browsing by Subject "logistic regression"

  • Pohjonen, Joona (2020)
    Prediction of the pathological T-stage (pT) in men undergoing radical prostatectomy (RP) is crucial for disease management as curative treatment is most likely when prostate cancer (PCa) is organ-confined (OC). Although multiparametric magnetic resonance imaging (MRI) has been shown to predict pT findings and the risk of biochemical recurrence (BCR), none of the currently used nomograms allow the inclusion of MRI variables. This study aims to assess the possible added benefit of MRI when compared to the Memorial Sloan Kettering, Partin table and CAPRA nomograms and a model built from available preoperative clinical variables. Logistic regression is used to assess the added benefit of MRI in the prediction of non-OC disease and Kaplan-Meier survival curves and Cox proportional hazards in the prediction of BCR. For the prediction of non-OC disease, all models with the MRI variables had significantly higher discrimination and net benefit than the models without the MRI variables. For the prediction of BCR, MRI prediction of non-OC disease separated the high-risk group of all nomograms into two groups with significantly different survival curves but in the Cox proportional hazards models the variable was not significantly associated with BCR. Based on the results, it can be concluded that MRI does offer added value to predicting non-OC disease and BCR, although the results for BCR are not as clear as for non-OC disease.
  • Preussner, Annina (2021)
    The Y chromosome has an essential role in genetic sex determination in humans and other mammals. It contains a male-specific region (MSY) which escapes recombination and is inherited exclusively through the male line. The genetic variations inherited together on the MSY can be used to classify Y chromosomes into haplogroups. Y-chromosomal haplogroups are highly informative of genetic ancestry, and Y chromosomes have therefore been widely used in tracing human population history. However, given the peculiar biology and analytical challenges specific to the Y chromosome, the chromosome is routinely excluded from genetic association studies. Consequently, potential impacts of Y-chromosomal variation on complex disease remain largely uncharacterized. Recently, access to large-scale biobank data has made it possible to extend Y-chromosomal genetic association studies. A recent UK Biobank study suggested links between Y-chromosomal haplogroup I1 and coronary artery disease (CAD) in the British population, but this result has not been validated in other datasets. Since Finland harbours a notable frequency of Y-chromosomal haplogroup I1, the relationship between haplogroup I1 and CAD can be investigated further in the Finnish population using data from the FinnGen project. The first aim of this thesis was to determine the prevalence of Y-chromosomal haplogroups in Finland and characterize their geographical distributions using genotyping array data from the FinnGen project. The second aim was to assess the association between Finnish Y-chromosomal haplogroups and CAD by logistic regression. This thesis characterized the Y-chromosomal haplogroups in Finland for 24 160 males and evaluated the association between Y-chromosomal haplogroups and CAD in Finland.
The dataset used in this study was extensive, providing an opportunity to study Y-chromosomal variation geographically in Finland and its role in complex disease more accurately than in previous studies. The geographical distribution of the Y-chromosomal haplogroups was characterized across 20 birth regions, and between eastern and western areas of Finland. Consistent with previous studies, the results demonstrated that the two major Finnish Y-chromosomal haplogroup lineages, N1c1 and I1, displayed differing distributions within regions, especially between eastern and western Finland. Results from logistic regression analysis between CAD and Y-chromosomal haplogroups suggested no significant association between haplogroup I1 and CAD. Instead, the major Finnish Y-chromosomal haplogroup N1c1 displayed a decreased risk for CAD in the association analysis when compared against other haplogroups. Moreover, this thesis also demonstrated that the association results were not straightforwardly comparable between populations. For instance, haplogroup I1 displayed a decreased risk for CAD in the FinnGen dataset when compared against haplogroup R1b, whereas the same association was reported as risk increasing for CAD in the UK Biobank. Overall, this thesis demonstrates the feasibility of studying the genetics of the Y chromosome using data from the FinnGen project, and highlights the value of including this part of the genome in future complex disease studies.
  • Leinonen, Helmi (2023)
    Discussion around the climate crisis and companies’ role in its mitigation has accelerated, especially in the past few years. Companies have a crucial role if the targets set in the Paris Agreement are to be met, and they have also noted the importance of the topic. Corporate environmental responsibility and sustainability themes have gained a firm foothold in the corporate world, and companies can manage them through different corporate governance mechanisms. This thesis aims to examine the importance of corporate governance and sustainability management in companies. The purpose is to study whether there is a link between the level of companies’ climate maturity and the different corporate governance mechanisms that are used to manage companies’ sustainability. In addition, this thesis examines whether the results differ depending on size, industry, or the country where companies are headquartered. The scope of this thesis is corporate environmental responsibility and climate sustainability in the context of greenhouse gas emissions. Companies are divided into two groups based on their climate maturity, which is determined by whether they have set science-based emission reduction targets validated by the Science Based Targets initiative. The analysis is conducted with logistic regression and is carried out in Stata. The data originate from a corporate study and cover 46 medium and large-sized Nordic companies from various industries. Sustainability criteria in management’s incentive plans and in companies’ investment decisions had a positive and significant link to companies’ climate maturity. A Chief Sustainability Officer and a board-level sustainability committee were insignificant in the model. Larger companies were more strongly associated with climate maturity, most likely because they have more resources to develop their sustainability and corporate environmental responsibility.
In addition, larger companies are often obligated to disclose their sustainability performance and face public pressure to decrease their negative effects, which can encourage them to set more advanced targets. It seems that the most effective measures are mechanisms with concrete criteria, compared to more symbolic measures with no direct effect. Companies should focus on impactful measures that create change in their organizations, whereas policy makers should aim to create regulation directing companies towards these measures. Scientific research can help by providing knowledge of the most impactful corporate governance mechanisms. The sample size was relatively small, which prevents drawing highly generalized conclusions. With a larger dataset, companies’ maturity could have been determined on a wider scale, different analysis methods could have been used and sustainability could have been considered from a more comprehensive perspective.
  • Junna, Liina (2017)
    Self-rated health (SRH) is a frequently used survey indicator of general health. It is periodically utilised in the study of educational health disparities. Several researchers have, however, suggested that systematic population subgroup differences in health self-ratings (reporting heterogeneity) may result in SRH reflecting a different health status, or different aspects of health, for different educational groups. Previous studies imply that the associations between SRH and other indicators of health may be strengthened by higher education. However, the studies disagree on the strength and the scope of the interaction effect. Comparability is also an issue due to, for example, the variation in the selected health indicators by which SRH is assessed. No such studies have so far been conducted in Northern Europe. The purpose of this Master’s thesis is to address educational SRH reporting heterogeneity. Using quantitative methods, this thesis analyses which aspects of health are included in dichotomised poor or very poor SRH ratings, and whether education moderates the relationship between SRH and the indicators of health. The selected health indicators represent five health dimensions identified in previous studies: clinical health, functional health, health behaviours, mental health, and bodily symptoms and experiences. The analyses are conducted using logistic regression and regression-based nonlinear decomposition methods. The study utilises the Health 2000 data (n = 5586) for the household- and institution-dwelling population over the age of 30 residing in mainland Finland. The data are nationally representative and consist of a clinical and mental health examination and survey sections. Overall, a high volume of somatic complaints was found to be strongly associated with poor self-rated health for all educational groups. Other significant contributors were functional health, diagnosed mental health conditions, and to some extent diagnosed diseases.
An educational interaction effect was found for cardiovascular disease, subjective functional limitations in everyday tasks, and a high volume of somatic complaints. In all cases education strengthened the association. However, for the majority of the indicators associated with SRH, no interaction effect was found. Compared to respondents with a higher education, those with lower educational attainment more often reported poor SRH, but the selected health indicators and demographic variables explained virtually the whole difference. The study thus, to some extent, concurs with earlier findings of higher education strengthening some of the associations between poor SRH and other indicators of health. However, the effect was statistically significant only when comparing basic education to higher educational attainment, and it was less systematic than some of the previous studies have suggested.
  • Kuronen, Juri (2017)
    This Master’s thesis introduces a new score-based method for learning the structure of a pairwise Markov network without imposing the assumption of chordality on the underlying graph structure by approximating the joint probability distribution using the popular pseudo-likelihood framework. Together with the local Markov property associated with the Markov network, the joint probability distribution is decomposed into node-wise conditional distributions involving only a tiny subset of variables each, getting rid of the problematic intractable normalizing constant. These conditional distributions can be naturally modeled using logistic regression, giving rise to pseudo-likelihood maximization with logistic regression (plmLR) which is designed to be especially well-suited for capturing pairwise interactions by restricting the explanatory variables to main effects (no interaction terms). To deal with overfitting, plmLR is regularized using an extended variant of the Bayesian information criterion. To select the best model out of the vast discrete model space of network structures, a dynamic greedy hill-climbing search algorithm can be readily implemented with the pseudo-likelihood framework where each Markov blanket is learned separately so that the full graph can be composed from the solutions to these subproblems. This work also presents a novel improvement to the algorithm by drastically reducing the search space associated with each node-wise hill-climbing run by first running a set of pairwise queries to isolate only the promising candidates. In experiments on data sets sampled from synthetic pairwise Markov networks, plmLR performs favorably against competing methods with respect to the Hamming distance between the learned and true network structure. 
Additionally, unlike most logistic regression based methods, plmLR is not limited to binary variables and performs well on learning benchmark network structures based on real-world non-binary models even though plmLR is not designed for their structural form.
  • Lyytikäinen, Minna (2013)
    Climate change and the resulting extreme weather patterns can increase forest damage caused by pest insects, especially at higher latitudes. The number, density and intensity of damage events caused by pest insects have already increased because of the changing conditions. Pest insects can, for example, cause reduced tree growth and even tree death. Defoliation by the Common Pine Sawfly (Diprion pini L.) causes severe growth losses and tree mortality in Scots Pine (Pinus sylvestris L.). D. pini caused damage on over 500 000 hectares in Finland between 1997 and 2001. The field work was carried out in the Palokangas area, Ilomantsi, eastern Finland, in 2002–2010. Stand- and tree-wise characteristics were measured on 11 plots. Tree-wise defoliation (with 10% accuracy) and the amounts of D. pini cocoons and fallen shoots of P. sylvestris were estimated annually. In addition, radial tree growth was measured from a total of n trees in 2010. The aim of this study was to estimate the effect of natural enemies on the population densities of D. pini. A further aim was to estimate the effect of the defoliation caused by D. pini on tree growth, and to estimate the consequences of a beetle attack by pith borers (Tomicus spp.) for the defoliation. The effect of natural enemies as regulative factors was estimated from D. pini cocoons. Natural enemies were divided into birds, small mammals and the insect families Ichneumonidae, Chalcidoidea, Tachinidae, Elateridae and Carabidae. The consequences of the beetle attack were assessed from fallen shoots. Tree growth simulation was used to estimate economic losses. Growth losses were estimated from drill chip samples. Logistic regression was used to explain tree-wise defoliation with tree- and stand-wise variables. Two different classification schemes with threshold values of 20% (class 1) and 30% (class 2) of defoliation were used in the regression.
The major regulative factor was Ichneumonid parasites (22%) and the second most powerful regulative factor was small mammals (21%). The relative proportion of natural enemies increased over the research period as defoliation percentages decreased. The consequences of the beetle attack were most severe in 2004 (17 shoots/m²). Plot-wise defoliation level varied significantly between the years and the plots. The mean defoliation level was 37% in 2002 and 22% in 2010. The most substantial defoliation was in plot 9 in 2005, over 99%. Simulated economic losses were perceptible only on plots 9 and 16: 2785 € and 1623 € per hectare, respectively. Defoliation by D. pini caused radial growth losses in the different defoliation classes. The mean growth loss of severely damaged trees (70–100% defoliation) was approximately 65% and that of trees with a low defoliation level (0–10% defoliation) 40%. The classification accuracy of logistic regression was 92.4% with a kappa value of 0.81 for class 1, and 94.2% with a kappa value of 0.84 for class 2. The results of this study showed that control by natural enemies affected D. pini density. The population density of D. pini affected the defoliation level; when population density was low, the defoliation was milder. Peak sawfly densities can affect tree growth during outbreaks. The consequences of the beetle attack by the pith borers were only slight and delayed.
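The accuracy and kappa values reported in the abstract above are both derived from a confusion matrix. As a minimal sketch of how such figures are typically computed (the function name and the counts are hypothetical and are not taken from the thesis), Cohen's kappa compares the observed agreement with the agreement expected by chance:

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of the row and column marginals per class.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up counts for illustration only (rows: true class, columns: predicted).
example = [[40, 10],
           [5, 45]]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than accuracy here because half of the agreement in the made-up table would be expected by chance alone.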
  • Kukkonen, Tommi (2020)
    The Arctic is warming at an increasing pace, which can affect ecosystems, infrastructure and communities. By studying periglacial landforms and processes, and using improved methods, more knowledge of these changing environmental conditions and their impacts can be obtained. The aim of this thesis is to map the studied landforms and predict their probability of occurrence in the circumpolar region utilizing different modelling methods. Periglacial environments occur in high latitudes and other cold regions. These environments host permafrost, which is frozen ground that responds readily to climate warming and underlies areas that host many landform types. Therefore, landform monitoring and modelling in permafrost regions under a changing climate can provide information about the ongoing changes in the Arctic and about landform distributions. Here four landform/process types were mapped and studied: patterned ground, pingos, thermokarst activity and solifluction. The study consisted of 10 study areas across the circumpolar Arctic that were mapped for their landforms. The study utilized GLM, GAM and GBM analyses in determining landform occurrences in the Arctic based on environmental variables. Model calibration utilized a logit link function, and evaluation was based on the explained deviance. The data were split into calibration and evaluation sets to assess predictive ability. The predictive accuracy of the models was assessed using ROC/AUC values. Thermokarst activity proved to be the most abundant in the studied areas, whereas solifluction activity was the scarcest. Pingos were found evenly throughout the studied areas, and patterned ground activity was absent in some areas but rich in others. Climate variables and mean annual ground temperature had the biggest influence in explaining landform occurrence throughout the circumpolar region. GBM proved to be the most accurate and had the best predictive performance.
The results show that mapping and modelling in mesoscale is possible, and in the future, similar studies could be utilized in monitoring efforts regarding global change and in studying environmental and periglacial landform/process interactions.
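The ROC/AUC evaluation mentioned in the abstract above can be illustrated with a small sketch. It assumes binary presence/absence labels and model scores on an evaluation set; the function name and the numbers are hypothetical and do not come from the thesis. AUC is computed here as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, with ties counted as one half:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 (absence/presence of the landform)
    scores: predicted probabilities or any monotone score
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # presence ranked above absence
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up evaluation data for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formula would be the usual choice for large evaluation sets.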
  • Salo, Tuukka (2016)
    The purpose of the act on the financing of sustainable forestry (Kemera-law) is to advance economically, ecologically and socially sustainable silviculture and use of forests. A private forest owner may receive financial support from the State for forest management, forest improvement work and nature management. The purpose of this thesis was to find out which factors affect private forest owners’ participation in the Kemera cost sharing program, and whether forest owners’ objectives in forest ownership and opinions about Kemera-subsidies differ depending on participation in the cost sharing program. The data used in this thesis come from a survey that was carried out in the spring of 2016 as part of a project at Tapio Oy. Additional information from the Finnish Forest Centre was also used in the regression analysis. The factors affecting the use of the Kemera-subsidy were analyzed with logistic regression. The differences in forest ownership objectives and in opinions about the Kemera-subsidy depending on participation in the Kemera cost sharing program were determined by descriptive analysis. With the factors used, the regression analysis did not produce a model that would successfully predict participation in the cost sharing program. However, the results implied that the factors positively affecting participation in the cost sharing program were the forested area owned, forest owners’ self-determined activity and the use of external services in the forest. The differences between the forest owners’ objectives depending on participation in the cost sharing program imply that the participants did not value non-monetary values less than those who had not participated in the cost sharing program, but they did value monetary values more. The average opinions about the Kemera-subsidy did not vary much depending on participation in the cost sharing program.
Those who had participated in the cost sharing program in the last 10 years were slightly more satisfied with the Kemera-subsidies. The majority thought that the best incentive in the Kemera-subsidy is the benefit gained in the future. The most common reason for not participating in the cost sharing program was the difficulty of applying.
  • Sahlberg, Eero (2018)
    This thesis examines underlying causes of customer churn in the Finnish insurance market. Using individual data on moving insurance customers, econometric modeling is conducted to find significant relations between observed customer characteristics and behavior, and the probability to churn. A subscription-based business gains revenue not only from new sales but more importantly from automatic renewals of existing customers, i.e. retention. Significant drops in retention are important to understand for the insurer in order to not lose profit. Churn is an antonym for retention. A change of address – or moving homes – is an event around which churn rates spike, as it is a time when all address-specific subscriptions (electricity, internet, etc.) need to be proactively renewed by the consumer. There were one million moving individuals in 2016, as reported by Posti. This means that a significant share of an insurer’s customers are at a heightened risk to churn, with an address change being the common denominator. This thesis asks which customer characteristics and experiences significantly either increase or decrease the probability of a customer either changing their home insurance or churning completely around the time of their move. Insurance literature such as Hillson & Murray-Webster (2007) and Vaughan (1996) are reviewed to present the nature of risk, the insurance mechanism and the modern insurance business model. An annual report by Finance Finland (2017) provides accounting data via which the Finnish market situation is presented, while data and reports by Posti (2016; 2017a; 2017b) provide the numbers and facts regarding Finnish movers. Churn modeling is based on 20th century discrete choice theory, literature of which is reviewed, most notably by Nobel-laureate Daniel McFadden (1974; 2000). Also presented are modern applications of choice theory into churn problems, such as Madden et al (1999). 
The empirical section of the thesis consists of data presentation, model construction and evaluation and finally discussion of the results. The final sample of customer data consists of 24 230 observations with 21 variables. Following Madden et al (1999) and with help from Cox (1958) and McFadden (1974), binomial logistic regression models are constructed to relate the probability of churning with the specified variables. It is found that customer data can be used to predict churn among movers. Significant weights are found for variables denoting the size of a customer’s insurance portfolio as well as customer age and the duration of customership. Also the presence of personal insurance products and contact with one’s insurer notably affect retention positively. Younger segments and customers with implications of lower income (with fewer insurance products, more payment installments) exhibit a significantly increased probability of churning.
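A binomial logistic regression of the kind described in the abstract above relates the probability of a binary outcome (here, churning) to explanatory variables through the logistic function. The following is a minimal sketch fitted by plain gradient ascent on the log-likelihood. The single explanatory variable (number of insurance products), the data and all names are hypothetical illustrations, not the thesis's model, variables or estimation routine:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(intercept, weights, x):
    """P(y = 1 | x) under the logistic model."""
    return sigmoid(intercept + sum(w * xj for w, xj in zip(weights, x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept and weights by gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = yi - predict_proba(intercept, weights, xi)
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        intercept += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


# Made-up data: one feature (number of insurance products), label 1 = churned.
X = [[1], [1], [1], [2], [2], [2], [3], [3], [3], [4], [4], [4]]
y = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0]
b, w = fit_logistic(X, y)
p_small = predict_proba(b, w, [1])   # customer with one product
p_large = predict_proba(b, w, [4])   # customer with four products
print(f"slope={w[0]:.2f}, P(churn | 1 product)={p_small:.2f}, "
      f"P(churn | 4 products)={p_large:.2f}")
```

In this toy data churn becomes rarer as the number of products grows, so the fitted slope is negative. In practice a statistics package would be used instead, since it also reports standard errors and significance tests.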