
Browsing by Title


  • Joosten, Rick (2020)
    In the past two decades, an increasing number of discussions have been held on online platforms such as Facebook or Reddit. The most common form of disruption of these discussions is trolling. Traditional trolls try to derail a discussion into a nonconstructive argument. One strategy for achieving this is to give asymmetric responses: responses that do not follow conventional patterns. In this thesis we propose a modern machine learning NLP method called ULMFiT to automatically detect the discourse acts of online forum posts in order to detect these conversational patterns. ULMFiT fine-tunes the language model before training its classifier in order to create a more accurate language representation of the domain language. The task of discourse act recognition is distinctive in that it attempts to classify the pragmatic role of each post within a conversation, as opposed to the functional role addressed by tasks such as question-answer retrieval, sentiment analysis, or sarcasm detection. Furthermore, most discourse act recognition research has focused on synchronous conversations, where all parties can directly interact with each other, while this thesis looks at asynchronous online conversations. Trained on a dataset of Reddit discussions, the proposed model achieves a Matthews correlation coefficient of 0.605 and an F1-score of 0.69 when predicting discourse acts. Further experiments show that the model is also effective at question-answer classification, and that language model fine-tuning has a positive effect on both classification performance and the required size of the training data. These results could benefit current trolling detection systems.
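The two reported metrics can be reproduced from a confusion matrix. A minimal binary sketch with made-up counts (the thesis task is multi-class, where MCC generalises accordingly):

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient for a binary confusion matrix.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only, not the thesis's actual confusion matrix.
print(round(mcc(90, 80, 20, 10), 3))  # 0.704
print(round(f1(90, 20, 10), 3))       # 0.857
```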
  • Duong, Quoc Quan (2021)
    Discourse dynamics is one of the important fields in digital humanities research. Over time, the perspectives and concerns of society on particular topics or events may change. Based on changes in the popularity of a given theme, different patterns form, increasing or decreasing the prominence of the theme in the news. Tracking these changes is a challenging task: in a large text collection, discourse themes are intertwined and uncategorized, which makes them hard to analyse manually. The thesis tackles a novel task, the automatic extraction of discourse trends from large text corpora. The main motivation for this work lies in the need in digital humanities to track discourse dynamics in diachronic corpora. Machine learning is a potential method for automating this task by learning patterns from the data. However, in many real use cases ground truth is not available, and annotating discourses at the corpus level is incredibly difficult and time-consuming. This study proposes a novel procedure for generating synthetic datasets for this task, a quantitative evaluation method, and a set of benchmarking models. Large-scale experiments are run using these synthetic datasets. The thesis demonstrates that a neural network model trained on such datasets can obtain meaningful results when applied to a real dataset, without any adjustments of the model.
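One way to picture such synthetic data generation is to inject a known trend into a theme-prominence time series, so the injected trend serves as ground truth. A minimal sketch under that assumption (not the thesis's actual procedure):

```python
import random

def synthetic_theme_series(n_steps=100, trend="rising", noise=0.02, seed=0):
    # Generate a synthetic time series of a theme's prominence (share of
    # documents mentioning the theme) with a known injected trend that can
    # serve as ground truth for training and evaluation.
    rng = random.Random(seed)
    series = []
    for t in range(n_steps):
        base = 0.1
        if trend == "rising":
            base += 0.4 * t / (n_steps - 1)
        elif trend == "falling":
            base += 0.4 * (1 - t / (n_steps - 1))
        series.append(min(1.0, max(0.0, base + rng.gauss(0, noise))))
    return series

s = synthetic_theme_series()
print(s[-1] > s[0])  # the injected rising trend dominates the noise
```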
  • Sandoval Zárate, América Andrea (2015)
    Personalised medicine involves the use of individual information to determine the best medical treatment. Such information includes the historical health records of the patient. In this thesis, the records used are part of the Finnish Hospital Discharge Register, and are utilized to identify disease trajectories for individuals in the FINRISK cohorts. Longitudinal register data are usually analysed with Markov chains because of their capability to capture temporal relations. In this thesis a first-order Markov chain is used to feed the MCL algorithm, which identifies disease trajectories. These trajectories highlight the most prevalent diseases in the Finnish population: circulatory diseases, neoplasms and musculoskeletal disorders. They also define high-level interactions between other diseases, some of which agree with widely studied physiological interactions, for example the thoroughly studied association between circulatory diseases and symptoms of the metabolic syndrome.
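A first-order Markov chain of this kind can be estimated by counting consecutive disease pairs per patient; the resulting transition matrix is the sort of graph input MCL clusters. A minimal sketch with hypothetical ICD-style codes (not the thesis's data):

```python
from collections import defaultdict

def transition_matrix(sequences):
    # Estimate first-order Markov transition probabilities from
    # per-patient ordered disease sequences.
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# Hypothetical diagnosis sequences for three patients.
patients = [["I10", "I25", "E11"], ["I10", "E11"], ["I10", "I25"]]
P = transition_matrix(patients)
print(P["I10"])  # I25 with probability 2/3, E11 with probability 1/3
```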
  • Haiminen, Niina (University of Helsinki, 2004)
  • Aalto, Iiro (2020)
    Slack is an instant messaging platform intended for the internal communications of companies and other organizations. For organizations that use Slack extensively it may provide an interesting source of insight, but as such the data is difficult to analyze. Topic modeling, primarily latent Dirichlet allocation (LDA), is commonly used to summarize textual data in a meaningful way. Instant messages tend to be very short, which causes problems for conventional topic modeling methods such as LDA. This data sparsity problem can be tackled with data expansion and data combination techniques. For instant messages, data combination is particularly attractive because the messages are not independent of each other, but form implicit, and sometimes explicit, threads as the participants reply to each other. Most of the threads in the Slack data are not explicit, but must be 'untangled' from the message stream if they are to be used as the basis for a data combination scheme. In this thesis we study the possibility of detecting implicit threads from a Slack message stream and leveraging the threads as a data combination scheme in topic modeling. The threads are detected using a hierarchical clustering algorithm which uses word mover's distance, latent semantic analysis, and metadata to compute the distances between messages. The clusters are then concatenated and used as the input for LDA. It is shown that on a dataset gathered from the Gofore Oyj Slack workspace, the cluster-based model improves on the message-based model, but falls short of being practical.
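The thesis clusters messages hierarchically with word mover's distance, LSA, and metadata; as a much simpler illustration of the same untangle-then-concatenate idea, one can greedily attach each message to the nearest earlier thread under a plain bag-of-words cosine distance (the distance function, threshold, and messages below are all illustrative assumptions):

```python
import math
import re
from collections import Counter

def cosine_dist(a, b):
    # Cosine distance between bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1 - dot / (na * nb) if na and nb else 1.0

def greedy_threads(messages, threshold=0.7):
    # Assign each message to the most similar earlier thread, or start a
    # new thread if nothing is close enough; concatenating each thread
    # then yields one pseudo-document per thread as LDA input.
    threads = []  # (concatenated bag-of-words, list of message indices)
    for i, msg in enumerate(messages):
        bow = Counter(re.findall(r"\w+", msg.lower()))
        best, best_d = None, threshold
        for t in threads:
            d = cosine_dist(bow, t[0])
            if d < best_d:
                best, best_d = t, d
        if best is None:
            threads.append((bow, [i]))
        else:
            best[0].update(bow)
            best[1].append(i)
    return [idx for _, idx in threads]

msgs = ["deploy failed on staging", "staging deploy is failing again",
        "lunch at noon?", "noon works for lunch"]
print(greedy_threads(msgs))  # [[0, 1], [2, 3]]
```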
  • Mikkola, Petrus (2020)
    This thesis examines discrete complex analysis and potential theory on isoradial graphs. Isoradial graphs form a general class of graphs in which all faces of the graph can be inscribed in circles of equal radii. For instance, the square, honeycomb, and triangular lattices belong to this family. Discrete analogues (on isoradial graphs) of classical complex analysis objects such as holomorphic and harmonic functions are considered. The focus is on two fundamental operators: the discrete Cauchy-Riemann operator and the discrete Laplace operator. Their inverses are studied as well: the discrete Cauchy kernel and the discrete Green's function. The latter part of the thesis deals with discrete multiplicatively multivalued functions, such as discrete complex power functions. Discrete multivalued functions are not studied extensively in general, but rather from the viewpoint of two special functions: the discrete multivalued Cauchy kernel and the discrete multivalued Green's function. These functions are relevant, for instance, when studying the asymptotics of the electric correlators of the dimer model, a classical model of statistical mechanics. The thesis is based on the following articles: "Discrete complex analysis on isoradial graphs" by Chelkak and Smirnov (2011), "Dimers and families of Cauchy-Riemann operators" by Dubédat (2015), and "The Laplacian and Dirac operators on critical planar graphs" by Kenyon (2002). The part dealing with discrete multivalued functions builds upon Dubédat's work (2015).
  • Vartia, Olli (2013)
    Porous structures, such as foams, make excellent thermal insulators, because thermal transfer by conduction is hindered by the voids in the material. However, heat can still radiate through the material or past the voids. Due to the Stefan-Boltzmann law, heat transfer by radiation can be especially significant at high temperatures, so thermal transfer models that account for radiation may be necessary in some cases. Several models for radiative thermal transfer in porous materials, such as continuum models and Monte Carlo methods, have been used in the past. What many of these models tend to have in common is that they are highly specific to the systems they were originally made for and require some rather limiting approximations. A more general method, which would only require knowing the material and the geometry of the system, would be useful. One candidate for such a method, the discrete dipole approximation for periodic structures, was tested. In the discrete dipole approximation a structure is discretized into a collection of polarizable points, or dipoles, and an incoming electromagnetic plane wave is set to polarize it. This has the benefit that it accurately takes into account the target geometry and possible near-field effects. It was found that this method is limited at long wavelengths by computation time and at short wavelengths by errors. The errors at short wavelengths were not caused entirely by the discretization and remain not fully understood.
  • Kallanranta, Antti (2018)
    The geological Discrete Fracture Network (DFN) model is a statistical model for stochastically simulating rock fractures and minor faults (Fox et al. 2007). Unlike continuum model approaches, DFN model geometries explicitly represent populations of individual or equivalent fractures (Wilson et al. 2011). Model construction typically involves stochastic approaches that create multiple deterministic realizations of the fracture network (Gringarten 1998). This study was made as part of a broader Salpausselkä project to gain a deeper understanding of the brittle structures in the study area. This thesis can be broken down into three steps: a literature review of the DFN methodology, parameterization of the model variables, and the DFN modeling itself. For the purposes of the DFN modeling, one-dimensional fracture intensities measured in the field (P10) had to be converted into their volumetric counterpart (P32). Wang's (2005) C13 conversion factor was deemed the most appropriate method. Calculation of the angles between the scanlines and fracture normals (α), the conversion factor C13, and P32 was done in Python by applying the methods presented by Wang (2005) and Fox et al. (2007, 2012). Fracture sets were weighted by their P10 intensities to get a clearer picture of the dominant fracturing orientations. For better and automated classification, clustering of the fracture poles into a desired number of mean vectors was conducted using the kmeans function of the Python module mplstereonet. The function finds the centers of multi-modal clusters of data by using a numpy einsum modified for spherical measurements. Fracture set poles were divided into populations by finding the mean vector with the smallest angular distance from each pole. The C13 calculation was done by integrating over the probability distribution function (PDF) of each population. C13 values produced by the script fall within the expected range quoted in the reference literature (Wang 2005, Fox et al. 2007, 2012).
    In the final modeling phase, the clustered groups were modeled in MOVE as finite surfaces and the resulting DFN model was compared to the Local anisotropy interpolator (LAI) model created by Ruuska (2018). Fracture populations were modeled at the outcrop level as well as interpolated over the whole study area, producing two different interpretations of the most dominant fracturing orientations. Based on the results, fracture set pole clustering with open-source methods (mplstereonet k-means) is a feasible approach. The k-means clustering algorithm was superior to the expert approach on every level, though more studies are needed to ascertain the soundness of the methodology. Statements made at this point are merely tentative due to the quality and amount of the available data. Taking into account the results of the parallel MSc thesis (Ruuska 2018), the DFN and clustered fracture populations constructed using the aforementioned methods can be used as a tentative approximation of the preferred fracturing orientations within the boundaries of the study area. The outcrop-level model shows the true, measured values and could be used as ground truth in future modeling efforts. Efficient production of large-scale brittle models could be possible with the added flexibility of implicit modeling methods and automated clustering.
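The P10-to-P32 workflow can be illustrated in its simplest special case: for a single fracture set of fixed orientation, the conversion factor reduces to 1/cos α, where α is the angle between the scanline and the fracture normal (Wang's C13 generalises this by integrating over the set's orientation PDF). A sketch with made-up trend/plunge values, not the thesis's script:

```python
import math

def angle_to_normal(scan_trend, scan_plunge, pole_trend, pole_plunge):
    # Acute angle (degrees) between a scanline direction and a fracture
    # pole, both given as trend/plunge in degrees (east, north, up frame).
    def to_vec(trend, plunge):
        t, p = math.radians(trend), math.radians(plunge)
        return (math.cos(p) * math.sin(t), math.cos(p) * math.cos(t),
                -math.sin(p))
    a = to_vec(scan_trend, scan_plunge)
    b = to_vec(pole_trend, pole_plunge)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return math.degrees(math.acos(min(1.0, dot)))

def p32_from_p10(p10, alpha_deg):
    # Single-set special case of the intensity conversion: P32 = P10 / cos(alpha).
    return p10 / math.cos(math.radians(alpha_deg))

# Horizontal scanline due north; fracture poles plunging 60 degrees north.
alpha = angle_to_normal(0, 0, 0, 60)
print(round(alpha, 1))                      # 60.0
print(round(p32_from_p10(2.0, alpha), 2))   # 4.0 fractures per unit area
```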
  • Detrois, Kira Elaine (2023)
    Background/Objectives: Various studies have shown the advantage of incorporating polygenic risk scores (PRSs) into models with classic risk factors. However, systematic comparisons of PRSs with non-genetic factors are lacking. In particular, many studies on PRSs do not even report the predictive performance of the confounders included in the model, such as age and sex, which are already very predictive for most diseases. We looked at the ability of PRSs to predict the onset of 18 diseases in FinnGen R8 (N=342,499) and compared PRSs with the known non-genetic risk factors age, sex, education, and the Charlson Comorbidity Index (CCI). Methods: We set up individual studies for the 18 diseases. A single study consisted of an exposure period (1999-2009), a washout period (2009-2011), and an observation period (2011-2019). Eligible individuals could not have the selected disease of interest inside the disease-free period, which ranged from birth until the beginning of the observation period. We then defined case and control status based on the diagnoses in the observation period and calculated the phenotypic scores during the exposure period. The PRSs were calculated using MegaPRS and the latest publicly available genome-wide association study summary statistics. We then fitted separate Cox proportional hazards models for each disease to predict disease onset during the observation period. Results: In FinnGen, the model's predictive ability (c-index) with all predictors ranged from 0.565 (95% CI: 0.552-0.576) for acute appendicitis to 0.838 (95% CI: 0.834-0.841) for atrial fibrillation. The PRSs outperformed the phenotypic predictors, CCI and education, for 6/18 diseases, and still significantly enhanced onset prediction for 13/18 diseases when added to a model with only non-genetic predictors. Conclusion: Overall, we showed that for many diseases PRSs add predictive power over commonly used predictors such as age, sex, CCI, and education.
    However, many important challenges must be addressed before implementing PRSs in clinical practice. Notably, we will need disease-specific cost-benefit analyses and studies to assess the direct impact of including PRSs in clinical use. Nonetheless, as more research is being conducted, PRSs could play an increasingly valuable role in identifying individuals at higher risk for certain diseases and enabling targeted interventions to improve health outcomes.
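The c-index used to compare these models is Harrell's concordance index: among comparable pairs, the fraction where the model assigns the higher risk to the individual who fails earlier. A minimal sketch on toy survival data (not FinnGen data):

```python
from itertools import combinations

def c_index(times, events, risk_scores):
    # Harrell's concordance index. A pair is comparable when the earlier
    # time is an observed event (not censored); ties in score count 0.5.
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i                      # i now has the earlier time
        if times[i] == times[j] or not events[i]:
            continue                         # not a comparable pair
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable

# Toy data: higher score should mean earlier onset; subject 3 is censored.
print(c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.2]))  # 1.0
```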
  • Muszynski, Johann Michael (2015)
    The presence of dislocations in metal crystals accounts for the plasticity of metals. These dislocations do not nucleate spontaneously, but require favorable conditions. These conditions include, but are not limited to, a high temperature, external stress, and an interface such as a grain boundary or a surface. The slip of dislocations leads to steps forming on the surface, as atomic planes are displaced along a line. If a void is placed very near a surface, there is the possibility of forming a dislocation platelet. The slip of the dislocation platelet would displace the surface atoms within a closed line. Repeating such a process may form a small protrusion on the surface. In this thesis, the mechanism by which dislocations displace the surface atoms within a closed loop is studied using molecular dynamics (MD) simulations of copper. A spherical void is placed within the lattice, and the lattice is then subjected to an external stress. The dislocation reactions which lead to the formation of the dislocation platelet after the initial dislocation nucleation on the void are studied by running MD simulations of a void with a radius of 3 nm under tensile stress. Since the dislocations are thermally activated, the simulation proceeded differently for each run. We describe the different ways the dislocations nucleate, and the dislocation reactions that occur when they intersect to form the platelet. The activation energy of this process was studied by simulating half of a much larger void, with a radius of 8 nm, in order to obtain a more realistic nucleation environment. Formulas connecting the observable and controllable simulation variables with the energies of the nucleation are derived. The activation energies are then calculated and compared with values from the literature.
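The thesis derives its own formulas relating simulation variables to nucleation energies; as a generic illustration of the underlying idea, a two-temperature Arrhenius fit recovers an activation energy from thermally activated rates. The prefactor, temperatures, and the 0.5 eV barrier below are synthetic numbers, not results from the thesis:

```python
import math

KB_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def activation_energy(rate1, T1, rate2, T2):
    # Two-temperature Arrhenius estimate assuming r = A * exp(-Ea / (kB * T))
    # with a temperature-independent prefactor A.
    return KB_EV * math.log(rate1 / rate2) / (1 / T2 - 1 / T1)

# Synthetic rates generated from Ea = 0.5 eV; the estimate recovers it.
Ea = 0.5
rate = lambda T: 1e12 * math.exp(-Ea / (KB_EV * T))
print(round(activation_energy(rate(800), 800, rate(600), 600), 3))  # 0.5
```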
  • Johansson, Maria (2013)
    Dispersive liquid-liquid microextraction was developed in 2006 for the extraction of organic compounds from water samples. Since then, more complex matrices have been processed, and the technique nowadays includes a variety of subsets. The advantages of the technique include, for example, its rapidity, low cost and high enrichment factors. A pretreatment and analysis method was developed for five harmful flame retardants, dechlorane plus (syn and anti) and dechloranes 602, 603 and 604 (component A), from solid environmental samples. The pretreatment method included extraction with pressurised liquid extraction and clean-up with multilayer silica and basic alumina columns. The analytes were separated and analysed with gas chromatography coupled to mass spectrometry. Electron capture negative ionisation was applied as the ionisation technique. The developed method was sensitive, resulting in acceptable recoveries and low detection limits. The chosen ionisation technique proved superior to the more commonly used electron ionisation.
  • Sobolev, Anton (2020)
    When couples with children split up or divorce, they are often unable to come to a mutual agreement concerning their child's place of residency, custody, the child's meetings with the other parent and the frequency of these meetings, or the financial aid one parent is obliged to pay the other for the child. In many countries, these disagreements quite often lead to long disputes in court. A lot of research has been conducted (both in Finland and internationally) concerning the court's consideration of disputes about children. This thesis studies disputes on the custody and residency of a child in the district courts of Finland. The objective is to find out which factors play the biggest role in resolving these disputes in court. Nine district courts of Finland have kindly provided the documents of disputes concerning the custody and residency of children from the period 2004-2015. Only the cases where a dispute was solely between the parents of a child (no other relatives) and where the final decision was made by the court (no agreement between the parties) are taken into the analysis. Disputes are divided into two types: those where the residency of a child was involved (residency disputes) and those where it was not (custody disputes). The winner of a dispute is the dependent variable. A logistic regression model is applied to the custody disputes, and a cumulative logistic regression model is applied to the residency disputes. According to the results of the analysis, mothers win more disputes than fathers, but the difference is statistically significant only for the residency disputes. When only the father is of a foreign background, it lowers his chances of winning a custody dispute, but neither the father's nor the mother's foreign background is statistically significant for the residency disputes.
    Substantiated violence by the father towards the mother again acts against fathers in custody disputes, and so does a non-substantiated accusation of alcohol or drug abuse by the father. For the residency disputes, the main factors decreasing a father's probability of winning are the mother hiring a legal assistant and the father receiving legal aid (which takes place when the father is not financially capable of hiring a legal assistant). Established conditions of a child at one of the parents increase the winning chances of that parent, but the effect is stronger for fathers. All the accusations against mothers (both substantiated and non-substantiated in court) act in favor of fathers: a substantiated mental disorder of the mother, non-substantiated alcohol or drug abuse by the mother, and a non-substantiated accusation of the father's violence towards the mother. At the same time, no variables regarding the genders of the children disputed about, or of the judge or legal assistants, are statistically significant in the models. The same concerns the parents' demands in court, as well as the ages of the parents (and their difference) and of the children involved in the disputes. This investigation could be extended by adding disputes from other years and other district courts to the analysis.
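A logistic regression of this kind models the log-odds of the outcome as a linear function of the predictors. A self-contained sketch fit by plain gradient descent; the predictors and outcomes below are made up for illustration and the thesis presumably used standard statistical software:

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the logistic log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1 / (1 + math.exp(-z)) - yi   # predicted prob minus label
            for j, xj in enumerate(xi):
                gw[j] += err * xj / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

# Hypothetical binary predictors: [established_conditions, legal_aid].
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 0, 0, 1, 0]   # 1 = father wins (made-up outcomes)
w, b = fit_logistic(X, y)
print(w[0] > 0)  # established conditions raise the predicted odds here
```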
  • Singh, Maninder Pal (2016)
    Research in the healthcare domain is primarily focused on diseases based on the physiological changes of an individual. Physiological changes are often linked to multiple streams originating from different biological systems of a person. The streams from various biological systems together form attributes for the evaluation of symptoms or diseases. The interconnected nature of different biological systems encourages the use of an aggregated approach to understand symptoms and predict diseases. These streams, or physiological signals, obtained from healthcare systems contribute a vast amount of vital information to healthcare data. The advent of new technologies allows physiological signals to be captured over time, but most of the data acquired from patients are observed only momentarily or remain underutilized. The continuous nature of physiological signals demands context-aware real-time analysis. These research aspects are addressed in this thesis using a large-scale data processing solution. We have developed a general-purpose distributed pipeline for the cumulative analysis of physiological signals in medical telemetry. The pipeline is built on top of a framework which performs computation on a cluster in a distributed environment. The emphasis is on the creation of a unified pipeline for processing streaming and non-streaming physiological time series signals. The pipeline provides fault-tolerance guarantees for the processing of signals and scales to multiple cluster nodes. Besides, the pipeline enables indexing of physiological time series signals and provides visualization of real-time and archived time series signals. The pipeline provides interfaces that allow physicians or researchers to use distributed computing for low-latency and high-throughput signal analysis in medical telemetry.
  • Laukkanen, Janne Johannes (2018)
    The vast amount of data created in the world today requires an unprecedented amount of processing power to be turned into valuable information. Importantly, more and more of this data is created on the edges of the Internet, where small computers, capable of sensing and controlling their environments, are producing it. Traditionally these so-called Internet of Things (IoT) devices have been utilized as sources of data or as control devices, and their rising computing capabilities have not yet been harnessed for data processing. Also, the middleware systems that are created to manage these IoT resources have heterogeneous APIs, and thus cannot communicate with each other in a standardized way. To address these issues, the IoT Hub framework was created. It provides a RESTful API for standardized communication, and includes an execution engine for distributed task processing on the IoT resources. A thorough experimental evaluation shows that the IoT Hub platform can considerably lower the execution time of a task in a distributed IoT environment with resource constrained devices. When compared to theoretical benchmark values, the platform scales well and can effectively utilize dozens of IoT resources for parallel processing.
  • Hirvonen, Juho (2012)
    In this work we study a graph problem called edge packing in a distributed setting. An edge packing p is a function that associates a packing weight p(e) with each edge e of a graph such that the sum of the weights of the edges incident to each node is at most one. The task is to maximise the total weight of p over all edges. We are interested in approximating a maximum edge packing and in finding maximal edge packings, that is, edge packings in which the weight of no edge can be increased. We use the model of distributed computing known as the LOCAL model. A communication network is modelled as a graph, where nodes correspond to computers and edges correspond to direct communication links. All nodes start at the same time and run the same algorithm. Computation proceeds in synchronous communication rounds, during each of which each node can send a message through each of its communication links, receive a message from each of its communication links, and then do unbounded local computation. When a node terminates the algorithm, it must produce a local output – in this case a packing weight for each incident edge. The local outputs of the nodes must together form a feasible global solution. The running time of an algorithm is the number of rounds until all nodes have terminated and announced their outputs; it is typically a function of n, the size of the communication graph, and ∆, the maximum degree of the communication graph. In this work we are interested in deterministic algorithms whose running time is a function of ∆, but not of n. We review an O(log ∆)-time constant-approximation algorithm for maximum edge packing, and an O(∆)-time algorithm for maximal edge packing. Maximal edge packing is an example of a problem where the best known algorithm has a running time that is linear in ∆. Other such problems include maximal matching and (∆ + 1)-colouring.
    However, few matching lower bounds exist for these problems: by prior work it is known that finding a maximal edge packing requires time Ω(log ∆), leaving an exponential gap between the best known lower and upper bounds. Recently Hirvonen and Suomela (PODC 2012) showed a linear-in-∆ lower bound for maximal matching. This lower bound, however, applies only in weaker, anonymous models of computation. In this work we show a linear-in-∆ lower bound for maximal edge packing; it applies also in the stronger port numbering model with orientation. Recently Göös et al. (PODC 2012) showed that for a large class of optimisation problems, the port numbering with orientation model is as powerful as the stronger, so-called unique identifier model. An open question is whether this result can be applied to extend our lower bound to the unique identifier model.
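What "maximal" means here can be seen from a centralised greedy sketch: give each edge, in turn, the largest weight its endpoints still allow. The result is maximal (no single weight can be increased) though not necessarily maximum; the thesis's algorithms achieve this distributedly in the LOCAL model, which this sequential toy version does not attempt:

```python
def maximal_edge_packing(n, edges):
    # Sequential greedy: each edge takes the remaining capacity of its
    # more constrained endpoint. Afterwards every edge has at least one
    # saturated endpoint, so no weight can be increased: the packing is
    # maximal, though not necessarily of maximum total weight.
    residual = [1.0] * n   # remaining capacity at each node
    p = {}
    for (u, v) in edges:
        w = min(residual[u], residual[v])
        p[(u, v)] = w
        residual[u] -= w
        residual[v] -= w
    return p

# Path on 4 nodes: 0-1, 1-2, 2-3.
p = maximal_edge_packing(4, [(0, 1), (1, 2), (2, 3)])
print(p)  # {(0, 1): 1.0, (1, 2): 0.0, (2, 3): 1.0}
```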
  • Mäki, Jussi Olavi Aleksis (2013)
    With the increasing growth of data traffic in mobile networks there is an ever-growing demand from operators for a more scalable and cost-efficient network core. Recent successes in deploying Software-Defined Networking (SDN) in data centers and large network backbones have given it credibility as a viable solution for meeting the requirements of even large core networks. Software-Defined Networking is a novel paradigm in which the control logic of the network is separated from the network elements into logically centralized controllers. This separation of concerns offers more flexibility in network control and makes the writing of new management applications, such as routing protocols, easier, faster and more manageable. This thesis is an empirical experiment in designing and implementing a scalable and fault-tolerant distributed SDN controller and a management application for managing the GPRS Tunneling Protocol flows that carry the user data traffic within the Evolved Packet Core. The experimental implementation is built using modern open-source distributed system tools such as the Apache ZooKeeper distributed coordination service and Basho's Riak distributed key-value database. In addition to the design, a prototype implementation is presented and its performance is evaluated.
  • Alhalaseh, Rola (2018)
    Sensors of different kinds connect to the IoT network and generate a large number of data streams. We explore the possibility of performing stream processing at the network edge and an architecture to do so. This thesis work is based on a prototype solution developed by Nokia. The system operates close to the data sources and retrieves the data based on requests made by applications through the system. Processing the data close to the place where it is generated can save bandwidth and assist in decision making. This work proposes a processing component operating at the far edge. The applicability of the prototype solution given the proposed processing component was illustrated in three use cases. Those use cases involve analysis performed on values of Key Performance Indicators, data streams generated by air quality sensors called Sensordrones, and recognizing car license plates by an application of deep learning.
  • Lange, Moritz Johannes (2020)
    In the context of data science and machine learning, feature selection is a widely used technique that focuses on reducing the dimensionality of a dataset. It is commonly used to improve model accuracy by preventing data redundancy and over-fitting, but can also be beneficial in applications such as data compression. The majority of feature selection techniques rely on labelled data. In many real-world scenarios, however, data is only partially labelled and thus requires so-called semi-supervised techniques, which can utilise both labelled and unlabelled data. While unlabelled data is often obtainable in abundance, labelled datasets are smaller and potentially biased. This thesis presents a method called distribution matching, which offers a way to do feature selection in a semi-supervised setup. Distribution matching is a wrapper method, which trains models to select features that best affect model accuracy. It addresses the problem of biased labelled data directly by incorporating unlabelled data into a cost function which approximates expected loss on unseen data. In experiments, the method is shown to successfully minimise the expected loss transparently on a synthetic dataset. Additionally, a comparison with related methods is performed on a more complex EMNIST dataset.
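The thesis's cost function is its own; the generic idea behind such methods, reweighting the biased labelled sample so its feature distribution matches the unbiased unlabelled one, can be sketched with a simple histogram density-ratio on one feature (all data below is synthetic and the binning scheme is an illustrative assumption):

```python
import random

def importance_weights(labelled, unlabelled, bins=5, lo=0.0, hi=1.0):
    # Histogram density-ratio weights: upweight labelled points that are
    # rare in the labelled sample but common in the unlabelled one.
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / (hi - lo) * bins), bins - 1)] += 1
        return [c / len(xs) for c in counts]
    hl, hu = hist(labelled), hist(unlabelled)
    def weight(x):
        i = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        return hu[i] / hl[i] if hl[i] else 0.0
    return [weight(x) for x in labelled]

rng = random.Random(0)
labelled = [rng.random() ** 2 for _ in range(1000)]   # biased toward low values
unlabelled = [rng.random() for _ in range(1000)]      # unbiased sample
w = importance_weights(labelled, unlabelled)
high = [wi for wi, x in zip(w, labelled) if x > 0.8]
low = [wi for wi, x in zip(w, labelled) if x < 0.2]
print(sum(high) / len(high) > sum(low) / len(low))  # under-sampled region upweighted
```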
  • Gaire, Surakshya (University of Helsinki, 2016)
    The objective of this master s thesis was to better understand the impact of black carbon and its distribution in Northern Europe and the Arctic. To achieve the goal of the project, information on the observations relevant to black carbon (BC) pollution in Arctic dataset was collected. For the observational data all main BC measurement campaigns along with active satellite operations were collected. In this study, the BC concentration and deposition was estimated by the System Integrated Modelling of Atmospheric coMposition - a chemical transport model (CTM) SILAM. The model was driven with monitoring atmospheric composition and climate (MACCity), emission database for global atmospheric research hemispheric transport of air pollution (EDGAR-HTAP), and evaluating the climate and air quality impacts of short lived pollutants (ECLIPSE) emission inventories. For the computations, the year 2010 was chosen because of a better availability of data during that year. In the literature section, black carbon process in the atmosphere is explained along with its properties and characteristics. Furthermore, data description and data analysis is included which is followed by interpretation of model output on the seasonal deposition and concentration. As shown by the model-measurement comparison, the model basically captured the measured BC and organic carbon (OC) quite well for all emission inventories. However, the correlation coefficient for OC was still weak for most of the stations in Europe. The overall performance of BC for European stations is substantially better than in the Arctic areas. Deposition for BC and OC shows that the seasonal transport of BC from source regions is evident in the Arctic and near Arctic areas. Patterns of dry deposition is higher in winter period than in summer period. The SILAM model suggests winter period concentration of BC by MACCity and ECLIPSE inventory of 0.23 µg/m3 and 0.26 µg/m3 respectively for year 2010. 
This study provides a best-performing setup for modelling BC transport and deposition in Northern Europe and the Arctic, despite the absence of an ageing process in the model. More observational data from Arctic stations would yield better results and improved model performance. Finally, the study gives insight into the quality of existing emission inventories and their capability to reproduce the seasonal deposition and concentration of BC in the Arctic.
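The model-measurement comparison described in this abstract rests on standard skill metrics such as the correlation coefficient and mean bias between modelled and observed concentrations. A minimal sketch of those two metrics, using made-up monthly BC values (not data from the thesis):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative monthly BC concentrations in µg/m³ (invented values,
# shaped to mimic a winter maximum and summer minimum).
observed = [0.31, 0.28, 0.22, 0.15, 0.10, 0.08, 0.07, 0.09, 0.13, 0.19, 0.25, 0.30]
modelled = [0.27, 0.25, 0.20, 0.14, 0.11, 0.09, 0.08, 0.10, 0.12, 0.17, 0.22, 0.26]

r = pearson_r(modelled, observed)                                   # correlation skill
bias = sum(m - o for m, o in zip(modelled, observed)) / len(observed)  # mean bias
```

A high `r` with a small negative `bias`, as in this toy case, would correspond to a model that tracks the seasonal cycle well while slightly underestimating concentrations.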
  • Vento, Eero (2017)
    Tourism is one of the main contributors to the fight against poverty, as it has become one of the strongest drivers of trade and prosperity in the global south. Protected area tourism is an especially quickly growing segment of the industry, playing an important role in regional development in many rural areas of the global south. However, territories labelled as protected areas represent a great variety of spaces. This research aims to unify the holistic picture of protected area tourism governance by analysing how protected areas with divergent landownership arrangements, management objectives and associated regulations influence tourism development and its local socio-economic impacts at the grass-roots level. This comparative case-study survey scrutinizes local-level tourism governance and territorial regulations in three neighbouring protected areas in Taita Taveta County, Kenya. The Tsavo National Parks are state-owned conservancies focusing on conserving biodiversity. The LUMO community wildlife sanctuary is a nature tourism project owned and orchestrated by a local community, which aims to advance local socio-economic development via tourism while preserving the environment. The third area, Sarova, is a privately owned conservancy harnessed solely for nature tourism and profit-making. The areas are subject to the same legislative framework, and international phenomena influence them similarly, which makes a comparison of their divergent management objectives and local-level regulations expedient. By giving voice to local-level tourism stakeholders, it is possible to point out how the category of the landowner (i.e. public, private or community) and the areas' respective management objectives influence tourism operations and shape the socio-economic outcomes of both conservation and tourism.
The comparative analyses focus first on the spatial, socially constructed preconditions for tourism development and second on its developmental outcomes, which are primarily analysed by reflecting on the livelihood changes generated by protected area tourism and the protection regulations in place. The dataset was gathered during field research in February–March 2016, and it is mainly based on semi-structured interviews with tourism employees, employers and regional experts. The principal method of interviewing is supplemented by observation and statistics, and the data is analysed by thematic and qualitative content analyses. The protected areas' management objectives and associated regulations have drastic impacts on tourism development within their respective borders. The local administrations of the protected areas were identified as the primary institutions explaining the stark spatial differences in the case-study areas' tourist numbers. Instead of the mere “type” of the landowner, the areas' respective management objectives and associated regulations determined whether protected area tourism generated livelihoods or other positive socio-economic outcomes. Altogether, similar preconditions for tourism development and similar socio-economic outcomes cannot be expected from all territories labelled as protected areas.