Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by Subject "feature selection"

Sort by: Order: Results:

  • Shappo, Viacheslav (2022)
    The primary concern of the companies working with many customers is proper customer segmentation, i.e., division of the customers into different groups based on their common characteristics. Customer segmentation helps marketing specialists to adjust their offers and reach potential customer groups interested in a specific type of product or service. In addition, knowing such customer segments may help search for new look-alike customers sharing similar characteristics. The first and most crucial segmentation is splitting the customers into B2B (business to business) and B2C (business to consumers). The next step is to analyze these groups properly and create more through product-specific groups. Nowadays, machine learning plays a vital role in customer segmentation. This is because various classification algorithms can see more patterns in customer characteristics and create more tailored customer segmentations than a human can. Therefore, utilizing machine learning approaches in customer segmentation may help companies save their costs on marketing campaigns and increase their sales by targeting the correct customers. This thesis aims to analyze B2B customers potentially interested in renewable diesel "Neste MY" and create a classification model for such segmentation. The first part of the thesis is focused on the theoretical background of customer segmentation and its use in marketing. Firstly, the thesis introduces general information about Neste as a company and discusses the marketing stages that involve the customer segmentation approach. Secondly, the data features used in the study are presented. Then the methodological part of the thesis is introduced, and the performance of three selected algorithms is evaluated on the test data. Finally, the study's findings and future means of improvement are discussed. The significant finding of the study is that finely selected features may significantly improve model performance while saving computational power. Several important features are selected as the most crucial customer characteristics that the marketing department afterward uses for future customer segmentations.
  • Porna, Ilkka (2022)
    Despite development in many areas of machine learning in recent decades, still, changing data sources between the domain in a model is trained and the domain in the same model is used for predictions is a fundamental and common problem. In the area of domain adaptation, these circum- stances have been studied by incorporating causal knowledge about the information flow between features to be utilized in the feature selection for the model. That work has shown promising results to accomplish so-called invariant causal prediction, which means a prediction performance is immune to the change levels between domains. Within these approaches, recognizing the Markov blanket to the target variable has served as a principal workhorse to find the optimal starting point. In this thesis, we continue to investigate closely the property of invariant prediction performance within Markov blankets to target variable. Also, some scenarios with latent parents involved in the Markov blanket are included to understand the role of the related covariates around the latent parent effect to the invariant prediction properties. Before the experiments, we cover the concepts of Makov blankets, structural causal models, causal feature selection, covariate shift, and target shift. We also look into ways to measure bias between changing domains by introducing transfer bias and incomplete information bias, as these biases play an important role in the feature selection, often being a trade-off situation between these biases. In the experiments, simulated data sets are generated from structural causal models to conduct the testing scenarios with the changing conditions of interest. With different scenarios, we investigate changes in the features of Markov blankets between training and prediction domains. Some scenarios involve changes in latent covariates as well. As result, we show that parent features are generally steady predictors enabling invariant prediction. An exception is a changing target, which basically requires more information about the changes in other earlier domains to enable invariant prediction. Also, emerging with latent parents, it is important to have some real direct causes in the feature sets to achieve invariant prediction performance.