Utilizing Clustering to Create New Industrial Classifications of Finnish Businesses: Design Science Approach

Title: Utilizing Clustering to Create New Industrial Classifications of Finnish Businesses: Design Science Approach
Author(s): Hyttinen, Miika
Contributor: University of Helsinki, Faculty of Science
Degree program: Master's Programme in Computer Science
Specialisation: Software systems
Language: English
Acceptance year: 2022
Abstract: An industrial classification system is a set of classes meant to describe different areas of business. Finnish companies are required to declare one main industrial class from the TOL 2008 industrial classification system. However, the TOL 2008 system is designed by the Finnish authorities and does not serve the versatile business needs of the private sector. The problem was discovered at Alma Talent Oy, the commissioner of the thesis. This thesis follows the design science approach to create new industrial classifications. To find out what the problems with the TOL 2008 industrial classifications are, qualitative interviews with customers were carried out. The interviews revealed several needs for new industrial classifications: classifications should be 1) more detailed, 2) simpler, 3) updated regularly, 4) multi-class and 5) able to correct wrongly assigned TOL classes. To create new industrial classifications, unsupervised natural language processing techniques (clustering) were tested on Finnish natural language data sets extracted from company websites. The largest data set contained the websites of 805 Finnish companies. The experiment revealed that the interactive clustering method was able to find meaningful clusters for 62%-76% of samples, depending on the clustering method used. Finally, the clusters found were evaluated against the requirements set by the customer interviews. The number of classes extracted from the data set was significantly lower than the number of distinct TOL 2008 classes in the data set. The results indicate that an industrial classification system created with clustering would contain significantly fewer classes than the TOL 2008 industrial classifications. Also, the system could be updated regularly, and it could correct wrongly assigned TOL classes. Therefore, interactive clustering was able to satisfy three of the five requirements found in the customer interviews.
Keyword(s): clustering; industrial classification; design science; unsupervised learning

Files in this item

File: Gradu_Final.pdf (2.122Mb, PDF)
