Browsing by Subject "Unsupervised machine learning"

  • Salmirinne, Simo (2020)
    Time series are essential in many domains and applications. In retail business especially, demand forecasting is a crucial task for making appropriate business decisions. In this thesis we focus on a problem that can be characterized as a sub-problem of demand forecasting: we attempt to form clusters of products that reflect the products’ annual seasonality patterns. We believe that these clusters could help us build more accurate forecast models. The seasonality patterns are identified from weekly sales time series, which in many cases are very sparse and noisy. To separate the seasonality patterns from all the other factors contributing to a product’s sales, we build a pipeline that preprocesses the data accordingly. The pipeline first aggregates the sales of individual products over several stores to strengthen the sales signal, and then solves a regularized weighted least squares objective to smooth the aggregates. Finally, the seasonality patterns are extracted using the STL (Seasonal-Trend decomposition using Loess) procedure. These seasonality patterns are then used as input for the k-means algorithm and several hierarchical agglomerative clustering algorithms. We evaluate the clusters using two distinct approaches. In the first approach we manually label a subset of the data and compare these labeled subsets against the clusters produced by the clustering algorithms. In the second approach we form a simple forecast model that fits the clusters’ seasonality patterns back to the observed sales time series of individual products. In this approach we also build a secondary validation forecast model with the same objective, but instead of the clusters produced by the algorithms, it uses predetermined product categories as the clusters.
These product categories should naturally provide a valid baseline for groups of products with similar seasonality, as they reflect how similar products are placed close to each other in physical stores. Our results indicate that we were able to find clear seasonal structure in the clusters. In particular, the k-means algorithm and the hierarchical agglomerative clustering algorithms with complete linkage and Ward’s method formed reasonable clusters, whereas hierarchical agglomerative clustering with single linkage proved unsuitable for our data.
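The abstract does not spell out the smoothing objective. As a minimal sketch, assuming a Whittaker-type smoother (a weighted squared-error fit term plus a penalty on second differences, one common form of regularized weighted least squares), an aggregated weekly series could be smoothed as below. The function names, the second-difference penalty, the weights `w` and the strength `lam` are illustrative assumptions, not the thesis's actual choices.

```python
"""Hypothetical sketch of a regularized weighted least squares smoother.

Assumed objective (not taken from the thesis):
    minimize  sum_t w[t] * (y[t] - z[t])**2
            + lam * sum_t (z[t] - 2*z[t+1] + z[t+2])**2
Setting the gradient to zero gives the linear system (W + lam * D^T D) z = W y,
where W = diag(w) and D is the second-difference matrix.
"""


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def smooth_weekly_sales(y, w, lam):
    """Return the smoothed series z; needs at least two weeks with w[t] > 0."""
    n = len(y)
    a = [[0.0] * n for _ in range(n)]
    for t in range(n):
        a[t][t] = float(w[t])  # data-fidelity term W
    for t in range(n - 2):
        # Add lam * d^T d for the second-difference row d = (1, -2, 1) at t..t+2.
        row = {t: 1.0, t + 1: -2.0, t + 2: 1.0}
        for r, cr in row.items():
            for c, cc in row.items():
                a[r][c] += lam * cr * cc
    b = [w[t] * y[t] for t in range(n)]
    return solve_linear(a, b)
```

A week with `w[t] = 0` drops out of the fit term, so the penalty fills it in from its neighbours; a larger `lam` trades fidelity for smoothness. The system matrix is banded, so a banded solver would replace the dense elimination for long series.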
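The abstract names k-means as one of the clustering algorithms but not its configuration. The sketch below is a plain Lloyd's-algorithm k-means over seasonality patterns, assuming each product's pattern is a list of weekly values already on a comparable scale (the abstract does not say whether patterns were normalized). The deterministic farthest-first seeding and all function names are assumptions made for a reproducible example.

```python
"""Hypothetical sketch: k-means (Lloyd's algorithm) on seasonality patterns.

Each pattern is one product's seasonal component, e.g. 52 weekly values.
Farthest-first seeding is an assumption made to keep the example
deterministic; k-means++ or random restarts are common alternatives.
"""


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def farthest_first(points, k):
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pattern joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_vector(members)
    return labels, centers
```

The returned `centers` can be read as each cluster's representative seasonality pattern.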
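To illustrate how the linkage criteria compared in the thesis differ, here is a naive agglomerative clustering sketch supporting single linkage, complete linkage and Ward's method. It is a hypothetical, unoptimized illustration rather than the thesis's implementation; `agglomerate` and `cluster_distance` are illustrative names.

```python
"""Hypothetical sketch: naive agglomerative clustering with three linkages.

O(n^3) and written for clarity, not speed. Single and complete linkage use
Euclidean distance between members; Ward's method merges the pair whose
union increases the within-cluster sum of squares the least.
"""
import math


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_distance(a, b, points, method):
    if method == "single":
        return min(math.dist(points[i], points[j]) for i in a for j in b)
    if method == "complete":
        return max(math.dist(points[i], points[j]) for i in a for j in b)
    if method == "ward":
        ca = centroid([points[i] for i in a])
        cb = centroid([points[i] for i in b])
        # Increase in within-cluster sum of squares caused by merging a and b.
        return len(a) * len(b) / (len(a) + len(b)) * math.dist(ca, cb) ** 2
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(points, k, method="ward"):
    """Merge clusters bottom-up until k remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y], points, method)
                if best is None or d < best[0]:  # ties: first pair found wins
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters
```

On eight evenly spaced one-dimensional points split into two clusters, the single-linkage version chains seven points together and leaves one singleton, while complete linkage and Ward's method produce two groups of four. This toy case only illustrates single linkage's tendency to chain; with evenly spaced points the exact outcome also depends on how ties are broken.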
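The second evaluation approach fits each cluster's seasonality pattern back to individual products' sales. The abstract does not give the model's form, so the sketch below assumes the simplest option: for each product, an ordinary least squares fit of sales ≈ a + b · pattern, where the pattern is the mean of the cluster members' seasonal components. The function names and the mean-squared-error score are assumptions.

```python
"""Hypothetical sketch of a forecast-based cluster evaluation.

Assumed model (not taken from the thesis): each product's weekly sales are
fitted as a + b * pattern[t] by ordinary least squares, where pattern is the
mean seasonal component of the product's cluster.
"""


def fit_offset_scale(sales, pattern):
    """Ordinary least squares fit of sales[t] ~ a + b * pattern[t]; returns (a, b)."""
    n = len(sales)
    mean_p = sum(pattern) / n
    mean_s = sum(sales) / n
    var_p = sum((p - mean_p) ** 2 for p in pattern)
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(pattern, sales))
    b = cov / var_p if var_p > 0 else 0.0  # flat pattern: fall back to the mean
    return mean_s - b * mean_p, b


def clustering_error(sales_by_product, patterns, labels):
    """Mean squared error over all product-weeks for a given clustering."""
    members = {}
    for pattern, label in zip(patterns, labels):
        members.setdefault(label, []).append(pattern)
    cluster_pattern = {
        label: [sum(col) / len(pats) for col in zip(*pats)]
        for label, pats in members.items()
    }
    total, count = 0.0, 0
    for sales, label in zip(sales_by_product, labels):
        pattern = cluster_pattern[label]
        a, b = fit_offset_scale(sales, pattern)
        total += sum((s - (a + b * p)) ** 2 for s, p in zip(sales, pattern))
        count += len(sales)
    return total / count
```

Calling `clustering_error` once with algorithm-produced labels and once with predetermined category labels gives the kind of baseline comparison the abstract describes; under this sketch, a lower error suggests the clusters capture the products' seasonality better.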