
Browsing by Subject "Activation Region"


  • Hätönen, Vili (2020)
    Recently it has been shown that sparse neural networks perform better than dense networks with a similar number of parameters. In addition, large overparameterized networks have been shown to contain sparse subnetworks which, when trained in isolation, reach or exceed the performance of the large model. However, methods that explain the success of sparse networks are still lacking. In this work I study the performance of sparse networks using a network's activation regions and activation patterns, concepts from the neural network expressivity literature. I define network specialization, a novel concept that considers how distinctly a feedforward neural network (FFNN) has learned to process high-level features in the data. I propose the Minimal Blanket Hypervolume (MBH) algorithm to measure the specialization of an FFNN. It finds the parts of the input space that the network associates with a user-defined high-level feature and compares their hypervolume to the hypervolume of the whole input space. My hypothesis is that sparse networks specialize to high-level features more than dense networks with the same number of hidden network parameters. Network specialization and MBH also contribute to the interpretability of deep neural networks (DNNs). The capability to learn representations at several levels of abstraction is at the core of deep learning, and MBH enables numerical evaluation of how specialized an FFNN is with respect to any abstract concept (a high-level feature) that can be embodied in an input. MBH can be applied to FFNNs in any problem domain, e.g. visual object recognition, natural language processing, or speech recognition. It also enables comparison between FFNNs with different architectures, since the metric is calculated in the common input space. I test different pruning and initialization scenarios on the MNIST Digits and Fashion datasets. I find that sparse networks approximate more complex functions, exploit redundancy in the data, and specialize to high-level features better than dense, fully parameterized networks with the same number of hidden network parameters.
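
    The abstract builds on activation regions and activation patterns from the expressivity literature: each ReLU unit is either active or inactive for a given input, and inputs sharing the same on/off pattern lie in the same (linear) activation region. The sketch below is a minimal NumPy illustration of these concepts only, not the thesis's MBH algorithm; the toy network sizes, the bounded input cube, and the Monte Carlo volume estimate are illustrative assumptions.

        import numpy as np

        def relu(x):
            return np.maximum(x, 0.0)

        def activation_pattern(weights, biases, x):
            """Binary pattern of active ReLU units for input x.

            Inputs sharing the same pattern lie in the same activation
            region, on which the network computes a single affine map.
            """
            bits, h = [], x
            for W, b in zip(weights, biases):
                z = W @ h + b                      # pre-activations of this layer
                bits.append((z > 0).astype(int))   # 1 = unit active, 0 = inactive
                h = relu(z)                        # post-activations feed the next layer
            return np.concatenate(bits)

        # Toy 2-16-16 ReLU network with random parameters (illustrative only).
        rng = np.random.default_rng(0)
        weights = [rng.standard_normal((16, 2)), rng.standard_normal((16, 16))]
        biases = [rng.standard_normal(16), rng.standard_normal(16)]

        # Hypervolume-style ratio (generic illustration, not MBH itself):
        # the fraction of the input cube [-1, 1]^2 whose points fall into the
        # activation region of a reference input, via uniform Monte Carlo sampling.
        x_ref = np.array([0.2, -0.5])
        p_ref = activation_pattern(weights, biases, x_ref)
        samples = rng.uniform(-1.0, 1.0, size=(20000, 2))
        hits = sum(np.array_equal(activation_pattern(weights, biases, s), p_ref)
                   for s in samples)
        print("estimated volume fraction of the region:", hits / len(samples))

    The printed ratio estimates how much of the bounded input space shares the reference input's activation region; comparing such input-space hypervolumes to the hypervolume of the whole input space is the general idea the abstract attributes to MBH, whose exact construction is given in the thesis itself.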