
Browsing by Author "Wichmann, Nic"


  • Wichmann, Nic (2022)
    Building a Neural Language Model from scratch involves a large number of design decisions: among other things, the network architecture, the input and output encodings, the regularization methods, and the optimization algorithm. The number of options for each decision also keeps growing as new techniques are invented. The trend in Neural Language Models, moreover, is simply to increase the number of parameters in order to improve inference performance; with more and more parameters to learn, a growing amount of expensive hardware is needed to train the models. A method called Knowledge Distillation makes it possible to start from an existing, already trained Neural Language Model instead: the existing model acts as a teacher, and training aligns the knowledge of the student model with it. We build a student Neural Language Model that uses multiple teacher models. In this scenario each teacher model is seen as an expert in a specific language, and the student model is taught all of the teachers' languages. In our experiment we use four existing language models, each specialized in a different language, and train the student model using Knowledge Distillation together with additional techniques that enable the use of multiple teachers. We use the XNLI benchmark to show that the student model successfully learns all of the languages.
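The multi-teacher distillation setup described in the abstract can be sketched as a soft-target loss averaged over several teachers. The following is a minimal illustration, not the thesis's actual implementation: the function names, the temperature-softened softmax, and the equal weighting of teachers are all assumptions for the sake of the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    temperature=2.0):
    """Average KL(teacher || student) over all teachers.

    student_logits: array of shape (batch, classes)
    teacher_logits_list: list of arrays, one per teacher, same shape
    """
    p_student = softmax(student_logits, temperature)
    losses = []
    for t_logits in teacher_logits_list:
        p_teacher = softmax(t_logits, temperature)
        # KL divergence between softened teacher and student distributions
        kl = np.sum(
            p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
            axis=-1,
        )
        losses.append(kl.mean())
    # Hypothetical choice: weight every teacher equally
    return float(np.mean(losses))
```

In a real training loop this term would be minimized with respect to the student's parameters, possibly combined with a supervised loss on labeled data; here it only demonstrates how feedback from several language-specific teachers can be aggregated into a single objective.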