Vector (or multivariate) autoregression is a statistical model for stochastic processes. It is relatively simple, yet flexible enough to describe many real-world phenomena. Stochastic processes modelled by multivariate autoregression are called vector autoregressive (VAR) processes.
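For reference, a VAR process of lag length p over a d-dimensional vector x_t is commonly written as x_t = A_1 x_{t-1} + ... + A_p x_{t-p} + e_t, where each A_k is a d × d coefficient matrix and e_t is white noise. The following is a minimal sketch that simulates such a process in plain Python. It assumes Gaussian noise with a common standard deviation; `simulate_var` and its parameters are hypothetical names chosen for illustration, not code from this thesis.

```python
import random


def simulate_var(coeffs, n_steps, noise_std=1.0, seed=0, init=None):
    """Simulate x_t = A_1 x_{t-1} + ... + A_p x_{t-p} + e_t.

    coeffs:    list of p matrices (lists of lists); coeffs[k-1] is A_k (d x d).
    noise_std: standard deviation of the Gaussian noise e_t (an assumption).
    init:      optional list of at least p starting vectors, oldest first;
               defaults to p zero vectors.
    Returns a list of n_steps vectors (lists of floats).
    """
    rng = random.Random(seed)
    p = len(coeffs)
    d = len(coeffs[0])
    if init is not None:
        history = [list(v) for v in init]
    else:
        history = [[0.0] * d for _ in range(p)]
    samples = []
    for _ in range(n_steps):
        # Start from the noise term e_t.
        x = [rng.gauss(0.0, noise_std) for _ in range(d)]
        # Add A_k x_{t-k} for every lag k = 1..p; history[-k] is x_{t-k}.
        for k, A in enumerate(coeffs, start=1):
            prev = history[-k]
            for i in range(d):
                x[i] += sum(A[i][j] * prev[j] for j in range(d))
        history.append(x)
        samples.append(x)
    return samples
```

With the noise switched off, the recursion can be checked by hand: starting from x_0 = (2, 4) with A_1 = 0.5 I, the next two states are (1, 2) and (0.5, 1).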
The structure of a VAR process is determined by the conditional independences of the variables and the lag length that describes the duration of direct influence. Structure discovery in VAR processes refers to finding reasonable candidates for these elements.
Learning the structure of a VAR process can be realized using graphical models, in which nodes represent variables and edges represent the absence of conditional independence. This turns the problem of learning the conditional independences among variables into the problem of finding the edges between nodes.
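To make the link between coefficients and graph structure concrete, the sketch below reads a directed edge set and a lag length directly off known VAR coefficient matrices. It assumes the common convention that a nonzero entry A_k[i][j] means variable j directly influences variable i at lag k, and that the lag length is the largest k with a nonzero A_k. The function name `var_structure` and the tolerance `tol` are hypothetical, and the graphs used in this thesis may encode additional structure beyond these lagged edges.

```python
def var_structure(coeffs, tol=0.0):
    """Read the structure of a VAR(p) process off its coefficient matrices.

    coeffs: list of p matrices (lists of lists); coeffs[k-1] is A_k.
    Returns (edges, lag_length):
      edges      -- set of (j, i) pairs: variable j directly influences
                    variable i at some lag (|A_k[i][j]| > tol for some k);
      lag_length -- largest lag k whose matrix A_k has an entry above tol
                    (0 if every coefficient is at or below tol).
    """
    edges = set()
    lag_length = 0
    for k, A in enumerate(coeffs, start=1):
        for i, row in enumerate(A):
            for j, a in enumerate(row):
                if abs(a) > tol:
                    edges.add((j, i))
                    # k only increases, so this ends as the largest such lag.
                    lag_length = k
    return edges, lag_length
```

Structure discovery runs in the opposite direction: only the time series is observed, and both the edge set and the lag length have to be inferred from it.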
This thesis extends previous studies to infer the structure of VAR processes involving tens or hundreds of variables, without assuming that the underlying Granger causality graphs are decomposable. A scoring function capable of predicting the Markov blankets of the nodes is derived and proved to be consistent. This scoring function is then combined with a second scoring function to discover VAR structures from multivariate time series.
The performance of the proposed method is tested on synthetic data. In all test cases considered, given enough samples and some a priori information, the true lag length can be identified, the true positive rate can be raised above 0.94, and the false positive rate can be kept below 0.01.
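The two rates quoted above are standard edge-recovery metrics: the true positive rate is the fraction of true edges that are found, and the false positive rate is the fraction of absent edges that are wrongly reported. The sketch below computes both for an estimated edge set. It assumes edges are directed (j, i) pairs and that all d × d ordered pairs, self-loops included, are candidates; the exact counting convention used in the thesis experiments may differ, and `edge_rates` is a hypothetical helper.

```python
def edge_rates(true_edges, est_edges, d):
    """Return (TPR, FPR) of an estimated edge set.

    Candidate edges are all d*d ordered pairs (j, i), self-loops included
    (an assumption about the counting convention).
    TPR = TP / (number of true edges)
    FPR = FP / (number of candidate pairs that are not true edges)
    """
    all_pairs = {(j, i) for j in range(d) for i in range(d)}
    true_edges = set(true_edges)
    est_edges = set(est_edges)
    non_edges = all_pairs - true_edges
    tp = len(true_edges & est_edges)
    fp = len(est_edges & non_edges)
    # Conventions for the degenerate cases with nothing to count.
    tpr = tp / len(true_edges) if true_edges else 1.0
    fpr = fp / len(non_edges) if non_edges else 0.0
    return tpr, fpr
```

For example, with d = 3, true edges {(0, 1), (1, 2)} and estimated edges {(0, 1), (2, 0)}, one of two true edges is found (TPR = 0.5) and one of the seven non-edges is reported (FPR = 1/7).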