4.4 Vector Autoregression Model (VAR)

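The previous three sections dealt only with univariate problems. The real world is often more complex, so here we extend the AR model from a single variable to several. A multivariate model involves several parallel time series that influence one another.

The model discussed in this section is called the vector autoregression (VAR) model. Consider a scenario with three parallel time series; for a second-order model, VAR(2), the equations are: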

$$
\begin{aligned}
y_{1,t} &= \phi_{01} + \phi_{11,1}\, y_{1,t-1} + \phi_{12,1}\, y_{2,t-1} + \phi_{13,1}\, y_{3,t-1} + \phi_{11,2}\, y_{1,t-2} + \phi_{12,2}\, y_{2,t-2} + \phi_{13,2}\, y_{3,t-2} \\
y_{2,t} &= \phi_{02} + \phi_{21,1}\, y_{1,t-1} + \phi_{22,1}\, y_{2,t-1} + \phi_{23,1}\, y_{3,t-1} + \phi_{21,2}\, y_{1,t-2} + \phi_{22,2}\, y_{2,t-2} + \phi_{23,2}\, y_{3,t-2} \\
y_{3,t} &= \phi_{03} + \phi_{31,1}\, y_{1,t-1} + \phi_{32,1}\, y_{2,t-1} + \phi_{33,1}\, y_{3,t-1} + \phi_{31,2}\, y_{1,t-2} + \phi_{32,2}\, y_{2,t-2} + \phi_{33,2}\, y_{3,t-2}
\end{aligned}
$$

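Compared with the AR model, the equation for each series now also includes the lagged values of the other two series as terms. Readers familiar with linear algebra may have noticed that the three equations above can be written as a single matrix-vector equation, which is where the VAR model gets its name. Written this way, the formula has exactly the same form as the AR model studied earlier. In it, $y$ and $\phi_0$ are $3\times1$ vectors, and $\phi_1$ and $\phi_2$ are $3\times3$ matrices.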

$$
y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2}
$$
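As a quick check that the matrix form and the three scalar equations describe the same computation, here is a minimal NumPy sketch. The coefficient values and lagged observations are random placeholders chosen for illustration, not estimates from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3                                  # number of parallel series

# Placeholder coefficients (assumed values, for illustration only)
phi0 = rng.normal(size=k)              # intercept vector, shape (3,)
phi1 = rng.normal(size=(k, k))         # lag-1 coefficient matrix, shape (3, 3)
phi2 = rng.normal(size=(k, k))         # lag-2 coefficient matrix, shape (3, 3)

y_tm1 = rng.normal(size=k)             # observations at time t-1
y_tm2 = rng.normal(size=k)             # observations at time t-2

# Matrix form: y_t = phi0 + phi1 @ y_{t-1} + phi2 @ y_{t-2}
y_matrix = phi0 + phi1 @ y_tm1 + phi2 @ y_tm2

# Scalar form: one equation per series, summing over every lagged term
y_scalar = np.empty(k)
for i in range(k):
    total = phi0[i]
    for j in range(k):
        total += phi1[i, j] * y_tm1[j] + phi2[i, j] * y_tm2[j]
    y_scalar[i] = total

print(np.allclose(y_matrix, y_scalar))  # prints True
```

Row i of each coefficient matrix holds the weights that series i places on the lagged values of all three series, which is why the off-diagonal entries capture the cross-series influence.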

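The formula also shows that the number of parameters in a VAR model grows quickly as the order rises: each additional lag adds a full coefficient matrix (9 coefficients per lag for three series). For that reason, the method is worth trying only when you expect the series to influence one another. VAR is particularly useful in the following scenarios:

  • Testing whether one variable affects other variables

  • Forecasting a large number of variables when the analyst has little domain knowledge

  • Determining how much of a forecast is attributable to underlying causality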

Hands-on in Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import grangercausalitytests
from statsmodels.tsa.vector_ar.vecm import coint_johansen
from statsmodels.stats.stattools import durbin_watson
%matplotlib inline

In the output, row labels ending in y denote the response variable and column labels ending in x denote the predictor variable. A p-value below 0.05 indicates Granger causality. Based on the test results, there is good reason to use a VAR model.

ADF test output for each series:

    Augmented Dickey-Fuller Test on "pgnp"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = 1.2743
    No. Lags Chosen = 1
    Critical value 1% = -3.486
    Critical value 5% = -2.886
    Critical value 10% = -2.58
    => P-Value = 0.9965. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "ulc"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = 1.3967
    No. Lags Chosen = 2
    Critical value 1% = -3.486
    Critical value 5% = -2.886
    Critical value 10% = -2.58
    => P-Value = 0.9971. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "gdfco"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = 0.5762
    No. Lags Chosen = 5
    Critical value 1% = -3.488
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.987. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "gdf"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = 1.1129
    No. Lags Chosen = 7
    Critical value 1% = -3.489
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.9953. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "gdfim"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -0.1987
    No. Lags Chosen = 1
    Critical value 1% = -3.486
    Critical value 5% = -2.886
    Critical value 10% = -2.58
    => P-Value = 0.9387. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "gdfcf"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = 1.6693
    No. Lags Chosen = 9
    Critical value 1% = -3.49
    Critical value 5% = -2.887
    Critical value 10% = -2.581
    => P-Value = 0.9981. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

    Augmented Dickey-Fuller Test on "gdfce"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -0.8159
    No. Lags Chosen = 13
    Critical value 1% = -3.492
    Critical value 5% = -2.888
    Critical value 10% = -2.581
    => P-Value = 0.8144. Weak evidence to reject the Null Hypothesis.
    => Series is Non-Stationary.

None of the variables is stationary, which suggests that we need to go on to a cointegration test. In general, the procedure is as follows:

1. Test each variable on its own for stationarity, using the ADF test, KPSS test, PP test, or a similar method.
2. If the individual series turn out to be non-stationary, proceed to a cointegration test. The point of the test is that even when each variable is non-stationary by itself, some linear combination of the variables may still be stationary. If two time series are cointegrated, they share a long-run, statistically significant relationship. Available methods include Johansen, Engle-Granger, and Phillips-Ouliaris.

ADF test output from a second run, in which every series is reported as stationary:

    Augmented Dickey-Fuller Test on "pgnp"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -10.9813
    No. Lags Chosen = 0
    Critical value 1% = -3.488
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "ulc"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -8.769
    No. Lags Chosen = 2
    Critical value 1% = -3.489
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "gdfco"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -7.9102
    No. Lags Chosen = 3
    Critical value 1% = -3.49
    Critical value 5% = -2.887
    Critical value 10% = -2.581
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "gdf"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -10.0351
    No. Lags Chosen = 1
    Critical value 1% = -3.489
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "gdfim"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -9.4059
    No. Lags Chosen = 1
    Critical value 1% = -3.489
    Critical value 5% = -2.887
    Critical value 10% = -2.58
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "gdfcf"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -6.922
    No. Lags Chosen = 5
    Critical value 1% = -3.491
    Critical value 5% = -2.888
    Critical value 10% = -2.581
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.

    Augmented Dickey-Fuller Test on "gdfce"
    -----------------------------------------------
    Null Hypothesis: Data has unit root. Non-Stationary.
    Significance Level = 0.05
    Test Statistic = -5.1732
    No. Lags Chosen = 8
    Critical value 1% = -3.492
    Critical value 5% = -2.889
    Critical value 10% = -2.581
    => P-Value = 0.0. Rejecting Null Hypothesis.
    => Series is Stationary.
