Recursive least squares filter

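The recursive least squares (RLS) filter is an adaptive filter that recursively finds the coefficients minimizing a weighted least squares cost function of the input signals. This contrasts with the least mean squares (LMS) filter, which aims to reduce the mean square error. In the derivation of RLS the input signals are treated as deterministic, whereas LMS and similar algorithms treat them as stochastic. Compared with most other methods, RLS converges extremely fast, but at the cost of high computational complexity.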

Motivation

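RLS was discovered by Carl Friedrich Gauss, but it lay forgotten and unused until 1950, when Plackett rediscovered Gauss's work of 1821. In general, RLS can be applied to any problem that can be solved with an adaptive filter. For example, suppose a signal $d(n)$ is transmitted over an echoey, noisy channel, so that what is received is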

x(n)=\sum _{k=0}^{q}b_{n}(k)d(n-k)+v(n)

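where $v(n)$ represents additive white Gaussian noise. The aim of the RLS filter is to recover the desired signal $d(n)$ using a $(p+1)$-tap finite impulse response (FIR) filter $\mathbf {w}$ :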

d(n)\approx \sum _{k=0}^{p}w(k)x(n-k)=\mathbf {w} ^{\mathit {T}}\mathbf {x} _{n}

where $\mathbf {x} _{n}=[x(n)\quad x(n-1)\quad \ldots \quad x(n-p)]^{T}$ is the column vector containing the $p+1$ most recent samples of $x(n)$ . The estimate of the recovered desired signal is

{\hat {d}}(n)=\sum _{k=0}^{p}w_{n}(k)x(n-k)=\mathbf {w} _{n}^{\mathit {T}}\mathbf {x} _{n}

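The goal is to estimate the parameters of the filter $\mathbf {w}$ . At each time $n$ the current estimate is denoted $\mathbf {w} _{n}$ , and the adapted least squares estimate is denoted $\mathbf {w} _{n+1}$ . $\mathbf {w} _{n}$ is a column vector, and its transpose $\mathbf {w} _{n}^{\mathit {T}}$ is a row vector. The matrix product $\mathbf {w} _{n}^{\mathit {T}}\mathbf {x} _{n}$ (which is also the dot product of $\mathbf {w} _{n}$ and $\mathbf {x} _{n}$ ) is ${\hat {d}}(n)$ , a scalar. The estimate is considered good if ${\hat {d}}(n)-d(n)$ is small in the least squares sense.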
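As a minimal sketch of the estimate ${\hat {d}}(n)=\mathbf {w} ^{\mathit {T}}\mathbf {x} _{n}$ described above, the following Python fragment builds the tap vector and evaluates the dot product. The function names `tap_vector` and `fir_estimate` are illustrative, and samples before time 0 are assumed to be zero.

```python
def tap_vector(x, n, p):
    """Return x_n = [x(n), x(n-1), ..., x(n-p)], assuming x(k) = 0 for k < 0."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(p + 1)]


def fir_estimate(w, x, n):
    """Compute d_hat(n) = w^T x_n for a (p+1)-tap filter w."""
    xn = tap_vector(x, n, len(w) - 1)
    return sum(wk * xk for wk, xk in zip(w, xn))


x = [1.0, 2.0, 3.0]      # hypothetical received samples x(0), x(1), x(2)
w = [0.5, -0.25]         # hypothetical 2-tap filter (p = 1)
print(fir_estimate(w, x, 2))  # 0.5 * 3.0 - 0.25 * 2.0 = 1.0
```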

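As time evolves, it is desirable to avoid completely redoing the least squares computation to find the new estimate $\mathbf {w} _{n+1}$ ; instead, $\mathbf {w} _{n+1}$ should be expressed in terms of $\mathbf {w} _{n}$ and other variables.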

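One benefit of the RLS filter is that no matrix inversion is required, which saves computation time. Another is that its results provide intuition similar to that behind the Kalman filter.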

Concept

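The idea behind the RLS filter is to minimize a cost function $C$ by appropriately selecting the filter coefficients $\mathbf {w} _{n}$ and updating the filter as new data arrive. The error signal $e(n)$ and the desired signal $d(n)$ are related through a negative feedback loop.

The error depends on the filter coefficients through the estimate ${\hat {d}}(n)$ :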

e(n)=d(n)-{\hat {d}}(n)

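The weighted least squares error function $C$ , the cost function to be minimized, is a function of $e(n)$ and therefore also depends on the filter coefficients: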

C(\mathbf {w} _{n})=\sum _{i=0}^{n}\lambda ^{n-i}e^{2}(i)

where $0<\lambda \leq 1$ is the "forgetting factor", which gives exponentially less weight to older data.

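The cost function is minimized by taking the partial derivative with respect to every entry $k$ of the coefficient vector $\mathbf {w} _{n}$ and setting the results to zero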

{\frac {\partial C(\mathbf {w} _{n})}{\partial w_{n}(k)}}=\sum _{i=0}^{n}2\lambda ^{n-i}e(i)\cdot {\frac {\partial e(i)}{\partial w_{n}(k)}}=-\sum _{i=0}^{n}2\lambda ^{n-i}e(i)\,x(i-k)=0\qquad k=0,1,\ldots ,p

Next, replace $e(n)$ with the definition of the error signal

\sum _{i=0}^{n}\lambda ^{n-i}\left[d(i)-\sum _{\ell =0}^{p}w_{n}(\ell )x(i-\ell )\right]x(i-k)=0\qquad k=0,1,\ldots ,p

Rearranging the equation gives

\sum _{\ell =0}^{p}w_{n}(\ell )\left[\sum _{i=0}^{n}\lambda ^{n-i}\,x(i-\ell )x(i-k)\right]=\sum _{i=0}^{n}\lambda ^{n-i}d(i)x(i-k)\qquad k=0,1,\ldots ,p

which can be written in matrix form as

\mathbf {R} _{x}(n)\,\mathbf {w} _{n}=\mathbf {r} _{dx}(n)

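where $\mathbf {R} _{x}(n)$ is the weighted sample covariance matrix of $x(n)$ , and $\mathbf {r} _{dx}(n)$ is the equivalent estimate of the cross-covariance between $d(n)$ and $x(n)$ . From this expression, the coefficients that minimize the cost function are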

\mathbf {w} _{n}=\mathbf {R} _{x}^{-1}(n)\,\mathbf {r} _{dx}(n)
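The closed-form solution above can be sketched directly in Python by accumulating $\mathbf {R} _{x}(n)$ and $\mathbf {r} _{dx}(n)$ and solving the resulting linear system. This is an illustrative batch computation, not the recursive algorithm. The helper names `solve` and `batch_wls`, the test signal, and the zero values assumed for $x(k)$ with $k<0$ are all assumptions made for the example.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def batch_wls(x, d, p, lam):
    """Build R_x(n) and r_dx(n) for n = len(x) - 1, then solve R_x w = r_dx."""
    n = len(x) - 1
    R = [[0.0] * (p + 1) for _ in range(p + 1)]
    r = [0.0] * (p + 1)
    for i in range(n + 1):
        # x(i) = [x(i), x(i-1), ..., x(i-p)], assuming x(k) = 0 for k < 0
        xi = [x[i - k] if i - k >= 0 else 0.0 for k in range(p + 1)]
        weight = lam ** (n - i)
        for a in range(p + 1):
            r[a] += weight * d[i] * xi[a]
            for b in range(p + 1):
                R[a][b] += weight * xi[a] * xi[b]
    return solve(R, r)


# Hypothetical noise-free data generated by d(i) = 0.5 x(i) - 0.25 x(i-1)
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, -0.5, 1.0]
d = [0.5 * x[i] - 0.25 * (x[i - 1] if i > 0 else 0.0) for i in range(len(x))]
w = batch_wls(x, d, p=1, lam=0.99)
print(w)  # close to [0.5, -0.25]
```

Because the whole sum is rebuilt and the system re-solved for every new $n$ , this direct approach becomes expensive as data accumulate, which is what the recursive form avoids.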

Choosing λ

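The smaller $\lambda$ is, the less older data contribute to the covariance matrix. This makes the filter more sensitive to recent data, which also makes the filter coefficients more prone to fluctuation. The case $\lambda =1$ is called the growing window RLS algorithm. In practice, $\lambda$ is usually chosen between 0.98 and 1^[1]. The optimal $\lambda$ can be estimated from the data using type-II maximum likelihood estimation^[2].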
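One informal way to read a value of $\lambda$ is through the total weight $\sum _{k\geq 0}\lambda ^{k}=1/(1-\lambda )$ that the cost function assigns to all past samples, which behaves like an effective window length. The sketch below only evaluates this geometric series. The function name `effective_memory` is illustrative, and this is not a selection rule taken from the cited references.

```python
def effective_memory(lam):
    """Sum of the weights lam**k over k = 0, 1, 2, ... (geometric series)."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("the forgetting factor must satisfy 0 < lam <= 1")
    # lam = 1 is the growing-window case: every sample keeps full weight.
    return float("inf") if lam == 1.0 else 1.0 / (1.0 - lam)


for lam in (0.9, 0.98, 0.995, 1.0):
    print(lam, effective_memory(lam))
```

With $\lambda =0.98$ the total weight is about 50 samples, and it grows without bound as $\lambda$ approaches 1.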

Recursive algorithm

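The discussion above resulted in a single equation that determines the coefficient vector minimizing the cost function. This section derives a recursive solution of the form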

\mathbf {w} _{n}=\mathbf {w} _{n-1}+\Delta \mathbf {w} _{n-1}

where $\Delta \mathbf {w} _{n-1}$ is a correction factor at time ${n-1}$ . The derivation starts by expressing the cross-covariance $\mathbf {r} _{dx}(n)$ in terms of $\mathbf {r} _{dx}(n-1)$

\begin{aligned}\mathbf {r} _{dx}(n)&=\sum _{i=0}^{n}\lambda ^{n-i}d(i)\mathbf {x} (i)\\&=\sum _{i=0}^{n-1}\lambda ^{n-i}d(i)\mathbf {x} (i)+\lambda ^{0}d(n)\mathbf {x} (n)\\&=\lambda \mathbf {r} _{dx}(n-1)+d(n)\mathbf {x} (n)\end{aligned}

where $\mathbf {x} (i)$ is the ${p+1}$ dimensional data vector

\mathbf {x} (i)=[x(i),x(i-1),\dots ,x(i-p)]^{T}

Similarly, $\mathbf {R} _{x}(n)$ is expressed in terms of $\mathbf {R} _{x}(n-1)$ by

\begin{aligned}\mathbf {R} _{x}(n)&=\sum _{i=0}^{n}\lambda ^{n-i}\mathbf {x} (i)\mathbf {x} ^{T}(i)\\&=\lambda \mathbf {R} _{x}(n-1)+\mathbf {x} (n)\mathbf {x} ^{T}(n)\end{aligned}

To obtain the coefficient vector, the quantity of interest is the inverse of the deterministic auto-covariance matrix. The Woodbury matrix identity is useful for this task. With

$A$	$=\lambda \mathbf {R} _{x}(n-1)$ is a $(p+1)\times (p+1)$ matrix
$U$	$=\mathbf {x} (n)$ is $(p+1)\times 1$ (a column vector)
$V$	$=\mathbf {x} ^{T}(n)$ is $1\times (p+1)$ (a row vector)
$C$	$=\mathbf {I} _{1}$ is the $1\times 1$ identity matrix

the Woodbury matrix identity gives

\begin{aligned}\mathbf {R} _{x}^{-1}(n)&=\left[\lambda \mathbf {R} _{x}(n-1)+\mathbf {x} (n)\mathbf {x} ^{T}(n)\right]^{-1}\\&={\dfrac {1}{\lambda }}\left\lbrace \mathbf {R} _{x}^{-1}(n-1)-{\dfrac {\mathbf {R} _{x}^{-1}(n-1)\mathbf {x} (n)\mathbf {x} ^{T}(n)\mathbf {R} _{x}^{-1}(n-1)}{\lambda +\mathbf {x} ^{T}(n)\mathbf {R} _{x}^{-1}(n-1)\mathbf {x} (n)}}\right\rbrace \end{aligned}
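The rank-one inverse update above can be checked numerically. The following sketch compares a direct inversion of $\lambda \mathbf {R} _{x}(n-1)+\mathbf {x} (n)\mathbf {x} ^{T}(n)$ with the right-hand side of the identity for a hypothetical $2\times 2$ case. The matrix, the vector, and the value of $\lambda$ are arbitrary illustrative choices.

```python
def inv2(M):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


lam = 0.95
R_prev = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical R_x(n-1), symmetric positive definite
x = [1.0, -2.0]                    # hypothetical data vector x(n)

# Direct route: form R_x(n) = lam * R_x(n-1) + x x^T and invert it.
R_new = [[lam * R_prev[i][j] + x[i] * x[j] for j in range(2)] for i in range(2)]
direct = inv2(R_new)

# Identity route: uses only R_x^{-1}(n-1) and x(n).
P_prev = inv2(R_prev)
Px = [sum(P_prev[i][j] * x[j] for j in range(2)) for i in range(2)]
denom = lam + sum(x[i] * Px[i] for i in range(2))
# P_prev is symmetric, so P_prev x x^T P_prev equals the outer product (Px)(Px)^T.
updated = [[(P_prev[i][j] - Px[i] * Px[j] / denom) / lam for j in range(2)]
           for i in range(2)]

max_diff = max(abs(direct[i][j] - updated[i][j]) for i in range(2) for j in range(2))
print(max_diff)  # expected to be at rounding-error level
```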

To be consistent with the standard literature, define

\begin{aligned}\mathbf {P} (n)&=\mathbf {R} _{x}^{-1}(n)\\&=\lambda ^{-1}\mathbf {P} (n-1)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\end{aligned}

where the gain vector $\mathbf {g} (n)$ is

\begin{aligned}\mathbf {g} (n)&=\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\left\{1+\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\right\}^{-1}\\&=\mathbf {P} (n-1)\mathbf {x} (n)\left\{\lambda +\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\mathbf {x} (n)\right\}^{-1}\end{aligned}

Before moving on, $\mathbf {g} (n)$ needs to be brought into another form

\begin{aligned}\mathbf {g} (n)\left\{1+\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\right\}&=\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\\\mathbf {g} (n)+\mathbf {g} (n)\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)&=\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\end{aligned}

Subtracting the second term on the left-hand side from both sides yields

\begin{aligned}\mathbf {g} (n)&=\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\mathbf {x} (n)\\&=\lambda ^{-1}\left[\mathbf {P} (n-1)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\right]\mathbf {x} (n)\end{aligned}

With the recursive definition of $\mathbf {P} (n)$ , the bracketed term is $\lambda \mathbf {P} (n)$ , so the desired form follows

\mathbf {g} (n)=\mathbf {P} (n)\mathbf {x} (n)

The recursion can now be completed. As discussed above,

\begin{aligned}\mathbf {w} _{n}&=\mathbf {P} (n)\,\mathbf {r} _{dx}(n)\\&=\lambda \mathbf {P} (n)\,\mathbf {r} _{dx}(n-1)+d(n)\mathbf {P} (n)\,\mathbf {x} (n)\end{aligned}

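The second step follows from the recursive definition of $\mathbf {r} _{dx}(n)$ . Next, substituting the recursive definition of $\mathbf {P} (n)$ together with the alternate form of $\mathbf {g} (n)$ gives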

\begin{aligned}\mathbf {w} _{n}&=\lambda \left[\lambda ^{-1}\mathbf {P} (n-1)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)\right]\mathbf {r} _{dx}(n-1)+d(n)\mathbf {g} (n)\\&=\mathbf {P} (n-1)\mathbf {r} _{dx}(n-1)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\mathbf {r} _{dx}(n-1)+d(n)\mathbf {g} (n)\\&=\mathbf {P} (n-1)\mathbf {r} _{dx}(n-1)+\mathbf {g} (n)\left[d(n)-\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\mathbf {r} _{dx}(n-1)\right]\end{aligned}

With $\mathbf {w} _{n-1}=\mathbf {P} (n-1)\mathbf {r} _{dx}(n-1)$ , this gives the update equation

\begin{aligned}\mathbf {w} _{n}&=\mathbf {w} _{n-1}+\mathbf {g} (n)\left[d(n)-\mathbf {x} ^{T}(n)\mathbf {w} _{n-1}\right]\\&=\mathbf {w} _{n-1}+\mathbf {g} (n)\alpha (n)\end{aligned}

where $\alpha (n)=d(n)-\mathbf {x} ^{T}(n)\mathbf {w} _{n-1}$ is the a priori error. Compare this with the a posteriori error, which is the error calculated after the filter is updated:

e(n)=d(n)-\mathbf {x} ^{T}(n)\mathbf {w} _{n}

This identifies the correction factor

\Delta \mathbf {w} _{n-1}=\mathbf {g} (n)\alpha (n)

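This intuitively satisfying result shows that the correction factor is directly proportional to both the error and the gain vector, which controls how much sensitivity is desired through the weighting factor $\lambda$ .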
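As a short worked step that uses only the definitions above, the a posteriori error can be written in terms of the a priori error by substituting the update equation for $\mathbf {w} _{n}$ and then the second form of $\mathbf {g} (n)$ :

\begin{aligned}e(n)&=d(n)-\mathbf {x} ^{T}(n)\left[\mathbf {w} _{n-1}+\mathbf {g} (n)\alpha (n)\right]\\&=\alpha (n)\left[1-\mathbf {x} ^{T}(n)\mathbf {g} (n)\right]\\&=\alpha (n)\,{\dfrac {\lambda }{\lambda +\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\mathbf {x} (n)}}\end{aligned}

Assuming $\mathbf {P} (n-1)$ is positive semidefinite, the last factor lies in $(0,1]$ , so the error after the update is never larger in magnitude than the error before it.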

RLS algorithm summary

The RLS algorithm for a filter of order $p$ can be summarized as follows

Parameters:	$p=$ filter order
	$\lambda =$ forgetting factor
	$\delta =$ value used to initialize $\mathbf {P} (0)$
Initialization:	$\mathbf {w} (0)=0$ ,
	$x(k)=0,k=-p,\dots ,-1$ ,
	$d(k)=0,k=-p,\dots ,-1$
	$\mathbf {P} (0)=\delta I$ where $I$ is the identity matrix of order $p+1$
Computation:	For $n=1,2,\dots$
	$\mathbf {x} (n)=\left[{\begin{matrix}x(n)\\x(n-1)\\\vdots \\x(n-p)\end{matrix}}\right]$
	$\alpha (n)=d(n)-\mathbf {x} ^{T}(n)\mathbf {w} (n-1)$
	$\mathbf {g} (n)=\mathbf {P} (n-1)\mathbf {x} (n)\left\{\lambda +\mathbf {x} ^{T}(n)\mathbf {P} (n-1)\mathbf {x} (n)\right\}^{-1}$
	$\mathbf {P} (n)=\lambda ^{-1}\mathbf {P} (n-1)-\mathbf {g} (n)\mathbf {x} ^{T}(n)\lambda ^{-1}\mathbf {P} (n-1)$
	$\mathbf {w} (n)=\mathbf {w} (n-1)+\,\alpha (n)\mathbf {g} (n)$ .

The recursion for $P$ follows an algebraic Riccati equation and thus parallels the Kalman filter^[3].
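The summarized recursion can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function name `rls`, the default values `lam=0.99` and `delta=100.0`, and the synthetic test signal are all choices made for the example. Samples are indexed from 0 rather than 1, with $x(k)=0$ for $k<0$ .

```python
import random


def rls(x, d, p, lam=0.99, delta=100.0):
    """Run the RLS recursion; return the final weights and the a priori errors."""
    m = p + 1
    w = [0.0] * m                                    # w(0) = 0
    P = [[delta if i == j else 0.0 for j in range(m)] for i in range(m)]  # P(0) = delta I
    alphas = []
    for n in range(len(x)):
        # x(n) = [x(n), x(n-1), ..., x(n-p)], assuming x(k) = 0 for k < 0
        xn = [x[n - k] if n - k >= 0 else 0.0 for k in range(m)]
        # a priori error: alpha(n) = d(n) - x^T(n) w(n-1)
        alpha = d[n] - sum(xn[k] * w[k] for k in range(m))
        # gain: g(n) = P(n-1) x(n) / (lam + x^T(n) P(n-1) x(n))
        Px = [sum(P[i][j] * xn[j] for j in range(m)) for i in range(m)]
        denom = lam + sum(xn[i] * Px[i] for i in range(m))
        g = [v / denom for v in Px]
        # P(n) = (P(n-1) - g(n) x^T(n) P(n-1)) / lam
        xP = [sum(xn[i] * P[i][j] for i in range(m)) for j in range(m)]
        P = [[(P[i][j] - g[i] * xP[j]) / lam for j in range(m)] for i in range(m)]
        # w(n) = w(n-1) + alpha(n) g(n)
        w = [w[k] + alpha * g[k] for k in range(m)]
        alphas.append(alpha)
    return w, alphas


# Hypothetical noise-free system to identify: d(n) = 0.5 x(n) - 0.25 x(n-1)
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
d = [0.5 * x[n] - 0.25 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w, alphas = rls(x, d, p=1)
print(w)  # close to [0.5, -0.25]
```

A larger `delta` makes the initial $\mathbf {P} (0)$ act as a weaker prior on the weights, so the estimate is dominated by the data sooner. Each iteration costs on the order of $(p+1)^{2}$ operations because of the matrix-vector products.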

References

[1] Emmanuel C. Ifeachor, Barrie W. Jervis. Digital Signal Processing: A Practical Approach, second edition. Indianapolis: Pearson Education Limited, 2002, p. 718.
[2] Steven Van Vaerenbergh, Ignacio Santamaría, Miguel Lázaro-Gredilla. "Estimation of the forgetting factor in kernel recursive least squares", 2012 IEEE International Workshop on Machine Learning for Signal Processing, 2012, accessed June 23, 2016.
[3] Greg Welch, Gary Bishop. "An Introduction to the Kalman Filter", Department of Computer Science, University of North Carolina at Chapel Hill, September 17, 1997, accessed July 19, 2011.

Hayes, Monson H. 9.4: Recursive Least Squares. Statistical Digital Signal Processing and Modeling. Wiley. 1996: 541. ISBN 0-471-59431-8.
Simon Haykin, Adaptive Filter Theory, Prentice Hall, 2002, ISBN 0-13-048434-2
M.H.A Davis, R.B. Vinter, Stochastic Modelling and Control, Springer, 1985, ISBN 0-412-16200-8
Weifeng Liu, Jose Principe and Simon Haykin, Kernel Adaptive Filtering: A Comprehensive Introduction, John Wiley, 2010, ISBN 0-470-44753-2
R.L.Plackett, Some Theorems in Least Squares, Biometrika, 1950, 37, 149–157, ISSN 0006-3444
C.F.Gauss, Theoria combinationis observationum erroribus minimis obnoxiae, 1821, Werke, 4. Göttingen
