Locally Weighted Linear Regression (局部加权回归)

2019-11-06 07:34:14

Source: reposted

Contributed by: a site user

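An ordinary linear fit often cannot predict every value well, because it tends to underfit; a data set shaped like a bell curve is a typical example. A polynomial fit of high enough degree can pass through all of the training data, but it then tends to predict new samples poorly, because it overfits and no longer reflects the true model behind the data.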
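To make the contrast concrete, here is a minimal sketch that fits a straight line and a degree-9 polynomial to a bell-shaped data set. The data set, the noise level, the polynomial degree, and the helper name `mse` are all assumptions chosen for illustration; none of them come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

def bell(x):
    """Hypothetical 'true' model: a Gaussian bump."""
    return np.exp(-x ** 2)

# Training samples on [-3, 3] and fresh samples on a slightly wider range.
x_train = np.linspace(-3.0, 3.0, 30)
y_train = bell(x_train) + rng.normal(scale=0.05, size=x_train.size)
x_new = np.linspace(-3.5, 3.5, 50)
y_new = bell(x_new)

def mse(degree, x, y):
    """Fit a polynomial of `degree` on the training set, score it on (x, y)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

linear_train_mse = mse(1, x_train, y_train)   # a line cannot follow the bump
poly_train_mse = mse(9, x_train, y_train)     # degree 9 tracks the samples

print("degree 1: train MSE", linear_train_mse, "new-sample MSE", mse(1, x_new, y_new))
print("degree 9: train MSE", poly_train_mse, "new-sample MSE", mse(9, x_new, y_new))
```

A low training error for the polynomial says nothing by itself about new samples; comparing the two printed columns is one way to see whether the flexible model generalizes.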

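This post covers a non-parametric learning method called locally weighted regression (LWR). Why is LWR called non-parametric? A parametric learning method fits a fixed set of parameters once, using all of the training data; new samples are then predicted from those parameters alone, and the training data is no longer needed. A non-parametric learning method instead refits on the training data every time a new sample is predicted. Every prediction therefore depends on the whole training set, and the fitted parameter values are not fixed: they change with the query point. Next comes the principle behind locally weighted regression.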
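The principle can be summarized with the standard textbook formulation of locally weighted linear regression, given here as a sketch. The symbols are chosen to mirror the names in the code further down (`c` and `a` for the kernel parameters, `betas` for the local coefficients); the notation itself is an assumption and does not come from the original post. For a query point $x$, each training sample $(x_i, y_i)$ receives a Gaussian weight that decays with its distance from $x$, and a weighted least-squares problem is solved for that query point only:

```latex
\begin{aligned}
w_i(x) &= a \,\exp\!\left(-\frac{\lVert x_i - x \rVert^{2}}{2c^{2}}\right), \\
J(\beta) &= \sum_{i=1}^{N} w_i(x)\,\bigl(y_i - x_i^{\top}\beta\bigr)^{2}, \\
\hat{\beta}(x) &= \bigl(X^{\top} W X\bigr)^{-1} X^{\top} W y,
\qquad W = \operatorname{diag}\bigl(w_1(x), \dots, w_N(x)\bigr), \\
\hat{y}(x) &= x^{\top}\hat{\beta}(x).
\end{aligned}
```

A small $c$ makes the weights fall off quickly, so only the nearest samples influence the fit; a large $c$ spreads the weight over more samples and the result approaches an ordinary least-squares line.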

With the principle in hand, let's put it into practice with a Python implementation:

    # Python 3.5.3  蔡军生
    # http://edu.csdn.net/course/detail/2592
    # Locally weighted regression
    import random

    import numpy as np
    import matplotlib.pyplot as plt


    def gaussian_kernel(x, x0, c, a=1.0):
        """
        Gaussian kernel.

        :Parameters:
          - `x`: nearby datapoint we are looking at.
          - `x0`: data point we are trying to estimate.
          - `c`, `a`: kernel parameters.
        """
        # Squared Euclidean distance between the two points
        diff = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
        dot_product = float(np.dot(diff, diff))
        return a * np.exp(dot_product / (-2.0 * c**2))


    def get_weights(training_inputs, datapoint, c=1.0):
        """
        Function that calculates weight matrix for a given data point and
        training data.

        :Parameters:
          - `training_inputs`: training data set the weights should be assigned to.
          - `datapoint`: data point we are trying to predict.
          - `c`: kernel function parameter

        :Returns:
          NxN weight matrix, where N is the size of the `training_inputs`.
        """
        x = np.asarray(training_inputs, dtype=float)
        n_rows = x.shape[0]
        # Create diagonal weight matrix from identity matrix
        weights = np.eye(n_rows)
        for i in range(n_rows):
            weights[i, i] = gaussian_kernel(datapoint, x[i], c)
        return weights


    def lwr_predict(training_inputs, training_outputs, datapoint, c=1.0):
        """
        Predict a data point by fitting local regression.

        :Parameters:
          - `training_inputs`: training input data.
          - `training_outputs`: training outputs.
          - `datapoint`: data point we want to predict.
          - `c`: kernel parameter.

        :Returns:
          Estimated value at `datapoint`.
        """
        weights = get_weights(training_inputs, datapoint, c=c)
        x = np.asarray(training_inputs, dtype=float)
        y = np.asarray(training_outputs, dtype=float)
        # Weighted normal equations: (X^T W X) betas = X^T W y
        xtwx = x.T @ weights @ x
        betas = np.linalg.solve(xtwx, x.T @ weights @ y)
        return float(np.dot(datapoint, betas))


    def genData(numPoints, bias, variance):
        x = np.zeros(shape=(numPoints, 2))
        y = np.zeros(shape=numPoints)
        # Build points scattered around a straight line
        for i in range(0, numPoints):
            # Bias (intercept) column
            x[i][0] = 1
            x[i][1] = i
            # Target value
            y[i] = bias + i * variance + random.uniform(0, 1) * 20
        return x, y


    # Generate the data
    a1, a2 = genData(100, 10, 0.6)
    a3 = []
    # Predict at every training point
    for i in a1:
        pdf = lwr_predict(a1, a2, i, 1)
        a3.append(pdf)
    plt.plot(a1[:, 1], a2, "x")
    plt.plot(a1[:, 1], a3, "r-")
    plt.show()

[Figure: result with c = 1.0]

[Figure: result with c = 2.0]
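The two results differ only in the kernel parameter c. As a rough way to quantify that difference without a plot, the sketch below refits the same kind of data with both values and measures how wiggly each fitted curve is. The function names `lwr_fit_curve` and `roughness`, the roughness measure itself (sum of squared second differences), and the fixed random seed are assumptions introduced for illustration; they are not part of the original code.

```python
import numpy as np

def lwr_fit_curve(x, y, c):
    """Locally weighted linear fit, evaluated at every training input."""
    X = np.column_stack([np.ones_like(x), x])          # bias column plus feature
    fitted = np.empty_like(y)
    for k, x0 in enumerate(x):
        w = np.exp(-(x - x0) ** 2 / (2.0 * c ** 2))    # Gaussian weights around x0
        XtW = X.T * w                                  # same as X.T @ diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)       # weighted normal equations
        fitted[k] = beta[0] + beta[1] * x0
    return fitted

def roughness(curve):
    """Sum of squared second differences; smaller means smoother."""
    return float(np.sum(np.diff(curve, n=2) ** 2))

# Same shape of data as genData(100, 10, 0.6), but with a seeded generator.
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = 10.0 + 0.6 * x + rng.uniform(0.0, 1.0, size=x.size) * 20.0

for c in (1.0, 2.0):
    print("c =", c, "roughness =", roughness(lwr_fit_curve(x, y, c)))
```

A larger c averages over more neighbours, so the fitted curve follows the noise less closely; a smaller c tracks individual points more tightly.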