In the previous article we introduced and implemented GBRT (gradient boosted regression trees) with the LS (squared) loss function. In this article we continue with GBRTs built on other loss functions and implement them as well.
The LAD (least absolute deviation) loss function has the basic form:

$$L(y, F) = |y - F|$$
The negative gradient of the absolute loss is simply the sign of the residual:

$$-\frac{\partial L(y, F)}{\partial F} = \operatorname{sign}(y - F)$$
Following the steps of the gradient boosting algorithm, after fitting the tree in each round we still need to choose the leaf values that minimize the loss. For an already-fitted tree, minimizing the total loss means minimizing the summed loss over all leaf nodes, and since the leaves partition the samples into disjoint regions, this reduces to a separate minimization on each leaf:

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} \left| y_i - \left( F_{m-1}(x_i) + \gamma \right) \right|$$
The minimizer of a sum of absolute deviations is the median, so the final value of each leaf is the median of the residuals of the samples falling into it:

$$\gamma_{jm} = \operatorname{median}_{x_i \in R_{jm}} \left( y_i - F_{m-1}(x_i) \right)$$
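As a quick numerical sanity check (not part of the implementation), we can verify on toy residuals that the median does minimize the summed absolute loss in a leaf:

```python
import numpy as np

residuals = np.array([-2.0, -0.5, 0.3, 1.0, 4.0])  # toy residuals in one leaf
candidates = np.linspace(-3, 5, 801)               # grid search over leaf values
losses = [np.sum(np.abs(residuals - g)) for g in candidates]
best = candidates[int(np.argmin(losses))]
print(best, np.median(residuals))                  # both come out to 0.3
```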
The GBRT algorithm under the LAD loss runs as follows (a compact code sketch is given after the steps):

1. Initialize $F_0(x)$ with a constant (Friedman uses $\operatorname{median}(y)$; our implementation below simply starts from 0).
2. For $m = 1, \dots, M$:
   a. compute the pseudo-residuals $\tilde{y}_i = \operatorname{sign}(y_i - F_{m-1}(x_i))$;
   b. fit a regression tree to $\{(x_i, \tilde{y}_i)\}$, obtaining leaf regions $R_{jm}$;
   c. replace each leaf value with the median of the raw residuals in that leaf;
   d. update $F_m(x) = F_{m-1}(x) + \nu \sum_j \gamma_{jm} \mathbf{1}(x \in R_{jm})$, where $\nu$ is the learning rate.
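Here is a minimal sketch of this loop. It uses sklearn's DecisionTreeRegressor as the base learner purely for illustration (our actual implementation builds on the CART tree from earlier articles); the function names and the leaf-overwriting via `tree.tree_.value` are illustrative choices, not part of the final code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_lad_gbrt(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    f0 = np.median(y)                          # 1. initialize with the median
    pred = np.full(y.shape, f0)
    trees = []
    for _ in range(n_estimators):
        sign_residual = np.sign(y - pred)      # 2a. pseudo-residuals sign(y - F)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, sign_residual)
        # 2c. overwrite each leaf with the median of the raw residuals
        leaf_ids = tree.apply(X)
        for leaf in np.unique(leaf_ids):
            tree.tree_.value[leaf, 0, 0] = np.median((y - pred)[leaf_ids == leaf])
        pred = pred + learning_rate * tree.predict(X)  # 2d. update F_m
        trees.append(tree)
    return f0, trees

def predict_lad_gbrt(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```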
The Huber loss combines the squared loss (LS) and the absolute loss (LAD). Compared with LS it is more robust to outliers: by choosing the parameter δ yourself, you decide which points count as normal and which as outliers, and the two groups are charged different losses.
The Huber loss has the following basic form; as δ approaches 0 it tends toward the absolute loss, and as δ approaches +∞ it tends toward the squared loss:

$$L(y, F) = \begin{cases} \frac{1}{2}(y - F)^2 & |y - F| \le \delta \\ \delta \left( |y - F| - \delta / 2 \right) & |y - F| > \delta \end{cases}$$
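A toy computation makes the two limits concrete (a sketch; `huber` here is a throwaway helper, not part of the implementation):

```python
import numpy as np

def huber(diff, delta):
    a = np.abs(diff)
    return np.where(a <= delta, 0.5 * diff ** 2, delta * (a - delta / 2))

diff = np.array([0.1, 1.0, 5.0])
print(huber(diff, 100.0))        # large delta: equals 0.5 * diff**2 (LS-like)
print(huber(diff, 0.01) / 0.01)  # small delta: ~ |diff| (LAD-like, scaled by delta)
```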
The negative gradient of the Huber loss is:

$$-\frac{\partial L(y, F)}{\partial F} = \begin{cases} y - F & |y - F| \le \delta \\ \delta \cdot \operatorname{sign}(y - F) & |y - F| > \delta \end{cases}$$
In practice, the parameter δ in the Huber loss is usually set to the α-quantile of the absolute residuals, so any sample whose absolute residual lies beyond the α-quantile is treated as an outlier.
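For instance (a sketch with made-up residuals), `np.percentile` gives this threshold directly:

```python
import numpy as np

alpha = 0.9
residuals = np.array([0.2, -0.5, 1.1, -2.0, 8.0])  # toy residuals; 8.0 is an outlier
delta = np.percentile(np.abs(residuals), alpha * 100)
print(delta)                        # 5.6 here; beyond this a sample is an outlier
print(np.abs(residuals) > delta)    # only the 8.0 sample is flagged
```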
Likewise, to obtain the minimum loss in each iteration, we replace the leaf values as follows (with $\tilde{r}_{jm}$ the median of the residuals $r_i$ in leaf $R_{jm}$ and $N_{jm}$ the number of samples in it):

$$\gamma_{jm} = \tilde{r}_{jm} + \frac{1}{N_{jm}} \sum_{x_i \in R_{jm}} \operatorname{sign}(r_i - \tilde{r}_{jm}) \cdot \min\left( \delta, |r_i - \tilde{r}_{jm}| \right)$$
At first glance this update looks messy. The reason is that a leaf contains both normal points and outliers, and the formula handles the two together: it starts from the robust median and adds a mean correction whose per-sample contribution is capped at δ.
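On toy numbers (a sketch mirroring the formula above, with `delta` standing in for δ), the capped correction keeps an outlier from dragging the leaf value:

```python
import numpy as np

delta = 1.0
diff = np.array([-0.4, 0.1, 0.3, 5.0])  # residuals in one leaf; 5.0 is an outlier
median_diff = np.median(diff)           # robust center: 0.2
leaf_value = median_diff + np.mean(
    np.sign(diff - median_diff) * np.minimum(delta, np.abs(diff - median_diff))
)
print(leaf_value)  # 0.3: the outlier's pull is capped at delta (a plain mean gives 1.25)
```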
The GBRT algorithm under the Huber loss follows the same skeleton:

1. Initialize $F_0(x)$ with a constant (Friedman uses $\operatorname{median}(y)$; again, our code starts from 0).
2. For $m = 1, \dots, M$:
   a. compute the residuals $r_i = y_i - F_{m-1}(x_i)$ and set δ to the α-quantile of $|r_i|$;
   b. compute the pseudo-residuals: $r_i$ where $|r_i| \le \delta$, and $\delta \cdot \operatorname{sign}(r_i)$ otherwise;
   c. fit a regression tree to the pseudo-residuals, obtaining leaf regions $R_{jm}$;
   d. replace each leaf value using the median-plus-capped-correction formula above;
   e. update $F_m(x) = F_{m-1}(x) + \nu \sum_j \gamma_{jm} \mathbf{1}(x \in R_{jm})$.
That completes the walkthrough of the other two GBRT loss functions. Next we implement them on top of the earlier GBDT code.
In the previous articles we already built the GBRT framework and implemented the class for the LS loss, so here we mainly implement the LAD and Huber losses.
```python
import numpy as np

# Base class for loss functions
class LossFunction(object):
    def __init__(self):
        pass

    def __call__(self, y, y_prediction):
        pass

    def negative_gradient(self, y, y_prediction):
        pass

    def update_terminal_regions(self, tree, terminal_samples_nodeid, X, y, y_prediction):
        """Update terminal regions (leaf values) of a fitted tree."""
        if not tree.childNode:
            terminal_leaf = tree.node_id
            tree.value = self._update_terminal_node(
                terminal_samples_nodeid, terminal_leaf, X, y, y_prediction)
            return
        for thres, childTree in tree.childNode.items():
            self.update_terminal_regions(
                childTree, terminal_samples_nodeid, X, y, y_prediction)

    def _update_terminal_node(self, terminal_samples_nodeid, terminal_leaf, X, y, y_prediction):
        pass
```
```python
class LeastAbsoluteError(LossFunction):
    """Loss function for least absolute deviation (LAD) regression."""

    def __call__(self, y, y_prediction):
        """Compute the mean absolute loss."""
        return np.abs(y - y_prediction.ravel()).mean()

    def negative_gradient(self, y, y_prediction):
        """Compute the negative gradient: the sign of the residual (+1 / -1)."""
        return 2 * ((y - y_prediction.ravel()) > 0) - 1

    def _update_terminal_node(self, terminal_samples_nodeid, terminal_leaf, X, y, y_prediction):
        """LAD leaf value: the median of the residuals in this leaf."""
        terminal_region_index = np.where(terminal_samples_nodeid == terminal_leaf)[0]
        diff = (y.take(terminal_region_index, axis=0)
                - y_prediction.take(terminal_region_index, axis=0))
        return np.median(diff)
```
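A quick sanity check of this class against hand-computed values (assuming the class above is defined and numpy is imported):

```python
loss = LeastAbsoluteError()
y = np.array([3.0, -1.0, 2.0])
pred = np.array([1.0, 0.0, 2.5])
print(loss(y, pred))                    # mean absolute error: (2 + 1 + 0.5) / 3
print(loss.negative_gradient(y, pred))  # signs of residuals: [ 1 -1 -1]
```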
```python
class HuberLossFunction(LossFunction):
    """Huber loss function."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha
        self.gamma = None  # the threshold delta, cached between calls

    def __call__(self, y, y_prediction):
        # First determine the threshold separating normal points from outliers
        diff = y - y_prediction
        if self.gamma is None:
            gamma = np.percentile(np.abs(diff), self.alpha * 100)
        else:
            gamma = self.gamma
        normal_pattern = np.abs(diff) <= gamma
        square_loss = np.sum(0.5 * diff[normal_pattern] ** 2)
        abs_loss = np.sum(gamma * (np.abs(diff[~normal_pattern]) - gamma / 2))
        loss = (square_loss + abs_loss) / y.shape[0]
        return loss

    def negative_gradient(self, y, y_prediction):
        """Compute the negative gradient."""
        diff = y - y_prediction
        gamma = np.percentile(np.abs(diff), self.alpha * 100)
        normal_pattern = np.abs(diff) <= gamma
        negative_gra = np.zeros(y.shape)
        negative_gra[normal_pattern] = diff[normal_pattern]
        negative_gra[~normal_pattern] = gamma * np.sign(diff[~normal_pattern])
        self.gamma = gamma
        return negative_gra

    def _update_terminal_node(self, terminal_samples_nodeid, terminal_leaf, X, y, y_prediction):
        """Huber leaf value: median plus a delta-capped mean correction."""
        gamma = self.gamma
        terminal_region_index = np.where(terminal_samples_nodeid == terminal_leaf)[0]
        diff = (y.take(terminal_region_index, axis=0)
                - y_prediction.take(terminal_region_index, axis=0))
        median_diff = np.median(diff)
        terminal_region_value = median_diff + np.mean(
            np.sign(diff - median_diff) * np.minimum(gamma, np.abs(diff - median_diff)))
        return terminal_region_value
```
Because Huber takes a different initialization parameter from LS and LAD, fit needs to distinguish between them.
```python
def fit(self, X, y):
    # Build the ensemble
    self.trees_ = []
    # Pick the loss function; only Huber needs the alpha parameter
    loss_function = LOSS_FUNCTIONS[self.loss]
    if self.loss == 'huber':
        self.loss_function_ = loss_function(alpha=self.alpha)
    else:
        self.loss_function_ = loss_function()
    # Initialize the prediction to 0
    y_prediction = np.zeros(y.shape)
    for i in range(self.n_estimators):
        y_prediction_copy = y_prediction.copy()
        # Train one tree per round
        tree = self._fit_step(X, y, y_prediction_copy)
        self.trees_.append(tree)
        # Update the prediction with the new tree's (shrunken) output
        y_prediction = y_prediction_copy + self.learning_rate * self.trees_[i].predict(X)
        print(f'Loss after tree {i}:', self.loss_function_(y, y_prediction))
    return
```
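For completeness, `fit` assumes a `LOSS_FUNCTIONS` mapping along the lines of the sketch below; `LeastSquaresError` is the LS class from the previous article and is assumed here:

```python
LOSS_FUNCTIONS = {
    'ls': LeastSquaresError,     # LS class, defined in the previous article
    'lad': LeastAbsoluteError,
    'huber': HuberLossFunction,
}
```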
With all three GBRT loss functions in place, we test our GBRT against sklearn's on the Boston housing data, split into training and test sets.
```python
if __name__ == "__main__":
    # Regression tree test: Boston housing data
    from sklearn.datasets import load_boston
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn import metrics

    X, y = load_boston(return_X_y=True)
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3)

    # LS
    ## ours
    gbrt_ls = GBRegressionTree(loss='ls', n_estimators=100)
    gbrt_ls.fit(train_X, train_y)
    ypre_ls = gbrt_ls.predict(test_X)
    ## sklearn's
    sk_gbrt_ls = GradientBoostingRegressor(loss='ls', learning_rate=0.1, n_estimators=100,
                                           criterion='mse', max_depth=3, min_samples_leaf=1)
    sk_gbrt_ls.fit(train_X, train_y)
    ypre_sk_ls = sk_gbrt_ls.predict(test_X)
    ## compare
    r2_ls = metrics.r2_score(test_y, ypre_ls)
    r2_sk_ls = metrics.r2_score(test_y, ypre_sk_ls)
    loss_ls = metrics.mean_squared_error(test_y, ypre_ls)
    loss_sk_ls = metrics.mean_squared_error(test_y, ypre_sk_ls)
    print("LS loss comparison:\n",
          f"R2: ours={round(r2_ls, 3)}, sklearn={round(r2_sk_ls, 3)}\n",
          f"MSE: ours={round(loss_ls, 3)}, sklearn={round(loss_sk_ls, 3)}")

    # LAD
    ## ours
    gbrt_lad = GBRegressionTree(loss='lad', n_estimators=100)
    gbrt_lad.fit(train_X, train_y)
    ypre_lad = gbrt_lad.predict(test_X)
    ## sklearn's
    sk_gbrt_lad = GradientBoostingRegressor(loss='lad', learning_rate=0.1, n_estimators=100,
                                            criterion='mse', max_depth=3, min_samples_leaf=1)
    sk_gbrt_lad.fit(train_X, train_y)
    ypre_sk_lad = sk_gbrt_lad.predict(test_X)
    ## compare
    r2_lad = metrics.r2_score(test_y, ypre_lad)
    r2_sk_lad = metrics.r2_score(test_y, ypre_sk_lad)
    loss_lad = metrics.mean_squared_error(test_y, ypre_lad)
    loss_sk_lad = metrics.mean_squared_error(test_y, ypre_sk_lad)
    print("LAD loss comparison:\n",
          f"R2: ours={round(r2_lad, 3)}, sklearn={round(r2_sk_lad, 3)}\n",
          f"MSE: ours={round(loss_lad, 3)}, sklearn={round(loss_sk_lad, 3)}")

    # Huber
    ## ours
    gbrt_huber = GBRegressionTree(loss='huber', n_estimators=100, alpha=0.8)
    gbrt_huber.fit(train_X, train_y)
    ypre_huber = gbrt_huber.predict(test_X)
    ## sklearn's
    sk_gbrt_huber = GradientBoostingRegressor(loss='huber', learning_rate=0.1, n_estimators=100,
                                              criterion='mse', max_depth=3,
                                              min_samples_leaf=1, alpha=0.8)
    sk_gbrt_huber.fit(train_X, train_y)
    ypre_sk_huber = sk_gbrt_huber.predict(test_X)
    ## compare
    r2_huber = metrics.r2_score(test_y, ypre_huber)
    r2_sk_huber = metrics.r2_score(test_y, ypre_sk_huber)
    loss_huber = metrics.mean_squared_error(test_y, ypre_huber)
    loss_sk_huber = metrics.mean_squared_error(test_y, ypre_sk_huber)
    print("Huber loss comparison:\n",
          f"R2: ours={round(r2_huber, 3)}, sklearn={round(r2_sk_huber, 3)}\n",
          f"MSE: ours={round(loss_huber, 3)}, sklearn={round(loss_sk_huber, 3)}")
```
Our hand-written GBRT gives results close to sklearn's:

LS loss: [printed metrics]
LAD loss: [printed metrics]
Huber loss: [printed metrics]
https://github.com/shoucangjia1qu/ML_gzh/tree/master/Ensemble
*The material in this article draws on Friedman's paper "Greedy Function Approximation: A Gradient Boosting Machine".
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 18: Ensemble Models Series, GBDT (1)
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 17: Tree Models Series, CART Algorithm (2)
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 16: Tree Models Series, CART Algorithm (1)
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 15: Tree Models Series, C4.5 Algorithm
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 14: Tree Models Series, ID3 Algorithm
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 13: The EM Algorithm and Gaussian Mixture Clustering
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 12: Semi-Naive Bayes Classification (2)
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 11: Semi-Naive Bayes Classification
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 10: Naive Bayes Classification (Supplement)
嗷大豆的数据工厂: ML Algorithm Derivation & Implementation 09: Naive Bayes Classification
Learning never stops. You are welcome to follow my WeChat public account (嗷大豆的数据工厂) so we can learn from each other.