The following passage is from the original FedSGD/FedAvg paper, "Federated Learning of Deep Networks using Model Averaging":
To apply this approach in the federated setting, we select a C-fraction of clients on each round, and compute the gradient of the loss over all the data held by these clients. Thus, C controls the global batch size, with C = 1 corresponding to full-batch (non-stochastic) gradient descent. We refer to this baseline algorithm as FederatedSGD (or FedSGD).
FedSGD: in each round, every selected client trains on its entire local dataset with exactly one local pass, and the results are then aggregated (a minimal code sketch follows the parameter definitions below).
C: the fraction of clients that perform computation on each round
The fraction of all clients that participate in each round of federated aggregation; C = 1 means every client participates.
B: the local minibatch size used for the client updates.
The batch size used for each client's local training.
E: the number of training passes each client makes over its local dataset on each round
The number of local training passes (epochs) a client runs between two rounds of federated aggregation.
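To make the FedSGD baseline concrete, here is a minimal sketch of one round in Python/NumPy. The names (grad, clients, lr) and the representation of a client as a dict holding X and y arrays are illustrative assumptions, not from the paper: each selected client computes the gradient of the loss over all of its local data, and the server averages those gradients (weighted by local dataset size) and takes a single step.

```python
import random

def fedsgd_round(global_weights, clients, grad, C=0.1, lr=0.1):
    """One FedSGD round (sketch).

    clients: list of dicts with NumPy arrays under "X" and "y" (assumed layout).
    grad:    assumed helper grad(weights, X, y) returning the gradient of the
             loss over ALL of the given data (full batch).
    """
    m = max(1, int(C * len(clients)))              # C-fraction of clients
    selected = random.sample(clients, m)
    n_total = sum(len(c["y"]) for c in selected)
    # Average the clients' full-batch gradients, weighted by local dataset size.
    avg_grad = sum((len(c["y"]) / n_total) * grad(global_weights, c["X"], c["y"])
                   for c in selected)
    # The server applies a single gradient step.
    return global_weights - lr * avg_grad
```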
In summary, with E = 1 and B = ∞ (a single full-batch pass per selected client), FedAvg is equivalent to FedSGD, and C = 1 further corresponds to full-batch (non-stochastic) gradient descent over all clients' data; FedSGD is therefore a special case of FedAvg, independent of which optimizer is used.
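For comparison, a FedAvg round can be sketched under the same assumptions (the hypothetical grad helper and client layout from the FedSGD sketch above). Each selected client runs E epochs of minibatch SGD with batch size B starting from the current global weights, and the server averages the returned weights. Here B=None stands in for B = ∞ (full batch); with E = 1 and B = None every client performs exactly one full-batch gradient step, so the weighted average coincides with the fedsgd_round update above when the learning rates match.

```python
import random

import numpy as np

def client_update(weights, X, y, grad, B, E, lr=0.1):
    """E local epochs of minibatch SGD with batch size B (B=None means full batch)."""
    w = weights.copy()
    n = len(y)
    batch_size = n if B is None else B
    for _ in range(E):
        idx = np.random.permutation(n)             # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w = w - lr * grad(w, X[b], y[b])
    return w

def fedavg_round(global_weights, clients, grad, C=0.1, B=32, E=5, lr=0.1):
    """One FedAvg round: local training on a C-fraction of clients, then a
    dataset-size-weighted average of the weights they return."""
    m = max(1, int(C * len(clients)))
    selected = random.sample(clients, m)
    n_total = sum(len(c["y"]) for c in selected)
    # With E=1 and B=None (i.e. B = ∞) each call to client_update is a single
    # full-batch gradient step, so this average equals the fedsgd_round update.
    return sum((len(c["y"]) / n_total) *
               client_update(global_weights, c["X"], c["y"], grad, B=B, E=E, lr=lr)
               for c in selected)
```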