DL/DNN: Training a custom MultiLayerNet (5*100 + ReLU, with four optimizers: SGD/Momentum/AdaGrad/Adam) on MNIST and comparing the performance of the different methods
Output results
Training loss for each optimizer, printed every 100 iterations. SGD's loss falls the most slowly, AdaGrad reaches the lowest values, and Momentum and Adam sit in between.
===========iteration:0===========
SGD:2.289282108880558
Momentum:2.2858501933777964
AdaGrad:2.135969407893337
Adam:2.2214629551644443
===========iteration:100===========
SGD:1.549948593098733
Momentum:0.2630614409487161
AdaGrad:0.1280980906681204
Adam:0.21268580798960957
===========iteration:200===========
SGD:0.7668413651485669
Momentum:0.19974263379725932
AdaGrad:0.0688320187945635
Adam:0.12737004371824456
===========iteration:300===========
SGD:0.46630711328743457
Momentum:0.17680542175883507
AdaGrad:0.0580940990397764
Adam:0.12930303058268838
===========iteration:400===========
SGD:0.34526365067568743
Momentum:0.08914404106297127
AdaGrad:0.038093353912494965
Adam:0.06415424083978832
===========iteration:500===========
SGD:0.3588584559967853
Momentum:0.1299949652623088
AdaGrad:0.040978421988412894
Adam:0.058780880102566074
===========iteration:600===========
SGD:0.38273120367667224
Momentum:0.14074766142608885
AdaGrad:0.08641723451090685
Adam:0.11339321858037713
===========iteration:700===========
SGD:0.381094901742027
Momentum:0.1566582072807326
AdaGrad:0.08844650332208387
Adam:0.10485802139218811
===========iteration:800===========
SGD:0.25722603754213674
Momentum:0.07897119725740888
AdaGrad:0.04960128385990466
Adam:0.0835996553542796
===========iteration:900===========
SGD:0.33273148769731326
Momentum:0.19612162874621766
AdaGrad:0.03441995281224886
Adam:0.12248261979926914
===========iteration:1000===========
SGD:0.26394416793465253
Momentum:0.10157776537129978
AdaGrad:0.04761303979039287
Adam:0.046994040537976525
===========iteration:1100===========
SGD:0.23894569840123672
Momentum:0.09093030644899333
AdaGrad:0.07018006635107976
Adam:0.07879622117292093
===========iteration:1200===========
SGD:0.24382935069334477
Momentum:0.08324889705863456
AdaGrad:0.04484659272127939
Adam:0.0719509559060747
===========iteration:1300===========
SGD:0.21307958354960485
Momentum:0.07030166296163001
AdaGrad:0.022552468995955182
Adam:0.049860815437560935
===========iteration:1400===========
SGD:0.3110486414209358
Momentum:0.13117004626934742
AdaGrad:0.07351569965620054
Adam:0.09723751626189574
===========iteration:1500===========
SGD:0.2087589466947655
Momentum:0.09088929766254576
AdaGrad:0.027825434320282873
Adam:0.06352715244823183
===========iteration:1600===========
SGD:0.12783635178644553
Momentum:0.053366262737818
AdaGrad:0.012093087503155344
Adam:0.021385013278486315
===========iteration:1700===========
SGD:0.21476134194349975
Momentum:0.08453161462373757
AdaGrad:0.054955557126319256
Adam:0.035257261368372185
===========iteration:1800===========
SGD:0.3415964018415049
Momentum:0.13866704706781385
AdaGrad:0.04585298765046911
Adam:0.06437669858445684
===========iteration:1900===========
SGD:0.13530674587479818
Momentum:0.03958142222010819
AdaGrad:0.019096102635470277
Adam:0.02185864115092371
T1、SGD algorithm

class SGD:
    """Vanilla stochastic gradient descent."""
    def __init__(self, lr=0.01):  # constructor elided in the original; lr=0.01 is the conventional default
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

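For reference, the update rule this class implements, with eta being the learning rate lr:

$$W \leftarrow W - \eta \frac{\partial L}{\partial W}$$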
T2、Momentum algorithm

import numpy as np

class Momentum:
    """SGD with momentum: keeps a velocity v for each parameter."""
    def __init__(self, lr=0.01, momentum=0.9):  # constructor elided in the original; these are the conventional defaults
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            # lazily allocate one velocity array per parameter
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)
        for key in params.keys():
            self.v[key] = self.momentum*self.v[key] - self.lr*grads[key]
            params[key] += self.v[key]

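The two-step rule implemented by update, with alpha = momentum and eta = lr:

$$v \leftarrow \alpha v - \eta \frac{\partial L}{\partial W}, \qquad W \leftarrow W + v$$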
T3、AdaGrad algorithm

class AdaGrad:
    """Adapts each parameter's step size by the accumulated squared gradients."""
    def __init__(self, lr=0.01):  # class line and constructor elided in the original; reconstructed here
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

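In formula form, with the element-wise square of the gradient accumulated in h and epsilon = 1e-7 guarding against division by zero:

$$h \leftarrow h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}, \qquad W \leftarrow W - \eta \frac{1}{\sqrt{h} + \varepsilon} \frac{\partial L}{\partial W}$$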
T4、Adam algorithm

class Adam:
    """Adam: a momentum-like first moment plus an AdaGrad-like second moment, with bias correction."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):  # constructor elided in the original; the usual Adam defaults
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for key in params.keys():
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)

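Note that the in-place form m += (1 - beta1) * (grads[key] - m) is algebraically the same as the usual exponential moving average, and lr_t folds both bias corrections into the step size, so the code realizes, with g the gradient and t = iter:

$$m \leftarrow \beta_1 m + (1-\beta_1) g, \qquad v \leftarrow \beta_2 v + (1-\beta_2) g^2, \qquad W \leftarrow W - \eta \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \cdot \frac{m}{\sqrt{v}+\varepsilon}$$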
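The excerpt jumps straight to the training loop, so the setup below is a reconstruction, not part of the original: load_mnist and MultiLayerNet are assumed to be the helpers from the accompanying codebase, and batch_size is an illustrative value; max_iterations = 2000 is consistent with the log above, which prints up to iteration 1900.

# Reconstructed setup (assumed, not in the excerpt): data, hyperparameters,
# and the optimizers dict that the loop below iterates over.
from dataset.mnist import load_mnist                  # assumed helper module
from common.multi_layer_net import MultiLayerNet      # assumed helper module

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]
batch_size = 128          # illustrative
max_iterations = 2000     # matches the log above

optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()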
networks = {}
train_loss = {}
for key in optimizers.keys():
    # five hidden layers of 100 units each, matching the 5*100+ReLU setup in the title
    networks[key] = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100, 100, 100], output_size=10)
    train_loss[key] = []

for i in range(max_iterations):
    # draw a random mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # one update step per optimizer, each on its own copy of the network
    for key in optimizers.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)
        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))
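
To compare the four curves visually, the recorded losses can be plotted. The sketch below is an illustrative addition, not from the original (matplotlib assumed); it plots every 100th point so the markers stay readable.

import matplotlib.pyplot as plt

markers = {'SGD': 'o', 'Momentum': 'x', 'AdaGrad': 's', 'Adam': 'D'}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x[::100], np.array(train_loss[key])[::100], marker=markers[key], label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 2.5)
plt.legend()
plt.show()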