DL / DNN optimization techniques: training a custom MultiLayerNet [5*100 + ReLU] on the MNIST dataset and comparing the performance of three weight initializations (std=0.01, Xavier initialization, He initialization)
Overview
Approach: run experiments with different weight initializations (std=0.01, Xavier initial values, He initial values) and observe how strongly the choice of initial value affects how the neural network learns.
Conclusion: with std=0.01 the network cannot learn at all, because the values propagated forward are very small (concentrated around 0). As a result, the gradients computed during backpropagation are also very small and the weights are barely updated. In contrast, with the Xavier and He initial values learning proceeds smoothly, and learning progresses faster with the He initial values.
Summary: in neural-network training, the initial weight values matter a great deal. Whether learning succeeds often hinges on how the weights are initialized. Their importance is easy to overlook, yet the beginning of anything, the initial value, is always critical.
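Concretely, the three settings differ only in the standard deviation of the Gaussian distribution from which the initial weights are drawn: a fixed value of 0.01, the Xavier initial value sqrt(1/n), and the He initial value sqrt(2/n), where n is the number of input nodes of the layer. A minimal sketch of this idea follows; the helper name init_weight and the concrete layer sizes are illustrative, not taken from the code below.

```python
import numpy as np

def init_weight(n_in, n_out, weight_init_std):
    """Draw an initial weight matrix; weight_init_std selects the scheme."""
    if str(weight_init_std).lower() in ('relu', 'he'):
        scale = np.sqrt(2.0 / n_in)      # He initial value: sqrt(2/n), suited to ReLU
    elif str(weight_init_std).lower() in ('sigmoid', 'xavier'):
        scale = np.sqrt(1.0 / n_in)      # Xavier initial value: sqrt(1/n), suited to sigmoid/tanh
    else:
        scale = float(weight_init_std)   # fixed standard deviation, e.g. 0.01
    return scale * np.random.randn(n_in, n_out)

# e.g. the first 784 -> 100 layer of the 5*100 network
W1_std001 = init_weight(784, 100, 0.01)
W1_xavier = init_weight(784, 100, 'xavier')
W1_he     = init_weight(784, 100, 'he')
```

For the first 784 -> 100 layer, Xavier gives a standard deviation of about 0.036 and He about 0.050, both noticeably larger than the fixed 0.01, which is why the forward-propagated activations do not collapse toward 0.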
Output results (training loss printed every 100 iterations)
```
===========iteration:0===========
std=0.01:2.302533896615576
Xavier:2.301592862642649
He:2.452819600404312
===========iteration:100===========
std=0.01:2.3021427450183882
Xavier:2.2492771742332085
He:1.614645290697084
===========iteration:200===========
std=0.01:2.3019226530108763
Xavier:2.142875264754691
He:0.8883226546097108
===========iteration:300===========
std=0.01:2.3021797231413514
Xavier:1.801154569414849
He:0.5779849031641334
===========iteration:400===========
std=0.01:2.3012695247928474
Xavier:1.3899007227604079
He:0.41014765063844627
===========iteration:500===========
std=0.01:2.3007728429528314
Xavier:0.9069490262118367
He:0.33691702821838565
===========iteration:600===========
std=0.01:2.298961977446477
Xavier:0.7562167106493611
He:0.3818234934485747
===========iteration:700===========
std=0.01:2.3035037771527715
Xavier:0.5636724725221689
He:0.21607562992114449
===========iteration:800===========
std=0.01:2.3034607224422023
Xavier:0.5658840865099287
He:0.33168882912900743
===========iteration:900===========
std=0.01:2.305051548224051
Xavier:0.588201820904584
He:0.2569635828759095
===========iteration:1000===========
std=0.01:2.2994594023429755
Xavier:0.4185962336886156
He:0.20020701131406038
===========iteration:1100===========
std=0.01:2.2981894831572904
Xavier:0.3963740567004913
He:0.25746657996551603
===========iteration:1200===========
std=0.01:2.2953607843932193
Xavier:0.41330568558866765
He:0.2796398422265146
===========iteration:1300===========
std=0.01:2.2964967978545396
Xavier:0.39618376387851506
He:0.2782019670206384
===========iteration:1400===========
std=0.01:2.299861702734514
Xavier:0.24832216447348573
He:0.1512273585162205
===========iteration:1500===========
std=0.01:2.3006214773891234
Xavier:0.3596899255315174
He:0.2719352219860638
===========iteration:1600===========
std=0.01:2.298109767745866
Xavier:0.35977950572647455
He:0.2650267112104039
===========iteration:1700===========
std=0.01:2.301979953517381
Xavier:0.23664052932406424
He:0.13415720105707601
===========iteration:1800===========
std=0.01:2.299083895357553
Xavier:0.2483172887982285
He:0.14187181238369628
===========iteration:1900===========
std=0.01:2.305385198129093
Xavier:0.3655424067819445
He:0.21497438379944553
```
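For reference, the std=0.01 loss stays pinned near 2.30 across the whole log. That value is exactly the cross-entropy of a (near-)uniform prediction over the 10 MNIST classes, -ln(1/10) = ln 10 ≈ 2.303, so that network never does better than random guessing, which is the behaviour described in the conclusion above.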
Core code

```python
import numpy as np


class MultiLayerNet:
    '……'
    # The constructor is omitted in this excerpt. It builds self.params (the W/b arrays),
    # self.layers (Affine layers plus ReLU activations, per the title), self.last_layer
    # (a softmax-with-loss layer), self.hidden_layer_num and self.weight_decay_lambda.

    def predict(self, x):
        # Forward pass through every layer except the final loss layer
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        # Cross-entropy loss plus L2 weight decay: 0.5 * lambda * sum(W**2) per layer
        y = self.predict(x)

        weight_decay = 0
        for idx in range(1, self.hidden_layer_num + 2):
            W = self.params['W' + str(idx)]
            weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W ** 2)

        return self.last_layer.forward(y, t) + weight_decay

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1:
            t = np.argmax(t, axis=1)

        # Compute the accuracy and return it
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def numerical_gradient(self, x, t):
        # numerical_gradient(): gradients via numerical differentiation
        # (relies on a module-level numerical_gradient() helper defined elsewhere)
        loss_W = lambda W: self.loss(x, t)

        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads['W' + str(idx)] = numerical_gradient(loss_W, self.params['W' + str(idx)])
            grads['b' + str(idx)] = numerical_gradient(loss_W, self.params['b' + str(idx)])

        return grads

    def gradient(self, x, t):
        # Backpropagation: run the forward pass, then propagate dout backwards through the layers
        self.loss(x, t)

        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Collect gradients; the weight-decay term contributes lambda * W to each dW
        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads['W' + str(idx)] = self.layers['Affine' + str(idx)].dW + self.weight_decay_lambda * self.layers['Affine' + str(idx)].W
            grads['b' + str(idx)] = self.layers['Affine' + str(idx)].db

        return grads
```
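The comparison script below also assumes some setup that the excerpt does not show: the MNIST data, an optimizer, and the weight_init_types dictionary whose keys match the log above. A plausible reconstruction is sketched here; the load_mnist helper, the learning rate, the batch size and the iteration count are assumptions (the iteration count is merely consistent with the log running to iteration 1900), and only the three keys std=0.01, Xavier and He are confirmed by the printed output.

```python
import numpy as np

# Assumed data loading (helper not shown in the excerpt); any loader that yields
# flattened 784-value images and labels will do:
# (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)
# train_size = x_train.shape[0]        # 60000 for MNIST

batch_size = 128                       # illustrative value
max_iterations = 2000                  # consistent with the log above (iterations 0-1900)

# Minimal SGD optimizer matching the optimizer.update(params, grads) call used below
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

optimizer = SGD(lr=0.01)

# experiment name -> value passed as weight_init_std to MultiLayerNet
weight_init_types = {'std=0.01': 0.01, 'Xavier': 'sigmoid', 'He': 'relu'}
```

Mapping 'Xavier' to 'sigmoid' and 'He' to 'relu' assumes that MultiLayerNet interprets those strings as sqrt(1/n) and sqrt(2/n) scaling, as in the initialization sketch near the top of this article.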
```python
networks = {}
train_loss = {}
for key, weight_type in weight_init_types.items():
    networks[key] = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100, 100],
                                  output_size=10, weight_init_std=weight_type)
    train_loss[key] = []


for i in range(max_iterations):
    # Draw a mini-batch: define x_batch and t_batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in weight_init_types.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizer.update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
        print("===========" + "iteration:" + str(i) + "===========")
        for key in weight_init_types.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))
```
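To visualise the comparison rather than read the printed losses, the recorded train_loss curves can be plotted. This step is not part of the excerpt above; it is only a matplotlib sketch that reuses the variables defined by the script.

```python
import numpy as np
import matplotlib.pyplot as plt

# One marker per weight-initialization experiment
markers = {'std=0.01': 'o', 'Xavier': 's', 'He': 'D'}
x = np.arange(max_iterations)
for key in weight_init_types.keys():
    plt.plot(x, np.array(train_loss[key]), marker=markers[key], markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 2.5)
plt.legend()
plt.show()
```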