Paper / DL / BP: "Understanding the difficulty of training deep feedforward neural networks"


想吃一头牛 (chiyitouniu) · 2022-09-19 17:26:29


Contents

Paper overview

Key points from the paper

Conclusions


Paper overview

Original paper: Understanding the difficulty of training deep feedforward neural networks, by Xavier Glorot and Yoshua Bengio (AISTATS 2010).

Key points from the paper

Limitations of sigmoid (four-layer network)


With sigmoid activations, the test loss and training loss stay stuck around 0.5 for many epochs, and only later make a merely passable drop to about 0.1.

     We hypothesize that this behavior is due to the combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior.
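The link between an output of 0 and saturation can be made concrete with a minimal sketch (not from the paper; the input values are illustrative): where the sigmoid outputs a value near 0, its derivative s·(1−s) is also near 0, so almost no gradient flows back through that unit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A sigmoid output near 0 (or 1) means the unit is saturated:
# the derivative s * (1 - s) is close to 0, so back-propagated
# gradients through that unit are almost entirely killed.
for x in [-8.0, -4.0, 0.0, 4.0, 8.0]:
    s = sigmoid(x)
    print(f"x = {x:+.1f}   sigmoid = {s:.4f}   derivative = {s * (1 - s):.4f}")
```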

Limitations of tanh and softsign (five-layer network)



Switching to the tanh activation, the network converges well and quickly.

Conclusions

1. The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network. We call it the normalized initialization.
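Concretely, for a layer with fan-in n_j and fan-out n_{j+1}, the paper's normalized initialization draws weights from W ~ U[−√6/√(n_j + n_{j+1}), +√6/√(n_j + n_{j+1})]. Below is a minimal NumPy sketch of this rule (the function name and layer sizes are illustrative, not from the paper):

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng=None):
    """Normalized ("Xavier") initialization from Glorot & Bengio (2010):
    W ~ U[-sqrt(6)/sqrt(fan_in + fan_out), +sqrt(6)/sqrt(fan_in + fan_out)]."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: weight matrix for a 784 -> 500 layer (sizes are illustrative).
W = normalized_init(784, 500)
print(W.shape, W.min(), W.max())
```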


2. The results show that, with normalized initialization, the activation values are more evenly distributed across the layers.

     Figure caption (from the paper): Activation values normalized histograms with hyperbolic tangent activation, with standard (top) vs normalized initialization (bottom). Top: 0-peak increases for higher layers.
     Several conclusions can be drawn from these error curves:
(1) The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.
(2) The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.
(3) For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward).
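Conclusion (3) can be checked with a quick forward-pass experiment: push a random batch through a deep tanh network and compare the per-layer standard deviation of activations under standard versus normalized initialization. The sketch below uses synthetic data and illustrative layer sizes, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 500, 10                        # illustrative layer width and depth
x = rng.standard_normal((1000, width))        # synthetic input batch, unit variance

def per_layer_stds(init):
    """Forward a batch through `depth` tanh layers and record activation stds."""
    h, stds = x, []
    for _ in range(depth):
        W = init(width, width)
        h = np.tanh(h @ W)
        stds.append(h.std())
    return stds

standard   = lambda fi, fo: rng.uniform(-1 / np.sqrt(fi), 1 / np.sqrt(fi), (fi, fo))
normalized = lambda fi, fo: rng.uniform(-np.sqrt(6 / (fi + fo)), np.sqrt(6 / (fi + fo)), (fi, fo))

print("standard  :", ["%.3f" % s for s in per_layer_stds(standard)])
print("normalized:", ["%.3f" % s for s in per_layer_stds(normalized)])
# With standard init the activation magnitude shrinks layer after layer;
# with normalized init it stays roughly constant, matching conclusion (3).
```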
3. In the error-curve figure, "Sigmoid 5" denotes a 5-layer sigmoid network and the "N" suffix denotes normalized initialization; the curves show that unsupervised pre-training reaches a lower error.




