Second order approximation of the loss function (Deep learning book, 7.33)


In Goodfellow et al.'s (2016) Deep Learning book, the authors discuss the equivalence of early stopping and L2 regularisation (https://www.deeplearningbook.org/contents/regularization.html, page 247).



The quadratic approximation of the cost function $J$ is given by:



$$\hat{J}(\theta) = J(w^*) + \frac{1}{2}(w - w^*)^T H (w - w^*)$$



where $H$ is the Hessian matrix (Eq. 7.33). Is this missing the middle (first-order) term? The second-order Taylor expansion should be:
$$f(w + \epsilon) = f(w) + f'(w) \cdot \epsilon + \frac{1}{2} f''(w) \cdot \epsilon^2$$




























Tags: neural-networks, deep-learning, loss-functions, derivative






asked Apr 24 at 10:30 by stevew, a new contributor (edited Apr 24 at 11:34 by Jan Kukacka)
1 Answer



















They are talking about the weights at the optimum:




          We can model the cost function $J$ with a quadratic approximation in the neighborhood of the empirically optimal value of the weights $w^∗$




At that point, the first derivative is zero, so the middle (first-order) term drops out.
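To make the reasoning explicit: the full second-order Taylor expansion of $J$ around $w^*$ keeps all three terms,

$$J(w) \approx J(w^*) + (w - w^*)^T \nabla_w J(w^*) + \frac{1}{2}(w - w^*)^T H (w - w^*),$$

and because $w^*$ is by assumption a minimum of $J$, the gradient $\nabla_w J(w^*) = 0$. The first-order term therefore vanishes identically, and exactly Eq. 7.33 remains.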






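For intuition, here is a minimal numerical sketch in Python (the one-dimensional loss J below is a made-up toy example, not from the book): at the minimum $w^*$ the estimated gradient is essentially zero, so dropping the first-order term costs nothing and the purely quadratic approximation tracks the true loss near $w^*$.

    def J(w):
        # Hypothetical toy loss with its minimum at w* = 1 (not from the book).
        return (w - 1.0) ** 4 + 3.0 * (w - 1.0) ** 2

    def num_grad(f, w, h=1e-5):
        # Central-difference estimate of the first derivative.
        return (f(w + h) - f(w - h)) / (2.0 * h)

    def num_hess(f, w, h=1e-4):
        # Central-difference estimate of the second derivative.
        return (f(w + h) - 2.0 * f(w) + f(w - h)) / h ** 2

    w_star = 1.0
    g = num_grad(J, w_star)   # ~0 at the minimum: the "missing" middle term
    H = num_hess(J, w_star)   # the Hessian, a scalar in this 1-D case

    print(f"gradient at w*: {g:.2e}")
    for eps in (0.1, 0.01):
        exact = J(w_star + eps)
        quad = J(w_star) + 0.5 * H * eps ** 2  # Eq. 7.33 without the gradient term
        print(f"eps={eps}: J={exact:.6f}  quadratic approx={quad:.6f}")

Away from a critical point the gradient term would not vanish, which is why the book states the approximation specifically in the neighborhood of $w^*$.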













answered Apr 24 at 10:34 by Jan Kukacka