What loss function to use when labels are probabilities? Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) Announcing the arrival of Valued Associate #679: Cesar Manara Unicorn Meta Zoo #1: Why another podcast?Understanding GAN loss functionHelp with implementing Q-learning for a feedfoward network playing a video gameExtend the loss function from the single action to the n-action case per time stepHow do I implement softmax forward propagation and backpropagation to replace sigmoid in a neural network?Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorchGradient of hinge loss functionHow to understand marginal loglikelihood objective function as loss function (explanation of an article)?How to obtain a formula for loss, when given an iterative update rule in gradient descent?Loss function spikesWhat is the motivation for row-wise convolution and folding in Kalchbrenner et al. (2014)?

Should I discuss the type of campaign with my players?

illegal generic type for instanceof when using local classes

Can a non-EU citizen traveling with me come with me through the EU passport line?

What's the meaning of 間時肆拾貳 at a car parking sign

Why are there no cargo aircraft with "flying wing" design?

What is Arya's weapon design?

Is pollution the main cause of Notre Dame Cathedral's deterioration?

51k Euros annually for a family of 4 in Berlin: Is it enough?

At the end of Thor: Ragnarok why don't the Asgardians turn and head for the Bifrost as per their original plan?

What exactly is a "Meth" in Altered Carbon?

3 doors, three guards, one stone

Selecting the same column from Different rows Based on Different Criteria

Generate an RGB colour grid

How widely used is the term Treppenwitz? Is it something that most Germans know?

Storing hydrofluoric acid before the invention of plastics

How to tell that you are a giant?

What is the role of the transistor and diode in a soft start circuit?

Do I really need recursive chmod to restrict access to a folder?

What does this icon in iOS Stardew Valley mean?

Can an alien society believe that their star system is the universe?

Echoing a tail command produces unexpected output?

Sci-Fi book where patients in a coma ward all live in a subconscious world linked together

In predicate logic, does existential quantification (∃) include universal quantification (∀), i.e. can 'some' imply 'all'?

What's the purpose of writing one's academic biography in the third person?



What loss function to use when labels are probabilities?



Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)
Announcing the arrival of Valued Associate #679: Cesar Manara
Unicorn Meta Zoo #1: Why another podcast?Understanding GAN loss functionHelp with implementing Q-learning for a feedfoward network playing a video gameExtend the loss function from the single action to the n-action case per time stepHow do I implement softmax forward propagation and backpropagation to replace sigmoid in a neural network?Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorchGradient of hinge loss functionHow to understand marginal loglikelihood objective function as loss function (explanation of an article)?How to obtain a formula for loss, when given an iterative update rule in gradient descent?Loss function spikesWhat is the motivation for row-wise convolution and folding in Kalchbrenner et al. (2014)?



.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








4












$begingroup$


What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



It seems like something like cross-entropy doesn't make sense here since it assumes that a single target is the correct label.



Would something like MSE (after applying softmax) make sense, or is there a better loss function?










share|improve this question









New contributor




Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$


















    4












    $begingroup$


    What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



    It seems like something like cross-entropy doesn't make sense here since it assumes that a single target is the correct label.



    Would something like MSE (after applying softmax) make sense, or is there a better loss function?










    share|improve this question









    New contributor




    Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    $endgroup$














      4












      4








      4





      $begingroup$


      What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



      It seems like something like cross-entropy doesn't make sense here since it assumes that a single target is the correct label.



      Would something like MSE (after applying softmax) make sense, or is there a better loss function?










      share|improve this question









      New contributor




      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.







      $endgroup$




      What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



      It seems like something like cross-entropy doesn't make sense here since it assumes that a single target is the correct label.



      Would something like MSE (after applying softmax) make sense, or is there a better loss function?







      neural-networks machine-learning loss-functions probability-distribution






      share|improve this question









      New contributor




      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited yesterday









      nbro

      2,4091726




      2,4091726






      New contributor




      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 2 days ago









      Thomas JohnsonThomas Johnson

      1233




      1233




      New contributor




      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Thomas Johnson is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.




















          1 Answer
          1






          active

          oldest

          votes


















          6












          $begingroup$

          Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



          You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



          $$H(p,q)=-sum_xin X p(x) log q(x).$$
          $ $



          Note that one-hot labels would mean that
          $$
          p(x) =
          begincases
          1 & textif x text is the true label\
          0 & textotherwise
          endcases$$



          which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



          $$H(p,q) = -log q(x_label)$$






          share|improve this answer









          $endgroup$













            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "658"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );






            Thomas Johnson is a new contributor. Be nice, and check out our Code of Conduct.









            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f11816%2fwhat-loss-function-to-use-when-labels-are-probabilities%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            6












            $begingroup$

            Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



            You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



            $$H(p,q)=-sum_xin X p(x) log q(x).$$
            $ $



            Note that one-hot labels would mean that
            $$
            p(x) =
            begincases
            1 & textif x text is the true label\
            0 & textotherwise
            endcases$$



            which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



            $$H(p,q) = -log q(x_label)$$






            share|improve this answer









            $endgroup$

















              6












              $begingroup$

              Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



              You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



              $$H(p,q)=-sum_xin X p(x) log q(x).$$
              $ $



              Note that one-hot labels would mean that
              $$
              p(x) =
              begincases
              1 & textif x text is the true label\
              0 & textotherwise
              endcases$$



              which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



              $$H(p,q) = -log q(x_label)$$






              share|improve this answer









              $endgroup$















                6












                6








                6





                $begingroup$

                Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



                You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



                $$H(p,q)=-sum_xin X p(x) log q(x).$$
                $ $



                Note that one-hot labels would mean that
                $$
                p(x) =
                begincases
                1 & textif x text is the true label\
                0 & textotherwise
                endcases$$



                which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



                $$H(p,q) = -log q(x_label)$$






                share|improve this answer









                $endgroup$



                Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



                You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



                $$H(p,q)=-sum_xin X p(x) log q(x).$$
                $ $



                Note that one-hot labels would mean that
                $$
                p(x) =
                begincases
                1 & textif x text is the true label\
                0 & textotherwise
                endcases$$



                which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



                $$H(p,q) = -log q(x_label)$$







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered 2 days ago









                Philip RaeisghasemPhilip Raeisghasem

                1,085119




                1,085119




















                    Thomas Johnson is a new contributor. Be nice, and check out our Code of Conduct.









                    draft saved

                    draft discarded


















                    Thomas Johnson is a new contributor. Be nice, and check out our Code of Conduct.












                    Thomas Johnson is a new contributor. Be nice, and check out our Code of Conduct.











                    Thomas Johnson is a new contributor. Be nice, and check out our Code of Conduct.














                    Thanks for contributing an answer to Artificial Intelligence Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f11816%2fwhat-loss-function-to-use-when-labels-are-probabilities%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

                    Area configuration aggregation error after install Porto themeMagento 2.1 CE Installed but front/backend not loading/workingCSS not loading on page within Magento 2 pageCannot install module in Magento 2no commands defined in the “setup” namespace. in Magento2Magento 2: Static files are present but shows 404Why do i have to always run the commands to clean cache in Magento 2.1.8?Failure reason: 'Unable to unserialize value.'Error 500 after magento migrationIn production mode the site does not loadMagento 2 : Error 500 after installing

                    Middle Expansion Olielle Resaix Definition: Uttering songs of triumph shouting with joy triumphant exulting Sejunction Journal 붙다 달 고급 품목 외출 The stretch trades the screeching tin. Definition: The act of speaking with a drawl a drawl Cough Sand Definition: An uproar a quarrel a noisy outbreak Shake Iron Publicize Horse House Baby 사과 Resaix Flaggy Jelly Temporary Unequaled Puppet A drop in the bucket Shrew 성격 회원 성질 미팅 The burn frames the tacky quality. Materialistic The smoke reduces the way. Yammoe Nondescript Cheek 얼굴 배 약하다 날리다 타다 The illegal country shows the iron. Help Rule Drearien Smoke Teaching Meaty Wasp Abraham Lincoln Jaws 진심 수리하다 Size Cork Idea Convert Think Lark John Lennon 거울 청소 군 추천하다 아이스크림