How to determine the optimal threshold to achieve the highest accuracy

I have a list of probabilities output by a classifier on a balanced dataset. The metric I want to maximize is accuracy ($\frac{TP+TN}{P+N}$). Is there a way to calculate the best threshold (without iterating over many threshold values and selecting the best one), given the probabilities and their true labels?







optimization threshold
















asked Jul 16 at 11:51 by Shak







  • Do not use accuracy to evaluate a classifier; see "Why is accuracy not the best measure for assessing classification models?", "Is accuracy an improper scoring rule in a binary classification setting?", and "Classification probability threshold". That said, it's an interesting theoretical question.
    – Stephan Kolassa, Jul 16 at 11:59






















2 Answers



























    I suspect that the answer is "no", i.e., that there is no such way.



    Here is an illustration, where we plot the predicted probabilities against the true labels:



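    [Figure "accuracy": predicted probabilities plotted against the true labels, with a horizontal red line marking the threshold]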



    Since the denominator $P+N$ in the formula for accuracy does not change, what you are trying to do is to shift the horizontal red line up or down (the height being the threshold you are interested in) in order to maximize the number of "positive" dots above the line plus the number of "negative" dots below the line. Where this optimal line lies depends entirely on the shape of the two point clouds, i.e., the conditional distribution of the predicted probabilities per true label.
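To make the counting concrete, here is a minimal Python sketch of the quantity being maximized. The function name `accuracy_at`, the convention that a score at or above the threshold counts as a positive prediction, and the toy data are all assumptions made for illustration; none of them come from the original question.

```python
def accuracy_at(threshold, probs, labels):
    """Fraction of correct predictions when predicting 1 iff prob >= threshold."""
    correct = 0
    for p, y in zip(probs, labels):
        # Assumed tie convention: a score equal to the threshold is predicted positive.
        predicted = 1 if p >= threshold else 0
        correct += int(predicted == y)
    return correct / len(labels)


# Hypothetical toy data: predicted probabilities and their true labels.
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

for t in (0.35, 0.5, 0.65):
    print(t, accuracy_at(t, probs, labels))
```

On this toy data the accuracy is 5/6 at thresholds 0.35 and 0.65 but only 4/6 at 0.5, so the accuracy curve need not have a single peak. That is worth keeping in mind for any search strategy that assumes one.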



    Your best bet is likely a bisection search.



    That said, I recommend you look at



    • Why is accuracy not the best measure for assessing classification models?

    • Is accuracy an improper scoring rule in a binary classification setting?

    • Classification probability threshold
















    answered Jul 16 at 12:14 by Stephan Kolassa







    • Thank you, the graphical explanation is really good.
      – Shak, Jul 16 at 12:25












Agreeing with @StephanKolassa, I'll just look at this from an algorithmic perspective. You'll need to sort your samples by their predicted probabilities, which is $O(n \log n)$ for $n$ data samples. Then, your true class labels will be ordered like
$$0\ 0\ 1\ 0\ 0\ 1\ \ldots\ 1\ 1\ 0\ 1$$
Then, we put a separator $|$ at some position in this array; this represents your threshold. There are at most $n+1$ positions to put it. Even if you calculate the accuracy for each of these positions, you won't be worse than the sorting complexity, provided you update the count of correct predictions incrementally: moving the separator past one sample changes that count by exactly one. After finding the maximum accuracy, the threshold may simply be chosen as the average of the two neighboring samples.

















answered Jul 16 at 12:11 by gunes


























