What's the point of the test set?



I understand the purpose of the training and validation sets, but the importance of a test set doesn't click for me.

Let's say you train a model and try your best to avoid overfitting by checking it against the validation set.

After you've decided you have a model you're proud of, you do a final sanity check on the test set, and let's say the performance is trash. Are you really going to start all over? What decision does the test set inform? In my workplace, the way timelines are structured, there's no time to start over.











– Nick Corona, asked Apr 19 at 21:08 (edited 2 days ago)
  • The test set is so that you don't cheat.
    – Stephen Rauch, Apr 19 at 22:11


























3 Answers
The point of a test set is to give you a final, unbiased performance measure of your entire model-building process. This includes every modelling decision in your pipeline: preprocessing, algorithm selection, feature engineering, feature selection, hyperparameter tuning, and how you trained your model in general (5-fold cross-validation? bootstrapping? etc.). All of these decisions can lead to overfitting; for instance, you might select a set of hyperparameters that happens to be optimal for a particular validation set but not for the general population. Without a test set, you would not be able to detect this and could end up reporting highly optimistic scores.

Also, because the modelling pipeline can get very complex, the risk of leaking data, and therefore of overfitting, becomes very high. If you only tune against your validation set, how would you know whether your modelling process is leaking data?

You bring up a good point: of course, if we see that the test set score is poor, we will probably go back and tweak again. If you do this too many times, the test set is effectively demoted to a validation set, because you now risk overfitting the test set (see almost every Kaggle competition). However, through repeated test set evaluation (train the model, test it, then repeat with a different partitioning) you will at least get a gauge of how variable your model's performance is, which helps mitigate this problem. The number of times you repeat will depend on how much the test set scores vary, how much uncertainty you are willing to accept, and your time constraints.

In my opinion, in a business setting you should always make time to properly test your model. The danger of overfitting is far too high and, even worse, you would not even know it had happened. If the test set scores end up being "trash", then at least you know the model is trash, and you can decline to use it and/or change your approach. This is much better than thinking the model is fantastic based on non-rigorous validation and then having it fail in production. The scientific method is there for a reason, right?

– aranglol, answered Apr 19 at 21:58 (edited Apr 19 at 22:04)
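To make the workflow in the answer above concrete, here is a minimal sketch in plain Python. It is not the answerer's code: the synthetic data, the toy k-nearest-neighbours model, the split fractions, and every function name are assumptions chosen for illustration. The only point it tries to show is the order of operations: the validation set picks the hyperparameter, and the test set is consulted once, after that choice is frozen.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D binary data: label is 1 when x > 0, with 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return int(2 * votes > k)

def accuracy(train, evaluation, k):
    """Fraction of evaluation points that k-NN (fitted on train) labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)

def three_way_split(data, rng, val_frac=0.2, test_frac=0.2):
    """Shuffle a copy of the data and cut it into train / validation / test."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rng = random.Random(0)
train, val, test = three_way_split(make_data(500, rng), rng)

# Model selection: only the validation set is consulted here.
candidate_ks = [1, 3, 5, 9, 15, 25]
val_scores = {k: accuracy(train, val, k) for k in candidate_ks}
best_k = max(val_scores, key=val_scores.get)

# Final report: the test set is used exactly once, after all choices are frozen.
test_score = accuracy(train, test, best_k)
print(f"chosen k={best_k}, "
      f"validation accuracy={val_scores[best_k]:.3f}, "
      f"test accuracy={test_score:.3f}")
```

Because `best_k` was chosen to maximise the validation score, that score is an optimistic estimate by construction; the test score is the one worth reporting. If you then go back and change `candidate_ks` because the test score disappointed you, the test set has started acting as a second validation set, which is the demotion the answer describes.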




















I like your question; it is somewhat philosophical in nature.

We know that a test set should not affect the model; otherwise it acts as a validation set. Therefore, even if there is enough time, if we act on a bad test result and change the model, the test set becomes a validation set, although it is less involved than a validation set used for early stopping or parameter tuning.

In other words, a test set must be useless in just the way you have described! The moment it is useful, it becomes a validation set. To be more precise, though, a test set is not THAT useless, because it probably lowers your (and your boss's) expectations about the model's later performance in production, so there is a lower risk of heart failure there.

As an example, in a Kaggle competition the final set is a "test set" since it does not affect the submitted models. However, as soon as the final leaderboard is announced, that test set becomes a validation set: it affects which algorithms we later choose, namely those of the top competitors.

In summary, it seems that most of the time we are using less-involved validation sets to double-check more-involved validation sets.

P.S.: as I was writing this answer, @aranglol came up with similar notes and examples :) (+1)












  • Do you think that repeated cross-validation would solve this issue of overfitting a particular static test set? I feel that on Kaggle no one does this because it is computationally expensive and models take a while to train. However, in practical usage, getting multiple estimates and then forming, say, a bootstrapped confidence interval seems to make a lot of intuitive sense with respect to this problem.
    – aranglol, Apr 19 at 23:33

  • @aranglol It definitely gives a better estimate of performance. Here I mostly went for the absolute meanings of the test and validation terminology, which is basically unimportant in practice.
    – Esmailian, 2 days ago
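The idea raised in these comments (several train/test partitions, then a bootstrapped interval over the resulting scores) could be sketched roughly as follows. Everything here is an assumption made for illustration: the synthetic data, the toy nearest-centroid classifier, the number of repeats, and the function names `repeated_holdout` and `bootstrap_ci`. It uses a percentile bootstrap over the mean score, which is only one of several ways to form such an interval.

```python
import random
import statistics

def make_data(n, rng):
    """Synthetic 1-D data: class 1 is centred at +1, class 0 at -1, with overlap."""
    labels = [rng.randint(0, 1) for _ in range(n)]
    return [(rng.gauss(2 * y - 1, 1.5), y) for y in labels]

def fit_centroids(train):
    """'Train' a nearest-centroid classifier: one mean per class."""
    return {
        label: statistics.mean(x for x, y in train if y == label)
        for label in (0, 1)
    }

def accuracy(centroids, evaluation):
    """Fraction of points whose nearest centroid matches their label."""
    hits = 0
    for x, y in evaluation:
        pred = min(centroids, key=lambda label: abs(x - centroids[label]))
        hits += pred == y
    return hits / len(evaluation)

def repeated_holdout(data, n_repeats, test_frac, rng):
    """Re-split, re-train and re-test n_repeats times; one score per partition."""
    n_test = int(len(data) * test_frac)
    scores = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_centroids(train), test))
    return scores

def bootstrap_ci(scores, rng, n_boot=2000, alpha=0.05):
    """Percentile bootstrap interval for the mean of the scores."""
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

rng = random.Random(0)
data = make_data(500, rng)
scores = repeated_holdout(data, n_repeats=30, test_frac=0.2, rng=rng)
lo, hi = bootstrap_ci(scores, rng)
print(f"mean accuracy={statistics.mean(scores):.3f}, "
      f"spread={statistics.stdev(scores):.3f}, "
      f"approx. 95% interval for the mean=({lo:.3f}, {hi:.3f})")
```

One caveat worth stating: the partitions overlap, so the 30 scores are not independent and the interval should be read as a rough gauge of variability rather than an exact confidence statement.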



















So, I've gathered from the good responses here that the point of a test set is to:

• discourage cheating
• spot data leakage
• avoid a disaster
• create realistic expectations

– Nick Corona
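The "spot data leakage" item in the list above can be illustrated with a small sketch. It is a constructed example, not something taken from the answers: the data are pure noise, and the names `make_noise_data`, `agreement`, and `best_feature` are hypothetical. The leak is that the feature is selected using the validation rows as well as the training rows, so the validation score tends to look better than it should, while the untouched test set tends to stay near chance.

```python
import random

def make_noise_data(n_rows, n_features, rng):
    """Features and labels are independent coin flips: nothing is truly predictive."""
    rows = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    return rows, labels

def agreement(rows, labels, feature):
    """Accuracy of the rule 'predict the value of this feature'."""
    return sum(row[feature] == y for row, y in zip(rows, labels)) / len(rows)

def best_feature(rows, labels):
    """Pick the single feature that agrees most often with the labels."""
    return max(range(len(rows[0])), key=lambda f: agreement(rows, labels, f))

rng = random.Random(0)
rows, labels = make_noise_data(n_rows=150, n_features=300, rng=rng)

# Lock the test set away before any modelling decision is made.
test_rows, test_labels = rows[:50], labels[:50]
dev_rows, dev_labels = rows[50:], labels[50:]
train_rows, train_labels = dev_rows[:50], dev_labels[:50]
val_rows, val_labels = dev_rows[50:], dev_labels[50:]

# Leaky pipeline: the feature is selected using training AND validation rows.
leaky_feature = best_feature(dev_rows, dev_labels)
leaky_val_score = agreement(val_rows, val_labels, leaky_feature)

# Clean pipeline: the feature is selected using training rows only.
clean_feature = best_feature(train_rows, train_labels)
clean_val_score = agreement(val_rows, val_labels, clean_feature)

# The untouched test set gives the honest number for the leaky pipeline.
leaky_test_score = agreement(test_rows, test_labels, leaky_feature)

print(f"leaky validation score: {leaky_val_score:.2f}")
print(f"clean validation score: {clean_val_score:.2f}")
print(f"test score of the leaky pipeline: {leaky_test_score:.2f}")
```

Since the labels are random, any score well above 0.5 is an artefact of the selection step. A validation set that took part in that step cannot reveal the problem; a test set that was split off first can.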













              3












              $begingroup$

              I like your question, it is somewhat philosophical in nature.



              We know that a test set should not affect the model, otherwise it acts as a validation set. Therefore, even if there is enough time, if we act on a bad test result and change the model, the test set becomes a validation set, although, it is not as involved as a validation set that is used for early stopping or parameter tuning.



              In other words, a test set must be useless just the way you have described it! The moment it is useful, it becomes a validation set. Although, to be more precise, a test set is not THAT useless because it probably lowers your (and your boss's) expectation about the later performance of the model in production, so lower risk of heart failure there.



              As an example, in a Kaggle competition, the final set is a "test set" since it does not affect the submitted models, however as soon as the final leaderboard is announced, that test set becomes a validation set; e.g., it affects which algorithms we later choose, i.e. those of top competitors.



              In summary, it seems that most of the time we are using less-involved validation sets to double check more-involved validation sets.



              P.S.: as of writing this answer, @aranglol came up with similar notes and examples :) (+1)






              share|improve this answer









              $endgroup$












              • $begingroup$
                Do you think that repeated cross validation would solve this issue of overfitting a particular static test set? I feel that on Kaggle no one does this because it is computationally expensive and models take a while to train. However, in practical usage getting multiple estimates and then forming say, a bootstrapped confidence interval seems to make a lot of intuitive sense with respect to this problem.
                $endgroup$
                – aranglol
                Apr 19 at 23:33







              • 1




                $begingroup$
                @aranglol Definitely it gives a better estimate of performance, here I mostly went for absolute meanings of test and validation terminologies, which is basically unimportant in practice.
                $endgroup$
                – Esmailian
                2 days ago
















              3












              $begingroup$

              I like your question, it is somewhat philosophical in nature.



              We know that a test set should not affect the model, otherwise it acts as a validation set. Therefore, even if there is enough time, if we act on a bad test result and change the model, the test set becomes a validation set, although, it is not as involved as a validation set that is used for early stopping or parameter tuning.



              In other words, a test set must be useless just the way you have described it! The moment it is useful, it becomes a validation set. Although, to be more precise, a test set is not THAT useless because it probably lowers your (and your boss's) expectation about the later performance of the model in production, so lower risk of heart failure there.



              As an example, in a Kaggle competition, the final set is a "test set" since it does not affect the submitted models, however as soon as the final leaderboard is announced, that test set becomes a validation set; e.g., it affects which algorithms we later choose, i.e. those of top competitors.



              In summary, it seems that most of the time we are using less-involved validation sets to double check more-involved validation sets.



              P.S.: as of writing this answer, @aranglol came up with similar notes and examples :) (+1)






              share|improve this answer









              $endgroup$












              • $begingroup$
                Do you think that repeated cross validation would solve this issue of overfitting a particular static test set? I feel that on Kaggle no one does this because it is computationally expensive and models take a while to train. However, in practical usage getting multiple estimates and then forming say, a bootstrapped confidence interval seems to make a lot of intuitive sense with respect to this problem.
                $endgroup$
                – aranglol
                Apr 19 at 23:33







              • 1




                $begingroup$
                @aranglol Definitely it gives a better estimate of performance, here I mostly went for absolute meanings of test and validation terminologies, which is basically unimportant in practice.
                $endgroup$
                – Esmailian
                2 days ago
























              answered Apr 19 at 23:09 by Esmailian























              So, I've gathered from the good responses here that the point of a test set is to:

              • discourage cheating
              • spot data leakage
              • avoid a disaster
              • create realistic expectations

























              answered 2 days ago by Nick Corona























