How to call made-up data?

I'm writing an example and have made up some data. I want it to be clear to the reader this is not real data, but I also don't want to give the impression of malice, since it just serves as an example.



There is no (pseudo)random component to this particular data, so it seems to me that 'simulated' is not appropriate. If I call it fictitious or fabricated, does that give the impression of fraudulent data? Is 'made-up' a word that would fit in a scientific context?



What is the terminology in statistical literature for non-simulated made-up data?







terminology synthetic-data














edited Aug 4 at 9:17
kjetil b halvorsen

asked Aug 4 at 4:19
Frans Rodenburg










  • Just to add a comment that spans several answers: "synthetic" is a good word for made-up data that tries to look as realistic as possible, while "mock-up" suggests data that has been crafted to demonstrate something in particular. For example, "mock-up" data might contain absurd outliers, just to demonstrate how important it is to deal with outliers properly.
    – Cort Ammon, Aug 4 at 20:02
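A minimal Python sketch of the kind of "mock-up" data described in this comment, assuming a hypothetical measurement series (every number below is invented for illustration): five plausible values plus one absurd outlier planted on purpose, which is enough to show how far the mean moves while the median barely changes.

```python
from statistics import mean, median

# Hypothetical mock-up data: five plausible, hand-typed readings...
clean = [4.1, 3.9, 4.0, 4.2, 3.8]
# ...plus one absurd outlier planted on purpose (e.g. a unit or typing error).
mock_up = clean + [400.0]

# Compare a non-robust summary (mean) with a robust one (median).
print("mean without outlier:", round(mean(clean), 2))
print("mean with outlier:   ", round(mean(mock_up), 2))
print("median with outlier: ", round(median(mock_up), 2))
```

With the planted value the mean jumps from about 4 to 70, while the median stays near 4.05; that contrast is the point of the mock-up, so no claim of realism is needed.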






















8 Answers






































If you want to refer to your data as fictitious, you'd be in good company, as that's the term Francis Anscombe used to describe his now-famous quartet.



From Anscombe, F. J. (1973). "Graphs in Statistical Analysis", Am. Stat. 27 (1):



Some of these points are illustrated by four fictitious data sets, each consisting of eleven (x, y) pairs, shown in the table.



But I think your caution is well placed, as my OED (v4) seems to indicate that this use of fictitious is obsolete:




fictitious, a.

(fɪkˈtɪʃəs)

[f. L. fictīci-us (f. fingĕre to fashion, feign) + -ous: see -itious.]

1. †a. Artificial as opposed to natural (obs.). b. Counterfeit, ‘imitation’, sham; not genuine.





















  • In terms of readability, the first suggestion and the comments are much better alternatives. No need to use uncommon, complicated words.
    – Tim, Aug 4 at 14:38







  • @Tim: I want to agree, but I'm not entirely sure what I'd be agreeing with. Are you saying that fictitious would be a bad choice, despite having been used in a similar context before? Because that's what I'm saying.
    – AkselA, Aug 4 at 14:46



















In IT we often call it mockup data, which can be presented through a mockup (application).



The mockup data can also be presented through a fully functional application, for instance to test the functionality of the application in a controlled manner.
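As a minimal sketch of "testing functionality in a controlled manner", here is hand-written mockup data fed to a small function. The function `average_order_value`, the `MOCK_ORDERS` fixture and the field names are all hypothetical, invented only to show the pattern: no randomness, and values picked so the expected result is obvious.

```python
def average_order_value(orders):
    """Hypothetical function under test: mean 'amount' over a list of orders."""
    if not orders:
        raise ValueError("no orders to average")
    return sum(order["amount"] for order in orders) / len(orders)


# Mockup data: typed in by hand, no randomness, values chosen so the
# expected result (20.0) can be checked by eye. None of these orders are real.
MOCK_ORDERS = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 20.0},
    {"customer": "carol", "amount": 30.0},
]


def run_checks():
    """Exercise the function against the mockup data in a controlled way."""
    assert average_order_value(MOCK_ORDERS) == 20.0
    try:
        average_order_value([])
    except ValueError:
        pass  # empty input is rejected, as intended
    else:
        raise AssertionError("empty input should be rejected")
    print("all checks passed")


if __name__ == "__main__":
    run_checks()
```

Because the fixture is tiny and fixed, a failing check points at the code rather than at the data, which is the main reason to prefer mockup data over live data for this kind of test.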
















  • Good point, but I believe that mockup data and simulated data are not exactly the same. When creating mockup data for unit tests, you need it only to preserve some very basic properties of the real data, while when using simulated data for statistical analysis, you usually use more sophisticated data examples.
    – Tim, Aug 4 at 20:02







  • I still believe ErikE is correct, though: when you write analytical code, you need either the real thing or mock data. Mock data can be as big as you want it to be, IMO.
    – Mathijs Segers, Aug 6 at 5:52






  • Practices probably vary, as does the use of terminology, I guess. For many of our tests and analyses we use live data that has been "defused" for reasons of security and anonymity. For others we create bare-bones data just as Tim describes. I have no strong opinion, but we do use the term mockup quite loosely.
    – ErikE, Aug 6 at 10:22



















I've seen repeated suggestions for the term "synthetic data". That term, however, has a broadly used and very different meaning from what you want to express: https://en.wikipedia.org/wiki/Synthetic_data



I am not sure there is a generally accepted scientific term, but the term "example data" seems hard to misunderstand?
















  • That article seems a little confused: the relationship to anonymization is pretty tenuous.
    – Matt Krause, Aug 6 at 16:29










  • +1, but I agree with the previous comment: apart from the second paragraph (saying that synthesized data is a type of anonymized data), the rest of that Wikipedia article does seem to describe what the questioner wants, i.e. realistic-looking made-up data.
    – Darren Cook, Aug 9 at 11:09



















I've encountered the term 'fake data' a fair amount. I guess it could have some negative connotations, but I've heard it often enough that it doesn't register negatively at all for me.



FWIW, Andrew Gelman uses it too:



https://statmodeling.stat.columbia.edu/2009/09/04/fake-data_simul/



https://statmodeling.stat.columbia.edu/2019/03/23/yes-i-really-really-really-like-fake-data-simulation-and-i-cant-stop-talking-about-it/



https://books.google.dk/books?id=lV3DIdV0F9AC&pg=PA155&lpg=PA155&dq=fake+data+simulation&source=bl&ots=6ljKB6StQ4&sig=ACfU3U17GLP_84q_HfIQB4u5O6wV0yA2Aw&hl=en&sa=X&ved=2ahUKEwiF2_eB0uvjAhWswcQBHSn5Cn04ChDoATAAegQICRAB#v=onepage&q=fake%20data%20simulation&f=false



A quick Google search for 'fake data' turns up many results that seem to use the term similarly:



https://scientistseessquirrel.wordpress.com/2016/03/10/good-uses-for-fake-data-part-1/



http://modernstatisticalworkflow.blogspot.com/2017/04/an-easy-way-to-simulate-fake-data-from.html



https://clayford.github.io/dwir/dwr_12_generating_data.html



And there's even a fakeR package, which suggests that this is relatively common:
https://cran.r-project.org/web/packages/fakeR/fakeR.pdf
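The "fake-data simulation" in the Gelman posts linked above follows a simple loop: choose parameter values, simulate data from the model, fit the model, and check whether the known values come back. Below is a minimal Python sketch of that loop for a straight-line model; it is not code from those posts, and the model, parameter values, seed and function names are hypothetical choices for illustration.

```python
import random


def simulate_fake_data(n, intercept, slope, sigma, seed=0):
    """Generate fake (x, y) pairs from a known model y = a + b*x + Normal(0, sigma) noise."""
    rng = random.Random(seed)  # fixed seed so the fake data are reproducible
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def fit_ols(x, y):
    """Closed-form least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


TRUE_A, TRUE_B = 3.0, 0.5  # the "truth" we built into the fake data
x, y = simulate_fake_data(n=1000, intercept=TRUE_A, slope=TRUE_B, sigma=1.0, seed=42)
a_hat, b_hat = fit_ols(x, y)
print(f"true a={TRUE_A}, b={TRUE_B}; estimated a={a_hat:.3f}, b={b_hat:.3f}")
```

With this many points and little noise the estimates should land close to 3.0 and 0.5; if they did not, that would point to a bug in the fitting code rather than to a property of real data. Note that, unlike the data in the question, this version does have a pseudorandom component.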





























    I use a different word depending on the manner in which I use the data. If I have found the made-up dataset lying around and have pointed my algorithm at it in a confirmatory manner, then the word "synthetic" is fine.



    However, often when I use this type of data, I have invented it with the specific intent of showing off the capabilities of my algorithm. In other words, I invented data for the specific purpose of getting "good results". In such circumstances, I am fond of the term "contrived", along with an explanation of my expectations for the data. This is because I don't want anyone to make the mistake of thinking that I pointed my algorithm at some arbitrary synthetic dataset I found lying around and it really worked out well. If I have cherry-picked data (to the point of actually making it up) specifically to make my algorithm work out well, I say so. Such results provide evidence that my algorithm can work out well, but only very weak evidence that one might expect the algorithm to work out well in general. The word "contrived" nicely sums up the fact that I have chosen the data with "good results" in mind, a priori.




    "does that give the impression of fraudulent data?"




    No, but it is important to be clear about the source of any dataset and your a priori expectations as the experimenter when reporting your results on any dataset. The term "fraud" explicitly includes an aspect of having covered something up or having outright lied. The #1 way to avoid committing fraud in science is simply to be honest and forthright about the nature of your data and your expectations. In other words, if your data are fabricated and you fail to say so in any way, and there is some expectation that the data are not fabricated or, worse, you claim that the data were gathered in some non-fabricated way, then that is "fraud". Don't do that. If you want to use a synonym for "fabricated" that "sounds better", such as "synthetic", nobody will fault you, but at the same time I don't think anyone will notice the difference except you.



    A side note:



    Less obvious are circumstances where one claims to have had a priori expectations that are actually post hoc explanations. This is also fraudulent analysis of data.



    There is a danger of this when one chooses data specifically with the intent of "showing off" the capabilities of an algorithm, which is frequently the case with synthetic data.



    To be clear about why this is the case, consider that the "normal" scientific method works something like this: 1) A population $D$ is chosen, 2) A hypothesis $H$ is conceived, 3) $H$ is tested against $D$ (or some sample chosen from $D$). Science doesn't have to work within this narrow definition, but this is what is called "confirmatory" analysis, and it is generally considered the strongest form of evidence one can provide. Since the order of events correlates with the strength of evidence, it is important to document that order explicitly.



    Notably, in the case of "contrived" data, the process often works more like so: 1) A hypothesis $H$ is conceived, 2) A population $D$ is chosen, 3) $H$ is tested against $D$. If you are testing an algorithm, for example, then the hypothesis that your fancy new algorithm "does a good job" might occur prior to the invention of the synthetic dataset. If this is the case, you should mention it. At the very least you should not purport that events transpired in a "confirmatory" manner, because that would lead readers to conclude that your evidence is stronger than it actually is.



    There is no problem with doing this, so long as you are honest and forthright about what you have done. If you have gone through pains to create a dataset that gives "good results", do say so. As long as you let the reader know the steps that you have taken in your data analysis, they have the information necessary to effectively weigh the evidence for or against your hypotheses. When you are not honest or are not forthright, then this may give the impression that your evidence is stronger than it really is. When you are KNOWINGLY less than honest and forthright for the sake of making your evidence seem stronger than it really is, then that is, indeed, fraudulent.



    In any case, this is why I prefer the term "contrived" for such datasets, along with a short explanation that they are, indeed, chosen with a hypothesis in mind. "Contrived" conveys the sense that not only did I create a synthetic dataset, but I did so with particular intentions that reflect the fact that my hypothesis was already in place before the creation of my dataset.



    To illustrate by an example: you create an algorithm for the analysis of arbitrary time series. You hypothesize that this algorithm will give "good results" when pointed at time series. Consider, now, the following two possibilities:
    1) You create some synthetic data that looks like the sort of thing you expect your algorithm to perform well on. You analyze this data and the algorithm performs well.
    2) You grab some synthetic datasets simply because they are available. You analyze this data and the algorithm performs well.
    Which of these two circumstances provides the better evidence that your algorithm performs well on arbitrary time series? Clearly, option 2. However, in either case it would be easy to report that "we applied algorithm $A$ to synthetic dataset $D$. Results are shown in Figure $x.y$." In the absence of any context, a reader might reasonably assume that these results are confirmatory (option 2) when, in the case of option 1, they are not. In option 1, the reader has therefore been given the impression that the evidence is stronger than it really is.



    tl;dr



    Use whatever term you like, "synthetic", "contrived", "fabricated", "fictitious". However, the term that you use is insufficient to ensure that your results are not misleading. Ensure that you are clear in your report about how the data came about, including your expectations for the data and the reasons why you chose the data that you chose.





























      Intuitively, I would go with the term 'dummy data', in the same sense that "Lorem ipsum..." is called 'dummy text'.
      The word 'dummy' is quite general and easy to understand for people from various backgrounds, and is therefore less likely to be misinterpreted by readers with less of a statistical background.
















      • If it's in a regression context, I would avoid overloading "dummy", lest you have dummy variables encoding dummy data.
        – Matt Krause, Aug 5 at 21:24



















      In business it is predominantly called synthetic data, and occasionally simulated data or fake data. In academia it is predominantly called pseudo-data, and occasionally simulated data. If it is the result of a Monte Carlo simulation, it is at times referred to colloquially as simply “Monte Carlo”.
















      • "Monte Carlo" is the name of the method, so the "colloquial" name would be very misleading.
        – Tim, Aug 5 at 21:30










      • It's jargon, and like any other technical jargon it is meant to create a community around it, leaving the rest of the world wondering what the hell it is about. So yes, it is used, and yes, it is (quite voluntarily) misleading.
        – famargar, Aug 5 at 21:38













      I would probably call this "synthetic" or "artificial" data, though I might also call it "simulated" (the simulation is just very simple).

















      answered Aug 4 at 4:38
      Louis Cialdella










      • One hears "toy data," "toy example," and "dummy data." Also, I agree that "simulated" might well fit even in the absence of random numbers.
        – rolando2, Aug 4 at 9:33






      • "Illustrative data" or "example data" might also work.
        – Henry, Aug 4 at 9:44






      • +1 'synthetic data' and 'toy example' are both terms I might use, if the occasion arose, as is 'constructed example'. Sometimes I say "illustrative example" or something similar, particularly when the example was explicitly constructed to have particular features (e.g. when designed as a counterexample to some mistaken notion).
        – Glen_b, Aug 4 at 10:10







      • I tend to use "toy data" (without "artificial" or "simulated") for real (measured) data sets that I "abuse" to demonstrate something.
        – cbeleites, Aug 5 at 17:27






      • What will work best depends a bit on your application. For example, I am also doing a project with "fake" data, but another part of the project involves a computer model simulation. So it might confuse the reader if I referred to the fake data as "simulated", falsely implying that the data come from the simulation. So I've been relying on "artificial", and at times I describe the data as "manufactured". I personally would avoid "synthetic", as to me this term implies that the data are some sort of combination of other data sources (a "synthesis" of e.g. data A and data B).
        – Ceph, Aug 6 at 20:26














      If you want to refer to your data as fictitious you'd be in good company, as that's the term Francis Anscombe used to describe his now famous quartet.



      From Anscombe, F. J. (1973). "Graphs in Statistical Analysis", Am. Stat. 27 (1):




      Some of these points are illustrated by four fictitious data sets,
      each consisting of eleven (x, y) pairs, shown in the table.




      But I think your caution is well placed, as my OED (v4) seems to indicate that this use of "fictitious" is obsolete:




      fictitious, a.



      (fɪkˈtɪʃəs)



      [f. L. fictīci-us (f. fingĕre to fashion, feign) + -ous: see -itious.]



      1. †a. Artificial as opposed to natural (obs.). b. Counterfeit, ‘imitation’, sham; not genuine.
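      Anscombe's point was that his four fictitious sets share nearly identical summary statistics while looking completely different when plotted. A minimal Python sketch that checks this, using the quartet's values as they are commonly reproduced (for example in R's built-in `anscombe` data set); the helper `summarize` is a hypothetical name introduced for illustration:

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet (Anscombe 1973): four fictitious sets of eleven (x, y) pairs.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values of sets I-III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return (mean_x, mean_y, intercept, slope, r) for an ordinary least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    return mx, my, intercept, slope, r


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        mx, my, a, b, r = summarize(x, y)
        print(f"{name:>3}: mean x={mx:.2f} mean y={my:.2f} y={a:.2f}+{b:.3f}x r={r:.3f}")
```

      Every set comes out with a mean y of about 7.50, a fitted line of roughly y = 3.00 + 0.500x and r of about 0.82; only a plot reveals how different they are.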





















      • In terms of readability, the first suggestion and the comments are much better alternatives. No need to use uncommon, complicated words.
        – Tim, Aug 4 at 14:38

      • @Tim: I want to agree, but I'm not entirely sure what I'd be agreeing with. Are you saying that "fictitious" would be a bad choice, despite having been used in a similar context before? Because that's what I'm saying.
        – AkselA, Aug 4 at 14:46























      edited Aug 4 at 22:53

























      answered Aug 4 at 14:26









      AkselA


      In IT we often call it mockup data, which can be presented through a mockup (application).



      The mockup data can also be presented through a fully functional application, for instance to test the functionality of the application in a controlled manner.
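      A minimal, hypothetical sketch of that second use: a few hand-made records are fed to a small function so its behaviour can be checked in a controlled manner, because the correct answer is known in advance. The function `total_revenue_by_customer`, the field names, and the values are all invented for illustration and do not come from any particular application or test framework.

```python
from collections import defaultdict


# Hypothetical function under test: sums order amounts per customer.
def total_revenue_by_customer(orders):
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer"]] += order["amount"]
    return dict(totals)


# Mockup data: a handful of invented orders chosen so the correct
# result is easy to work out by hand (hypothetical customers and amounts).
MOCK_ORDERS = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 5.5},
    {"customer": "alice", "amount": 2.5},
]


def test_total_revenue_by_customer():
    assert total_revenue_by_customer(MOCK_ORDERS) == {"alice": 12.5, "bob": 5.5}
    # An edge case the mockup makes explicit: no orders at all.
    assert total_revenue_by_customer([]) == {}


if __name__ == "__main__":
    test_total_revenue_by_customer()
    print("mockup-data test passed")
```

      The mockup only has to preserve the few properties the code depends on (a customer key and a numeric amount), not the full statistical character of real orders.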
















      • Good point, but I believe that mockup data and simulated data are not exactly the same. Mockup data created for unit tests only needs to preserve some very basic properties of the real data, whereas simulated data used for statistical analysis is usually more sophisticated.
        – Tim, Aug 4 at 20:02

      • I still believe ErikE is correct, though: when you write analytical code, you need either the real thing or mock data. Mock data can be as big as you want it to be, IMO.
        – Mathijs Segers, Aug 6 at 5:52

      • Practices probably vary, as does the use of terminology. For many of our tests and analyses we use live data that has been "defused" for reasons of security and anonymity. For others we create bare-bones data just as Tim describes. I have no strong opinion, but we do use the term "mockup" quite loosely.
        – ErikE, Aug 6 at 10:22

























      answered Aug 4 at 19:55









      ErikE


      I've seen repeated suggestions for the term "synthetic data". That term, however, has a broadly used and very different meaning from what you want to express: https://en.wikipedia.org/wiki/Synthetic_data



      I am not sure there is a generally accepted scientific term, but the term "example data" seems hard to misunderstand?
















      • That article seems a little confused; the relationship to anonymization is pretty tenuous.
        – Matt Krause, Aug 6 at 16:29

      • +1, but I agree with the previous comment: apart from the second paragraph (saying that synthesized data is a type of anonymized data), the rest of that Wikipedia article does seem to describe what the questioner wants, i.e. realistic-looking made-up data.
        – Darren Cook, Aug 9 at 11:09

























      answered Aug 5 at 11:10









      srass


      I've encountered the term 'fake data' a fair amount. I guess it could have some negative connotations, but I've heard it often enough that it doesn't register negatively for me at all.



      FWIW, Andrew Gelman uses it too:



      https://statmodeling.stat.columbia.edu/2009/09/04/fake-data_simul/



      https://statmodeling.stat.columbia.edu/2019/03/23/yes-i-really-really-really-like-fake-data-simulation-and-i-cant-stop-talking-about-it/



      https://books.google.dk/books?id=lV3DIdV0F9AC&pg=PA155&lpg=PA155&dq=fake+data+simulation&source=bl&ots=6ljKB6StQ4&sig=ACfU3U17GLP_84q_HfIQB4u5O6wV0yA2Aw&hl=en&sa=X&ved=2ahUKEwiF2_eB0uvjAhWswcQBHSn5Cn04ChDoATAAegQICRAB#v=onepage&q=fake%20data%20simulation&f=false



      A quick Google search for 'fake data' turns up many results that seem to use the term in the same way:



      https://scientistseessquirrel.wordpress.com/2016/03/10/good-uses-for-fake-data-part-1/



      http://modernstatisticalworkflow.blogspot.com/2017/04/an-easy-way-to-simulate-fake-data-from.html



      https://clayford.github.io/dwir/dwr_12_generating_data.html



      And there's even a fakeR package, which suggests that this is relatively common:
      https://cran.r-project.org/web/packages/fakeR/fakeR.pdf
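      The workflow Gelman describes as fake-data simulation is roughly: choose parameter values, simulate data from the model, fit the model, and check whether the fit recovers the values you chose. A minimal Python sketch of that idea, assuming a simple linear model with Gaussian noise; the model, parameter values, sample size, and function names are hypothetical choices, and this is not code from Gelman or from the fakeR package:

```python
import random


def simulate_fake_data(n, intercept, slope, sigma, seed=0):
    """Generate fake (x, y) data from a known linear model with Gaussian noise."""
    rng = random.Random(seed)  # seeded so the fake data set is reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Assumed "true" parameters: known only because we made the data up.
    TRUE_A, TRUE_B = 2.0, 0.7
    xs, ys = simulate_fake_data(1000, TRUE_A, TRUE_B, sigma=1.0)
    est_a, est_b = fit_ols(xs, ys)
    print(f"true a={TRUE_A}, b={TRUE_B}; estimated a={est_a:.3f}, b={est_b:.3f}")
```

      Because the generating parameters are known, a fit that fails to recover them points to a problem in the model or the code rather than in the data.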
















          answered Aug 5 at 11:37









          mkt


              I use a different word depending on the manner in which I use the data. If I have found the made-up dataset lying around and have pointed my algorithm at it in a confirmatory manner, then the word "synthetic" is fine.



              More often, however, I have invented the data with the specific intent of showing off the capabilities of my algorithm; in other words, I invented the data for the specific purpose of getting "good results". In such cases I am fond of the term "contrived", along with an explanation of my expectations for the data. I don't want anyone to mistakenly think that I pointed my algorithm at some arbitrary synthetic dataset I found lying around and it happened to work out well. If I have cherry-picked data (to the point of actually making it up) specifically so that my algorithm works well, I say so, because such results show that my algorithm can work well but provide only very weak evidence that it will work well in general. The word "contrived" neatly sums up the fact that I chose the data with "good results" in mind, a priori.




              "does that give the impression of fraudulent data?"




              No, but it is important to be clear about the source of any dataset, and about your a priori expectations as the experimenter, when reporting results on it. The term "fraud" explicitly involves having covered something up or having outright lied. The #1 way to avoid committing fraud in science is simply to be honest and forthright about the nature of your data and your expectations. In other words, if your data are fabricated and you fail to say so in any way, in a context where readers expect the data not to be fabricated, or, worse, you claim that the data were gathered in some non-fabricated way, then that is fraud. Don't do that. If you want a synonym for "fabricated" that "sounds better", such as "synthetic", nobody will fault you, but I doubt that anyone will notice the difference except you.



              A side note:



              Less obvious are cases where post hoc explanations are presented as a priori expectations. This is also a fraudulent analysis of data.



              This is a particular danger when one chooses data specifically to "show off" the capabilities of an algorithm, as is frequently the case with synthetic data.



              To see why, consider that the "normal" scientific method works roughly like this: 1) a population $D$ is chosen, 2) a hypothesis $H$ is conceived, 3) $H$ is tested against $D$ (or some sample drawn from $D$). Science doesn't have to work within this narrow definition, but this is what is called "confirmatory" analysis, and it is generally considered the strongest form of evidence one can provide. Since the order of events affects the strength of the evidence, it is important to document that order explicitly.



              With "contrived" data, by contrast, the process often looks more like this: 1) a hypothesis $H$ is conceived, 2) a population $D$ is chosen, 3) $H$ is tested against $D$. If you are testing an algorithm, for example, the hypothesis that your fancy new algorithm "does a good job" may well predate the invention of the synthetic dataset. If so, you should mention it. At the very least, you should not imply that events unfolded in a "confirmatory" order, because that would lead readers to conclude that your evidence is stronger than it actually is.



              There is nothing wrong with this, as long as you are honest and forthright about what you have done. If you have gone to great pains to create a dataset that gives "good results", say so. As long as you tell the reader the steps you took in your data analysis, they have the information they need to weigh the evidence for or against your hypotheses. If you are not honest or not forthright, you may give the impression that your evidence is stronger than it really is. If you are KNOWINGLY less than honest and forthright in order to make your evidence seem stronger than it really is, then that is, indeed, fraud.



              This is why I prefer the term "contrived" for such datasets, together with a short explanation that they were, indeed, chosen with a hypothesis in mind. "Contrived" conveys not only that I created a synthetic dataset, but that I did so with particular intentions, because my hypothesis was already in place before I created the dataset.



              To illustrate with an example: you create an algorithm for analyzing arbitrary time series, and you hypothesize that it will give "good results" on time series. Now consider two possibilities:
              1) You create some synthetic data that look like the sort of thing you expect your algorithm to perform well on. You analyze these data, and the algorithm performs well.
              2) You grab some existing synthetic datasets simply because they are available. You analyze these data, and the algorithm performs well.
              Which of these provides better evidence that your algorithm performs well on arbitrary time series? Clearly, option 2. However, in either case it would be easy to report that "we applied algorithm $A$ to synthetic dataset $D$. Results are shown in Figure $x.y$." Without further context, a reader might reasonably assume that these results are confirmatory (option 2) when, in option 1, they are not. In option 1, therefore, the reader has been given the impression that the evidence is stronger than it really is.
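A minimal sketch of why results on contrived data say little about general performance. Everything here is hypothetical and chosen only for illustration: a "seasonal naive" forecaster stands in for algorithm $A$, a pure sine wave with exactly the period that forecaster assumes plays the role of a contrived dataset (option 1), and a random walk stands in for a generic synthetic series generated without the algorithm in mind.

```python
import math
import random

PERIOD = 12  # hypothetical: the seasonal period the "algorithm" assumes


def seasonal_naive_forecast(series, period=PERIOD):
    """Hypothetical algorithm A: predict each value as the value one period earlier."""
    return [series[t - period] for t in range(period, len(series))]


def mean_absolute_error(series, period=PERIOD):
    """Average |actual - forecast| over every point that has a forecast."""
    forecasts = seasonal_naive_forecast(series, period)
    actuals = series[period:]
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(forecasts)


def contrived_series(n, period=PERIOD):
    """Option 1: data built to match exactly what the algorithm assumes (pure seasonality)."""
    return [math.sin(2 * math.pi * t / period) for t in range(n)]


def random_walk_series(n, rng):
    """Stand-in for option 2: a generic synthetic series made without the algorithm in mind."""
    values, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        values.append(level)
    return values


rng = random.Random(0)
contrived_mae = mean_absolute_error(contrived_series(240))
generic_mae = mean_absolute_error(random_walk_series(240, rng))
print(f"contrived data MAE: {contrived_mae:.2e}")
print(f"generic data MAE:   {generic_mae:.2f}")
```

The contrived series gives an error that is essentially zero (only floating-point noise), while the random walk gives an error of order one or more. Both runs could be described as "we applied algorithm $A$ to synthetic dataset $D$", which is why the report needs to say how $D$ came about.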



              tl;dr



              Use whatever term you like: "synthetic", "contrived", "fabricated", "fictitious". The term alone, however, cannot ensure that your results are not misleading. Make clear in your report how the data came about, including your expectations for the data and your reasons for choosing the data you chose.






                  answered Aug 5 at 11:43









                  Scott

                      Intuitively, I would go with the term 'dummy data', in the same sense that "Lorem ipsum..." is called 'dummy text'.
                      The word 'dummy' is quite general and easy to understand for people from various backgrounds, and is therefore less likely to be misinterpreted by readers with less statistical background.






                      answered Aug 5 at 11:51









                      Mathijs

                        If it's in a regression context, I would avoid overloading "dummy", lest you have dummy variables encoding dummy data.
                        – Matt Krause
                        Aug 5 at 21:24
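To make the clash in this comment concrete: in regression, "dummy variables" already has a fixed meaning, namely 0/1 indicator columns that encode a categorical predictor. A minimal sketch in plain Python; the function name `dummy_encode` and the choice of the alphabetically first level as the dropped baseline are assumptions made for illustration.

```python
def dummy_encode(categories, baseline=None):
    """Encode a categorical variable as 0/1 dummy (indicator) columns.

    One level (the baseline) is dropped so the columns are not perfectly
    collinear with a regression intercept; by default the alphabetically
    first level is used as the baseline.
    """
    levels = sorted(set(categories))
    if baseline is None:
        baseline = levels[0]
    kept = [level for level in levels if level != baseline]
    rows = [[1 if value == level else 0 for level in kept] for value in categories]
    return kept, rows


columns, rows = dummy_encode(["red", "green", "blue", "green"])
print(columns)  # ['green', 'red']
for row in rows:
    print(row)
```

Here "blue" is the baseline, so a "blue" observation is encoded as all zeros; calling made-up inputs "dummy data" in the same paper would invite confusion with these columns.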












                      In business you would predominantly call it synthetic data, and occasionally simulated data or fake data. In academia you would predominantly call it pseudo-data, and occasionally simulated data. If it is the result of a Monte Carlo simulation, it is sometimes referred to colloquially as simply “Monte Carlo”.
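Strictly, the Monte Carlo method is the procedure, and the simulated pseudo-datasets are what it produces. A minimal sketch of that relationship, assuming a hypothetical task chosen only for illustration: estimating how often a nominal 95% confidence interval for a mean actually covers the true mean, by generating many pseudo-datasets from a known normal distribution.

```python
import random
import statistics

Z_95 = 1.96  # normal critical value for a nominal 95% interval


def interval_covers(sample, true_mean):
    """Normal-approximation 95% CI for the mean; True if it contains true_mean."""
    n = len(sample)
    centre = statistics.fmean(sample)
    half_width = Z_95 * statistics.stdev(sample) / n ** 0.5
    return centre - half_width <= true_mean <= centre + half_width


def monte_carlo_coverage(n_reps, sample_size, true_mean, true_sd, rng):
    """Monte Carlo method: simulate many pseudo-datasets, count how often the CI covers the truth."""
    hits = 0
    for _ in range(n_reps):
        pseudo_data = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        hits += interval_covers(pseudo_data, true_mean)
    return hits / n_reps


rng = random.Random(2019)
coverage = monte_carlo_coverage(n_reps=2000, sample_size=30, true_mean=5.0, true_sd=2.0, rng=rng)
print(f"estimated coverage of the nominal 95% interval: {coverage:.3f}")
```

The estimate is typically a little below 0.95, because the interval uses the normal value 1.96 rather than a t quantile with small samples. `statistics.fmean` requires Python 3.8 or later.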






                        "Monte Carlo" is the name of the method, so the "colloquial" name would be very misleading.
                        – Tim
                        Aug 5 at 21:30










                        It’s jargon, and like any other technical jargon it is meant to create a community around it, leaving the rest of the world wondering what the hell it is about. So yes, it is used, and yes, it is (quite deliberately) misleading.
                        – famargar
                        Aug 5 at 21:38















                      answered Aug 5 at 21:06









                      famargar

                      522 reputation, 1 gold badge, 4 silver badges, 18 bronze badges





































