Is there a real life meaning about KMeans error?How to determine whether a bad performance is caused by data quality?How to select regression algorithm for noisy (scattered) data?Machine Learning - Choice of features for determining hypothesisWhat are the differences between logistic and linear regression?Difference between Gradient Descent and Normal Equation in Linear RegressionWhat is the best statistical measure tool or approach to resolve my problem?Linear regression load model doesn't predict as expectedAny efficient way to build non-linear regression model for polynomial features?Predicting house price using linear regressionWhat does embedding mean in machine learning?

Can I activate an iPhone without an Apple ID?

Chess Construction Challenge #2-Check!

Connect neutrals together in 3-gang box (load side) with 3x 3-way switches?

Does the Entangle spell require existing vegetation?

Possible isometry groups of open manifolds

I quit, and boss offered me 3 month "grace period" where I could still come back

I do not have power to all my breakers

Is killing off one of my queer characters homophobic?

Construct a pentagon avoiding compass use

Asking for higher salary after I increased my initial figure

Krazy language in Krazy Kat, 25 July 1936

How to make "plastic" sounding distored guitar

Are lithium batteries allowed in the International Space Station?

What impact would a dragon the size of Asia have on the environment?

Why does the trade federation become so alarmed upon learning the ambassadors are Jedi Knights?

Hot object in a vacuum

Commutator subgroup of Heisenberg group.

How long do Apple retain notifications to be pushed to iOS devices until they expire?

(algebraic topology) question about the cellular approximation theorem

How to fit a linear model in the Bayesian way in Mathematica?

Is a public company able to check out who owns its shares in very detailed format?

Crab Nebula short story from 1960s or '70s

Are L-functions uniquely determined by their values at negative integers?

Mistakenly modified `/bin/sh'



Is there a real life meaning about KMeans error?


How to determine whether a bad performance is caused by data quality?How to select regression algorithm for noisy (scattered) data?Machine Learning - Choice of features for determining hypothesisWhat are the differences between logistic and linear regression?Difference between Gradient Descent and Normal Equation in Linear RegressionWhat is the best statistical measure tool or approach to resolve my problem?Linear regression load model doesn't predict as expectedAny efficient way to build non-linear regression model for polynomial features?Predicting house price using linear regressionWhat does embedding mean in machine learning?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








2












$begingroup$


I am trying to understand the meaning of error in sklearn KMeans.



In the context of house pricing prediction, the error linear regression could be considered as the money difference per square foot.



Is there a real life meaning about KMeans error?










share|improve this question











$endgroup$


















    2












    $begingroup$


    I am trying to understand the meaning of error in sklearn KMeans.



    In the context of house pricing prediction, the error linear regression could be considered as the money difference per square foot.



    Is there a real life meaning about KMeans error?










    share|improve this question











    $endgroup$














      2












      2








      2





      $begingroup$


      I am trying to understand the meaning of error in sklearn KMeans.



      In the context of house pricing prediction, the error linear regression could be considered as the money difference per square foot.



      Is there a real life meaning about KMeans error?










      share|improve this question











      $endgroup$




      I am trying to understand the meaning of error in sklearn KMeans.



      In the context of house pricing prediction, the error linear regression could be considered as the money difference per square foot.



      Is there a real life meaning about KMeans error?







      machine-learning data-mining k-means






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jul 7 at 10:21







      baojieqh

















      asked Jul 6 at 14:02









      baojieqhbaojieqh

      1996 bronze badges




      1996 bronze badges




















          2 Answers
          2






          active

          oldest

          votes


















          3












          $begingroup$

          The K-means Error gives you, what is known as, total intra-cluster variance. K-means error



          Intra-cluster variance is the measure of how much are the points in a given cluster spread.



          The following cluster will have high intra-cluster variancesparse cluster



          In the image below, even though the number of points are same as that of the image above, the points are densely distributed and hence will have lower intra-cluster variance.
          enter image description here



          K-means Error is interested in the total of such individual cluster variances.



          Suppose for a given data, if clustering 'A' forms clusters like the first image and clustering 'B' forms clusters like the second image, you will in most cases choose the second one.



          Although this does not mean that the K-means Error is a perfect objective to optimize on to form clusters. But it pretty much catches the essence behind clustering.




          Code used for cluster plot generation -



          import numpy as np
          from matplotlib import pyplot as plt

          sparse_samples = np.random.multivariate_normal([0, 0], [[50000, 0], [0, 50000]], size=(1000))
          plt.plot(sparse_samples[:, 0], sparse_samples[:, 1], 'b+')
          axes = plt.gca()
          axes.set_xlim(-1000, 1000)
          axes.set_ylim(-1000, 1000)
          plt.show()

          dense_samples = np.random.multivariate_normal([0, 0], [[5000, 0], [0, 5000]], size=(1000))
          plt.plot(dense_samples[:, 0], dense_samples[:, 1], 'r+')
          axes = plt.gca()
          axes.set_xlim(-1000, 1000)
          axes.set_ylim(-1000, 1000)
          plt.show()


          In both cases, a 1000 datapoints from a Bivariate Normal Distribution are sampled and plotted . In the second case, the Covariance Matrix is changed to plot a denser cluster. np.random.multivariate_normal's documentation can be found here. Hope this helps!






          share|improve this answer











          $endgroup$












          • $begingroup$
            thanks for your excellent explanation. do you mind post your code for the 2 figures?
            $endgroup$
            – baojieqh
            Jul 7 at 10:25






          • 1




            $begingroup$
            Edited the answer to include the code used for plotting :)
            $endgroup$
            – Yash Jakhotiya
            Jul 7 at 12:25


















          0












          $begingroup$

          The meaning of "error" in k-Means clustering is: how much discrepancy / loss of information would you get if you substitute the k centroids to the actual observations. In other words: how good the k centroids approximate your data.



          There are several ways you can measure this "error". Usually the percentage of variance explained or the within cluster sum of errors are employed, but the choice is huge. Even a more banal Euclidean distance could work.






          share|improve this answer









          $endgroup$















            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "557"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f55179%2fis-there-a-real-life-meaning-about-kmeans-error%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            3












            $begingroup$

            The K-means Error gives you, what is known as, total intra-cluster variance. K-means error



            Intra-cluster variance is the measure of how much are the points in a given cluster spread.



            The following cluster will have high intra-cluster variancesparse cluster



            In the image below, even though the number of points are same as that of the image above, the points are densely distributed and hence will have lower intra-cluster variance.
            enter image description here



            K-means Error is interested in the total of such individual cluster variances.



            Suppose for a given data, if clustering 'A' forms clusters like the first image and clustering 'B' forms clusters like the second image, you will in most cases choose the second one.



            Although this does not mean that the K-means Error is a perfect objective to optimize on to form clusters. But it pretty much catches the essence behind clustering.




            Code used for cluster plot generation -



            import numpy as np
            from matplotlib import pyplot as plt

            sparse_samples = np.random.multivariate_normal([0, 0], [[50000, 0], [0, 50000]], size=(1000))
            plt.plot(sparse_samples[:, 0], sparse_samples[:, 1], 'b+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()

            dense_samples = np.random.multivariate_normal([0, 0], [[5000, 0], [0, 5000]], size=(1000))
            plt.plot(dense_samples[:, 0], dense_samples[:, 1], 'r+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()


            In both cases, a 1000 datapoints from a Bivariate Normal Distribution are sampled and plotted . In the second case, the Covariance Matrix is changed to plot a denser cluster. np.random.multivariate_normal's documentation can be found here. Hope this helps!






            share|improve this answer











            $endgroup$












            • $begingroup$
              thanks for your excellent explanation. do you mind post your code for the 2 figures?
              $endgroup$
              – baojieqh
              Jul 7 at 10:25






            • 1




              $begingroup$
              Edited the answer to include the code used for plotting :)
              $endgroup$
              – Yash Jakhotiya
              Jul 7 at 12:25















            3












            $begingroup$

            The K-means Error gives you, what is known as, total intra-cluster variance. K-means error



            Intra-cluster variance is the measure of how much are the points in a given cluster spread.



            The following cluster will have high intra-cluster variancesparse cluster



            In the image below, even though the number of points are same as that of the image above, the points are densely distributed and hence will have lower intra-cluster variance.
            enter image description here



            K-means Error is interested in the total of such individual cluster variances.



            Suppose for a given data, if clustering 'A' forms clusters like the first image and clustering 'B' forms clusters like the second image, you will in most cases choose the second one.



            Although this does not mean that the K-means Error is a perfect objective to optimize on to form clusters. But it pretty much catches the essence behind clustering.




            Code used for cluster plot generation -



            import numpy as np
            from matplotlib import pyplot as plt

            sparse_samples = np.random.multivariate_normal([0, 0], [[50000, 0], [0, 50000]], size=(1000))
            plt.plot(sparse_samples[:, 0], sparse_samples[:, 1], 'b+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()

            dense_samples = np.random.multivariate_normal([0, 0], [[5000, 0], [0, 5000]], size=(1000))
            plt.plot(dense_samples[:, 0], dense_samples[:, 1], 'r+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()


            In both cases, a 1000 datapoints from a Bivariate Normal Distribution are sampled and plotted . In the second case, the Covariance Matrix is changed to plot a denser cluster. np.random.multivariate_normal's documentation can be found here. Hope this helps!






            share|improve this answer











            $endgroup$












            • $begingroup$
              thanks for your excellent explanation. do you mind post your code for the 2 figures?
              $endgroup$
              – baojieqh
              Jul 7 at 10:25






            • 1




              $begingroup$
              Edited the answer to include the code used for plotting :)
              $endgroup$
              – Yash Jakhotiya
              Jul 7 at 12:25













            3












            3








            3





            $begingroup$

            The K-means Error gives you, what is known as, total intra-cluster variance. K-means error



            Intra-cluster variance is the measure of how much are the points in a given cluster spread.



            The following cluster will have high intra-cluster variancesparse cluster



            In the image below, even though the number of points are same as that of the image above, the points are densely distributed and hence will have lower intra-cluster variance.
            enter image description here



            K-means Error is interested in the total of such individual cluster variances.



            Suppose for a given data, if clustering 'A' forms clusters like the first image and clustering 'B' forms clusters like the second image, you will in most cases choose the second one.



            Although this does not mean that the K-means Error is a perfect objective to optimize on to form clusters. But it pretty much catches the essence behind clustering.




            Code used for cluster plot generation -



            import numpy as np
            from matplotlib import pyplot as plt

            sparse_samples = np.random.multivariate_normal([0, 0], [[50000, 0], [0, 50000]], size=(1000))
            plt.plot(sparse_samples[:, 0], sparse_samples[:, 1], 'b+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()

            dense_samples = np.random.multivariate_normal([0, 0], [[5000, 0], [0, 5000]], size=(1000))
            plt.plot(dense_samples[:, 0], dense_samples[:, 1], 'r+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()


            In both cases, a 1000 datapoints from a Bivariate Normal Distribution are sampled and plotted . In the second case, the Covariance Matrix is changed to plot a denser cluster. np.random.multivariate_normal's documentation can be found here. Hope this helps!






            share|improve this answer











            $endgroup$



            The K-means Error gives you, what is known as, total intra-cluster variance. K-means error



            Intra-cluster variance is the measure of how much are the points in a given cluster spread.



            The following cluster will have high intra-cluster variancesparse cluster



            In the image below, even though the number of points are same as that of the image above, the points are densely distributed and hence will have lower intra-cluster variance.
            enter image description here



            K-means Error is interested in the total of such individual cluster variances.



            Suppose for a given data, if clustering 'A' forms clusters like the first image and clustering 'B' forms clusters like the second image, you will in most cases choose the second one.



            Although this does not mean that the K-means Error is a perfect objective to optimize on to form clusters. But it pretty much catches the essence behind clustering.




            Code used for cluster plot generation -



            import numpy as np
            from matplotlib import pyplot as plt

            sparse_samples = np.random.multivariate_normal([0, 0], [[50000, 0], [0, 50000]], size=(1000))
            plt.plot(sparse_samples[:, 0], sparse_samples[:, 1], 'b+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()

            dense_samples = np.random.multivariate_normal([0, 0], [[5000, 0], [0, 5000]], size=(1000))
            plt.plot(dense_samples[:, 0], dense_samples[:, 1], 'r+')
            axes = plt.gca()
            axes.set_xlim(-1000, 1000)
            axes.set_ylim(-1000, 1000)
            plt.show()


            In both cases, a 1000 datapoints from a Bivariate Normal Distribution are sampled and plotted . In the second case, the Covariance Matrix is changed to plot a denser cluster. np.random.multivariate_normal's documentation can be found here. Hope this helps!







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jul 7 at 12:24

























            answered Jul 6 at 15:37









            Yash JakhotiyaYash Jakhotiya

            965 bronze badges




            965 bronze badges











            • $begingroup$
              thanks for your excellent explanation. do you mind post your code for the 2 figures?
              $endgroup$
              – baojieqh
              Jul 7 at 10:25






            • 1




              $begingroup$
              Edited the answer to include the code used for plotting :)
              $endgroup$
              – Yash Jakhotiya
              Jul 7 at 12:25
















            • $begingroup$
              thanks for your excellent explanation. do you mind post your code for the 2 figures?
              $endgroup$
              – baojieqh
              Jul 7 at 10:25






            • 1




              $begingroup$
              Edited the answer to include the code used for plotting :)
              $endgroup$
              – Yash Jakhotiya
              Jul 7 at 12:25















            $begingroup$
            thanks for your excellent explanation. do you mind post your code for the 2 figures?
            $endgroup$
            – baojieqh
            Jul 7 at 10:25




            $begingroup$
            thanks for your excellent explanation. do you mind post your code for the 2 figures?
            $endgroup$
            – baojieqh
            Jul 7 at 10:25




            1




            1




            $begingroup$
            Edited the answer to include the code used for plotting :)
            $endgroup$
            – Yash Jakhotiya
            Jul 7 at 12:25




            $begingroup$
            Edited the answer to include the code used for plotting :)
            $endgroup$
            – Yash Jakhotiya
            Jul 7 at 12:25













            0












            $begingroup$

            The meaning of "error" in k-Means clustering is: how much discrepancy / loss of information would you get if you substitute the k centroids to the actual observations. In other words: how good the k centroids approximate your data.



            There are several ways you can measure this "error". Usually the percentage of variance explained or the within cluster sum of errors are employed, but the choice is huge. Even a more banal Euclidean distance could work.






            share|improve this answer









            $endgroup$

















              0












              $begingroup$

              The meaning of "error" in k-Means clustering is: how much discrepancy / loss of information would you get if you substitute the k centroids to the actual observations. In other words: how good the k centroids approximate your data.



              There are several ways you can measure this "error". Usually the percentage of variance explained or the within cluster sum of errors are employed, but the choice is huge. Even a more banal Euclidean distance could work.






              share|improve this answer









              $endgroup$















                0












                0








                0





                $begingroup$

                The meaning of "error" in k-Means clustering is: how much discrepancy / loss of information would you get if you substitute the k centroids to the actual observations. In other words: how good the k centroids approximate your data.



                There are several ways you can measure this "error". Usually the percentage of variance explained or the within cluster sum of errors are employed, but the choice is huge. Even a more banal Euclidean distance could work.






                share|improve this answer









                $endgroup$



                The meaning of "error" in k-Means clustering is: how much discrepancy / loss of information would you get if you substitute the k centroids to the actual observations. In other words: how good the k centroids approximate your data.



                There are several ways you can measure this "error". Usually the percentage of variance explained or the within cluster sum of errors are employed, but the choice is huge. Even a more banal Euclidean distance could work.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jul 6 at 15:35









                LeevoLeevo

                8701 silver badge14 bronze badges




                8701 silver badge14 bronze badges



























                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Data Science Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f55179%2fis-there-a-real-life-meaning-about-kmeans-error%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Get product attribute by attribute group code in magento 2get product attribute by product attribute group in magento 2Magento 2 Log Bundle Product Data in List Page?How to get all product attribute of a attribute group of Default attribute set?Magento 2.1 Create a filter in the product grid by new attributeMagento 2 : Get Product Attribute values By GroupMagento 2 How to get all existing values for one attributeMagento 2 get custom attribute of a single product inside a pluginMagento 2.3 How to get all the Multi Source Inventory (MSI) locations collection in custom module?Magento2: how to develop rest API to get new productsGet product attribute by attribute group code ( [attribute_group_code] ) in magento 2

                    Category:9 (number) SubcategoriesMedia in category "9 (number)"Navigation menuUpload mediaGND ID: 4485639-8Library of Congress authority ID: sh85091979ReasonatorScholiaStatistics

                    Magento 2.3: How do i solve this, Not registered handle, on custom form?How can i rewrite TierPrice Block in Magento2magento 2 captcha not rendering if I override layout xmlmain.CRITICAL: Plugin class doesn't existMagento 2 : Problem while adding custom button order view page?Magento 2.2.5: Overriding Admin Controller sales/orderMagento 2.2.5: Add, Update and Delete existing products Custom OptionsMagento 2.3 : File Upload issue in UI Component FormMagento2 Not registered handleHow to configured Form Builder Js in my custom magento 2.3.0 module?Magento 2.3. How to create image upload field in an admin form