What is an entropy graph?
















I am new to reversing, and I see that the tool Detect It Easy has a feature called Entropy. What is it used for?











Tags: entropy






asked Jun 27 at 2:21 by Suman Mandal
edited Jun 27 at 11:56 by 0xC0000022L




















          5 Answers

































          it has a feature called Entropy. I want to know what it is used for?




For our purposes, entropy can be thought of as information density, or as a measure of the randomness of information, which is what makes it useful in the context of reverse engineering and binary analysis.



Compressed and encrypted data have higher entropy than, e.g., code or text data. In fact, compressed and encrypted data have close to the maximum possible level of entropy, which can be used as a heuristic to distinguish them from non-compressed/non-encrypted data.



          Example use cases in reverse engineering:



• Malware Analysis - Suppose we have an executable whose header can be parsed successfully and which loads and runs without error, but the overall entropy level of the file is very high and the code can't be analyzed statically because the data outside of the file header and program headers looks random (hence the high entropy). This probably means that the executable is compressed on disk and is decompressed at runtime. Executable compression complicates analysis, so it is a relatively common feature of programs developed for criminal purposes. If we want to analyze the code, its decompressed form needs to be recovered somehow.


• Firmware Analysis - In systems with relatively severe hardware constraints, such as embedded systems, firmware updates are often delivered in compressed form in order to save space. To analyse the firmware, it first needs to be determined whether it is encrypted or compressed, and one way to do this is to perform an entropy analysis of the file. If the entropy is very high, it is a good sign that the file is indeed compressed or encrypted, and it must first be decompressed/decrypted before analysis of the actual firmware can proceed. If we have a block of data with very high entropy (i.e. close to random), it makes no sense to try to treat it as code and disassemble it, because the results will be meaningless nonsense.


• File Type Identification - Some file types can be identified on the basis of their overall entropy. For example, we can usually differentiate between image files (PNG, JPEG, etc.) and compiled binaries (ELF, PE), because image files consist of compressed data and therefore (generally) have much higher entropy than compiled binaries.


          Besides "Detect It Easy", tools such as binwalk, ent and binvis.io can assist with calculating file entropy. You can also build your own tools that do this.






answered Jun 27 at 14:10 by julian














Entropy is interpreted as the degree of disorder or randomness:

a high entropy means a highly disordered set of data;

a low entropy means an ordered set of data.

To address the comments: "order" here does not mean an 'a'-following-'a' kind of order; it is to be interpreted as the random / non-random state of certain data.

"aaaabbbbccccdddd" or "abcdabcdabcdabcd" or "adbcadbcadbcadbc" is a repetitive string whose entropy will be greater than that of
"aaaaaaaabbbbcccd" or any shuffled representation of that string.

In the first string and its shuffled clones, all 4 characters have equal probability: 4/16 = 1/4 = 25%.

In the second string, the char 'a' (8/16), i.e. half of the data set, has the highest probability,

while 'd' (1/16) has the least, a very minuscule probability.

Entropy is a thermodynamic concept that was introduced to digital science (information theory) as a means to calculate how random a set of data is.

Simply put, maximally compressed data will have the highest entropy:

all 256 possible byte values will occur with equal frequency,

i.e. if 0x00 was seen 10 times in a blob, then 0x10 or 0x80 or 0xff will each also be seen 10 times in the same blob.

That is, the blob will be a repeated sequence comprising all byte values in 0x00..0xff,

while a low-entropy blob will be a repeated sequence comprising only a certain byte like 0x00 or 0x55, or two bytes like 0x0d 0x0a or 0x22 0x2e, etc., or any series drawn from fewer than all 256 possible byte values.



            taking an algo from here and modifying it a little



import math
from collections import Counter

# log bases for the different entropy units
base = {
    'shannon': 2.,
    'natural': math.exp(1),
    'hartley': 10.,
    'somrand': 256.
}

def eta(data, unit):
    # entropy of `data` using the log base named by `unit`
    if len(data) <= 1:
        return 0
    counts = Counter()
    for d in data:
        counts[d] += 1
    ent = 0
    probs = [float(c) / len(data) for c in counts.values()]
    for p in probs:
        if p > 0.:
            ent -= p * math.log(p, base[unit])
    return ent

hes = "abcde\x80\x90\xff\xfe\xde"   # high-entropy string: 10 distinct chars
les = "aaaaa\x61\x61\x61\x61\x61"   # low-entropy string: '\x61' is just 'a'
print("=" * 103)
print(" type            ent for hes          hes                             ent for les   les")
print("=" * 103)
for i in base:
    for j in range(1, 4, 1):
        print(i, ' ', eta(j * hes, i), '\t', (hes * j + (30 - j * 10) * " "), ' ', eta(j * les, i), '\t', ("%s" % les * j))


you can see 'abcde\x80...' is high entropy while 'aaaaa\x61...' is low entropy:



            :>python foo.py
            =======================================================================================================
            type ent for hes hes ent for les les
            =======================================================================================================
            shannon 3.321928094887362 abcdeÿþÞ 0.0 aaaaaaaaaa
            shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
            shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
            natural 2.3025850929940455 abcdeÿþÞ 0.0 aaaaaaaaaa
            natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
            natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
            hartley 0.9999999999999998 abcdeÿþÞ 0.0 aaaaaaaaaa
            hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
            hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
            somrand 0.4152410118609203 abcdeÿþÞ 0.0 aaaaaaaaaa
            somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
            somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa





answered Jun 27 at 7:19 by blabb (edited Jun 27 at 17:59)




















            • 3





              "a high entropy means a highly disordered set of data a low entropy means an ordered set of data" <- This is a false statement. Order is not relevant, because entropy is calculated over a distribution where each value in that distribution has a probability associated with it. Compressed and encrypted data have high entropy because the probability associated with each byte value in the distribution is roughly equal (the distribution of byte values in the data is close to uniform), not because of the order the byte values appear in the bytestream.

              – julian
              Jun 27 at 13:09











            • "where all the 255 possible bytes will have equal frequencies" <- You probably meant "where all byte values between 0 and 255 (256 total) have an equal probability in the overall distribution" (the frequency of each byte value between 0-255 is the same in the distribution).

              – julian
              Jun 27 at 13:12






            • 1





@julian what do you mean by order? Like 'a follows a', 'b => b' kind of order? Order in my answer does not mean sorted / sequential / non-sequential data; I meant it as an orderly / non-random state. The most random repetitive data, where the count of each value tends to be equal, has the highest entropy. It may be a military-type ordered / sorted set like aaaabbbbccccdddd (4 each of [a,b,c,d]), but this will tend to have an entropy greater than aaaaaaaabbbbbccd (8[a], 5[b], 2[c], 1[d]). Here is a theory link using hard technical words: en.wikipedia.org/wiki/Entropy_(information_theory)

              – blabb
              Jun 27 at 16:47











            • You said “ordered set”, so maybe I misunderstood your meaning. Anyway, your main point about the relationship between entropy and randomness is correct.

              – julian
              Jun 27 at 16:54











• @julian: What are you saying??? I do think that blabb is correct. You seem to be stuck on only one definition of entropy, but there are two... which are both perfectly valid. So, stop yelling at everyone, please.

              – perror
              Jun 28 at 7:50

































Just to add a (small) piece of information to @blabb's and @Johann Aydinbas's answers, here is a quote from the Practical Malware Analysis book regarding your question:




            Packed executables can also be detected via a technique known as entropy
            calculation. Entropy is a measure of the disorder in a system or program [...]



            Compressed or encrypted data more closely resembles random data,
            and therefore has high entropy; executables that are not encrypted or compressed have lower entropy. Automated tools for detecting packed programs often use heuristics like entropy.




You can find additional information here, under the "Increased entropy" header.
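As a sketch of the heuristic the quote describes (the threshold value here is an illustrative assumption, not a standard constant or any specific tool's setting), a packed-executable check can be as simple as:

import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 8.0 means uniformly random bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    """Compressed/encrypted bodies resemble random data, so very high
    overall entropy suggests a packed executable. The 7.2 bits/byte
    threshold is an assumption chosen for illustration."""
    return byte_entropy(data) > threshold

# Example usage (the file name "sample.exe" is hypothetical):
with open("sample.exe", "rb") as f:
    body = f.read()
print(f"{byte_entropy(body):.2f} bits/byte -> packed? {looks_packed(body)}")

Real detectors usually combine a signal like this with others (section names, import table size, known packer signatures) rather than relying on entropy alone.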




















Shannon entropy comes from information theory. It is a measure of the degree of randomness of text; for example, a string with greater Shannon entropy makes a stronger password. Principally, the Shannon entropy equation provides a way to predict the average minimum number of bits required to encode a string of symbols, based on the frequency of the symbols.



The formula for base 2 is:

H(X) = -Σᵢ p(xᵢ) · log₂ p(xᵢ)

where p(xᵢ) is the relative frequency of symbol xᵢ in the string.



Note that the base represents the number of possible characters; base 2 can be replaced by any base, as can be seen in this code, where it is replaced by 255.
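As a quick worked example of the base-2 formula (this example is mine, for illustration): in the string "aabb", p('a') = p('b') = 1/2, so H = -(1/2 · log₂(1/2) + 1/2 · log₂(1/2)) = 1 bit per symbol, and encoding the 4 symbols needs at least about 4 bits. By contrast, "aaaa" has p('a') = 1 and H = 0: it carries no information beyond its length.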



This link has one of the simplest implementations of the algorithm, used to calculate the entropy of novels and religious books. It tells us a lot: for example, all the human-generated books have a nearly identical degree of fluctuation in disorder, which is a good feature of the data.
This is the link to the code mentioned above:
Information Entropy of different Books




















First, you have to know that the term entropy is used to refer to two different concepts. They are somehow related if you think about it twice, but since the relationship is really not obvious at first sight, you should prefer to consider these two as different concepts.



Defining Entropy?



                The entropy that you want to know about can be defined as the amount of order, disorder, or chaos in a thermodynamic system.



                On the other hand, the other entropy is coming from information theory and can be seen as a measure of the amount of information that can be stored in a system.



Why is it useful in RE?



An entropy graph (which plots the amount of disorder along the file) can be useful to detect the parts of a file that come close to random data. It allows you to detect the parts that have been encrypted/compressed and the parts that appear to have been left untouched.



Indeed, a high level of disorder in the data is exactly what you want to achieve when encrypting it. And, as I said, the two definitions of entropy are related: if you store a lot of information in a minimum of bytes, the result appears to have a high level of disorder, and the same is true of compression...



That is why we use entropy graphs of files: to be able to distinguish raw parts from encrypted/compressed sub-parts without any prior knowledge of the file format.



                An Example



For example, here is an entropy graph from the tool binwalk, taken from another question on this site:



[Binwalk entropy graph]



Directly from this graph we can see that there is a first part that appears to be raw (probably asm opcodes, judging by the shape of the curve), then a part which is most likely encrypted (compression does not usually reach an entropy of 1 with such regularity), and finally padding consisting of a single repeated byte (e.g. 0x00 or 0xff).
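As a rough sketch of how such a graph can be read programmatically, the same per-window Shannon entropy shown earlier can be bucketed into the three regions described here. The window size and thresholds are illustrative assumptions, not values used by binwalk:

import math
from collections import Counter

def normalized_entropy(block: bytes) -> float:
    """Shannon entropy scaled to 0.0..1.0 (1.0 = uniformly random bytes)."""
    if not block:
        return 0.0
    n = len(block)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(block).values())
    return h / 8.0

def label(block: bytes) -> str:
    """Illustrative thresholds only; real files need eyeballing."""
    e = normalized_entropy(block)
    if e < 0.1:
        return "padding (repeated byte)"
    if e < 0.9:
        return "raw data / code"
    return "compressed or encrypted"

# Example usage (the file name "firmware.bin" is hypothetical):
with open("firmware.bin", "rb") as f:
    data = f.read()
for off in range(0, len(data), 1024):
    print(f"{off:08x}  {label(data[off:off + 1024])}")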


























                • 2





                  "The entropy that you want to know about can be defined as the amount of order, disorder, or chaos in a thermodynamic system." <- This is a false statement. Software is not a thermodynamic system and does not possesses the property "energy" (heat). Software consists of encoded information, therefore to measure its properties - such as its Shannon entropy - the tools provided by information theory are appropriate.

                  – julian
                  Jun 27 at 12:02






                • 2





                  If you don't believe me, please examine the code that was used to generate the entropy plot in your post. Binwalk calculates information entropy level in terms of either zlib compression ratio or Shannon entropy.

                  – julian
                  Jun 27 at 12:05







                • 1





                  "Your definition of entropy is perfectly valid if you are considering information theory. But, unfortunately, the entropy referred here is much likely the one coming from thermodynamics (i.e. the degree of disorder)." <- The meaning seems quite clear.

                  – julian
                  Jun 27 at 13:14






                • 1





                  it’s nothing personal. It’s just not correct.

                  – julian
                  Jun 27 at 16:08






                • 1





                  It looks personal, just by the way you are harassing me with that.

                  – perror
                  Jun 27 at 16:09















                5 Answers
                5






                active

                oldest

                votes








                5 Answers
                5






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                2















                it has a feature called Entropy. I want to know what it is used for?




                For our purposes, entropy can be though of as information density or as a measure of randomness in information, which is what makes it useful in the context of reverse engineering and binary analysis.



                Compressed and encrypted data have higher entropy than e.g. code or text data. In fact, compressed and encrypted data have close to the maximum possible level of entropy, which can be used as a heuristic to identify it as such in order to differentiate it from non-compressed/non-encrypted data.



                Example use cases in reverse engineering:



                • Malware Analysis - If we have an executable which has a header that can be parsed successfully and the program loads and runs without error, but the overall entropy level of the file is very high and the code can't be analyzed statically because the data outside of the file header and program headers looks random (hence the high entropy), it probably means that the executable is in fact compressed on disk and is decompressed at runtime. Executable compression complicates analysis, so it is a relatively common feature of programs developed for criminal purposes. If we want to analyze the code, its decompressed form need to be recovered somehow.


                • Firmware Analysis - In systems with relatively severe hardware constraints, such as embedded systems, firmware updates are often delivered in compressed form in order to save space. In order to analyse the firmware, it first needs to be determined whether it is encrypted or compressed. One way to determine this is through performing an entropy analysis of the file. If the entropy is very high, it is a good sign that the file is indeed compressed or encrypted. To proceed with analysis of the actual firmware, it must first be decompressed/decrypted. If we have a block of data with very high entropy (i.e. close to random), it makes no sense to try to treat it as code and disassemble it, because the results will be meaningless nonsense.


                • File Type Identification - Some file types can be identified on the basis of their overall entropy. For example, we can usually differentiate between image files (png, jpeg, etc) and compiled binaries (ELF, PE) because image files consist of compressed data and therefore (generally) have much higher entropy than compiled binaries.


                Besides "Detect It Easy", tools such as binwalk, ent and binvis.io can assist with calculating file entropy. You can also build your own tools that do this.






                share|improve this answer



























                  2















                  it has a feature called Entropy. I want to know what it is used for?




                  For our purposes, entropy can be though of as information density or as a measure of randomness in information, which is what makes it useful in the context of reverse engineering and binary analysis.



                  Compressed and encrypted data have higher entropy than e.g. code or text data. In fact, compressed and encrypted data have close to the maximum possible level of entropy, which can be used as a heuristic to identify it as such in order to differentiate it from non-compressed/non-encrypted data.



                  Example use cases in reverse engineering:



                  • Malware Analysis - If we have an executable which has a header that can be parsed successfully and the program loads and runs without error, but the overall entropy level of the file is very high and the code can't be analyzed statically because the data outside of the file header and program headers looks random (hence the high entropy), it probably means that the executable is in fact compressed on disk and is decompressed at runtime. Executable compression complicates analysis, so it is a relatively common feature of programs developed for criminal purposes. If we want to analyze the code, its decompressed form need to be recovered somehow.


                  • Firmware Analysis - In systems with relatively severe hardware constraints, such as embedded systems, firmware updates are often delivered in compressed form in order to save space. In order to analyse the firmware, it first needs to be determined whether it is encrypted or compressed. One way to determine this is through performing an entropy analysis of the file. If the entropy is very high, it is a good sign that the file is indeed compressed or encrypted. To proceed with analysis of the actual firmware, it must first be decompressed/decrypted. If we have a block of data with very high entropy (i.e. close to random), it makes no sense to try to treat it as code and disassemble it, because the results will be meaningless nonsense.


                  • File Type Identification - Some file types can be identified on the basis of their overall entropy. For example, we can usually differentiate between image files (png, jpeg, etc) and compiled binaries (ELF, PE) because image files consist of compressed data and therefore (generally) have much higher entropy than compiled binaries.


                  Besides "Detect It Easy", tools such as binwalk, ent and binvis.io can assist with calculating file entropy. You can also build your own tools that do this.






                  share|improve this answer

























                    2












                    2








                    2








                    it has a feature called Entropy. I want to know what it is used for?




                    For our purposes, entropy can be though of as information density or as a measure of randomness in information, which is what makes it useful in the context of reverse engineering and binary analysis.



                    Compressed and encrypted data have higher entropy than e.g. code or text data. In fact, compressed and encrypted data have close to the maximum possible level of entropy, which can be used as a heuristic to identify it as such in order to differentiate it from non-compressed/non-encrypted data.



                    Example use cases in reverse engineering:



                    • Malware Analysis - If we have an executable which has a header that can be parsed successfully and the program loads and runs without error, but the overall entropy level of the file is very high and the code can't be analyzed statically because the data outside of the file header and program headers looks random (hence the high entropy), it probably means that the executable is in fact compressed on disk and is decompressed at runtime. Executable compression complicates analysis, so it is a relatively common feature of programs developed for criminal purposes. If we want to analyze the code, its decompressed form need to be recovered somehow.


                    • Firmware Analysis - In systems with relatively severe hardware constraints, such as embedded systems, firmware updates are often delivered in compressed form in order to save space. In order to analyse the firmware, it first needs to be determined whether it is encrypted or compressed. One way to determine this is through performing an entropy analysis of the file. If the entropy is very high, it is a good sign that the file is indeed compressed or encrypted. To proceed with analysis of the actual firmware, it must first be decompressed/decrypted. If we have a block of data with very high entropy (i.e. close to random), it makes no sense to try to treat it as code and disassemble it, because the results will be meaningless nonsense.


                    • File Type Identification - Some file types can be identified on the basis of their overall entropy. For example, we can usually differentiate between image files (png, jpeg, etc) and compiled binaries (ELF, PE) because image files consist of compressed data and therefore (generally) have much higher entropy than compiled binaries.


                    Besides "Detect It Easy", tools such as binwalk, ent and binvis.io can assist with calculating file entropy. You can also build your own tools that do this.






                    share|improve this answer














                    it has a feature called Entropy. I want to know what it is used for?




                    For our purposes, entropy can be though of as information density or as a measure of randomness in information, which is what makes it useful in the context of reverse engineering and binary analysis.



                    Compressed and encrypted data have higher entropy than e.g. code or text data. In fact, compressed and encrypted data have close to the maximum possible level of entropy, which can be used as a heuristic to identify it as such in order to differentiate it from non-compressed/non-encrypted data.



                    Example use cases in reverse engineering:



                    • Malware Analysis - If we have an executable which has a header that can be parsed successfully and the program loads and runs without error, but the overall entropy level of the file is very high and the code can't be analyzed statically because the data outside of the file header and program headers looks random (hence the high entropy), it probably means that the executable is in fact compressed on disk and is decompressed at runtime. Executable compression complicates analysis, so it is a relatively common feature of programs developed for criminal purposes. If we want to analyze the code, its decompressed form need to be recovered somehow.


                    • Firmware Analysis - In systems with relatively severe hardware constraints, such as embedded systems, firmware updates are often delivered in compressed form in order to save space. In order to analyse the firmware, it first needs to be determined whether it is encrypted or compressed. One way to determine this is through performing an entropy analysis of the file. If the entropy is very high, it is a good sign that the file is indeed compressed or encrypted. To proceed with analysis of the actual firmware, it must first be decompressed/decrypted. If we have a block of data with very high entropy (i.e. close to random), it makes no sense to try to treat it as code and disassemble it, because the results will be meaningless nonsense.


                    • File Type Identification - Some file types can be identified on the basis of their overall entropy. For example, we can usually differentiate between image files (png, jpeg, etc) and compiled binaries (ELF, PE) because image files consist of compressed data and therefore (generally) have much higher entropy than compiled binaries.


                    Besides "Detect It Easy", tools such as binwalk, ent and binvis.io can assist with calculating file entropy. You can also build your own tools that do this.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Jun 27 at 14:10









                    julianjulian

                    4,4092 gold badges11 silver badges42 bronze badges




                    4,4092 gold badges11 silver badges42 bronze badges





















                        5














                        Entropy is interpreted as the Degree of Disorder or Randomness



                        a high entropy means a highly disordered set of data



                        a low entropy means an ordered set of data



                        to address the comments

                        order here does not mean 'a' following 'a' kind of order it is to be interpreted as random / non random state of certain data



                        aaaabbbbccccdddd or "abcdabcdabcdabcd" or "adbcadbcadbcadbc" is a repetitive string whose entropy will be greater than
                        aaaaaaaabbbbcccd or any shuffled representation of this string



                        in the first string and its shuffled clones all have 4 chars with equal probability 4/16 or 1/4 or 25%

                        but in the second string char 'a' (8/16 ) or half of the data set has the highest probability

                        while 'c' (1/16) has the least or a very minuscule probability



                        entropy is a thermodynamic concept that was introduced to digital science (information theory)
                        as a means to calculate how random a set of data is



                        simply put the highest compressed data will have the highest entropy



                        where all the 255 possible bytes will have equal frequencies



                        ie if 0x00 was seen 10 times in a blob
                        0x10 or 0x80 or 0xff will all be seen 10 times in the same blob



                        that is the blob will be a repeated sequence comprising of all bytes between of 0x0..0xff



                        while a low entropy blob will have a repeated sequence comprising only of a certain byte like 0x00 0r 0x55 or two bytes 0x0d0a ox222e etc or any series one less than 255 possible byte sequences



                        taking an algo from here and modifying it a little



                        import math
                        from collections import Counter
                        base =
                        'shannon' : 2.,
                        'natural' : math.exp(1),
                        'hartley' : 10.,
                        'somrand' : 256.

                        def eta(data, unit):
                        if len(data) <= 1:
                        return 0
                        counts = Counter()
                        for d in data:
                        counts[d] += 1
                        ent = 0
                        probs = [float(c) / len(data) for c in counts.values()]
                        for p in probs:
                        if p > 0.:
                        ent -= p * math.log(p, base[unit])
                        return ent
                        hes = "abcdex80x90xffxfexde"
                        les = "aaaaax61x61x61x61x61"
                        print ("=======================================================================================================")
                        print (" type ent for hes hes ent for les les")
                        print ("=======================================================================================================")
                        for i in base:
                        for j in range(1,4,1):
                        print (i ,' ', eta( j*hes,i) , 't', (hes*j + (30 -j *10) *" " ) , ' ' , eta (j*les , i) ,'t', ("%s" % les*j ))


                        you can see 'abcdex80.....' is high entropy while 'aaaaax61...' is low entropy



                        :>python foo.py
                        =======================================================================================================
                        type ent for hes hes ent for les les
                        =======================================================================================================
                        shannon 3.321928094887362 abcdeÿþÞ 0.0 aaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞ 0.0 aaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞ 0.0 aaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞ 0.0 aaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa





                        share|improve this answer




















                        • 3





                          "a high entropy means a highly disordered set of data a low entropy means an ordered set of data" <- This is a false statement. Order is not relevant, because entropy is calculated over a distribution where each value in that distribution has a probability associated with it. Compressed and encrypted data have high entropy because the probability associated with each byte value in the distribution is roughly equal (the distribution of byte values in the data is close to uniform), not because of the order the byte values appear in the bytestream.

                          – julian
                          Jun 27 at 13:09











                        • "where all the 255 possible bytes will have equal frequencies" <- You probably meant "where all byte values between 0 and 255 (256 total) have an equal probability in the overall distribution" (the frequency of each byte value between 0-255 is the same in the distribution).

                          – julian
                          Jun 27 at 13:12






                        • 1





                          @julian what do you mean by order ? like a follows a b => b kind of order ? order in my answer does not mean a sorted / sequential / non sequential data i meant it as in an orderly / non random state the most random repetitive data where the count of each value tends to be equal has the highest entropy it may be a (military type ordered / sorted set like aaaabbbbccccdddd 4[a,b,c,d] but this will tend to have an entropy greater than aaaaaaaabbbbbccd 8[a],4[b],2[c],1[d] here is a theory link using hard technical words en.wikipedia.org/wiki/Entropy_(information_theory)

                          – blabb
                          Jun 27 at 16:47











                        • You said “ordered set”, so maybe I misunderstood your meaning. Anyway, your main point about the relationship between entropy and randomness is correct.

                          – julian
                          Jun 27 at 16:54











                        • @julian: What are you saying ??? I do think that blabb is correct. You seems to be stuck on only one definition of entropy, but they are two... which are both perfectly valid. So, stop yelling at everyone please.

                          – perror
                          Jun 28 at 7:50
















                        5














                        Entropy is interpreted as the Degree of Disorder or Randomness



                        a high entropy means a highly disordered set of data



                        a low entropy means an ordered set of data



                        to address the comments

                        order here does not mean 'a' following 'a' kind of order it is to be interpreted as random / non random state of certain data



                        aaaabbbbccccdddd or "abcdabcdabcdabcd" or "adbcadbcadbcadbc" is a repetitive string whose entropy will be greater than
                        aaaaaaaabbbbcccd or any shuffled representation of this string



                        in the first string and its shuffled clones all have 4 chars with equal probability 4/16 or 1/4 or 25%

                        but in the second string char 'a' (8/16 ) or half of the data set has the highest probability

                        while 'c' (1/16) has the least or a very minuscule probability



                        entropy is a thermodynamic concept that was introduced to digital science (information theory)
                        as a means to calculate how random a set of data is



                        simply put the highest compressed data will have the highest entropy



                        where all the 255 possible bytes will have equal frequencies



                        ie if 0x00 was seen 10 times in a blob
                        0x10 or 0x80 or 0xff will all be seen 10 times in the same blob



                        that is the blob will be a repeated sequence comprising of all bytes between of 0x0..0xff



                        while a low entropy blob will have a repeated sequence comprising only of a certain byte like 0x00 0r 0x55 or two bytes 0x0d0a ox222e etc or any series one less than 255 possible byte sequences



                        taking an algo from here and modifying it a little



                        import math
                        from collections import Counter
                        base =
                        'shannon' : 2.,
                        'natural' : math.exp(1),
                        'hartley' : 10.,
                        'somrand' : 256.

                        def eta(data, unit):
                        if len(data) <= 1:
                        return 0
                        counts = Counter()
                        for d in data:
                        counts[d] += 1
                        ent = 0
                        probs = [float(c) / len(data) for c in counts.values()]
                        for p in probs:
                        if p > 0.:
                        ent -= p * math.log(p, base[unit])
                        return ent
                        hes = "abcdex80x90xffxfexde"
                        les = "aaaaax61x61x61x61x61"
                        print ("=======================================================================================================")
                        print (" type ent for hes hes ent for les les")
                        print ("=======================================================================================================")
                        for i in base:
                        for j in range(1,4,1):
                        print (i ,' ', eta( j*hes,i) , 't', (hes*j + (30 -j *10) *" " ) , ' ' , eta (j*les , i) ,'t', ("%s" % les*j ))


                        you can see 'abcdex80.....' is high entropy while 'aaaaax61...' is low entropy



                        :>python foo.py
                        =======================================================================================================
                        type ent for hes hes ent for les les
                        =======================================================================================================
                        shannon 3.321928094887362 abcdeÿþÞ 0.0 aaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞ 0.0 aaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞ 0.0 aaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞ 0.0 aaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa





                        share|improve this answer




















                        • 3





                          "a high entropy means a highly disordered set of data a low entropy means an ordered set of data" <- This is a false statement. Order is not relevant, because entropy is calculated over a distribution where each value in that distribution has a probability associated with it. Compressed and encrypted data have high entropy because the probability associated with each byte value in the distribution is roughly equal (the distribution of byte values in the data is close to uniform), not because of the order the byte values appear in the bytestream.

                          – julian
                          Jun 27 at 13:09











                        • "where all the 255 possible bytes will have equal frequencies" <- You probably meant "where all byte values between 0 and 255 (256 total) have an equal probability in the overall distribution" (the frequency of each byte value between 0-255 is the same in the distribution).

                          – julian
                          Jun 27 at 13:12






                        • 1





                          @julian what do you mean by order ? like a follows a b => b kind of order ? order in my answer does not mean a sorted / sequential / non sequential data i meant it as in an orderly / non random state the most random repetitive data where the count of each value tends to be equal has the highest entropy it may be a (military type ordered / sorted set like aaaabbbbccccdddd 4[a,b,c,d] but this will tend to have an entropy greater than aaaaaaaabbbbbccd 8[a],4[b],2[c],1[d] here is a theory link using hard technical words en.wikipedia.org/wiki/Entropy_(information_theory)

                          – blabb
                          Jun 27 at 16:47











                        • You said “ordered set”, so maybe I misunderstood your meaning. Anyway, your main point about the relationship between entropy and randomness is correct.

                          – julian
                          Jun 27 at 16:54











                        • @julian: What are you saying ??? I do think that blabb is correct. You seems to be stuck on only one definition of entropy, but they are two... which are both perfectly valid. So, stop yelling at everyone please.

                          – perror
                          Jun 28 at 7:50














                        5












                        5








                        5







                        Entropy is interpreted as the Degree of Disorder or Randomness



                        a high entropy means a highly disordered set of data



                        a low entropy means an ordered set of data



                        to address the comments

                        order here does not mean 'a' following 'a' kind of order it is to be interpreted as random / non random state of certain data



                        aaaabbbbccccdddd or "abcdabcdabcdabcd" or "adbcadbcadbcadbc" is a repetitive string whose entropy will be greater than
                        aaaaaaaabbbbcccd or any shuffled representation of this string



                        in the first string and its shuffled clones all have 4 chars with equal probability 4/16 or 1/4 or 25%

                        but in the second string char 'a' (8/16 ) or half of the data set has the highest probability

                        while 'c' (1/16) has the least or a very minuscule probability



                        entropy is a thermodynamic concept that was introduced to digital science (information theory)
                        as a means to calculate how random a set of data is



                        simply put the highest compressed data will have the highest entropy



                        where all the 255 possible bytes will have equal frequencies



                        ie if 0x00 was seen 10 times in a blob
                        0x10 or 0x80 or 0xff will all be seen 10 times in the same blob



                        that is the blob will be a repeated sequence comprising of all bytes between of 0x0..0xff



                        while a low entropy blob will have a repeated sequence comprising only of a certain byte like 0x00 0r 0x55 or two bytes 0x0d0a ox222e etc or any series one less than 255 possible byte sequences



                        taking an algo from here and modifying it a little



                        import math
                        from collections import Counter
                        base =
                        'shannon' : 2.,
                        'natural' : math.exp(1),
                        'hartley' : 10.,
                        'somrand' : 256.

                        def eta(data, unit):
                        if len(data) <= 1:
                        return 0
                        counts = Counter()
                        for d in data:
                        counts[d] += 1
                        ent = 0
                        probs = [float(c) / len(data) for c in counts.values()]
                        for p in probs:
                        if p > 0.:
                        ent -= p * math.log(p, base[unit])
                        return ent
                        hes = "abcdex80x90xffxfexde"
                        les = "aaaaax61x61x61x61x61"
                        print ("=======================================================================================================")
                        print (" type ent for hes hes ent for les les")
                        print ("=======================================================================================================")
                        for i in base:
                        for j in range(1,4,1):
                        print (i ,' ', eta( j*hes,i) , 't', (hes*j + (30 -j *10) *" " ) , ' ' , eta (j*les , i) ,'t', ("%s" % les*j ))


                        you can see 'abcdex80.....' is high entropy while 'aaaaax61...' is low entropy



                        :>python foo.py
                        =======================================================================================================
                        type ent for hes hes ent for les les
                        =======================================================================================================
                        shannon 3.321928094887362 abcdeÿþÞ 0.0 aaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞ 0.0 aaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞ 0.0 aaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞ 0.0 aaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa





                        share|improve this answer















                        Entropy is interpreted as the Degree of Disorder or Randomness



                        a high entropy means a highly disordered set of data



                        a low entropy means an ordered set of data



                        to address the comments

                        order here does not mean 'a' following 'a' kind of order it is to be interpreted as random / non random state of certain data



                        aaaabbbbccccdddd or "abcdabcdabcdabcd" or "adbcadbcadbcadbc" is a repetitive string whose entropy will be greater than
                        aaaaaaaabbbbcccd or any shuffled representation of this string



                        in the first string and its shuffled clones all have 4 chars with equal probability 4/16 or 1/4 or 25%

                        but in the second string char 'a' (8/16 ) or half of the data set has the highest probability

                        while 'c' (1/16) has the least or a very minuscule probability



                        entropy is a thermodynamic concept that was introduced to digital science (information theory)
                        as a means to calculate how random a set of data is



                        simply put the highest compressed data will have the highest entropy



                        where all the 255 possible bytes will have equal frequencies



                        ie if 0x00 was seen 10 times in a blob
                        0x10 or 0x80 or 0xff will all be seen 10 times in the same blob



                        that is the blob will be a repeated sequence comprising of all bytes between of 0x0..0xff



                        while a low entropy blob will have a repeated sequence comprising only of a certain byte like 0x00 0r 0x55 or two bytes 0x0d0a ox222e etc or any series one less than 255 possible byte sequences



                        taking an algo from here and modifying it a little



                        import math
                        from collections import Counter
                        base =
                        'shannon' : 2.,
                        'natural' : math.exp(1),
                        'hartley' : 10.,
                        'somrand' : 256.

                        def eta(data, unit):
                        if len(data) <= 1:
                        return 0
                        counts = Counter()
                        for d in data:
                        counts[d] += 1
                        ent = 0
                        probs = [float(c) / len(data) for c in counts.values()]
                        for p in probs:
                        if p > 0.:
                        ent -= p * math.log(p, base[unit])
                        return ent
                        hes = "abcdex80x90xffxfexde"
                        les = "aaaaax61x61x61x61x61"
                        print ("=======================================================================================================")
                        print (" type ent for hes hes ent for les les")
                        print ("=======================================================================================================")
                        for i in base:
                        for j in range(1,4,1):
                        print (i ,' ', eta( j*hes,i) , 't', (hes*j + (30 -j *10) *" " ) , ' ' , eta (j*les , i) ,'t', ("%s" % les*j ))


                        you can see 'abcdex80.....' is high entropy while 'aaaaax61...' is low entropy



                        :>python foo.py
                        =======================================================================================================
                        type ent for hes hes ent for les les
                        =======================================================================================================
                        shannon 3.321928094887362 abcdeÿþÞ 0.0 aaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        shannon 3.321928094887362 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞ 0.0 aaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        natural 2.3025850929940455 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞ 0.0 aaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        hartley 0.9999999999999998 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞ 0.0 aaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaa
                        somrand 0.4152410118609203 abcdeÿþÞabcdeÿþÞabcdeÿþÞ 0.0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa






edited Jun 27 at 17:59

answered Jun 27 at 7:19

blabb
9,773 · 1 gold badge · 7 silver badges · 24 bronze badges
• 3 "a high entropy means a highly disordered set of data a low entropy means an ordered set of data" <- This is a false statement. Order is not relevant, because entropy is calculated over a distribution where each value in that distribution has a probability associated with it. Compressed and encrypted data have high entropy because the probability associated with each byte value in the distribution is roughly equal (the distribution of byte values in the data is close to uniform), not because of the order the byte values appear in the bytestream.
– julian Jun 27 at 13:09

• "where all the 255 possible bytes will have equal frequencies" <- You probably meant "where all byte values between 0 and 255 (256 total) have an equal probability in the overall distribution" (the frequency of each byte value between 0-255 is the same in the distribution).
– julian Jun 27 at 13:12

• 1 @julian what do you mean by order? Like "a follows a, b => b" kind of order? Order in my answer does not mean sorted/sequential data; I meant it as an orderly (non-random) state. The most random data, where the count of each value tends to be equal, has the highest entropy. It may even be a sorted set like aaaabbbbccccdddd (4 each of [a, b, c, d]), but this will tend to have an entropy greater than aaaaaaaabbbbccd (8[a], 4[b], 2[c], 1[d]). Here is a theory link using hard technical words: en.wikipedia.org/wiki/Entropy_(information_theory)
– blabb Jun 27 at 16:47

• You said "ordered set", so maybe I misunderstood your meaning. Anyway, your main point about the relationship between entropy and randomness is correct.
– julian Jun 27 at 16:54

• @julian: What are you saying? I do think that blabb is correct. You seem to be stuck on only one definition of entropy, but there are two, which are both perfectly valid. So, stop yelling at everyone please.
– perror Jun 28 at 7:50
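As a quick numerical check of blabb's comparison in the comments (an editorial addition; these lines reuse the eta() function from the script above, so append them to it before running):

# Verifying the entropy comparison from the comments
# (requires the eta() definition from the code above):
print(eta("aaaabbbbccccdddd", 'shannon'))  # 2.0   -- uniform over four symbols
print(eta("aaaaaaaabbbbccd", 'shannon'))   # ~1.64 -- skewed towards 'a'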

























                        4














Just to add a (small) piece of information to @blabb's and @Johann Aydinbas's answers, here is a quote from the book Practical Malware Analysis regarding your question:

Packed executables can also be detected via a technique known as entropy calculation. Entropy is a measure of the disorder in a system or program [...]

Compressed or encrypted data more closely resembles random data, and therefore has high entropy; executables that are not encrypted or compressed have lower entropy. Automated tools for detecting packed programs often use heuristics like entropy.

You can find additional information here, under the Increased entropy header.
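To make that heuristic concrete, here is a minimal sketch (an editorial illustration, not from the book) of a sliding-window entropy scan that flags regions likely to be packed or encrypted; the 4 KiB window and the 7.0 bits-per-byte threshold are arbitrary choices:

import math
from collections import Counter

def shannon_entropy(buf):
    """Shannon entropy of a byte string, in bits per byte (0.0 .. 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())

def flag_packed_regions(path, window=4096, threshold=7.0):
    """Print offsets of windows whose entropy suggests packed/encrypted data."""
    with open(path, 'rb') as f:
        data = f.read()
    for off in range(0, len(data), window):
        h = shannon_entropy(data[off:off + window])
        if h >= threshold:
            print("0x%08x  entropy=%.2f  (possibly packed/encrypted)" % (off, h))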






answered Jun 27 at 8:30

bart1e
862 · 1 gold badge · 1 silver badge · 12 bronze badges





















                                3














Shannon's entropy comes from information theory. It is a measure of the degree of randomness of text. A string with higher Shannon entropy is harder to predict, which is why entropy is also used as one measure of password strength. Principally, the Shannon entropy equation provides a way to predict the average minimum number of bits required to encode a string of symbols, based on the frequency of the symbols.

Formula for base 2: H(X) = -Σᵢ p(xᵢ) · log₂ p(xᵢ)

Note that the base represents the number of possible symbols; for byte data that is 256. Base 2 can be replaced by any base, as in the 'somrand' unit (base 256) of the code above.

This link has a very simple implementation of the algorithm for calculating the entropy of novels and religious books. It tells us a lot: for example, that human-generated books all have a nearly identical degree of fluctuation between order and disorder, which is a useful feature of such data. This is the link to the code mentioned above: Information Entropy of different Books.
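As a quick sanity check of the base-change point (editorial arithmetic, not from the linked code): entropy taken with log base 256 is just the base-2 value divided by 8, so a uniform distribution over all 256 byte values scores 8 bits per byte, or exactly 1.0 on a normalized 0..1 scale:

import math

# Uniform distribution over all 256 possible byte values
p = 1 / 256
h_bits = -sum(p * math.log2(p) for _ in range(256))      # 8.0 bits per byte
h_norm = -sum(p * math.log(p, 256) for _ in range(256))  # 1.0 (= h_bits / 8)
print(h_bits, h_norm)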






answered Jun 27 at 14:09

Random Science Stuff
31 · 1 bronze badge





















                                        0














First, you have to know that the term entropy is used to refer to two different concepts which are somehow related if you think twice, but as the relation is really not obvious at first sight, you should treat these as two different concepts.

Defining Entropy?

The entropy that you want to know about can be defined as the amount of order, disorder, or chaos in a thermodynamic system.

On the other hand, the other entropy comes from information theory and can be seen as a measure of the amount of information that can be stored in a system.

Why is it useful in RE?

An entropy graph (evaluating the amount of disorder) is useful to detect the parts of a file that come close to random data. It allows you to spot the parts that have been encrypted or compressed, and the parts that appear to be left untouched.

Indeed, a high degree of disorder in the data is exactly what you want to achieve when encrypting it. And, as I said, the two entropy definitions are related: if you store a lot of information in a minimum of bytes, the result appears highly disordered, and the same holds for compression...

That is why we use entropy graphs of files: to be able to distinguish raw parts from encrypted/compressed sub-parts without any prior knowledge of the file format.

An Example

For example, here is an entropy graph from the tool binwalk, coming from another question on this site:

Binwalk entropy graph

Directly from this graph we can see that there is a first part that appears to be raw (probably asm opcodes, judging from the shape of the curve), then a part which is most likely encrypted (compression does not usually reach an entropy of 1 with such regularity), and finally padding with a single repeated byte (e.g. 0x00 or 0xff).
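For readers who want to reproduce this kind of graph without binwalk, here is a minimal sketch (an editorial addition, not binwalk's actual implementation): it computes normalized Shannon entropy (0..1) over fixed-size blocks and plots it with matplotlib; the 1 KiB block size is an arbitrary choice:

import math
from collections import Counter
import matplotlib.pyplot as plt

def block_entropy(buf):
    """Normalized Shannon entropy of a byte block, 0.0 .. 1.0."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values()) / 8.0

def entropy_graph(path, block=1024):
    """Plot entropy versus file offset, similar in spirit to binwalk -E."""
    with open(path, 'rb') as f:
        data = f.read()
    offsets = list(range(0, len(data), block))
    ys = [block_entropy(data[i:i + block]) for i in offsets]
    plt.plot(offsets, ys)
    plt.xlabel('file offset')
    plt.ylabel('entropy (normalized)')
    plt.ylim(0, 1)
    plt.show()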






edited Jun 28 at 12:16

answered Jun 27 at 8:55

perror
11.7k · 18 gold badges · 71 silver badges · 131 bronze badges
• 2 "The entropy that you want to know about can be defined as the amount of order, disorder, or chaos in a thermodynamic system." <- This is a false statement. Software is not a thermodynamic system and does not possess the property "energy" (heat). Software consists of encoded information, therefore to measure its properties, such as its Shannon entropy, the tools provided by information theory are appropriate.
– julian Jun 27 at 12:02

• 2 If you don't believe me, please examine the code that was used to generate the entropy plot in your post. Binwalk calculates the information entropy level in terms of either zlib compression ratio or Shannon entropy.
– julian Jun 27 at 12:05

• 1 "Your definition of entropy is perfectly valid if you are considering information theory. But, unfortunately, the entropy referred to here is most likely the one coming from thermodynamics (i.e. the degree of disorder)." <- The meaning seems quite clear.
– julian Jun 27 at 13:14

• 1 It's nothing personal. It's just not correct.
– julian Jun 27 at 16:08

• 1 It looks personal, just by the way you are harassing me with that.
– perror Jun 27 at 16:09















