Does m,n? regex actually minimize repetitions or does it minimize number of characters matched? Unicorn Meta Zoo #1: Why another podcast? Announcing the arrival of Valued Associate #679: Cesar Manara Data science time! April 2019 and salary with experience The Ask Question Wizard is Live!Match all occurrences of a regexA comprehensive regex for phone number validationCount the number occurrences of a character in a stringWhat regex will match every character except comma ',' or semi-colon ';'?RegEx match open tags except XHTML self-contained tagsRegex: matching up to the first occurrence of a characterRegex Match all characters between two stringsCheck whether a string matches a regex in JSThe regex characters ?, $, |How does regex m,n? work in Python?

Are these square matrices always diagonalisable?

Could a cockatrice have parasitic embryos?

How would it unbalance gameplay to rule that Weapon Master allows for picking a fighting style?

What *exactly* is electrical current, voltage, and resistance?

In search of the origins of term censor, I hit a dead end stuck with the greek term, to censor, λογοκρίνω

Does every subgroup of an abelian group have to be abelian?

Why did Israel vote against lifting the American embargo on Cuba?

France's Public Holidays' Puzzle

Israeli soda type drink

Where can I find how to tex symbols for different fonts?

Putting Ant-Man on house arrest

"Working on a knee"

Co-worker works way more than he should

TV series episode where humans nuke aliens before decrypting their message that states they come in peace

How to keep bees out of canned beverages?

Coin Game with infinite paradox

Raising a bilingual kid. When should we introduce the majority language?

Why does the Cisco show run command not show the full version, while the show version command does?

What is /etc/mtab in Linux?

What was Apollo 13's "Little Jolt" after MECO?

Stretch a Tikz tree

Marquee sign letters

Where did Arya get these scars?

What is a good proxy for government quality?



Does m,n? regex actually minimize repetitions or does it minimize number of characters matched?



Unicorn Meta Zoo #1: Why another podcast?
Announcing the arrival of Valued Associate #679: Cesar Manara
Data science time! April 2019 and salary with experience
The Ask Question Wizard is Live!Match all occurrences of a regexA comprehensive regex for phone number validationCount the number occurrences of a character in a stringWhat regex will match every character except comma ',' or semi-colon ';'?RegEx match open tags except XHTML self-contained tagsRegex: matching up to the first occurrence of a characterRegex Match all characters between two stringsCheck whether a string matches a regex in JSThe regex characters ?, $, |How does regex m,n? work in Python?



.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








5















According to the Python3 Regex documentation:



m,n?



Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.




However, this seems to be contradicted by the following experiment:



import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')


... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.



Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?










share|improve this question



















  • 2





    Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

    – Patrick Artner
    2 days ago












  • I agree with @PatrickArtner, that is very bad form.

    – kabanus
    2 days ago






  • 3





    @PatrickArtner I think he was trying to be helpful and simplify it.

    – Barmar
    2 days ago











  • @PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

    – James Shapiro
    2 days ago

















5















According to the Python3 Regex documentation:



m,n?



Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.




However, this seems to be contradicted by the following experiment:



import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')


... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.



Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?










share|improve this question



















  • 2





    Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

    – Patrick Artner
    2 days ago












  • I agree with @PatrickArtner, that is very bad form.

    – kabanus
    2 days ago






  • 3





    @PatrickArtner I think he was trying to be helpful and simplify it.

    – Barmar
    2 days ago











  • @PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

    – James Shapiro
    2 days ago













5












5








5








According to the Python3 Regex documentation:



m,n?



Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.




However, this seems to be contradicted by the following experiment:



import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')


... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.



Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?










share|improve this question
















According to the Python3 Regex documentation:



m,n?



Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.




However, this seems to be contradicted by the following experiment:



import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')


... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.



Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?







python regex python-3.x






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago









Jean-François Fabre

107k1059117




107k1059117










asked 2 days ago









James ShapiroJames Shapiro

428315




428315







  • 2





    Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

    – Patrick Artner
    2 days ago












  • I agree with @PatrickArtner, that is very bad form.

    – kabanus
    2 days ago






  • 3





    @PatrickArtner I think he was trying to be helpful and simplify it.

    – Barmar
    2 days ago











  • @PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

    – James Shapiro
    2 days ago












  • 2





    Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

    – Patrick Artner
    2 days ago












  • I agree with @PatrickArtner, that is very bad form.

    – kabanus
    2 days ago






  • 3





    @PatrickArtner I think he was trying to be helpful and simplify it.

    – Barmar
    2 days ago











  • @PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

    – James Shapiro
    2 days ago







2




2





Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

– Patrick Artner
2 days ago






Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.

– Patrick Artner
2 days ago














I agree with @PatrickArtner, that is very bad form.

– kabanus
2 days ago





I agree with @PatrickArtner, that is very bad form.

– kabanus
2 days ago




3




3





@PatrickArtner I think he was trying to be helpful and simplify it.

– Barmar
2 days ago





@PatrickArtner I think he was trying to be helpful and simplify it.

– Barmar
2 days ago













@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

– James Shapiro
2 days ago





@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.

– James Shapiro
2 days ago












5 Answers
5






active

oldest

votes


















6














m,n? tries to match as few times as possible, but it's not going to reach into the abc|d|abcde and change the behavior of |. | still tries the left option first.



(abc|d|abcde)1,2? tries to match (abc|d|abcde) once, and succeeds, matching abc. The regex engine then continues on with the rest of the pattern and tries to match (e|f), which fails. It backtracks and tries to match another repetition of abc|d|abcde, and matches d. It continues on to (e|f) again and matches e successfully.



Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde) before trying a second (abc|d|abcde). It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2? comes first.






share|improve this answer
































    4














    It doesn't force the minimum number of repetitions, it just allows it to match fewer times.



    When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde) will match abc rather than abcde, and then the next repetition will match e. It doesn't backtrack to see if theres a result with fewer repetitions.



    The solution is to put the longer alternatives first. Change it to



    regex = re.compile('(abcde|abc|d)1,2?(e|f)')





    share|improve this answer























    • Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

      – user2357112
      2 days ago











    • What I meant is that it's allowed to use fewer matches even though more matches exist.

      – Barmar
      2 days ago











    • Greedy quantifiers try to match as many possible matches.

      – Barmar
      2 days ago











    • And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

      – user2357112
      2 days ago











    • Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

      – Barmar
      2 days ago


















    2














    Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:




    import re
    regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
    rex.match('abcdefghij')



    The first match for 2,3 is abc|def|gh (left to right).



    If you reorder your pattern, you get what you want:



    import re
    rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
    print(rex.match('abcdefghij').group())


    Output:



    abcdefghij





    share|improve this answer






























      1














      The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.



      Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.



      Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.



      What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.






      share|improve this answer






























        1














        The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2? which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.



        When the matching starts, the engine finds abc first and then tries to match e or f next, which does not match because there is a d following abc.



        But because the quantifier is 1,2? there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc but it can match d. Then it tries to match e or f again, where it can match e






        share|improve this answer























          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55780753%2fdoes-m-n-regex-actually-minimize-repetitions-or-does-it-minimize-number-of-ch%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          6














          m,n? tries to match as few times as possible, but it's not going to reach into the abc|d|abcde and change the behavior of |. | still tries the left option first.



          (abc|d|abcde)1,2? tries to match (abc|d|abcde) once, and succeeds, matching abc. The regex engine then continues on with the rest of the pattern and tries to match (e|f), which fails. It backtracks and tries to match another repetition of abc|d|abcde, and matches d. It continues on to (e|f) again and matches e successfully.



          Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde) before trying a second (abc|d|abcde). It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2? comes first.






          share|improve this answer





























            6














            m,n? tries to match as few times as possible, but it's not going to reach into the abc|d|abcde and change the behavior of |. | still tries the left option first.



            (abc|d|abcde)1,2? tries to match (abc|d|abcde) once, and succeeds, matching abc. The regex engine then continues on with the rest of the pattern and tries to match (e|f), which fails. It backtracks and tries to match another repetition of abc|d|abcde, and matches d. It continues on to (e|f) again and matches e successfully.



            Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde) before trying a second (abc|d|abcde). It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2? comes first.






            share|improve this answer



























              6












              6








              6







              m,n? tries to match as few times as possible, but it's not going to reach into the abc|d|abcde and change the behavior of |. | still tries the left option first.



              (abc|d|abcde)1,2? tries to match (abc|d|abcde) once, and succeeds, matching abc. The regex engine then continues on with the rest of the pattern and tries to match (e|f), which fails. It backtracks and tries to match another repetition of abc|d|abcde, and matches d. It continues on to (e|f) again and matches e successfully.



              Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde) before trying a second (abc|d|abcde). It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2? comes first.






              share|improve this answer















              m,n? tries to match as few times as possible, but it's not going to reach into the abc|d|abcde and change the behavior of |. | still tries the left option first.



              (abc|d|abcde)1,2? tries to match (abc|d|abcde) once, and succeeds, matching abc. The regex engine then continues on with the rest of the pattern and tries to match (e|f), which fails. It backtracks and tries to match another repetition of abc|d|abcde, and matches d. It continues on to (e|f) again and matches e successfully.



              Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde) before trying a second (abc|d|abcde). It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2? comes first.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited 2 days ago

























              answered 2 days ago









              user2357112user2357112

              159k13177272




              159k13177272























                  4














                  It doesn't force the minimum number of repetitions, it just allows it to match fewer times.



                  When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde) will match abc rather than abcde, and then the next repetition will match e. It doesn't backtrack to see if theres a result with fewer repetitions.



                  The solution is to put the longer alternatives first. Change it to



                  regex = re.compile('(abcde|abc|d)1,2?(e|f)')





                  share|improve this answer























                  • Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                    – user2357112
                    2 days ago











                  • What I meant is that it's allowed to use fewer matches even though more matches exist.

                    – Barmar
                    2 days ago











                  • Greedy quantifiers try to match as many possible matches.

                    – Barmar
                    2 days ago











                  • And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                    – user2357112
                    2 days ago











                  • Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                    – Barmar
                    2 days ago















                  4














                  It doesn't force the minimum number of repetitions, it just allows it to match fewer times.



                  When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde) will match abc rather than abcde, and then the next repetition will match e. It doesn't backtrack to see if theres a result with fewer repetitions.



                  The solution is to put the longer alternatives first. Change it to



                  regex = re.compile('(abcde|abc|d)1,2?(e|f)')





                  share|improve this answer























                  • Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                    – user2357112
                    2 days ago











                  • What I meant is that it's allowed to use fewer matches even though more matches exist.

                    – Barmar
                    2 days ago











                  • Greedy quantifiers try to match as many possible matches.

                    – Barmar
                    2 days ago











                  • And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                    – user2357112
                    2 days ago











                  • Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                    – Barmar
                    2 days ago













                  4












                  4








                  4







                  It doesn't force the minimum number of repetitions, it just allows it to match fewer times.



                  When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde) will match abc rather than abcde, and then the next repetition will match e. It doesn't backtrack to see if theres a result with fewer repetitions.



                  The solution is to put the longer alternatives first. Change it to



                  regex = re.compile('(abcde|abc|d)1,2?(e|f)')





                  share|improve this answer













                  It doesn't force the minimum number of repetitions, it just allows it to match fewer times.



                  When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde) will match abc rather than abcde, and then the next repetition will match e. It doesn't backtrack to see if theres a result with fewer repetitions.



                  The solution is to put the longer alternatives first. Change it to



                  regex = re.compile('(abcde|abc|d)1,2?(e|f)')






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 2 days ago









                  BarmarBarmar

                  438k36262366




                  438k36262366












                  • Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                    – user2357112
                    2 days ago











                  • What I meant is that it's allowed to use fewer matches even though more matches exist.

                    – Barmar
                    2 days ago











                  • Greedy quantifiers try to match as many possible matches.

                    – Barmar
                    2 days ago











                  • And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                    – user2357112
                    2 days ago











                  • Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                    – Barmar
                    2 days ago

















                  • Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                    – user2357112
                    2 days ago











                  • What I meant is that it's allowed to use fewer matches even though more matches exist.

                    – Barmar
                    2 days ago











                  • Greedy quantifiers try to match as many possible matches.

                    – Barmar
                    2 days ago











                  • And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                    – user2357112
                    2 days ago











                  • Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                    – Barmar
                    2 days ago
















                  Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                  – user2357112
                  2 days ago





                  Both m,n and m,n? allow any number of matches from m to n. m,n? isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)

                  – user2357112
                  2 days ago













                  What I meant is that it's allowed to use fewer matches even though more matches exist.

                  – Barmar
                  2 days ago





                  What I meant is that it's allowed to use fewer matches even though more matches exist.

                  – Barmar
                  2 days ago













                  Greedy quantifiers try to match as many possible matches.

                  – Barmar
                  2 days ago





                  Greedy quantifiers try to match as many possible matches.

                  – Barmar
                  2 days ago













                  And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                  – user2357112
                  2 days ago





                  And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching r"a1,2a" against "aa" results in the 1,2 only matching once even though two matches exist, because it has to backtrack.

                  – user2357112
                  2 days ago













                  Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                  – Barmar
                  2 days ago





                  Of course, because everything is constrained by the rest of the pattern. The issue is what they match against aaa. Lazy matches once, eager matches twice.

                  – Barmar
                  2 days ago











                  2














                  Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:




                  import re
                  regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
                  rex.match('abcdefghij')



                  The first match for 2,3 is abc|def|gh (left to right).



                  If you reorder your pattern, you get what you want:



                  import re
                  rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
                  print(rex.match('abcdefghij').group())


                  Output:



                  abcdefghij





                  share|improve this answer



























                    2














                    Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:




                    import re
                    regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
                    rex.match('abcdefghij')



                    The first match for 2,3 is abc|def|gh (left to right).



                    If you reorder your pattern, you get what you want:



                    import re
                    rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
                    print(rex.match('abcdefghij').group())


                    Output:



                    abcdefghij





                    share|improve this answer

























                      2












                      2








                      2







                      Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:




                      import re
                      regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
                      rex.match('abcdefghij')



                      The first match for 2,3 is abc|def|gh (left to right).



                      If you reorder your pattern, you get what you want:



                      import re
                      rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
                      print(rex.match('abcdefghij').group())


                      Output:



                      abcdefghij





                      share|improve this answer













                      Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:




                      import re
                      regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
                      rex.match('abcdefghij')



                      The first match for 2,3 is abc|def|gh (left to right).



                      If you reorder your pattern, you get what you want:



                      import re
                      rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
                      print(rex.match('abcdefghij').group())


                      Output:



                      abcdefghij






                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered 2 days ago









                      Patrick ArtnerPatrick Artner

                      27.1k62544




                      27.1k62544





















                          1














                          The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.



                          Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.



                          Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.



                          What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.






                          share|improve this answer



























                            1














                            The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.



                            Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.



                            Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.



                            What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.






                            share|improve this answer

























                              1












                              1








                              1







                              The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.



                              Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.



                              Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.



                              What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.






                              share|improve this answer













                              The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.



                              Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.



                              Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.



                              What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered 2 days ago









                              tripleeetripleee

                              97k14135191




                              97k14135191





















                                  1














                                  The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2? which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.



                                  When the matching starts, the engine finds abc first and then tries to match e or f next, which does not match because there is a d following abc.



                                  But because the quantifier is 1,2? there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc but it can match d. Then it tries to match e or f again, where it can match e






                                  share|improve this answer



























                                    1














                                    The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2? which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.



                                    When the matching starts, the engine finds abc first and then tries to match e or f next, which does not match because there is a d following abc.



                                    But because the quantifier is 1,2? there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc but it can match d. Then it tries to match e or f again, where it can match e






                                    share|improve this answer

























                                      1












                                      1








                                      1







                                      The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2? which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.



                                      When the matching starts, the engine finds abc first and then tries to match e or f next, which does not match because there is a d following abc.



                                      But because the quantifier is 1,2? there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc but it can match d. Then it tries to match e or f again, where it can match e






                                      share|improve this answer













                                      The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2? which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.



                                      When the matching starts, the engine finds abc first and then tries to match e or f next, which does not match because there is a d following abc.



                                      But because the quantifier is 1,2? there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc but it can match d. Then it tries to match e or f again, where it can match e







                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered 2 days ago









                                      The fourth birdThe fourth bird

                                      26.6k91630




                                      26.6k91630



























                                          draft saved

                                          draft discarded
















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function ()
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55780753%2fdoes-m-n-regex-actually-minimize-repetitions-or-does-it-minimize-number-of-ch%23new-answer', 'question_page');

                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Get product attribute by attribute group code in magento 2get product attribute by product attribute group in magento 2Magento 2 Log Bundle Product Data in List Page?How to get all product attribute of a attribute group of Default attribute set?Magento 2.1 Create a filter in the product grid by new attributeMagento 2 : Get Product Attribute values By GroupMagento 2 How to get all existing values for one attributeMagento 2 get custom attribute of a single product inside a pluginMagento 2.3 How to get all the Multi Source Inventory (MSI) locations collection in custom module?Magento2: how to develop rest API to get new productsGet product attribute by attribute group code ( [attribute_group_code] ) in magento 2

                                          Category:9 (number) SubcategoriesMedia in category "9 (number)"Navigation menuUpload mediaGND ID: 4485639-8Library of Congress authority ID: sh85091979ReasonatorScholiaStatistics

                                          Magento 2.3: How do i solve this, Not registered handle, on custom form?How can i rewrite TierPrice Block in Magento2magento 2 captcha not rendering if I override layout xmlmain.CRITICAL: Plugin class doesn't existMagento 2 : Problem while adding custom button order view page?Magento 2.2.5: Overriding Admin Controller sales/orderMagento 2.2.5: Add, Update and Delete existing products Custom OptionsMagento 2.3 : File Upload issue in UI Component FormMagento2 Not registered handleHow to configured Form Builder Js in my custom magento 2.3.0 module?Magento 2.3. How to create image upload field in an admin form