Does m,n? regex actually minimize repetitions or does it minimize number of characters matched? Unicorn Meta Zoo #1: Why another podcast? Announcing the arrival of Valued Associate #679: Cesar Manara Data science time! April 2019 and salary with experience The Ask Question Wizard is Live!Match all occurrences of a regexA comprehensive regex for phone number validationCount the number occurrences of a character in a stringWhat regex will match every character except comma ',' or semi-colon ';'?RegEx match open tags except XHTML self-contained tagsRegex: matching up to the first occurrence of a characterRegex Match all characters between two stringsCheck whether a string matches a regex in JSThe regex characters ?, $, |How does regex m,n? work in Python?
Are these square matrices always diagonalisable?
Could a cockatrice have parasitic embryos?
How would it unbalance gameplay to rule that Weapon Master allows for picking a fighting style?
What *exactly* is electrical current, voltage, and resistance?
In search of the origins of term censor, I hit a dead end stuck with the greek term, to censor, λογοκρίνω
Does every subgroup of an abelian group have to be abelian?
Why did Israel vote against lifting the American embargo on Cuba?
France's Public Holidays' Puzzle
Israeli soda type drink
Where can I find how to tex symbols for different fonts?
Putting Ant-Man on house arrest
"Working on a knee"
Co-worker works way more than he should
TV series episode where humans nuke aliens before decrypting their message that states they come in peace
How to keep bees out of canned beverages?
Coin Game with infinite paradox
Raising a bilingual kid. When should we introduce the majority language?
Why does the Cisco show run command not show the full version, while the show version command does?
What is /etc/mtab in Linux?
What was Apollo 13's "Little Jolt" after MECO?
Stretch a Tikz tree
Marquee sign letters
Where did Arya get these scars?
What is a good proxy for government quality?
Does m,n? regex actually minimize repetitions or does it minimize number of characters matched?
Unicorn Meta Zoo #1: Why another podcast?
Announcing the arrival of Valued Associate #679: Cesar Manara
Data science time! April 2019 and salary with experience
The Ask Question Wizard is Live!Match all occurrences of a regexA comprehensive regex for phone number validationCount the number occurrences of a character in a stringWhat regex will match every character except comma ',' or semi-colon ';'?RegEx match open tags except XHTML self-contained tagsRegex: matching up to the first occurrence of a characterRegex Match all characters between two stringsCheck whether a string matches a regex in JSThe regex characters ?, $, |How does regex m,n? work in Python?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
According to the Python3 Regex documentation:
m,n?
Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.
However, this seems to be contradicted by the following experiment:
import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')
... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.
Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?
python regex python-3.x
add a comment |
According to the Python3 Regex documentation:
m,n?
Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.
However, this seems to be contradicted by the following experiment:
import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')
... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.
Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?
python regex python-3.x
2
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
3
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago
add a comment |
According to the Python3 Regex documentation:
m,n?
Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.
However, this seems to be contradicted by the following experiment:
import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')
... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.
Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?
python regex python-3.x
According to the Python3 Regex documentation:
m,n?
Causes the resulting RE to match from m to n repetitions of the
preceding RE, attempting to match as few repetitions as possible. This
is the non-greedy version of the previous qualifier. For example, on
the 6-character string 'aaaaaa', a3,5 will match 5 'a' characters,
while a3,5? will only match 3 characters.
However, this seems to be contradicted by the following experiment:
import re
regex = re.compile('(abc|d|abcde)1,2?(e|f)')
regex.match('abcdef')
... which matches 'abcde'. This necessarily involves 2 repetitions of (abc|d|abcde), namely, 'abc' and 'd'. However, there was an alternative match candidate that involved only 1 repetition of (abc|d|abcde), namely 'abcde'.
Am I misreading the documentation, or does m,n? actually minimize the number of characters matched (or some other objective), and not the number of repetitions?
python regex python-3.x
python regex python-3.x
edited 2 days ago
Jean-François Fabre♦
107k1059117
107k1059117
asked 2 days ago
James ShapiroJames Shapiro
428315
428315
2
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
3
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago
add a comment |
2
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
3
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago
2
2
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
3
3
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago
add a comment |
5 Answers
5
active
oldest
votes
m,n?
tries to match as few times as possible, but it's not going to reach into the abc|d|abcde
and change the behavior of |
. |
still tries the left option first.
(abc|d|abcde)1,2?
tries to match (abc|d|abcde)
once, and succeeds, matching abc
. The regex engine then continues on with the rest of the pattern and tries to match (e|f)
, which fails. It backtracks and tries to match another repetition of abc|d|abcde
, and matches d
. It continues on to (e|f)
again and matches e
successfully.
Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde)
before trying a second (abc|d|abcde)
. It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2?
comes first.
add a comment |
It doesn't force the minimum number of repetitions, it just allows it to match fewer times.
When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde)
will match abc
rather than abcde
, and then the next repetition will match e
. It doesn't backtrack to see if theres a result with fewer repetitions.
The solution is to put the longer alternatives first. Change it to
regex = re.compile('(abcde|abc|d)1,2?(e|f)')
Bothm,n
andm,n?
allow any number of matches fromm
ton
.m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)
– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matchingr"a1,2a"
against"aa"
results in the1,2
only matching once even though two matches exist, because it has to backtrack.
– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match againstaaa
. Lazy matches once, eager matches twice.
– Barmar
2 days ago
|
show 1 more comment
Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:
import re
regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
rex.match('abcdefghij')
The first match for 2,3 is abc|def|gh
(left to right).
If you reorder your pattern, you get what you want:
import re
rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
print(rex.match('abcdefghij').group())
Output:
abcdefghij
add a comment |
The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.
Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.
Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.
What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.
add a comment |
The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2?
which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.
When the matching starts, the engine finds abc
first and then tries to match e
or f
next, which does not match because there is a d
following abc
.
But because the quantifier is 1,2?
there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc
but it can match d
. Then it tries to match e
or f
again, where it can match e
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55780753%2fdoes-m-n-regex-actually-minimize-repetitions-or-does-it-minimize-number-of-ch%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
m,n?
tries to match as few times as possible, but it's not going to reach into the abc|d|abcde
and change the behavior of |
. |
still tries the left option first.
(abc|d|abcde)1,2?
tries to match (abc|d|abcde)
once, and succeeds, matching abc
. The regex engine then continues on with the rest of the pattern and tries to match (e|f)
, which fails. It backtracks and tries to match another repetition of abc|d|abcde
, and matches d
. It continues on to (e|f)
again and matches e
successfully.
Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde)
before trying a second (abc|d|abcde)
. It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2?
comes first.
add a comment |
m,n?
tries to match as few times as possible, but it's not going to reach into the abc|d|abcde
and change the behavior of |
. |
still tries the left option first.
(abc|d|abcde)1,2?
tries to match (abc|d|abcde)
once, and succeeds, matching abc
. The regex engine then continues on with the rest of the pattern and tries to match (e|f)
, which fails. It backtracks and tries to match another repetition of abc|d|abcde
, and matches d
. It continues on to (e|f)
again and matches e
successfully.
Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde)
before trying a second (abc|d|abcde)
. It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2?
comes first.
add a comment |
m,n?
tries to match as few times as possible, but it's not going to reach into the abc|d|abcde
and change the behavior of |
. |
still tries the left option first.
(abc|d|abcde)1,2?
tries to match (abc|d|abcde)
once, and succeeds, matching abc
. The regex engine then continues on with the rest of the pattern and tries to match (e|f)
, which fails. It backtracks and tries to match another repetition of abc|d|abcde
, and matches d
. It continues on to (e|f)
again and matches e
successfully.
Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde)
before trying a second (abc|d|abcde)
. It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2?
comes first.
m,n?
tries to match as few times as possible, but it's not going to reach into the abc|d|abcde
and change the behavior of |
. |
still tries the left option first.
(abc|d|abcde)1,2?
tries to match (abc|d|abcde)
once, and succeeds, matching abc
. The regex engine then continues on with the rest of the pattern and tries to match (e|f)
, which fails. It backtracks and tries to match another repetition of abc|d|abcde
, and matches d
. It continues on to (e|f)
again and matches e
successfully.
Perhaps you expected the backtracking to try a different option for the first (abc|d|abcde)
before trying a second (abc|d|abcde)
. It doesn't do that. It would try that eventually, if it had to, but trying more matches for the 1,2?
comes first.
edited 2 days ago
answered 2 days ago
user2357112user2357112
159k13177272
159k13177272
add a comment |
add a comment |
It doesn't force the minimum number of repetitions, it just allows it to match fewer times.
When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde)
will match abc
rather than abcde
, and then the next repetition will match e
. It doesn't backtrack to see if theres a result with fewer repetitions.
The solution is to put the longer alternatives first. Change it to
regex = re.compile('(abcde|abc|d)1,2?(e|f)')
Bothm,n
andm,n?
allow any number of matches fromm
ton
.m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)
– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matchingr"a1,2a"
against"aa"
results in the1,2
only matching once even though two matches exist, because it has to backtrack.
– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match againstaaa
. Lazy matches once, eager matches twice.
– Barmar
2 days ago
|
show 1 more comment
It doesn't force the minimum number of repetitions, it just allows it to match fewer times.
When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde)
will match abc
rather than abcde
, and then the next repetition will match e
. It doesn't backtrack to see if theres a result with fewer repetitions.
The solution is to put the longer alternatives first. Change it to
regex = re.compile('(abcde|abc|d)1,2?(e|f)')
Bothm,n
andm,n?
allow any number of matches fromm
ton
.m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)
– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matchingr"a1,2a"
against"aa"
results in the1,2
only matching once even though two matches exist, because it has to backtrack.
– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match againstaaa
. Lazy matches once, eager matches twice.
– Barmar
2 days ago
|
show 1 more comment
It doesn't force the minimum number of repetitions, it just allows it to match fewer times.
When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde)
will match abc
rather than abcde
, and then the next repetition will match e
. It doesn't backtrack to see if theres a result with fewer repetitions.
The solution is to put the longer alternatives first. Change it to
regex = re.compile('(abcde|abc|d)1,2?(e|f)')
It doesn't force the minimum number of repetitions, it just allows it to match fewer times.
When you have multiple alternatives that can match, regexp engines handle things differently. Some are "eager" and use the first matching alternatives, others use the "longest match" rule. Apparently Python is eager. So (abc|d|abcde)
will match abc
rather than abcde
, and then the next repetition will match e
. It doesn't backtrack to see if theres a result with fewer repetitions.
The solution is to put the longer alternatives first. Change it to
regex = re.compile('(abcde|abc|d)1,2?(e|f)')
answered 2 days ago
BarmarBarmar
438k36262366
438k36262366
Bothm,n
andm,n?
allow any number of matches fromm
ton
.m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)
– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matchingr"a1,2a"
against"aa"
results in the1,2
only matching once even though two matches exist, because it has to backtrack.
– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match againstaaa
. Lazy matches once, eager matches twice.
– Barmar
2 days ago
|
show 1 more comment
Bothm,n
andm,n?
allow any number of matches fromm
ton
.m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)
– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matchingr"a1,2a"
against"aa"
results in the1,2
only matching once even though two matches exist, because it has to backtrack.
– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match againstaaa
. Lazy matches once, eager matches twice.
– Barmar
2 days ago
Both
m,n
and m,n?
allow any number of matches from m
to n
. m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)– user2357112
2 days ago
Both
m,n
and m,n?
allow any number of matches from m
to n
. m,n?
isn't different in what it allows. It's different in what it tries first. (This operator doesn't make sense in a longest-match, non-backtracking engine.)– user2357112
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
What I meant is that it's allowed to use fewer matches even though more matches exist.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
Greedy quantifiers try to match as many possible matches.
– Barmar
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching
r"a1,2a"
against "aa"
results in the 1,2
only matching once even though two matches exist, because it has to backtrack.– user2357112
2 days ago
And lazy quantifiers try to match as few as possible, but both allow other options. For example, matching
r"a1,2a"
against "aa"
results in the 1,2
only matching once even though two matches exist, because it has to backtrack.– user2357112
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match against
aaa
. Lazy matches once, eager matches twice.– Barmar
2 days ago
Of course, because everything is constrained by the rest of the pattern. The issue is what they match against
aaa
. Lazy matches once, eager matches twice.– Barmar
2 days ago
|
show 1 more comment
Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:
import re
regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
rex.match('abcdefghij')
The first match for 2,3 is abc|def|gh
(left to right).
If you reorder your pattern, you get what you want:
import re
rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
print(rex.match('abcdefghij').group())
Output:
abcdefghij
add a comment |
Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:
import re
regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
rex.match('abcdefghij')
The first match for 2,3 is abc|def|gh
(left to right).
If you reorder your pattern, you get what you want:
import re
rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
print(rex.match('abcdefghij').group())
Output:
abcdefghij
add a comment |
Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:
import re
regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
rex.match('abcdefghij')
The first match for 2,3 is abc|def|gh
(left to right).
If you reorder your pattern, you get what you want:
import re
rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
print(rex.match('abcdefghij').group())
Output:
abcdefghij
Regex does not evaluate all possible options and chooses the smallest one - if delivers the first one it finds:
import re
regex = re.compile('(abc|def|gh|abcde|fghi)2,3?(i|j)')
rex.match('abcdefghij')
The first match for 2,3 is abc|def|gh
(left to right).
If you reorder your pattern, you get what you want:
import re
rex = re.compile('(abcde|fghi|abc|def|gh)2,3?(i|j)')
print(rex.match('abcdefghij').group())
Output:
abcdefghij
answered 2 days ago
Patrick ArtnerPatrick Artner
27.1k62544
27.1k62544
add a comment |
add a comment |
The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.
Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.
Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.
What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.
add a comment |
The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.
Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.
Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.
What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.
add a comment |
The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.
Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.
Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.
What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.
The nongreedy modifier adjusts the preference of the regex engine. Normally, with greedy matching, the engine will prefer the longest leftmost possible match. With stingy (nongreedy) matching, the match governed by the stingy modifier will prefer the shortest leftmost possible match.
Notice that the engine will still do its darndest to find a match. If a longer expression will allow the engine to return a match where a shorter one won't, the engine will pick the one which allows it to return a match.
Perhaps it is helpful to think about this in terms of backtracking. With a greedy match, the engine will start with the hypothesis that a maximal repetition will succeed, then if that fails, backtrack and try successively shorter matches until it finds a match, or the search space is exhausted. With stingy matching, the preferred hypothesis is the shortest possible match, and backtracking will attempt successively increasing the number of repetitions.
What's probably tripping up your (unclear) example is that alternation specifies an explicit order for backtracking. The engine will explore matching starting with the first pattern among the alternatives, and only proceed to try the other alternatives if it runs out of search space before finding a match.
answered 2 days ago
tripleeetripleee
97k14135191
97k14135191
add a comment |
add a comment |
The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2?
which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.
When the matching starts, the engine finds abc
first and then tries to match e
or f
next, which does not match because there is a d
following abc
.
But because the quantifier is 1,2?
there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc
but it can match d
. Then it tries to match e
or f
again, where it can match e
add a comment |
The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2?
which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.
When the matching starts, the engine finds abc
first and then tries to match e
or f
next, which does not match because there is a d
following abc
.
But because the quantifier is 1,2?
there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc
but it can match d
. Then it tries to match e
or f
again, where it can match e
add a comment |
The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2?
which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.
When the matching starts, the engine finds abc
first and then tries to match e
or f
next, which does not match because there is a d
following abc
.
But because the quantifier is 1,2?
there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc
but it can match d
. Then it tries to match e
or f
again, where it can match e
The alternation in the group starts matching from the left and has a non greedy quantifier of 1,2?
which matches as few as needed and applies to that group. Python does not try to find the longest match in the alternation, but the first match.
When the matching starts, the engine finds abc
first and then tries to match e
or f
next, which does not match because there is a d
following abc
.
But because the quantifier is 1,2?
there is still one option left to try for the first group and backtracks to try to find a match again. It can not match abc
but it can match d
. Then it tries to match e
or f
again, where it can match e
answered 2 days ago
The fourth birdThe fourth bird
26.6k91630
26.6k91630
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55780753%2fdoes-m-n-regex-actually-minimize-repetitions-or-does-it-minimize-number-of-ch%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Why do you change your whole regex code so all answers get invalidated? We use your example and you change it .... so you disconnect our efforts from what is asked and later viewers can't understand why we answered like we did.
– Patrick Artner
2 days ago
I agree with @PatrickArtner, that is very bad form.
– kabanus
2 days ago
3
@PatrickArtner I think he was trying to be helpful and simplify it.
– Barmar
2 days ago
@PatrickArtner, sorry about that. I was trying to make the example simpler for anyone who reads this post in the future. My original example was needlessly complicated, involving 2-3 matches, when it should have involved 1-2. Apologies for the ensuing mayhem.
– James Shapiro
2 days ago