Why are ambiguous grammars bad?
I understand that if there exist two or more leftmost (or rightmost) derivations for some string, i.e. two or more parse trees, then the grammar is ambiguous, but I am unable to understand why that is so bad that everyone wants to get rid of it.
Tags: compilers, ambiguity
asked Jun 9 at 12:52 by HIRAK MONDAL (a new contributor); edited Jun 10 at 3:42 by user2357112
Related but not identical: softwareengineering.stackexchange.com/q/343872/206652 (disclaimer: I wrote the accepted answer)
– marstato, Jun 9 at 21:45
See also: "Finding an unambiguous grammar".
– Rob, Jun 10 at 6:08
Indeed, unambiguous forms are better for practical use: an unambiguous form uses fewer production rules and builds a smaller (shallower) parse tree, hence a more efficient compiler that takes less time to parse. Most tools also provide the ability to resolve ambiguity explicitly, outside the grammar.
– Grijesh Chauhan, Jun 10 at 8:54
"Everyone wants to get rid of it". Well, that's just not true. In commercially relevant languages, it's common to see ambiguity added as languages evolve. E.g. C++ intentionally added the ambiguity std::vector<std::vector<int>> in 2011, which previously required a space between the two > characters. The key insight is that these languages have many more users than vendors, so fixing a minor annoyance for users justifies a lot of work by implementors.
– MSalters, Jun 11 at 7:16
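(Editorial illustration of the >> example in the comment above; a minimal sketch that any C++11 compiler should accept.)

    #include <vector>

    // Before C++11 the two consecutive '>' characters were lexed as the
    // right-shift token, so the nested closer had to be written with a space.
    std::vector<std::vector<int> > old_style;   // pre-C++11 spelling, still valid
    std::vector<std::vector<int>>  new_style;   // accepted since C++11

    int main() {}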
6 Answers
Consider the following grammar for arithmetic expressions:
$$
X \to X + X \mid X - X \mid X * X \mid X / X \mid \texttt{var} \mid \texttt{const}
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
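To make the last paragraph concrete, here is a minimal editorial sketch (not part of the original answer) of an unambiguous, left-associative version of the grammar, E -> T ((+ | -) T)*, T -> const, parsed by recursive descent; the folding loop in expr() is exactly where the "(a-b)-c" reading gets baked in.

    #include <cctype>
    #include <cstddef>
    #include <iostream>
    #include <string>

    // Unambiguous, left-associative grammar:
    //   E -> T ((+ | -) T)*
    //   T -> const
    struct Parser {
        std::string input;
        std::size_t pos = 0;

        // T -> const : a run of digits (error handling omitted for brevity)
        long term() {
            long v = 0;
            while (pos < input.size() && std::isdigit(static_cast<unsigned char>(input[pos])))
                v = v * 10 + (input[pos++] - '0');
            return v;
        }

        // E -> T ((+ | -) T)* : the loop folds operands to the left, so the
        // string "10-5-2" is evaluated as (10-5)-2, never as 10-(5-2).
        long expr() {
            long v = term();
            while (pos < input.size() && (input[pos] == '+' || input[pos] == '-')) {
                char op = input[pos++];
                long rhs = term();
                v = (op == '+') ? v + rhs : v - rhs;
            }
            return v;
        }
    };

    int main() {
        Parser p;
        p.input = "10-5-2";              // stands in for a - b - c
        std::cout << p.expr() << "\n";   // prints 3, i.e. (10-5)-2; the other tree would give 7
    }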
– Yuval Filmus, answered Jun 9 at 13:11
@HIRAKMONDAL The fact that the syntax is ambiguous is not the real issue; the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent, then that wouldn't be a problem (e.g. take Yuval's example and consider the case where your only operator is +).
– Bakuriu, Jun 9 at 21:39
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally, even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true, but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
– Richard Rast, Jun 10 at 2:03
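(Editorial illustration of the floating-point point above; the operand values are chosen only to make the rounding visible.)

    #include <iostream>

    int main() {
        double a = 1e16, b = -1e16, c = 1.0;
        std::cout << (a + b) + c << "\n";   // prints 1 (the "left" parse tree)
        std::cout << a + (b + c) << "\n";   // prints 0: b + c rounds back to -1e16
    }

Same operands, same operator, different grouping, different result, so the two parse trees of a + b + c are not interchangeable.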
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
– gnasher729, Jun 10 at 7:30
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
– OrangeDog, Jun 10 at 12:55
In contrast to the other existing answers [1, 2], there is indeed a field of application where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you have the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambiguity:
Peter chased the man in the red sports car
Was Peter in the red sports car, or was the man?
Semantic ambiguity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambiguity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches to NLP deal differently with processing in general and with these ambiguities in particular. For example, your pipeline might look as follows:
- Parse the NL input with an ambiguous grammar.
- For every resulting AST: run model generation to produce the possible semantic meanings and to rule out impossible syntactic ambiguities from step 1.
- For every resulting model: save it in your cache.
You run this pipeline for every sentence. The more text you process (say, from the same book), the more of the superfluous models that survived step 3 for earlier sentences you can rule out.
As opposed to programming languages, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout the parsing of larger texts; from time to time, later insights help us rule out earlier ambiguities, as sketched below.
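A very schematic editorial sketch of this "keep every reading, prune later" bookkeeping; the readings, their encoding as strings, and the pruning fact are invented for the example, and real systems use logical models rather than strings.

    #include <iostream>
    #include <set>
    #include <string>

    int main() {
        // Steps 1-3 for "Peter chased the man in the red sports car":
        // the ambiguous parse yields two candidate semantic models.
        std::set<std::string> models = {
            "chase(peter, man) & in(man, car)",    // the PP attaches to "the man"
            "chase(peter, man) & in(peter, car)",  // the PP attaches to "Peter"
        };

        // A later sentence, say "Peter was on foot", rules out every cached
        // model that places Peter in the car.
        for (auto it = models.begin(); it != models.end(); ) {
            if (it->find("in(peter, car)") != std::string::npos)
                it = models.erase(it);
            else
                ++it;
        }

        for (const auto& m : models)
            std::cout << m << "\n";   // only the first reading survives
    }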
If you want to get your hands dirty with parsers that can output multiple derivations for an ambiguous grammar, have a look at the Grammatical Framework. [Koh18, ch. 5] also has an introduction to it showing something similar to my pipeline above. Note, though, that since [Koh18] are lecture notes, they might not be easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase, "Logic-Based Natural Language Processing", Winter Semester 2018/19, lecture notes. URL: https://kwarc.info/teaching/LBS/notes.pdf. Course description: https://kwarc.info/courses/lbs/ (in German).
[Koh18, ch. 5]: Chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18].
[Koh18, ch. 6.4]: Chapter 6.4, "The Computational Role of Ambiguities", in [Koh18].
– ComFreek, answered Jun 10 at 8:50
Thanks a ton. I had the same doubt and you cleared it. :)
– HIRAK MONDAL, Jun 10 at 9:25
Not to mention problems with "Buffalo buffalo buffalo Buffalo buffalo buffalo ..." for a suitable number of buffalo.
– Hagen von Eitzen, Jun 10 at 11:13
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
– Davislor, Jun 10 at 16:46
@ComFreek I should be more precise here. A brief look at GF (thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
– Davislor, Jun 11 at 17:07
@ComFreek So it’s not that computers can’t handle CFGs (although natural languages are not really context-free, and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
– Davislor, Jun 11 at 17:12
Even if there’s a well-defined way to handle ambiguity (ambiguous expressions are syntax errors, for example), these grammars still cause trouble. As soon as you introduce ambiguity into a grammar, a parser can no longer be sure that the first match it gets is definitive. It needs to keep trying all the other ways to parse a statement, to rule out any ambiguity. You’re also not dealing with something simple like an LL(1) language, so you can’t use a simple, small, fast parser. Your grammar has symbols that can be read multiple ways, so you have to be prepared to backtrack a lot.
In some restricted domains, you might be able to get away with proving that all possible ways to parse an expression are equivalent (for example, because they represent an associative operation). (a+b) + c = a + (b+c).
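As a rough editorial illustration of how much backtracking an ambiguous grammar can force, the sketch below counts the parse trees of const - const - ... - const under the ambiguous rule X -> X - X | const; the counts are the Catalan numbers, so the work grows exponentially with the number of operands.

    #include <iostream>

    // Number of parse trees for operands[lo..hi] under  X -> X - X | const:
    // choose which '-' is applied last, then multiply the counts for the
    // left and right sub-expressions.
    long long countParses(int lo, int hi) {
        if (lo == hi) return 1;                    // a single const
        long long total = 0;
        for (int split = lo; split < hi; ++split)  // top-level '-' sits after position split
            total += countParses(lo, split) * countParses(split + 1, hi);
        return total;
    }

    int main() {
        for (int operands = 3; operands <= 10; ++operands)
            std::cout << operands << " operands: "
                      << countParses(0, operands - 1) << " parse trees\n";
        // 3 operands: 2, 4 operands: 5, 5 operands: 14, ... (the Catalan numbers)
    }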
– Davislor, answered Jun 9 at 21:28
Does IF a THEN IF b THEN x ELSE y mean

    IF a THEN
        IF b THEN
            x
        ELSE
            y

or

    IF a THEN
        IF b THEN x
    ELSE
        y

? AKA the dangling else problem.
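For comparison, here is how C-family languages resolve it (an editorial sketch): the grammar is made unambiguous by decreeing that an else binds to the nearest unmatched if, whatever the indentation suggests, and many compilers can warn about exactly this pattern.

    #include <iostream>

    int main() {
        bool a = false, b = false;
        // The indentation suggests the else belongs to the outer if, but C++
        // attaches it to the nearest unmatched if, i.e. to "if (b)".
        if (a)
            if (b)
                std::cout << "x\n";
        else
            std::cout << "y\n";   // reached only when a is true and b is false
        // With a == false nothing is printed at all; adding braces around the
        // inner if is the usual way to get the other reading.
    }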
– David Richerby, answered Jun 10 at 13:58
That's a good example showing that even a non-ambiguous grammar (as in Java, C, C++, ...) allows apparent (!) ambiguities from a human perspective. Even though we are formally and computationally fine, we now have more of a UX / bug-free-development problem.
– ComFreek, Jun 10 at 14:29
Take the most vexing parse in C++ for example:
bar foo(foobar());
Is this a function declaration foo of type bar(foobar()) (the parameter is a function pointer returning a foobar), or a variable declaration foo of type bar, initialized with a default-initialized foobar?
This is differentiated in compilers by assuming the first unless the expression inside the parameter list cannot be interpreted as a type.
When you get such an ambiguous expression, the compiler has two options:
- assume that the expression is a particular derivation and add some disambiguator to the grammar to allow the other derivation to be expressed;
- error out and require disambiguation either way.
The first can fall out naturally; the second requires that the compiler programmer knows about the ambiguity.
If this ambiguity stays undetected, then it is possible that two different compilers default to different derivations for that ambiguous expression, leading to code being non-portable for non-obvious reasons. That in turn leads people to assume it's a bug in one of the compilers, while it's actually a fault in the language specification.
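A small editorial sketch of the usual ways to force one reading or the other; bar and foobar are just the placeholder names from the answer above.

    struct foobar {};
    struct bar { bar(foobar) {} };

    bar f1(foobar());     // function declaration: parameter is a function (pointer) returning foobar
    bar f2((foobar()));   // extra parentheses: now a bar variable initialized from a temporary foobar
    bar f3{foobar{}};     // C++11 brace initialization: unambiguously a bar variable

    int main() {}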
I think the question contains an assumption that's only borderline correct at best.
In real life it's pretty common to simply live with ambiguous grammars, as long as they aren't (so to speak) too ambiguous.
For example, if you look around at grammars compiled with yacc (or similar, such as bison or byacc) you'll find that quite a few produce warnings about "N shift/reduce conflicts" when you compile them. When yacc encounters a shift/reduce conflict, that signals an ambiguity in the grammar.
A shift/reduce conflict, however, is usually a fairly minor problem. The parser generator will resolve the conflict in favor of the "shift" rather than the reduce. The grammar is perfectly fine if that's what you want (and it does seem to work out perfectly well in practice).
A shift/reduce conflict typically arises in a case on this general order (using caps for non-terminals and lower-case for terminals):
A -> B | c
B -> a | c
When we encounter a c, there's an ambiguity: should we parse the c directly as an A, or should we parse it as a B, which in turn is an A? In a case like this, yacc and such will choose the simpler/shorter route, and parse the c directly as an A, rather than going the c -> B -> A route. This can be wrong, but if so, it probably means you have a really simple error in your grammar, and you shouldn't allow the c option as a possibility for A at all.
Now, by contrast, we could have something more like this:
A -> B | C
B -> a | c
C -> b | c
Now when we encounter a c, we have a conflict between whether to treat the c as a B or a C. There's a lot less chance that an automatic conflict resolution strategy is going to choose what we really want. Neither of these is a "shift"--both are "reductions", so this is a "reduce/reduce conflict" (which those accustomed to yacc and such generally recognize as a much bigger problem than a shift/reduce conflict).
So, although I'm not sure I'd go quite so far as to say that anybody really welcomes ambiguity in their grammar, in at least some cases it's minor enough that nobody really cares a whole lot about it. In the abstract they might like the idea of removing all ambiguity--but not enough to always actually do it. For example, a small, simple grammar that contains a minor ambiguity can be preferable to a larger, more complex grammar that eliminates the ambiguity (especially when you get into the practical realm of actually generating a parser from the grammar, and finding that the unambiguous grammar produces a parser that won't run on your target machine).
Man, I wish I'd had this excellent explanation of shift/reduce conflicts 5 months ago! ^^; +1
– HotelCalifornia, Jun 12 at 8:23
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
answered Jun 9 at 13:11
Yuval FilmusYuval Filmus
202k15197359
202k15197359
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
12
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator
+
).$endgroup$
– Bakuriu
Jun 9 at 21:39
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator
+
).$endgroup$
– Bakuriu
Jun 9 at 21:39
13
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
edited 17 hours ago
answered Jun 10 at 8:50
ComFreekComFreek
25117
25117
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
Even if there’s a well-defined way to handle ambiguity (ambiguous expressions are syntax errors, for example), these grammars still cause trouble. As soon as you introduce ambiguity into a grammar, a parser can no longer be sure that the first match it gets is definitive. It needs to keep trying all the other ways to parse a statement, to rule out any ambiguity. You’re also not dealing with something simple like a LL(1) language, so you can’t use a simple, small, fast parser. Your grammar has symbols that can be read multiple ways, so you have to be prepared to backtrack a lot.
In some restricted domains, you might be able to get away with proving that all possible ways to parse an expression are equivalent (for example, because they represent an associative operation). (a+b) + c = a + (b+c).
$endgroup$
answered Jun 9 at 21:28
Davislor
add a comment |
$begingroup$
Does IF a THEN IF b THEN x ELSE y
mean
IF a THEN
    IF b THEN
        x
    ELSE
        y
or
IF a THEN
    IF b THEN x
ELSE
    y
? AKA the dangling else problem.
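Rendered in C-family syntax (a sketch that is not part of the original answer; a, b, x() and y() are made-up names), C and C++ resolve this ambiguity by binding the else to the nearest unmatched if, and braces are how you ask for the other reading:

#include <iostream>

bool a = false, b = false;           // made-up condition flags
void x() { std::cout << "x\n"; }
void y() { std::cout << "y\n"; }

int main() {
    // Without braces, the 'else' binds to the nearest 'if' -- the first reading above.
    if (a)
        if (b) x();
        else   y();                  // runs only when a is true and b is false

    // Braces force the second reading: the 'else' now belongs to 'if (a)'.
    if (a) {
        if (b) x();
    } else {
        y();                         // runs whenever a is false
    }
}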
$endgroup$
answered Jun 10 at 13:58
David Richerby
$begingroup$
That's a good example showing that even an unambiguous grammar (as in Java, C, C++, ...) allows apparent (!) ambiguities from a human perspective. Even though we are formally and computationally fine, we now have more of a UX / bug-free-development problem.
$endgroup$
– ComFreek
Jun 10 at 14:29
add a comment |
$begingroup$
Take the most vexing parse in C++ for example:
bar foo(foobar());
Is this a declaration of a function foo that returns a bar and takes one parameter (a pointer to a function returning a foobar), or a declaration of a variable foo of type bar, initialized with a default-constructed foobar?
Compilers resolve this by assuming the first interpretation unless whatever is inside the parameter list cannot possibly be read as a type.
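A small sketch of the usual workarounds (not part of the original answer; bar and foobar here are stand-in types for the ones above):

struct foobar {};
struct bar { bar(foobar) {} };       // bar can be constructed from a foobar

int main() {
    bar f1(foobar());                // most vexing parse: declares a *function* f1, never a variable
    bar f2((foobar()));              // extra parentheses: now a variable of type bar
    bar f3{foobar{}};                // C++11 brace initialization: never ambiguous
    (void)f2; (void)f3;              // silence unused-variable warnings
}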
When the compiler meets such an ambiguous construct, it has two options:
assume that the expression is one particular derivation, and add some disambiguator to the grammar so that the other derivation can still be expressed;
report an error and require explicit disambiguation either way.
The first can fall out naturally; the second requires that the compiler writer knows about the ambiguity.
If the ambiguity stays undetected, it is possible that two different compilers default to different derivations for the same ambiguous expression, leading to code being non-portable for non-obvious reasons. That in turn leads people to assume it's a bug in one of the compilers, when it's actually a fault in the language specification.
$endgroup$
answered Jun 11 at 9:31
ratchet freak
add a comment |
$begingroup$
I think the question contains an assumption that's only borderline correct at best.
In real life it's pretty common to simply live with ambiguous grammars, as long as they aren't (so to speak) too ambiguous.
For example, if you look around at grammars compiled with yacc (or similar, such as bison or byacc) you'll find that quite a few produce warnings about "N shift/reduce conflicts" when you compile them. A shift/reduce conflict signals a spot where the grammar, as written, leaves the parser with more than one legal way to continue.
A shift/reduce conflict, however, is usually a fairly minor problem. The parser generator will resolve the conflict in favor of the "shift" rather than the "reduce", and the grammar is perfectly fine if that's what you want (and it does seem to work out perfectly well in practice).
The classic case where a shift/reduce conflict arises is the dangling else (using caps for non-terminals and lower-case for terminals):
S -> if E then S
S -> if E then S else S
Having just parsed if E then S and seeing an else as the next token, the parser has two choices: reduce what it has seen so far to an S, so that the else attaches to some enclosing if, or shift the else and keep building the inner statement. yacc and such choose the shift, which binds the else to the nearest if. That is almost always what the language intends, so in practice people simply accept the warning.
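As a rough, self-contained sketch (not part of the original answer; the token names IF, THEN, ELSE, EXPR and OTHER are made up), here is roughly what that situation looks like as a yacc/bison grammar. Running bison on it should report a single shift/reduce conflict, which the generated parser resolves by shifting, i.e. by attaching the ELSE to the nearest IF:

/* Dangling-else sketch: bison should warn about 1 shift/reduce conflict */
/* and resolve it in favor of the shift, binding ELSE to the innermost IF. */
%token IF THEN ELSE EXPR OTHER

%%

stmt
    : IF EXPR THEN stmt              /* reduce when the lookahead is ELSE ...  */
    | IF EXPR THEN stmt ELSE stmt    /* ... or shift the ELSE and keep parsing */
    | OTHER
    ;

%%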
Now, by contrast, we could have something more like this:
A -> B | C
B -> a | c
C -> b | c
Now when we encounter a c, we have a conflict between whether to treat the c as a B or as a C. There's a lot less chance that an automatic conflict-resolution strategy is going to choose what we really want. Neither of these is a "shift"--both are "reductions", so this is a "reduce/reduce conflict" (which those accustomed to yacc and such generally recognize as a much bigger problem than a shift/reduce conflict).
So, although I'm not sure I'd go quite so far as to say that anybody really welcomes ambiguity in their grammar, in at least some cases it's minor enough that nobody really cares a whole lot about it. In the abstract they might like the idea of removing all ambiguity, but not enough to always actually do it. For example, a small, simple grammar that contains a minor ambiguity can be preferable to a larger, more complex grammar that eliminates the ambiguity (especially when you get into the practical realm of actually generating a parser from the grammar, and finding that the unambiguous grammar produces a parser that won't run on your target machine).
$endgroup$
edited Jun 12 at 20:24
answered Jun 11 at 20:18
Jerry Coffin
$begingroup$
Man, wish I'd had this excellent explanation of shift/reduce conflicts 5 months ago! ^^; +1
$endgroup$
– HotelCalifornia
Jun 12 at 8:23
add a comment |
HIRAK MONDAL is a new contributor. Be nice, and check out our Code of Conduct.
$begingroup$
"everyone wants to get rid of it". Well, that's just not true. In commercially relevant languages, it's common to see ambiguity added as languages evolve. E.g. C++ intentionally added the ambiguity std::vector<std::vector<int>> in 2011, which previously required a space between the two closing > characters. The key insight is that these languages have many more users than vendors, so fixing a minor annoyance for users justifies a lot of work by implementors.
$endgroup$
– MSalters
Jun 11 at 7:16
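A quick illustration of the change the comment refers to (a sketch, not part of the comment itself):

#include <vector>

std::vector<std::vector<int>> v;     // accepted since C++11: >> closes both templates
std::vector<std::vector<int> > w;    // the spelling older compilers required, with a space

int main() {}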