Abstract
In computer science, an ambiguous grammar is a formal grammar for which there exists a string that can have more than one leftmost derivation, while an unambiguous grammar is a formal grammar for which every valid string has a unique leftmost derivation. For real-world programming languages, the reference CFG is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are generally resolute by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous. It has been known since 1962 that the ambiguity problem for context-free grammars is undesirables. Ambiguity in context-free grammars is a frequent problem in language design and parser invention, as well as in applications where grammars are used as models of real-world physical structures. We observe that there is a simple linguistic categorization of the grammar ambiguity problem, and we show how to develop this to conservatively approximate the problem based on local regular approximations and grammar unfolding. As an application, we consider grammars that occur in RNA analysis in bioinformatics, and we demonstrate that our static analysis of context-free grammars is sufficiently precise and efficient to be sensibly useful.