Related smells: Packages Not Clearly Named, Duplicate Abstraction, Inconsistent Naming Convention, Improper Quote Usage
Names used within a grammar can cause many kinds of problems, most of which concern nonterminal names and labels. Many grammar notations do not support labels (decorative names for production rules or right-hand side subexpressions) [SAC-2012-Zaytsev], but realistic metalanguages tend to have them in some form. Nonterminal names, on the other hand, are essential: they are optional only in notations for regular expressions, and present in all grammar notations of the context-free kind and beyond.
Names can be uncommunicative, like the names from the last example: abc or pqr are much worse for the readability and maintainability of the grammar than if_statement, CompilationUnit or DIGIT, similarly to how this is a problem in programming in general [ICPC-2017-BeniaminiGOF]. One can also investigate whether naming policies are present and how well they are respected. For instance, if all nonterminals are camel-cased but one is lowercase with an underscore separator, it is probably an oversight; cases like this were reported in a MediaWiki grammar which was created by several unrelated grammar engineers [MediaWiki2011]. It can also be the case that the naming policy carries semantic meaning: typically lexical nonterminals and/or preterminals are named in uppercase, to distinguish them visually when they are used next to others like this:
if_stmt ::= IF condition THEN expression ENDIF;
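A check for violated naming policies could be sketched roughly as follows. This is only an illustration, not an established tool: the three convention patterns and the helper names convention_of and naming_outliers are assumptions made for this example. Uppercase names are treated as a deliberate lexical convention and are never reported.

```python
import re
from collections import Counter

# Hypothetical convention patterns; real grammars may need other ones.
CONVENTIONS = [
    ("UPPER", re.compile(r"[A-Z][A-Z0-9_]*")),
    ("CamelCase", re.compile(r"(?:[A-Z][a-z0-9]+)+")),
    ("snake_case", re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")),
]


def convention_of(name):
    """Return the label of the first convention the whole name matches."""
    for label, pattern in CONVENTIONS:
        if pattern.fullmatch(name):
            return label
    return "other"


def naming_outliers(names):
    """Return names that deviate from the dominant non-uppercase convention."""
    styles = {name: convention_of(name) for name in names}
    # Uppercase names are assumed to be lexical and are not counted.
    counts = Counter(style for style in styles.values() if style != "UPPER")
    if not counts:
        return []
    dominant, _ = counts.most_common(1)[0]
    return sorted(n for n, s in styles.items() if s not in (dominant, "UPPER"))


if __name__ == "__main__":
    names = ["CompilationUnit", "IfStatement", "WhileStatement", "var_def", "DIGIT"]
    print(naming_outliers(names))
```

A majority vote is a crude heuristic: in a grammar with two equally popular conventions, it reports one of them arbitrarily, so its output is a list of candidates for review rather than a verdict.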
Sometimes naming policies represent namescoping, which is considered a bad smell in OOP but is much less so in grammars because all names are global (at least up to a module level, if we have modules). An example:
VarDef ::= VarName DefKeyword VarType;
ConstDef ::= ConstName DefKeyword ConstType;
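To make such implicit scopes visible, one could group nonterminals by their leading word. The sketch below assumes camel-cased names and treats the first capitalised word as the scope prefix; the function name scopes_by_prefix is hypothetical.

```python
import re
from collections import defaultdict


def scopes_by_prefix(names):
    """Group camel-cased names by their first word; keep groups of two or more."""
    groups = defaultdict(list)
    for name in names:
        if name.isupper():
            continue  # assumed lexical names, not part of any scope
        match = re.match(r"[A-Z][a-z0-9]*", name)
        # Require something after the prefix, so a one-word name is no scope.
        if match and match.end() < len(name):
            groups[match.group()].append(name)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    names = ["VarDef", "VarName", "DefKeyword", "VarType",
             "ConstDef", "ConstName", "ConstType"]
    for prefix, members in scopes_by_prefix(names).items():
        print(prefix, members)
```

In this example DefKeyword is left out because no other name shares its prefix, which matches the intuition that it is shared between the two scopes rather than owned by either.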
Finally, names can be misleading and contain words that contradict the definition of the named entity. For example:
WhileStatement ::= "while" Condition Block
| "repeat" Block "until" Condition ;
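A rough heuristic for spotting such contradictions is to compare the words in a nonterminal name with the quoted terminals of each alternative. If a word from the name occurs as a terminal in some alternatives but not in others, the others deserve a second look. The sketch below assumes double-quoted terminals and alternatives given as plain strings; the helper names are hypothetical.

```python
import re


def name_words(name):
    """Split a CamelCase or snake_case name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", name)
    return [part.lower() for part in parts]


def suspicious_alternatives(name, alternatives):
    """Return alternatives lacking every name word that is used as a terminal elsewhere."""
    words = set(name_words(name))
    # Terminals are assumed to be written in double quotes.
    terminals = [
        {t.lower() for t in re.findall(r'"([^"]*)"', alt)} for alt in alternatives
    ]
    used = words & set().union(*terminals)
    return [
        alt for alt, found in zip(alternatives, terminals)
        if used and not (used & found)
    ]


if __name__ == "__main__":
    alts = ['"while" Condition Block', '"repeat" Block "until" Condition']
    print(suspicious_alternatives("WhileStatement", alts))
```

The heuristic only fires when the name shares a word with at least one terminal, so it cannot detect a misleading name whose words never appear as keywords at all.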