Interpreter
What is an interpreter?
Motivation
- A general problem whose various instances need to be solved often.
- Individual instances can be expressed as sentences in a simple language.
General solution
- Build an interpreter for that language.
- Sentences take the form of an abstract syntax tree.
- Interpreting a sentence of the language = solving the given instance of the problem.
MOTIVATION
If a particular kind of problem occurs often enough, it may be worthwhile to express instances of the problem as sentences in a simple language. You can then build an interpreter that solves the problem by interpreting these sentences.
Examples:
- Regular expressions. Rather than building a custom algorithm to match each pattern against strings, a search algorithm can interpret a regular expression that specifies the set of strings to match.
- Boolean formulas.
- Parsing XML into a GUI.
The Interpreter pattern describes how to define a grammar for a simple language, how to represent sentences in that language, and how to interpret these sentences. In the regular-expression example, the pattern describes how to define a grammar for regular expressions, how to represent a particular regular expression, and how to interpret it.
Interpreter – parts of the pattern
The pattern includes:
- A grammar
  - describes the language in which problem instances are accepted
  - should be as simple as possible
- A representation of the grammar in code
  - specifies a class for every grammar rule
  - the classes share a common abstract ancestor
  - relationships between the classes (inheritance) mirror the grammar
- A representation of the interpretation context
The pattern does not include:
- A parser that builds the syntax tree of a problem instance
Reasons why the parser is not included:
- building the full syntax tree may require external buffers
- the tree can be built with creational patterns
- parsing methods are customized to the language
Every grammar rule is represented by one class:
- nonterminal: the class holds (references to) other expression objects
- terminal: the node defines the transcription of the symbol itself
Interpreter – general structure
AbstractExpression (RegularExpression)
- declares an abstract Interpret operation that is common to all nodes in the abstract syntax tree.
TerminalExpression (LiteralExpression)
- implements an Interpret operation associated with terminal symbols in the grammar.
- an instance is required for every terminal symbol in a sentence.
NonterminalExpression (AlternationExpression, RepetitionExpression, SequenceExpression)
- one such class is required for every rule R ::= R1 R2 ... Rn in the grammar.
- maintains instance variables of type AbstractExpression for each of the symbols R1 through Rn.
- implements an Interpret operation for nonterminal symbols in the grammar. Interpret typically calls itself recursively on the variables representing R1 through Rn.
Context
- contains information that is global to the interpreter.
Client
- builds (or is given) an abstract syntax tree representing a particular sentence in the language that the grammar defines. The abstract syntax tree is assembled from instances of the NonterminalExpression and TerminalExpression classes.
- invokes the Interpret operation.
Interpreter – participants
AbstractExpression
- declares the abstract method Interpret()
- implementations interpret the concept the node stands for
TerminalExpression
- implements Interpret() for a terminal of the grammar
- one instance for every terminal symbol in the input sentence
NonterminalExpression
- implements Interpret() for a nonterminal of the grammar
- one class for every grammar rule R ::= R1 R2 ... Rn
- keeps an instance variable of type AbstractExpression for each symbol R1 ... Rn
Context
- keeps global information
Client
- is given (or builds) the abstract syntax tree that represents a particular sentence of the language, composed of NonterminalExpression and TerminalExpression instances
- calls Interpret()
Collaborations
- The client builds (or is given) the sentence as an abstract syntax tree of NonterminalExpression and TerminalExpression instances. Then the client initializes the context and invokes the Interpret operation.
- Each NonterminalExpression node defines Interpret in terms of Interpret on each subexpression. The Interpret operation of each TerminalExpression defines the base case of the recursion.
- The Interpret operations at each node use the context to store and access the state of the interpreter.
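The collaborations above can be sketched in a few lines of Java. This is only an illustration under assumptions: the toy language (names that stand for integers, combined by addition), the `int` return type, and the `set`/`get` methods on `Context` are invented here and are not part of the pattern description.

```java
import java.util.HashMap;
import java.util.Map;

// Context: information that is global to one interpretation run
// (here, hypothetically, the integer value bound to each name).
class Context {
    private final Map<String, Integer> values = new HashMap<>();
    void set(String name, int value) { values.put(name, value); }
    int get(String name) { return values.getOrDefault(name, 0); }
}

// AbstractExpression: the Interpret operation shared by all nodes.
interface AbstractExpression {
    int interpret(Context context);
}

// TerminalExpression: base case of the recursion, reads the context.
class TerminalExpression implements AbstractExpression {
    private final String name;
    TerminalExpression(String name) { this.name = name; }
    public int interpret(Context context) { return context.get(name); }
}

// NonterminalExpression for an assumed rule R ::= R1 '+' R2:
// interprets itself in terms of its subexpressions.
class NonterminalExpression implements AbstractExpression {
    private final AbstractExpression left;
    private final AbstractExpression right;
    NonterminalExpression(AbstractExpression left, AbstractExpression right) {
        this.left = left;
        this.right = right;
    }
    public int interpret(Context context) {
        return left.interpret(context) + right.interpret(context);
    }
}

// Client: builds the tree, initializes the context, invokes interpret.
class Client {
    public static void main(String[] args) {
        AbstractExpression tree = new NonterminalExpression(
                new TerminalExpression("a"),
                new NonterminalExpression(
                        new TerminalExpression("b"),
                        new TerminalExpression("c")));
        Context context = new Context();
        context.set("a", 1);
        context.set("b", 2);
        context.set("c", 3);
        System.out.println(tree.interpret(context)); // prints 6
    }
}
```

The same tree can be interpreted again with different values simply by passing a different Context.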
Interpreter - illustration
- Musician = interpreter
- Grammar = musical notation
- Context = tempo
Classic example – grammar
Example: grammar of a regular expression
expression ::= literal | alternation | sequence | repetition | '(' expression ')'
alternation ::= expression '|' expression
sequence ::= expression '&' expression
repetition ::= expression '*'
literal ::= 'a' | 'b' | 'c' | ... { 'a' | 'b' | 'c' | ... }*
The Interpreter pattern uses a class to represent each grammar rule. Symbols on the right-hand side of a rule are instance variables of these classes. The grammar above is represented by five classes: an abstract class RegularExpression and its four subclasses LiteralExpression, AlternationExpression, SequenceExpression, and RepetitionExpression. The last three classes define variables that hold subexpressions.
Classic example – representation of the grammar
The regular-expression grammar and its representation in code:
- an abstract class (RegularExpression)
- a class for every grammar rule; instances hold the subexpressions, and the symbols on the right-hand sides of the rules are stored in instance variables
We can create an interpreter for these regular expressions by defining the Interpret operation on each subclass of RegularExpression. Interpret takes as an argument the context in which to interpret the expression. The context contains the input string and information on how much of it has been matched so far. Each subclass of RegularExpression implements Interpret to match the next part of the input string based on the current context. For example:
- LiteralExpression checks whether the input matches the literal it defines,
- AlternationExpression checks whether the input matches any of its alternatives,
- RepetitionExpression checks whether the input contains multiple copies of the expression it repeats,
and so on.
Classic example – representation of sentences
Abstract syntax tree
- Every regular expression defined by this grammar is represented by an abstract syntax tree made up of instances of the classes above.
Figure: abstract syntax tree representing the regular expression raining & (dogs | cats) *
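A possible Java sketch of this interpreter follows. It is not the implementation from the slides: it assumes that the context tracks the set of input offsets reached so far (so alternation and repetition can keep several candidate matches alive), and the names `MatchContext`, `RegexDemo`, and `matches` are invented for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Context: the input string plus every offset matched up to so far.
final class MatchContext {
    final String input;
    final Set<Integer> positions;
    MatchContext(String input, Set<Integer> positions) {
        this.input = input;
        this.positions = positions;
    }
}

interface RegularExpression {
    MatchContext interpret(MatchContext context);
}

// literal: advances every position at which the literal occurs.
class LiteralExpression implements RegularExpression {
    private final String literal;
    LiteralExpression(String literal) { this.literal = literal; }
    public MatchContext interpret(MatchContext c) {
        Set<Integer> next = new HashSet<>();
        for (int p : c.positions) {
            if (c.input.startsWith(literal, p)) next.add(p + literal.length());
        }
        return new MatchContext(c.input, next);
    }
}

// alternation: union of the positions reached by either alternative.
class AlternationExpression implements RegularExpression {
    private final RegularExpression first, second;
    AlternationExpression(RegularExpression first, RegularExpression second) {
        this.first = first;
        this.second = second;
    }
    public MatchContext interpret(MatchContext c) {
        Set<Integer> union = new HashSet<>(first.interpret(c).positions);
        union.addAll(second.interpret(c).positions);
        return new MatchContext(c.input, union);
    }
}

// sequence: the second expression continues where the first stopped.
class SequenceExpression implements RegularExpression {
    private final RegularExpression first, second;
    SequenceExpression(RegularExpression first, RegularExpression second) {
        this.first = first;
        this.second = second;
    }
    public MatchContext interpret(MatchContext c) {
        return second.interpret(first.interpret(c));
    }
}

// repetition: zero or more copies; stops once no new position appears.
class RepetitionExpression implements RegularExpression {
    private final RegularExpression inner;
    RepetitionExpression(RegularExpression inner) { this.inner = inner; }
    public MatchContext interpret(MatchContext c) {
        Set<Integer> all = new HashSet<>(c.positions);
        Set<Integer> frontier = c.positions;
        while (!frontier.isEmpty()) {
            Set<Integer> next = new HashSet<>(
                    inner.interpret(new MatchContext(c.input, frontier)).positions);
            next.removeAll(all);   // keep only offsets not seen before
            all.addAll(next);
            frontier = next;
        }
        return new MatchContext(c.input, all);
    }
}

class RegexDemo {
    // True if the whole input is matched by the expression.
    static boolean matches(RegularExpression re, String input) {
        MatchContext start = new MatchContext(input, Collections.singleton(0));
        return re.interpret(start).positions.contains(input.length());
    }

    public static void main(String[] args) {
        // Abstract syntax tree for: raining & (dogs | cats) *
        RegularExpression re = new SequenceExpression(
                new LiteralExpression("raining"),
                new RepetitionExpression(new AlternationExpression(
                        new LiteralExpression("dogs"),
                        new LiteralExpression("cats"))));
        System.out.println(matches(re, "rainingdogscats")); // true
        System.out.println(matches(re, "rainingdog"));      // false
    }
}
```

Note that the tree is built by hand in `main`; as stated earlier, a parser for the concrete syntax is outside the pattern.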
Example with Boolean expressions in Java (1)
Working with Boolean expressions
BooleanExp ::= VariableExp | Constant | OrExp | AndExp | NotExp | '(' BooleanExp ')'
AndExp ::= BooleanExp 'and' BooleanExp
OrExp ::= BooleanExp 'or' BooleanExp
NotExp ::= 'not' BooleanExp
Constant ::= 'true' | 'false'
VariableExp ::= 'A' | 'B' | ... | 'X' | 'Y' | 'Z'
Interface for all classes that define a Boolean expression:
    interface BooleanExp {
        boolean interpret(Context context);
    }
The context defines a mapping from variables to Boolean values, i.e. the constants true and false. Outline of the Context class (method bodies are not shown on the slide):
    class Context {
        public boolean lookup(String name);
        public void assign(VariableExp exp, boolean value);
    }
The example is a system for manipulating and evaluating Boolean expressions, implemented here in Java. The terminal symbols in this language are Boolean variables and the constants true and false. Nonterminal symbols represent expressions containing the operators and, or, and not. We define two operations on Boolean expressions. The first, Evaluate (the interpret method in the code), evaluates a Boolean expression in a context that assigns a true or false value to each variable. The second operation, Replace, produces a new Boolean expression by replacing a variable with an expression. Replace shows that the Interpreter pattern can be used for more than just evaluating expressions; in this case, it manipulates the expression itself. The class Context defines a mapping from variables to Boolean values, which we represent with the Java constants true and false. For simplicity, we ignore operator precedence and assume it is the responsibility of whichever object constructs the syntax tree.
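The slide gives only the signatures of Context. One way to fill them in, assuming a HashMap keyed by variable name, is sketched below. The `getName()` accessor on VariableExp and the exception thrown for an unassigned variable are assumptions made for this sketch, not something the slides specify.

```java
import java.util.HashMap;
import java.util.Map;

interface BooleanExp {
    boolean interpret(Context context);
}

class VariableExp implements BooleanExp {
    private final String name;
    VariableExp(String name) { this.name = name; }
    // Assumed accessor so that Context can key its map by name.
    String getName() { return name; }
    public boolean interpret(Context context) { return context.lookup(name); }
}

// Context: mapping from variable names to Boolean values.
class Context {
    private final Map<String, Boolean> values = new HashMap<>();

    public boolean lookup(String name) {
        Boolean value = values.get(name);
        if (value == null) {
            // Design choice of this sketch: fail loudly instead of
            // silently treating an unassigned variable as false.
            throw new IllegalStateException("Unassigned variable: " + name);
        }
        return value;
    }

    public void assign(VariableExp exp, boolean value) {
        values.put(exp.getName(), value);
    }
}
```

Keeping the mapping in Context rather than in the VariableExp nodes is what allows one syntax tree to be evaluated under many different assignments.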
Example with Boolean expressions in Java (2)
Class representing the rule VariableExp ::= 'A' | 'B' | ... | 'X' | 'Y' | 'Z'
    class VariableExp implements BooleanExp {
        private String name;

        VariableExp(String name) {
            this.name = name;
        }

        // A variable evaluates to the value assigned to it in the context.
        public boolean interpret(Context context) {
            return context.lookup(name);
        }
    }
Class representing the rule Constant ::= 'true' | 'false'
    class Constant implements BooleanExp {
        private boolean value;

        Constant(boolean value) {
            this.value = value;
        }

        // A constant evaluates to itself, independent of the context.
        public boolean interpret(Context context) {
            return value;
        }
    }
Evaluating a variable returns its value in the current context. To replace a variable with an expression, we check whether the variable has the same name as the one passed as an argument.
Example with Boolean expressions in Java (3)
Class representing the rule AndExp ::= BooleanExp 'and' BooleanExp
    class AndExp implements BooleanExp {
        private BooleanExp operand1;
        private BooleanExp operand2;

        AndExp(BooleanExp op1, BooleanExp op2) {
            operand1 = op1;
            operand2 = op2;
        }

        // Evaluates both operands recursively and combines the results.
        public boolean interpret(Context context) {
            return operand1.interpret(context) && operand2.interpret(context);
        }
    }
Evaluating an AndExp evaluates its operands and returns the logical "and" of the results. An AndExp implements Copy and Replace by making recursive calls on its operands.
The classes for the rules OrExp and NotExp are analogous.
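For completeness, the analogous OrExp and NotExp classes might look like the sketch below. To keep the block self-contained, Context is reduced to an empty placeholder and Constant is repeated; both are simplifications made for this example only.

```java
// Placeholder: constants need no variable lookup, so the context
// carries no state in this sketch.
class Context { }

interface BooleanExp {
    boolean interpret(Context context);
}

class Constant implements BooleanExp {
    private final boolean value;
    Constant(boolean value) { this.value = value; }
    public boolean interpret(Context context) { return value; }
}

// OrExp ::= BooleanExp 'or' BooleanExp
class OrExp implements BooleanExp {
    private final BooleanExp operand1;
    private final BooleanExp operand2;

    OrExp(BooleanExp op1, BooleanExp op2) {
        operand1 = op1;
        operand2 = op2;
    }

    public boolean interpret(Context context) {
        return operand1.interpret(context) || operand2.interpret(context);
    }
}

// NotExp ::= 'not' BooleanExp
class NotExp implements BooleanExp {
    private final BooleanExp operand;

    NotExp(BooleanExp operand) {
        this.operand = operand;
    }

    public boolean interpret(Context context) {
        return !operand.interpret(context);
    }
}
```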
Example with Boolean expressions in Java (4)
Creating an expression instance and interpreting it
    BooleanExp expression;
    Context context = new Context();

    VariableExp x = new VariableExp("X");
    VariableExp y = new VariableExp("Y");

    // Abstract syntax tree for (true and x) or (y and (not x))
    expression = new OrExp(
        new AndExp(new Constant(true), x),
        new AndExp(y, new NotExp(x))
    );

    // Assignment of values to the variables
    context.assign(x, false);
    context.assign(y, true);

    // Evaluates to true; change the assignment and interpret again as needed.
    boolean result = expression.interpret(context);
The expression evaluates to true for this assignment to x and y. We can evaluate the expression with a different assignment to the variables simply by changing the context.
Many kinds of operations can "interpret" a sentence. Of the three operations defined for BooleanExp (Evaluate, Replace, and Copy), Evaluate fits our idea of what an interpreter should do most closely: it interprets a program or expression and returns a simple result. However, Replace can be viewed as an interpreter as well. It is an interpreter whose context is the name of the variable being replaced along with the expression that replaces it, and whose result is a new expression. Even Copy can be thought of as an interpreter with an empty context. It may seem a little strange to consider Replace and Copy to be interpreters, because these are just basic operations on trees. The examples in Visitor (366) illustrate how all three operations can be refactored into a separate "interpreter" visitor, thus showing that the similarity is deep.
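The slides show only Evaluate (interpret). A hedged sketch of how Replace and Copy could be added to the same hierarchy is given below. The method names `replace` and `copy`, the `getName()` accessor, and the Map-based Context are assumptions of this sketch; only VariableExp, NotExp, and AndExp are shown to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

interface BooleanExp {
    boolean interpret(Context context);
    // Returns a new tree in which variable `name` is replaced by `exp`.
    BooleanExp replace(String name, BooleanExp exp);
    BooleanExp copy();
}

class Context {
    private final Map<String, Boolean> values = new HashMap<>();
    boolean lookup(String name) {
        Boolean value = values.get(name);
        if (value == null) throw new IllegalStateException("Unassigned variable: " + name);
        return value;
    }
    void assign(VariableExp exp, boolean value) { values.put(exp.getName(), value); }
}

class VariableExp implements BooleanExp {
    private final String name;
    VariableExp(String name) { this.name = name; }
    String getName() { return name; }
    public boolean interpret(Context context) { return context.lookup(name); }
    // Base case: substitute only if the names match.
    public BooleanExp replace(String name, BooleanExp exp) {
        return this.name.equals(name) ? exp.copy() : copy();
    }
    public BooleanExp copy() { return new VariableExp(name); }
}

class NotExp implements BooleanExp {
    private final BooleanExp operand;
    NotExp(BooleanExp operand) { this.operand = operand; }
    public boolean interpret(Context context) { return !operand.interpret(context); }
    public BooleanExp replace(String name, BooleanExp exp) {
        return new NotExp(operand.replace(name, exp));
    }
    public BooleanExp copy() { return new NotExp(operand.copy()); }
}

class AndExp implements BooleanExp {
    private final BooleanExp operand1;
    private final BooleanExp operand2;
    AndExp(BooleanExp op1, BooleanExp op2) { operand1 = op1; operand2 = op2; }
    public boolean interpret(Context context) {
        return operand1.interpret(context) && operand2.interpret(context);
    }
    // Recursive case: rebuild the node from its replaced operands.
    public BooleanExp replace(String name, BooleanExp exp) {
        return new AndExp(operand1.replace(name, exp), operand2.replace(name, exp));
    }
    public BooleanExp copy() { return new AndExp(operand1.copy(), operand2.copy()); }
}
```

For instance, replacing Y by `not X` in `X and (not Y)` yields `X and (not (not X))`, which no longer depends on Y at all.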
Interpreter – use with other patterns
Composite
- the most common combination
- the tree structure is an implementation of Composite
Iterator
- the classic way to traverse the structure
- a common abstract ancestor is important
Flyweight
- typical for compilers
- sharing of constant expressions evaluated at compile time
Interpreter – use with other patterns
Visitor
- can be used to make the interpret method easier to maintain and extend
- polymorphism over the Visitor makes entirely different interpretations possible
Examples:
- arithmetic expressions and switching between prefix, postfix, and infix notation
- Boolean expressions with different behaviors, e.g. evaluating the truth value of a formula and converting it to CNF
- configuration files: conversion between XML and a plain-text format
- ...
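As an illustration of the Visitor combination, the sketch below interprets one Boolean expression tree in two ways: evaluating it and printing it in infix notation. All names (`Visitor`, `accept`, `EvaluateVisitor`, `PrintVisitor`) are invented for this example, and variables are left out to keep it short.

```java
// One visit method per node class; R is the result type of the interpretation.
interface Visitor<R> {
    R visitConstant(Constant exp);
    R visitNot(NotExp exp);
    R visitAnd(AndExp exp);
}

interface BooleanExp {
    <R> R accept(Visitor<R> visitor);
}

class Constant implements BooleanExp {
    final boolean value;
    Constant(boolean value) { this.value = value; }
    public <R> R accept(Visitor<R> visitor) { return visitor.visitConstant(this); }
}

class NotExp implements BooleanExp {
    final BooleanExp operand;
    NotExp(BooleanExp operand) { this.operand = operand; }
    public <R> R accept(Visitor<R> visitor) { return visitor.visitNot(this); }
}

class AndExp implements BooleanExp {
    final BooleanExp left;
    final BooleanExp right;
    AndExp(BooleanExp left, BooleanExp right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> visitor) { return visitor.visitAnd(this); }
}

// Interpretation 1: compute the truth value.
class EvaluateVisitor implements Visitor<Boolean> {
    public Boolean visitConstant(Constant exp) { return exp.value; }
    public Boolean visitNot(NotExp exp) { return !exp.operand.accept(this); }
    public Boolean visitAnd(AndExp exp) {
        return exp.left.accept(this) && exp.right.accept(this);
    }
}

// Interpretation 2: render the tree as fully parenthesized infix text.
class PrintVisitor implements Visitor<String> {
    public String visitConstant(Constant exp) { return String.valueOf(exp.value); }
    public String visitNot(NotExp exp) { return "(not " + exp.operand.accept(this) + ")"; }
    public String visitAnd(AndExp exp) {
        return "(" + exp.left.accept(this) + " and " + exp.right.accept(this) + ")";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        BooleanExp exp = new AndExp(new Constant(true), new NotExp(new Constant(false)));
        System.out.println(exp.accept(new PrintVisitor()));    // (true and (not false))
        System.out.println(exp.accept(new EvaluateVisitor())); // true
    }
}
```

A further interpretation, such as a CNF conversion, would be one more Visitor class, with no change to the expression classes.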
Real-world example – ELResolver
Java class javax.el.ELResolver (package javax.el)
    public abstract class ELResolver
    public class ArrayELResolver extends ELResolver
    public class BeanELResolver extends ELResolver
    public class CompositeELResolver extends ELResolver
    ...
The Expression Language (also referred to as the EL) provides an important mechanism for enabling the presentation layer (web pages) to communicate with the application logic (managed beans). EL provides a way to use simple expressions to perform the following tasks:
- dynamically read application data stored in JavaBeans components, various data structures, and implicit objects
- dynamically write data, such as user input into forms, to JavaBeans components
- invoke arbitrary static and public methods
- dynamically perform arithmetic operations
Complex example in C# – Roman numerals (1)
    using System;
    using System.Collections.Generic;

    class App
    {
        static void Main()
        {
            string roman = "MCMXXVIII";
            Context context = new Context(roman);

            // One expression per decimal place, applied from the highest down.
            List<Expression> tree = new List<Expression>();
            tree.Add(new ThousandExpression());
            tree.Add(new HundredExpression());
            tree.Add(new TenExpression());
            tree.Add(new OneExpression());

            foreach (Expression exp in tree)
            {
                exp.Interpret(context);
            }

            // Prints: MCMXXVIII = 1928
            Console.WriteLine("{0} = {1}", roman, context.Output);
        }
    }
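Complex example in C# – Roman numerals (2)
    class Context
    {
        public Context(string input)
        {
            this.Input = input;
        }

        // Remaining, not yet interpreted part of the Roman numeral.
        public string Input { get; set; }

        // Decimal value accumulated so far.
        public int Output { get; set; }
    }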
Complex example in C# – Roman numerals (3)
    abstract class Expression
    {
        public void Interpret(Context context)
        {
            if (context.Input.Length == 0)
                return;

            if (context.Input.StartsWith(Nine()))
            {
                context.Output += (9 * Multiplier());
                context.Input = context.Input.Substring(2);
            }
            else if (context.Input.StartsWith(Four()))
            {
                context.Output += (4 * Multiplier());
                context.Input = context.Input.Substring(2);
            }
            else if (context.Input.StartsWith(Five()))
            {
                context.Output += (5 * Multiplier());
                context.Input = context.Input.Substring(1);
            }

            // Consume any remaining "one" symbols of this decimal place.
            while (context.Input.StartsWith(One()))
            {
                context.Output += (1 * Multiplier());
                context.Input = context.Input.Substring(1);
            }
        }

        public abstract string One();
        public abstract string Four();
        public abstract string Five();
        public abstract string Nine();
        public abstract int Multiplier();
    }
Complex example in C# – Roman numerals (4)
    // Thousands have no symbols for four, five, or nine; " " never matches.
    class ThousandExpression : Expression
    {
        public override string One() { return "M"; }
        public override string Four() { return " "; }
        public override string Five() { return " "; }
        public override string Nine() { return " "; }
        public override int Multiplier() { return 1000; }
    }

    class HundredExpression : Expression
    {
        public override string One() { return "C"; }
        public override string Four() { return "CD"; }
        public override string Five() { return "D"; }
        public override string Nine() { return "CM"; }
        public override int Multiplier() { return 100; }
    }

    class TenExpression : Expression
    {
        public override string One() { return "X"; }
        public override string Four() { return "XL"; }
        public override string Five() { return "L"; }
        public override string Nine() { return "XC"; }
        public override int Multiplier() { return 10; }
    }

    class OneExpression : Expression
    {
        public override string One() { return "I"; }
        public override string Four() { return "IV"; }
        public override string Five() { return "V"; }
        public override string Nine() { return "IX"; }
        public override int Multiplier() { return 1; }
    }
Interpreter - summary
Typical use
- parsers and compilers
Limits of applicability
- interpreting a language whose sentences can be expressed as an abstract syntax tree
- the grammar of the language is simple
  - more complex grammars → hard-to-read code, class explosion
- efficiency is not critically important
  - otherwise it is better not to build a syntax tree → state machine
KNOWN USAGE
The Interpreter pattern is widely used in compilers implemented with object-oriented languages, as the Smalltalk compilers are. SPECTalk uses the pattern to interpret descriptions of input file formats [Sza92]. The QOCA constraint-solving toolkit uses it to evaluate constraints [HHMV92]. Considered in its most general form (i.e., an operation distributed over a class hierarchy based on the Composite pattern), nearly every use of the Composite pattern will also contain the Interpreter pattern. But the Interpreter pattern should be reserved for those cases in which you want to think of the class hierarchy as defining a language. Further uses:
- specialized database query languages such as SQL
- specialized computer languages that are often used to describe communication protocols
APPLICABILITY
Use the Interpreter pattern when there is a language to interpret and you can represent statements in the language as abstract syntax trees. The Interpreter pattern works best when
- the grammar is simple. For complex grammars, the class hierarchy for the grammar becomes large and unmanageable. Tools such as parser generators are a better alternative in such cases. They can interpret expressions without building abstract syntax trees, which can save space and possibly time.
- efficiency is not a critical concern. The most efficient interpreters are usually not implemented by interpreting parse trees directly but by first translating them into another form. For example, regular expressions are often transformed into state machines. But even then, the translator can be implemented by the Interpreter pattern, so the pattern is still applicable.
Interpreter - summary
Advantages
- the grammar is easy to extend and change
- the grammar is simple to implement
- new interpretation methods can be added
Disadvantages
- a complex grammar is hard to maintain
Related design patterns
- Composite: the abstract syntax tree is an instance of the Composite pattern
- Iterator: used to traverse the structure
- Flyweight: sharing of terminal symbols inside the syntax tree; typical for programming languages (the same variable occurs many times)
- Visitor: defines or changes the interpretation of all nodes of the abstract syntax tree in a single class
CONSEQUENCES
The Interpreter pattern has the following benefits and liabilities:
- It's easy to change and extend the grammar. Because the pattern uses classes to represent grammar rules, you can use inheritance to change or extend the grammar. Existing expressions can be modified incrementally, and new expressions can be defined as variations on old ones.
- Implementing the grammar is easy, too. Classes defining nodes in the abstract syntax tree have similar implementations. These classes are easy to write, and often their generation can be automated with a compiler or parser generator.
- Complex grammars are hard to maintain. The Interpreter pattern defines at least one class for every rule in the grammar (grammar rules defined using BNF may require multiple classes). Hence grammars containing many rules can be hard to manage and maintain. Other design patterns can be applied to mitigate the problem (see Implementation). But when the grammar is very complex, other techniques such as parser or compiler generators are more appropriate.
- Adding new ways to interpret expressions. The Interpreter pattern makes it easier to evaluate an expression in a new way. For example, you can support pretty printing or type-checking an expression by defining a new operation on the expression classes. If you keep creating new ways of interpreting an expression, then consider using the Visitor pattern to avoid changing the grammar classes.
Interpreter – references and literature
- GoF: E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns: Elements of Reusable Object-Oriented Software, 1995
- Wikipedia: http://en.wikipedia.org/wiki/Interpreter_pattern
- ELResolver: https://docs.oracle.com/javaee/5/api/javax/el/ELResolver.html