PATR II Compiler
Advanced Prolog Course (Prolog Aufbaukurs), summer semester 2000
Heinrich-Heine-Universität Düsseldorf
Christof Rumpf

Notational Conventions

Instantiation mode of arguments
- Blue: input arguments
- Red: output arguments
Cut
- red cut !
- green cut !
Predicate definitions
- complete
- to be continued

Directives

    % external resources
    :- [tokenize].                 % load tokenizer

    % operators
    :- op(510, xfy, : ).           % attr:val
    :- op(600, xfx, ===).          % path equation
    :- op(1100,xfx,'--->').        % syntax rule, lexical entry
    :- op(1200,xfx,'::').          % description annotation

Three Compiler Components

Tokenizer
- Input: PATR II grammar
- Output: token lines
Preprocessor
- Input: token lines
- Output: token sentences
Syntax compiler
- Input: token sentences
- Output: Prolog clauses

    compile_grammar(File):-
        clear_grammar,
        tokenize_file(File),
        read_sentences,
        compile_sentences.
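
Tokenizer Input

    ; Shieb1.ptr
    ; Sample grammar one from Shieber 1986


    ; Grammar Rules
    ; ----------

    Rule {sentence formation}
      S --> NP VP:
        <S head> = <VP head>
        <VP head subject> = <NP head>.

    Rule {trivial verb phrase}
      VP --> V:
        <VP head> = <V head>.

    ; Lexicon
    ; ----------

    Word uther:
        <cat> = NP
        <head agreement gender> = masculine
        <head agreement person> = third
        <head agreement number> = singular.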

Tokenizer Output = Preprocessor Input

    line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
    line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1),...
    line(3,[]).
    line(4,[]).
    line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
    line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),...
    line(7,[]).
    line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
    line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
    line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1),...
    line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1),...
    line(12,[b(1)]).
    line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
    line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
    line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1),...
    ...
    line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$),...
    line(42,[eof]).

Preprocessor Output = Compiler Input

    sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...
    sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...
    sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
    sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),...,o($=$),...
    sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),...,o($=$),...
    sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),...,o($=$),...
    sentence(42,42,[eof]).

The preprocessor removes comments and blanks and joins each sentence, which is terminated by a period and may span several lines, into a single token list. The first two arguments of sentence/3 record the first and last source line of the sentence. The compiler proper can then concentrate on the essentials.

Preprocessor: Main Loop

    read_sentences:-
        abolish(cnt/1),
        write('preprocessing...'), nl,
        repeat,
          count(I),
          read_sentence(N,M,S),
          assert(sentence(N,M,S)),
          put(13), tab(3), write(I),
          write(' sentences preprocessed'),
          S = [eof], !, nl.

    read_sentence(N,M,S):-
        retract(line(N,L)),
        read_sentence(L,N,M,S), !.

Backtracking: the loop is driven by failure. It runs again each time S = [eof] fails and ends when the eof sentence has been read.
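The failure-driven loop used in read_sentences can be tried in isolation. The following is only a minimal sketch, assuming SWI-Prolog or another ISO-style system; queue/1, seen/1 and consume_all/0 are hypothetical names that stand in for line/2, sentence/3 and read_sentences/0.

```prolog
:- dynamic queue/1, seen/1.

% Hypothetical demo data standing in for the line/2 facts.
queue(first).
queue(second).
queue(stop).

% consume_all: repeat/0 is the choice point that every failure of
% the end test returns to. The asserted seen/1 facts are side
% effects and therefore survive backtracking.
consume_all :-
    repeat,
        once(retract(queue(X))),   % take exactly one fact per pass
        assertz(seen(X)),
        X == stop,                 % fails until the end marker arrives
    !.
```

After calling consume_all, all three items have moved from queue/1 to seen/1 in their original order. The final cut removes the repeat choice point so that the loop cannot be re-entered.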

Preprocessor: Reading a Sentence

    read_sentence([eof],N,N,[eof]):- !.            % end of file
    read_sentence([o($.$)|_],N,N,[]):- !.          % end of sentence
    read_sentence([o($;$)|_],N,M,S):- !,           % skip comment
        N1 is N+1,
        retract(line(N1,L)),                       % next line
        read_sentence(L,N1,M,S).
    read_sentence([],N,M,S):- !,                   % end of line
        N1 is N+1,
        retract(line(N1,L)),                       % next line
        read_sentence(L,N1,M,S).
    read_sentence([b(_)|T1],N,M,T2):- !,           % skip blanks
        read_sentence(T1,N,M,T2).
    read_sentence([H|T1],N,M,[H|T2]):-             % collect tokens
        read_sentence(T1,N,M,T2).
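To experiment with the sentence reader without a tokenizer and without the line/2 database, the same case analysis can be written over a plain list of token lines. This is a sketch under assumptions, not the course code: join_sentence/4 and first_sentence/3 are hypothetical names, quoted atoms replace the Arity Prolog $...$ strings, and line numbers are not tracked.

```prolog
% join_sentence(+Line, +MoreLines, -Tokens, -RestLines)
% Line is the token list of the current line, MoreLines the
% token lists of the lines that follow it.
join_sentence([eof], Ls, [eof], Ls) :- !.              % end of file
join_sentence([o('.')|_], Ls, [], Ls) :- !.            % end of sentence
join_sentence([o(';')|_], [L|Ls], S, Rest) :- !,       % skip comment
    join_sentence(L, Ls, S, Rest).
join_sentence([], [L|Ls], S, Rest) :- !,               % end of line
    join_sentence(L, Ls, S, Rest).
join_sentence([b(_)|T], Ls, S, Rest) :- !,             % skip blanks
    join_sentence(T, Ls, S, Rest).
join_sentence([H|T], Ls, [H|S], Rest) :-               % collect tokens
    join_sentence(T, Ls, S, Rest).

% first_sentence(+Lines, -Tokens, -RestLines)
first_sentence([L|Ls], S, Rest) :-
    join_sentence(L, Ls, S, Rest).
```

Given a comment line followed by a rule that spans two lines, first_sentence/3 returns the rule's tokens without blanks, without the comment and without the final period, and hands back the remaining lines for the next call.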
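
Compiler: Main Loop

    compile_sentences:-
        abolish(cnt/1),
        write('compiling...'), nl,
        retract(sentence(N,M,S)),
          compile_sentence((N,M),C,S,[]),
          assert(C),
          count(I),
          put(13), tab(3), write(I),
          write(' sentences compiled'),
          S = [eof], !, nl.

Backtracking: there is no repeat here. Each time S = [eof] fails, backtracking into retract/1 fetches the next sentence; the loop ends when the eof sentence has been compiled.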

Compiler: Sentence Types

    % compile_sentence(Position,Clause,Sentence,Rest)
    compile_sentence(_,C) --> [eof], !, {C = finished}.
    compile_sentence(_,C) --> syntax_rule(C), !.
    compile_sentence(_,C) --> lex_entry(C), !.
    compile_sentence(_,C) --> template(C), !.
    compile_sentence(P,_,_,_):-
        P = (N,M), nl,
        write(' error in sentence between lines '),
        write(N), write(' and '), write(M), nl,
        fail.
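The last clause of compile_sentence is an ordinary clause with four arguments, while the others are grammar rules with two. This works because the usual DCG translation adds two list arguments to every grammar rule, so both forms define the same predicate compile_sentence/4. A minimal sketch of that mechanism, assuming SWI-Prolog; greeting is a hypothetical nonterminal used only for illustration:

```prolog
% Two grammar rules; the DCG translation turns each of them
% into a clause of greeting/3.
greeting(hello) --> [hi].
greeting(bye)   --> [ciao].

% Plain clause of the same predicate greeting/3: a catch-all
% that consumes no input.
greeting(unknown, S, S).
```

A call such as phrase(greeting(X), [hi]) yields X = hello first, whereas greeting(Y, [foo], R) can only be proved by the catch-all clause.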

Syntax Rules

    syntax_rule(C) -->
        rs('Rule'), !,
        syntax_rule_cont(C).

    syntax_rule_cont((Expansion :: Descr)) -->
        rule_name,
        sr_expansion(Expansion,Sugar),
        rs(:), !,
        sr_path_equations(Equations,Sugar),
        {sr_sugar_cats(Sugar,Equations,Descr)}.

Reserved Symbols

    rs(=) --> [o($=$)], !.
    rs(:) --> [o($:$)], !.
    rs(<) --> [o($<$)], !.
    rs(>) --> [o($>$)], !.
    rs('{') --> [o(${$)], !.
    rs('}') --> [o($}$)], !.
    rs('Rule') --> [u($Rule$)], !.
    rs('Word') --> [u($Word$)], !.
    rs('Let') --> [u($Let$)], !.
    rs('be') --> [l($be$)], !.
    rs('-->') --> [o($-$),o($-$),o($>$)], !.

Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).
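The alternative named on this slide could look as follows. This is only a sketch: apart from colon, which the slide suggests, all nonterminal names (equals, langle, rangle, arrow, rule_head) are assumptions, and quoted atoms replace the Arity Prolog $...$ strings so that the code loads in SWI-Prolog. Since every nonterminal has a single clause, the cuts of the rs//1 version are left out.

```prolog
% One nonterminal per reserved symbol instead of rs//1.
colon  --> [o(':')].
equals --> [o('=')].
langle --> [o('<')].
rangle --> [o('>')].
arrow  --> [o('-'), o('-'), o('>')].

% Hypothetical usage: recognise the tokens of "L --> R :".
rule_head(L, R) --> [u(L)], arrow, [u(R)], colon.
```

The trade-off is one more predicate per symbol in exchange for grammar rules that read closer to the PATR II surface syntax.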

Further Terminal Symbols

    uatom(A) --> [u(S)], {atom_string(A,S)}.
    latom(A) --> [l(S)], {atom_string(A,S)}.
    satom(A) --> [s(S)], {atom_string(A,S)}.
    int(I) --> [i(I)].

    atom(A) --> uatom(A), !.
    atom(A) --> latom(A), !.
    atom(A) --> satom(A), !.

    atomic(A) --> atom(A), !.
    atomic(A) --> int(A), !.

Rule Names

    rule_name -->
        rs('{'), !,                          % start of rule name
        curley_braces_terminated_string.
    rule_name --> [].                        % rule names are optional

    curley_braces_terminated_string -->
        rs('}'), !.                          % end of rule name
    curley_braces_terminated_string -->
        [_],                                 % read any symbol
        curley_braces_terminated_string.

Rule names are skipped and are not carried over into the Prolog representation of the rules.
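The effect of skipping a rule name can be observed with phrase/3, which returns the tokens left over after the name. The sketch below assumes SWI-Prolog, uses quoted atoms in place of the $...$ strings, matches the brace tokens directly instead of going through rs//1, and renames the helper to the hypothetical skip_to_brace.

```prolog
% rule_name: consume an optional {...} block and discard it.
rule_name -->
    [o('{')], !,          % start of rule name
    skip_to_brace.
rule_name --> [].         % rule names are optional

skip_to_brace -->
    [o('}')], !.          % end of rule name
skip_to_brace -->
    [_],                  % read any symbol
    skip_to_brace.
```

For the token list of "{sentence formation} S", phrase(rule_name, Tokens, Rest) leaves Rest = [u('S')]; for a list that starts with u('S') directly, nothing is consumed.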

Rule Expansion

    sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
        sr_lhs(LHS,LSugar),
        rs('-->'),
        sr_rhs(RHS,RSugar).

    sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
    sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).

    ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) -->
        fsd(FSD,Sugar),
        ne_fsd_seq(FSDs,Sugars).
    ne_fsd_seq(FSD,[Sugar]) --> fsd(FSD,Sugar).

    fsd(Var,(FSD,Var)) --> uatom(FSD).

Syntax Rules: Path Equations

    sr_path_equations((E,Es),Sugar) -->
        sr_path_equation(E,Sugar),
        sr_path_equations(Es,Sugar).
    sr_path_equations(E,Sugar) -->
        sr_path_equation(E,Sugar).

    sr_path_equation((LHS === RHS),Sugar) -->
        sr_path(LHS,Sugar),
        rs(=),
        sr_val(RHS,Sugar).

    sr_val(V,Sugar) --> sr_path(V,Sugar).
    sr_val(V,_) --> atomic(V).

Syntax Rules: Paths

    sr_path(Var,Sugar) -->
        rs(<), fsd(FSD), rs(>),
        {member((FSD,Var),Sugar)}, !.
    sr_path(Var:P,Sugar) -->
        rs(<), fsd(FSD), ne_feature_seq(P), rs(>),
        {member((FSD,Var),Sugar)}, !.

    ne_feature_seq(F) --> feature(F).
    ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).

    fsd(FSD) --> uatom(FSD).
    feature(F) --> atomic(F).

Syntactic Sugar

    sr_sugar_cats([(Cat,Var)|Sugar],Equations,
                  ((Var:cat === Cat),Descr)):-
        sr_sugar_cats(Sugar,Equations,Descr).
    sr_sugar_cats([],Descr,Descr).

The sugared rule

    Rule {sentence formation}
      S --> NP VP:
        <S head> = <VP head>
        <VP head subject> = <NP head>.

stands for

    Rule {sentence formation}
      X0 --> X1 X2:
        <X0 cat> = S
        <X1 cat> = NP
        <X2 cat> = VP
        <X0 head> = <X2 head>
        <X2 head subject> = <X1 head>.
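The desugaring step can be run on its own. The sketch below repeats sr_sugar_cats/3 from the slide and feeds it a hand-built sugar list and equation term for the sentence formation rule. It assumes SWI-Prolog, where : is already an infix operator, so only === is declared; sugar_demo/1 is a hypothetical driver.

```prolog
:- op(600, xfx, ===).          % path equation, as in the directives

% Prefix one cat equation per (Category,Variable) pair to the
% equations that were parsed from the rule body.
sr_sugar_cats([(Cat,Var)|Sugar], Equations,
              ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

% sugar_demo(-Descr): description for  S --> NP VP  with the two
% head equations of the sample grammar.
sugar_demo(Descr) :-
    Sugar     = [('S',X0), ('NP',X1), ('VP',X2)],
    Equations = ((X0:head === X2:head),
                 (X2:head:subject === X1:head)),
    sr_sugar_cats(Sugar, Equations, Descr).
```

The resulting description consists of the three cat equations followed by the two original equations, with the category names replaced by shared variables. This is the same shape as the description shown for this rule on the Compiler Output slide.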

Lexical Entries

    lex_entry(C) -->
        rs('Word'), !,
        lex_entry_cont(C).

    lex_entry_cont((FS ---> L :: Descr)) -->
        lexeme(L),
        rs(:), !,
        lex_definition(FS, Descr).

    lexeme(L) --> atom(L).

Lexicon: Feature Structures

    lex_definition(FS,(LDef,LDefs)) -->
        lexdef(FS,LDef),
        lex_definition(FS,LDefs).
    lex_definition(FS,LDef) -->
        lexdef(FS,LDef).

    lexdef(FS,LDef) --> template_name(FS,LDef), !.
    lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.

Lexicon: Path Equations

    lex_path_equation(FS, (LHS === RHS)) -->
        lex_path(FS, LHS),
        rs(=), !,
        lex_val(FS, RHS).

    lex_path(FS,FS:P) -->
        rs(<), ne_feature_seq(P), rs(>), !.

    lex_val(FS,V) --> lex_path(FS,V).
    lex_val(_,V) --> atomic(V).

Templates

    template(C) -->
        rs('Let'), !,
        template_cont(C).

    template_cont((N :- TDef)) -->
        template_name(FS,N),
        rs('be'),
        template_definition(FS,TDef),
        {assert(template(N))}.

Templates: Head & Body

    template_name(FS,N) -->
        atom(A),
        {N =.. [A,FS]}.

    template_definition(FS,TDef) -->
        lex_definition(FS,TDef).
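A template is thus compiled into a unary Prolog clause whose head is built with =.. from the template name and the feature structure variable. The following sketch shows the shape of such a clause. It assumes SWI-Prolog; the template name third_singular and the helper predicates are hypothetical and not taken from the course grammars.

```prolog
:- op(600, xfx, ===).          % path equation, as in the directives

% template_head(+Name, ?FS, -Head): Head is the term Name(FS).
template_head(Name, FS, Head) :-
    Head =.. [Name, FS].

% template_clause_demo(-Clause): the clause a definition of the
% hypothetical template third_singular with two path equations
% would be compiled to.
template_clause_demo((Head :- Body)) :-
    template_head(third_singular, FS, Head),
    Body = ( FS:head:agreement:person === third,
             FS:head:agreement:number === singular ).
```

Because head and body share the variable FS, a use of the template inside a lexical entry applies the same equations to that entry's feature structure.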

Clearing a Grammar

    clear_templates:-
        template(T),
        T =.. [F,_],
        abolish(F/1),
        fail.
    clear_templates:-
        abolish(template/1).

    clear_grammar:-
        abolish('::'/2),
        abolish(line/2),
        abolish(sentence/3),
        clear_templates.

Compiler Output

    A ---> B, C ::
        A : cat === 'S',
        B : cat === 'NP',
        C : cat === 'VP',
        A : head === C : head,
        C : head : subject === B : head.

    A ---> uther ::
        A : cat === 'NP',
        A : head : agreement : gender === masculine,
        A : head : agreement : person === third,
        A : head : agreement : number === singular.

Resources

Grammars PATR II / Prolog
- shieb1.ptr / shieb1.ari
- shieb2.ptr / shieb2.ari
- shieb3.ptr / shieb3.ari
- shieb4.ptr / shieb4.ari
Tokens
- shieb1.tok (tokenizer)
- shieb1.snt (preprocessor)
PATR II interpreters
- patrlcl.ari: left-corner with linking
- patrlclc.ari: left-corner with linking and syntax trees
- patr-ii.ari: DCG
PATR II compiler
- patrcomp.ari
- patr-ii.ari: DCG

Open Problems and Extensions

- Syntactic sugar of the form VP_1 VP_2 X
- Lexical rules
- Templates in syntax rules
- Negation and disjunction
- Default inheritance (priority union)
- ...

References

Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.
Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison Wesley.
Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.