User:Steamerandy/sandbox/TEST


This is a wiki markup test and example page.



stack-oriented programming

Stack-oriented programming is a programming paradigm that relies on a stack (or stacks) for manipulating data and/or passing parameters. Several programming languages fit this description, notably Forth, RPL, and PostScript.

Programming with formulae

For the most part, programming in the parser language is writing linguistic formulae. Becoming proficient is an art, much like working with algebraic equations. There are often many ways to express the same language construct. Though backtracking is available, it should be avoided. Many times backtracking can be completely avoided by factoring; other times by ordering the alternatives so that the more commonly used language constructs are tried first. In many cases backtracking is only needed for error recovery, i.e. when the parser is unable to recognize the input, backtracking and scanning to find the end of the construct:

$(-';' .ANY) ';'

Many things that are common to compilers in general are provided by the metacompiler: a fully automated, memory-managed dynamic object system, and integrated file input and output.

There are four types of rules: character class, token, syntax, and unparse rules. They are best described by examples and discussion, starting with the simplest and progressing to the more complex.

The driving rule

Rules are functions that perform tests on the input. A rule is itself a test that returns its success or failure. The top rule, or program rule, drives the compilation. This sounds like a difficult task, but it is usually the simplest rule. We start at the top and write a rule defining what a program is in the language we intend to compile.

1) program = $declaration;  // A program is a sequence of declarations.

Comments are delimited using //. Text from the // to the end of the line is ignored.

The above driving rule simply declares a program to be a series of declarations. The $ operator specifies that zero or more declarations are to be parsed. If we needed to require at least one declaration the rule would be written:

2) program = declaration $declaration;  // One or more declarations.

A more complicated driving rule for the COBOL language might look like:

3) program = IDENTIFICATION_Division
             ENVIRONMENT_Division
             DATA_Division
             PROCEDURE_Division;

The rules above are an LR description of the entire program. They directly generate code. Rule (1) above is straightforward in assembly. A rule is a function that returns a condition of success or failure. The $, zero or more, operator is a programmed loop that applies to the expression following it. The expression code is generated within a loop construct: DO <something> WHILE <successful>. So that means we need a label for the start of the loop, code for the <something>, and code to jump back to the labeled start of the loop on success. On failure of the <something> the $ loop operator is successful, so in the program rule we have code to return success.

C++ procedures are used in the examples here. Success or failure is returned in the processor status zero (equal) flag. The actual code is IA86 assembly language encapsulated in a "__declspec(naked) void" function. C++ is used as an MOL (Machine Oriented Language). Some support functions can be written in C++ while others can be written as encapsulated asm functions like the parsing code shown here. Writing and implementing a metacompiler this way is simplified by using an existing development platform. The metacompiler can eventually produce linkable files using the same C++ support routines developed at this stage.


//   program = $declaration;
__declspec(naked) void program(){ _asm{
l1:	call	declaration
	je	l1
	cmp	al,al
	ret
}}

The implementation of the metagrammar rule in this case is very simple. The __declspec(naked) creates a C++ function having no preamble or exit code. The _asm { ... } declares the contained code to be assembly language. Above, the entire function is assembly code. There is no additional code being generated by the C++ compiler.

When program is called it is entered at the first assembly code line, where the declaration rule is called:

l1:     call    declaration

The call is an assembly instruction that calls a function. The l1: is a label used in addressing that line of code. A grammar rule returns success or failure in the processor status register zero flag. The __declspec(naked) allows programming outside the C++ context; we are not using C calling conventions, writing in assembly. The void program() declaration is only providing a label for our rule.

         je      l1

This tests the processor flag: if the equal condition is true it jumps to the l1 label, calling declaration repeatedly as long as declaration is successful. The $declaration is zero or more declarations. If declaration fails the status flag will indicate an NE condition and the je l1 will instead proceed to the next instruction

        cmp al,al

which sets the equal condition, as the $ zero or more operator has succeeded. The rule returns an EQ condition to its caller. This assumes declarations will have specific backtracking error recovery. The alternate rule, program = declaration $declaration;, requiring at least one declaration, is coded with comments as follows:

__declspec(naked) void program(){
// program is declared to be a naked asm function
// Naked functions having no stack frame do not use ebp.
_asm {   // naked function containing only assembly code.
     call  declaration    // call the declaration rule
     je    l2             // Test success
     ret                  // return failure
l1:  call  declaration    // calls declaration function
     je    l1             // declaration called repeatedly
                          // until ne indicates failure
     cmp  al,al           // comparing al to itself sets
                          // the EQ condition in the
                          // processor status register.
l2:  ret                  // returns success. rules
}}                        //  have no declared variables.

If we were coding the grammar of our metacompiler here, writing the compiler, translating the language and coding it in assembly would be the start of the bootstrap process. The next step would be defining declaration. That is however not what this is about. We are explaining the parser programming. We have explained how the grammar rules are used as a top down parser. The COBOL driving rule is an example of an LR rule defining the order of the COBOL divisions.

Grammar rules of a metacompiler

Here we have part of the grammar rules of an advanced Schorre metacompiler. The program driving rule has already been discussed. The declarations rule specifies all possible declarations. We have compiler directives that start with a # character. Then comment specifies comment syntax; comments are modeled after the C language. The global declaration is used to declare external linkages and global variables. Code section declarations are global within a compile module, except external linkages. Various constructs start with an identifier: grammar rules, generator code sequencers, etc.

program =  $ declarations;

declarations =	"#" dirictive
              | comment
              | global	  	 DECLAR(*1)                 
              |(id (grammar	 PARSER(*1)
                  | sequencer	 GENERATOR(*1)
                  | optimizer	 ISO(*1)
                  | pseudo_op	 PRODUCTION(*1) 
                  | emitor_op	 MACHOP(*1))
                || ERRORX("!Grammer error") garbol);

grammar = (':' class  :CLASS   // character class define
         |".." token  :TOKEN   // token rule
         |"==" syntax :BCKTRAK // backtrack grammar rule
         |'='  syntax :SYNTAX  // grammar rule.
        	)';'!2	  // Combine name and rule tree
                  	$(-break -"/*" .any);

comment = "//" $(-break .any) 
        | "/*" $(-"*/" .any) "*/"

A grammar rule may be a token or a syntax rule. A character class defines a named set of characters. This declaration shows how language constructs are factored to avoid backtracking. In the declarations rule the constructs starting with an id have been factored into a group, the rule operator determining the construct. Grammar rules are only one type of language construct. Other function declarations also start with an id: a sequencer, optimizer, pseudo_op, or emitor_op all begin with an id.

CWIC Variables

Variables hold objects. Variables may be static or dynamic. Static variables may be local to the file in which they are declared or global across files. Stack variables are local to the block in which they are declared, and to blocks contained within the declaring block following the declaration. A stack variable is unique to the invocation of the declaring block.

A variable is not typed; objects are typed. A variable may hold a string at one point and an integer at another. A variable may be declared in a block. A variable may also be declared by use in a pattern, in which case its name is a declaration visible in the action. Its use in the pattern is an implied assignment of the pattern element where placed, although it is more commonly used to hold the returned result of a generator call in the pattern. For example, the pattern

(ADD[x,y])

matches an ADD tree having two branches. The branches are assigned to the variables x and y occupying the branch positions. They are assigned the tree branches unprocessed. The pattern may call generators to process the branches. A generator may return values. In an unparse rule a variable may be assigned the returned value of a generator called in the pattern.

eval(ADD[eval(x),eval(y)]) =>

The pattern matches an ADD tree having two branches. In that case eval is called with the left branch and x is assigned the return value; then eval is called with the right branch and the return value is assigned to y. The action is then executed. x and y are local stack variables visible to the pattern's action.

Patterns implement the tree crawling, unparsing strategies needed in linearizing code.

Unparse Rules

An unparse rule is used to match a language construct pattern or object. A generator is in a way like an overloaded function: the generator name applies to a set of unparse transform rules. Each rule is an unparse pattern and an action. The action is executed when its associated unparse pattern is successful.

<name>{(<pattern 1>) => <action 1>
       (<pattern 2>) => <action 2>
               ●
               ●
               ●
       (<pattern n>) => <action n>}

The order of the unparse rules is significant: the first successful unparse's action is taken. They can match lists written in tree form. Variables placed in list positions are local to the transform. The unparse rule is written as an object pattern to be matched. Patterns may contain variables. Variables are not matched but are assigned the object in their place. In a way a pattern is an argument list and the action a function body. Tests may be used in patterns: a variable, having taken on the object in its place, may be tested. The factorial function illustrates pattern testing in action.

factorial{(0)   => return 1;
          (x>0) => return x*factorial(x - 1);}

Since there is no factorial unparse rule accepting a negative value, a negative number passed to the factorial generator would return failure. Success and failure are the same as in the grammar formulae. A generator can return failure to the calling function.
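The behavior can be sketched in C++ (a hypothetical analogue, not CWIC output): each pattern is tried in order, and a value no pattern accepts yields failure rather than a crash.

#include <optional>

// Hypothetical C++ analogue of the factorial generator: each "pattern"
// is tried in order, and a value no pattern accepts yields failure
// (an empty optional) instead of a result.
std::optional<long> factorial(long x) {
    if (x == 0) return 1;                // pattern (0)   => return 1;
    if (x > 0) {                         // pattern (x>0) => return x*factorial(x-1);
        auto rest = factorial(x - 1);
        if (!rest) return std::nullopt;  // propagate failure, as a generator can
        return x * *rest;
    }
    return std::nullopt;                 // no pattern matched: failure
}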

Unlike overloaded functions, an unparse pattern is runtime executable code. You may have noticed the functional programming parallels here. That is no coincidence, as the CWIC generator language is a dialect of LISP 2. The unparse rules are intended to take abstract parse trees apart.

codegen{(ADD[x,y]) =>  ...;}

The pattern ADD[x,y] matches an ADD node tree and assigns its left branch to x and right branch to y. The variables x and y are local variables of the pattern's action. A feature of the pattern is that a generator can be called from within the pattern, but its arguments and return are mirrored. That is, when calling a generator from within the unparse, its argument is the object of the place in which it resides, and the parameters are local variables assigned the returned values. We use local variables to receive those objects. CWIC has a shorthand method of specifying actions determined by patterns: vectors. Vectors are simply corresponding lists.

eval{(#N[eval(x),eval(y)]) => {
       #N = ADD, SUB, MPY, DIV;
       #A = x+y, x-y, x*y, x/y;
       return #A;}

     (number(x)) => return x;
     (x) => return val:(x);}

The above eval generator evaluates an arithmetic expression. The #N is a vector of node names. #A is a vector of expressions, the actions corresponding to the node names. The pattern, matching any node from the #N vector, calls eval with the left tree branch, the result returned in x, then calls eval with the right tree branch, the result returned in y. The return then performs the corresponding #A vector action on x and y to get the return value. The second unparse rule matches a number and returns it. The third pattern, a variable, returns its val attribute.
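A rough C++ analogue of the vector shorthand (the names vectorNA and applyNode are illustrative, not CWIC's): the node name matched through #N selects the action at the same position in #A. Here the two parallel lists are folded into one table from node name to action.

#include <functional>
#include <map>
#include <string>

using Action = std::function<int(int, int)>;

// The #N node names paired with their corresponding #A actions.
const std::map<std::string, Action> vectorNA = {
    {"ADD", [](int x, int y) { return x + y; }},
    {"SUB", [](int x, int y) { return x - y; }},
    {"MPY", [](int x, int y) { return x * y; }},
    {"DIV", [](int x, int y) { return x / y; }},
};

// Evaluate a matched node: look up the #A action for the #N node name.
int applyNode(const std::string& node, int x, int y) {
    return vectorNA.at(node)(x, y);
}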

The following is an example of CWIC code. The sub-languages required a declaration: .SYNTAX declared the following to be in the syntax sub-language.

.SYNTAX
PROGRAM = $(ST / DECLARATION) '.END' ;
DECLARATION = ID '=' NUM ';' :EQU!2 DECL[*1];
ST = ID ':=' EXP ';' :STORE!2 COMPILE[*1];
EXP = TERM $(('+':ADD/'-':SUB) TERM !2);
TERM = FACTOR $(('*':MPY/'/':DIV) FACTOR !2);
FACTOR = ID / '(' EXP ')' / NUM;
LET: 'A'/ 'B'/ 'C'/ 'D'/ 'E'/ 'F'/ 'G'/ 'H'/ 'I'/ 'J'/ 'K'/ 'L'/ 'M'/
     'N'/ 'O'/ 'P'/ 'Q'/ 'R'/ 'S'/ 'T'/ 'U'/ 'V'/ 'W'/ 'X'/ 'Y'/ 'Z';
DGT: '0'/ '1'/ '2'/ '3'/ '4'/ '5'/ '6'/ '7'/ '8'/ '9';
ALPHNUM: LET / DGT;
ID  .. LET $ALPHNUM;
NUM .. DGT $DGT MAKENUM[];
.FINISH
.STOP SETUP PROGRAM

The following is a CWIC generator example.

.GENERATOR
DECL(EQU[X, Y]) => DEF:(X) := Y;
COMPILE(STORE[X, Y]) => DEF:(X) := EVAL(Y); PRINT(DEF:(X));
EVAL(ID(X))     => DEF:(X);
    (NUMBER(X)) => X;
    (#V1[EVAL(X), EVAL(Y)]) => #U1;
       #V = ADD, SUB, MPY, DIV
       #U = X+Y, X-Y, X*Y, X/Y
.FINISH
.STOP SETUP PROGRAM

CWIC is an old language, usually punched on cards using a keypunch having a limited character set. As you can see its syntax is a bit different from the other examples here. The unparse patterns and actions are conceptually no different.

Early metacompilers

The early history of metacompilers is closely tied to the history of SIG/PLAN Working Group 1 on Syntax Driven Compilers. The group was started primarily through the effort of Howard Metcalfe in the Los Angeles area.[1] In the fall of 1962 Howard Metcalfe designed two compiler-writing interpreters. One used a bottom-to-top analysis technique based on a method described by Ledley and Wilson.[2] The other used a top-to-bottom approach based on work by Glennie to generate random English sentences from a context-free grammar.[3]

At the same time, Val Schorre described two "meta machines", one generative and one analytic. The generative machine was implemented and produced random algebraic expressions. Meta I, the first metacompiler, was implemented by Schorre on an IBM 1401 at UCLA in January 1963. His original interpreters and metamachines were written directly in a pseudo-machine language. Meta II, however, was written in a higher-level metalanguage able to describe its own compilation into the pseudo-machine language.[4][5][6]

Lee Schmidt at Bolt, Beranek, and Newman wrote a metacompiler in March 1963 that utilized a CRT display on the time-sharing PDP-1.[7] This compiler produced actual machine code rather than interpretive code and was partially bootstrapped from Meta I.

Schorre bootstrapped Meta II from Meta I during the spring of 1963. The paper on the refined metacompiler system, presented at the 1964 Philadelphia ACM conference, is the first paper on a metacompiler available as a general reference. The syntax and implementation technique of Schorre's system laid the foundation for most of the systems that followed. The system was implemented on a small 1401 and was used to implement a small ALGOL-like language.

Many similar systems immediately followed.

Roger Rutman of A. C. Sparkplug developed and implemented LOGIK, a language for logical design simulation, on the IBM 7090 in January 1964.[8] This compiler used an algorithm that produced efficient code for Boolean expressions.

Another paper in the 1964 ACM proceedings describes Meta III, developed by Schneider and Johnson at UCLA for the IBM 7090.[9] Meta III represents an attempt to produce efficient machine code for a large class of languages. Meta III was implemented completely in assembly language. Two compilers were written in Meta III: CODOL, a compiler-writing demonstration compiler, and PUREGOL, a dialect of ALGOL 60. (It was pure gall to call it ALGOL).

Late in 1964, Lee Schmidt bootstrapped the metacompiler EQGEN from the PDP-1 to the Beckman 420. EQGEN was a logic equation generating language.

In 1964, System Development Corporation began a major effort in the development of metacompilers. This effort included the powerful metacompilers Book1 and Book2, written in LISP, which have extensive tree-searching and backup capability. An outgrowth of one of the Q-32 systems at SDC is Meta 5.[10] The Meta 5 system incorporates backup of the input stream and enough other facilities to parse any context-sensitive language. This system was successfully released to a wide number of users and had many string-manipulation applications other than compiling. It has many elaborate push-down stacks, attribute setting and testing facilities, and output mechanisms. The fact that Meta 5 successfully translates JOVIAL programs to PL/I programs clearly demonstrates its power and flexibility.

The LOT system was developed during 1966 at Stanford Research Institute and was modeled very closely after Meta II.[11] It had new special-purpose constructs allowing it to generate a compiler which would in turn be able to compile a subset of PL/I. This system had extensive statistic-gathering facilities and was used to study the characteristics of top-down analysis.

SIMPLE is a specialized translator system designed to aid the writing of pre-processors for PL/I. SIMPLE, written in PL/I, is composed of three components: an executive, a syntax analyzer and a semantic constructor.[12]

The TREE META compiler was developed at Stanford Research Institute in Menlo Park, California, in April 1968.

The early metacompiler history is well documented in the TREE META manual. TREE META paralleled some of the SDC developments. Unlike earlier metacompilers it separated the semantic processing from the syntax processing. The syntax rules contained tree building operations that combined recognized language elements with tree nodes. The tree structure representation of the input was then processed by a simple form of unparse rules. The unparse rules used node recognition and attribute testing that, when matched, resulted in the associated action being performed. In addition, like tree elements could also be tested in an unparse rule. Unparse rules were also a recursive language, able to call unparse rules passing elements of the tree before the action of the unparse rule was performed.

The concept of the metamachine originally put forth by Glennie is so simple that three hardware versions have been designed and one actually implemented, the latter at Washington University in St. Louis. This machine was built from macro-modular components and has as instructions the codes described by Schorre.

CWIC (Compiler for Writing and Implementing Compilers) is the last known Schorre metacompiler. It was developed at System Development Corporation by Erwin Book, Dewey Val Schorre and Steven J. Sherman. With the full power of LISP 2, a list processing language, optimizing algorithms could operate on syntax generated lists and trees before code generation. CWIC also had a symbol table built into the language.

With the resurgence of domain-specific languages and the need for parser generators which are easy to use, easy to understand, and easy to maintain, metacompilers are becoming a valuable tool for advanced software engineering projects.


META II a syntax-oriented compiler writing language

META-3 syntax-directed compiler writing compiler to generate efficient code

Syntax directed translation

Schorre metalanguages are classical examples of syntax-directed translation. The source language specification and parsing are programmed using reductive phrase structure Boolean formulae, each formula specifying a constituent test against the input source character stream. The grammar and code generation languages are stack-oriented programming languages. META II runs on a stack machine and generates code for a stack machine.

The parsing specification languages developed by Schorre do not clearly fall into a programming language paradigm. They resemble phrase structured analytic grammar rules and most closely align with the functional programming paradigm. There are no formal arguments to a parser function; all take their input from a character stream.

The specification starts with a top driving "program" formula. The complete language specification is programmed using formulae that define constituent formulations. Since a formula is a Boolean function expressed as a logical expression, we have the basis for the functional programming paradigm claim.[* 1]


A rule specifies some language construct or token of the object language. As a function, each character from the input stream is analyzed, testing its compliance to the rule. If unsuccessful the rule returns failure. If successful it will have advanced the input stream over the characters in compliance.

expr = term $(('+'|'-') term);

The expr rule first calls term. If term returns failure, expr returns failure. If term is successful, the expr rule then specifies that zero or more terms preceded by a + or - may follow. The zero or more operator $ is a programmed loop. The loop terminates successfully when it fails to match the first, or start, element of the loop expression. In this case the loop expression is:

(('+'|'-') term)

The $ loops the grouped expression. The first specified match in the loop expression is the group ('+'|'-'). A plus or minus character must be matched for the loop to proceed. If a plus or minus is in the input stream with only white space preceding it, the test will succeed and term will be called. When a plus or minus is not found the loop will terminate successfully. When a plus or minus is found a term must follow; if a term is not found after matching a plus or minus there is probably an error in the input. Such a failure after a partial match causes a backtrack failure.

Character string constants are delimited by " marks. Single character constants may use the single '. A '<character>' is a single character string and may contain a '. Three single ' quote characters is a one character string containing a '. Generally ' strings are used in character class declarations where single characters are required. They may be used anywhere for single character strings.
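Returning to the expr formula, a minimal hand-coded C++ sketch of the code it corresponds to, assuming a simple cursor over a NUL-terminated buffer and with term simplified to a single digit for the sketch:

struct Cursor {
    const char* p;
    char peek() { while (*p == ' ') ++p; return *p; }  // skip blanks, look ahead
    void next() { ++p; }                               // advance over one character
};

// Stand-in for the term formula: here just a single digit.
bool term(Cursor& in) {
    if (in.peek() < '0' || in.peek() > '9') return false;
    in.next();
    return true;
}

// expr = term $(('+'|'-') term);
bool expr(Cursor& in) {
    if (!term(in)) return false;                // leading term is required
    for (;;) {                                  // the $ zero-or-more loop
        char c = in.peek();
        if (c != '+' && c != '-') return true;  // start element fails: $ succeeds
        in.next();                              // consume the '+' or '-'
        if (!term(in)) return false;            // failure after a partial match:
                                                // a backtrack failure in the metalanguage
    }
}

// Usage: Cursor in{"1 + 2 - 3"}; expr(in) returns true.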

goal-oriented programming

In the parser programming languages a top-down, goal-oriented, reductive method of syntax analysis is programmed. The main goal is to match a complete program module starting with the program formula.

program = $((declaration
            |.EOF .STOP
            )
            \ERRORX["Error"]        
             $(-';' .ANY) .ANY
           );

This main "program" goal is acompplished by looking for sub-goal entities from left to right and down as individual subgoals. The above, using the $ (zero or more loop operator), specifies a program to be a sequence of declaration that terminates successfully on end of file .EOF. were .STOP breaks out of the $ zero or more loop.

$((declaration | .EOF .STOP)
\ERRORX["Error"] $(-';' .ANY) ';')

On failing to parse a declaration it checks for .EOF (End Of File). If not at .EOF, or on a long fail (a backtrack failure on a partial match of a declaration), the \ backtrack alternative is taken: an error is assumed and reported using the ERRORX["Error"] function. After which, recovery is attempted by scanning for a ';' character using $(-';' .ANY) .ANY, which scans for a ; character using a negative -';' look ahead. A look ahead, negative or positive, does not advance the parse. The .ANY in the loop, and the one outside the loop, then match the character tested by the -';'.
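The recovery idiom can be sketched in C++ (a hypothetical helper, assuming a cursor into a NUL-terminated source buffer): after the error is reported, skip up to and past the next ';' so parsing can resume with the next declaration.

// Sketch of the $(-';' .ANY) ';' recovery idiom.
void recoverToSemicolon(const char*& p) {
    while (*p != '\0' && *p != ';')  // $(-';' .ANY): any character that is not ';'
        ++p;
    if (*p == ';') ++p;              // the final .ANY (or ';') consumes the ';'
}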

There are several types of declaration. Each a sub-goal:

declaration =	"#" dirictive
              | comment
              | global	 	DECLAR[*1]                 
              | (id (formula   PARSER[*1]
                    | production GENERATOR[*1]
                    | optimizer ISO[*1]
                    | pseudo_op PRODUCTION[*1] 
                    | emitor_op MACHOP[*1]
                    )
                \ ERRORX["!Grammer error"] garbol);

These subgoals (dirictive, comment, global, and an id followed by the nested subgoals formula, production, optimizer, pseudo_op, and emitor_op), each defined by its respective formula in terms of other entities, are tested in order. And so the process of recognising a complete program becomes a matter of looking for subgoals right down to the level at which characters are recognised.

id .. let $(+'_' alphanum | alphanum);

Note that an id token formula is required to start with a let and must end with an alphanum. An id may not start or end with an underscore '_' character. The +'_' makes the underscore included in the id string. Without the + operator underscores would not be significant: i_d would match id, treated as identical symbols. The above token rule does not allow multiple underscore characters to appear together, i.e. i__d is not a valid id.

CWIC and SLIC removed from Intro

CWIC is made up of three languages: SYNTAX, GENERATOR and MOL360, each a separate compiler. CWIC was primarily designed for use on an IBM 360. SLIC, a single compiler, added a PSEUDO (assembler macro) operation language and a MACHOP language allowing target machine code generation to be expressed symbolically.

I implemented SLIC on a DEC-System-10. It started out as a CWIC implementation for the DEC-10, but with so many processor architectures at the time I decided to try to make SLIC more generalized. CWIC generated 8 bit byte, 16 and 32 bit word output targeting the IBM 360 line of computers. The terms plant, planted and planting refer to code generation operations. CWIC used mathematical expressions enclosed in angle brackets:

< <expression> >.

SLIC needed to generate 36 bit word code for the DEC-10. I replaced the CWIC code expressions with PSEUDO instructions. PSEUDO instructions are like assembly macros, implementing higher level functions. PSEUDO instructions are executable functions, but they are not executed as they are generated; they are appended to a list associated with a named section. I also referred to this code generation as planting instructions.

A MACHOP is a function used to define a machine operation or instruction. MACHOPs define assembly operations. A MACHOP's invocation syntax is designed to appear as assembly code. MACHOPs make code generation much more readable and easier to debug. Assembly code optionally output to the compiler list file helped both compiler development and the end user of the compiler. Assembly level debuggers were the norm in those days.

In generating code for different computer instruction set architectures and mnemonics, the PSEUDO instructions provide a clean separation between specific machine instructions and language required functionality. PSEUDO instructions are planted into declared code sections. Any number of sections may be declared. PSEUDOs are executed when a section is flushed in a generator function. On flushing, the pseudo list may optionally be processed by In Sequence Optimization transforms before execution. The PSEUDO and MACHOP languages of SLIC provided readable symbolic code generation while also producing linkable machine code files.

MACHOP functions define assembly-like instructions, transforming their parameters into a sequence of bit fields. PSEUDO operations coded in the PSEUDO language use MACHOPs to output linkable code. The ISO, In Sequence Optimization, was never implemented. The MACHOP language was also implemented as a separate assembler generator. SLIC is not publicly published. It was used to implement a COBOL compiler that was used to write hotel front desk applications that ran on TI-990 mini-computers.

CWIC and SLIC have parser (or SYNTAX) languages similar to TREEMETA. They all work very much like META II. The main difference is that TREEMETA, CWIC and SLIC have transform operators for combining tokens and/or tree constructs into abstract parse trees that at some point are passed to a generator function. TREEMETA uses unparse rules to transform abstract parse trees into sequential code. CWIC and SLIC use a block structured LISP 2 like language. One could describe the parameters of the CWIC and SLIC generator functions as unparse rules. Generator functions are akin to overloaded functions in C or C++, only the parameter matching is dynamic. They are defined as having a single entry point followed by one or more productions, a production having specific entry parameter requirements and a production body.

Using node names and tree pattern matching parameters, they disassemble trees passed to them. i.e.

  (ADD[x,y])-> <production body>

The argument pattern recognizes an ADD tree having two branches that are placed into local variables x and y before executing the function's code production.

  (ADD[x,y],w)-> ...

Above, the pattern calls for two arguments, the first an ADD tree and the second w. Variables are not typed; the objects they contain carry their type. w may be a string in one invocation and an integer in a different invocation.

Symbols, recognized by token formulae, are automatically entered into the dictionary unless a conversion function is called prior to exiting. Supplied functions convert recognized sequences into typed objects. They may only be used as the last operation of a token formula.

Basic operations

The basic operations of Boolean calculus are as follows.

  • AND (conjunction), denoted x∧y (sometimes x AND y or Kxy), satisfies x∧y = 1 if x = y = 1 and x∧y = 0 otherwise.
  • OR (disjunction), denoted x∨y (sometimes x OR y or Axy), satisfies x∨y = 0 if x = y = 0 and x∨y = 1 otherwise.
  • NOT (negation), denoted ¬x (sometimes NOT x, Nx or !x), satisfies ¬x = 0 if x = 1 and ¬x = 1 if x = 0.

Alternatively the values of x∧y, x∨y, and ¬x can be expressed by tabulating their values with truth tables as follows.

  x  y | x∧y | x∨y        x | ¬x
  0  0 |  0  |  0         0 |  1
  0  1 |  0  |  1         1 |  0
  1  0 |  0  |  1
  1  1 |  1  |  1

JUNK

intro

The transformation aspects need explaining as they depend on the reductive analysis methodology.

Backup is avoided by the extensive use of factoring in the syntax equations.[4]

Compilers written in META II (including itself) compiled into assembly code for virtual stack machines. Interpreters for the virtual stack machines were written for each language. A generalized stack machine assembler was part of the META II package. The parsing language evolved, adding look ahead and backtracking features, with code generation evolving into separate specialized sub-languages.

In 1968, TREE-META's parsing language used transformation constructs producing intermediate abstract parse trees, and separated code generation into a specialized tree-crawling unparse-rule sub-language producing assembly text files.

CWIC advanced code generation, producing byte addressable binary code using LISP 2 procedural generator actions and tree crawling unparse rule argument selection. Examples of IBM 360 machine code generation were demonstrated at a 1969 ACM SIGPLAN meeting.


What separates these parser programming languages from grammars input to a parser generator is their semantics. The semantics of lexeme recognition is a feature in stark opposition to the readability disadvantage claimed of scannerless parsing. Using grouping and ordering tests appropriately, backtracking can be eliminated when parsing most ambiguous language constructs. Ordering alternatives is important as they are attempted in the order written. The first successful alternative is taken.

I taught others to use it. In 1975-76 SLIC was used to write a COBOL cross compiler and a TI 990 cross assembler.

edit 2 2

SLIC further improved code generation, adding a MACHOP (machine operation) assembly instruction defining language and a PSEUDO macro like language. The generator language producing PSEUDO instructions is called planting. Plant, planted, planting etc. are terms originally used by CWIC to describe code generation. PSEUDO instructions are programmed procedural functions that call MACHOPs to output relocatable machine code. A PSEUDO instruction is not called immediately when planted. The PSEUDO function's entry point and arguments are appended to a named section list. Sections are used to separate code by type, i.e. data, code, constant etc.

The ISO (In Sequence Optimizer) language optionally processes PSEUDO instruction lists before their execution. Sections are flushed to output code to an object file. Upon flushing, a section's ISOs are run on the PSEUDO instructions. ISOs function in a manner similar to stream editors, matching PSEUDO patterns and replacing them. Planted PSEUDO instructions are not text, but structured data. Flushing a section involves calling the PSEUDO instructions from the section's list. Each PSEUDO list entry is an object structure that during flushing is removed from the list head, called and released, freeing memory.
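A minimal C++ sketch of this plant-and-flush mechanism (the Section type and its members are illustrative assumptions, not SLIC's actual implementation):

#include <functional>
#include <list>

// Hypothetical sketch of a named section: planting appends a callable
// PSEUDO entry; flushing removes each entry from the list head, calls it
// (it emits machine code through MACHOPs), and releases it.
struct Section {
    std::list<std::function<void()>> pseudos;  // planted, not yet executed

    void plant(std::function<void()> op) {     // plant = append to the list
        pseudos.push_back(std::move(op));
    }
    void flush() {                             // ISO passes would rewrite the
        while (!pseudos.empty()) {             // list before this loop runs
            pseudos.front()();                 // call the PSEUDO instruction
            pseudos.pop_front();               // remove and free the entry
        }
    }
};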

CWIC and SLIC are advanced syntax driven metacompilers whose parser programming languages are designed to simplify the following aspects of compiler construction:

  1. Lexical and syntactic analysis.
  2. Reporting of detected errors and recovery. Error recovery is programmed by skipping characters to a point where compilation may continue.
  3. Automated entry of symbol tokens into the dictionary, or symbol tables.
  4. Construction of Abstract Parse Trees.

The code sequencing generator language is a LISP 2 dialect, a list processing programming language with features designed for:

  1. Special case code sequence recognition and optimization.
  2. Global optimization over the directed graph of a program.
  3. Constant expression evaluation at compile time.
  4. Common sub expression recognition and elimination.
  5. Mixed mode arithmetic.
  6. Undefined and duplicate symbol detection and reporting.
  7. Separation of instructions, data, constants etc. into named sections.
  8. Features providing for creating multi-pass compilers.
  9. Stacked dictionaries that facilitate compilation of nested block structure languages.
  10. Attaching attributes to symbols, facilitating the association of symbolic or numeric data with symbols in the dictionary.
  11. Interrogation and manipulation of symbol attributes.

The SLIC PSEUDO and MACHOP sub-languages are designed to:

  1. Separate actual target machine code generation out of the recursive tree crawling generator language.
  2. Simplify optimization of machine specific code sequences.
  3. Divide development effort into areas of expertise. Syntactic analysis, tree processing algorithms, code sequentialization, machine specific coding, and hardware instructions.

The MACHOP language further provides for defining assemblers having a common set of capabilities. All using a common linker.

As these languages progressed, each generation included the syntax analysis features of previous languages. The code generation features changed the most.

There is no easy way to fully describe the metalanguages in a simple ordered progression. It may take a bit of work to fully understand them.

Not all Schorre based metaprogramming languages followed his naming convention:

The term META "language" with META in capital letters is used to denote any compiler-writing language so developed.[4]

META II's syntax recognizing language is a subset of TREE-META's, CWIC's, and SLIC's syntax languages. They replace the code output constructs of META II with tree building operators. CWIC and SLIC add program controled backtracking, a dictionary and token recognition equations.

edit 3 3

I first learned of CWIC in 1969 at an L.A. ACM SIGPLAN meeting where Erwin Book gave a presentation on the CWIC compiler writing system. Erwin gave me a set of CWIC SYNTAX and GENERATOR language manuals. SDC released a short paper on CWIC in 1970. CWIC and SLIC have separate parsing and code generating sub-languages. The SYNTAX (parsing) language is based on META II, extended by adding backtrack alternative operators and look ahead operators. The primitive text output constructs are replaced by abstract parse tree transformation operators. Token and character class rules were other innovative additions of the CWIC syntax language. Abstract parse trees are passed to generator functions having unparse rule disassembled arguments and LISP 2 procedural actions. The abstract parse trees are lists whose first element is a node.

edit 4 4

They use a phrase structure reductive analytical grammar programming language to specify the parser for a language and compile it to executable code. They produce a top down parser. Ambiguities can usually be avoided by factoring grammar rules. If not, other language features provide extensive backtracking and look ahead. Backtracking is specific, controlled by using the backtracking alternative.

edit 5 5

The grammar rule META "language" is a stack-oriented functional programming language. Each rule is a Boolean function that analyzes the input, testing it against the grammar specifications, and if successful transforms it into a semantically equivalent abstract parse tree (META II, the exception, produced machine code for a pseudo stack machine). When unsuccessful, failure is returned and nothing is produced, any partial production having been removed.

edit 6 6

Every piece of manipulatable data is an object. Objects are not defined in the languages but by the language. The actions of an unparse rule are written in a LISP 2 dialect. Using LISP terminology, an atom or atomic object belongs to a class. The list object is an object container; a list class functions as a dynamic array of objects. The term element is used when referring to an atom or list. A list may contain any number of elements. Atoms may have attributes. All objects have a class; the class of an object determines its use. A list is a class. Class is used to describe built-in object types supported by the metacompiler languages. Some object classes have attributes. A symbol object may have a type attribute that defines its use in a program being translated. Terminology can get confusing in describing a language defining language. The intent is to use class in talking about native object types, hopefully distinguishing them from the target language data types.

It is the norm for high level programming languages to have libraries of support functions. The MOL, Machine Oriented Language, is used to write the underlying library support functions for the syntax and generator languages. SLIC combined all but the MOL language into a single compiler. The languages are really sub-languages, each specialized to perform a specific function, like sections in a COBOL program. The syntax sub-language specifies the source language and its transformation into list data structures easily processed by the generator, or sequentializer, sub-language. The generator sub-language generates equivalent sequential instructions.

In computer science the term syntax is used in describing a language's structure above the word level. CWIC and later metacompilers specify a language fully from its character set. Words are defined using token rules. Token rules create atomic objects. Strings and numbers are simple atom class token objects. Symbol tokens are dictionary objects and may have attributes with values assigned to them. Lists are structured objects that contain objects; they are dynamic arrays of objects. A tree is a list whose first entry is a node object. A node object is a class used in identifying a tree. A node is a cross between a string and a symbol class object. Like a string, a node is not a dictionary entry. It is identified by its name string but may have attributes assigned to it like a symbol. The node name is a global identity, but each node instance has its own unique attributes and values.
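A rough C++ sketch of these object classes (an illustrative assumption, not CWIC's or SLIC's actual internal representation): every value carries its own class, a list is a dynamic array of objects, and a tree is simply a list headed by a node object.

#include <memory>
#include <string>
#include <variant>
#include <vector>

struct Object;
using ObjPtr = std::shared_ptr<Object>;

struct Node   { std::string name; };   // identifies a tree; not a dictionary entry
struct Symbol { std::string name; };   // a dictionary entry

struct Object {
    std::variant<long,                 // integer atom
                 std::string,          // string atom
                 Node,
                 Symbol,
                 std::vector<ObjPtr>>  // the list (and tree) class
        value;
};

// Build the tree ADD[x,y]: a three-element list headed by an ADD node.
ObjPtr addTree(ObjPtr x, ObjPtr y) {
    return std::make_shared<Object>(Object{std::vector<ObjPtr>{
        std::make_shared<Object>(Object{Node{"ADD"}}), x, y}});
}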


Symbols are dictionary entry tokens. The dictionary is a symbol table stack. A symbol table is an object that may be assigned to an attribute or variable. The dictionary stack provides the layered symbol table entries required for block structured languages. Assigning a symbol table to an attribute could be used in implementing macro procedures or functions.


CWIC and SLIC have separate parsing and code generating sub-languages. The SYNTAX (parsing) languages are based on META II. They are extended by adding backtrack alternative and look ahead operators. The primitive text output constructs in the SYNTAX equations of META II are replaced by node and abstract parse tree transformation operators. Token and character class rules were other innovative additions of CWIC's syntax language. Abstract parse trees are passed to generator functions having unparse rule argument selecting parameters and LISP 2 procedural actions. TREE-META also transforms source code input into trees passed to unparse equations.

BNF

BNF was created by John Backus as a specification metalanguage to be used in the specification of the ALGOL 60 language. Peter Naur edited the ALGOL report to include the BNF language specifications. Today Naur is given half the credit for BNF. Its general power is illustrated in section 2.3, defining comments:

ALGOL comment specifications

    The sequence of basic symbols:                       is equivalent to
    ; comment <any sequence not containing a ;>;         ;
    begin comment <any sequence not containing a ;>;     begin
    end <any sequence not containing end or ; or else>   end

Above, commenting is explained using textual descriptions contained in angle brackets, i.e. not a class name. Comments are not exactly in rule form. As grammar rewrite rules they would require multiple terminal symbols on the left, thus not CFG rules. BNF uses the alternative, or, symbol | to separate alternative language constructs. A sequence of linguistic classes and/or symbols is an implied AND. A linguistic variable is defined only once, by a single linguistic equation.

<expr> ::= <term> | <expr><aop><term>

The | BNF alternative (or) operator is not simple to explain. In a way it is like a convoluted exclusive or forcing the longest match. An <expr> is a sequence of terms separated by <aop>s. Written as above it expresses algebraic precedence. The second alternative <expr><aop><term>, with <expr> being to the left of <aop>, implies its evaluation before <aop><term>. That is, the expression:

a-b+c-d

is evaluated as if grouped:

(((a)-b)+c)-d

Each parenthesized group being an <expr>. This convoluted way of expressing a sequence is what led to the development of look ahead parser generators. BNF may have both productive and analytical interpretations. No programming language at that time required more than looking at the current lexeme. See operator precedence grammar and simple precedence grammar.

What Is A Scannerless Parser

This article fails in explaining scannerless parsers.

I present here parser programming languages that include the complete grammar analysis down to the character level. Sorry if I go into too much detail. I use the term parser programming language because these are different from most parser generators today, and because they are truly programming languages that are also grammars. These parser programming languages are from the 1960's. They seem to be misunderstood and overlooked today.

In the last generation of these languages there are three recognition levels and three corresponding parser equation types.

Character class rules:

bin: '0'|'1';

oct: bin|'2'|'3'|'4'|'5'|'6'|'7';

dgt: oct|'8'|'9';

hex: dgt|'A'|'B'|'C'|'D'|'E'|'F'
     |'a'|'b'|'c'|'d'|'e'|'f';

upr: 'A'|'B'|'C'|'D'|'E'|'F'|'G'
    |'H'|'I'|'J'|'K'|'L'|'M'|'N'
    |'O'|'P'|'Q'|'R'|'S'|'T'|'U'
    |'V'|'W'|'X'|'Y'|'Z';

lwr: 'a'|'b'|'c'|'d'|'e'|'f'|'g'
    |'h'|'i'|'j'|'k'|'l'|'m'|'n'
    |'o'|'p'|'q'|'r'|'s'|'t'|'u'
    |'v'|'w'|'x'|'y'|'z';

let: upr|lwr;

alphanum: let | dgt;

symchr: alphanum | '_';

skip_class: 0h08|0h09|0h0A|0h0B
           |0h0C|0h0D|' '; 

The above is from SLIC (System of Languages for Implementing Compilers). Character classes are identical to CWIC (Compiler for Writing and Implementing Compilers), except for the skip_class rule being able to override the built-in default skip_class. Character class rules generate a character class table indexed by a character's numeric code. Flags within the indexed entry determine the character's class memberships. A character class name is used in token and syntax equations to match any character belonging to the named class.
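A minimal C++ sketch of such a class table (the class names and bit assignments here are illustrative, not SLIC's actual values): each rule contributes a flag bit, and a membership test is one AND against the indexed entry.

#include <cstdint>

// One table entry per character code, one flag bit per class.
enum : std::uint8_t {
    BIN = 1 << 0, OCT = 1 << 1, DGT = 1 << 2,
    HEX = 1 << 3, LET = 1 << 4, SKIP = 1 << 5
};

std::uint8_t class_table[256];

void initClassTable() {
    for (char c = '0'; c <= '1'; ++c) class_table[(unsigned char)c] |= BIN | OCT | DGT | HEX;
    for (char c = '2'; c <= '7'; ++c) class_table[(unsigned char)c] |= OCT | DGT | HEX;
    for (char c = '8'; c <= '9'; ++c) class_table[(unsigned char)c] |= DGT | HEX;
    for (char c = 'a'; c <= 'f'; ++c) class_table[(unsigned char)c] |= HEX;
    for (char c = 'A'; c <= 'F'; ++c) class_table[(unsigned char)c] |= HEX;
    for (char c = 'a'; c <= 'z'; ++c) class_table[(unsigned char)c] |= LET;
    for (char c = 'A'; c <= 'Z'; ++c) class_table[(unsigned char)c] |= LET;
}

// The whole membership test is one AND and a zero test
// (a single test instruction on x86).
bool isClass(char c, std::uint8_t mask) {
    return (class_table[(unsigned char)c] & mask) != 0;
}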

Below are some example token rules:

integer ..  ("0b"|"0B")           bin $bin MAKEBIN[]  // binary integer
           |("0o"|"0O")           oct $oct MAKEOCT[]  // octal integer
           |("0x"|"0X"|"0h"|"0H") hex $hex MAKEHEX[]  // hex integer
           |                      dgt $dgt MAKEINT[]; // decimal integer

id .. let $symchr;

string .. '"' $(-'"' .any | """""",'"') """" MAKESTR();

char .. ''' .any ''' MAKESTR();

Token rules create token objects. Quoted strings in token rules are not normally kept as part of the token.

CWIC includes an implementation of LISP 2 for generator procedural blocks. SLIC also uses LISP 2 generator and PSEUDO procedural blocks. LISP 2 dynamic memory managed objects are part of these compiler writing languages. Token rules by default create and catalog symbol objects. Interceding object-creating functions like MAKEINT[], MAKEBIN[], MAKEOCT[], MAKEHEX[], and MAKESTR[] bypass cataloging, creating typed objects.

There are two push-down stacks directly manipulated by operators in the parser language. Token objects are pushed on the parse stack. The parse stack is used in creating a syntax tree. Characters recognized by quoted strings in syntax and token equations are not normally kept. A string may be preceded by a +: +"ADD" would test for the string "ADD" and, if matched, an "ADD" string object would be created and pushed on the parse stack. A comma preceding a string is not a test; the string object is simply pushed on the parse stack. Below, the id rule accepts the same form as above except the '_' is not kept as part of the id.

id .. let $(alphanum|'_');

A_1 is then treated as the same symbol as A1. The _ is accepted but not kept.

id .. let $(alphanum|+'_');

Using +'_' keeps the _ as part of the symbol. The symchr class, being a single test, is more efficient than the two separate tests (alphanum|+'_'). A character class test is a single test: the class bit mask is ANDed with the class_table entry indexed by the character's numeric code. If the result is not zero the character is a member of the character class. On the IA-86 ISA a test instruction is used.

id .. let $symchr;

This allows a symbol to end with an '_'. If we wish to allow a '_' only between the characters of a symbol, we can define id as:

id .. let $(alphanum|+'_' alphanum);

When token rules are called and are successful they leave an object on the parse stack. There are two stacks used in building a syntax tree. The parse stack is where the ! tree building operator combines parse stack entries with the top node stack entry, creating a tree that is pushed on the parse stack.

expr = term $(('+':ADD|'-':SUB)term!2);
term = factor $(('*':MPY|'/':DIV)factor!2);
factor = '(' expr ')'| integer | id;

The above parses a simple arithmetic expression and transforms it into a tree. Each equation is a test function. We may look at test as a boolean (success or failure) type. A test returns success or failure. A quoted string is a test. A syntax equation is a boolean parsing expression test.

There are two types of failure. In a sequence, the first lexeme test failing is nonbacktracking, transferring control to the next alternative or returning failure to a calling function or out of a grouped sequence. Failure of subsequent tests results in a backtrack failure. A backtrack alternative, designated by \, creates a backtrack stack frame before starting its left alternative. A backtrack failure resets the stack to the top backtrack frame and returns failure. The parse is reset to the saved parse state and failure returned. An initial backtrack frame is created before calling the starting syntax equation. Returning failure to it is a compiler (internal) error.

Of note: recognizing strings is done with a quoted string. The '+' string recognizes the single character string. Leading skip_class characters may be skipped when matching strings. When the '+' string is matched, an ADD node is created by :ADD and pushed on the node stack. The preceding $ (zero or more) operator is equivalent to the Kleene star.

Tokens and lexemes are different in these languages. A token is a lexeme, but not all lexemes are tokens. Tokens are recognized by token equations. A string matched by a quoted string test is a lexeme but not a token. This is different from how the terms lexeme and token are used today. The recognition of a token is by a token rule that creates a token object. A lexeme is any recognized sequence of characters that is treated as a word of a language. A quoted string recognizes a sequence of characters that is not kept. The sequence '+':ADD recognizes the string '+' and :ADD creates an ADD node and pushes it on the node stack. There is no need to keep the '+' string.
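The two stacks and the ! operator can be sketched in C++ (an illustrative model; trees are shown as nested text for brevity, where the real objects are lists headed by a node):

#include <string>
#include <vector>

std::vector<std::string> parseStack;   // recognized tokens and built trees
std::vector<std::string> nodeStack;    // nodes pushed by the : operator

void pushNode(const std::string& n) {  // :ADD, :SUB, :MPY, :DIV
    nodeStack.push_back(n);
}

// !2 pops the top node and two parse stack entries, builds NODE[left,right],
// and pushes the result back on the parse stack.
void buildTree2() {
    std::string right = parseStack.back(); parseStack.pop_back();
    std::string left  = parseStack.back(); parseStack.pop_back();
    std::string node  = nodeStack.back();  nodeStack.pop_back();
    parseStack.push_back(node + "[" + left + "," + right + "]");
}

// Parsing a+b*c: a is pushed, :ADD pushes a node, b is pushed, :MPY pushes
// a node, c is pushed; !2 builds MPY[b,c]; the outer !2 leaves
// ADD[a,MPY[b,c]] on the parse stack.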

SLIC was used to implement a COBOL cross compiler. The divisions of COBOL (68) have quite different lexical and parsing requirements. These parser programming languages originated from Dewey Val Schorre's META II programming language.

The advantages listed below and in the article apply to these parser languages.

  • Only one metasyntax is needed
  • Non-regular lexical structure is handled easily
  • "Token classification" is unneeded which removes the need for design accommodations such as "the lexer hack" and language keywords (such as "while" in C)

I do not see how the below disadvantages would apply:

1. Since the lexical scanning and syntactic parsing processing is combined, the resulting parser tends to be harder to understand and debug for more complex languages

Just the opposite. The parser language is harmonious. Lexing, via character class and token rules, uses a subset of the parsing rules. In fact, the SLIC debugger simplified debugging; one could easily solve parsing problems. The rules are easier to read and understand than the common regular expressions of lexer generators.
  • Most parsers of character-level grammars are nondeterministic

Need an example of this.

  • There is no guarantee that the resulting grammar is unambiguous

This is not specific to lexerless parsers.

From user page

The documented chronology of early metacompiler development started with META I. META I was used to compile META II. META III, TREE-META, BOOK1, BOOK2, META 5 and CWIC quickly followed. Val Schorre was the primary developer of the metalanguage of the "META" x languages and CWIC. Erwin Book worked on LISP 2 and CWIC. The code production of CWIC was a LISP 2 like language. Adding token recognition rules, backtracking and list making constructs to the parsing rules, and combining the unparse rules with a full LISP language for code generation, made for a very powerful compiler compiler. CWIC was developed at SDC, a government think tank. It was made proprietary by SDC in the early 70s. I got my manual from Erwin Book at an ACM meeting. CWIC is a hard act to follow. But I think what I accomplished is another level of advancement.

But first I would like to explain how the parser of these meta compilers is programmed. They use an analytical top down set of rules: functions that are logical equations returning success or failure. A rule is like a boolean equation, making tests on the input stream. A sequence of tests, like a boolean AND, must all be successful for the sequence to be successful. Alternates are, like a boolean OR, tried on failure of a preceding alternate, but only when the input stream has not advanced. Ambiguity was not accepted in the early meta compilers. CWIC was the exception, having a backtrack alternate. Backtracking was not automatic in CWIC; it had to be specified using the backtrack alternate operator \. A / was used as the non backtracking alternate in the Schorre meta compilers. The grammar of a simple arithmetic expression illustrates the basics of the analytical rules:

expr = term $(('+' / '-' ) term);
term = factor $(('*'/'/')factor);
factor = '(' expr ')'/ number / id;

The above defines the arithmetic precedence of operators. Unlike parser generators, the above does not build anything. The "$" operator defines a repeating sequence, meaning zero or more of the following. Parentheses () group sequences as they would in arithmetic expressions. Operators must be used, and output productions must be coded in the parsing rules, that transform parsed constructs and call code generating tests. Yes, code generation may return failure. The CWIC syntax is used:

expr = term $(( '+':ADD/ '-':SUB) term !2);
term = factor $(( '*':MPY/ '/':DIV ) factor!2);
factor = '(' expr ')' / number / id;

Now with tree building operators, a recognized expression will result in a functional expression tree being produced. The ":" operator places a node on the node stack, which contains nodes in the reverse of the sequence they are produced by the ":" operator. The "!" operator builds a tree, removing the top most node from the node stack and some number of parse stack entries, creating a list whose first element is the node and whose following elements are from the parse stack in the order they were recognized. The list is pushed onto the parse stack. The numeric argument following the ! specifies the number of elements taken from the parse stack. Tokens are recognized by token rules. In CWIC token rules are coded in the grammar. In earlier compilers they were built in rules.

Early meta compiler progression started with META I, which was used to write META II. META II was used to write a small ALGOL compiler. Both META I and META II were written by Val D. Schorre.

META I and META II: written by who? Bgoldnyxnet (talk) 04:39, 7 November 2014 (UTC)
Dewey Val Schorre.Steamerandy (talk) 18:34, 5 November 2015 (UTC)
I did maintenance on CWIC up until 1975, then in 1982-3 wrote a version called Swift that compiled to "c." It was never classified. It was proprietary to SDC (later bought by Unisys). Bgoldnyxnet (talk) 04:39, 7 November 2014 (UTC)


SLIC implemented the full set of features of CWIC and extended it. The extensions added machine description and pseudo code languages, with an ISO (In Sequence Optimizer) transforming language operating on pseudo code sequences. The extensions separated the generation of machine code for specific processors into defining pseudo instructions using the processing power of the generator list processing language. Machine instructions defined in the MACHOP language were used in the pseudo code procedures.

The additions decreased and simplified the work necessary to target different processor instruction sets. The generator productions generated machine independent pseudo instructions that could be defined specifically for the programming language being compiled. This made the code generation more readable and easier to maintain. The pseudo coding is like coding an assembly macro instruction. Target machine assembly code instructions, defined in the MACHOP sub-language, made writing PSEUDO code easy and readable.

PSEUDO code could be coded by experienced assembly programmers. These programmers need not be experienced compiler writers.

In SLIC the languages are sub-languages. They are all compiled by a single compiler and may be intermixed in source files.

The SLIC linker combines separately compiled object files and converts them to the target system load format.

The sub-languages of SLIC:

  • Syntax. Top down grammar analyzer language, transforming the parsed input into functional notation tree structures.
  • Generator. Unparse tree crawling rules combined with procedural block structured list processing code generators.
  • ISO (In Sequence Optimizer). Sequential pattern transform rules operating on pseudo code before its expansion.
  • PSEUDO instructions. A (macro like) procedure defining machine code output.
  • MACHOP. A target machine instruction specification, specifying the opcode mnemonic(s), operands and binary emitter.


______________________________________


  1. ^ Named reference Metcalfe1 invoked but never defined.
  2. ^ Named reference Ledleyl invoked but never defined.
  3. ^ Named reference Glenniel invoked but never defined.
  4. ^ a b c Named reference METAII invoked but never defined.
  5. ^ Named reference SMALGOL invoked but never defined.
  6. ^ Named reference META1 invoked but never defined.
  7. ^ Named reference Schmidt1 invoked but never defined.
  8. ^ Named reference Rutman1 invoked but never defined.
  9. ^ Named reference Schneiderl invoked but never defined.
  10. ^ Named reference Oppenheim invoked but never defined.
  11. ^ Named reference Kirkleyl invoked but never defined.
  12. ^ Named reference George invoked but never defined.

