First Set Algorithm

for each nonterminal X do First(X) :={}
while there are changes to any First(X) do
	for each production rule X ->X1X2...Xn do
		k:=1;
		while k<=n do
			First(X) = First(X) U (First(Xk)-{ε})
			if ε is not in First(Xk) then break;
			k := k+1;
		if (k>n) then First(X) = First(X) U {ε} 

follow sets

stmt -> if-stmt|other
if-stmt -> if(exp)stmt else-part
else-part -> else stmt | ε
exp -> 0 | 1

Regular Expressions

Regular Expressions are used in grep, sed, awk, perl, vi, shells, lex, yacc
– Each may use slightly different convention

State Machines

Parser Classification
LR: Bottom Up Parser
-L: scan left to right
-R: traces rightmost derivation of input string

LL: Top Down Parser
-L: Scan left to right
-L: traces leftmost derivation of input string

Typical notation
– LL(0), LL(1), LR(1), LR(k)
– number(k) refers to maximum look ahead

Recursive Descent Parsing
Non-terminal variable
convert
correct select rule for expansion matchToken
consumes token or Error

EBNF to BNF
 ::=  '*'  | 
EBNF
 ::=  {'*'  }
BNF
 ::= 
 ::= '*'   | ε

RegEx: Finite Automata

Deterministic Finite Automata (DFA)

Lexical Analysis(Scanner)
Read input: one character at a time
Group characters into tokens
Remove white spaces and comments
Encode token types and form tuples and return to parser

type value
FloatConst 123.45
VAR String=DaysOfWeek
Operator Value =+

Regular expression r (a pattern of characters) and its language L(r)
-> used in many Unix programs (e.g., grep, vi., etc.)
State machine (finite automata)

Representations of String
Basics of Regular Expression
Σ= AaBbCcDdEeFf…. {}|[]”;’<>?,./

Symbols & alphabet
-> A symbol is a valid character in a language
-> alphabet is set of legal symbols

Metacharacters/metasymbols that have special meanings:
Defining reg-ex operations(e.g. |,(,), *, + etc.)
Escape character(\) to turn off special meanings
Empty string e and empty set

a | b* = {a, ε, b, bb, bbb, …}
(a | b)* = {ε, a, b, ab, ba, aa, bb, aaa, aab, …} = Any number (including zero) of a’s and b’s in any order

unix-style regular expression
a-b same as abcd
^1-3 denotes characters other than 1, 2, or 3
-, [], +, ?, ^ and $
[a-zA-Z_][a-zA-Z_0-9]*
[0-9][0-9]*
(+|-)?[0-9][0-9]*

semantic symbols

-> #put-type;#do-decl
-> int | float
-> #add-decl
-> #add-decl
-> ID#proc-decl
#put-type puts given type on semantic stack
#proc-decl builds decl record for var on stack
#add-decl builds decl-chain
p#do-decl traverses chain on semantic stack using backwards pointers entering each var into symbol table

float a, b, c

translation: generate temporary values, propagate them to keep semantic context

Compiler Structure
scanner -> parser -> semantic analysis -> intermdeiate representation -> code generation -> assembly code

semantic actions

Symbol Table

int a, b;
declares a and b, within current scope, type integer

Basic symbol table
name, type, scope
a, int, “main”

semantic action
1. enter variable declaration into symbol table
2. look up variables in symbol table
3. do binding of looked-up variables (scoping rules, etc.)
4. Do type checking for compatibility
5. keep the semantic context of processing
e.g. a + b + c -> t1 = a + b
t2 = t1 + c

How Compilers Work

Start
-parse
-get next token
-semantic analysis
-look up variable name
-do semantic checks -> generate code(passed) or semantic error(failed)
-put in symbol table

Front End: Analysis
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis

Back End: Systhesis
4. Code Generation
5. Optimization

Lexical rules of language dictate how legal word is formed by concatenating alphabet.

If x==1 then if y==2 print 1 else print 2
stmt -> if expr then stmt else stmt

why compilers?

High-level Language

C/Java Compiler

temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;

Fortran Compiler

TEMP = V(K)
V(K) = V(K+1)
V(K+1) = TEMP

Assembly Language

lw $to, 0($2)
lw $t1, 4($2)
sw $t1, 0($2)
sw $to, 4($2)
#include

int main()
{
	char A = '2';
	int sum = A + 10;
	std::cout<<"sum ="<
		Posted on Categories Compiler