RegEx: Finite Automata

Deterministic Finite Automata (DFA)

Lexical Analysis(Scanner)
Read input: one character at a time
Group characters into tokens
Remove white spaces and comments
Encode token types and form tuples and return to parser

type value
FloatConst 123.45
VAR String=DaysOfWeek
Operator Value =+

Regular expression r (a pattern of characters) and its language L(r)
-> used in many Unix programs (e.g., grep, vi., etc.)
State machine (finite automata)

Representations of String
Basics of Regular Expression
Σ= AaBbCcDdEeFf…. {}|[]”;’<>?,./

Symbols & alphabet
-> A symbol is a valid character in a language
-> alphabet is set of legal symbols

Metacharacters/metasymbols that have special meanings:
Defining reg-ex operations(e.g. |,(,), *, + etc.)
Escape character(\) to turn off special meanings
Empty string e and empty set

a | b* = {a, ε, b, bb, bbb, …}
(a | b)* = {ε, a, b, ab, ba, aa, bb, aaa, aab, …} = Any number (including zero) of a’s and b’s in any order

unix-style regular expression
a-b same as abcd
^1-3 denotes characters other than 1, 2, or 3
-, [], +, ?, ^ and $
[a-zA-Z_][a-zA-Z_0-9]*
[0-9][0-9]*
(+|-)?[0-9][0-9]*