Lexical Analysis (Cont.)


The following figure shows the steps of lexical analyzer construction:
Regular Expressions
A regular expression is a rule describing the set of tokens in source files.

Finite Automata (FA)
An FA consists of a finite set of states and a set of transitions from state to state that occur on input symbols chosen from an alphabet Σ. One state is the initial state, in which the automaton starts. Some states are designated as final or accepting states. An FA is a 5-tuple (S, Σ, T, s0, A), where
    • S: a set of states,
    • Σ: an alphabet,
    • s0 in S: the start state,
    • A⊆S: the set of accepting states, and
    • T: the transition function mapping S×Σ∪{ε} to S.
Nondeterministic Finite Automata (NFA)
It is an FA allows zero, one, or more transitions from a state on the same input symbols.

Deterministic Finite Automata (DFA)
A DFA is a special case of an NFA in which
  • no state has a transition on input ε (an empty string), and
  • for each state s and input symbol a, there is at most one edge labeled a leaving s.