Quick start with flex/bison

Soheyb Merah
4 min readMay 16, 2020
Effective Flex & Bison Paperback — May 23, 2018

In this article we will cover the basics of flex and bison, I will use images to not give you chance to copy it, i really want you to write the code so you can get a better understanding, make sure you have a good knowledge of C

Let’s get started.

Flex is a fast lexical analyser generator. It is a tool for generating programs that perform pattern-matching on text.

Bison is a general purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar.

Obviously we need both bison and flex, add them to your environment path.

flex files are defined by .l and bison by .y.

We will start with flex go and make one (e.g example.l), open it with any editor you like.

Flex file structure has 3 sections separated by %% :

  • section 1: declaration part, define variables and import libraries.
  • section 2: pattern and action part,for patter is defined by regex if you don’t know what it is you should read this article made by jonny fox, by default any text not matched by a flex scanner is copied to the output.
  • section 3: main part, the entry point.

write the following code:

example.l

Section 1

When the scanner receives an end-of-file indication from YY_INPUT, it then checks the yywrap function. If yywrapreturns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. For extra reads check this page.

To provide our own version of yywrap we need to undefine the macro then re-define it.

Section 2

[a-zA-Z] — match one character witheither lower or higher case, if found print in console it’s a letter.

[a-zA-Z]+ — match one or more character with either lower or higher case, if found print in console it’s a word.

[0–9] — match one digit, if found print in console it’s a digit.

[0–9]+ — match one digit or more, if found print in console it’s a number.

Section 3

As you probably know function main() is the C entry point, yylex() is the lex scanner entry point so start scanning.

And now we write the code now we want the result, in same working directory open up a terminal and run flex <filename>.l (e.g flex example.l ) this will generate a new file lex.yy.c which contains the code in c of the lexical analyzer (open it up and explore it ^^), then we need to compile it with gcc command gcc lex.yy.c you can add -o to change the out name as the default one is a gcc -o program lex.yy.c run your program and play around with it.

NEXT LEVEL

We will use both flex and bison, let’s start with bison.

Bison file structure has also 3 sections separated by %%
but we will only use 2 :

  • section 1: declaration part, define variables and import libraries.
  • section 2: grammars part, contain the rule for the parse.

write the following code:

A non-terminal symbol in the formal grammar is represented in Bison input as an identifier, like an identifier in C. By convention, it should be in lower case.
A terminal symbol is also called a token kind. Token kinds as well can be represented as C-like identifiers. By convention, these identifiers should be upper case to distinguish them from non-terminals.

our stream should have either a max or a min token if match we call MAX/MIN and execute the content in this case print a message and exit.
The %token MAX MIN exporst the tokens so we can use it in flex .

Run bison -d example.y this will generate example.tab.c example.tab.h

We are done with bison now with flex, write the following code:

example.l

We import the tokens from the generated file and use it in action part.
The new thing here is the yyerror() function it is used to customize the the error message.

Run flex example.l this will generate lex.yy.c
We compile them both gcc lex.yy.c example.tab.c .

If you write max and press enter you should see the message.

A book flex & bison: Text Processing Tools wrote by John Levine describe how this work under the hood:

Bison creates the C program by plugging pieces into a standard skeleton file. The rules are compiled into arrays that represent the state machine that matches the input tokens.

That all for today, I Hope this will give give you kick-start to compiler world.

--

--

Soheyb Merah

Computer Science Student enthusiast, who loves trying out new stuff, and contributing in Open Source Project, as I belive that teamwork is the key to success.