Post

Pascal-S Compiler Implementation Analysis

I. About Pascal-S and Its Compiler

Pascal-S is a true subset of Pascal, developed by computer scientist N. Wirth. It retains most of Pascal’s functionality, supporting integer, real, boolean, and character types, plus arrays, records, user-defined types, functions, procedures, value/var parameters, and recursion.

The compiler is written in Pascal with ~50 functions covering lexical analysis, syntax analysis, code generation, error handling, and interpretation. Unlike standard Pascal, Pascal-S compiles to intermediate code (P-code) which is then interpreted.

II. Table Management

The compiler uses several tables: symbol table (tab), block table (btab), array table (atab), real constant table (rconst), string constant table (stab), and code table (code).

2.1 Symbol Table (tab) — stores identifier information: name, link (to previous identifier in same block), obj (kind: constant/function/variable), typ (data type), ref (index to atab/btab), normal, lev (nesting level), adr (address).

2.2 Block Table (btab) — one entry per program block (records are also blocks). Stores last, lastpar, psize, vsize.

2.3 Array Table (atab) — one entry per array. Multi-dimensional arrays are arrays of arrays. Stores inxtyp, eltyp, elref, low, high, elsize, size.

III. Lexical Analysis

51 symbols defined in a TYPE. The analyzer reads characters from a text file, recognizing them as one of the 51 symbols. Key functions: nextch (read next char), insymbol (read next symbol), readscale (read float exponent), adjustscale (adjust real). Uses line buffering and binary search for keywords (pre-sorted).

IV. Syntax Analysis

Uses syntax-directed translation with top-down recursive descent. Code generation and error handling are integrated with parsing.

The grammar is described in an ANSI-C-like notation covering Program, Block, Declaration, ConstDeclaration, TypeDeclaration, VariableDeclaration, Statement, Expression, SimpleExpression, Term, Factor, Constant, Identifier, and Figure.

V. Code Generation and Interpretation

Generates 59 types of quadruple-like intermediate instructions stored in the code array. A stack-based virtual machine executes them with a stack data segment, code segment, and registers (allocation pointer, instruction pointer, instruction register, base pointer).

For each Block, when called, five slots are allocated for: static link, dynamic link, return address, function return value, and procedure name position in the symbol table.

VI. Afterword

After half a month reading PL0 and Pascal-S source code, I finally got a basic understanding of compilers. Not knowing Pascal initially made it harder, but after learning the language, things clicked. PL0 was simpler and I traced through execution. Pascal-S was more complex — I had to guess in many places.

Despite errors in my understanding, the gains were significant. Compiler theory from class finally made sense after studying real compiler source code.

This post is licensed under CC BY 4.0 by the author.