scc

simple C compiler
git clone git://git.2f30.org/scc
Log | Files | Refs | README | LICENSE

ir.md (8892B)


      1 # scc intermediate representation #
      2 
      3 The scc IR tries to be be a simple and easily parseable intermediate
      4 representation, and it makes it a bit terse and cryptic. The main
      5 characteristic of the IR is that all the types and operations are
      6 represented with only one letter, so parsing tables can be used
      7 to parse it.
      8 
      9 The language is composed of lines, representing statements.
     10 Each statement is composed of tab-separated fields.
     11 Declaration statements begin in column 0, expressions and
     12 control flow begin with a tabulator.
     13 When the frontend detects an error, it closes the output stream.
     14 
     15 ## Types ##
     16 
     17 Types are represented with uppercase letters:
     18 
     19 * C -- signed    8-Bit integer
     20 * I -- signed   16-Bit integer
     21 * W -- signed   32-Bit integer
     22 * Q -- signed   64-Bit integer
     23 * K -- unsigned  8-Bit integer
     24 * N -- unsigned 16-Bit integer
     25 * Z -- unsigned 32-Bit integer
     26 * O -- unsigned 64-Bit integer
     27 * 0 -- void
     28 * P -- pointer
     29 * F -- function
     30 * V -- vector
     31 * U -- union
     32 * S -- struct
     33 * B -- bool
     34 * J -- float
     35 * D -- double
     36 * H -- long double
     37 
     38 This list has been built for the original Z80 backend, where 'int'
     39 has the same size as 'short'. Several types (S, F, V, U and others) need
     40 an identifier after the type letter for better differentiation
     41 between multiple structs, functions, vectors and unions (S1, V12 ...)
     42 naturally occuring in a C-program.
     43 
     44 ## Storage classes ##
     45 
     46 The storage classes are represented using uppercase letters:
     47 
     48 * A -- automatic
     49 * R -- register
     50 * G -- public (global variable declared in the module)
     51 * X -- extern (global variable declared in another module)
     52 * Y -- private (variable in file-scope)
     53 * T -- local (static variable in function-scope)
     54 * M -- member (struct/union member)
     55 * L -- label
     56 
     57 ## Declarations/definitions ##
     58 
     59 Variable names are composed of a storage class and an identifier
     60 (e.g. A1, R2, T3).
     61 Declarations and definitions are composed of a variable
     62 name, a type and the name of the variable:
     63 
     64 	A1	I	maxweight
     65 	R2	C	flag
     66 	A3	S4	statstruct
     67 
     68 ### Type declarations ###
     69 
     70 Some declarations (e.g. structs) involve the declaration of member
     71 variables.
     72 Struct members are declared normally after the type declaration in
     73 parentheses.
     74 
     75 For example the struct declaration
     76 
     77 	struct foo {
     78 		int i;
     79 		long c;
     80 	} var1;
     81 
     82 generates
     83 
     84 	S2      foo     (
     85 	M3      I       i
     86 	M4      W       c
     87 	)
     88 	G5      S2      var1
     89 
     90 ## Functions ##
     91 
     92 A function prototype
     93 
     94 	int printf(char *cmd, int flag, void *data);
     95 
     96 will generate a type declaration and a variable declaration
     97 
     98 	F5	P	I	P
     99 	X1	F5	printf
    100 
    101 The first line gives the function-type specification 'F' with
    102 an identifier '5' and subsequently lists the types of the
    103 function parameters.
    104 The second line declares the 'printf' function as a publicly
    105 scoped variable.
    106 
    107 Analogously, a statically declared function in file scope
    108 
    109 	static int printf(char *cmd, int flag, void *data);
    110 
    111 generates
    112 
    113 	F5      P       I       P
    114 	T1      F5      printf
    115 
    116 Thus, the 'printf' variable  went into local scope ('T').
    117 
    118 A '{' in the first column starts the body of the previously
    119 declared function:
    120 
    121 	int printf(char *cmd, int flag, void *data) {}
    122 
    123 generates
    124 
    125 	F5      P       I       P
    126 	G1      F5      printf
    127 	{
    128 	A2      P       cmd
    129 	A3      I       flag
    130 	A4      P       data
    131 	-
    132 	}
    133 
    134 Again, the frontend must ensure that '{' appears only after the
    135 declaration of a function. The character '-' marks the separation
    136 between parameters and local variables:
    137 
    138 	int printf(register char *cmd, int flag, void *data) {int i;};
    139 
    140 generates
    141 
    142 	F5      P       I       P
    143 	G1      F5      printf
    144 	{
    145 	R2      P       cmd
    146 	A3      I       flag
    147 	A4      P       data
    148 	-
    149 	A6      I       i
    150 	}
    151 
    152 ### Expressions ###
    153 
    154 Expressions are emitted in reverse polish notation, simplifying
    155 parsing and converting into a tree representation.
    156 
    157 #### Operators ####
    158 
    159 Operators allowed in expressions are:
    160 
    161 * \+ -- addition
    162 * \- -- substraction
    163 * \* -- multiplication
    164 * % -- modulo
    165 * / -- division
    166 * l -- left shift
    167 * r -- right shift
    168 * < -- less than
    169 * > -- greather than
    170 * ] -- greather or equal than
    171 * [ -- less or equal than
    172 * = -- equal than
    173 * ! -- different than
    174 * & -- bitwise and
    175 * | -- bitwise or
    176 * ^ -- bitwise xor
    177 * ~ -- bitwise complement
    178 * : -- asignation
    179 * _ -- unary negation
    180 * c -- function call
    181 * p -- parameter
    182 * . -- field
    183 * , -- comma operator
    184 * ? -- ternary operator
    185 * ' -- take address
    186 * a -- logical shortcut and
    187 * o -- logical shortcut or
    188 * @ -- content of pointer
    189 
    190 Assignation has some suboperators:
    191 
    192 * :/ -- divide and assign
    193 * :% -- modulo and assign
    194 * :+ -- addition and assign
    195 * :- -- substraction and assign
    196 * :l -- left shift and assign
    197 * :r -- right shift and assign
    198 * :& -- bitwise and and assign
    199 * :^ -- bitwise xor and assign
    200 * :| -- bitwise or and assign
    201 * :i -- post increment
    202 * :d -- post decrement
    203 
    204 Every operator in an expression has a type descriptor.
    205 
    206 #### Constants ####
    207 
    208 Constants are introduced with the character '#'. For instance, 10 is
    209 translated to #IA (all constants are emitted in hexadecimal),
    210 where I indicates that it is an integer constant.
    211 Strings are a special case because they are represented with
    212 the " character.
    213 The constant "hello" is emitted as "68656C6C6F. For example
    214 
    215 	int
    216 	main(void)
    217 	{
    218 		int i, j;
    219 
    220 		i = j+2*3;
    221 	}
    222 
    223 generates
    224 
    225 	F1
    226 	G1	F1	main
    227 	{
    228 	-
    229 	A2      I	i
    230 	A3      I	j
    231 		A2	A3	#I6	+I	:I
    232 	}
    233 
    234 Type casts are expressed with a tuple denoting the
    235 type conversion
    236 
    237         int
    238 	main(void)
    239 	{
    240 		int i;
    241 		long j;
    242 
    243 		j = (long)i;
    244 	}
    245 
    246 generates
    247 
    248 	F1
    249 	G1      F1      main
    250 	{
    251 	-
    252 	A2      I       i
    253 	A3      W       j
    254 	        A2      A3      WI      :I
    255 	}
    256 
    257 ### Statements ###
    258 #### Jumps #####
    259 
    260 Jumps have the following form:
    261 
    262 	j	L#	[expression]
    263 
    264 the optional expression field indicates some condition which
    265 must be satisfied to jump. Example:
    266 
    267 	int
    268 	main(void)
    269 	{
    270 		int i;
    271 
    272 		goto    label;
    273 	label:
    274 		i -= i;
    275 	}
    276 
    277 generates
    278 
    279 	F1
    280 	G1      F1      main
    281 	{
    282 	-
    283 	A2	I	i
    284 		j	L3
    285 	L3
    286 		A2	A2	:-I
    287 	}
    288 
    289 Another form of jump is the return statement, which uses the
    290 letter 'y' followed by a type identifier.
    291 Depending on the type, an optional expression follows.
    292 
    293 	int
    294 	main(void)
    295 	{
    296 		return 16;
    297 	}
    298 
    299 generates
    300 
    301 	F1
    302 	G1	F1	main
    303 	{
    304 	-
    305 		yI	#I10
    306 	}
    307 
    308 
    309 #### Loops ####
    310 
    311 There are two special characters that are used to indicate
    312 to the backend that the following statements are part of
    313 a loop body.
    314 
    315 * b -- beginning of loop
    316 * e -- end of loop
    317 
    318 #### Switch statement ####
    319 
    320 Switches are represented using a table, in which the labels
    321 where to jump for each case are indicated. Common cases are
    322 represented with 'v' and default with 'f'.
    323 The switch statement itself is represented with 's' followed
    324 by the label where the jump table is located, and the
    325 expression of the switch:
    326 
    327 	int
    328 	func(int n)
    329 	{
    330 		switch (n+1) {
    331 		case 1:
    332 		case 2:
    333 		case 3:
    334 		default:
    335 			++n;
    336 		}
    337 	}
    338 
    339 generates
    340 
    341 	F2	I
    342 	G1	F2	func
    343 	{
    344 	A1	I	n
    345 	-
    346 		s	L4	A1	#I1	+I
    347 	L5
    348 	L6
    349 	L7
    350 	L8
    351 		A1	#I1	:+I
    352 		j	L3
    353 	L4
    354 		t	#4
    355 		v	L7	#I3
    356 		v	L6	#I2
    357 		v	L5	#I1
    358 		f	L8
    359 	L3
    360 	}
    361 
    362 The beginning of the jump table is indicated by the the letter 't',
    363 followed by the number of cases (including default case) of the
    364 switch.
    365 
    366 ## Resumen ##
    367 
    368 * C -- signed    8-Bit integer
    369 * I -- signed   16-Bit integer
    370 * W -- signed   32-Bit integer
    371 * O -- signed   64-Bit integer
    372 * M -- unsigned  8-Bit integer
    373 * N -- unsigned 16-Bit integer
    374 * Z -- unsigned 32-Bit integer
    375 * Q -- unsigned 64-Bit integer
    376 * 0 -- void
    377 * P -- pointer
    378 * F -- function
    379 * V -- vector
    380 * U -- union
    381 * S -- struct
    382 * B -- bool
    383 * J -- float
    384 * D -- double
    385 * H -- long double
    386 * A -- automatic
    387 * R -- register
    388 * G -- public (global variable declared in the module)
    389 * X -- extern (global variable declared in another module)
    390 * Y -- private (variable in file-scope)
    391 * T -- local (static variable in function-scope)
    392 * M -- member (struct/union member)
    393 * L -- label
    394 * { -- beginning of function body
    395 * } -- end of function body
    396 * \\ -- end of function parameters
    397 * \+ -- addition
    398 * \- -- substraction
    399 * \* -- multiplication
    400 * % -- modulo
    401 * / -- division
    402 * l -- left shift
    403 * r -- right shift
    404 * < -- less than
    405 * > -- greather than
    406 * ] -- greather or equal than
    407 * [ -- less or equal than
    408 * = -- equal than
    409 * ! -- different than
    410 * & -- bitwise and
    411 * | -- bitwise or
    412 * ^ -- bitwise xor
    413 * ~ -- bitwise complement
    414 * : -- asignation
    415 * _ -- unary negation
    416 * c -- function call
    417 * p -- parameter
    418 * . -- field
    419 * , -- comma operator
    420 * ? -- ternary operator
    421 * ' -- take address
    422 * a -- logical shortcut and
    423 * o -- logical shortcut or
    424 * @ -- content of pointer
    425 * :/ -- divide and assign
    426 * :% -- modulo and assign
    427 * :+ -- addition and assign
    428 * :- -- substraction and assign
    429 * :l -- left shift and assign
    430 * :r -- right shift and assign
    431 * :& -- bitwise and and assign
    432 * :^ -- bitwise xor and assign
    433 * :| -- bitwise or and assign
    434 * ;+ -- post increment
    435 * ;- -- post decrement
    436 * j -- jump
    437 * y -- return
    438 * b -- begin of loop
    439 * d -- end of loop
    440 * s -- switch statement
    441 * t -- switch table
    442 * v -- case entry in switch table
    443 * f -- default entry in switch table