代写Part II : AST builder + Semantic Analyser代做留学生Java程序

Part II : AST builder + Semantic Analyser


The goal of part II is to implement the rest of the front-end all the way to semantic analysis. This will involve modifying your parser so that it can build the Abstract Syntax Tree (AST) corresponding to your the input program and then perform. semantic analysis.

In order to achieve this goal, you will have to perform. several tasks. First, you will have to follow the abstract grammar specification and design the Java classes that represent the AST as seen during the course. Then, you should write an AST printer in order to output the AST into a file. Thirdly, you will have to modify your parser so that it builds the AST as your are parsing the tokens. Finally, you will be able to perform. semantic analysis.

Note that we highly recommend following an iterative approach where you add AST nodes one by one, extend the printer and modify your parser as you go. We also encourage you to write small test programs that test every AST node as you build them rather than trying to implement everything at once. If you encounter any problem, have any questions or find a bug with the newly provided files, please post a note on the online forum.

Tips

Don't try to support the whole language at once, start small with a subset of the grammar. For instance, starts with a subset of the grammar that could just handle a main function returning 0, massage the grammar, add the AST nodes and update your parser. Then grow from there by supporting step by step a larger subset of the grammar.

Each analysis pass you write should have only a single purpose. It is much easier to write many small passes, each performing a simple task, than one big one trying to do everything at once (for instance write one pass just for checking assignment). Do not worry about efficiency, even if your compiler requires the application of 100 passes to reach its goal, this is absolutely fine and will make your task of writing the compiler easier.

Make use of the debugger from your IDE to chase bugs. If you have never use a debugger before, now is the time to learn: IntelliJ debugger tutorial

0. Setup

You will have to pull the AST class nodes and abstract grammar from the main repository.

If you wish to start Part 2 before the deadline of Part 1, execute the following commands in a separate branch! Otherwise, if you merge with your master branch and by accident push your changes to your remote repository, the auto-testing for Part 1 will fail. You must first create a new branch in your repository and switch to that branch (instructions are given below).

We advise you to first read this tutorial about managing branches in git. First, open a terminal and navigate to the root of your local repository.

Note that a new main file, Main2.java is provided. You should use this class as the main entry point to the compiler from now on.

0.1 Creating separate branch (if starting before deadline for part 1)

If you are starting part 2 after the deadline for part 1 you can skip to 0.2.

If you are starting part 2 before the deadline for part 1, we will first create a new local branch. Type:

$ git branch part2



Then, let's checkout the new local branch:

$ git checkout part2



And finally, let's push the branch to your git remote repository:



$ git push -u origin test



From now on, whenever you commit or push, this will happen in your newly created part2 branch.



0.2 Bringing in new updated skeleton compiler code

We are now ready to bring in the new skeleton code from the instructor (cdubach) repository.

Type:


$ git pull [email protected]:cdubach/comp520-coursework-w20XX.git



where XX is this year (e.g. 22 for 2022). This will cause some merge conflict(s) due to the change of the return type of some of the parse functions to return an AST node instead of void. For instance:



From gitlab.cs.mcgill.ca:cdubach/comp520-coursework-w20XX

* branch               HEAD            -> FETCH_HEAD

Auto-merging src/parser/Parser.java

CONFLICT (content): Merge conflict in src/parser/Parser.java



where XX is this year (e.g. 22 for 2022). Here, the file Parser.java is causing a merge conflict. In order to resolve it, you should open the file to fix the conflict. For the parser, you'd possibly want to remove everything between the equals symbols and the greater than symbols, e.g.



=======

public Program parse() {

>>>>>>> 92a7665c3dde600e1bd2d5681b2fc8fb43e1d37b



Thereafter you can continue to extend your solution. You can safely commit and push your changes as these will be pushed on the branch part2.

0.3 Merging your changes into your master branch (if started before deadline for part 1)

If you have created a new local branch for part 2, and once the deadline for part 2 has passed, it is time to bring back these changes into your master branch. This will ensure that the automarker will be able to test your part 2. Make sure to have committed all your changes from your local branch before executing the following.

To come back to the master branch, type:



git checkout master



Then, you can merge your changes from your local branch part2 into master as follows:

git merge part2



From that point on, you are back in working onto master. When you will push any of your changes from master to your remote repository, these will be picked up by the automarker.

If you encouter any issues with this, please make sure to post on the discussion forum or attend the TA office hours.

1. Massaging the grammar (Operator precedence and associativity)

As seen in the lecture, when building the AST it is important to ensure that the correct operator precedence and associativity rules are followed. To this end, you should start again from the initial concrete syntax grammar and update it.

You should make sure the resulting grammar is non-ambiguous, eliminate left recursion and ensure that the usual C precedence and associativity rules for operators are respected based on the table below.

As see in the lecture, left-associative binary operators should be handled using an iterative approach in the parser (rather than recursion). We suggest that you express these in the grammar using a Kleene closure, which will directly translate to a loop in your parser code.

Precedence

Operator

Description

Associativity

1

()

Function call

Left-to-right

1

[]

Array subscripting

Left-to-right

1

 

.

Structure member access

Left-to-right

2

+

Unary plus

Right-to-left

2

 

-

Unary minus

Right-to-left

2

(type)

Type cast

Right-to-left

2

*

Indirection

Right-to-left

2

&

Address of

Right-to-left

3

* / %

Multiplication, division, remainder

Left-to-right

4

+ -

Addition, subtraction

Left-to-right

5

< <= > >=

Relational operators

Left-to-right

6

== ! =

Relational operators

Left-to-right

7

&&

Logical AND

Left-to-right

8

⎮⎮

Logical OR

Left-to-right

9

=

Assignment

Right-to-left


Here is how to "interpret" the following piece of C code based on precedence and associativity:



array[1][2] // (array[1])[2]

mystruct.field[1] // (mystruct.field)[1]

2*3+4 // (2*3)+4

2+3*4 // 2+(3*4)

&*ptr // &(*ptr)

&p[1] // &(p[1])

a+b+c // (a+b)+c

a=b=c // a=(b=c)

a=b.c=d // a=((b.c)=d)



Note that associativity for unary operators seems at first a bit of an ill-defined concept. However, it is still a useful concept that basically specifies whether the operator is used as prefix or postfix. For instance, left-to-right associativity for function call tells you that if the following input were somehow valid syntactically (it is not in your language)



foo()bar


then the call would be taking place on foo and not on bar . However, for type casting, if the following were syntactically correct (it is not the case in our language):



x(int)y



then the casting operator would be applied on y and not on x .

2. AST Nodes

As seen in the lecture, the AST is generally defined using an abstract grammar. You can find the abstract grammar for our language here. It is important to ensure that the design of your classes follows the abstract grammar; the automated marking system will rely exclusively on the name of the class to determine the type of AST node and will expect the subtrees to appear in the same order as defined in the grammar file.

Note that a few AST node classes are already given as a starting point. You should not have to modify these (unless otherwise stated), but you are free to do so if you wish.

ASTNode class

Observe how the ASTNode class has an abstract children() method. When implementing a new ASTNode, you should take care that children() returns all the nodes that are included in the current node (i.e. its children). This will allow you to implement very simple analysis passes with only a few lines of code.

For instance, if you wanted to implement a function that prints the name of all variables every time they appear in the abstract syntax tree, you could simply do:


public void printAllVariableUses(ASTNode node) {

switch (node) {

case null -> {

throw new IllegalStateException("Unexpected null value");

}

case VarExpr ve -> {

System.out.println(ve.name);

}

case ASTNode n -> {

for (ASTNode child: n.children()) {

printAllVariableUses(child);

}

}

}

}



This will become handy when, for instance, you will have to check that variable are declared before usage.

Tips

Note that in this example, we do not want to visit the children of VarExpr . However, there might be cases where you may want to do so, and you should not forget to do it when needed!

3. Parser modifications

Your next task major consists in updating your parser so that it creates the AST nodes as it parses your input program. For most of your parseXYZ methods, you will have to modify the return type to the type of the node the parsing method should produce as seen during the lecture and implement the functionality that builds the AST nodes. You may have to modify slightly the design of your parser in order to accommodate the creation of the AST nodes.

4. AST Printer

Your next job consists in extending the AST printer class provided to handle your newly added AST node classes. As seen during the course, the AST printer will use pattern matching.

It is important to respect the following format when printing the AST to ensure that your output can be validated by our automatic marking system. Using EBNF syntax, the output should be of the form. AST_NODE_CLASS_NAME '(' [SUB_TREE (',' SUB_TREE)*] ')' Except for Op and BaseType , which should be Java enums, all AST nodes printed must be followed by an opening and corresponding closing parenthesis (even if there is no children).

Examples:

y = 3*x; should result in the following output: ExprStmt(Assign(VarExpr(y),BinOp(IntLiteral(3), MUL, VarExpr(x)))) .

void foo() { return; } should result in: FunDef(VOID, foo, Block(Return())) .

+x should result in just BinOp(IntLiteral(0),ADD,VarExpr(x))

-x should result in: BinOp(IntLiteral(0),SUB,VarExpr(x)) .

-x*3 should result in: BinOp(BinOp(IntLiteral(0),SUB,VarExpr(x)),MUL,IntLiteral(3)) .

-1 should result in BinOp(IntLiteral(0),SUB,IntLiteral(1)) .

2+3+4 should result in BinOp(BinOp(IntLiteral(2), ADD, IntLiteral(3)), ADD, IntLiteral(4)) (all binary operators are left associative in our language)

2+3*4 should result in BinOp(IntLiteral(2), ADD, BinOp(IntLiteral(3), MUL, IntLiteral(4))) (multiplication has precedence over addition, see precedence table)

struct node_t { int field1; char field2; }; should result in StructTypeDecl(StructType(node_t),VarDecl(INT,field1),VarDecl(CHAR,field2))

struct node_t n; should result in VarDecl(StructType(node_t), n)

Note that you are free to add white spaces in your output format; spaces, newlines and tabulations will be ignore by our comparison tool.

See the file fibonacci.c-ast-dump for an example output of java -cp bin Main2 -ast tests/fibonacci.c fibonacci.c-ast-dump .

Note that we represent the - and + unary operators using a BinOp add/sub AST node with 0 as first argument.

4'. Dot Printer (Optional)

As seen during the lectures, it might be a good idea to also implement a Dot printer in order to easily visualise your AST. This task is completely optional and will not be marked, but it might help you find problems more easily.



5. Name Analysis

The goal of name analysis is to ensure that the scoping and visibility rules of the language are respected. This means for instance ensuring identifiers are only declared once or that any use of an identifier is preceded by a declaration in the current or enclosing scope.

Please note that an identifier can either be a variable or a function.

Global and local scopes

As seen during the lectures, our language only have two scopes: global and local.

The global scope corresponds to the global variables declared outside any procedure and for the procedure declarations. Identifiers declared in the global scope can be accessed anywhere in the program (as long as their declaration preceed their use).

The block scope (or local scope) is a set of statements enclosed within left and right braces ({ and } respectively). Blocks may be nested (a block may contain other blocks inside it). A variable declared in a block is accessible in the block and all inner blocks of that block, but not accessible outside the block. Function parameter identifiers have block scope, as if they had been declared inside the block forming the body of the procedure.



In both cases (global or local), it is illegal to declare twice the same identifiers in the same current block (note that this means it is illegal to declare a variable with the same name as a procedure at the global level).

Special care must be taken in any struct definition since it is not allowed to declare twice the same field. For instance the following is invalid:



struct foo_t {

int bar;

int bar;

}



Shadowing

Shadowing occurs when an identifier declared within a given scope has the same name as an identifier declared in an outer scope. The outer identifier is said to be shadowed and any use of the identifier will refer to the one from the inner scope.

Function declarations and definitions

A function name can only be used once for a function definition, and once for a function declaration (if any). Declaring two function definition, or two function declaration, with the same name, is not legal. Every function declaration must have a corresponding function definition (i.e. it is illegal to only have a function declaration).

Before a function is allowed to be call, a function declaration or definition must have been defined first.

If both a function definition and function declaration with the same name exists, there must have identical types (return type and arguments' type).

Built-in functions

As you may have noticed in the previous part, our language supports a set of built-in functions which are defined as parts of our standard library:



void print_s(char* s);

void print_i(int i);

void print_c(char c);

char read_c();

int read_i();

void* mcmalloc(int size);



In order to recognise any call to these functions as valid, we suggest that you simply add dummy declaration for each of these (with an empty body) to the list of defined functions into the Program AST node. Please note that it is important to do this just before name analysis but after having printed the AST so that our automatic tests do not fail (we are not expecting to see these built-ins function in the AST when checking for the AST correctness).

Actual Task

Your task is to implement a pass that traverses the AST and identifies when the above rules are violated. In addition, you should add (and fill in), for the FunCallExpr and VarExpr AST nodes, a field referencing the declaration or definition (either a FunDefinition or VarDecl). This field should be updated to point to the actual declaration/definition of the identifier when traversing the AST with the name analysis pass. This establishes the link between the use of a variable or function and its declaration/definition as seen during the lectures.




6. Type analysis

The goal of type analysis is to verify that the input program is well-typed and assign a type for each expression encountered. As seen during the course, the typing rule of our miniC language are defined using a formal notation. You can find all the typing rules here. As usual, if you notice an error or if something is not clear, please post your question on the online forum.

Your task consists of extending the sem.TypeAnalyzer class and implement the type checking mechanism following the typing rules.

Arrays

The elements of an array can be of any type, except void . For instance, the following two declarations are illegal:


void array[10];

void array[2][3];



When checking for type equivalence for arrays, it is important to ensure that their lengths match. However, an access to an array that is out-of-bound, does not constitute a typing error. The following code is perfectly valid as far as type analysis is concerned (although it is likely to crash at runtime or produce an unexpected result):


int foo(int a[3]) {

return a[99]; // this is fine

}

int bar() {

int b[3];

return foo(b);

}



While the following example is invalid:



int foo(int a[3]) {

return a[1];

}

int bar() {

int b[6];

return foo(b); // foo expects an array of size 3, not 6

}



Structures

Structure declaration are represented in the AST as StructTypeDecl. The type analysis pass must ensure that each structure declaration has a unique name. You can enforce this by creating a simple pass which checks for this before running the type checker for instance.

Similarly to the function call and variable use, your type analyser needs to check that if any variable is declared with a struct type, the struct type exists. For instance if you encounter a variable declaration such as struct node_t var; , you must ensure that the corresponding node_t structure has been declared earlier.

Note that structure can only be defined recursively when using a pointer. So the following example is valid:



struct x_s {

int a;

struct x_s * s; // recursive reference via a pointer is fine

}



while the following example is invalid:



struct x_s {

int a;

struct x_s s; // recursive reference without a pointer is invalid

}



For the last example, the issue is that it is impossible to tell how large this data structure is due to the recursive definition. Hence, this should be rejected by the compiler during semantic analysis.

Since the fields of a structure are variable declaration, they cannot be of type void as seen in the typing rules.

Finally, when accessing a structure, you must also check that the field exist in the structure type declaration.

String literal

String literals are represented in our language as null terminated char arrays. The string literal "Hello" should therefore be of type char[6] holding characters 'H' , 'e' , 'l' , 'l' , 'o' and '\0' where \0 represents the null character.

Strong typing

Our language is strongly typed. This means that there are no implicit casts between expressions and the cast must be explicit. For instance the following code is invalid in our language:



int i;

char c;

i=c;


To make this valid, an explicit cast operation must be performed. The following is valid:




int i;

char c;

i=(int)c;





热门主题

课程名

mktg2509 csci 2600 38170 lng302 csse3010 phas3226 77938 arch1162 engn4536/engn6536 acx5903 comp151101 phl245 cse12 comp9312 stat3016/6016 phas0038 comp2140 6qqmb312 xjco3011 rest0005 ematm0051 5qqmn219 lubs5062m eee8155 cege0100 eap033 artd1109 mat246 etc3430 ecmm462 mis102 inft6800 ddes9903 comp6521 comp9517 comp3331/9331 comp4337 comp6008 comp9414 bu.231.790.81 man00150m csb352h math1041 eengm4100 isys1002 08 6057cem mktg3504 mthm036 mtrx1701 mth3241 eeee3086 cmp-7038b cmp-7000a ints4010 econ2151 infs5710 fins5516 fin3309 fins5510 gsoe9340 math2007 math2036 soee5010 mark3088 infs3605 elec9714 comp2271 ma214 comp2211 infs3604 600426 sit254 acct3091 bbt405 msin0116 com107/com113 mark5826 sit120 comp9021 eco2101 eeen40700 cs253 ece3114 ecmm447 chns3000 math377 itd102 comp9444 comp(2041|9044) econ0060 econ7230 mgt001371 ecs-323 cs6250 mgdi60012 mdia2012 comm221001 comm5000 ma1008 engl642 econ241 com333 math367 mis201 nbs-7041x meek16104 econ2003 comm1190 mbas902 comp-1027 dpst1091 comp7315 eppd1033 m06 ee3025 msci231 bb113/bbs1063 fc709 comp3425 comp9417 econ42915 cb9101 math1102e chme0017 fc307 mkt60104 5522usst litr1-uc6201.200 ee1102 cosc2803 math39512 omp9727 int2067/int5051 bsb151 mgt253 fc021 babs2202 mis2002s phya21 18-213 cege0012 mdia1002 math38032 mech5125 07 cisc102 mgx3110 cs240 11175 fin3020s eco3420 ictten622 comp9727 cpt111 de114102d mgm320h5s bafi1019 math21112 efim20036 mn-3503 fins5568 110.807 bcpm000028 info6030 bma0092 bcpm0054 math20212 ce335 cs365 cenv6141 ftec5580 math2010 ec3450 comm1170 ecmt1010 csci-ua.0480-003 econ12-200 ib3960 ectb60h3f cs247—assignment tk3163 ics3u ib3j80 comp20008 comp9334 eppd1063 acct2343 cct109 isys1055/3412 math350-real math2014 eec180 stat141b econ2101 msinm014/msing014/msing014b fit2004 comp643 bu1002 cm2030
联系我们
EMail: 99515681@qq.com
QQ: 99515681
留学生作业帮-留学生的知心伴侣!
工作时间:08:00-21:00
python代写
微信客服:codinghelp
站长地图