ICS 33 Fall 2024
Project 3: Why Not Smile?
Due date and time: Monday, November 18, 11:59pm
Introduction
When I was a young kid, one of my teachers introduced me to a computer for the first time; it was a state of the art (in those days) personal computer called a Radio Shack TRS-80 Model I. First, I played little math games and messed around with other new-fangled educational tools from 1980; the state of the art wasn't much then, but it was fun and new, and felt alive with possibility.
Booting up a TRS-80 took the user directly into the equivalent of a Python shell; you could load programs from external storage like floppy disks or cassette tapes, but the computer's default mode was an environment for writing programs. My teacher asked me if I wanted to learn how to write my own programs, which I thought sounded like a great idea, though I had no idea how to do it. So, I opened up a book of his about the TRS-80's primary programming language, BASIC, which was a good teaching and learning tool for its day: versatile and easy to start with, much like Python is today. I typed in a short program that asked a user for a number of hits and a number of at-bats and printed out a batting average (foreshadowing my later interest in baseball, though I didn't know what it meant at the time). I ran the program, tried it out, and I was mesmerized; the computer did exactly what I asked it to, exactly the way I asked it to. I was hooked. Over forty years later, I still am.
A natural progression of one's curiosity about programming revolves around the question of how to implement one's own programming language. Where do they come
from? How are they built? While we won't be able to tackle these questions in too much depth — there are at least three different courses in our undergraduate curriculum that cover aspects of this — this project will ask you to begin exploring them. For that
purpose, I've designed a considerably limited (and somewhat different) version of BASIC called Grin, which supports a small handful of statements. You'll be building a Grin interpreter, a program written in Python that takes a Grin program as its input, executes the Grin program, then shows its output. (This may sound a little mind-bending, but it's not as crazy as it sounds. The Python interpreter you've been using was most likely written in a language other than Python; the most popular one is written in a language called C.)
In the process of building your interpreter, you'll gain experience in a few areas that will stretch your abilities:
· Continuing to develop your understanding of object-oriented design, as you'll be on the lookout for concepts in the program that would best be represented as classes.
· Using inheritance, which is Python's mechanism for defining new classes in terms of existing ones, an important technique when you have many classes that share at least some of the same behavior.
· Writing unit tests incrementally that cover as much of your program as is practical, so that you can verify that parts of your program work as you expect before you build larger parts on top of them.
The Grin language
The precise requirements for your interpreter are discussed later in this write-up, but we'll first need to agree on the definition of the Grin language that your interpreter will implement. Grin is a programming language, though its design is quite different from Python's, so we'll first need to acquaint ourselves with how it works. Given a Grin program, you'll need to know, first and foremost, what its output is meant to be.
A Grin program is a sequence of statements, one per line. Here's an example of a Grin program:
LET MESSAGE "Hello Boo!"
PRINT MESSAGE
.
Each line contains exactly one statement (i.e., there can be no blank lines). Grin assigns a line number to each of the statements, where the first statement in the program is numbered 1, the second statement is numbered 2, and so on. There is no predefined limit on the number of statements in a Grin program. Execution of a Grin program always begins at line number 1. The last line contains only a dot (.) and nothing else, as a way to mark that the program has ended; it's not a statement, but any subsequent lines of text in the Grin program after that end-of-program marker are ignored.
The program above consists of two statements. The first one stores the text Hello Boo! into a variable named MESSAGE, then the second one prints the value of that same variable. The output of the program is what you'd expect, given that description.
Hello Boo!
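As a minimal sketch of the line-numbering and end-of-program rules described above, the following Python collects the statement lines of a Grin program and stops at the dot. The function name read_program_lines is hypothetical, and treating a program with no end-of-program marker as an error is an assumption made for this example, not something stated above.

```python
def read_program_lines(lines):
    """Collect statement lines up to (not including) the end-of-program marker.

    `lines` is any iterable of strings; text after the lone "." is ignored.
    """
    statements = []
    for line in lines:
        if line.strip() == '.':
            return statements
        statements.append(line.rstrip('\n'))
    # Assumption for this sketch: a program without a "." is malformed.
    raise ValueError('missing end-of-program marker')


program = read_program_lines([
    'LET MESSAGE "Hello Boo!"',
    'PRINT MESSAGE',
    '.',
    'this text is ignored',
])

# Grin line numbers start at 1, so pair each statement with its number.
numbered = list(enumerate(program, start=1))
```

Here, program holds the two statements and numbered pairs them with line numbers 1 and 2; the text after the dot never appears.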
Lexical rules
Like most programming languages (including Python), a Grin program is made up of a sequence of lexemes, which is a fancy-sounding term for a sequence of characters that combine together with a single meaning and comprise one of the indivisible "atoms" in the language, similar to the role that words play in sentences written in natural languages like English. Programming languages that are written textually generally define a set of lexical rules that specify which lexemes are valid and how to derive a meaning for each of them; Grin is no different, in that respect, so we'll need to start our journey with Grin by acquainting ourselves with those rules.
Grin programs are made up of the following kinds of lexemes.
· Integer literals, which are sequences of one or more digits (0-9), optionally
preceded by a minus sign -. Their meaning is their corresponding integer value.
· Floating-point literals, which are integer literals that are immediately followed by a dot . and, optionally, one or more digits (0-9). Their meaning is the corresponding floating-point value.
· String literals, which are sequences of zero or more characters, both preceded and followed by one double quote character ". Their meaning is the sequence of
characters contained between the double quotes, but not including the double quotes.
  ◦ There are no special escape sequences such as \n that you'd find in Python, which means that there are two kinds of characters that cannot appear in string literals: newlines and double quotes.
· Identifiers, which are used to specify the names used to describe things like
variables and labels. Identifiers begin with a letter, optionally followed by a
sequence of letters and digits. Identifiers in Grin are case-sensitive, which means that BOO, Boo, and boo are each considered to be different from the others.
· Keywords, which are sequences of letters that have a special meaning and, thus, can never be used as identifiers. The following are keywords in Grin: ADD, DIV, END, GOSUB, GOTO, IF, INNUM, INSTR, LET, MULT, PRINT, RETURN, SUB.
· Comparison operators, which can be used in some statements to compare two values. There are six comparison operators: =, <>, <, <=, >, and >=.
· Label markers, which are colon characters (i.e., :) that are used to specify the existence of a label on a line.
· End-of-program markers, which are dot characters (i.e., .) that are used to mark the end of a program.
Some examples of Grin lexemes and their meanings follow.
0 # Integer literal (zero)
13 # Integer literal (positive)
-18 # Integer literal (negative)
0.0 # Floating-point literal (zero)
11.75 # Floating-point literal (positive)
-3.0 # Floating-point literal (negative)
"" # String literal (an empty one)
"Boo!" # String literal (containing four characters)
A # Identifier
BOO # Identifier
THIS1ISTHELAST1 # Identifier
IF # Keyword
GOTO # Keyword
= # Comparison operator
>= # Comparison operator
: # Label marker
. # End-of-program marker
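One way to picture how these lexical rules fit together is a small regular-expression tokenizer. What follows is only a sketch under stated assumptions: the names KEYWORDS, TOKEN_PATTERN, and tokenize are hypothetical, as are the token kind names, and it makes no attempt to reject every malformed line (for example, it would happily split 3ABC into an integer followed by an identifier).

```python
import re

KEYWORDS = {'ADD', 'DIV', 'END', 'GOSUB', 'GOTO', 'IF', 'INNUM', 'INSTR',
            'LET', 'MULT', 'PRINT', 'RETURN', 'SUB'}

# Order matters: floats are tried before integers, and two-character
# comparison operators before one-character ones.
TOKEN_PATTERN = re.compile(r'''
    (?P<FLOAT>-?[0-9]+\.[0-9]*)
  | (?P<INTEGER>-?[0-9]+)
  | (?P<STRING>"[^"\n]*")
  | (?P<WORD>[A-Za-z][A-Za-z0-9]*)
  | (?P<COMPARISON><>|<=|>=|=|<|>)
  | (?P<COLON>:)
  | (?P<DOT>\.)
  | (?P<SPACE>[ \t]+)
''', re.VERBOSE)


def tokenize(line):
    """Split one line of Grin text into (kind, value) pairs."""
    tokens = []
    position = 0
    while position < len(line):
        match = TOKEN_PATTERN.match(line, position)
        if match is None:
            raise ValueError(f'invalid lexeme at column {position + 1}')
        kind, text = match.lastgroup, match.group()
        position = match.end()
        if kind == 'SPACE':
            continue                      # spacing only separates lexemes
        if kind == 'WORD':
            kind = 'KEYWORD' if text in KEYWORDS else 'IDENTIFIER'
            value = text
        elif kind == 'INTEGER':
            value = int(text)
        elif kind == 'FLOAT':
            value = float(text)
        elif kind == 'STRING':
            value = text[1:-1]            # drop the surrounding quotes
        else:
            value = text
        tokens.append((kind, value))
    return tokens
```

For instance, tokenize('FINAL: PRINT "Boo!"') yields an identifier, a label marker, a keyword, and a string literal whose value is the four characters Boo! without the quotes.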
Labels
Any statement in a Grin program can begin with a label, which is a name that can be
used to refer to that statement elsewhere in the program without having to rely on
knowing its line number. Labels appear at the beginning of a line, and are made up of an identifier followed by a colon.
LET A 3
PRINT A
GOSUB "CHUNK"
PRINT A
PRINT B
GOTO "FINAL"
CHUNK: LET A 4
LET B 6
RETURN
FINAL: PRINT A
.
In the program above, two statements have labels on them: LET A 4 is labeled as CHUNK and the last statement is labeled as FINAL.
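Since labels are just names for line numbers, one plausible approach is to scan the program once and record where each label appears. The sketch below assumes the statement lines are already available as a list of strings; build_label_table and LABEL_PATTERN are hypothetical names, and the sketch does not check for duplicate labels.

```python
import re

# An identifier at the start of the line, followed by a colon.
LABEL_PATTERN = re.compile(r'\s*([A-Za-z][A-Za-z0-9]*)\s*:')


def build_label_table(statement_lines):
    """Map each label to the (1-based) line number it is attached to."""
    labels = {}
    for line_number, line in enumerate(statement_lines, start=1):
        match = LABEL_PATTERN.match(line)
        if match:
            labels[match.group(1)] = line_number
    return labels


labels = build_label_table([
    'LET A 3',
    'PRINT A',
    'GOSUB "CHUNK"',
    'PRINT A',
    'PRINT B',
    'GOTO "FINAL"',
    'CHUNK: LET A 4',
    'LET B 6',
    'RETURN',
    'FINAL: PRINT A',
])
```

For the program above, the table maps CHUNK to line 7 and FINAL to line 10.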
Spacing
One of the features of Python's syntax is that the way you space your program (indentation, empty lines, and so on) has an effect on your program's meaning. Grin, in that sense, is different. Grin programs cannot have blank lines in them, each statement must be on its own line, and at least one space is required to separate lexemes that would otherwise be combined, but the specific amount and placement of blank space between the lexemes on each line is otherwise irrelevant. So, the following program is legal Grin and equivalent to the previous one shown, though obviously there's a lot to be said for using spacing to make a program's meaning more obvious to a human reader.
            LET    A   3
PRINT A
      GOSUB    "CHUNK"
  PRINT      A
         PRINT B
   GOTO       "FINAL"
      CHUNK:     LET A     4
  LET     B     6
            RETURN
FINAL: PRINT      A
.
Variables
A Grin program can utilize variables to store values that can be accessed again (or modified) later. Each variable is named by an identifier. Variables do not need to have values assigned to them before they are used, and any variable that is used before it is assigned has the integer value 0.
The primary way to change the value of a variable is with a LET statement. A LET statement changes the value of one variable, by either assigning it a literal value or the value of another variable.
· LET A 3 changes the value of the variable A to the integer 3
· LET NAME "Boo" changes the value of the variable NAME to the string "Boo"
· LET QQQ SSS changes the value of the variable QQQ to store a copy of the value stored in the variable SSS
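The two rules above (unassigned variables read as the integer 0, and LET copies either a literal or another variable's value) can be sketched with an ordinary Python dictionary. The function names lookup and execute_let and the source_is_variable flag are hypothetical choices made for this illustration.

```python
def lookup(variables, name):
    """Return a variable's value; unassigned variables read as integer 0."""
    return variables.get(name, 0)


def execute_let(variables, target, source, source_is_variable=False):
    """Assign either a literal value or a copy of another variable's value."""
    value = lookup(variables, source) if source_is_variable else source
    variables[target] = value


variables = {}
execute_let(variables, 'A', 3)
execute_let(variables, 'NAME', 'Boo')
# SSS was never assigned, so QQQ receives the default integer 0.
execute_let(variables, 'QQQ', 'SSS', source_is_variable=True)
```

Note that reading SSS does not create it; only QQQ, the target of the LET, ends up stored.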
You can print a value to the output by using a PRINT statement. A PRINT statement prints one value, either a literal or the value of a variable, followed by a newline.
So, consider the following short Grin program:
LET NAME "Boo"
LET AGE 13.015625
PRINT NAME
PRINT AGE
.
Its output would be:
Boo
13.015625
The formatting rules used when printing the values of variables depend on their types.
· Integers are printed as they are in Python: An optional - character (for negative
integers only), followed by a sequence of one or more digits without leading zeroes.
· Floating-point numbers are printed as they are in Python: An optional - character (for negative numbers only), followed by a sequence of one or more digits without leading zeroes, followed by a . character (a decimal point), followed by a sequence of one or more digits (so 13.015625 prints as 13.015625 and 3.0 prints as 3.0).
· Strings are printed by printing their contents (i.e., the characters within them), without double quotes around them.
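Because integers and floating-point numbers are printed "as they are in Python," a sketch of PRINT can lean on Python's own str function. The names format_value and execute_print are hypothetical, and writing to a stream passed in as a parameter is just one design choice that happens to make the behavior easy to test.

```python
import io


def format_value(value):
    """Format an int, float, or str the way Grin's PRINT is described."""
    # str() leaves strings unchanged (no quotes) and uses Python's usual
    # formatting for int and float values.
    return str(value)


def execute_print(value, out):
    """Write one formatted value followed by a newline."""
    out.write(format_value(value) + '\n')


buffer = io.StringIO()
for value in ('Boo', 13.015625, -3.0, 18):
    execute_print(value, buffer)
```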
Reading input
Grin includes two statements for reading input from the console:
· INNUM, which is used when you want to read an integer or floating-point number.
· INSTR, which is used when you want to read a string.
Either way, the syntax is mostly the same: We write INNUM or INSTR, followed by the name of the variable into which you want to read the input value. A short Grin program demonstrates the idea.
PRINT "Number:"
INNUM X
ADD X 7
PRINT X
.
This program prints output and also reads input, so let's imagine what that might look like when we execute it.
Number:
11
18
First, the PRINT statement on line 1 will have printed Number:. Next, a line of input will have been read and treated, in this case, as the integer 11, which will be stored in the
variable named X. We'd then add 7 to X, causing its value to become the integer 18. Finally, we'd print X's value, which causes 18 to be printed.
The precise rules for INNUM need to be specified, though, since not all inputs are valid.
· When the user enters a sequence of digits, optionally preceded by a minus sign -, the value is treated as an integer. Leading or trailing whitespace is permitted.
· When the user enters a sequence of digits, optionally preceded by a minus sign -, followed by a dot (i.e., .), optionally followed by more digits, the value is treated as floating-point. Leading or trailing whitespace is permitted.
· When the user enters anything not meeting these characteristics, the program terminates with an error message.
Meanwhile, the precise rules for INSTR are much simpler, because not much can go
wrong. We read a line of text, then store the contents of that line (without a trailing
newline) into the given variable. Any line of text, including empty lines or very long lines, is permitted, so there are no error conditions to consider.
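The INNUM rules translate fairly directly into two patterns, one for integers and one for floating-point values. The following is a sketch rather than a required design: parse_innum is a hypothetical name, and raising ValueError stands in for whatever mechanism your interpreter uses to terminate with an error message.

```python
import re

INTEGER_PATTERN = re.compile(r'-?[0-9]+')
FLOAT_PATTERN = re.compile(r'-?[0-9]+\.[0-9]*')


def parse_innum(text):
    """Convert one line of input to an int or float, per the INNUM rules."""
    stripped = text.strip()          # leading/trailing whitespace is allowed
    if INTEGER_PATTERN.fullmatch(stripped):
        return int(stripped)
    if FLOAT_PATTERN.fullmatch(stripped):
        return float(stripped)
    # Stand-in for "the program terminates with an error message".
    raise ValueError(f'not a valid number: {text!r}')
```

So parse_innum(' 11 ') gives the integer 11 and parse_innum('-3.') gives the floating-point value -3.0, while inputs like abc or 1e5 are rejected. INSTR needs no such function, since a line read with Python's input function already arrives without its trailing newline.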
Control flow and how to alter it
A Grin program is executed one statement at a time, beginning at line number 1.
Ordinarily, execution proceeds forward, so that line 1 will execute first, followed by line 2, followed by line 3, and so on. Execution continues until either an END statement is reached, or until execution proceeds beyond the last statement in the program.
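The ordinary flow of execution can be sketched as a loop around a current line number. To keep the example small, it assumes statements have already been parsed into tuples such as ('PRINT', 'a') and ('END',); that representation and the function name run are hypothetical, and only PRINT and END are handled here (any other keyword is simply skipped).

```python
def run(statements):
    """Run pre-parsed statements in order; return the lines printed."""
    output = []
    line_number = 1                       # execution always begins at line 1
    while line_number <= len(statements):
        keyword, *operands = statements[line_number - 1]
        if keyword == 'END':
            break                         # END stops the program immediately
        if keyword == 'PRINT':
            output.append(str(operands[0]))
        line_number += 1                  # ordinary flow: advance one line
    # Falling out of the loop means execution passed the last statement.
    return output
```

With this shape, run([('PRINT', 'a'), ('END',), ('PRINT', 'b')]) returns only ['a'], because the END on line 2 is reached before line 3.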
Like most programming languages, Grin makes it possible to write programs that
execute out of sequence, though the mechanisms are a bit more primitive than they are in a language like Python. A GOTO statement causes execution to "jump" immediately forward or backward by the given number of lines. For example, the statement GOTO 4 jumps execution to the line number that's 4 greater than the current one. Here's an example Grin program that uses GOTO:
LET A 1
GOTO 2
LET A 2
PRINT A
.
In this program, line 1 is executed first, setting the variable A's value to 1. Then the GOTO
statement will immediately jump execution of the program to line 4 — the GOTO
statement is on line 2, and two lines beyond that is line 4 — skipping the second LET. Line 4 prints the value of A, which is still 1. So, the output of the program is simply 1.
A GOTO statement may jump either forward or backward, meaning that the following program is a legal Grin program. See if you can figure out what its output would be. (Remember that the value of a variable that hasn't yet been assigned with a LET is 0.)
LET Z 5
GOTO 5
LET C 4
PRINT C
PRINT Z
END
PRINT C
PRINT Z
GOTO -6
.
Alternatively, GOTO statements can specify a string literal specifying a label instead of a line number, in which case execution jumps to the line that is marked with that label. A Grin program equivalent to the previous one, but that uses labels instead of line numbers, follows.
LET Z 5
GOTO "CZ"
CCZ: LET C 4
PRINT C
PRINT Z
END
CZ: PRINT C
PRINT Z
GOTO "CCZ"
.
GOTO statements can cause the program to terminate with an error message in a few circumstances.
· Jumping to a line number that's zero or negative.
· Jumping to a line number that's more than one beyond the last statement of the
program. (So, in a five-statement program with a . on line 6, you can jump to line 6 — which will cause the program to end — but not to line 7 or greater.)
· GOTO 0 would be guaranteed to be an infinite loop, so it is not permitted.
· Jumping to a non-existent label.
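These error conditions can be gathered into one function that turns a GOTO target into a destination line number. This is a sketch under assumptions: GrinRuntimeError and resolve_goto are hypothetical names, the target is assumed to arrive as either a Python int (a relative jump) or a str (a label), and labels is assumed to be a dictionary from label names to line numbers.

```python
class GrinRuntimeError(Exception):
    """Hypothetical error type for problems found while a program runs."""


def resolve_goto(current_line, target, labels, statement_count):
    """Return the line number a GOTO on current_line should jump to."""
    if isinstance(target, str):
        if target not in labels:
            raise GrinRuntimeError(f'no such label: {target}')
        return labels[target]
    if target == 0:
        raise GrinRuntimeError('GOTO 0 would loop forever')
    destination = current_line + target
    # Line statement_count + 1 holds the "." and is a legal destination.
    if destination < 1 or destination > statement_count + 1:
        raise GrinRuntimeError(f'no such line: {destination}')
    return destination
```

In a nine-statement program, resolve_goto(2, 5, {}, 9) returns 7 and resolve_goto(9, -6, {}, 9) returns 3, while a jump that would land on line 0 or on line 11 raises the error instead.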
Finally, it should be noted that GOTO statements can use variables to specify their target, as long as the variable contains either an integer or a string value, in which case that
value is treated the same as it would have been if it had been specified literally.
LET Z 1
LET C 11
LET F 4
LET B "ZC"
GOTO F
ZC: PRINT Z
PRINT C
END
CZ: PRINT C
PRINT Z
GOTO B
.
When the target of a GOTO is a variable containing something other than an integer or a string, that, too, terminates the interpreter with an error message.
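When the target is a variable, the extra step is to fetch its value and check its type before treating it like a literal. Here is a minimal sketch, assuming variables live in a dictionary and that the caller knows whether the operand was an identifier; goto_target_value and GrinRuntimeError are hypothetical names.

```python
class GrinRuntimeError(Exception):
    """Hypothetical error type for problems found while a program runs."""


def goto_target_value(operand, is_identifier, variables):
    """Return the int or str that a GOTO operand stands for."""
    # Unassigned variables read as the integer 0, as everywhere else in Grin.
    value = variables.get(operand, 0) if is_identifier else operand
    if type(value) not in (int, str):
        raise GrinRuntimeError(
            f'GOTO target must be an integer or a string, not {value!r}')
    return value


variables = {'Z': 1, 'C': 11, 'F': 4, 'B': 'ZC'}
jump_by = goto_target_value('F', True, variables)     # same as GOTO 4
jump_to = goto_target_value('B', True, variables)     # same as GOTO "ZC"
```

The returned value would then be handled exactly as a literal target is, so a variable holding 0 still runs into the rule against GOTO 0.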