BASIC, which stands for "Beginner's All Purpose Symbolic Instruction Code," was one of the earliest, simplest, and easy to use of the high-level programming languages. It's also fun to play with, so we've been contemplating creating a small BASIC interpreter that would work with the DIY Calculator (the reason we use the term "interpreter" will be made clear as we progress).

View Topics



 
Introduction
Developed by John Kemeney and Thomas Kurtz at Dartmouth College in the mid-1960s, BASIC was designed to be easy to learn and use, and was intended to provide a stepping stone to more complex programming languages.

Before we plunge headfirst into the fray, it may ease our discussions to consider a typical usage model for one of the early BASIC incarnations. First of all there would be some sort of command and control window – a "system console" – as shown below (in the case of early computers this would occupy the entire screen):

This is obviously just the top part of the console screen. The white rectangle in the upper-left-hand corner represents a flashing cursor. (In today's computers the cursor is a thin vertical flashing line that is presented between two characters. By comparison, in the old days the cursor occupied an entire character position. If there was a character "under" the cursor, then when the cursor flashed white the character would be black, and vice versa.)

Now let's assume that we have created our BASIC interpreter program and that it's already running on the computer (the DIY Calculator in our case). Part of the interpreter will be an editor function that allows us to enter, view, and modify our programs. Let's also suppose that we use the computer's keyboard to enter two lines of source code text as shown below (in order to complete a line we press the <Enter> key):

This is a very simple program. The first line defines a variable called "A" and assigns it a value of 6 (my lucky number). The second line instructs the interpreter to print (output) this variable, where the default output device would be our console window.

Observe that each line commences with a number, which the user has to enter along with the rest of the statement. When we eventually come to run the program, the interpreter will execute the lines in the order defined by their line numbers. The reason we typically increment the line numbers in steps of 10 is to facilitate making changes later. For example, suppose that we actually wish to increment the value in "A" before printing it out. One way we could do this would be to enter a new line as follows (note that we don't need to use a LET command in this new line, because we already declared the variable "A" in line 10 of the program):

Although the way the program currently appears on the screen may look a little strange, this is due only to the order in which we entered it, but the interpreter is keeping track of what's going on. For example, if we were to run the program right now (which we could do by entering the RUN command as discussed a little later), the interpreter would execute the lines in the correct order: 10 followed by 15 followed by 20. If we wish to see the program presented in the correct order, we can use the LIST command as follows:

By default, the LIST command will list out the entire program, but we could specify a subset of the program if we wished (for example, "LIST 10-15" would list all if the lines between 10 and 15 inclusive). Observe that the LIST command doesn't have a line number associated with it, and that this command is executed immediately. Some commands – like LIST – can only be executed in this way, while others can be executed both inside and outside the program. For example, if a command like CLRSCREEN ("clear the console window and place the cursor in the top-left corner) is entered without a line number, then it will be executed immediately. Alternatively, if this command is prefixed by a line number, then it will only be executed when we run the program and reach this line.

OK, now let's actually run our example program. We do this by entering the RUN command as shown below:

Remember that our program first defines a variable called "A" and assigns it a value of 6. Next, we increment the value in "A", and then we print the resulting value – which is 7 – out to the screen. Hey, we didn't say that this was going to be a particularly interesting program!




 
Why Do We Say "Interpreter"?
Let's suppose that we've entered a one line program that simply prints the word "HELLO" to the console window as shown below:

Now let's consider what's going on inside the computer's memory (this would be the DIY Calculator's memory in our case). One technique would be for the source code to be stored "as-is" in the memory as illustrated below:

As we see, the BASIC interpreter is resident in memory; this program is running all of the time. As we previously noted, part of this program acts as an editor that allows us to enter, view, and modify our programs. This editor stores the source code in another area of the computer's memory. When we use a RUN command, the interpreter starts reading the source code and executing it; in fact, we say that it's "interpreting" the program.

In reality, every technique for taking a high-level language and converting it into a form that can be executed on a computer is a matter of translation. If the source code is in assembly language and the target (destination) is machine code, then we call the translator an assembler. Alternatively, if the source code is in a high-level language like C and the target is assembly language or machine code, then we call the translator a compiler. Last but not least, in the case of tool that accepts source code as input, interprets that source code "on-the-fly," and executes it immediately, we call this form of translator an interpreter.

In the case of a pure interpreter, this will analyze the source code (as entered by the user) every time that source code is executed. Thus, if we have a loop containing 20 lines of code and this loop is executed 100 times, then each line will be re-interpreted every time we go around the loop. In reality, such a beast is rare (thank goodness). The way most BASIC interpreters work is that when you enter a new line of source code, this code is interpreted into – and stored in – an intermediate (internal) form. This internal form, which may be referred to as pseudo-code (or P-code), is used to compress the source code into fewer memory locations and to speed the interpretation process. When you subsequently run the program, the interpreter actually interprets and executes this internal form. Similarly, when you use a LIST command to display your program (or part thereof), the interpreter translates it back into the full-blown representation you expect to see. But all of this is a story for another day (we will regale you with these nuggets of knowledge when we eventually get around to creating and documenting our own BASIC interpreter).




 
Defining Our Language
Before we [or you] would set about creating a BASIC interpreter, we would first need to define the syntax of our BASIC language and the constructs and functions we intend to support. The following points are intended only to provide "food for thought," and we would need to ponder this in depth before we actually started implementing anything.

First, we're going to require some high-level control commands as follows (there are many more commands we could think of; these are just a few to start the ball rolling):

NEW   Clear the source code memory area and prepare to start a new program
  
LIST   Display the source code program. By default, this would list the program from the beginning to the end, but it would do so in screen-size "chunks" and – after displaying each block – it would present you with the ability to continue to the next block or terminate the listing. This command should also support options like the ability to say "LIST 100-200" (display the program from line 100 to line 200, inclusive); "LIST -200" (display the program from the beginning to line 200); and "LIST 100-" (display the program from line 100 to the end);
  
RUN   Run the program. By default this will start at the first line in the program, but it should also be possible to append a line number to the command such as "RUN 100", which would instruct the interpreter to start running the program from line 100

Also, at a minimum we're going to need the ability to declare integer variables (we would also have to define the size/range of these integers; for example, if each variable occupied only two bytes, we could represent numbers in the range -32,768 to +32,767). The simplest possible implementation would be to support 26 such variables named "A" to "Z". We would declare such a variable and perform an initial assignment using a LET command; for example, "10 LET A = 6" (note that the "10" in this example is the line number). A more advanced option would be to support user-defined variable names along the lines of "15 LET FRED = 6".

Once we've declared some integer variables, we need the ability to perform operations on them. For example, suppose we've already declared the variables "A", "B", and "C" and assigned values to them, we might then say something like: "15 LET D = (A + B) * C". The minimum set of math operators would be +, -, *, and / (add, subtract, multiply, and divide). We would also need logical operations such as AND, NAND, OR, NOR, XOR, XNOR, and NOT.

As opposed to supporting only scalar integer values as discussed above, we might decide to support arrays; for example, "20 DIM A(10)" (where "DIM" stands for "Dimension" and is used to declare an array). The elements forming an array in BASIC are typically numbered from "1", so our DIM statement would declare an array of values called "A" and referenced from 1 to 10. Thus, in order to load the fourth element in this array with a value of "36", we would say "30 A(4) = 36". (One reason for numbering an array from 1 in BASIC – as opposed to numbering from 0 – is that it's easier for beginners to understand. Also, having made this decision, we are now free to use element 0 of the array – which the user never sees and has no access to – to store the size of the array.)

With regard to the previous point, are we going to support multi-dimensional arrays? For example, "40 DIM(10,20)", which we can visualize as being a two-dimensional array of 10 columns and 20 rows, or vice versa. Thus, in order to load the element in the 6th column and 4th row of this array with a value of 42, we would say "50 A(6,4) = 42". What about three-dimensional arrays, and so forth?

In addition to integers, are we going to support real numbers such as 3.142? If so, how are we going to do this?

It would be really useful to be able to declare and use string variables. The simplest possible implementation would be to support 26 such variables named "A$" to "Z$". We would declare such a variable and perform an initial assignment using a LET command; for example, '60 LET A$ = "Hi there!"'

We're also going to need some way to output results. The most common way to do this is to provide a PRINT command. For example, '70 PRINT "Hello"' would display "Hello" on the system console, while "80 PRINT A" would display the value associated with the variable "A".

Another pair of very useful commands are PEEK and POKE, which allow is to read and write from selected memory locations, respectively. For example, suppose we wished to read a value from the input port connected to the DIY Calculator's keypad (this port is located at memory address $F011) and load it into a previously declared integer variable called "A". In this case we could use something like "90 A = PEEK ($F011)". (Remember that these are just ideas, we're not attempting to define an official syntax in these discussions).

It would be very boring if our programs simply started at the lowest line number and always worked their way to the highest line number; thus, we need some way to change the order in which the program executes. One example is a GOTO ("go to line number") instruction; for example "100 GOTO 70". In this case, when the interpreter reaches line 100, it will immediately jump back to line 70. If we wanted to get really adventurous, we might also provide a GOSUB ("go to subroutine") command.

We also need some form of conditional test and control instruction(s). The simplest example would be an "IF ... THEN"; for example, "95 IF A = B THEN GOTO 120" (if the value in "A" equals the value in "B", the interpreter will jump to line 120 in the program). We could further extend this to support "IF ... THEN ... ELSE". There are also a variety of other program flow control statements we might decide to implement, such as "FOR ... NEXT", "WHILE ... DO", and "REPEAT ... UNTIL" (we will go into this further when we come to formally define our language at some stage in the future).

With regard to the previous point, in which we specified the condition "IF A = B", we actually need to support a selection of such conditions; for example, "A = B" (A equals B), "A != B" (A is not equal to B), "A < B" (A is less than B), "A > B" (A is greater than B), "A <= B" (A is less than or equal to B), and "A >= B" (A is greater than or equal to B).

And the list goes on ... Once again, remember that the above is intended only to provide some points to ponder. We'll go into this in more detail and firm these ideas up at some stage in the future when we eventually sit down, roll up our sleeves, put our thinking caps on, and start to create a real BASIC interpreter.

Note: When Bill Gates and Paul Allen created their first version of Microsoft BASIC to run on the Altair personal computer in 1975, the entire thing occupied only 4K bytes of memory (this included the interpreter itself and the area used to store the source code programs). See also "The First Personal Computers" topic in the "History of Calculators and Computers" section of our More Cool Stuff page.

Note: See also the discussions on creating a Backus-Naur Form (BNF) description of the language subset you intend to support in the General Notes section below.

Note: There is an ANSI (American National Standards Institute) standard for BASIC (check out their website at www.ANSI.org), but everyone who creates a BASIC interpreter – and a lot of folks do – almost invariably "tweaks" the syntax and adds a plethora of their own commands.




 
Creating the Interpreter
This is just something to think about ... Once we [or you] have defined the BASIC syntax and language subset we intend to support, how would we [or you] set about actually creating the interpreter?

One technique would be to write the interpreter in the DIY Calculator's assembly language, and to then use our existing assembler to generate the machine code version of the interpreter. However, if we already have access to a C compiler as discussed elsewhere on our More Tools page, then we could actually create a C program to implement our BASIC interpreter (the mind boggles).




 
Relationship to the DIY Calculator
One point you may have wondered about is: "How does a BASIC interpreter tie into the DIY Calculator?" Well, in our book How Computers Do Math, we describe how to create a program in our assembly language to make the calculator ... calculate.

Now, let's suppose that we have a BASIC interpreter. In this case, we could create a program in BASIC and use this program to drive the calculator; we could use PEEK commands to read values from the input port connected to the calculator's keypad; POKE commands to write values to the output port driving the calculator's main display; and other BASIC instructions to implement the functionality of the calculator.




 
Console and Keyboard
In order to actually make use of a BASIC interpreter, we need a system console as reflected in the illustrations presented earlier in this topic. As we pen these words, we are adding such a console to the DIY Calculator's treasure chest of tools and devices.

The nitty-gritty details as to how this console device works will be fully discussed in the supporting documentation associated with this release of the DIY Calculator (see the Download page for more details when we make this release available). Suffice it to say for the moment that the DIY Calculator's CPU will be able to write to this display by means of one of its output ports.

Furthermore, we are going to require some way in which to enter our BASIC source code programs. Thus, we are going to provide a virtual QWERTY keyboard device that will be connected to one of the DIY Calculator's input ports. Watch this space ...




 
General Notes (Sharing Your Work) and Questions
1)   If you do decide to create a BASIC Interpreter as described here, we're sure that other users would be very interested in seeing it and using it. We would be very happy to make such a tool available via the DIY Calculator website (giving full credit to you, of course).
  
2)   Note that the ideas presented here are just a few thoughts that have popped into our minds during the course of writing How Computers Do Math. If you think of any other considerations we should note regarding our proposed BASIC Interpreter, email us as described on the About/Contact Us page on the main DIY Calculator website and we'll add them to the list so that they are there should we (or someone else) decide to actually create such an interpreter.
  
3)   Introduced by John Backus and Peter Naur in the late 1950s and early 1960s, Backus-Naur Form (BNF) is a technique for recursively defining the grammar – the words, symbols, and tokens – associated with a computer language. Having such a description greatly eases the task of creating a parser for a language. For example, a full BNF description of our existing assembler language is provided in Appendix E of The Official DIY Calculator Data Book, which is itself provided on the CD-ROM accompanying our book, How Computers Do Math. The point is that, if you are contemplating creating a BASIC interpreter, it would be a very good idea to kick of the process by documenting a full BNF description of the language you intend to support.

There are always a lot of points to ponder before embarking on a new software development quest. We've had a head-start, because we've been pondering furiously for a long time. Thus, if you are interested in creating a BASIC interpreter and want to bounce some ideas around, please feel free to drop us a line as described on the About/Contact Us page on the main DIY Calculator website.




 
Feedback From Readers

Jon, Cleveland, Ohio, USA: Here's my 2 cents about developing a BASIC interpreter based on my experience...

I have written several rudimentary interpreters for simple scripting languages used in telephone switches, but a general purpose language interpreter was a bit more difficult. I had a few false starts with my BASIC interpreter because I designed it around a language lexicon, which was based on my previous experience with the simpler script interpreters. I found that I needed to design the whole interpreter around a complex expression parser instead.

As you know, complex expressions involve variables, literal values, and function return values, along with comparison and arithmetic operators, which may be nested to any depth within parenthesis. These complex expressions turn up everywhere in programs and on the command line.

I used object oriented methods in developing my BASIC interpreter (C++ language) so I designed a complex expression parser class first. The value of each variable, function return, and literal are fetched or calculated and put on an operation stack along with their associated operators.

Values are always put on the operation stack in pairs along with the operator. Expressions involving nested parenthesis are parsed and evaluated recursively, the resulting value being put on the operation stack one level up in recursion. Long simple expressions (A + B – C +...) are also broken down into value pairs with an operator. The result of a comparison expression is always boolean. ( A simple A = B is an assignment operation. However, if A = B occurs after an IF keyword, or embeded within a complex expression, it's a comparison operation.)

So, expression evaluation usually involves two values and an operator. Complex nested expressions are evaluated recursively to facilitate the two value, one operator sceme. Of course there is the common situation where an expression is one variable, literal or function, in which case the value is simply fetched or calculated.

Type checking and some syntax checking is done in the expression parser.

Yes, a language lexicon is necessary to identify command keywords, variables, functions, and literal values. Also, a storage class is needed to store and fetch variables. However, I think these functions are ancillary to the expression parser. So, designing the interpreter around a complex expression parser made the difference between my functioning BASIC interpreter and my previous failed versions.