The assembler provided with the DIY Calculator is of the “cheap and cheerful” variety, so we’ve been
pondering some of the features that one might decide to include in a new “super assembler” if one were so inclined …
Introduction
As noted above, the assembler provided with the DIY Calculator is of the “cheap and cheerful” variety. This little scamp
was designed to be small, efficient, and relatively easy to use; for example, it locates and reports only a single error at a time,
which tends to be easier for beginners.
Having said this, the original assembler’s error messages can be somewhat cryptic on occasion, so it would make our lives
easier if these could be improved. We’ve also pondered a smorgasbord of “wouldn’t it be nice if” features that one might
decide to implement in a shiny new “super assembler” with chrome plating and “go-faster stripes” painted on the side. Some
of our musings are as presented below:
Longer/Richer Label Names
Any user-defined labels in our existing assembly language are limited to only eight characters, which must be
alphanumeric or underscores. When writing a program, one almost invariably finds this somewhat limiting, so it might be nice
to support longer labels of say 10 or 12 characters. Alternatively, one could decide to support labels of arbitrary length
(the downside here is that this would mess up the format of the *.lst (list) files generated by the assembler).
As a somewhat related point, labels in our existing language are obliged to commence with an alpha character or an underscore
(but not a numeric character). It might be useful to allow labels to start with numeric characters and also support a richer set of
characters in label names such as “!”, “~”, “$”, “%”, and so forth. (Note that you have to be careful here, because this will
make it harder for the assembler to determine “what’s what” when it’s parsing your source code.)
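To make the trade-off concrete, here is a minimal sketch in Python (our choice for illustration only) of a label checker. The function name is_valid_label and its max_len and extra_chars parameters are hypothetical; the defaults mirror the existing rules described above, and the parameters let one experiment with longer labels or a richer character set.

```python
import string

def is_valid_label(name, max_len=8, extra_chars=""):
    """Check a label against the rules described in the text.

    Defaults follow the existing rules: at most eight characters,
    alphanumerics or underscores, not starting with a digit.
    `max_len` and `extra_chars` are hypothetical knobs for a new assembler.
    """
    first_ok = set(string.ascii_letters + "_")
    rest_ok = first_ok | set(string.digits) | set(extra_chars)
    if not name or len(name) > max_len:
        return False
    return name[0] in first_ok and all(c in rest_ok for c in name[1:])

print(is_valid_label("MAINDISP"))                   # True (exactly 8 characters)
print(is_valid_label("9LIVES"))                     # False (starts with a digit)
print(is_valid_label("LONG_LABEL"))                 # False under the 8-character limit
print(is_valid_label("LONG_LABEL", max_len=12))     # True with a longer limit
print(is_valid_label("COST$", extra_chars="!~$%"))  # True with a richer character set
```

Note that this sketch still rejects a leading digit; allowing one would mean the parser needs some other way to tell a label from a decimal number.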
Character/String Assignments
The existing assembler supports decimal, binary, and hexadecimal assignments to .EQU, .BYTE, .2BYTE, and .4BYTE
directives and to other instructions as shown below:
The problem is that whenever we need to use ASCII codes, we have to either translate them by hand or use the existing assembler’s
Insert > String utility. Thus, it would be really useful to add a character/string type, which would facilitate assignments
such as those shown below:
In this case, the assembler would cause the LDA (“load accumulator”) instruction shown above to load the accumulator with the
ASCII code for a question mark character (that would be $3F, just in case you were wondering).
Similarly, the assembler would cause the .BYTE statement to reserve 11 memory locations for the characters of the “Hello World” string
and to load them with the associated ASCII codes (these codes would be $48, $65, $6C, $6C, $6F, $20, $57, $6F, $72, $6C, $64). Note also
that we typically use a NUL code ($00) to terminate a string (that is, to give our programs something they can use
to determine where the string ends), which is why we appended one onto the end of our .BYTE statement; this NUL occupies a twelfth
location. (We could cause the assembler to do this automatically, but there may be cases in which we don’t wish this to occur, so it’s best to do this explicitly.)
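The translation such an assembler would have to perform is simple enough to sketch. The following Python fragment is only an illustration; the helper names ascii_codes and format_hex are our own inventions, and the optional terminate flag reflects the point that a NUL should be added only when explicitly requested.

```python
def ascii_codes(text, terminate=False):
    """Return the ASCII codes for `text`, optionally followed by a NUL ($00)."""
    codes = list(text.encode("ascii"))  # raises UnicodeEncodeError for non-ASCII input
    if terminate:
        codes.append(0x00)
    return codes

def format_hex(codes):
    """Render codes in the $XX hexadecimal notation used in the text."""
    return ", ".join(f"${code:02X}" for code in codes)

print(format_hex(ascii_codes("?")))
# $3F
print(format_hex(ascii_codes("Hello World")))
# $48, $65, $6C, $6C, $6F, $20, $57, $6F, $72, $6C, $64
print(len(ascii_codes("Hello World", terminate=True)))
# 12
```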
BCD Assignments
As was noted in the previous point, the existing assembler supports decimal, binary, and hexadecimal assignments to .EQU, .BYTE, .2BYTE, and .4BYTE directives and to other instructions.
When we modified the DIY Calculator’s CPU to support Binary Coded Decimal (BCD) additions and subtractions in the form of the DADD,
DADDC, DSUB, and DSUBC instructions, we also modified the assembler delivered with the DIY Calculator to support these
instructions, but we decided not to offer a special BCD assignment. What would such an assignment have looked like if we had
offered it? Well, one possibility would have been as shown below:
Note that we used a caret (“^”) character to indicate the BCD assignment in this example, but you may want to see if you
can come up with a more appropriate character.
So why would we want to have such an assignment? After all, both of the above would load exactly the same value into the
accumulator. In fact, there are a couple of reasons. First, it makes our intent clear, because having a special BCD assignment
makes it apparent to anyone reading the program that we are considering this value to be in BCD. And second, this would allow
the assembler to check that we aren’t mistakenly trying to use an illegal (non-BCD) value; for example:
In this case, the first load at the LOAD1 label is perfectly legal. By comparison, the second load at the LOAD2 label is obviously wrong,
because the caret (“^”) indicates that we are considering this to be a BCD value, but the “A” in the “4A” value is not a BCD digit
(BCD digits are of course in the range ‘0’ through ‘9’). Similarly, the assembler would also flag an error for the third load at the
LOAD3 label.
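The check itself might look something like the following Python sketch. We are assuming here that a BCD literal holds one or two digits packed into a single 8-bit value (so BCD 42 is stored as $42), and that the caller has already stripped off the “^” prefix; the function name parse_bcd is hypothetical.

```python
def parse_bcd(digits):
    """Validate a one- or two-digit BCD literal and return its packed byte value.

    Assumption: two BCD digits are packed into one byte, so "42" becomes 0x42.
    """
    if not 1 <= len(digits) <= 2:
        raise ValueError(f"BCD literal '{digits}' must have one or two digits")
    for digit in digits:
        if digit not in "0123456789":
            raise ValueError(f"'{digit}' in '{digits}' is not a BCD digit (0 through 9)")
    # Each decimal digit occupies one hex nibble, so base-16 parsing packs them.
    return int(digits, 16)

print(hex(parse_bcd("42")))  # 0x42
try:
    parse_bcd("4A")
except ValueError as error:
    print(error)             # 'A' in '4A' is not a BCD digit (0 through 9)
```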
Detailed Error Identification
When the current assembler detects an error, it scrolls you to that line in the source code and leaves the
cursor at the beginning of the line. It’s also prone to informing you that: “There’s something wrong with this line, but I don’t
know what it is!”
A more useful scenario would be for the assembler to (a) provide more detailed error messages and (b) highlight the word
in question. It might also be possible to embed the error message in the source code window as shown below:
In this case, the label BERT has been highlighted and the caret (“^”) character and the error message “Undefined Label” have been
added into the source code window to facilitate our locating and understanding the problem quickly and efficiently.
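As a rough idea of how the caret and message could be positioned, consider the Python sketch below. The function format_error is hypothetical, as is the source line used to exercise it (we make no claim that it matches the original illustration); the only idea taken from the text is that a caret is placed under the offending word, followed by the message.

```python
def format_error(line_no, line, token, message):
    """Return the source line followed by a caret line pointing at `token`."""
    prefix = f"{line_no:>4}: "
    # Use the first occurrence of the token; fall back to column 0 if absent.
    column = max(line.find(token), 0)
    pointer = " " * (len(prefix) + column) + "^ " + message
    return prefix + line + "\n" + pointer

# Hypothetical source line that refers to a label BERT that was never declared.
print(format_error(12, "JMP [BERT]", "BERT", "Undefined Label"))
```

A real implementation would take the column from the assembler's own tokenizer rather than searching the text, because a simple search can land on the wrong occurrence of a word.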
Multiple Error Detection
Our current assembler stops as soon as it finds an error. By comparison, professional-level assemblers tend to process
the entire program and report multiple errors.
In some cases this multi-error technique can be a pain, such as when the same undefined label is reported hundreds of times.
Most of the time, however, this type of assembler makes it possible for the experienced user to address a whole bunch of errors
quickly and efficiently.
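One way to keep the benefits of multi-error reporting while avoiding hundreds of repeats is to group the reports by label. The Python sketch below assumes that an earlier pass has already produced a list of (line number, label) references and a set of declared labels; both of these structures, and the function names, are hypothetical.

```python
def collect_undefined(references, defined):
    """Scan every reference instead of stopping at the first problem.

    references: iterable of (line_number, label) pairs
    defined:    set of labels that have been declared
    Returns a dict mapping each undefined label to the lines that use it.
    """
    report = {}
    for line_no, label in references:
        if label not in defined:
            report.setdefault(label, []).append(line_no)
    return report

def summarize(report):
    """Produce one message per undefined label rather than one per use."""
    return [
        f"Undefined label {label}: lines {', '.join(str(n) for n in lines)}"
        for label, lines in report.items()
    ]

references = [(3, "BERT"), (7, "FRED"), (9, "BERT")]
print(summarize(collect_undefined(references, {"FRED"})))
# ['Undefined label BERT: lines 3, 9']
```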
An Assemble > Subroutine Command
Let’s suppose that we create a subroutine that we wish to save in its own source file and then insert it into
one or more programs later. For example, suppose that we created the following subroutine in a file called clear-version-1.asm:
The problem arises when we wish to assemble this routine in isolation to make sure that we haven’t made any syntactical errors.
In order to assemble and debug this routine using our original assembler, we would have to declare two dummy labels for CLRCODE and
MAINDISP, and also add .ORG and .END directives as illustrated below (the additional statements are shown in bold):
Once we’ve eventually assembled and debugged this routine, we would then have to strip these additional statements back out again before
finally re-saving our clear-version-1.asm file in a form that we can insert into other programs later.
The point is that it would be useful to augment our existing assembler's File > Assemble command with a special File > Assemble Subroutine
command. This new command could instruct the assembler to automatically insert .ORG and .END statements (this would be “on-the-fly”
insertion used only for the purposes of debugging this routine; these statements wouldn’t actually be added into the source code).
Similarly, when using this File > Assemble Subroutine command, the assembler wouldn’t stop when it came to an undefined label
(it could warn us about any such labels, but it would automatically substitute dummy values as required).
As one final point, although we’ve discussed having a file containing only one subroutine, it is more than likely that we would occasionally
wish to create a file containing two or more routines, and our proposed File > Assemble Subroutine command would have to take
this into account.
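The “on-the-fly” insertion could work on a copy of the source, along the lines of the Python sketch below. Everything specific in it is an assumption made for illustration: the function name prepare_subroutine, the “LABEL: .EQU value” form used for the dummy declarations, the $4000 origin, the $0000 dummy value, and the three-line stand-in routine.

```python
def prepare_subroutine(lines, undefined_labels, origin="$4000", dummy="$0000"):
    """Return (wrapped_lines, warnings) for a trial assembly.

    The caller's `lines` list is left untouched; the dummy declarations and
    the .ORG/.END directives exist only in the returned copy.
    """
    stubs = [f"{label}: .EQU {dummy}" for label in undefined_labels]
    warnings = [
        f"Warning: undefined label {label}; substituting dummy value {dummy}"
        for label in undefined_labels
    ]
    wrapped = stubs + [f".ORG {origin}"] + list(lines) + [".END"]
    return wrapped, warnings

# Hypothetical stand-in for a routine that uses two externally declared labels.
routine = ["CLEAR: LDA CLRCODE", "STA [MAINDISP]", "RTS"]
wrapped, warnings = prepare_subroutine(routine, ["CLRCODE", "MAINDISP"])
print("\n".join(wrapped))
print("\n".join(warnings))
```

Because the routine is never modified, the same approach extends to a file holding two or more routines: the whole file is wrapped once.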
A #include Command
With regard to the previous point, let's assume that we have created a suite of subroutines that are stored
in individual assembly source code files (some files may contain two or more routines). For example, let's suppose that we have created
the following:
File “add16.asm” contains a subroutine called “_ADD”
File “sub16.asm” contains a subroutine called “_SUB”
File “mult16.asm” contains a subroutine called “_MULT”
File “div16.asm” contains a subroutine called “_DIV”
Now, let's assume that we are working on a new program into which we wish to insert the above routines. In the case of our existing assembler,
we would have to place our cursor at the target location in the new program and then use the assembler’s Insert > File command
four times (once for each file).
One big consideration (problem) with this approach is that once such a subroutine has been inserted in the main program it’s there to
stay unless you delete it by hand. This can be unwieldy if you wish to experiment with different versions of the subroutine sharing
the same name but stored in different files. Similarly, this technique can be something of a pain if – when you eventually run
your program – you detect a problem in the way the subroutine works in the context of the rest of the program. In this case, you will have to hand-delete the routine from the main program, correct the problem in the individual subroutine source file, and then re-insert the routine into the main program.
Thus, there would be a lot of advantages if the assembler supported a #include statement. For example, consider the "Subroutines"
section in the source code of the main program, which could now look like the following:
Similarly, you might decide to create a file of constant declarations that you commonly use in your programs, and then use a #include
statement to insert these at the beginning of each program.
There are a couple of points to consider here. For example, the assembler currently treats anything to the right of a “#” character
as being a comment, so it would now have to be modified to detect the “#include” string starting at the beginning of a line and
respond appropriately.
Furthermore, the assembler would have to be modified with regard to the way in which it generates error messages and the usage
model for displaying and correcting these errors. For example, instead of simply reporting a line number like “Error in line 125”, the assembler should
prefix this with something to indicate in which file the error took place; perhaps something like: “Error in line MAIN 125”
(if the problem was detected in the main program) or “Error in line _ADD 125” (if the problem was detected in subroutine “_ADD”).
Also, if an error is detected in a subroutine that is stored in an external file, then it would probably be a
good idea for the assembler to open a new window containing just that subroutine (file), so that the erroneous statement can be
located, highlighted, and corrected in that window.
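The bookkeeping needed for file-aware error messages can be sketched as follows. This Python fragment is an illustration under several assumptions: the file name is written in double quotes after #include, the files are held in an in-memory dictionary rather than read from disk, the assembly lines are treated as opaque text, and errors are tagged with the file name rather than the subroutine name. The function names are hypothetical.

```python
import re

INCLUDE = re.compile(r'#include\s+"([^"]+)"\s*$')

def expand_includes(name, files, _stack=()):
    """Return (file_name, line_number, text) triples with #include lines expanded."""
    if name in _stack:
        raise ValueError(f"circular #include of {name}")
    expanded = []
    for line_no, text in enumerate(files[name].splitlines(), start=1):
        match = INCLUDE.match(text)  # only matches at the beginning of the line
        if match:
            expanded.extend(expand_includes(match.group(1), files, _stack + (name,)))
        else:
            expanded.append((name, line_no, text))  # other "#" lines stay as comments
    return expanded

def error_message(entry, what):
    """Prefix the line number with the file in which the line originated."""
    file_name, line_no, _ = entry
    return f"Error in line {file_name} {line_no}: {what}"

files = {
    "main.asm": 'LDA 0\n#include "add16.asm"\n# just a comment\n.END',
    "add16.asm": "_ADD: ADD 1\nRTS",
}
expanded = expand_includes("main.asm", files)
print(error_message(expanded[2], "Undefined Label"))
# Error in line add16.asm 2: Undefined Label
```

Because every expanded line remembers where it came from, the assembler also knows which file to open when an error needs to be displayed.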
Case-Sensitivity
Our existing assembly language is case-insensitive. This means that you can write our keywords in lowercase
(e.g. “lda”) or uppercase (e.g. “LDA”), and that user-defined labels such as “fred”, “Fred”, and “FRED” are all treated as being identical.
However, some folks prefer to work with a language that is case-sensitive. For example, we might decide that all keywords
such as instructions (LDA, STA, AND, OR, ADD, SUB, etc.) must be entered in uppercase. Meanwhile, in the case of
user-defined labels, we might decide to make the assembler treat “fred”, “Fred”, and “FRED” as being three distinct entities.
(Note that we’re not saying that this would be a particularly good or useful idea, you understand; it’s just something to think about.)
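In implementation terms, the difference between the two behaviors can come down to how label names are stored in the symbol table. The following Python sketch, with a hypothetical SymbolTable class, folds names to uppercase in case-insensitive mode and stores them unchanged in case-sensitive mode.

```python
class SymbolTable:
    """Label store whose case handling is chosen when it is created."""

    def __init__(self, case_sensitive=False):
        self.case_sensitive = case_sensitive
        self._values = {}

    def _key(self, name):
        # Case-insensitive mode folds every spelling to one uppercase key.
        return name if self.case_sensitive else name.upper()

    def define(self, name, value):
        key = self._key(name)
        if key in self._values:
            raise KeyError(f"label {name} is already defined")
        self._values[key] = value

    def lookup(self, name):
        return self._values[self._key(name)]

existing = SymbolTable()                 # behaves like the existing language
existing.define("fred", 0x10)
print(existing.lookup("FRED"))           # 16

strict = SymbolTable(case_sensitive=True)
for value, name in enumerate(["fred", "Fred", "FRED"]):
    strict.define(name, value)           # three distinct labels
print(strict.lookup("Fred"))             # 1
```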
General Notes (Sharing Your Work)
1) If you do decide to create a super assembler as described here, we’re sure that other users would be very
interested in seeing it and using it. We would be very happy to make such a tool available via the DIY Calculator website (giving
full credit to you, of course).
2) Note that the ideas presented here are just a few thoughts that have popped into our minds during the course of writing How
Computers Do Math. If you think of any other features you’d like to see in an assembler, email us as described on the
About/Contact Us page on the main DIY Calculator website and we’ll add them to the list so that they are there
should we – or someone else – decide to actually create a “super assembler”.
3) Introduced by John Backus and Peter Naur in the late 1950s and early 1960s, Backus-Naur Form (BNF) is a technique
for recursively defining the grammar – the words, symbols, and tokens – associated with a computer language. Having such
a description greatly eases the task of creating a parser for a language. Thus, you may be interested to learn that a full
BNF description of our existing assembler language is provided in Appendix E of The Official DIY Calculator
Data Book, which is itself provided on the CD-ROM accompanying our book, How Computers Do Math.
Any Questions?
There are always a lot of points to ponder before embarking on a new software development quest. We’ve
had a head-start, because we’ve been pondering furiously for a long time. Thus, if you are interested in creating a
super assembler and want to bounce some ideas around, please feel free to drop us a line as described on the
About/Contact Us page on the main DIY Calculator website.