There is a rule-of-thumb known as the "80:20 Rule" that says that a program will
spend 80% of its time executing only 20% of the code. Thus, in this topic we discuss the idea of creating a Code Profiler utility
that will analyze a program run and report where the little scamp is spending the majority of its time. Armed with this knowledge,
we can then examine these areas of the code and – by means of a little judicious "tweaking" – significantly improve the
program's efficiency and dramatically speed things up.
View Topics
Introduction
One you’ve created a program, the first thing you are going to do is run it to make sure that the little scamp
works as planned. But let’s suppose that your program ends up crawling along … so … slowly … that … you … start … to … yawn … and … fall … asleep …
Sorry, where were we? Oh yes, your program is not running as fast as one might hope. So how are we going to address this problem?
The solution is to create some sort of Code Profiler utility that can examine the results from a program run and inform us where the
program is spending the bulk of its time. We can then put these portions of the program under the microscope to see if we can
improve their efficiency in any way.
Gathering Code Coverage Data
When we run our assembler, it uses a
*.asm (source code) file as input and generates three files as
output: a
*.lst (list) file, a
*.ram (machine code) file, and a
*.rad (raw assembly dump) file. (The formats for
each of these files are fully documented under the
File Formats
topic on the
More Tools page.) For example, consider what happens when we assemble a program called
test.asm:
As we know, we can load *.ram machine code files into the DIY Calculator’s memory by means of the Memory > Load RAM
command, at which point we can run these programs. Thus far, we’ve used the assembler’s list output only to debug our programs by
stepping through the program and visually examining these files, and we haven’t used the Raw Assembly Dump (*.rad) files at all,
but that’s about to change.
In fact, we've augmented the DIY Calculator with a “Data Gatherer” utility, which can be used to gather data
during the course of a program run. The type of data we’re talking about here is the number of times each opcode is executed and the
number of times each data location (reserved using a .BYTE command or its .2BYTE and .4BYTE cousins) is written to and/or read from.
In the case of conditional jump instructions such as a JZ (“jump if zero”), the Data Gatherer utility will record how many times
the instruction passes its test and how many times it fails.
The way in which the Data Gatherer utility works is described in the documentation accompanying this release of
the DIY Calculator on the Download page. For our purposes here, we need only
be aware of the fact that – when we instruct it to do so – the Data Gatherer can be used to read in the *.rad
(raw assembly dump) file generated by the assembler and to write out a corresponding *.pad (processed assembly dump) file
that contains the gathered data:

Once again, the formats for each of these files – including *.pad files – are fully documented under the
File Formats
topic on the More Tools page. The point is that all of the above is handled by the DIY
Calculator and its environment. What we are proposing is for someone (maybe you, maybe us) to create a Code Profiler utility. This tool
will read in the *.lst (list) file generated by the assembler, the *.pad (processed assembly dump) file generated by
the DIY Calculator’s new Data Gatherer utility, and also a clock-cycles.ini file that we’ll discuss in a moment.
Then – by means of some form of Display utility – the Code Profiler will inform us as to which portions of our program
are consuming the most (and least) processing time and resources:

Let us now turn our attention to The Official DIY Calculator Data Book that is to be found on the CD-ROM accompanying our
book How Computers Do Math. Part of this tome describes the internal architecture of the DIY Calculator’s CPU in excruciating detail.
(Actually, it’s written in a delightfully amusing, interesting, and informative manner as you will no-doubt agree.) Furthermore,
the data book also documents signal waveforms showing exactly how many clock cycles each instruction (in each addressing mode) will
require to perform its magic. Well, this is the information that is contained in the clock-cycles.ini file (you may modify
this information to reflect an alternative CPU architecture of your own devising should you so desire.)
But how are we going to employ the information in the clock-cycles.ini file and how are we going to display the code
profiling results? Ah Ha! Therein lays the question. In fact, there are a large number of
techniques we could adopt; a few of these approaches are presented in the following topics.
Colored Text Display
As usual, the simplest display technique would be for the Code Profiler utility to color the lines of text
from the
*.lst (list) file using the data from the
*.pad (processed assembly dump) file and the
clock-cycles.ini
file as shown below:
In this case, the code being displayed reflects a nested loop. Line 37 is where we load the initial value used to control the outer
loop, so this line is executed only one time, which explains why it’s colored green. By comparison, lines 38, 42, 43, and 44 are
executed every time we cycle round the outer loop (255 times), so these lines are colored orange.
Similarly, line 39 is where we load the initial value used to control the inner loop. This line is executed only one time in the
inner loop, but the inner loop is called 255 times by the outer loop, so this line is also colored orange. Finally, lines 40 and 41
are executed once every time we cycle round the inner loop (255 times), which means they are executed 255 x 255 = 65,025 times;
hence the fact that they are colored red.
Observe that this example doesn’t actually include any comment or blank lines, but if it did, what color should we use to color them?
Also, is it better to color all of the text as shown above, or would it be preferably to just color the line numbers? Alternatively,
as opposed to coloring the text itself, another technique would be to leave the text as-is and instead modify the color of the background
as shown below:
The above offers just two main possibilities; there are many more. What would be really interesting would be to create a Code
Profiler utility that allowed the user to experiment with different approaches to presenting this data, and to then interview
users to determine which technique they found to be the most efficacious.
Vertical Bar Graph Display
The colored text approach presented in the previous topic is great for fine-grained detailed (line-by-line) analysis
of the code, but it would also be useful to be able to work with a “bigger picture” view that allowed the user to quickly identify
problem areas. One way of doing this would be to use a vertical bar-graph type display:
The idea here is that each vertical line in the new display corresponds to a line of source code. Note that we’ve shown only a
few colors here for simplicity, but the display could use a sliding range of colors with different graduations in shade
representing different amounts of usage. For example, we could use 20 different shades each of every color in the rainbow.
Note that clicking the mouse in this bar-chart display should auto-scroll the text window to display the corresponding lines
of source code (see also the Advanced Navigation Capabilities topic later on this page).
(Note that the size and location of the “thumb” in the horizontal scroll bar associated with this bar-chart display
corresponds to the relative amount and location of the text being presented in the main display.)
Using Height As Well As Color
With regard to the previous topic, as usual we will have to decide how to handle comment and blank lines.
Another point to consider is that, in the example above, we employed only the color of the line to reflect the amount of activity
associated with each instruction. Instead, we might decide to also use the height of the lines as shown in the examples below:
Once again, we are using only three colors in these examples for simplicity, but the real display could use multiple shades
of each of every color in the rainbow.
We should also note that, even though each of the bands in the above illustration reflects 1/3 (33%) of the vertical height, we
could actually use these bands to reflect different amounts of activity. For example, the top band might be used to represent
only those instructions that accounted for 80% of the activity.
Furthermore, as opposed to dividing the vertical span of this display into only three bands, how about four, five, or six? Alternatively,
the height of each line could precisely reflect the relative amount of activity associated with each line as shown below:
Of course, this leads us to the question as to how we calculate relative activity in the first place, which is the
subject of the next topic.
Calculating Relative Activity
As was discussed in the
Gathering Code Coverage Data topic earlier on this page, the
*.pad
file generated by the DIY Calculator contains details as to each type of instruction in the program (the address and opcode of each
instruction) and the number of times each instruction was executed.
Let’s assume that we have a really simple program containing only three instructions. Instruction A is executed 30 times, instruction B is
executed 90 times, and instruction C is executed 180 times. Thus, one technique would be to say that the instruction that was executed
the most is considered to equate to 100% of the relative activity. As this was instruction C, which was executed 180 times, this means
that 180 instruction hits equates to 100% of the relative activity, so 1% of the relative activity equates to 1.8 hits. Thus, instruction A
equates to 30 / 1.8 = 17% of the relative activity, while instruction B equates to 90 / 1.8 = 50% of the relative activity.
So, if we assume that we are considering the vertical bar graph display as presented in the previous topic – and if we also assume that
the height (and color) of each line will precisely reflect the relative amount of activity associated with each line (instruction) – then
the line associated with instruction A would occupy 17% of the vertical span and would be of a greenish color; the line associated with
instruction B would occupy 50% of the vertical span and would be of an orange-hue; and the line associated with instruction C
would occupy 100% of the vertical span and would be bright red.
But wait, there’s more! Suppose we have two instructions called D and E, where instruction D is executed 50 times and instruction E is
executed 100 times. Which instruction accounts for the most relative activity? Well, it all depends how we define relative activity.
If we are concerned only with the number of times each instruction is executed, then instruction E is the worst offender.
However, what if we were to tell you that the CPU required 10 clock cycles to execute instruction D, but only 4 clock cycles to execute
instruction E? In this case, the line containing instruction D accounts for 50 x 10 = 500 clock cycles, while the line containing
instruction E consumes only 100 x 4 = 400 clock cycles, which means that we should perhaps focus our attention on improving the
efficiency of the code that uses instruction D.
The point is that, as was discussed in the Gathering Code Coverage Data topic earlier on this page, the number of
clock cycles used by each instruction in each addressing mode is captured in the clock-cycles.ini file, which is itself
located in the C:\DIY Calculator\Config folder on your computer. This means that the Code Profiler could calculate the amount of
relative activity by multiplying the number of hits for each instruction with the number of clock cycles required to perform
that instruction. (Remember that, in the case of a conditional jump, there are two different clock cycle values: one for when the test
passes and one for when it fails. Similarly, the *.pad file keeps track as to how many times each conditional jump passes and fails.)
All of this leads to a number of possibilities. For example, it would be interesting to be able to switch back and forth between
displaying the number of hits versus the number of clock cycles associated with each line (instruction). It would also be interesting
to be able to select a group of lines in the vertical bar graph display or the text display (or one of the other displays) and to
be informed as to the number of hits and clock-cycles associated with the selected group (it should also be possible to see
the total number of hits and clock cycles for the entire program).
There are several ways in which the selection of a sub-group of lines could be performed. For example, the user could simply hold the
left mouse button down and drag it over the selected lines in the vertical bar graph display or in the main source code window. Or
perhaps the vertical bar-graph display could include two special markers that the user could drag to the start and end of the lines
of interest. Alternatively, the user could be provided with the ability to explicitly key-in the start and end line numbers.
As another thought, if the Code Profiler allowed us to specify a clock speed (say 1 megahertz), then we could quickly
determine the amount of real time that would be consumed by performing a particular subroutine. This means that if we were
comparing two versions of a logarithmic subroutine, for example, we could use this capability to determine that subroutine
A can perform its magic on a certain set of data five times faster than subroutine B (of course speed may not be the only
measurement criteria, because subroutine B may be ten times smaller in terms of the memory it consumes).
And one final point before we continue: consider the case of a program that spends a lot of time looping around reading from
an input port and waiting for something to happen; for example, reflect on the way our calculator program waits for the user to
click buttons on the keypad. These instructions are going to be executed an inordinate number of times as compared to the
other instructions in the program, and this will skew all of the results. For example, it would be more than possible to have
several million hits on the instructions that read from a port, perform a test such as a CMPA (“compare accumulator”), and
then perform a conditional jump.
If the other instructions in the program are executed only a few hundred times each, then our calculations of relative
activity will show the bulk of the program doing very little and these three instructions doing the vast majority of the work.
But these instructions aren’t really doing anything of interest. The point is that the Code Profiler utility should provide
some mechanism for the user to exclude specified lines (or groups of lines) from its calculations of relative activity
(and maybe give these instructions a special color code in the various displays).
Bird’s Eye Skyscraper Display
This is somewhat related to the previous display. The idea here, however, is to consider things from a bird’s
eye view of a city containing skyscrapers, in which case all the bird can see is a small square corresponding to the top of
each building. Similarly, in our display, each square would correspond to one line of code:
This type of display may convey advantages when working with larger programs because it facilitates a “big picture” view that
encompasses more data. One very important consideration is how we organize the elements in this display. For example, let’s assume
for the sake of discussion that 100 elements (squares) are presented across the display in the horizontal direction and 10 elements
in the vertical direction. Assuming the top-left corner corresponds to line 1 in the source code, do we progress from left to
right until we reach line 100, and then continue from the left-hand-side of the next from with line 101; for example:
1, 2, 3, 4, 5, 6 … 97, 98, 99, 100
101, 102, 103, ... 198, 199, 200
201, 202, 203, ... etc.
Alternatively, would it be better for the display order to wend its way back and forth sort of like a snake as follows:
1, 2, 3, 4, 5, 6 … 97, 98, 99, 100
200, 199, 198, ... 103, 102, 101
201, 202, 203, ... etc.
Or maybe it would be better to present the squares (lines) in a vertical ordering ... or perhaps to present each group of 100
lines in a 10 x 10 array of squares. Hmmmm; it would be very interesting to experiment with these different possibilities to see
which was the easier and more efficient to use.
Of course it would also be interesting to allow the user to have access to both the vertical bar display and the birds-eye view
display to see which they preferred (it should be possible to turn these different displays on and off individually). And once again,
clicking the mouse in this bird's eye display should auto-scroll the text window to display the corresponding lines of source code
(see also the Advanced Navigation Capabilities topic later on this page).
Organic Lava-Lamp Display
While we’re letting our imaginations run wild and free, another form of display might look something like the
oil in a lava lamp of the type that were prevalent during the 1970s (it would be ultra-cool and add a lot of visual interest
if the various shapes maintaind their relative positions but they were all gradually "undulating"):
The idea here is that the main rectangle represents the top-level program, of which we can assume that at least the first
instruction has been executed. The various ellipses correspond to either loops in the main body of the program or to subroutines; ellipses
containing ellipses represent nested loops or nested subroutines.
Ideally, it would be nice to annotate the various ellipses with the names of their corresponding loops or subroutines. Alternatively,
the names of the loops or subroutines could appear in a small “pop-up” window as the mouse cursor was moved over the display. Clicking
on an area in this display should auto-scroll the text in the main display to the corresponding lines of source code
(see also the following topic).
This type of display could be used to give a much higher view of our code-profiling world, but it would require our Code Profiler
utility to perform an in-depth analysis of the list file to determine the extent of each of the loops and subroutines. Also, actually
generating this display would pose some interesting challenges.
Advanced Navigation Capabilities
As was noted in the discussions associated with all of the displays discussed above (the vertical bar
graph, the bird's eye view, and the organic lava-lamp display), clicking on an area in the display should cause the main
text window to auto-scroll to display the corresponding lines of source code.
However, we might be able offer something a little more sophisticated. For example, one technique we could use would be
that if we left-mouse-click on a subroutine in the display, then the main text window auto-scrolls to display the
corresponding lines of source code; but if we right-mouse-click in the display, then we are presented with a pop-up
list of options, including jumping to the subroutine itself or jumping to the point in the program from whence the
routine was called (if there are several such calling points, they would be listed in order).
Another alternative would be that if we held the <Ctrl> key down while left-mouse-clicking in the display, then the
main window would auto-scroll to the point from whence the subroutine was called; but if we click without holding the <Ctrl> key,
the main window would auto-scroll to the subroutine itself. There are of course many more techniques one might explore, so
put your thinking cap on and start pondering furiously.
General Notes (Sharing Your Work)
| 1) |
If you do decide to create a Code Profiler utility as described here, we’re sure that other users would be very
interested in seeing it and using it. We would be very happy to make such a tool available via the DIY Calculator website
(giving full credit to you, of course).
|
| | |
| 2) |
The Code Profiler utility discussed here is very similar in concept (and in the display mechanisms it uses)
to the Code Coverage
utility discussed elsewhere on the More Tools page. Thus, it might be a good idea
to combine these two tools and to provide the ability to switch back and forth between the different views of the data.
|
| | |
| 3) |
Note that the ideas presented here are just a few thoughts that have popped into our minds during the course
of writing How Computers Do Math. If you think of any other considerations we should note regarding our proposed
Code Profiler utility, email us as described on the About/Contact Us page on
the main DIY Calculator website and we will add them to the list so that they are there should we (or someone else) decide
to actually create such a tool.
|
Questions?
There are always a lot of points to ponder before embarking on a new software development quest. We’ve had a head-start,
because we’ve been pondering furiously for a long time. Thus, if you are interested in creating a Code Profiler utility and want to bounce
some ideas around, please feel free to drop us a line as described on the
About/Contact Us
page on the main DIY Calculator website.