These days, we are so familiar with the wonders of technology that we tend to take them for granted. For example, how often do you look at the large crisp display on your computer screen and reflect on the tortuous path it took for scientists and engineers to create this marvel?

Victorian Fax Machines?
The whole basis of the DIY Calculator accompanying our book How Computers Do Math (ISBN: 0471732788) is that it’s a virtual machine that exists only in your physical computer’s memory; the only way to see the DIY Calculator is on your real computer's screen, or monitor. Displaying information on a screen is an incredibly efficient way for a computer to communicate with us. So where did computer screens come from? Well, as is often the case, engineers employed an existing technology that was developed for an entirely different purpose ... television.

Television, whose name comes from the Greek tele, meaning "distant," and the Latin vision, meaning "seeing" or "sight," has arguably become one of the wonders of the 20th Century, so you may be surprised to learn that television's origins are firmly rooted in the Victorian era. In fact one of the earliest examples of an image being captured, transmitted, and reproduced by electromechanical means occurred in 1842, only five years after Queen Victoria had ascended to the throne, when a Scotsman – Alexander Bain – came up with a rather ingenious idea.

Bain created an image to be transmitted by snipping it out of a thin sheet of tin, placing this representation on a moveable base, and connecting it to one side of a battery. He then created a pendulum using a conducting metal wire and a weight ending in a sharp point, and he set this device swinging above the base. The base was slowly moved under the pendulum, where the swinging weight made periodic contact with the metal image, thereby completing the electrical circuit and converting the dark and light areas of the image (represented by the presence and absence of tin) into an electrical signal.

Bain then used this signal to control a relay, which was moving back and forth in time with the pendulum. When activated, the relay pushed a pencil down onto a piece of paper mounted on a second base moving at the same rate as the first, thereby reproducing the image as a pencil drawing.

Obviously, Bain's device had little application with regard to the transmission of moving pictures, but it certainly wasn't a wasted effort, because he had essentially created the precursor to the modern Fax machine.

How to Display Moving Pictures on a Toaster
In 1878, Denis Redmond of Dublin, Ireland, penned a letter to the English Mechanic and World of Science publication. In his letter, Redmond described creating an array of selenium photocells, each of which was connected via a voltage source to a corresponding platinum wire. As the intensity of light on a particular photocell increased, it conducted more current, thereby causing its associated platinum wire to glow more brightly. Redmond's original device contained only around 10 × 10 elements, and therefore was very limited as to what it could represent. Having said this, it could apparently reproduce moving silhouettes, which was pretty amazing for the time.

In fact, Redmond's photocell-array concept was not far removed from today's semiconductor diode-array cameras, while his array of glowing platinum wires is loosely comparable to the way in which images are constructed on today's Liquid Crystal Displays.

Also, had Redmond continued to increase the size of his platinum-wire array to contain say 1,000 × 1,000 elements, then it would have had the added advantage of being able to double-up as a toaster! Sadly, the large size of Redmond's photocells drastically limited the quality of the images he could display, and the ability to reproduce his efforts using semiconductors and related technologies lay some 100 years in his future, so the inventors of yesteryear were obliged to search for another approach ...

Nipkow Disks
In 1884, the German inventor Paul (Julius) Gottlieb Nipkow proposed a novel technique for capturing, transmitting, and reproducing pictures based on flat circular disks containing holes punched in a spiral formation:

Nipkow's idea was both elegant and simple. A strong light source was used to project a photographic image onto the surface of the spinning Nipkow Disk. As the outermost hole on the disk passed through the image, light from the source passed through the hole to hit a light-sensitive cell, such as a silver-caesium phototube. The intensity of this light was modified by the dark and light areas of the image as the hole traveled past, thereby modulating the electrical signal generated by the phototube. The holes were arranged such that as soon as the outermost hole had exited the image, the next hole began its trek. Since the holes were arranged in a spiral formation, each hole traversed a different slice, or line, across the image.

At the other end of the process was a brilliant lamp and a second spinning Nipkow Disk. The electrical signal coming out of the phototube was used to modulate the lamp, which was projected onto the second disk. The modulated light passed through the holes in the second disk to construct a line-by-line display on a screen. Although the resulting image was constructed as a series of lines, the speed of the disk combined with persistence of vision meant that an observer saw a reasonable (albeit low-resolution) facsimile of the original picture. (The concept of persistence of vision is discussed in a little more detail later in this paper).
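The capture-and-rebuild cycle described above can be sketched in a few lines of code (a toy simulation in Python; the 4 × 4 grid of brightness values is, of course, a made-up stand-in for the projected photograph):

```python
# "Transmit": each hole sweeps one line of the image, converting its
# brightness values into a single serial signal.
image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
signal = [brightness for line in image for brightness in line]

# "Receive": the second disk rebuilds the picture line by line from
# the serial signal.
width = len(image[0])
reconstructed = [signal[i:i + width] for i in range(0, len(signal), width)]

assert reconstructed == image  # the observer sees the original picture
```

Provided the two disks stay synchronized (the role of the synchronization pulse discussed below), the receiver reassembles the lines in the same order they were captured.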

Leaping ahead to the year 1895, the Italian electrical engineer and inventor Guglielmo Marconi extended the earlier research of such notables as the British physicist James Clerk Maxwell and the German physicist Heinrich Hertz by inventing the forerunner of radio as we know it today. In the early part of the twentieth century, engineers constructed experimental systems that could transmit images using a combination of Nipkow's disks and radio signals. The electrical signal coming out of the phototube was merged with a synchronization pulse (which indicated the start of a rotation), and this combined signal was then used to modulate the carrier wave from a radio transmitter.

At the receiving end of the system was a radio receiver and a second spinning Nipkow Disk. The receiver first separated the synchronization pulse from the video signal, and the synchronization pulse was then used to ensure that the receiver disk was synchronized to the transmitter disk. Meanwhile, the amplified video signal was once again used to modulate a brilliant lamp, which was projected through holes in the receiver disk to construct a line-by-line display on a screen.

Cathode Ray Tubes (CRTs)
Modern television systems are based on a device called a cathode ray tube (CRT). Primitive cathode ray tubes had been around since 1854, when a German glass blower named Heinrich Geissler invented a powerful vacuum pump. Geissler then proceeded to use his pump to evacuate a glass tube containing electrodes to a previously unattainable vacuum. Using these Geissler Tubes, experimenters discovered a form of radiation which they called cathode rays (and which we now know to consist of electrons).

The idea of using a cathode ray tube to display television images was proposed as early as 1905, but practical television didn't really become a possibility until 1906, when the American inventor Lee de Forest invented a vacuum tube called a triode, which could be used to amplify electronic signals. Even so, progress was hard fought for, and it wasn't until the latter half of the 1920s that the first rudimentary television systems based on cathode ray tubes became operational in the laboratory.

The principles behind the cathode ray tube are quite simple (although actually building one is less than trivial). The tube itself is formed from glass, from which the air is evacuated to leave a strong vacuum:

In the rear of the tube is a device called an electron gun, which generates electrons. A positively charged grid mounted a little way in front of the electron gun focuses the electrons into a beam and accelerates them towards the screen. Thus, the name "cathode ray tube" is derived from the electron gun (which forms the negative terminal, or cathode), the electron beam (or ray), and the glass enclosure (or tube).

The inside face of the screen is lined with a layer of material called a phosphor, which has the ability to fluoresce. Hmmm, this is going to take a moment to explain. Phosphors are distinguished by the fact that when they absorb energy from some source such as an electron beam, they release a portion of this energy in the form of light. Depending on the material being used, the time it takes to release the energy can be short (less than one-hundred-thousandth of a second) or long (several hours). The effect from a short-duration phosphor is known as fluorescence, while the effect from a long-duration phosphor is referred to as phosphorescence. Televisions use short-duration phosphors, and their screens' linings are therefore known as the fluorescent layer.

The end result is that the spot where the electron beam hits the screen will glow. By varying the intensity of the electron beam, it's possible to make the spot glow brightly or hardly at all. Now, this would not be particularly useful on its own (there's only so much you can do with an individual spot); but, of course, there's more.

Note the two plates referred to as vertical deflection plates in the above illustration. If an electrical potential is applied across these two plates, the resulting electric field will deflect the electron beam. If the upper plate is more positive than the lower, it will attract the negatively charged electrons forming the beam and the spot will move up the screen. Conversely, if the lower plate is the more positive, the spot will move down the screen. Similarly, two more plates mounted on either side of the tube can be used to move the spot to the left or the right of the screen (these horizontal deflection plates are not shown in the above illustration so as to keep things simple).

By combining the effects of the vertical and horizontal deflection plates, we are able to guide the spot to any point on the screen. There are several ways in which we can manipulate our spot to create pictures on the screen, but by far the most common in modern CRT-based displays is the raster scan technique as illustrated below (see also the note on Vector Displays toward the end of this topic):

Using this technique, the electron beam commences in the upper-left corner of the screen and is guided across the screen to the right (the top-most blue line). The path the beam follows as it crosses the screen is referred to as a line. When the beam reaches the right-hand side of the screen it undergoes a process known as horizontal flyback, in which its intensity is reduced and it is caused to "fly back" across the screen (the top-most mauve line). While the beam is flying back, it is also pulled a little way down the screen. (This description is something of a simplification, but it will serve our purposes here.)

The beam is now used to form a second line, then a third, and so on until it reaches the bottom of the screen. The number of lines affects the resolution of the resulting picture; that is, the amount of detail that can be displayed.
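The scanning order just described is easy to express in code (a minimal Python sketch; the tiny 4 × 6 "screen" is a hypothetical size chosen purely for illustration):

```python
ROWS, COLS = 4, 6  # hypothetical tiny screen

scan_order = []
for row in range(ROWS):            # one line after another, top to bottom
    for col in range(COLS):        # the beam sweeps left to right
        scan_order.append((row, col))
    # horizontal flyback: beam blanked, returned to the left-hand edge
# vertical flyback: beam blanked, returned to the upper-left corner

assert scan_order[0] == (0, 0)                  # starts upper-left
assert scan_order[-1] == (ROWS - 1, COLS - 1)   # ends lower-right
```

Every dot on the screen is visited exactly once per scan, which is why an image can be encoded simply as beam-intensity values in this fixed order.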

As An Aside: Standard (pre-high-definition) British television is based on 625 lines [the British system was originally 405 lines (from 1936), but this was phased out in the early 1970s in favor of the 625-line format]. By comparison, standard (pre-high-definition) American television is based on 525 lines. In fact, with regard to pre-high-definition television systems, there are three main formats used in the world (there are also a number of derivations of these formats along with a few totally weird formats):

 PAL: This stands for Phase Alternating Line (PAL is jocularly said to stand for Pictures At Last or Pay for Added Luxury). The majority of countries with a 50 Hz mains power supply use the PAL broadcast/video standard.

 NTSC: This stands for the National Television System Committee, which established the original American TV broadcast standard in 1953 (NTSC is often – and unfairly – said to stand for Never Twice the Same Color). The majority of countries with a 60 Hz mains power supply use the NTSC broadcast/video standard.

 SECAM: This stands for SÉquentiel Couleur À Mémoire, which is French for "Sequential Color with Memory" (SECAM is also said to stand for System Essentially Contrary to the American Method). This format was designed by the French primarily for political reasons, including protecting their manufacturing industries. It was also commonly used in Eastern Bloc countries so as to be incompatible with the majority of Western transmissions.

When the beam reaches the bottom right-hand corner of the screen, it undergoes vertical flyback in which its intensity is reduced, it "flies back" up the screen to return to its original position in the upper left-hand corner (diagonal black line), and the whole process starts again. Thus, in a similar manner to Nipkow's technique, we can create pictures by varying the intensity of the beam as it scans across the screen. For example, consider how we'd construct the image of a simple triangle:

Note that this small group of lines represents a tiny area located somewhere in the middle of a much larger screen. In the real world, the lines forming the picture would be very close together, so this would actually be a very small triangle, but it serves to illustrate the concept. When they are first introduced to this technique for creating pictures, many people wonder why they can't see the lines being drawn and why the image doesn't appear to flicker. The answer has three parts:

 1) The electrons forming the electron beam travel at a tremendous speed, and the beam itself can be manipulated very quickly. The beam used in a television set can scan the entire picture in a fraction of a second, and the entire picture is actually redrawn approximately thirty times a second (for American/NTSC format televisions) or 25 times a second (for European/PAL systems).

 2) The phosphor lining the inside of the screen is carefully chosen to fluoresce for exactly the correct amount of time, such that any particular point has only just stopped fluorescing by the time the electron beam returns to that point on its next scan.

 3) The combination of our eyes and nervous system exhibits persistence of vision, which means we continue to see an image for a fraction of a second. For example, if you look at a bright light for a short time and then turn your head, an after-image of the light persists for a while.
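To put some rough numbers on point 1, a little arithmetic (sketched here in Python) shows how briefly the beam dwells on each line of a 525-line picture redrawn 30 times a second:

```python
# Back-of-the-envelope timing for a 525-line picture redrawn ~30 times
# a second (ignoring details such as blanking intervals and interlacing).
FRAMES_PER_SECOND = 30
LINES_PER_FRAME = 525

seconds_per_frame = 1 / FRAMES_PER_SECOND            # ~0.033 s per picture
seconds_per_line = seconds_per_frame / LINES_PER_FRAME

print(f"{seconds_per_line * 1e6:.1f} microseconds per line")
# prints "63.5 microseconds per line"
```

A complete line is traced in roughly 64 millionths of a second, which is why the eye never catches the beam in the act of drawing.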

All of these effects combine to form a seemingly substantial picture. However, if you ever look at a television program where the scene contains a television set, you’ll often see bands of light and dark areas moving up or down that television's screen. This is because the camera taking the picture and the television in the picture are not synchronized together, resulting in a kind of stroboscopic effect (much like wagon wheels appearing to rotate backwards in old cowboy films). Thus, if producers of television programs wish to include a television in the scene, the engineers have to ensure that the systems are synchronized to each other.

As Another Aside: This topic has focused on Raster Displays, because this is the most common technique in use for today's computer displays. As we've discussed, this form of display means that we take the image we wish to display, we convert that image (the official term is to render the image) into a bitmap, and we then scan the electron beam across the display row-by-row turning it on and off for each pixel on the screen.

One reason we use this technique for modern computer displays is that the bitmap images discussed above have to be stored in memory (either in the computer's main memory or in dedicated memory located on a special graphics card/subsystem) and memory is cheap these days, but this wasn't always the case. In the 1960s and 1970s computer memory was very expensive, so the standard form of computer display was known as a Vector Display (these little scamps were also known as Calligraphic Displays or Stroker Displays).

The idea here is that the computer is used to wield the electron beam like a pen, controlling its location in the horizontal and vertical axes so as to draw lines and curves directly onto the screen (this explains the "vector" moniker, because lines are often referred to as "vectors" by engineers and scientists). In addition to the fact that they required relatively little memory, a big advantage associated with vector displays is that lines and curves drawn on the screen using this technique look much "sharper" and "cleaner" than their rasterized equivalents.

The reason vector graphics can be more visually appealing is easy to understand if you imagine a diagonal line drawn from the bottom left-hand corner of the screen to the top right-hand corner. This will be reproduced as a perfectly smooth line on a vector display, but it will end up as a somewhat jagged "staircase" on a bitmap/raster display (we can use "anti-aliasing" techniques to make the raster representation look smooth, but these techniques are beyond the scope of our discussions here).
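The staircase effect is easy to demonstrate: rasterizing an ideal line means snapping each point onto the grid of pixels (a toy Python sketch; the 9-pixel-wide bitmap and slope of 1/3 are arbitrary illustrative choices):

```python
# Rasterize an ideal line of slope 1/3 by rounding each point to the
# nearest pixel row; the result is short horizontal runs that step
# upward -- the characteristic "staircase."
SLOPE = 1 / 3
pixels = [(x, round(x * SLOPE)) for x in range(9)]

rows = [y for _, y in pixels]
print(rows)  # prints [0, 0, 1, 1, 1, 2, 2, 2, 3]
```

A vector display would simply deflect the beam smoothly along the ideal line, so no such quantization occurs.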

As one final point of interest, the reason we use the phrase "television set" derives from the early days of radio. The first radio systems for home use essentially consisted of three stages: the receiver to detect and pre-amplify the signal, the demodulator to extract the audio portion of the signal, and the main amplifier to drive the speaker. Each of these stages was packaged in an individual cabinet, and the cabinets had to be connected together; hence the user had to purchase all three units, which formed a "wireless set." The term "set" persisted even after all of the radio's components were packaged in a single cabinet, and was subsequently applied to televisions when they eventually arrived on the scene.

A.A. Campbell-Swinton and John Logie Baird
There are two primary requirements for a functional television system: a technique for capturing images and a way to display them. Following Nipkow's experiments, other inventors tried to move things forward with limited success. The history books mention several names in this regard, but one who seems to have "slipped through the cracks" and rarely gets a mention in texts on the origin of television was the Scottish electrical engineer Alan Archibald (A.A.) Campbell-Swinton (1863-1930).

In 1908, Campbell-Swinton wrote a letter to the magazine Nature in which he described an electronic technique for implementing a television system. Three years later, in 1911, he expanded on his original proposal and described a complete system using a special cathode ray tube to capture images and another to display them. Campbell-Swinton's idea for the "camera" cathode ray tube was to use a lens to capture an image and project it onto the flat end of the tube. Meanwhile, inside the tube, the glass on the flat end would be covered by a sandwich of photoelectric material, an insulating layer, and a layer of conducting metal.

Photons of light associated with the image projected onto the tube would result in areas of positive electrical charge in the photoelectric material – lighter areas would have more charge; darker areas would have less charge – the insulating layer would stop the charge leaking away.

By scanning the electron beam row-by-row in a raster pattern (see the previous topic) across the metal layer, it would be possible to "read" the areas of charge. By this means, the image could be converted into an electrical signal that could be sent to a "display" cathode ray tube where it would be reconstructed and presented to observers as discussed in the previous topic.

Campbell-Swinton was a man ahead of his time. At a high level, his scheme was near-perfect, but his plans omitted many of the fine details that would be required to make the system actually work. In fact, it took another 20 years before a fully electronic television system was realized, as discussed in the next topic.

Another key player in the annals of television history was John Logie Baird, a Scotsman who used a derivation of Nipkow's disks for capturing and displaying pictures during the latter half of the 1920s and the early 1930s.

The British Broadcasting Corporation (BBC) allowed Baird to transmit his pictures on their unused radio channels in the evening. By 1934, even though he could only transmit simple pictures with a maximum resolution of around 50 lines, Baird had sold thousands of his Televisor receivers around Europe in the form of do-it-yourself kits. Meanwhile, on the other side of the Atlantic, the Radio Corporation of America (RCA) experimented with a system consisting of a mechanical-disk camera combined with a cathode ray tube display device. Using this system, RCA transmitted a picture of a model of Felix the Cat endlessly rotating on the turntable of a record player in the early 1930s.

Philo Farnsworth (a Man Lost in History) and Vladimir Zworykin
Strange as it may seem, relatively few reference sources seem to be aware of the real genius behind television as we know it today – a farmboy named Philo T. Farnsworth from Rigby, Idaho. In 1922, at the age of 14, with virtually no knowledge of electronics, Philo conceived the idea for a fully electronic television system. Flushed with enthusiasm, he sketched his idea on a blackboard for his high school science teacher, a circumstance that was to prove exceedingly fortuitous in the future as we shall see.

Over the years, Philo solved the problems that had thwarted other contenders. He invented a device he called an Image Dissector, which was the forerunner to modern television cameras, and he also designed the circuitry to implement horizontal and vertical flyback blanking signals on his cathode ray tube, which solved the problems of ghosting images. By the early 1930s, Philo could transmit moving pictures with resolutions of several hundred lines, and all subsequent televisions are directly descended from his original designs.

The reason Philo has been lost to history is almost certainly attributable to RCA. The corporation first attempted to persuade Philo to sell them his television patents, but he informed them in no uncertain terms that he wasn't interested. In 1934, RCA adopted another strategy by claiming that the Russian émigré Vladimir Zworykin, who was working for them at that time, had actually invented everything to do with televisions back in 1923.

Eventually the case went to the patent tribunal, at which time Zworykin's claim was shown to leak like a sieve. The final nail in the coffin came when Philo's old science teacher reconstructed the sketch Philo had drawn on the blackboard back in 1922. This picture was recognizably that of the television system that Philo had subsequently developed, and the tribunal had no hesitation in awarding him the verdict.

Unfortunately, by this time the waters had been muddied to the extent that Philo never received the recognition he deserved. Almost every standard reference continues to cite Zworykin (and his camera, called an Iconoscope) as initiating modern television. In fact, it was only towards the end of the 1970s that Philo's achievements began to be truly appreciated, and, although it's still rare to find references to him, Philo's name is gradually coming to the fore. As video historian Paul Schatzkin told the authors of this paper:

 "Many engineers and scientists contributed to the emergence of the television medium, but a careful examination of the record shows that no one really had a clue until Philo Farnsworth set up shop in San Francisco at the age of 20 and said: We'll do it this way!"

Video Tubes
The tubes used in television sets and computer monitors (which we might call video tubes) are very similar to cathode ray tubes with some additional refinements. First, in place of the deflection plates discussed above, video tubes tend to use electromagnetic coils, but the end result is much the same so we don't really need to go into that here. More importantly, video tubes have a second grid called the shadow mask, which is mounted a fraction of an inch from the screen's fluorescent coating:

Like the grid, the shadow mask is positively charged, so it helps accelerate the electrons forming the electron beam, thereby giving them more energy which results in a brighter picture. More importantly, the shadow mask helps to focus the beam, because any electrons that deviate even slightly from the required path hit the mask and are conducted away, thereby producing a sharper image. (The shadow mask also has a role with regard to protecting the image from the effects of magnetic fields such as the Earth's, but this is beyond the scope of this paper.) Note that the illustration above is greatly magnified and not to scale; in reality the shadow mask is only slightly thicker than aluminum foil and the holes are barely larger than pin-pricks.

If you approach your television at home and get really close to the screen, you'll see that the picture is formed from individual dots (much like our "Christmas tree" in the above illustration). A "black-and-white" television contains only one electron gun, and the phosphor lining its screen is chosen to fluoresce with white light. In this case, each dot on the screen corresponds to a hole in the shadow mask, and each dot may be referred to as a picture element, or pixel for short.

By comparison, in the case of a color television, you'll see that the picture is composed of groups of three dots, where each dot corresponds to one of the primary colors: red, green, and blue. Each of these dots has its own hole in the shadow mask, and each dot is formed from a different phosphor, which is chosen to fluoresce with that color. In this case each group of three dots would equate to a single pixel:

A color television also contains three electron guns, one to stimulate the red dots, one for the green, and one for the blue. The three electron beams scan across the screen together, but the intensity of each beam can be varied independently. Thus, by making only one of the beams active we can select which color in a group will be stimulated (we can also specify how brightly the dot should glow by varying the strength of that electron beam).

Now this is the clever bit. We might decide to make two of the electron beams active and stimulate two of the dots in the group at the same time: red-green, red-blue, or green-blue. Alternatively, we might decide to make all three of the beams active and stimulate all three of the dots. The point is that – as discussed in the following topic – we can form different colors by using various combinations and intensities of these three dots.

Color Vision: One of Nature’s Wonders
As fate would have it, this topic grew in the telling to the extent that it became a full paper in its own right. In this Color Vision paper you will discover all sorts of interesting information on the visible spectrum, the discovery of infrared and ultraviolet light, the way in which color vision works, and the evolution of our visual systems.

For the purposes of this paper we need only note that what we refer to as "light" is simply the narrow portion of the electromagnetic spectrum that our eyes can see (detect and process), ranging from violet at one end to red at the other, and passing through blue, green, yellow, and orange on the way (at one time, indigo was recognized as a distinct spectral color, but this is typically no longer the case):

The point is that white light is a mixture of all of the colors in the visible spectrum. Furthermore, by mixing different quantities of red, green, and blue light, we can trick our eyes into seeing just about any color. Thus, in the context of our color television, if all three of the electron beams are active when they pass a particular group of dots, the individual dots will fluoresce red, green, and blue, but from a distance we'll perceive the group as a whole as being white. (If we looked really closely we'd still see each dot as having its own individual color.) Similarly, if we stimulate just the red and green dots we'll see yellow; combining the green and blue dots will give us cyan (a green-ish, light-ish blue); while mixing the red and blue dots will result in magenta. (The color magenta, which is a sort of purple, was named after the dye with the same moniker; in turn, this dye was named after the battle of Magenta, which occurred in Italy in 1859, the year in which the dye was discovered.)
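This additive mixing is simple to model in code (a Python sketch; representing each beam's effect as a (red, green, blue) triple of intensities from 0 to 255 is our own convention for illustration, not anything dictated by the hardware):

```python
def mix(*beams):
    """Combine light sources additively, clamping each channel at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*beams))

# Full-intensity stimulation of the red, green, and blue dots in a group.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

assert mix(RED, GREEN) == (255, 255, 0)          # yellow
assert mix(GREEN, BLUE) == (0, 255, 255)         # cyan
assert mix(RED, BLUE) == (255, 0, 255)           # magenta
assert mix(RED, GREEN, BLUE) == (255, 255, 255)  # white
```

Intermediate beam strengths give intermediate channel values, which is how the full palette of colors is obtained from just three phosphors.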

Visual Display Units (VDUs)
Today's computer monitors have much smaller pixels and many more pixels-per-inch than a television set, which means that they have a higher resolution and can display images with greater precision. (Note that the concept of pixels is somewhat slippery when we come to computers – we'll discuss this in more detail in the What is a Pixel? topic later in this paper).

Modern computers typically contain a circuit board called a graphics card or a graphics subsystem, which contains a substantial amount of memory and its own on-board processor(s). In this case, all your main system has to do is to send an instruction to the processor on the graphics card saying something like "Draw me a purple circle with 'this' diameter at 'that' location on the screen." The graphics processor then slogs away completing the task, while the system's main processor is left free to work on something else.

The combination of high-resolution monitors and graphics cards endows today's computer screens with millions of pixels, each of which can be individually set to thousands or even millions of colors. We require this level of sophistication because we often wish to display large amounts of graphical data, such as three-dimensional animations. This wasn't possible until recently, because it requires a huge amount of computing power and memory. In fact it's only because today's computers are incredibly fast and memory is relatively inexpensive that we are in a position to display images at this level of sophistication.

By comparison, computers in the early 1960s were relatively slow, memory was extremely expensive, and there weren't any dedicated computer monitors as we know them today. On the bright side, very few people had access to computers, and those who did were generally only interested in being able to see textual data, which was typically printed out on a Teleprinter or some comparable device. At some stage, however, it struck someone that it would be useful to be able to view and manipulate data on something similar to a television screen, and thus the first visual display unit, or VDU (sometimes expanded as video display unit), was born.

By 1977, a few lucky souls were the proud possessors of rudimentary home computers equipped with simple VDUs. A reasonably typical system at that time would probably have resembled the one shown below:

Even though these computers were slow and had hardly any memory by today's standards, their owners were immensely proud of them and justly so, because most of these devices were hand-built from kits or from the ground up. Similarly, although their rudimentary VDUs could display only a few rows of "black-and-white" text, anyone who was fortunate enough to own one was deliriously happy to actually see words appearing on their screen.

The professional VDUs (which we shall refer to as monitors henceforth) typically offered between 20 and 24 rows, each containing 80 columns (characters). Why 80? Because there wasn’t much point in displaying fewer characters than could fit on IBM Punched Cards, which were used to store a large proportion of the world’s computer data at that time. Similarly, there didn’t seem to be much point in being able to display more characters than were on these cards.

By comparison, monitors for home use were less expensive, less sophisticated, and generally only capable of displaying around 16 rows of 32 characters. A major consideration for the designers of these early monitors was the amount of memory they required. The reason for this was because – unlike a television, which receives its pictures from afar – a computer needs somewhere to store everything that it's displaying. From our discussions on video tubes earlier in this paper, you may recall that images are created as a series of lines containing "dots," and this is the way in which a monitor displays characters. For example, consider how a monitor could be used to display the letter 'E':

One configuration that was common in early monitors was to use a matrix of 9 × 7 dots per character as illustrated here (another common style was based on a 7 × 5 matrix, but the resulting characters were somewhat "clunky" and difficult to read). So, ignoring any extra dots used to provide spaces between rows and columns, each character would require 9 × 7 = 63 dots, and a display containing 16 rows of 32 characters would therefore use a total of 32,256 dots. This means that if each dot were represented by its own bit in the computer's memory (where a logic 0 could be used to indicate the dot is off and a logic 1 to indicate the dot is on), then remembering that there are 8 bits in a byte, the system would require 4,032 bytes of RAM just to control the monitor.
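This arithmetic is easy to check for yourself. Here is a minimal sketch (in Python, purely for illustration):

```python
# Memory needed for a dot-per-bit display of 16 rows x 32 characters,
# each character built from a 9 x 7 matrix of dots (spacing dots ignored).
DOTS_PER_CHAR = 9 * 7            # 63 dots in each character cell
CHARS_ON_SCREEN = 16 * 32        # 512 character positions

total_dots = CHARS_ON_SCREEN * DOTS_PER_CHAR   # 32,256 dots
total_bytes = total_dots // 8                  # 8 bits per byte -> 4,032 bytes

print(total_dots, total_bytes)
```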

Although this doesn't seem like a tremendous amount of memory today, in 1977 you considered yourself fortunate indeed to have 2 kilobytes (2,048 bytes) of RAM, and you were the envy of your street if you had as much as 4 kilobytes (4,096 bytes). Obviously this was a bit of a conundrum, because even if you were the proud possessor of a system with a 4 kilobyte RAM, there wouldn't have been much point in running a monitor that required 4,032 bytes, because you would only have 64 bytes left in which to store your programs and data!

In those days of yore, programmers prided themselves on writing efficient code that occupied as little memory as possible, but this would have been pushing things above and beyond the call of duty. A measly 64 bytes certainly wouldn't have been enough to write a program that could do anything particularly useful with the monitor, which would defeat the purpose of having a monitor in the first place. What was needed was a cunning ploy, and electronics engineers are nothing if not ingenious when it comes to cunning ploys (in fact, the term engineer is derived from the Latin ingeniator, meaning "a creator of ingenious devices"). The solution to the problem was a concept known as a memory-mapped display, which we shall consider in excruciating detail in the next topic.

But before we move on, as a point of interest, when you purchase say a 32-inch television set, this size is measured as a diagonal from the upper-left-hand corner of the screen to its lower-right-hand corner. The same thing applies to computer screens. Traditional television sets and computer screens have an aspect ratio of 4:3, which means they are wider (4 parts) than they are tall (3 parts). For example, consider a small screen that is five inches across its diagonal; in this case, the width of the screen would be four inches and its height would be three inches (courtesy of the classic 3-4-5 right triangle).

The reason we mention this is that the computer screen is wider than it is tall, but a piece of paper is taller than it is wide. Due to the fact that computers are often used for word-processing applications, a number of computer manufacturers have released taller, thinner screens with an aspect ratio of 3:4. However, the trick here is that the manufacturers didn't want to go to the expense of creating a completely new device. Instead, they simply rotated an existing screen by 90 degrees and put it in a different cabinet. Of course, this meant the upper-left-hand corner of the screen used to be the upper-right-hand corner and so forth; also that the raster scan would now be progressing from left to right instead of from top to bottom; so the graphics subsystem would have to correct for all of this, but that really wasn't much of a problem at all.

Memory-Mapped Displays
Purely for the sake of discussion, let's assume that we have a memory-mapped display (we'll explain the "memory-mapped" moniker shortly) that supports an array of 16 rows by 32 columns (characters). Also, since we are dealing with a hypothetical display, let's assume that each of our characters is going to be formed from a matrix of 15 dots by 10 dots, which will give us much nicer-looking images (why suffer if you don't have to?):

When you magnify one of our imaginary characters (the letter 'B' in this example), it may seem that we've carelessly wasted a lot of our dots. But remember that we require some way to form spaces between the rows and columns. Also, some of the lowercase characters such as 'q', 'y', and 'p' have "tails" that extend downward, and we have to reserve some space to accommodate these as well.

Be this as it may, the problem remains that we've got 16 × 32 = 512 characters, each of which requires 15 × 10 = 150 dots. This would require 9,600 bytes if we used one bit to store each dot, yet we want to use as little of our computer's memory as possible. The solution to our problem lies in the fact that the patterns of dots for each character are pre-defined and relatively immutable (at least they were in the early days). To put this another way, if we wish to display several 'B' characters at different positions on the screen, then we know that each of them will use exactly the same pattern of dots. This means that we can divide our problem into three distinct parts:

a) We need some way to remember which characters are being displayed at each location on the screen; for example, "The character in column 6 of row 3 is a letter 'B'." Due to the fact that we want to be able to change the characters displayed at each location, this information will have to be stored in our RAM.

b) We need some way to store a single master pattern of dots for each character we wish to be able to display (for example, 'A', 'B', ... 'Z', and so on). Assuming that we don't wish to change the way our characters look, then these master patterns can be stored in some flavor of read-only memory (ROM) device.

c) We need some mechanism to combine the information from points (a) and (b). That is, if the system knows that the character in column 6 of row 3 should be a letter 'B', then it requires the ability to access the master pattern of dots associated with this letter and display them on the screen at the appropriate location.

The first thing we have to decide is which types of characters we wish to use. Obviously we'll want to be able to display the uppercase letters 'A' through 'Z', and it's not beyond the bounds of possibility that we'd like to use their lowercase counterparts 'a' through 'z'. Similarly, we'd probably appreciate the numbers '0' through '9', along with punctuation characters such as commas and semi-colons, and perhaps a few special symbols such as '$', '&', and '#'.

Just a moment, doesn't all of this seem strangely familiar? Are you experiencing a feeling of déjà vu? (Didn't somebody just say that?) Well you can flay us with wet noodles if this isn't beginning to sound like the specification for the ASCII Code that we introduced in this document's companion paper (we love it when a plan comes together).

As you may recall, ASCII is a 7-bit code. As we tend to store our information in 8-bit bytes, this means that we've got a spare bit in each byte to play with (but you can bet your little cotton socks that we'll find a use for these bits in the not-so-distant future). The main point is that if we use ASCII codes to indicate the characters that we wish to display at each location on the screen, then we'll only need to reserve 512 bytes (16 × 32 characters) of our RAM for the entire screen, which is pretty efficient usage of our limited resources when you come to think about it.
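The saving is easy to quantify. Here is a sketch (in Python, purely for illustration) comparing storing every dot in RAM against storing one ASCII code per character position:

```python
CHARS_ON_SCREEN = 16 * 32        # 512 character positions
DOTS_PER_CHAR = 15 * 10          # 150 dots in each character cell

# Scheme 1: one bit per dot, with every dot held in RAM.
bitmap_bytes = CHARS_ON_SCREEN * DOTS_PER_CHAR // 8   # 9,600 bytes

# Scheme 2: one ASCII code (one byte) per character position in RAM;
# the master dot patterns live in ROM, which costs no RAM at all.
ascii_bytes = CHARS_ON_SCREEN                         # 512 bytes

print(bitmap_bytes, ascii_bytes)
```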

For example, let's assume that the memory location at address $nnnn is associated with the character in the upper left-hand corner of the screen (row 0, column 0); that the memory location at address $nnnn+1 is associated with the character at row 0, column 1; that address $nnnn+2 is associated with the character at row 0, column 2; and so forth (note that the dollar '$' characters associated with addresses such as "$nnnn" indicate hexadecimal values). Furthermore, let's assume that locations $nnnn through $nnnn+4 contain the ASCII codes $42, $45, $42, $4F, and $50 as shown in the illustration below (where these codes correspond to the characters 'B', 'E', 'B', 'O', and 'P', respectively):

Apropos of nothing at all, the jazz style known as Bebop became highly popular in the decade following World War II. Charlie Parker, Dizzy Gillespie, and Thelonious Monk were especially associated with this form of music, which is known for its fast tempos and agitated rhythms. One might wonder how many of the early computer scientists were listening to the radio and clicking their fingers in time with a Bebop melody while pondering a particularly perplexing problem. And we may only speculate as to whether it was mere coincidence that many of the most significant ideas and discoveries in the history of computing occurred while Bebop flourished. But we digress...

Henceforth, we'll refer to the group of 512 memory locations associated with the memory-mapped display as the Video RAM. In reality, any contiguous set of 512 bytes would serve our purpose; however, due to the fact that we typically place our programs in the lower-order memory locations, it would be common practice to locate the Video RAM somewhere in the higher regions of the computer's memory map (the concept of memory maps is introduced in our book, How Computers Do Math).

It's important to note that the computer – in the form of its central processing unit (CPU) – doesn't know anything about any of this; much like a married man, it just does what it's told. So if a program instructs the CPU to load a value of $42 into memory location $nnnn, it will happily do so without any understanding that, in this case, we're treating $42 as the ASCII code for the letter 'B', and that we're hoping the corresponding pattern of dots will somehow wend its way to the correct position on the screen.

One of the clever things about all of this is that the video card (which is connected to the main circuit board by a cable as shown in our discussions on VDUs earlier in this paper) performs most of the work. We can imagine the little scamp as slipping in to read locations in the Video RAM while the main computer's back is turned, and then displaying the appropriate characters on the screen.

In the case of our hypothetical video card, we can consider it as being "hard-wired" to understand that our memory-mapped display supports 16 rows of 32 columns, and also that address $nnnn is the start address of the Video RAM. Thus, the video card knows that whichever ASCII character occupies address $nnnn is supposed to appear at row 0, column 0 on the screen; that the character at address $nnnn+1 is supposed to appear at row 0, column 1; and so on. Similarly, because the video card knows how many rows and columns our display supports, it understands that the character at address $nnnn+31 is supposed to appear at row 0, column 31; while the character at address $nnnn+32 is supposed to appear at the beginning of the next line at row 1, column 0; and so forth.
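This address-to-position bookkeeping amounts to simple integer arithmetic. A sketch in Python (the start address $E000 is an arbitrary stand-in for the "$nnnn" used in the text):

```python
VIDEO_RAM_START = 0xE000   # arbitrary stand-in for the "$nnnn" in the text
ROWS, COLS = 16, 32

def screen_position(addr):
    """Map a Video RAM address to the (row, column) it appears at."""
    offset = addr - VIDEO_RAM_START
    return offset // COLS, offset % COLS

def address_of(row, col):
    """Map a (row, column) screen position back to its Video RAM address."""
    return VIDEO_RAM_START + row * COLS + col

print(screen_position(VIDEO_RAM_START + 32))   # start of the second row
```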

The fact that specific memory locations are mapped to particular character positions on the screen is, of course, why this technique is referred to as "memory-mapped." As a point of interest, the first memory-mapped display was developed around 1976 by Lee Felsenstein. In addition to designing the Pennywhistle Modem and the Osborne 1 (one of the earliest successful microcomputers), Lee also found the time to moderate the famous Homebrew Computer Club and to act as the system administrator of the pioneering Community Memory wide-area network (WAN). (Lee was also kind enough to write the foreword to one of our earlier books entitled Bebop BYTES Back (An Unconventional Guide to Computers).)

But, once again, we digress... The final problem is to take the ASCII codes stored in the Video RAM, and to use them to generate the patterns of dots that are displayed on the screen. In order to do this, the video card uses a device called a Character ROM. Remember that read-only memory (ROM) is a form of memory containing hard-coded patterns of 0s and 1s; also that it remembers its contents, even when power is removed from the system. In the case of the video card's Character ROM, these 0s and 1s are used to represent the absence or presence of dots on the screen, respectively. Let's assume that the video card is starting a new pass to refresh the screen, commencing at row 0, column 0 as shown below:

In the case of our hypothetical display, each character is formed from fifteen lines (rows), each containing ten columns of dots. One thing we have to remember is that the VDU/CRT's electron beam has to scan all the way across the screen to form each line. As we see in the above illustration, our Character ROM has twelve input signals; eight of these inputs (char[7:0]) are used to present an ASCII code to indicate which character we're interested in, while the other four (line[3:0]) are used to indicate a particular line (row) in that character.

Thus, the video card commences by peeking into location $nnnn of the Video RAM to see what's there (for the purposes of this example we're assuming that it's going to find the ASCII code $42, which corresponds to the letter 'B'). The video card passes this ASCII code to the character ROM's char[7:0] inputs, and it also sets the line[3:0] inputs to binary 0000 (thereby indicating that it's interested in line 0 of this character).

Using this data, the character ROM's ten outputs, dot[9:0], return the pattern of 0s and 1s that correspond to the first line of the character 'B' (binary 1111111000 in this case). This pattern is then loaded into a shift register, which converts it into a sequence of pulses that are used to control the electron beam (a 1 turns the beam on to form a dot and a 0 turns it off to leave a space).

However, the video card can't complete the rest of this character yet, because the electron beam is continuing its scan across the screen. So the video card peeks into location $nnnn+1 in the Video RAM to see what's there and finds the ASCII code $45, which corresponds to the letter 'E'. The video card passes this new ASCII code to the character ROM's char[7:0] inputs while maintaining the binary 0000 value on the line[3:0] inputs (thereby indicating that it's still interested in line 0 of the new character). Once again, the character ROM responds with the pattern of dots required to construct the first line of the letter 'E'; and once again, this pattern is loaded into the shift register, which converts it into the pulses required to control the electron beam.

The video card continues this process for addresses $nnnn+2 through $nnnn+31, at which time it has completed the first line of the first row of characters forming the display. It then repeats the process for the same set of characters in addresses $nnnn through $nnnn+31, but this time it sets the character ROM's line[3:0] inputs to binary 0001, thereby indicating that it's now interested in line 1 of these characters. In due course, the video card has to repeat this process for each of the remaining lines (incrementing the line[3:0] inputs each time) until it has performed fifteen passes in all and completed every line required to form the first row of characters.

Next, the video card has to perform another fifteen scans to construct the second row of characters from the ASCII codes stored in Video RAM addresses $nnnn+32 through $nnnn+63, and so on for the remaining fourteen rows on the display. This may seem to be a dreadfully complicated process involving a lot of work, but it really isn't too bad. Transistors can switch millions of times a second, so what seems to be a horrendous amount of effort to us actually leaves them with a lot of time on their hands waiting around for something interesting to happen.
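To make the refresh sequence concrete, here is a toy simulation (in Python; the display dimensions and the two-character "font" are scaled right down purely for illustration – the display in the text uses 16 × 32 characters of 15 × 10 dots):

```python
# A miniature Character ROM: for each ASCII code, the dot pattern for
# every line of that character ('#' = dot on, '.' = dot off).
CHAR_ROM = {
    0x42: ["###.", "#..#", "###.", "#..#", "###."],  # 'B' (5 lines x 4 dots)
    0x45: ["####", "#...", "###.", "#...", "####"],  # 'E'
}

video_ram = [0x42, 0x45, 0x42]   # one row of characters: "BEB"

# For each scan line, the "beam" sweeps across every character in the row,
# asking the Character ROM for just that one line of each character.
scan_lines = []
for line in range(5):
    scan_lines.append(" ".join(CHAR_ROM[code][line] for code in video_ram))

print("\n".join(scan_lines))
```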

More importantly, as users we aren't really affected by any of this. All we really need to know is that – assuming we're working with a memory-mapped display – if we create a program that stores an ASCII code into one of the RAM locations we've designated as the Video RAM, then the video card will automatically cause the corresponding character to appear on the screen at the appropriate location.

As fate would have it, we don't use memory-mapped displays in our home computers anymore, because we've grown to expect (nay demand) sophisticated user-interfaces and high-resolution graphics (see also the discussions on Modern and Future Display Technologies later in this paper). However, it would be a mistake to regard memory-mapped displays as being only historical curiosities, because these devices are still found in some "cheap and cheerful" applications such as automatic teller machines (ATMs).

I/O-Driven Displays
Following the memory-mapped displays presented in the previous topic, the next step up the evolutionary ladder would be an equivalent input/output (I/O)-driven display. In this case, the Video RAM is a separate entity that is part of the display (actually, it's located on the video card):

The idea here is that the main computer simply writes a series of ASCII characters to a certain output port that is being used to drive the video card. Special control logic on the video card keeps track of what's happening and stores each character in the appropriate location in its Video RAM. This control logic would also understand special codes that instruct it to do things like clearing the screen (which would equate to loading all of the locations in the Video RAM with ASCII space characters [$20]), returning a flashing cursor to its "home" position (at row 0, column 0), and so forth.

Thus, assuming that the computer had already transmitted "clear" and "home cursor" codes to the video card, when the computer sent its first ASCII character, the control logic on the video card would automatically store this character in the row 0, column 0 location in the Video RAM. Similarly, when the computer sent its next ASCII character, the control logic would store this little rascal in the row 0, column 1 location, and so forth. (The video card's control logic would also understand special commands [control codes] such as "New Line," which would cause it to move the flashing cursor to the beginning of the next line on the display.)
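The control logic described above can be sketched as follows (in Python; the particular control-code values are made up for illustration, not taken from any real terminal):

```python
ROWS, COLS = 16, 32
SPACE = 0x20                              # ASCII space character
CLEAR, HOME, NEWLINE = 0x0C, 0x01, 0x0A  # hypothetical control codes

class VideoCard:
    """Control logic for an I/O-driven display; the Video RAM lives here."""
    def __init__(self):
        self.vram = [SPACE] * (ROWS * COLS)
        self.cursor = 0                   # offset of the flashing cursor

    def receive(self, byte):
        """Handle one byte arriving from the computer's output port."""
        if byte == CLEAR:
            self.vram = [SPACE] * (ROWS * COLS)
        elif byte == HOME:
            self.cursor = 0
        elif byte == NEWLINE:             # jump to the start of the next row
            self.cursor = (self.cursor // COLS + 1) * COLS
        else:                             # an ordinary printable character
            self.vram[self.cursor] = byte
            self.cursor += 1
```

Sending the bytes for "clear", "home", and then 'B' would leave the ASCII code $42 sitting at row 0, column 0 of the card's own Video RAM, with the main computer's memory untouched.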

This approach, which is the one used by the DIY Calculator's virtual Console (screen) as discussed in the release documentation on the Downloads page of this website, frees up the main computer's memory and leaves it available to store programs and data. The downside (at least, in the days when computer memory was very expensive) is that you need an additional block of memory to act as the Video RAM.

1-Bit, 8-Bit, 15-Bit, 16-Bit, 24-Bit, 30-Bit, or 32-Bit Color
As we previously noted, the simple displays presented in the previous topics are still to be found in some "cheap and cheerful" applications such as the automatic teller machines found in places such as banks and shopping malls. However, we don't use these displays in our home computers anymore, because we've grown to expect – nay demand – sophisticated user-interfaces and high-resolution graphics. [In the context of computer graphics and graphics subsystems, the term resolution refers to the number of pixels (picture elements) that are used to represent an image.]

Of course, a number of developments had to occur and technologies had to mature for us to reach the present state-of-play. First, manufacturing techniques improved, allowing computer monitors to have smaller pixels and (consequently) more pixels-per-inch than was previously achievable. This means that today's monitors support high resolution and can display images with great precision. Perhaps more importantly, the amount of memory we can squeeze into a single silicon chip has increased enormously, while the cost of such devices has plummeted dramatically. Finally, modern computers are tremendously faster and more powerful than their predecessors.

As we previously discussed, modern computers typically contain a circuit board called a graphics card or a graphics subsystem, which can contain a substantial amount of memory and its own on-board processor(s). All your main system has to do is to send an instruction to the processor on the graphics card saying something like "Draw me a purple circle with 'this' diameter at 'that' location on the screen." The graphics processor then slogs away completing the task, while the system's main processor is left free to work on something else.

The combination of high-resolution monitors and graphics cards means that today's computer screens can have millions of pixels, each of which can be individually set to thousands or even millions of colors. The fact that the graphics card can individually address each pixel means we can create all sorts of sophisticated effects, such as changing the size and font of characters on a character-by-character basis and dynamically varying the spacing between characters, because each character can be individually drawn pixel-by-pixel. Furthermore, these high-resolution displays support today's graphical user interfaces, such as the one employed by the DIY Calculator, which both enhance and simplify the human-machine interface.

In this topic, we are going to consider some of the color schemes that may be employed by different software applications and graphics subsystems. For example, if you right-mouse-click on your desktop (assuming you are running the Windows® operating system), select the Properties option from the ensuing pop-up menu, and then select the Settings tab from the resulting dialog, you can examine the options for Screen Resolution and Color Quality:

Some common resolution options are 800 × 600 (which means 800 pixels wide by 600 pixels deep), 1024 × 768, 1152 × 864, and 1280 × 1024. (As we previously mentioned, the concept of pixels is somewhat slippery when we come to computers – we’ll discuss this in more detail in the What is a Pixel? topic later in this paper). Meanwhile, some common color options are Low (8-bit), Medium (16-bit), and High (24-bit). But what does this actually mean? Well, we can explain it this way...

1-Bit "Color": Before we commence, we need to know that a modern graphics subsystem contains a block of RAM (memory) known as the frame buffer. This stores a copy of whatever image is currently being displayed on the computer's screen (where this image is constructed by the graphics processor). Another term of which we need to be aware is color depth, which refers to the number of bits we use to represent the color of each pixel in an image.

If we used only a single bit to represent each pixel, for example, then each bit could be in only one of two states – either Off or On (logic 0 or logic 1) – which would allow us to represent only two colors; for instance, black and white:

It's obvious that this illustration is not to scale and reflects a very limited number of pixels (if we tried to show them at their real size, we wouldn't be able to see anything at all). But if we assume a resolution of 1280 × 1024, then even a black-and-white display boasting only one bit per pixel will require our frame buffer to contain 1,310,720 bits (or 163,840 bytes) of memory. And what of the other schemes? Read on...
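But first, the bookkeeping for a 1-bit frame buffer – packing eight pixels into each byte – can be sketched as follows (in Python, purely for illustration):

```python
WIDTH, HEIGHT = 1280, 1024

# One bit per pixel, eight pixels per byte: 163,840 bytes in all.
frame_buffer = bytearray(WIDTH * HEIGHT // 8)

def set_pixel(x, y, on):
    """Turn the pixel at (x, y) on or off in the 1-bit frame buffer."""
    bit = y * WIDTH + x                  # which bit in the whole buffer
    index, mask = bit // 8, 1 << (bit % 8)
    if on:
        frame_buffer[index] |= mask      # set the bit -> pixel on
    else:
        frame_buffer[index] &= ~mask     # clear the bit -> pixel off
```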

8-Bit Color (A Teaser): The more colors that are available to describe an image, the more realistic the final result will be. Associating more bits with each pixel allows us to represent more colors. For example, two bits can be used to represent 2^2 = 4 different binary values (00, 01, 10, and 11), which can – in turn – be used to represent four different colors.

Similarly, three bits can be used to represent 2^3 = 8 different colors, four bits can be used to represent 2^4 = 16 different colors, and so forth. However, increasing the number of bits used to describe each pixel increases the complexity and cost of the graphics subsystem.

For this reason, low-end graphics subsystems tend to use as few bits per pixel as possible. A popular technique for these low-end cards is to assign eight bits to each pixel in the frame buffer, where these eight bits can be used to represent 2^8 = 256 different colors. Of course a palette of only 256 colors is really quite limiting, so we often use a cunning trick to get around this restriction. Can you guess what this ruse is? Actually, in order to understand how this works, we need to understand some other concepts, so we’ll first consider 15-bit, 16-bit, and 24-bit color schemes, and then we'll return to the 8-bit scheme a little later in this topic.

15-Bit Color: One technique used in some mid-range graphics cards is to store 15 bits per pixel in the frame buffer. This requires almost twice the amount of memory as an 8-bit (256 color) approach, but in return it offers 2^15 = 32,768 colors:

In this case, the 15 bits associated with each pixel in the frame buffer are composed of three 5-bit subfields, which are used to specify that pixel's red (R), green (G), and blue (B) color components, respectively (this is usually abbreviated to RGB). The value in each 5-bit subfield is used to drive an associated digital-to-analog converter (DAC), which transforms the digital data in the frame buffer into its analog equivalent as required by the monitor. (Until fairly recently, the vast majority of monitors were analog in nature.)

Note that there is some additional circuitry (not shown in the above diagram for simplicity) that scans through the rows and columns of pixel data in the frame buffer. This circuitry commences with the pixel in the upper-left-hand corner and works its way across the first row; it then moves to the next row and repeats the process; it works its way through the frame buffer until it's processed the last row, and then it returns to the upper-left-hand corner and starts all over again.

15-bit color offers a reasonable tradeoff between memory requirements and the number of colors, but the resulting images are not as realistic as those represented using the 16-bit or 24-bit color techniques as discussed in the following sections.

16-Bit Color: Another technique used by mid-range graphics cards is to store 16 bits per pixel in the frame buffer. This requires twice the amount of memory as an 8-bit (256 color) approach, but in return it offers 2^16 = 65,536 colors:

Once again, the value in each subfield is used to drive an associated digital-to-analog converter (DAC), which transforms the digital data in the frame buffer into its analog equivalent as required by the monitor.

The 16-bit scheme illustrated here is called a 5-6-5 scheme because it uses 5 bits to represent the pixel's red component, 6 bits for the green component, and 5 bits for the blue component. Due to the fact that the human eye is more sensitive to variations in the green portion of the spectrum, using 6 bits to represent the green component provides a noticeable improvement over the 15-bit color scheme discussed above.
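The 5-6-5 packing itself is just bit shifting. A sketch (in Python, purely for illustration):

```python
def pack_565(r, g, b):
    """Pack 5-bit red (0-31), 6-bit green (0-63), and 5-bit blue (0-31)
    into a single 16-bit pixel value."""
    return (r << 11) | (g << 5) | b

def unpack_565(pixel):
    """Recover the three color components from a 16-bit 5-6-5 pixel."""
    return (pixel >> 11) & 0x1F, (pixel >> 5) & 0x3F, pixel & 0x1F

print(hex(pack_565(31, 63, 31)))   # all components at full intensity
```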

The following image goes some way to showing the differences between 1-bit (black-and-white), 8-bit, and 16-bit color with regard to representing some three-dimensional geometric shapes:

In reality, this image doesn't do justice to the 16-bit scheme, which would look much better on a display in the real world.

24-Bit Color: Graphics cards that store 24 bits per pixel in the frame buffer can represent 2^24 = 16,777,216 different colors. This is more than enough to accurately portray true-to-life images, so 24-bit color is often referred to as "true color."

As usual, the value in each subfield is used to drive an associated digital-to-analog converter (DAC), which transforms the digital data in the frame buffer into its analog equivalent as required by the monitor.

High-end applications that require photorealistic images mandate the use of graphics subsystems that support 24-bit true color, because lesser color schemes simply cannot reproduce images with the required fidelity.

8-Bit Color (Redux): And so we return to considering 8-bit color. As we previously noted, the eight bits associated with each pixel in the frame buffer can be used to represent 2^8 = 256 different patterns of 0s and 1s. The problem is that a palette of only 256 colors is really quite restrictive. One common way to mitigate this limitation is to use the 8-bit fields in the frame buffer to index (point) into a set of lookup tables (color palettes):

In this case, there are three lookup tables (LUTs) – one each for the red, green, and blue color components – each of which is 8 bits wide and 256 words deep. The 8-bit value returned from each LUT (which is pronounced to rhyme with "hut") is used to drive an associated digital-to-analog converter (DAC), which – as usual – transforms the digital data into its analog counterpart as required by the monitor.

So now we have an interesting mix, because we can use only 256 colors, but each of those colors can be selected from a 24-bit palette, which provides us with 16,777,216 different color possibilities.
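The lookup step itself is simple. A sketch (in Python; the palette entry shown is an arbitrary example):

```python
# Three 256-entry lookup tables, one per color component, all initially black.
red_lut, green_lut, blue_lut = [0] * 256, [0] * 256, [0] * 256

# An application might load palette entry 7 with a 24-bit orange, say.
red_lut[7], green_lut[7], blue_lut[7] = 0xFF, 0xA5, 0x00

def pixel_color(index):
    """Resolve an 8-bit frame-buffer value into its full 24-bit RGB color
    (in hardware, the three LUT outputs drive the three DACs)."""
    return red_lut[index], green_lut[index], blue_lut[index]

print(pixel_color(7))
```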

Thus, this approach allows a graphics application (computer program) to load the LUTs with any set of 256 colors appropriate to that application. One problem with this technique, however, is that whichever application is currently active loads the LUTs with its preferred set of colors, which makes any images currently being displayed by other applications elsewhere on the screen look somewhat strange.

All things considered, this 8-bit approach – which is frugal in terms of its memory requirements – can be useful for applications that require only a limited number of colors; for example, user interfaces that employ areas of "solid" color. However, in the case of applications that are intended to process more complex images such as photographs, an 8-bit approach will result in low-fidelity images that are unrealistic.

30-Bit Color: Some very special imaging and sensor applications employ a 30-bit color scheme, in which each pixel in the frame buffer is represented by 30 bits; 10 bits each for the red, green, and blue color components. However, this type of thing is extremely rare and is outside the scope of this paper.

32-Bit Color: In the case of a 32-bit color scheme, each pixel in the frame buffer requires 32 bits to represent it. In fact, the term "32-bit color" is technically a misnomer, in that this scheme is actually a combination of 24-bit true color (as discussed above) along with 8 alpha bits.

These alpha bits offer 2^8 = 256 different levels of translucency, ranging from transparent to opaque. This is useful in a variety of applications, such as three-dimensional graphics in which a scene may contain many objects, some of which are in front of others.

As a point of interest, the term transparency refers to the quality of a material that allows the passage of light such that objects behind that material can be clearly seen. There are a number of different techniques for representing transparency in computer graphics. In the case of the alpha blending approach, we blend the colors of the pixels forming the transparent object with the colors of the pixels associated with any objects that are behind that object. The alpha bits associated with the pixels forming the object in the foreground are used to tell the graphics processor just how transparent those pixels are.
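In its simplest form, alpha blending is a weighted average of the foreground and background colors. A sketch (in Python, purely for illustration):

```python
def blend(fg, bg, alpha):
    """Alpha-blend a foreground RGB color over a background RGB color.
    alpha runs from 0 (fully transparent) to 255 (fully opaque)."""
    return tuple((f * alpha + b * (255 - alpha)) // 255
                 for f, b in zip(fg, bg))

# A roughly half-transparent red object in front of a blue background:
print(blend((255, 0, 0), (0, 0, 255), 128))
```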

Graphics folks also tend to drop the term translucency into the conversation. This refers to the quality of a material that allows the passage of light while diffusing it, so that objects behind the material can be seen, but not clearly. In the real world, for example, it's possible to look at a scene through a sheet of colored glass and an identically colored bowl of gelatin and see objects behind both of them. In this case, we would refer to the glass as being transparent and the gelatin as being translucent, because any objects seen through the gelatin would tend to be less clear.

And finally: One question that may have popped into your head when we were considering the fact that your computer offers you different levels of screen resolution and different levels of color quality is: "Why wouldn't we always want to use both the highest resolution and the highest color quality?"

That's a good question. Well, first of all, your graphics subsystem may contain only a limited amount of memory for its frame buffer. The designers of these subsystems tend to make these things very configurable, such that the memory can be allocated in many different ways. So, for example, there may be enough memory to support 24-bit true color at a resolution of 1024 × 768; however, if you wish to use a higher resolution of 1280 × 1024, then your graphics subsystem may be able to accommodate only 16-bit color at this resolution.
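This memory trade-off is easy to check with a little back-of-the-envelope Python (the function here is purely illustrative):

```python
# Why a fixed amount of graphics memory forces a trade-off between
# resolution and color depth: the frame buffer needs
# width x height x (bits per pixel) bits in total.

def frame_buffer_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

# 24-bit true color at 1024 x 768:
print(frame_buffer_bytes(1024, 768, 24))   # 2,359,296 bytes (~2.25 MB)

# 24-bit color at the higher 1280 x 1024 resolution needs more memory...
print(frame_buffer_bytes(1280, 1024, 24))  # 3,932,160 bytes (~3.75 MB)

# ...so a card with around 3 MB can manage only 16-bit color up there:
print(frame_buffer_bytes(1280, 1024, 16))  # 2,621,440 bytes (2.5 MB)
```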

Alternatively, even if your graphics subsystem can support the highest color quality at the highest resolution, you may decide to use a lower resolution, such that objects (especially text) appear larger on the screen.


Alternative Color Schemes (8-bit, 12-bit, 17-bit, etc.)
With regard to the previous topic, do we really need 24 bits per pixel to achieve photo-realistic images, or can we get by with less? Now that computer memory is relatively inexpensive, this may not be tremendously important in the case of large (desktop, for example) systems. However, it may be very significant in the case of embedded systems and portable, handheld, battery-powered systems such as cell phones, in which using fewer bits per pixel equates to less real estate on the silicon chip, a reduction in computational requirements, and lower power consumption.

The reason we've come to question "conventional wisdom" is a paper written by IBM Fellow Mike Cowlishaw. Dated 1985 and entitled Fundamental Requirements for Picture Presentation, this paper demonstrates that it should never be necessary to use more than 17 bits to reproduce optimal color (5 red, 7 green, and 5 blue), but that a 12-bit scheme (4 red, 5 green, 3 blue) does almost as well. In fact, the paper even shows a photograph of a person looking very realistic using only an 8-bit scheme (2 red, 4 green, and 2 blue)!
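To make these reduced bit-depth schemes a little more concrete, here's a hedged Python sketch of packing the separate channel values into a single pixel value. The bit allocations (5-7-5, 4-5-3, and 2-4-2) are the ones described above; the packing layout itself (red in the most significant bits) is our own assumption for illustration.

```python
# Packing per-channel values into a reduced bit-depth pixel. Each channel
# value must already be scaled to its bit width (e.g. 0..31 for 5 bits).

def pack_pixel(r, g, b, r_bits, g_bits, b_bits):
    """Pack three channel values into one integer, red in the most
    significant bits (an illustrative layout, not a standard)."""
    return (r << (g_bits + b_bits)) | (g << b_bits) | b

# 17-bit scheme (5 red, 7 green, 5 blue): maximum-intensity white.
print(pack_pixel(31, 127, 31, 5, 7, 5))  # 131071 == 2**17 - 1

# 12-bit scheme (4 red, 5 green, 3 blue): maximum-intensity white.
print(pack_pixel(15, 31, 7, 4, 5, 3))    # 4095 == 2**12 - 1

# 8-bit scheme (2 red, 4 green, 2 blue): maximum-intensity white.
print(pack_pixel(3, 15, 3, 2, 4, 2))     # 255 == 2**8 - 1
```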

Modern and Future Display Technologies
For many decades, cathode-ray tube (CRT)-based computer monitors were the only game in town. But things don't stay the same forever; new display technologies have already emerged, with even more exciting possibilities for the future...

Liquid Crystal Displays (LCDs): In the early 1990s, a number of different companies started experimenting with substances known as liquid crystals (LCs); eventually, liquid crystal displays (LCDs) became available to the market.

In their early incarnations, these displays were very expensive compared to CRT-based techniques; also, the picture quality wasn't as good and the response time of the individual pixels was slow enough that "ghosting" and blurring effects were seen on fast-moving objects in the images. These problems have now largely been solved, and sales of LCDs have risen dramatically over the last few years, to the extent that almost 85% of all new displays sold are LCD-based.

Interestingly enough, LCD technology has its roots in 1888, when an Austrian botanist called Friedrich Reinitzer (1857-1927) was studying cholesterol in plants. He ended up creating a material we now know as cholesteryl benzoate. This turned out to be a previously unknown phase of matter, one we now describe as possessing a "liquid crystalline" structure.

One problem with LCDs is that there are many different variations on the theme. For example, we now know of more than 50,000 compounds and mixtures that possess liquid crystalline properties. Thus, the following is a generic (high-level) description intended only to give a "flavor" of how this all hangs together. As usual, the easiest way of summarizing how these things work is by means of a high-level diagram:

The key point about liquid crystals (at least from our perspective) is that – by default – they will arrange themselves into a tight ("twisted") helix pattern, in which case they will block the passage of light. However, if we apply a voltage to the liquid crystals, they will "untwist" to the extent that they will pass light. Varying the voltage will affect the amount of light that is being passed.

The illustration above reflects a cutaway portion of the screen showing the tiny red, green, and blue filters forming a single pixel. Each of these filters has a bunch of liquid crystals associated with it, and each of these bunches has an associated transistor (these transistors are not shown here for reasons of simplicity).

Behind all of the crystals is a backlight formed from some source of white light. When individual transistors are turned on, they will activate their corresponding bunch of liquid crystals, which will transmit the light into their associated filter. It's possible to control the bunches of crystals in 256 increments, which we might number from 0 to 255; 0 means the crystals are twisted and won't pass any light; 255 means that the crystals are sufficiently untwisted that they will pass the maximum amount of light they can; and the other values correspond to the crystals passing lesser or greater amounts of light.
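If it helps, here's a hypothetical Python sketch of this 256-level brightness control. For illustration we've assumed a simple linear response; real panels have non-linear response curves.

```python
# Sketch of the 256-level drive scheme described above: a drive level of 0
# leaves the crystals fully twisted (no light passes), while 255 untwists
# them fully (maximum light passes).

def transmitted_fraction(level):
    """Fraction of the backlight passed at a given drive level (0..255).
    A simple linear model, purely for illustration."""
    if not 0 <= level <= 255:
        raise ValueError("level must be in the range 0..255")
    return level / 255.0

print(transmitted_fraction(0))    # 0.0 -- fully twisted, blocks all light
print(transmitted_fraction(255))  # 1.0 -- fully untwisted, maximum light
```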

The great advantage of LCDs over CRT-based displays is that they are very thin, very light, and very flat. Having said this, CRT-based displays still have an advantage in terms of the brightness, contrast, and "vibrancy" of the images that can be achieved. If only there were some other technologies...

Plasma Display Panels (PDPs): You may have seen flat-panel plasma displays at television stores. These displays offer bright, crisp, high-contrast images. In this case, we can think of each pixel as being formed from three tiny fluorescent lights (like microscopic neon tubes). By one mechanism or another, these three tiny neon tubes can be coerced into generating red, green, and blue light, each of which can be controlled to form the final color coming out of that pixel.

Plasma displays are fantastic when it comes to presenting ever-moving images such as films. However, if they are instructed to present the same image over and over again, they suffer from "burn-in" effects that leave "ghost" images on the screen. This means that plasma-based technologies do not make an ideal display for computer applications (although there are always some folks who will try).

Organic Light-Emitting Diodes (OLEDs): These are devices that are formed from thin films of organic molecules that generate light when stimulated by electricity. OLED-based displays hold the promise of providing bright and crisp images while using significantly less power than liquid crystal displays.

At some stage in the future, it may be possible to use OLEDs to create displays that are only a few millimeters thick and are two meters wide (or more); these displays would consume very little power compared to other technologies, and in some cases the display could be rolled up and stored away when it wasn't in use (OLEDs can be "printed" onto flexible plastic substrates).

But (despite some very exciting "proof-of-concept" demonstrations), this technology isn't ready for "prime time" usage just yet. OLED-based displays are sometimes used for small-screen applications such as cell phones and digital cameras, but their widespread use for applications like large screen computer displays may not come for another five or ten years at the time of this writing (in fact, they may not make it at all if the SED technology discussed below fulfils its promise).

Surface Emission Displays (SEDs): This is where things start to get very exciting. Prior to the mid-1980s, graphite and diamond were the only forms of pure carbon that were known to us. In 1985, however, a third form consisting of spheres formed from 60 carbon atoms was discovered. Commonly referred to as "Buckyballs," the official moniker of this material is buckminsterfullerene, named after the American architect R. Buckminster Fuller, who designed geodesic domes with a similar underlying symmetry.

Sometime later, scientists discovered a related structure that we now refer to as a carbon nanotube. Such nanotubes can be incredibly small, with a diameter of only one thousandth of one millionth of a meter. Furthermore, they are stronger than steel, have excellent thermal stability, and are tremendous conductors of heat and electricity.

In addition to functioning as wires, nanotubes can be persuaded to act as transistors. Of particular interest to us here is that they can also be coerced into emitting streams of electrons out of one end. Hmmm, tiny little electron guns; what wonders could we perform with these little rapscallions?

Imagine a screen that is thin and flat like an LCD, but is as bright and vibrant as a CRT-based display. Well, that's what you end up with if the screen is formed from a carbon nanotube-based SED. In this case, the inside of the screen is covered with red, green, and blue phosphor dots (one of each to form each pixel), and each of these dots has its own carbon nanotube electron gun.

This technology has been skulking around in the background for some time, but it appears as though the outstanding issues that had been holding it back have been resolved, and SEDs are poised to leap onto the center stage. At the time this topic was first written, it was predicted that we would be seeing SEDs on the streets toward the end of 2006 and the beginning of 2007. It was later announced, however, that the introduction of these devices is being held back until around the middle of 2008 (to coincide with the Summer Olympics in Beijing).

Toshiba hosted the first public demonstration of a large-scale carbon nanotube-based SED at the consumer electronics show (CES) in January 2006. Industry expert Dennis P. Barker attended the show, and as he told the authors of this paper:

 "High-definition television is incredibly realistic, but SED goes one step beyond. When I saw the Toshiba demonstration, it gave me chills and the hairs on the back of my neck stood to attention. I have seen the future and – to me – the future is SED!"

An Alternative Sub-Pixel Technology
In our previous illustrations, we’ve shown the red, green, and blue sub-pixels forming a full pixel as being circles or squares. In reality, this was to some extent a case of artistic license that made it easier to get the concepts across. In order to fully wrap our brains around this current topic, however, we need to be aware that the red, green, and blue sub-pixels forming a pixel on a liquid crystal display (LCD) are rectangular. Combined, these three sub-pixels form a square (or close enough to a square for our purposes here).

And so we come to those clever guys and gals at Clairvoyante (www.clairvoyante.com), who have used their expertise in human vision to come up with an incredibly cunning new way of doing things. First of all, let's consider a 4 × 4 array of standard RGB (red, green, and blue) pixels compared to a 2 × 4 array of Clairvoyante’s RGBW (red, green, blue, and white) PenTile™ pixels. (To be fair, we should note that the underlying RGBW technology has been around for quite some time for specialist applications such as displays in aircraft. Until now, however, these displays have been unsuccessful when it comes to displaying natural images. In order to address this, the folks at Clairvoyante have come up with a treasure chest of cunning tricks and techniques – such as special sub-pixel-based rendering algorithms – that result in bright, crisp images with natural color.)

Observe that each new row of RGBW pixels is shifted by two sub-pixels from the previous row. That is, the first, third, fifth, etc. rows have their sub-pixels ordered R, G, B, W, R, G, B, W, etc. By comparison, the second, fourth, sixth, etc. rows have their sub-pixels ordered B, W, R, G, B, W, R, G, and so on. We'll see how this comes into play shortly.

One key point to note here is that four RGB pixels in a row (in the horizontal direction) are formed from 4 × 3 = 12 sub-pixels. By comparison, two RGBW pixels in a row are formed from 2 × 4 = 8 sub-pixels. Thus, the PenTile Matrix employs only 2/3 the number of sub-pixels required by the traditional scheme. In turn, this means that a PenTile-based screen requires a third fewer transistors, which improves reliability. Moreover, the remaining transistors can be fabricated a tad larger, which makes them more robust.
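The sub-pixel arithmetic above can be double-checked with a trivial snippet:

```python
# Over the same horizontal span, four RGB pixels use 12 sub-pixels while
# two RGBW PenTile pixels use only 8 -- two-thirds as many.

rgb_subpixels = 4 * 3    # four pixels x (R, G, B)
rgbw_subpixels = 2 * 4   # two pixels x (R, G, B, W)

print(rgb_subpixels, rgbw_subpixels)       # 12 8
print(rgbw_subpixels / rgb_subpixels)      # ~0.667 (two-thirds as many)
print(1 - rgbw_subpixels / rgb_subpixels)  # ~0.333 (a third fewer transistors)
```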

So why is the smaller number of larger-sized RGBW sub-pixels important? Well, apart from anything else, both arrays occupy the same physical area. Now, if you look at any of the sub-pixels in the illustrations above and below, you'll see them shown as being surrounded by a black line. This represents the real world, where each sub-pixel has an opaque periphery that blocks extraneous internal light coming out (and unwanted external light getting in).

The term "aperture ratio" refers to the transparent area of a sub-pixel compared to the total area occupied by that sub-pixel (including its opaque periphery). The fact that there are only two PenTile RGBW sub-pixels in the same area as three of the standard RGB sub-pixels means that the aperture ratio of the PenTile sub-pixels is larger; in turn, this means that they pass more light per unit area and therefore are more efficient.

However, the real key to the excitement surrounding this new technology is its power efficiency. Although LCDs are considered to be energy-efficient, when it comes to handheld, battery-powered devices such as cell phones, such a display can consume a large proportion of the device's total power budget. And things will only get worse as we start to use new features such as graphics-intensive games and streaming video on these devices.

In the case of today's cell phones using qVGA resolution (240 × 320), for example, the backlighting requires two high-efficiency white light-emitting diodes (LEDs), each of which consumes 50 mW and costs 25 cents (when purchased in extremely large quantities). By comparison, forthcoming cell phones boasting VGA resolution (480 × 640) will require 8 or 10 LEDs for the backlighting, with a combined cost of $2.00 to $2.50 and a combined power consumption of 400 to 500 mW. (Good Grief, Charlie Brown!)
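For the curious, here's that backlighting arithmetic worked through; the LED counts, per-unit cost, and power draw are the figures quoted above, not our own measurements.

```python
# Backlight cost and power, per the figures quoted in the text.

led_power_mw = 50    # one high-efficiency white LED
led_cost_usd = 0.25  # unit cost when purchased in very large quantities

# Today's qVGA (240 x 320) phone uses two backlight LEDs:
print(2 * led_power_mw, "mW,", 2 * led_cost_usd, "USD")

# A VGA (480 x 640) phone needs eight to ten LEDs:
for n in (8, 10):
    print(n * led_power_mw, "mW,", n * led_cost_usd, "USD")
```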

Thus, another really important point about the PenTile arrangement is the fact that the white (W) sub-pixels are basically transparent, which means they propagate the backlight with minimal losses (as opposed to the RGB sub-pixels, whose colored filters impose significant losses). The way all of this works is best described visually ("a picture is worth a thousand words," as they say). Let's start by considering a small 3 × 4 array of traditional RGB pixels and the corresponding array of PenTile pixels; initially we'll assume that none of the sub-pixels are lit up.

Now, let's assume that we want to fully light up the traditional RGB pixel in row 2 column 2, which will require us to turn each of its sub-pixels on 100%. We can achieve the same effect with the PenTile matrix by activating a collection of sub-pixels as shown below:

At first glance, the above diagram (and the two that follow) may appear to make complete sense. But let's pause just a moment and take the time to ponder this image in a little more detail. If we consider the percentage values associated with the RGBW portion of the image, for example, we see 50% red, 50% green, 4 × 12.5% = 50% blue, and 4 × 12.5% = 50% white. But what does this actually mean? If you instinctively understand this, then please continue reading; alternatively, if you're starting to feel a little puzzled, you might want to take a moment to see if you can figure it out for yourself, and then check out our Brain Boggler addendum to see how close you got.

As a second example, let's assume that we wish to fully light up the traditional RGB pixel in row 3 column 2; as before, this will require us to turn each of its sub-pixels on 100%. Once again, we can achieve the same effect with the PenTile matrix by activating a collection of sub-pixels as shown below:

As one final example, suppose that we wish to fully light up two of the traditional RGB pixels in row 2 column 2 and row 3 column 2. In this case, we can achieve the same effect with the PenTile matrix by activating a collection of sub-pixels as shown below (actually, this final example is something of a simplification, because the actual values for each RGBW sub-pixel will be tweaked [adjusted] based on the values of surrounding pixels.):

The fact that the RGBW sub-pixels have a larger aperture ratio than the traditional RGB sub-pixels – coupled with the fact that the W (white) sub-pixels propagate the backlight with minimal losses (as opposed to the red, green, and blue sub-pixels in which there are significant losses) – makes the PenTile matrix much more efficient as a whole.

And what do we mean by more efficient? Well, taking VGA resolution as an example, a PenTile screen will be 100% brighter than a traditional screen using the same number of backlight LEDs. Alternatively, a PenTile screen can achieve the same brightness as a traditional screen using half the LEDs (and therefore consuming half the power). ("But wait," we hear you cry, "how can we achieve a 100% increase in brightness or a 50% reduction in power when the previous image showed the center group of RGBW sub-pixels at 62.5%?" Well, once again, if you instinctively understand this then please continue reading; otherwise check out our Brain Boggler addendum.)

In addition to the concept of the PenTile matrix itself, the folks at Clairvoyante have developed corresponding sub-pixel image processing algorithms that take images intended for standard displays and convert them into equivalent images for PenTile matrix displays (this extra processing consumes only a few milliwatts). These algorithms can be made available in the form of intellectual property (IP) for folks to include on their custom integrated circuits, or as software algorithms to be run on a CPU/DSP.

But what about the fact that there are 1/3 fewer RGBW sub-pixels in the PenTile matrix as compared to the RGB sub-pixels in the traditional displays? Or, to put this another way, a single RGBW-based pixel replaces (occupies the same area as) two of the traditional RGB-based pixels in the horizontal direction. Let's remind ourselves as to what this looks like:

So, doesn’t the fact that the PenTile matrix effectively has half the number of pixels (2/3 the number of sub-pixels) as compared to a traditional RGB-based display affect the resolution of the display? Well, it all depends what we mean by resolution, doesn’t it? The conventional way of specifying resolution is to report the number of whole pixels forming the display. Thus, if the arrays in the above image represented the total display, we would say that the RGB version has a resolution of 4 × 4 pixels, while the RGBW has a resolution of only 2 × 4 pixels. Is this bad? Let's see...

Before we proceed, consider the following illustration, in which all of the pixels are fully lit and both screens are pure white (although the RGBW screen would be twice as bright using the same amount of power, as discussed above):

Now consider what would happen if we tried to present the maximum possible number of alternating black-and-white lines on our displays in both the vertical and horizontal directions as shown in the following illustration:

As we see, we can display exactly the same number of black-and-white lines on both displays. On this basis, we might say that both displays have the same "visual resolution", even though they have different numbers of pixels/sub-pixels. To put this another way, it's the amount of detail that can be perceived in the displayed image that is important, not the number of pixels/sub-pixels used to produce that image.

OK, black-and-white lines are one thing, but what about real-world images including text (both regular and italic fonts) and graphics? Well, the point here is that the human vision system can detect variations in luminance (the intensity of light per unit area of its source – what we poor folks may consider to be "brightness") at a much higher resolution than it can detect variations in color. As we observed during our earlier discussions, illuminating different combinations of RGBW sub-pixels results in the same brightness as their RGB counterparts, but we achieved this by spreading the colored sub-pixels around. Having said this, Clairvoyante's sub-pixel rendering algorithms perform an extra step that sharpens the image by moving the energy back onto the original pixel locations whenever possible. Furthermore, this sub-pixel color "spreading" can actually be advantageous in the case of things like italic fonts, because it can end up minimizing unwanted visual artifacts. The bottom line is that, by means of incredibly cunning tricks, Clairvoyante's sub-pixel rendering algorithms use this knowledge of the human visual system to present us with images that have the same perceived resolution as traditional displays:

In reality, of course, the above illustration has undergone so many manipulations before being presented here that it's impossible for us to use this as the basis for determining the quality of the final displayed images (not the least that nothing looks realistic when you zoom in to this level). In closing, therefore, let me just say that I hope to see a real-world demonstration of this technology in the not-so-distant future. And when I do, I will report my findings here (so come back often and tell all your friends about this website).

So, Just What Is a Pixel Anyway?
As we previously noted, the term "pixel" is a somewhat slippery customer when it comes to computers (especially if we're talking about the PenTile matrix presented in the previous topic). For example, home computers using traditional RGB pixels often used to arrive with a default setting of 800 (wide) × 600 (tall) pixels. However, if you right-mouse-click on your desktop (assuming you are running the Windows® operating system), select the Properties option from the ensuing pop-up menu, and then select the Settings tab from the resulting dialog, you will see other options such as 720 × 512 pixels, 1024 × 768 pixels, and so forth.

Obviously, the number of physical dots on your screen remains constant, but you can instruct your computer to use larger or smaller groups of these dots to represent a pixel. Increasing the number of pixels that you're using (by using more groups, each of which contains fewer dots) means that you can display finer details, but it also means that you require more memory on your video card.