Journal:    Dr. Dobb's Journal  Jan 1993 v18 n1 p127(6)
-------------------------------------------------------------------------
Title:     Yet another animation method. (Graphics Programming) (Column)
Author:    Abrash, Michael


Abstract:  A graphics technique called dirty-rectangle animation can
           overcome many of the performance problems encountered with VGA
           display adapters.  The technique is called dirty because the
           drawn-to areas of an off-screen buffer do not yet match what
           appears on-screen.  Rather than draw directly to the screen,
           images are drawn and stored in off-screen or non-display
           memory.  The bounding rectangles for the drawn-to areas are
           kept in a list; these are the dirty rectangles.  They are
           transferred to the screen once all drawing and redrawing is
           complete.  Drawing and redrawing directly to the screen
           creates excessive flicker and reduces the visual
           presentation's quality.  Dirty-rectangle animation improves
           image presentation because only the final pixel representation
           appears on-screen.  The technique is also faster because it
           limits the amount of interaction with VGA hardware.
-------------------------------------------------------------------------
Full Text:

As documented last month, we brought our pets with us when we moved out
here to Seattle.  At about the same time, our Golden Retriever, Sam,
observed his third birthday.  Sam is relatively intelligent, in the sense
that he is clearly smarter than a Banana Slug, although if he were in the
same room with Jeff Duntemann's dogs Mr. Byte and Chewy, there's a
reasonable chance that he would mistake them for something edible (a
category that includes rocks, socks, and a surprising number of things
too disgusting to mention), and Jeff would have to find a new source of
openings for his column.

But that's not important now.  What is important is that--and I am not
making this up--this morning I managed to find the one pair of socks Sam
hadn't chewed holes in.  And what's even more important is that after we
moved and Sam turned three, he calmed down amazingly.  We had been
waiting for this magic transformation since Sam turned one, the age at
which most puppies turn into normal dogs who lie around a lot, waking up
to eat their Science Diet (motto, "The dog food that costs more than the
average neurosurgeon makes in a year") before licking themselves and
going back to sleep.  When Sam turned one and remained hopelessly out of
control we said, "Goldens take two years to calm down," as if we had a
clue.  When he turned two and remained undeniably Sam we said, "Any day
now."  By the time he turned three, we were reduced to figuring that it
was only about seven more years until he expired, at which point we might
be able to take all the fur he had shed in his lifetime and weave
ourselves some clothes without holes in them, or quite possibly a house.

But miracle of miracles, we moved, and Sam instantly turned into the dog
we thought we'd gotten when we forked over $500--calm, sweet, and
obedient.  Weeks went by, and Sam was, if anything, better than ever.
Clearly, the change was permanent.

And then we took Sam to the vet for his annual check-up and found that he
had an ear infection.  Thanks to the wonders of modern animal medicine, a
$5 bottle of liquid restored his health in just two days.  And with his
health, we got, as a bonus, the old Sam.  You see, Sam hadn't changed.
He was just tired from being sick.  Now he once again joyously knocks
down any stranger who makes the mistake of glancing in his direction, and
will, quite possibly, be booked any day now on suspicion of homicide by
licking.

Okay, you give up.  What exactly does this have to do with graphics?  I'm
glad you asked.  The lesson to be learned from Sam The Dog With A Brain
The Size Of A Walnut is that while things may look like they've changed,
in fact they often haven't.  Take VGA performance.  If you buy a 486 with
a Super-VGA, you'll get performance that knocks your socks off,
especially if you run Windows.  Things are liable to be so fast that
you'll figure the Super-VGA has to deserve some of the credit.  Well,
maybe it does if it's a local-bus VGA.  But maybe it doesn't, even if it
is local bus--and it certainly doesn't if it's an ISA-bus VGA, because no
ISA-bus VGA can run faster than about 300 nanoseconds per access, and
VGAs capable of that speed have been common for at least a couple of
years now.  Your 486 VGA system is fast almost entirely because it has a
486 in it.  (486 systems with accelerators such as the ATI Ultra or
Diamond Stealth are another story altogether.)  Underneath it all, the
VGA is still painfully slow--and if you have an old VGA or IBM's original
PS/2 motherboard VGA, it's incredibly slow. The fastest ISA-bus VGA
around is two to twenty times slower than system memory, and the slowest
VGA around is as much as 100 times slower.  In the old days, the rule
was, "Display memory is slow, and should be avoided."  Nowadays, the rule
is, "Display memory is not quite so slow, but should still be avoided."

So, as I say, sometimes things don't change.  Of course, sometimes they
do change.  For example, in just 49 dog years, I fully expect to own at
least one pair of underwear without a single hole in it.  Which brings
us, deus ex machina and the creek don't rise, to yet another animation
method:  dirty-rectangle animation.

VGA Access Times

Actually, before we get to dirty rectangles, I'd like to take you through
a quick refresher on VGA memory and I/O access times.  I want to do this
partly because the slow access times of the VGA make dirty-rectangle
animation particularly attractive, and partly as a public service,
because even I was shocked by the results of some I/O performance tests I
recently ran.

Table 1 shows the results of the aforementioned I/O performance tests, as
run on two 486/33 Super-VGA systems under the Phar Lap 386|DOS-Extender.
(The systems and VGAs are unnamed because this is a not-very-scientific
spot test, and I don't want to unfairly malign, say, a VGA whose only sin
is being plugged into a lousy motherboard, or vice versa.)  Under Phar
Lap, 32-bit protected-mode apps run with full I/O privileges, meaning
that the OUTs I measured had the best official cycle times possible on
the 486:  10 cycles.  OUT takes 16 cycles in real mode on a 486, and a
mind-boggling 30 cycles in protected mode if running without full I/O
privileges (as is normally the case for protected-mode applications).
Basically, I/O is just plain slow on a 486.

Slow as 30 or even 10 cycles for an OUT is, one could only wish that VGA
I/O was actually that fast.  The fastest OUT in Table 1 is 26 cycles, and
the slowest is 126--this for an operation that's supposed to take 10
cycles.  To put this in context, MUL takes only 13 to 42 cycles, and a
normal MOV to or from system memory takes exactly one cycle on the 486.
In short, OUTs to VGAs are as much as 100 times slower than normal memory
accesses, and are generally two to four times slower than display memory
accesses, although there are exceptions.

Of course, VGA display memory has its own performance problems.  The
fastest ISA-bus VGA can, at best, support sustained write times of about
10 cycles per word-sized write; 15 or 20 cycles is more common, even for
relatively fast Super-VGAs; the worst case I've seen is 65 cycles per
byte.  However, intermittent writes, mixed with a lot of register- and
cache-only code, can effectively execute in one cycle because the VGA and
the 486 coprocess.  Display memory reads tend to take longer, because
coprocessing isn't possible--one microsecond is a reasonable rule of
thumb for VGA reads, although there's considerable variation.  So VGA
memory tends not to be as bad as VGA I/O, but Lord knows it isn't good.

In conclusion, OUTs, in general, are lousy on the 486 (and to think they
only took three cycles on the 286!).  OUTs to VGAs are particularly
lousy.  Display memory performance is pretty poor, especially for reads.
The conclusions are obvious, I would hope.  Structure your graphics code,
and, in general, all 486 code, to avoid OUTs.  For graphics, this
especially means using write mode 3 rather than the bit-mask register.
When you must use the bit mask, arrange drawing so that you can set the
bit mask once, then do a lot of drawing with that mask.  For example,
draw a whole edge at once, then the middle, then the other edge, rather
than setting the bit mask several times on each scan line to draw the
edge and middle bytes together.  Don't read from display memory if you
don't have to.  Write each pixel once and only once.
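
The bit-mask advice above can be illustrated with a minimal sketch in C.
It is not taken from any listing in this column, and it touches no
hardware:  set_bit_mask() and draw_bytes() are hypothetical stand-ins
that only count how often the mask would be set, so the two drawing
orders can be compared.  The rectangle is assumed to have one partial
byte on each edge and a run of full bytes in the middle.

```c
#include <assert.h>

/* Hypothetical stand-in for loading the VGA's bit-mask register.
   It only counts calls; a real version would do the port write(s). */
static unsigned long bit_mask_sets = 0;

static void set_bit_mask(unsigned char mask)
{
    (void)mask;
    bit_mask_sets++;
}

/* Hypothetical stand-in for drawing width_in_bytes bytes on scan line y,
   starting at byte column byte_x, using the current mask. */
static void draw_bytes(int y, int byte_x, int width_in_bytes)
{
    (void)y; (void)byte_x; (void)width_in_bytes;
}

/* Scan-line order:  left edge, middle, and right edge on every line,
   so the mask is set three times per scan line. */
unsigned long fill_by_scan_line(int top, int height, int left_byte,
                                int middle_bytes,
                                unsigned char left_mask,
                                unsigned char right_mask)
{
    int y;
    assert(height >= 0 && middle_bytes >= 0);
    bit_mask_sets = 0;
    for (y = top; y < top + height; y++) {
        set_bit_mask(left_mask);
        draw_bytes(y, left_byte, 1);
        set_bit_mask(0xFF);
        draw_bytes(y, left_byte + 1, middle_bytes);
        set_bit_mask(right_mask);
        draw_bytes(y, left_byte + 1 + middle_bytes, 1);
    }
    return bit_mask_sets;
}

/* Edge-at-a-time order:  the whole left edge, then the whole middle,
   then the whole right edge, so the mask is set three times in total. */
unsigned long fill_by_edges(int top, int height, int left_byte,
                            int middle_bytes,
                            unsigned char left_mask,
                            unsigned char right_mask)
{
    int y;
    assert(height >= 0 && middle_bytes >= 0);
    bit_mask_sets = 0;
    set_bit_mask(left_mask);
    for (y = top; y < top + height; y++)
        draw_bytes(y, left_byte, 1);
    set_bit_mask(0xFF);
    for (y = top; y < top + height; y++)
        draw_bytes(y, left_byte + 1, middle_bytes);
    set_bit_mask(right_mask);
    for (y = top; y < top + height; y++)
        draw_bytes(y, left_byte + 1 + middle_bytes, 1);
    return bit_mask_sets;
}
```

For a rectangle 100 scan lines tall, the scan-line order sets the mask
300 times and the edge-at-a-time order sets it 3 times; the drawing work
itself is identical.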

[TABULAR DATA OMITTED]

It is indeed a strange concept:  The key to fast graphics is staying away
from the graphics adapter as much as possible.

Dirty-rectangle Animation

The relative slowness of VGA hardware is part of the appeal of the
technique that I call "dirty-rectangle" animation, in which a complete
copy of the contents of display memory is maintained in off-screen system
(nondisplay) memory.  All drawing is done to this system buffer.  As
offscreen drawing is done, a list is maintained of the bounding
rectangles for the drawn-to areas; these are the "dirty" rectangles,
dirty in the sense that they do not match the contents of the screen.
After all drawing for a frame is completed, all the dirty rectangles for
that frame are copied to the screen in a burst, and then the cycle of
off-screen drawing begins again.
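
The cycle just described might be sketched in C roughly as follows.
This is an illustration under stated assumptions, not Listing One:  the
names (Rect, fill_rect, copy_dirty_rects_to_screen) are made up for the
example, a 320x200 one-byte-per-pixel layout is assumed, and the screen
array is an ordinary system-memory stand-in for display memory so the
sketch can run anywhere.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_WIDTH    320
#define SCREEN_HEIGHT   200
#define MAX_DIRTY_RECTS 100

/* Rectangle with exclusive right and bottom edges. */
typedef struct { int left, top, right, bottom; } Rect;

unsigned char system_buffer[SCREEN_WIDTH * SCREEN_HEIGHT];
unsigned char screen[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for
                                                       display memory */
Rect dirty_rects[MAX_DIRTY_RECTS];
int num_dirty_rects = 0;

/* Draws a solid rectangle into the system buffer only, and records
   its bounding rectangle in the dirty-rectangle list. */
void fill_rect(const Rect *r, unsigned char color)
{
    int y;
    assert(num_dirty_rects < MAX_DIRTY_RECTS);
    assert(r->left >= 0 && r->left <= r->right &&
           r->right <= SCREEN_WIDTH);
    assert(r->top >= 0 && r->top <= r->bottom &&
           r->bottom <= SCREEN_HEIGHT);
    for (y = r->top; y < r->bottom; y++)
        memset(&system_buffer[(size_t)y * SCREEN_WIDTH + r->left],
               color, (size_t)(r->right - r->left));
    dirty_rects[num_dirty_rects++] = *r;
}

/* Copies every dirty rectangle from the system buffer to the screen
   in one burst, then empties the list for the next frame. */
void copy_dirty_rects_to_screen(void)
{
    int i, y;
    for (i = 0; i < num_dirty_rects; i++) {
        const Rect *r = &dirty_rects[i];
        for (y = r->top; y < r->bottom; y++) {
            size_t offset = (size_t)y * SCREEN_WIDTH + r->left;
            memcpy(&screen[offset], &system_buffer[offset],
                   (size_t)(r->right - r->left));
        }
    }
    num_dirty_rects = 0;
}
```

Nothing reaches the screen array until copy_dirty_rects_to_screen() is
called, which is the point:  erasing and redrawing can happen any number
of times in the system buffer, and only the finished pixels are copied.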

Why, exactly, would we want to go through all this complication, rather
than simply drawing to the screen in the first place?  The reason is
visual quality.  If we were to do all our drawing directly to the screen,
there'd be a lot of flicker as objects were erased and then redrawn.
Similarly, overlapped drawing done with the painter's algorithm (in which
farther objects are drawn first, so that nearer objects obscure them)
would flicker as farther objects were visible for short periods.  With
dirty-rectangle animation, only the finished pixels for any given frame
ever appear on the screen; intermediate results are never visible.
Figure 1 illustrates the visual problems associated with drawing directly
to the screen; Figure 2 shows how dirty-rectangle animation solves these
problems.

Well, then, if we want good visual quality, why not use page flipping?
For one thing, not all adapters and modes support page flipping.  The CGA
and MCGA don't, and neither do the VGA's 640x480 16-color or 320x200
256-color modes, or many Super-VGA modes.  In contrast, all adapters
support dirty-rectangle animation.  Another advantage of dirty-rectangle
animation is that it's generally faster.  While it may seem strange that
it would be faster to draw off screen and then copy the result to the
screen, that is often the case, because dirty-rectangle animation usually
reduces the number of times the VGA's hardware needs to be touched,
especially in 256-color modes.  This reduction comes about because when
dirty rectangles are erased, it's done in system memory, not in display
memory, and since most objects move a good deal less than their full
width (that is, the new and old positions overlap), display memory is
written to fewer times than with page flipping.  (In 16-color modes, this
is not necessarily the case, because of the parallelism obtained from the
VGA's planar hardware.)  Also, read/modify/write operations are performed
in fast system memory rather than slow display memory, so display memory
rarely needs to be read.  This is particularly good because display
memory is generally even slower for reads than for writes.

Also, page flipping wastes a good deal of time waiting for the page to
flip at the end of the frame.  Dirty-rectangle animation never needs to
wait for anything because partially drawn images are never present in
display memory.  Actually, in one sense, partially drawn images are
sometimes present because it's possible for a rectangle to be partially
drawn when the scanning raster beam reaches that part of the screen.
This causes the rectangle to appear partially drawn for one frame,
producing a phenomenon I call "shearing."  Fortunately, shearing tends
not to be particularly distracting, especially for fairly small images,
but it can be a problem when copying large areas.  This is one area in
which dirty-rectangle animation falls short of page flipping, because
page flipping has perfect display quality, never showing anything other
than a completely finished frame.  Similarly, dirty-rectangle copying may
take two or more frame times to finish, so even if shearing doesn't
happen, it's still possible to have the images in the various dirty
rectangles show up nonsimultaneously.  In my experience, this latter
phenomenon is not a serious problem, but do be aware of it.

Dirty Rectangles in Action

Listing One (page 140) demonstrates dirty-rectangle animation.  This is a
very simple implementation, in several respects.  For one thing, it's
written entirely in C, and animation fairly cries out for assembly
language.  For another thing, it uses far pointers, which C often handles
with less than optimal efficiency, especially because I haven't used
library functions to copy and fill memory.  (I did this so the code would
work in any memory model.)  Also, Listing One doesn't attempt to coalesce
rectangles so as to perform a minimum number of display-memory accesses;
instead, it copies each dirty rectangle to the screen, even if it
overlaps with another rectangle, so some pixels get copied multiple
times.  Listing One runs pretty well, considering all of its failings; on
my 486/33, ten 11x11 images animate at a very respectable clip.
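
To get a feel for how much redundant copying overlapping rectangles
cause, here is a small sketch in C that counts the pixels two dirty
rectangles have in common.  The Rect type and function names are
assumptions made for this example, not part of Listing One, and the
sketch only measures the overlap; it does not coalesce anything.

```c
#include <assert.h>

/* Rectangle with exclusive right and bottom edges. */
typedef struct { int left, top, right, bottom; } Rect;

/* Number of pixels covered by one rectangle. */
long rect_area(const Rect *r)
{
    assert(r->left <= r->right && r->top <= r->bottom);
    return (long)(r->right - r->left) * (r->bottom - r->top);
}

/* Number of pixels covered by both rectangles; these are the pixels
   that get copied twice if each rectangle is copied separately. */
long overlap_area(const Rect *a, const Rect *b)
{
    int left   = a->left   > b->left   ? a->left   : b->left;
    int top    = a->top    > b->top    ? a->top    : b->top;
    int right  = a->right  < b->right  ? a->right  : b->right;
    int bottom = a->bottom < b->bottom ? a->bottom : b->bottom;

    if (right <= left || bottom <= top)
        return 0;               /* no overlap at all */
    return (long)(right - left) * (bottom - top);
}
```

For instance, if an 11x11 image at (10,20) moves two pixels right and
one pixel down, its old and new rectangles total 242 pixels, of which 90
lie in both rectangles and would be copied to the screen twice.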

One point I'd like to make is that although the system-memory buffer in
Listing One has exactly the same dimensions as the screen bitmap, that's
not a requirement, and there are some good reasons not to make the two
the same size.  For example, if the system buffer is bigger than the
screen, it's possible to pan the visible area around the system buffer.
Or, alternatively, the system buffer can be just the size of a desired
window, representing a window into a larger, virtual buffer.  We could
then draw the desired portion of the virtual bitmap into the
system-memory buffer, then copy the buffer to the screen, and the effect
will be of having panned the window to the new location.
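
The first of those ideas, panning the visible area around a larger
system buffer, might look something like the following sketch in C.
The 640x400 buffer size, the names, and the flat arrays are all
assumptions for illustration (a buffer this large would not fit in a
single 64K segment in real mode), and the screen array again stands in
for display memory.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 200
#define BUFFER_WIDTH  640   /* system buffer is larger than the screen */
#define BUFFER_HEIGHT 400

unsigned char system_buffer[BUFFER_WIDTH * BUFFER_HEIGHT];
unsigned char screen[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for
                                                       display memory */

/* Copies the screen-sized portion of the system buffer whose upper
   left corner is at (pan_x, pan_y) to the screen.  Changing pan_x and
   pan_y from frame to frame pans the view around the buffer. */
void copy_view_to_screen(int pan_x, int pan_y)
{
    int y;
    assert(pan_x >= 0 && pan_x + SCREEN_WIDTH <= BUFFER_WIDTH);
    assert(pan_y >= 0 && pan_y + SCREEN_HEIGHT <= BUFFER_HEIGHT);
    for (y = 0; y < SCREEN_HEIGHT; y++)
        memcpy(&screen[(size_t)y * SCREEN_WIDTH],
               &system_buffer[(size_t)(pan_y + y) * BUFFER_WIDTH + pan_x],
               SCREEN_WIDTH);
}
```

Note that panning this way redraws the whole visible area every time the
view moves, which is one reason a smaller viewing window helps.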

Another argument in favor of a small viewing window is that it restricts
the amount of display memory actually drawn to.  Restricting the display
memory used for animation reduces the total number of display-memory
accesses, which in turn boosts overall performance; it also improves the
performance and appearance of panning, in which the whole window has to
be redrawn or copied. If you keep a close watch, you'll notice that many
high-performance animation games similarly restrict their full-featured
animation area to a relatively small region.  Often, it's hard to tell
that this is the case, because the animation region is surrounded by
flashy digitized graphics and by items such as scoreboards and status
screens, but look closely and see if the animation region in your
favorite game isn't smaller than you thought.

Next month, I'll put the important parts of dirty-rectangle animation
into assembler, and I'll coalesce dirty rectangles to minimize
display-memory accesses--and maybe, just maybe, I'll do some panning.
Then we'll see what kind of stuff dirty-rectangle animation is really
made of.

3-D Reading

As anyone who's been following this column for a while knows, I'm keenly
interested in 3-D graphics.  Thus, it is with considerable pleasure that
I'm able to report that Programming in 3 Dimensions: 3-D Graphics, Ray
Tracing, and Animation by Christopher D. Watkins and Larry Sharp (M&T
Books, 1992) is good stuff.  There's a fair amount of theory, and lots of
3-D implementation, from modeling and scenes to ray tracing and finally,
animation.  The animation is the precomputed, playback kind, of the
Autodesk Animator sort, and while it lacks the on-the-fly flexibility of
the real-time animation we've developed in this column, my oh my, it does
look good.  If you get this book, I strongly suggest you get the disk as
well.  Then run ANIMATE.EXE, with BOUNCE as the input file, and
marvel that you now have, in source form, all the software needed to
implement that animation.  Ten years ago, I'll bet you couldn't have
produced this level of fully rendered, real-time playback animation for
less than $50,000 in hardware and software; now, a couple of thousand
will easily do the trick.  What a great time this is to be a programmer!
Recommended.