ray proudly presents:
chunky 2 planar graphics
the way to do it
introduction
hi freaks...i'm ray of tscc.
this is my first tutorial, so let me explain some basic things.
first of all, all the examples will be plain 68000 asm code, and sometimes it'll
get quite machine-dependent. so my tutorials will only be useful on the st and its
"brothers" - don't even think about taking the following descriptions as valid on a
tt or something, but i think that's clear anyway.
second, i expect you to understand the basics of 68k programming, because i don't
have the time to teach you 68k coding in all its sometimes 'bizarre' details, e.g.
the different addressing modes (but watch out, 'cause i'm going to release a
68k-tut in one of the next series).
and if there's something you don't understand, just read it 2 or more times
until you've got it - if that doesn't help, try asking me.
well, if you wanna contact me don't hesitate:
let's get it on
let me say a few words about the purpose of this tutorial.
today i'll try to explain how a technique called 'chunky 2 planar graphics' (c2p),
which is the secret of all those modern demos, can be realized on the st.
now you might ask: what the hell does chunky to planar mean ?!?
since i think you know how the st's bitplane screen works (if that isn't the case,
read the next paragraph first) you'll probably know how hard it is to set a pixel
and, even worse, how slow.
chunky 2 planar 'conversion' means that the bits in the bitplanes are set according to
byte values in a so called 'chunky-buffer', where every byte represents the color
of one single plot on the screen (this buffer could be compared to the pc's vga
memory of a 320x200x256 screen - don't get me wrong, intel really suckz ! - it's just
an example).
first you set up your effect or whatever you want in that chunky-buffer by
simply moving the according bytes into it, and in a further step the chunky-buffer
is converted to bitplane data ('planar' simply means bitplane graphics).
how exactly this works will be described later on...
a brief description of the bitplanes
skip this paragraph if you already know how the st's screen is arranged...
as already mentioned, it's hard to set pixels 'by hand' because the st's screen
is split into bitplanes.
ok, i'll be a bit more exact, but notice: i'm only covering st low rez 320x200x16.
in this resolution the screen is divided into 4 'bitplanes' - the four
'corresponding' bits of these bitplanes together select the color in which one pixel
on the screen appears.
let's take a look at the first 4 words of screen memory.
this diagram of the screen's pixel coordinates should help make it clear:

        0                             319
      0 *------------------------------+
        |                              |
        |                              |
    199 +------------------------------+

assuming we want to set the pixel in the upper left corner of the screen (0/0), marked
with ' * ', to color $D (=13; if you don't know what hexadecimal is, please stop reading),
we have to manipulate the highest bit of the first 4 words in screen memory.
bitplane 1 takes the lowest bit of the color number and bitplane 4 the highest one:

    $D (hex) = %1101 (bin)

screen memory:

                     bit  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    1.bitplane / 1.word    1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    2.bitplane / 2.word    0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    3.bitplane / 3.word    1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    4.bitplane / 4.word    1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    1.bitplane / 5.word    .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

got it ? no - then let's do some more examples:
just imagine we want to set the pixel right beside the one before (i.e. 1/0) to the same
color. then we'd just manipulate bit 14 of the 4 words.
but now be careful: what if we wanted to set pixel 16/0 (the 17th on the screen) ?
the answer is simple: since each of the 4 words only holds 16 bits (=16 pixels)
we have to skip over to the next 4 words in memory (5,6,7,8).
i think you're getting the idea, right ? and i think now you'll see why setting
pixels is really slow and complicated on the st: you have to 'bit set'
4 places in screen memory just for one pixel - and i think you know that bit
operations are kinda hard for a byte/word/long oriented cpu.
now guys, i think it's time for some code at the end of this paragraph (let's go
back to the first example, where we wanted to set the upper left pixel to color $D):

        move.l  screen,A0                       ; assume the screen addr is stored in 'screen'
        clr.w   D0                              ; start at offset 0
        ori.w   #%1000000000000000,0(A0,D0.w)   ; set bit 15 of the 1st word of bitplane 1
        andi.w  #%0111111111111111,2(A0,D0.w)   ; clear bit 15 of the 1st word of bitplane 2
        ori.w   #%1000000000000000,4(A0,D0.w)   ; set bit 15 of the 1st word of bitplane 3
        ori.w   #%1000000000000000,6(A0,D0.w)   ; set bit 15 of the 1st word of bitplane 4

hey - we've just set the first pixel ! now try what happens if you e.g. move #8 into D0
instead of clearing it...
(i gave this example to show how hard it is to set a pixel)
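
for any other pixel the recipe stays the same, only the offset and the bit number
change. here's a small c model of it (c instead of 68000 code so it can be tried on
any machine). it's just a sketch under a few assumptions: `screen` is a plain array
standing in for the 32000 bytes of st low rez screen memory, `plot` is a made-up
name, and bitplane 1 is taken to hold the lowest bit of the color number.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_LINE 80   /* 160 bytes per scanline = 80 words */

/* stand-in for st low rez screen memory: 200 lines * 80 words */
static uint16_t screen[200 * WORDS_PER_LINE];

/* set pixel (x/y) to a 4 bit color by touching one bit in each of 4 words */
void plot(int x, int y, unsigned color)
{
    assert(x >= 0 && x < 320 && y >= 0 && y < 200);

    /* every run of 16 pixels owns 4 words, one per bitplane */
    uint16_t *group = &screen[y * WORDS_PER_LINE + (x / 16) * 4];
    uint16_t mask = (uint16_t)(0x8000u >> (x % 16));   /* pixel 0 = bit 15 */

    for (int plane = 0; plane < 4; plane++) {
        if (color & (1u << plane))
            group[plane] |= mask;               /* same job as ori.w  */
        else
            group[plane] &= (uint16_t)~mask;    /* same job as andi.w */
    }
}
```

plot(0, 0, 0xD) only touches bit 15 of words 1-4, and plot(16, 0, 0xD) does the
same in words 5-8. four read-modify-write accesses per pixel - that's the slowness
this paragraph is talking about.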
the secret of c2p, how it works and why it's so quick
the only significant thing to say at the top of this paragraph is that the whole,
let's call it, secret of the c2p conversion is the 68k's movep instruction.
what movep actually does is move a word or longword to memory one byte at a time,
writing only every second byte. let's take a look at a little example:

        move.l  #$FE23D2E0,D0
        movep.l D0,0(A0)        ; this instruction spans 64 bits of memory, but it
                                ; only writes the upper bytes of 4 consecutive words
                                ; (already smelling what we'll do ? ;) )
after the instruction:
0(A0) = $FE
2(A0) = $23
4(A0) = $D2
6(A0) = $E0
or if we start at an odd address (assuming A0 holds an even one):

        move.l  #$FE23D2E0,D0
        movep.l D0,1(A0)        ; now movep uses the lower 8 bits of the data bus,
                                ; or simpler: now it covers the lower bytes of
                                ; 4 consecutive words
after the instruction:
1(A0) = $FE
3(A0) = $23
5(A0) = $D2
7(A0) = $E0
if we didn't have movep we'd have to do:

        move.l  #$FE23D2E0,D0
        move.b  D0,6(A0)        ; lowest byte first...
        lsr.l   #8,D0
        move.b  D0,4(A0)
        lsr.l   #8,D0
        move.b  D0,2(A0)
        lsr.l   #8,D0
        move.b  D0,(A0)         ; ...highest byte last

i think then we could forget our chunky 2 planar graphics - thanx motorola !
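
if you want to check the byte pattern without an st at hand, here's a tiny c model
of the register-to-memory form of movep.l. it's only a sketch: `movep_l` is a
made-up name and `dest` simply stands for the effective address d(An).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* model of "movep.l Dn,d(An)": the 4 bytes of value go to every
   second byte of memory, highest byte first */
void movep_l(uint8_t *dest, uint32_t value)
{
    assert(dest != NULL);

    dest[0] = (uint8_t)(value >> 24);
    dest[2] = (uint8_t)(value >> 16);
    dest[4] = (uint8_t)(value >> 8);
    dest[6] = (uint8_t)value;
    /* dest[1], dest[3], dest[5] and dest[7] are left untouched */
}
```

called with an even address this fills the upper bytes of 4 consecutive words,
called with the address + 1 it fills the lower bytes, just like the two examples
above.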
and now read on very carefully, because i'll describe the core of the c2p
conversion:
to do a c2p conversion we need, as mentioned above, first a 'chunky-buffer'
holding the chunky byte data and second a c2p table, which is finally used for the
conversion.
don't be afraid, i'll give you some more detail...first of all let's talk about the
chunky buffer:
to get the conversion anywhere near reasonably quick we need to double the pixels
horizontally and vertically. this means that one 'chunky-byte' in fact represents
2x2 pixels on the screen, and it also means that we get a virtual rez of 160x100
'chunky-pixels'. that's why the chunky-buffer is sized 160x100 bytes = 16000 bytes
(notice: the size of the chunky buffer may vary, but to keep it simple i'm talking
about a fullscreen conversion).
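
compare this with the bitplane mess from before: in the chunky buffer a pixel is
one byte at y * 160 + x and that's it. a minimal c sketch (the names `chunky` and
`chunky_plot` are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

#define CHUNKY_W 160
#define CHUNKY_H 100

static uint8_t chunky[CHUNKY_W * CHUNKY_H];   /* 16000 bytes */

/* one byte = one chunky pixel = 2x2 pixels on the real screen */
void chunky_plot(int x, int y, unsigned color)
{
    assert(x >= 0 && x < CHUNKY_W && y >= 0 && y < CHUNKY_H);

    chunky[y * CHUNKY_W + x] = (uint8_t)(color & 0x0F);   /* 16 colors only */
}
```

no masks, no bit numbers, one write per pixel. that's the whole point of drawing
the effect in chunky format first.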
now some words on the c2p table and the actual conversion:
let's assume the first 4 bytes of the chunky buffer have the values
$03,$04,$0D,$02 (these 4 values represent 4 chunky-pixels or 4 double pixels
on the screen).
because only the lower 4 bits of these values are used (16 colors!) we're able to
'pack' those 4 bytes into one word the following way:

        lea     chunkybuffer,A0 ; A0 now points to the chunky buffer
        moveq   #0,D0           ; clear D0.l
        move.w  (A0)+,D0        ; D0 = $0304 - 1st 2 values out of the chunky buffer
        lsl.w   #4,D0           ; D0 = $3040 - shift them up one nibble
        or.w    (A0)+,D0        ; D0 = $3D42 - that's the four 1st values in one word

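
to play with the packing without an st at hand, here's the same three-instruction
sequence as a c model. a sketch only: `pack_quad` is a made-up name, and the two
big-endian word reads of the 68000 are spelled out by hand so it behaves the same
on any cpu.

```c
#include <assert.h>
#include <stdint.h>

/* pack 4 chunky bytes the way move.w / lsl.w #4 / or.w does it */
uint16_t pack_quad(const uint8_t *p)
{
    assert((p[0] | p[1] | p[2] | p[3]) < 16);   /* only 4 bit colors allowed */

    uint16_t w0 = (uint16_t)((p[0] << 8) | p[1]);   /* move.w (A0)+,D0 */
    uint16_t w1 = (uint16_t)((p[2] << 8) | p[3]);   /* the word after it */

    return (uint16_t)((w0 << 4) | w1);              /* lsl.w #4, then or.w */
}
```

for the bytes $03,$04,$0D,$02 this returns $3D42, the same value D0 ends up with.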
but stop, values 2 and 3 have been swapped somehow - wouldn't it be simpler if D0
contained $34D2, like in the chunky buffer ??
that brings us to the next element of the c2p conversion - the c2p table (which is,
apart from the low resolution of c2p graphics, a further disadvantage compared to
planar graphics: the high memory consumption !).
the chunky 2 planar table holds all the possible bitplane combinations of 4
double-pixels on the screen. since each of those pixels can have 16 colors it's
16 * 16 * 16 * 16 * 4 bytes = 256 kb huge ! (* 4 because we get longwords out of it).
we don't have to unswap anything at runtime: when we precalc the c2p table we just
need to remember that pixels 2 and 3 of that 4 pixel 'quad' are swapped, because
we're going to use the value of D0 as the offset (i.e. 4 chunky bytes are used as an
offset into the c2p table, which holds the bitplane data for 8 planar = 4 chunky
pixels):

        lsl.l   #2,D0           ; get the longword alignment
        lea     c2p,A1          ; set up the address of the c2p tbl
        move.l  screen,A2       ; A2 points to our screen
        move.l  0(A1,D0.l),D0   ; move the needed value into D0
        movep.l D0,0(A2)        ; set the first 4 doublepixels on the screen
        movep.l D0,160(A2)      ; do the scanline below

wow, this little program block has set 4 double pixels on the screen ! thanks to the
movep instruction it has also skipped the lower bytes of the bitplane words,
meaning that the other byte of each bitplane word isn't affected.
what you do now is get the next four double-pixels and convert them to the next
screen offset, and so on. but notice that you have to skip one scanline when you've
done 160 double-pixels, because the pixels are also doubled vertically.
so, that's all. did you get it ? i hope so...
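
one thing that hasn't been shown is how the table itself gets filled, so here's one
possible way to precalc it, written as a c sketch. the assumptions: the function
names are made up, bitplane 1 is taken to hold the lowest bit of the color number,
and the table index carries the pixels in the swapped nibble order from above
(pixel 1, pixel 3, pixel 2, pixel 4).

```c
#include <assert.h>
#include <stdint.h>

/* bitplane data for one table index. the highest byte of the result
   belongs to bitplane 1, the lowest byte to bitplane 4 - the order in
   which movep.l hands them out */
uint32_t c2p_entry(unsigned index)
{
    assert(index < 65536u);

    unsigned pix[4];
    pix[0] = (index >> 12) & 15;   /* 1st chunky pixel */
    pix[2] = (index >> 8) & 15;    /* 3rd (swapped by the packing) */
    pix[1] = (index >> 4) & 15;    /* 2nd (swapped by the packing) */
    pix[3] = index & 15;           /* 4th */

    uint32_t entry = 0;
    for (int plane = 0; plane < 4; plane++) {
        unsigned bits = 0;
        for (int i = 0; i < 4; i++)
            if (pix[i] & (1u << plane))
                bits |= 0xC0u >> (2 * i);   /* 2 bits per pixel = doubled */
        entry |= (uint32_t)bits << (24 - 8 * plane);
    }
    return entry;
}

/* fill all 65536 longwords (256 kb) */
void build_c2p_table(uint32_t *table)
{
    for (unsigned index = 0; index < 65536u; index++)
        table[index] = c2p_entry(index);
}
```

for the packed example value $3D42 the entry comes out as $CCC33C0C. if you
generate the table on another machine and load it on the st, remember to store the
longwords in 68000 byte order (highest byte first).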
and now i think it's time to say a few words about the demo:
what it does is draw and scroll some red/yellow bars by
1. setting up a pattern in the chunkybuffer

CHUNKYBUFFER:
        REPT    1000            ; 1000 * 16 bytes = 16000 bytes
        DC.L    $00010203
        DC.L    $04050607
        DC.L    $08090A0B
        DC.L    $0C0D0E0F
        ENDR

2. doing the c2p conversion
(do this 100 times - one for each 2 scanlines)
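
        move.w  #20-1,D7        ; convert one scanline = 20 * 8 doublepixels
SETPIXEL:
        moveq   #0,D0
        move.w  (A0)+,D0        ; pack 4 chunky bytes into one word
        lsl.w   #4,D0
        or.w    (A0)+,D0
        lsl.l   #2,D0           ; longword offset into the c2p tbl
        move.l  0(A1,D0.l),D0
        movep.l D0,0(A2)        ; upper bytes of the 4 bitplane words
        movep.l D0,160(A2)      ; same on the scanline below
        moveq   #0,D0
        move.w  (A0)+,D0        ; the next 4 chunky bytes
        lsl.w   #4,D0
        or.w    (A0)+,D0
        lsl.l   #2,D0
        move.l  0(A1,D0.l),D0
        movep.l D0,1(A2)        ; lower bytes of the same 4 words
        movep.l D0,161(A2)      ; same on the scanline below
        addq.w  #8,A2           ; on to the next 16 screen pixels
        dbra    D7,SETPIXEL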
.
.
.
3. scrolling the chunkybuffer

        lea     CHUNKYBUFFER,A3         ; destination
        lea     CHUNKYBUFFER+1,A4       ; source, one chunky pixel further on
        move.w  #16000-2,D4             ; move 15999 bytes
SCROLLLOOP:
        move.b  (A4)+,(A3)+
        dbra    D4,SCROLLLOOP
        clr.b   (A3)                    ; clear the last pixel on the screen

with this method the last pixel on the screen is cleared on every pass, so please
don't be surprised if the pattern disappears after a while.
4. repeat step 2 until space is pressed
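
to see steps 2 and 3 in one piece, here's a c model of one frame of the demo.
it's a sketch under assumptions, not the demo source: `screen` stands for 32000
bytes of st low rez memory, `chunky` for the 160x100 buffer, `table` for an
already filled c2p table of 65536 longwords, and all function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNKY_W   160
#define CHUNKY_H   100
#define LINE_BYTES 160   /* bytes per scanline in st low rez */

/* step 2: convert the whole chunky buffer to the screen */
void c2p_convert(uint8_t *screen, const uint8_t *chunky, const uint32_t *table)
{
    assert(screen != NULL && chunky != NULL && table != NULL);

    for (int y = 0; y < CHUNKY_H; y++) {
        uint8_t *line = screen + 2 * y * LINE_BYTES;   /* every 2nd scanline */

        for (int x = 0; x < CHUNKY_W; x += 4) {
            const uint8_t *p = chunky + y * CHUNKY_W + x;
            assert((p[0] | p[1] | p[2] | p[3]) < 16);

            /* same value as move.w / lsl.w #4 / or.w produces */
            unsigned index = ((unsigned)p[0] << 12) | ((unsigned)p[2] << 8)
                           | ((unsigned)p[1] << 4) | p[3];
            uint32_t planes = table[index];

            /* even quads hit the upper bytes of a 4 word group,
               odd quads the lower bytes of the same group */
            uint8_t *dest = line + (x / 8) * 8 + (x / 4) % 2;

            for (int plane = 0; plane < 4; plane++) {
                uint8_t bits = (uint8_t)(planes >> (24 - 8 * plane));
                dest[2 * plane] = bits;                /* movep.l D0,0/1(A2)     */
                dest[2 * plane + LINE_BYTES] = bits;   /* movep.l D0,160/161(A2) */
            }
        }
    }
}

/* step 3: move everything one chunky pixel to the left, clear the last one */
void scroll_chunky(uint8_t *chunky)
{
    assert(chunky != NULL);

    memmove(chunky, chunky + 1, CHUNKY_W * CHUNKY_H - 1);
    chunky[CHUNKY_W * CHUNKY_H - 1] = 0;
}
```

one frame is then c2p_convert() followed by scroll_chunky(). note that the outer
loop steps two scanlines at a time, which is the 'skip one scanline' mentioned
earlier.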
the whole thing runs at about 8.5 fps on a plain st (fullscreen) - which is, i
think, quite 'quick' for a fullscreen fine-scroller (imagine how you would code
this with planar graphics). but keep in mind that this demo is very basic and not
optimized, 'cause i wanted to keep it simple. just play around with XSIZE and
YSIZE: these 2 values determine the size of the chunky-screen. if you use a
smaller effect window you might get higher framerates, of course :)
but please notice that XSIZE must be divisible by 16 - otherwise you'll get
crap on your screen!
last remarks
some words on optimizing....one way to optimize this algo a little bit
is to preshift every color up 2 bits (i mean your texture or image colors or
whatever) before you store them in the chunky buffer. that saves the lsl.l #2,D0
when calculating the offset into the c2p tbl.
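
here's a little c check of why the preshift works. it's a sketch with a made-up
function name, and it rests on one observation that isn't spelled out above: with
preshifted bytes the shift by 4 has to be done on the whole longword (lsl.l #4),
because the top 2 bits of the first color no longer fit into 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* table offset from 4 chunky bytes that already hold color << 2 */
uint32_t table_offset_preshifted(const uint8_t *p)
{
    /* every byte must be a 4 bit color shifted up by 2 */
    assert(((p[0] | p[1] | p[2] | p[3]) & ~0x3Cu) == 0);

    uint32_t w0 = ((uint32_t)p[0] << 8) | p[1];   /* move.w (A0)+,D0 */
    uint32_t w1 = ((uint32_t)p[2] << 8) | p[3];   /* the word after it */

    /* lsl.l #4 and or.w: the result is already the byte offset,
       i.e. the packed index * 4 */
    return (w0 << 4) | w1;
}
```

for the colors 3,4,$D,2 (stored as $0C,$10,$34,$08) this gives $F508, which is
$3D42 * 4, so the table lookup can follow right away.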
there are even ways to get rid of the chunky buffer completely.
believe me, it shouldn't be that hard to figure out yourself, and by
the way it could be really good practice for you. ok, i'll
give you a little hint: just code your effects so that they directly set
4 double-pixels instead of getting the data out of a chunky buffer...
another way of speeding it up is to simply shrink the chunky buffer when
your effect doesn't take up the whole screen.
just keep coding, you'll get it ;)
what's for next time ? hmm...i'm thinking of doing something on raycasting, at
least on the basics, so keep looking for my tuts, he he...
hope to see you next time,
ray - Aug. 2000
eof