.tHE .sIRIUS .cYBERNETICS .cORPORATION


UCM 21
--------------------------------------------------------------------------------
                   CHUNK 2 PLANAR GRAPHICS - THE WAY TO DO IT
--------------------------------------------------------------------------------


 introduction
--------------

Hi freaks... I'm Ray, the new coder of .tSCc.

This is my first tutorial so let me explain some basic things:

First of all I have to tell you that all the examples will be plain 68000 asm
code, and sometimes it'll get quite machine-dependent. So my tutorials will only
be useful on the ST and its "brothers" - don't even think of taking the
following descriptions as valid on a TT or something, but I think that's clear
anyway.

Secondly, I'm expecting you to understand the basics of 68k programming, because
I don't have the time to teach you 68k coding in all its sometimes 'bizarre'
details, like the different addressing modes (but watch out, 'cause I'm going
to release a 68k tutorial in one of the next magazines).
And if there's something you don't understand, just read it two or more times
until you've got it - if that doesn't help, just ask me.

So, if you wanna contact me don't hesitate:

1) e-mail  : reimund.dratwa@freenet.de

2) homepage: http://rd-developments.de.gs  (my page before I joined .tSCc.)
   (don't forget to grab the newest tutorial!)



 let's get it on
-----------------

Let me say a few words about the purpose of this tutorial. Today I'll try to
explain how a technique called 'chunk 2 planar graphics' (c2p for short), which
is the secret of all those modern demos, can be realized on the ST.
Now you might ask, what the hell does chunk to planar mean?!?

Since I think you know how the ST's bitplane screen works (if that isn't the
case, read the next paragraph first) you'll probably know how hard it is to set
a pixel and, even worse, how slow.
Chunk 2 planar 'conversion' means that the bits in the bitplanes are set
according to the byte values in a so-called chunky buffer, where every byte
holds the color of one single plot on the screen (this buffer can be compared
to the PC's VGA memory in a 320x200x256 screen - don't get me wrong, Intel
really suckz! - I just wanted to give an example).

First you set up your effect, or whatever you want, in that chunky buffer by
simply moving the right bytes into it, and in a further step the chunky buffer
is converted to bitplane data ('planar' simply means bitplane graphics). How
this works exactly will be described later on...


 a brief description of the bitplanes
--------------------------------------

Skip this paragraph if you already know how the ST's screen is arranged.
As already mentioned, it's hard to set pixels 'by hand' because the ST's screen
is split into bitplanes.
Ok, I'll be a bit more exact, but notice: I'm only covering ST low rez
320x200x16. In this resolution the screen is divided into 4 so-called
'bitplanes'. Every four 'corresponding' bits of these bitplanes together give
the color in which one pixel on the screen appears.
So let's take a look at the first 4 words of screen memory.

The diagram should make it clear:


 screen
                0                                       319
                -------------------------------------------
                  0 -*                -
                    -                 -
                    -                 -
                    -                 -
                    -                 -
                                      .
                                      .
                                      .
                199 -                 -
                -------------------------------------------

Assuming we wanted to set the pixel in the upper left corner of the screen
(0/0), marked with *, in color $D (=13; if you don't know what hexadecimal is
just stop reading), this would mean we would have to manipulate the highest bit
of each of the first 4 words in screen memory. Bitplane 1 holds the lowest bit
of the color number and bitplane 4 the highest one.

$D (hex) = %1101 (bin)

 screen memory:
                bit  15                          7                           0
--------------------------------------------------------------------------------
bitpl/word 1   | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
--------------------------------------------------------------------------------
bitpl/word 2   | 0 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
--------------------------------------------------------------------------------
bitpl/word 3   | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
--------------------------------------------------------------------------------
bitpl/word 4   | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
--------------------------------------------------------------------------------

Got it? No? Then let's do some more examples:
Just imagine we wanted to set the pixel right beside the one before (i.e. 1/0)
in the same color. Then we'd just manipulate bit 14 of the 4 words.

But now be careful, what if we wanted to set pixel 16/0 (the 17th on the
screen)? The answer is simple: since every one of the 4 words only holds 16
bits (=16 pixels) we just have to skip over to the next 4 words in memory
(5,6,7,8).

I think you're getting the idea, right? And I think you'll now see why setting
pixels is really slow on the ST, as well. It's because you have to 'bit set' 4
places in screen memory just for one pixel - and I think you know that bit
operations are kinda hard for a byte/word/long aligned cpu.
Now guys, I think it's time for some code at the end of this paragraph (let's go
back to the first example, where we wanted to set the upper left pixel in
color $D):

  move.l screen,A0           * assume the screen addr is stored in 'screen'
  clr.w  D0                * start at offset 0
  ori.w  #%1000000000000000,0(A0,D0.w) * set   bit 15 of the 1st word of bitpl 1
  andi.w #%0111111111111111,2(A0,D0.w) * clear bit 15 of the 1st word of bitpl 2
  ori.w  #%1000000000000000,4(A0,D0.w) * set   bit 15 of the 1st word of bitpl 3
  ori.w  #%1000000000000000,6(A0,D0.w) * set   bit 15 of the 1st word of bitpl 4

Hey - we've just set the first pixel! Now try what happens if you e.g. move #8
into D0 instead of clearing it...
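
If you want to set a pixel anywhere on the screen, the steps of this paragraph
can be put into one little routine. This is only a sketch and not part of the
demo: the label names and the register usage are my own assumptions, and it
assumes the usual low rez layout (160 bytes per scanline, bitplane 1 holding
bit 0 of the color).

* PLOT - set one pixel in ST low rez (hypothetical helper routine)
* in:      D0.w = x (0-319), D1.w = y (0-199), D2.w = color (0-15)
*          A0   = screen address
* trashes: D0-D4/A1

PLOT:       mulu    #160,D1         * y * 160 bytes per scanline
            move.w  D0,D3
            lsr.w   #4,D3           * x / 16 = number of the 16 pixel group
            lsl.w   #3,D3           * * 8 bytes per group (4 words)
            add.w   D3,D1           * D1 = offset of the bitpl 1 word
            lea     0(A0,D1.w),A1   * A1 points to the first of the 4 words
            not.w   D0
            and.w   #15,D0          * D0 = 15 - (x mod 16) = bit number
            moveq   #0,D3
            bset    D0,D3           * D3 = mask with only that bit set
            move.w  D3,D4
            not.w   D4              * D4 = inverted mask for clearing
            moveq   #4-1,D0         * 4 bitplanes
PLOTPLANE:  lsr.w   #1,D2           * next color bit -> carry
            bcc.s   PLOTCLR
            or.w    D3,(A1)+        * color bit = 1: set the pixel's bit
            bra.s   PLOTNEXT
PLOTCLR:    and.w   D4,(A1)+        * color bit = 0: clear the pixel's bit
PLOTNEXT:   dbra    D0,PLOTPLANE
            rts

Four read-modify-write accesses plus all the offset and mask calculation for
one single pixel - that's the price of the bitplanes.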


 the secret of c2p, how it works and why it's so quick
-------------------------------------------------------

The only significant thing I can mention at the top of this paragraph is that
the whole, let's call it, secret of the c2p conversion is the 68k's movep
instruction. What movep actually does is split a word or long value up into
bytes and move them to every second byte in memory, so let's take a look at a
little example:

  move.l  #$FE23D2E0,D0
  movep.l D0,(A0)         * this instruction kinda covers 64 bit or in other
                          * words only the upper bytes of 4 following words in
                          * memory (already smelling what we'll do ? ;) )

after the instruction:

    (A0) = $FE
   2(A0) = $23
   4(A0) = $D2
   6(A0) = $E0


or if we start at an odd addr. (assuming A0 holds an even one):

   move.l  #$FE23D2E0,D0
   movep.l D0,1(A0)      * now movep will use the lower 8 bits of the data
                         * bus, or more simply: now it covers the lower
                         * bytes of 4 following words

after the instruction:

   1(A0) = $FE
   3(A0) = $23
   5(A0) = $D2
   7(A0) = $E0


if we didn't have movep we'd have to do:

        move.l  #$FE23D2E0,D0
        move.b  D0,6(A0)
        lsr.l   #8,D0
        move.b  D0,4(A0)
        lsr.l   #8,D0
        move.b  D0,2(A0)
        lsr.l   #8,D0
        move.b  D0,(A0)

I think then we could forget our chunk 2 planar graphics - thanx motorola!

And now  go on reading very carefully because now I'll describe the  core of the
c2p conversion:

Well, what we need to do a c2p conv. is, as mentioned above, 1st a 'chunky
buffer' holding the chunky byte data and 2nd a c2p table which is finally used
for the conversion.
Don't be afraid, I'll give you some more detail... first of all let's talk about
the chunky buffer:

 to get the conversion anywhere near reasonably quick we need to double the
 pixels horizontally and vertically, which means that one 'chunky-byte' in fact
 represents 2x2 pixels on the screen and that we'll get a virtual rez of
 160x100 'chunky-pixels'. Of course that's why the chunky buffer is sized
 160x100 bytes = 16000 bytes (notice: the size of the chunky buffer may be
 variable, but to keep it simple I'm talking about the fullscreen conversion).
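
 Compared to the bitplanes, setting a 'pixel' in the chunky buffer is a piece of
 cake. A minimal sketch (the register usage is just my assumption, and
 CHUNKYBUFFER is the 16000 byte buffer described above):

* in: D0.w = x (0-159), D1.w = y (0-99), D2.b = color (0-15)

            lea     CHUNKYBUFFER,A0
            mulu    #160,D1         * y * 160 bytes per chunky line
            add.w   D0,D1           * + x = offset into the buffer
            move.b  D2,0(A0,D1.w)   * one byte = one 2x2 block on the screen

 No masks, no bit numbers, just one byte write - that's why effects are so much
 easier to code in a chunky buffer.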

Now some words on the c2p table and the actual conversion:

 now let's assume the first 4 bytes of the chunky buffer have the values
 $03,$04,$0D,$02 (these 4 values represent 4 chunky-pixels or 4 double pixels
 on the screen).
 Because only the lower 4 bits of these values are used (16 colors!) we're
 able to 'pack' those 4 bytes into one word the following way:

  lea     chunkybuffer,A0 * A0 now points to the chunky buffer

  moveq   #0,D0           * clear D0.l
  move.w  (A0)+,D0        * D0 = $0304 - 1st 2 values out of the chunky buffer
  lsl.w   #4,D0           * D0 = $3040 - shift them up one nybble
  or.w    (A0)+,D0        * D0 = $3D42 - that's the four 1st values in one word

 But stop, values 2 and 3 have been swapped somehow. Wouldn't it be simpler if
 D0 contained $34D2, like in the chunky buffer??
 That brings us to the next element of the c2p conv. - the c2p table (whose
 high memory consumption is, apart from the low resolution of c2p graphics, a
 further disadvantage compared to planar graphics!)
 The chunk 2 planar table holds all the possible bitplane combinations of 4
 double-pixels on the screen. And since every one of those pixels can have 16
 colors it's 16 * 16 * 16 * 16 * 4 bytes = 256 kb huge! (* 4 because we get
 longwords out of it).
 But when we precalc this c2p tbl we need to remember that pixels 2 and 3 of
 that 4 pixel 'quad' are swapped, so the table sorts out the swap for free -
 because now we're going to use the value of D0 as the offset:

     lsl.l   #2,D0              * get the longword alignment
     lea     c2p,A1             * set up the address of the c2p tbl
     move.l  screen,A2          * A2 points to our screen

     move.l  0(A1,D0.l),D0      * move the needed value into D0
     movep.l D0,0(A2)           * set the first 4 doublepixels on the screen
     movep.l D0,160(A2)         * do the scanline below

 Wow, this little program block has set 4 double pixels on the screen! Thanks
 to the movep instruction it has also skipped the lower bytes of the bitplane
 words, meaning that the other byte of each bitplane word isn't affected.
 What you'll do now is get the next four double-pixels and convert them to the
 next screen offset (the lower bytes, 1(A2) and 161(A2)), and so on. But notice
 that you have to skip one scanline when you've done 160 double-pixels because
 the pixels are also doubled vertically. So, that's all. Did you get it? I hope
 so...
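
 The precalculation of the table isn't shown in this article, so here's one way
 it could be done. Take it as a sketch under my own assumptions (label names,
 register usage, 'c2p' being 256 kb of free memory), not as the routine of the
 demo. It's slow, but it only has to run once. The index nibbles are
 p0 p2 p1 p3 (pixels 2 and 3 swapped, see above), and every entry holds the
 bytes for bitpl 1 to 4 from top to bottom, just like movep.l wants them.

MAKEC2P:    lea     c2p,A1          * 65536 longwords = 256 kb
            moveq   #0,D7           * D7.w = table index (p0 p2 p1 p3)
C2PENTRY:   move.w  D7,D1
            and.w   #$F00F,D1       * p0 and p3 stay where they are
            move.w  D7,D2
            and.w   #$0F00,D2
            lsr.w   #4,D2           * p2 down to bits 7-4
            or.w    D2,D1
            move.w  D7,D2
            and.w   #$00F0,D2
            lsl.w   #4,D2           * p1 up to bits 11-8
            or.w    D2,D1           * D1 = p0 p1 p2 p3 (screen order)
            moveq   #0,D5           * D5 collects the 4 bitplane bytes
            moveq   #0,D3           * D3 = bitplane = color bit number
C2PPLANE:   moveq   #12,D4
            add.w   D3,D4           * D4 = that bit in the leftmost pixel
            moveq   #4-1,D6         * 4 pixels
C2PPIXEL:   lsl.l   #2,D5           * make room for one double pixel
            btst    D4,D1
            beq.s   C2POFF
            addq.l  #3,D5           * color bit set: both screen bits on
C2POFF:     subq.w  #4,D4           * same color bit, next pixel to the right
            dbra    D6,C2PPIXEL
            addq.w  #1,D3
            cmp.w   #4,D3
            bne.s   C2PPLANE
            move.l  D5,(A1)+        * bitpl 1 byte on top, bitpl 4 at the bottom
            addq.w  #1,D7
            bne.s   C2PENTRY        * D7 wraps to 0 after 65536 entries
            rts

 For the example from above (index $3D42 = colors 3,4,$D,2 on the screen) this
 gives the longword $CCC33C0C: %11001100 for bitpl 1, %11000011 for bitpl 2,
 %00111100 for bitpl 3 and %00001100 for bitpl 4.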


 And now I think it's time to say a few words about the demo:

 what it does is

 1. setting up a pattern in the chunkybuffer

    CHUNKYBUFFER:   REPT 1000           * 16000 bytes
                    DC.L $00010203
                    DC.L $04050607
                    DC.L $08090A0B
                    DC.L $0C0D0E0F
                    ENDR

 2. doing the c2p conversion

(do this 100 times - once for each pair of scanlines)

                    move.w  #20-1,D7            * convert one scanline
    SETPIXEL:                                   * = 20 * 8 doublepixels
                    moveq   #0,D0
                    move.w  (A0)+,D0
                    lsl.w   #4,D0
                    or.w    (A0)+,D0
                    lsl.l   #2,D0
                    move.l  0(A1,D0.l),D0

                    movep.l D0,0(A2)
                    movep.l D0,160(A2)


                    moveq   #0,D0
                    move.w  (A0)+,D0
                    lsl.w   #4,D0
                    or.w    (A0)+,D0
                    lsl.l   #2,D0
                    move.l  0(A1,D0.l),D0

                    movep.l D0,1(A2)
                    movep.l D0,161(A2)

                    addq.w  #8,A2               * next 16 pixels on the screen
                    dbra    D7,SETPIXEL
                        .
                        .
                        .


 3. scrolling the chunkybuffer

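                    lea     CHUNKYBUFFER,A3     * destination: start of buffer
                    lea     CHUNKYBUFFER,A4
                    addq.l  #1,A4               * source: one byte further on

                    move.w  #16000-1,D4         * move all 16000 bytes (the last
     SCROLLLOOP:    move.b  (A4)+,(A3)+         * one comes from just behind
                    dbra    D4,SCROLLLOOP       * the end of the buffer!)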


 4. repeat step 2 until space is pressed

 The whole thing runs at about 8.5 fps on a plain ST - which is, I think, quite
 fast for a fullscreen fine-scroller (imagine how you would code this with
 planar graphics X( ). But keep in mind this demo is very basic and not
 optimized, 'cause I wanted to keep it simple.
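
 Step 2 only shows the conversion of one line, so here's how the whole frame
 could be wrapped up. Again just a sketch: the NEXTLINE label, the use of D6 and
 the three setup lines are my assumptions, the rest is the loop from step 2.

                    lea     CHUNKYBUFFER,A0     * chunky source
                    lea     c2p,A1              * precalculated c2p table
                    move.l  screen,A2           * planar destination
                    move.w  #100-1,D6           * 100 chunky lines
    NEXTLINE:       move.w  #20-1,D7            * 20 * 8 doublepixels per line
    SETPIXEL:       moveq   #0,D0
                    move.w  (A0)+,D0
                    lsl.w   #4,D0
                    or.w    (A0)+,D0
                    lsl.l   #2,D0
                    move.l  0(A1,D0.l),D0
                    movep.l D0,0(A2)            * upper bytes, this scanline
                    movep.l D0,160(A2)          * ... and the scanline below
                    moveq   #0,D0
                    move.w  (A0)+,D0
                    lsl.w   #4,D0
                    or.w    (A0)+,D0
                    lsl.l   #2,D0
                    move.l  0(A1,D0.l),D0
                    movep.l D0,1(A2)            * lower bytes, this scanline
                    movep.l D0,161(A2)          * ... and the scanline below
                    addq.w  #8,A2               * next 16 pixels on the screen
                    dbra    D7,SETPIXEL
                    lea     160(A2),A2          * skip the scanline already done
                    dbra    D6,NEXTLINE

 After 20 passes A2 has moved on by 160 bytes, ie. exactly one scanline, and
 that's the one the 160(A2)/161(A2) writes have already filled. That's why
 another 160 bytes are skipped before the next chunky line starts.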


 last remarks
--------------

Some words on optimizing... one way to optimize this algo a little bit is to
preshift every color up 2 bits (I mean your texture or image colors or whatever)
before you store them in the chunky buffer. That saves the lsl.l #2,D0 when
calculating the offset into the c2p tbl. Just note that the lsl.w #4,D0 then has
to become a lsl.l #4,D0, otherwise the top 2 bits of the first value get
shifted out of the word.

There are even ways to drop the chunky buffer completely. But believe me, it
shouldn't be that hard to figure it out yourself, and by the way it could be
really good practice for you.
Ok, I'll give you a little hint. Just write a rout that directly sets a 4 double
pixel 'quad' without getting the data out of a chunky buffer...

Another way of speeding it up is to just shrink the chunky buffer when your
effect doesn't take up the whole screen.
Just keep coding, you'll get it ;)
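
With preshifted colors the conversion of one 'quad' might look like this. It's
a sketch, assuming every byte in the chunky buffer already holds color * 4 (ie.
$0C,$10,$34,$08 instead of $03,$04,$0D,$02) and A0/A1/A2 are set up as before:

  moveq   #0,D0
  move.w  (A0)+,D0        * D0 = $0C10 - 1st 2 preshifted values
  lsl.l   #4,D0           * D0 = $C100 - must be .l now, a value of $3C (color
                          * $F) would reach up into bits 16-17
  or.w    (A0)+,D0        * D0 = $F508 - that's $3D42 * 4, the finished offset
  move.l  0(A1,D0.l),D0   * no lsl.l #2,D0 needed any more
  movep.l D0,0(A2)
  movep.l D0,160(A2)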
Just keep coding, you'll get it ;)

What's for next time? Hmm... I'm thinking of  doing something  on raycasting, at
least on the basics, so keep looking for my tuts, he he...

hope to see you next time,

.tSCc.                                                                       Ray
--------------------------------------------------------------------------------
