Z80 Assembly - Simple Tasks

This section contains some elementary routines that perform tasks you are likely to need in the future. They also give you a bit more practice in building more and more efficient structures. From now on I will gradually introduce a new way of commenting, because it doesn’t make much sense to comment every single line. My aim is to help you get better at understanding other people’s sources. It is essential that you are able to understand these routines on your own, so take the time to think about how they work. Before discussing these tasks, I have to introduce some additional instructions.

Working with data

All computers basically do one single thing: they manipulate data. This section introduces what the Z80 offers for this purpose.

Addition and subtraction

The Z80 processor is able to directly add or subtract both 8 and 16-bit numbers. These operations are performed by four simple instructions: add, sub, adc and sbc. (If you have been reading linearly, you have already seen add in action.) Except for sub, they all have two operands, and the result is written back into the first one. The number of bits is determined by the first operand: 8-bit operations always involve the A register, while the 16-bit versions rely on HL/IX/IY (16-bit adc and sbc only accept HL). In the case of sub, there is only one operand, whose value is always subtracted from A, and the result is naturally written back into A, too. The four instructions do the following:

       add op1,op2  –  op1=op1+op2
       sub op2      –  A=A–op2
       adc op1,op2  –  op1=op1+(op2+carry)
       sbc op1,op2  –  op1=op1–(op2+carry)

As I said, the first operand is one of the four registers mentioned above (A, HL, IX or IY). What the second operand can be depends on the number of bits. In 8-bit operations OP2 can be an 8-bit constant, any 8-bit general purpose register (A, B, C, D, E, H or L, plus the undocumented IXH, IXL, IYH and IYL) or an indirectly addressed byte of the memory ((HL), (IX+n), (IY+n), but not (BC) or (DE)!). Note that F, the flag register, cannot be used as an operand. However, with 16-bit operations you can only use BC, DE, SP or what you used as OP1 (i. e. add hl,ix is not possible, contrary to add hl,hl), no constants or data in the memory.
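
To make these rules more tangible, here are a few instructions that exist and a few that do not. This is only an illustrative list; the constants and displacements are arbitrary.

       add a,5                        ; 8-bit constant
       add a,(hl)                     ; Byte addressed by HL
       adc a,(ix+3)                   ; Byte addressed by IX plus a displacement
       sub b                          ; A=A–B, only one operand is written
       add hl,de                      ; 16-bit addition
       add ix,ix                      ; OP2 may be the same register as OP1
       sbc hl,bc                      ; 16-bit subtraction with carry
                                      ; add hl,ix does not exist (mixing HL and IX)
                                      ; add a,(bc) does not exist ((BC) is not allowed here)
                                      ; add hl,500 does not exist (no 16-bit constants)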

Carry is the value of the carry flag: either 0 or 1. You might ask why it is useful to include the value of a flag in some operations, since you have not seen such a thing in other programming languages. This is just another thing that is naturally handled by high-level languages, but has to be programmed manually in assembly (one example is adding numbers wider than 16 bits). Carry usually holds the (n+1)th bit of the result of arithmetic operations. For instance, if you add two 8-bit numbers, the result generally needs 9 bits to be stored. The name ‘carry’ suggests that this 9th bit might be of some use later, which is why it is carried around. You will see some examples for its usage in the following sections.
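
As a first taste, the following sketch calculates the same sum as add hl,bc, using only 8-bit operations. You would not write this in practice (the single instruction is shorter and faster, it does not destroy A, and it leaves the flags in a slightly different state); it only shows how the carry links the two halves of the addition.

       ld a,l
       add a,c                        ; Adding the lower bytes, the 9th bit goes to the carry
       ld l,a                         ; LD does not touch the flags
       ld a,h
       adc a,b                        ; Adding the upper bytes plus the carry from below
       ld h,a                         ; HL now holds HL+BC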

Bit-level operations

You may have already encountered logical operations if you have programming experience, e. g. when examining conditions like “(i=1) and (j=2)”. On the CPU level, they are performed by the logical and, or and xor instructions. All three need an 8-bit operand, which can be anything the 8-bit arithmetic instructions accept (just about anything). Naturally, the A register is always involved, both as one of the factors and as the holder of the result. The individual bits are completely independent of each other in these operations, and the carry is always cleared after one of these instructions is executed (so if you want to do a 16-bit subtraction without carry, you can still do it with sbc by putting or a or and a before it, so the carry is guaranteed to be zero).
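
The trick mentioned in the parentheses looks like this in practice. The sketch assumes that the two values are already in HL and DE.

       or a                           ; A is unchanged (A or A = A), but the carry is cleared
       sbc hl,de                      ; HL=HL–DE, since the carry is known to be zero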

When or is performed, each bit of the result will be one if at least one of the factors had its corresponding bit set:


        %00101110
        %10011101  i. e. only two zeroes give zero, all the other combinations result in one
        ---------
        %10111111 (result)

In turn, and makes each bit of the result be zero if at least one of the factors had its corresponding bit reset:


        %00101110
        %10011101  i. e. only two ones give one, all the other combinations result in zero
        ---------
        %00001100 (result)

The third one, xor (which comes from the expression “exclusive or”) makes a bit of the result set if and only if one of the factors had its corresponding bit set, while the same bit of the other factor was reset prior to execution:


        %00101110
        %10011101  i. e. inequality gives one, while equality gives zero as a result
        ---------
        %10110011 (result)

You are going to use these instructions a lot. To close their discussion, here is a little trick: you can load zero into A by executing xor a, i. e. by xoring A with itself (think about why this is true). This is useful as it is faster and smaller than ld a,0. The only drawback is that it modifies the flags, but you do not usually need to preserve them for a long time anyway.
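
A very common use of these instructions is changing selected bits of A with a constant mask while leaving the rest untouched. The sketch below starts from the value used in the examples above; the masks are arbitrary, and I assume that your assembler accepts the % prefix for binary constants.

       ld a,%00101110
       and %00001111                  ; Keeps the lower four bits, clears the rest: %00001110
       or %10000000                   ; Sets bit 7: %10001110
       xor %00000011                  ; Inverts bits 0 and 1: %10001101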

Summing one-byte numbers

Let’s assume that we have five 8-bit numbers stored beginning at the address $1000. We want to calculate their sum. For now, we assume that the sum itself will also remain within 8 bits (i. e. it will be less than 256). An inefficient but straightforward solution could be the following:


       ld b,0                         ; Initialising the partial sum
       ld a,($1000)                   ; Reading the first number into the accumulator
       add a,b                        ; Adding it to the sum
       ld b,a                         ; Writing the sum back to B
       ld a,($1001)                   ; Reading the second number into the accumulator
       add a,b                        ; Adding it to the sum
       ld b,a                         ; Writing the sum back to B
       ld a,($1002)                   ; Adding the 3rd value
       add a,b
       ld b,a
       ld a,($1003)                   ; Adding the 4th value
       add a,b
       ld b,a
       ld a,($1004)                   ; And adding the 5th value, too
       add a,b
       ld b,a

If you have the eyes of an eagle and noticed that something odd is going on at the beginning, you might ask why we do not simply ld the first value into B instead of zeroing it first. The answer is that this is not the final version. The first improvement could be using indirect addressing instead of directly addressing every single byte of the data.


       ld a,0                         ; Initialising the partial sum
       ld hl,$1000                    ; Initialising the pointer to the first byte of the data
       add a,(hl)                     ; Adding the current data to the sum
       inc hl                         ; Proceeding to the next byte of the data
       add a,(hl)
       inc hl
       add a,(hl)
       inc hl
       add a,(hl)
       inc hl
       add a,(hl)
       inc hl
       ld b,a                         ; We want the result in B

There are two advantages of this solution. First, you can directly add the data to the sum without having to load it first into a register. Naturally, addition implies that the sum is either in A (8-bit) or in HL/IX/IY (16-bit). The other, more important change is that now it is much easier to modify the program if you decide to sum another consecutive series of five numbers in the memory – you only need to modify the initialisation part. Besides these advantages, the code also improved in terms of speed and size. However, there is still a redundant inc hl at the end... Why? If you look at the code, you can see five add-inc pairs. As these pairs are completely identical, we could as well put them in a loop.


       ld c,5                         ; Setting up the loop counter
       ld a,0                         ; Initialising the partial sum
       ld hl,$1000                    ; Initialising the pointer to the first byte of the data
     Repeat:                          ; Adds the current byte to the sum and proceeds
       add a,(hl)
       inc hl
       dec c                          ; Handling the loop (without djnz this time)
       jr nz,Repeat
       ld b,a                         ; We want the result in B

This also explains why we could not use ld for the first byte: we had to separate the initialisation from the actual calculation. Doing so is in general inefficient, but it helps to keep the code cleaner, which is useful in the development stage – but it is certainly worth optimising the code prior to releasing it. Introducing the loop also enables us to easily modify the number of bytes involved without having to add or remove code. However, when it is possible to use B as the loop counter, it is advisable to do so:


       ld b,5                         ; Setting up the loop counter
       ld a,0                         ; Initialising the partial sum
       ld hl,$1000                    ; Initialising the pointer to the first byte of the data
     Repeat:                          ; Adds the current byte to the sum and proceeds
       add a,(hl)
       inc hl
       djnz Repeat
       ld b,a                         ; We want the result in B

After going through all this, it is important to note that using 8 bits for the sum is not really practical. Let’s extend it to 16 bits then. And to make it even better, I will also make the loop counter 16-bit:


       ld ix,$1000                    ; Pointer to the data
       ld hl,0                        ; The beginning sum
       ld bc,500                      ; Loop counter
       ld d,0                         ; The upper 8 bits of the numbers to be added
     Repeat:                          ; Adds the current byte to the sum and proceeds
       ld e,(ix)
       add hl,de                      ; Note that D=0
       inc ix
       dec bc                         ; This instruction does not modify the flags!
       ld a,b                         ; Verifying whether the counter reached zero
       or c                           ; The zero flag is set if both bytes of BC are zero
       jr nz,Repeat

If you still don’t understand how this 16-bit counter works, try to remember the principle of the or operation: if the result is zero, then both values must have been zero. To perform the operation “B or C” the value of B has to be loaded into A, because all the logical operations suppose one of the factors to be in A. Get used to this method, because it is frequently applied in practice.
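
Indexed addressing through IX is comfortable, but it is slower than going through HL. As an alternative sketch (not necessarily the best possible solution), the sum can be kept in DE and the pointer in HL, with the carry propagated into the upper byte by hand:

       ld hl,$1000                    ; Pointer to the data
       ld de,0                        ; The beginning sum
       ld bc,500                      ; Loop counter
     Repeat:
       ld a,(hl)
       add a,e                        ; Adding the current byte to the lower byte of the sum
       ld e,a
       jr nc,NoCarry                  ; If there was no overflow, the upper byte is left alone
       inc d                          ; Otherwise the carry is added to the upper byte
     NoCarry:
       inc hl
       dec bc
       ld a,b
       or c
       jr nz,Repeat                   ; The result is in DE at the end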

Adding large numbers

Let’s say we have two 16-byte (128-bit) numbers in the memory. The first is at $1000, the second at $1010. Their sum is to be put into the 16 bytes starting at $1020. All the numbers start with the least significant byte. The magic word is carry in this case, which holds the bits transferred between the byte boundaries.


       ld ix,$1000                    ; Pointer to the first number
       ld b,16                        ; The number of bytes in each number
       or a                           ; A dummy logical instruction, used to clear the carry
     Repeat:                          ; Adds 8 bits on each iteration
       ld a,(ix)
       adc a,(ix+$10)                 ; Add with carry (the 9th bit of the previous addition)
       ld (ix+$20),a                  ; Storing the current byte of the result
       inc ix
       djnz Repeat

Note that neither ld, nor the 16-bit inc, nor djnz alters the flags, and this is exactly why the loop works: the carry produced by adc survives until the adc of the next iteration. The loop would not work if the CPU designers had not thought about these cases.
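
Subtraction works the same way, with sbc taking the borrow from one byte to the next. The following sketch subtracts the number at $1010 from the one at $1000 under the same assumptions as above (16-byte numbers, least significant byte first, result at $1020):

       ld ix,$1000                    ; Pointer to the first number
       ld b,16                        ; The number of bytes in each number
       or a                           ; Clearing the carry
     Repeat:                          ; Subtracts 8 bits on each iteration
       ld a,(ix)
       sbc a,(ix+$10)                 ; Subtract with carry (the borrow of the previous step)
       ld (ix+$20),a                  ; Storing the current byte of the result
       inc ix
       djnz Repeat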

Moving data blocks

This is a typical programming task; you will certainly need to move data around in your programs. Let’s start with the elementary memory block movement. The aim is to move 500 bytes of data from the address $2000 to $4000. Fortunately, the Z80 processor is capable of performing the actual copying with a single instruction:


       ld hl,$2000                    ; Pointer to the source
       ld de,$4000                    ; Pointer to the destination
       ld bc,500                      ; Number of bytes to move
       ldir                           ; Moves BC bytes from (HL) to (DE)

The ldir instruction is a composite instruction, which is equivalent to the following piece of code:


     Repeat:
       ld a,(hl)                      ; Getting the current byte
       inc hl
       ld (de),a                      ; Storing it
       inc de
       dec bc                         ; Handling the loop
       ld a,b
       or c
       jr nz,Repeat

The only difference (besides the rather obvious fact that ldir is much smaller and much faster than the loop above) is that the A register is not involved when using ldir. For the programmers’ convenience there is also an instruction called ldi which does almost the same thing except that it moves only one byte (but still updates all three registers: HL, DE and BC!).
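
One possible use of ldi is unrolling the copy loop by hand. After ldi the parity/overflow flag stays set as long as BC has not reached zero, so it can control the loop. The sketch below assumes that the byte count is a multiple of four (500 is):

       ld hl,$2000                    ; Pointer to the source
       ld de,$4000                    ; Pointer to the destination
       ld bc,500                      ; Number of bytes to move (a multiple of 4)
     Repeat:
       ldi                            ; Four bytes are moved in each iteration
       ldi
       ldi
       ldi
       jp pe,Repeat                   ; P/V is set while BC is not zero

This takes more space than ldir, but it spends somewhat less time on each byte. Note that jp is needed here, because jr cannot test this flag.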

The ldir instruction can also be used to fill each byte of an area of the memory with a given value. I might as well call it a little trick, but it isn’t actually a complicated one. The following code fills 500 bytes starting at $2000 with 150.


       ld hl,$2000                    ; Pointer to the source
       ld de,$2001                    ; Pointer to the destination
       ld bc,499                      ; Number of bytes to move
       ld (hl),150                    ; The value to fill
       ldir                           ; Moves BC bytes from (HL) to (DE)

What happens? If you think it over, you can realise that in each iteration the preceding byte is copied into the current byte, which results in step by step copying the value of 150 at the beginning into each byte of the region. This happens because the two regions – the source and the destination – overlap. Now you could start wondering about what to do if you really want to move these 500 bytes one byte ahead instead of filling them with the same value. The solution is simple: you have to start from the end of the region, and go backwards. The instruction to do this is lddr, which does almost the same as ldir, with the only difference that it decrements HL and DE in each iteration. The example to move 500 bytes from the address of $2000 to $2001:


       ld hl,$21F3                    ; Pointer to the end of the source (500=$1F4)
       ld de,$21F4                    ; Pointer to the end of the destination
       ld bc,500                      ; Number of bytes to move
       lddr                           ; Moves BC bytes from (HL) to (DE) backwards

Note that if the overlapping is the other way around, i. e. the destination is at the lower address, you have to use ldir. Think about this before proceeding to the next section.
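
For comparison, here is a sketch of this opposite case: moving 500 bytes from $2001 down to $2000. Each byte is read before the copy could overwrite it, so going forward is safe here.

       ld hl,$2001                    ; Pointer to the beginning of the source
       ld de,$2000                    ; Pointer to the beginning of the destination
       ld bc,500                      ; Number of bytes to move
       ldir                           ; Moves BC bytes from (HL) to (DE)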

Manipulating data blocks
 

Simple conditions

After getting to know some elementary methods, we can start thinking about practical problems. The next task is a bit more complicated: there are 200 numbers (8-bit signed integers) stored from the address $1000, and we want to separate the negative and the non-negative numbers. We want to create two separate lists: that of the non-negative numbers at $2000 and the negative values at $3000. A possible solution could look like this:


       ld hl,$1000                    ; Pointer to the data
       ld ix,$2000                    ; Pointer to the non-negative list
       ld iy,$3000                    ; Pointer to the negative list
       ld b,200                       ; Loop counter
     Repeat:
       ld a,(hl)                      ; Getting and checking the sign of the current element
       inc hl
       cp $80
       jr nc,Negative
       ld (ix),a                      ; Storing a non-negative value
       inc ix
       jr Continue
     Negative:
       ld (iy),a                      ; Storing a negative value
       inc iy
     Continue:
       djnz Repeat

A comment for programmers of TI calculators: the IY register is reserved for the system, so you can only use it if you save its value and disable interrupts. In this example, you could use DE instead of IY, but in a normal everyday situation you will most probably find all your registers full of important data, particularly the general purpose registers (A, B, C, D, E, H, L)...
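
A sketch of the same loop with DE in place of IY, as suggested above. Only the lines dealing with the negative list are different.

       ld hl,$1000                    ; Pointer to the data
       ld ix,$2000                    ; Pointer to the non-negative list
       ld de,$3000                    ; Pointer to the negative list
       ld b,200                       ; Loop counter
     Repeat:
       ld a,(hl)                      ; Getting and checking the sign of the current element
       inc hl
       cp $80
       jr nc,Negative
       ld (ix),a                      ; Storing a non-negative value
       inc ix
       jr Continue
     Negative:
       ld (de),a                      ; Storing a negative value
       inc de
     Continue:
       djnz Repeat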

Sorting

This time I would like to show a way to implement simple bubble sort in Z80 assembly. For those who don’t know the algorithm, here is the explanation:

1. Going through the array from the beginning to the end, if two neighbouring elements are in the wrong order, we swap them. By this the greatest number will be the last element of the list.

2. We repeat the step above, but this time we do not include the last element, we stop before reaching it. This way the second greatest number will also be put into its proper place.

3. Doing the second step with a decreasing number of elements until this number becomes one. Then we are done. We will also stop sorting if there was no need to sort in an intermediate step, because that implies that the elements are already in the right order.

The code:


       ld c,NumberOfElements
       dec c                          ; Note that the first step involves N-1 checks
       ld hl,1                        ; Setting H=0 and L=1, for optimising speed
     Step:
       ld ix,ArrayAddress
       ld e,h                         ; Bit 0 of E will indicate if there was need to swap
       ld b,c                         ; C holds the number of elements in the current step
     Loop:
       ld a,(ix)
       ld d,(ix+1)
       cp d                           ; If A was less than D, the carry will be set
       jr c,Continue
       ld (ix),d                      ; Swapping order is actually performed by simply writing
       ld (ix+1),a                    ; the values back in a reversed order
       ld e,l                         ; Swapping is indicated here (L=1)
     Continue:
       inc ix
       djnz Loop
       dec e
       jr nz,Finish                   ; If E became zero after DEC, we have to continue
       dec c
       jr nz,Step
     Finish:

Of course, this is one of the slowest sorting algorithms, but it is easy to understand. Later, in the advanced section you are going to find an implementation of the QuickSort algorithm, too.

Searching

Another useful thing is searching byte sequences in the memory, e. g. strings in a text. The program below does the following: given the address and length of a text, and the same parameters of a string to be found in it (all four are 2-byte words), it returns the (first) address where the string is found in HL. If the text does not contain the string given, it returns 0 in HL.


     Start:
       ld hl,(TextAddress)
       ld de,(StringAddress)
       ld bc,(StringLength)
     Repeat:                          ; This loop verifies if the text from the current byte
       ld a,(de)                      ; matches the string given, character by character. If
       cp (hl)                        ; it does, then the zero flag is set. Execution is 
       jr nz,EndRepeat                ; continued from EndRepeat, regardless of the success of 
       inc hl                         ; the search.
       inc de
       dec bc
       ld a,b
       or c
       jr nz,Repeat
     EndRepeat:
       ld hl,(TextAddress)            ; Note that LD preserves the flags
       jr z,Finish
       inc hl                         ; The text pointer is advanced
       ld (TextAddress),hl
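       ld hl,(TextLength)
       dec hl                         ; Total byte count is decreased
       ld (TextLength),hl
       ld de,(StringLength)
       or a                           ; Clearing the carry before sbc
       sbc hl,de                      ; The carry is set if the string no longer fits
       jr nc,Start                    ; into the remaining text
       ld hl,0                        ; This part is executed in case of failure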
     Finish:
       ...                            ; There should be some code following here, otherwise
                                      ; execution would continue in the data part...
     TextAddress:
       .word $1000
     TextLength:
       .word 500
     StringAddress:
       .word $2000
     StringLength:
       .word 20

It was intentional that I only gave some loose comments, because by now you should be able to understand what is going on. Take the time to do so; I will give you a break for now.
