After BASIC, the most popular method of programming home computers would appear to be machine code. This 'language' has the advantage of being built into the computer (it is fundamental to its operation), and it is readily accessible from BASIC through the USR function. The attraction of machine code is that programs run much faster (often 100 times faster than BASIC), but there is a price to be paid for that speed: programming in machine code is considerably more difficult than programming in BASIC.
This series is intended for complete beginners, but if you already know a little machine code, I hope you'll still find a lot to interest you. In each part, I shall provide some theory on machine code, and some examples. To begin to understand machine code, we have to know something about how computers work, and that's where I'll start.
Most modern-day computers, and certainly all home computers, process information stored as BITS, each of which is in one of two states; we can think of these states as 0 or 1 - off or on - similar to a switch. Most home computers store information in blocks of eight bits, and a BYTE is the name given to such a block. Since each bit can have one of two possible states, there are a total of 256 (2^8) combinations of bits within a byte. If you don't want to take my word for it, see how many different combinations of 1s and 0s you can produce by combining 8 at a time!
Instead of thinking of a byte as a block of 8 bits, we can think of it as a number, an integer between 0 and 255. Each number represents a certain combination of bits. For more advanced machine code, we'll have to go back to examining the bits that make up a byte, but for now we can use machine code by just using numbers.
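If you'd rather not write out all the combinations by hand, the count can be checked on any modern machine. The following is only a rough sketch in Python (not something that runs on the Spectrum); the function names byte_to_bits and bits_to_byte are my own, chosen for illustration.

```python
from itertools import product

# Every way of setting 8 switches to 0 or 1.
patterns = list(product("01", repeat=8))
print(len(patterns))  # 256

def byte_to_bits(n):
    """Return the 8-bit pattern for an integer 0..255 as a string."""
    if not 0 <= n <= 255:
        raise ValueError("a byte holds 0 to 255 only")
    return format(n, "08b")

def bits_to_byte(bits):
    """Return the integer represented by a string of 1s and 0s."""
    return int(bits, 2)

print(byte_to_bits(62))          # 00111110
print(bits_to_byte("11111111"))  # 255
```

Each number from 0 to 255 corresponds to exactly one pattern of bits, which is why we can talk about bytes as numbers without losing anything.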
In this way, numbers (0 - 255) are stored in the computer's memory. ROM (Read Only Memory) forms a fixed store of numbers, while RAM (Random Access Memory) allows the stored numbers to be modified by the user. Each byte in the computer's memory is identified by an ADDRESS. The address is itself a number, and it will normally be between 0 and 65535 (the exact range depending on the amount of memory built into the computer and certain other features of the microprocessor).
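One way to picture this is as a long row of numbered boxes, each holding one byte. The sketch below is in Python and is only a model: the names peek and poke are borrowed from BASIC, and ROM_SIZE assumes the Spectrum layout in which the 16K ROM occupies addresses 0 to 16383.

```python
ROM_SIZE = 16384      # assumption: ROM occupies addresses 0..16383
MEMORY_SIZE = 65536   # addresses run from 0 to 65535

memory = bytearray(MEMORY_SIZE)

def peek(address):
    """Return the byte stored at the given address."""
    return memory[address]

def poke(address, value):
    """Store a byte at the given address; writes to ROM have no effect."""
    if not 0 <= address < MEMORY_SIZE:
        raise ValueError("address must be 0 to 65535")
    if not 0 <= value <= 255:
        raise ValueError("a byte holds 0 to 255 only")
    if address < ROM_SIZE:
        return  # ROM is a fixed store and cannot be modified
    memory[address] = value

poke(30000, 201)
print(peek(30000))  # 201
poke(0, 99)         # silently ignored: address 0 is in ROM
print(peek(0))      # 0 in this model, since the ROM contents are not loaded
```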
Simply then, the computer works by moving and manipulating numbers, under program control. The control is provided by a special 'chip' called the central processing unit (CPU). In the case of the Spectrum the CPU is the Zilog Z80. A block diagram of the Z80 is shown below. It is this chip that manipulates machine code instructions to operate the computer. So, to understand machine code, we need now to take a closer look at how the Z80 works.
There are a number of internal 8-bit stores in the CPU. These are similar to bytes in memory, and are called REGISTERS. They store (and manipulate) numbers in the CPU. They are given the names A, F, B, C, D, E, H, and L. There are also some special-purpose 16-bit registers called IX, IY, SP, and PC. Throughout the series I'll deal with most of these.
The CPU is connected to the rest of the computer through a number of connections called 'buses'. Simply, these comprise 8, or 16, 'wires' which transport electrical signals to and from other parts of the computer. To fetch an instruction, the first thing the CPU does is to put the contents of the 16-bit PC register onto the ADDRESS BUS. PC is short for Program Counter, and it contains the address in memory where the next machine code instruction is held. That itself raises a lot of questions, which I hope to answer later. Circuitry outside the CPU decodes the message on the address bus to 'activate' or access the required address in memory. The contents of that address are placed on the DATA BUS, and this number is then transported to the CPU.
This number is a coded instruction to the CPU, called an OPCODE. The CPU decodes it, then follows a fixed sequence of operations appropriate to that instruction. When the instruction is complete, the PC register is incremented (in BASIC this would be LET PC = PC + 1), and the next instruction is fetched from memory.
The Z80 CPU understands over 800 different instructions. Clearly, as one byte from memory can only hold 256 different numbers, sometimes more than one byte is required to complete the opcode; there are a large number of two byte opcodes. The CPU understands that, when the first byte is decoded, it requires a second byte to complete the instruction. To fetch this number, the PC register is incremented, and the same sequence of events, as described earlier, takes place.
Many instructions to the CPU require some data to be provided. These appear as numbers which follow immediately after the opcode in memory. These are fetched from memory in much the same way as opcodes, but are transported to a different part of the CPU, depending upon the requirements of the opcode. These data bytes are called OPERANDS. There can be one or two operands per opcode. Therefore, a single instruction to the CPU can be anything from one to four bytes long.
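The fetch sequence described above can be mimicked in a few lines. This is a simplified sketch in Python, not a description of the Z80's internal circuitry: it knows only a handful of single-byte opcodes (the ones quoted elsewhere in this article), and the table OPERAND_BYTES and the function fetch_instructions are names invented for the illustration.

```python
# Number of operand bytes that follow each opcode (decimal values).
# Only a handful of single-byte opcodes are listed; the real Z80 has many more.
OPERAND_BYTES = {
    1: 2,    # LD BC,nn
    6: 1,    # LD B,n
    14: 1,   # LD C,n
    62: 1,   # LD A,n
    201: 0,  # RET
}

def fetch_instructions(memory, pc):
    """Fetch instructions from address pc up to and including the first RET."""
    fetched = []
    while True:
        start = pc
        opcode = memory[pc]               # the opcode arrives from memory
        pc += 1                           # LET PC = PC + 1
        count = OPERAND_BYTES[opcode]     # decoding tells us how many operands follow
        operands = list(memory[pc:pc + count])
        pc += count
        fetched.append((start, opcode, operands))
        if opcode == 201:
            return fetched

memory = bytearray(65536)
code = [6, 1, 14, 50, 201]
memory[30000:30000 + len(code)] = bytes(code)
for address, opcode, operands in fetch_instructions(memory, 30000):
    print(address, opcode, operands)
# 30000 6 [1]
# 30002 14 [50]
# 30004 201 []
```

Notice that nothing in memory marks a byte as an opcode or an operand; the CPU knows which is which only because it starts at a known address and decodes each opcode in turn.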
Other 'wires' from the CPU send out, and receive, control signals which ensure that all these operations occur at the right time. The CPU is under the control of a clock which acts in much the same way as a metronome, beating out a time sequence with which the CPU must keep step. The faster the clock, the faster the CPU will work, within the physical limits of the CPU, of course.
A common mistake in terminology is to mix up the names machine code and assembly language. I hope it is clear from my description what constitutes machine code; it is the sequence of numbers held in the computer's memory which gives the CPU instructions on what to do. While the CPU understands numbers, these are not readily understood by humans, who would prefer something closer to the written language. For example, when the CPU receives the number 62 (as an opcode), it interprets this (in human terms) as "load the A register with the number in the next memory location". That description is somewhat cumbersome, but we could adopt a shorthand which means the same thing. Assembly language gives us that shorthand - LD A, n (LoaD into A the number n). Assembly language, therefore, is a descriptive shorthand for machine code operations. It is easier to write a program in assembly language, and then have a special program, called an assembler, translate the assembly language into machine code. The reverse translation is called disassembly.
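To make the distinction concrete, here is a toy assembler written in Python. It is a sketch only: it understands just RET and the LD register, number form for the A, B and C registers, and the names assemble_line and assemble are mine. A real assembler handles the whole instruction set, labels, and much else.

```python
import re

# Opcodes (decimal) for loading a one-byte number into a register.
LOAD_NUMBER = {"A": 62, "B": 6, "C": 14}

def assemble_line(line):
    """Translate one line of assembly language into a list of machine code bytes."""
    text = line.strip().upper().replace(" ", "")
    if text == "RET":
        return [201]
    match = re.fullmatch(r"LD([ABC]),(\d+)", text)
    if match:
        register, number = match.group(1), int(match.group(2))
        if number > 255:
            raise ValueError("an 8-bit register holds 0 to 255 only")
        return [LOAD_NUMBER[register], number]
    raise ValueError(f"instruction not supported by this sketch: {line!r}")

def assemble(lines):
    """Translate several lines of assembly language into one list of bytes."""
    code = []
    for line in lines:
        code.extend(assemble_line(line))
    return code

print(assemble_line("LD A, 7"))                   # [62, 7]
print(assemble(["LD B,1", "LD C,50", "RET"]))     # [6, 1, 14, 50, 201]
```

The list of numbers it produces is the machine code; the text fed into it is the assembly language.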
This series will cover both machine code and assembly language, and all examples will be provided in both forms.
Machine code is a series of numbers stored in the computer's memory; these numbers need to be kept in a safe place where they can remain undisturbed by the invisible workings of BASIC. One of the safest storage areas is above RAMTOP (the address in RAM which is the highest accessible to BASIC). Lowering RAMTOP creates a safe area of RAM into which machine code can be loaded.
Once RAMTOP is lowered (we'll deal with that in a moment), machine code can be entered. For now, the best way to do this is to use a BASIC loader: a program which POKEs values into a series of bytes above RAMTOP, these values being the numbers which make up the machine code. The examples in this article all use the BASIC loader principle.
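As an illustration of the principle, the Python sketch below writes out a BASIC loader for any list of machine code bytes. The layout of the generated program is my own assumption about what a typical loader looks like, not a listing taken from this series: it uses CLEAR to set RAMTOP just below the routine, then a FOR loop that READs each value from a DATA line and POKEs it into place. The function name make_loader is hypothetical.

```python
def make_loader(start, code):
    """Return the lines of a BASIC program that POKEs `code` into memory from `start`."""
    if not code:
        raise ValueError("there must be at least one byte to load")
    if any(not 0 <= value <= 255 for value in code):
        raise ValueError("every value must fit in one byte (0 to 255)")
    end = start + len(code) - 1
    data = ",".join(str(value) for value in code)
    return [
        f"10 CLEAR {start - 1}",        # lower RAMTOP to just below the routine
        f"20 FOR a={start} TO {end}",   # one address per byte of machine code
        "30 READ n: POKE a,n",
        "40 NEXT a",
        f"50 DATA {data}",
    ]

for line in make_loader(30000, [201]):
    print(line)
# 10 CLEAR 29999
# 20 FOR a=30000 TO 30000
# 30 READ n: POKE a,n
# 40 NEXT a
# 50 DATA 201
```

Typing the printed lines into the Spectrum and running them should leave the bytes sitting at the chosen address, ready to be called.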
Once the machine code is in place, you then have to get it to operate. The BASIC word for this is USR. USR is a function; that is, to complete its syntax, it has to have what's called an ARGUMENT. The argument is the starting address of the machine code routine. The valid BASIC instructions to operate a machine code routine include:
PRINT USR 32000
RANDOMIZE USR 32000
LET T = USR 32000
IF USR 32000 THEN ....
There are others, but these are the ones most commonly used.
USR works by storing the current contents of the PC register, then putting its argument into PC. The original address in PC is stored so that, once the machine code is complete, a return to BASIC is possible, provided that the machine code is written to allow that return. A machine code routine can be much like a subroutine in BASIC, with the BASIC program continuing from the place it left off once control is handed back to BASIC.
Like RETURN in BASIC which ends a BASIC subroutine, there is a machine code instruction which ends a machine code subroutine. In assembly language this mnemonic is RET; the machine code value is 201. You'll see this in most of my examples, for without it (or something similar) the computer cannot hand back control to BASIC. The result of this would be that the computer continues without you being able to stop it, or it gets totally confused - and crashes! The only solution to either of these is to pull out the power lead, which wipes everything from RAM, and start again. This is one of the main frustrations of machine code, for if you make the slightest error, a 'crash' is likely, which means you lose everything you placed into the computer. The golden rule of machine code is to save on tape any machine code BEFORE you run that machine code - just in case.
Earlier, I mentioned that there are several stores of data in the CPU called registers. Each register holds an 8-bit number, giving a range of 0 to 255. The Z80 has the facility for combining certain registers, such that the combined register can hold a 16-bit number. This provides an effective range of 0 to 65535. The combined registers are H with L, D with E, and B with C. H, D, and B hold the 'high' bytes, while L, E, and C hold the 'low' bytes. The 16-bit number is calculated as 256 times the value in the high register plus the value in the low register.
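The arithmetic works in both directions, as this short Python sketch shows. The function names pair_value and split_pair are chosen just for the illustration.

```python
def pair_value(high, low):
    """Combine two 8-bit register values into the 16-bit value of the pair."""
    return 256 * high + low

def split_pair(value):
    """Split a 16-bit value into (high, low) register values."""
    if not 0 <= value <= 65535:
        raise ValueError("a register pair holds 0 to 65535 only")
    return divmod(value, 256)

print(pair_value(1, 50))      # 306
print(pair_value(255, 255))   # 65535
print(split_pair(500))        # (1, 244)
print(split_pair(30000))      # (117, 48)
```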
You may have noticed from the USR instructions given earlier that a machine code routine can give a numerical value to BASIC, which can be printed on the screen (PRINT USR ...) or assigned to a variable (LET T = USR ...). The number handed to BASIC is the 16-bit number in the BC register pair at the time of the return to BASIC. The easiest way to demonstrate this is to try a few examples.
To try the examples, simply load them into your Spectrum and RUN them.
Simple return to BASIC
All you have in example 1 is a simple return to BASIC. So, after you have loaded the single byte, and activated it with RANDOMIZE USR 30000 nothing happens. However, you may like to consider it an achievement to enter a machine code routine, and return safely from it! It's just like a BASIC subroutine in which the first line is RETURN. If you use PRINT USR 30000, then a number is printed on the screen. It should be 30000; the USR routine puts this number into BC as well as into PC, so this is the number which is in BC on return to BASIC, and so is printed on the screen. The remaining examples all modify the contents of BC before the return to BASIC so use PRINT USR 30000 with all of these to see the effect.
Load a number into B and C
|Assembly language|Machine code (decimal)|
|---|---|
|LD B,1|6, 1|
|LD C,50|14, 50|
|RET|201|
In example two, the B and C registers are loaded with the numbers 1 and 50 respectively. Notice that the opcode for LD B (6) is followed by its operand (1), the number loaded into B; similarly, the operand 50 follows the opcode (14) which loads a number into C. The BC register pair now contains (1 * 256) + 50 = 306; this is the number you should see printed on the screen.
Load a number into BC
|Assembly language|Machine code (decimal)|
|---|---|
|LD BC,500|1, 244, 1|
|RET|201|
A different way of loading a number into BC is shown in example three. A single opcode (1) instructs the CPU to load the next two numbers into BC. Note that the first number (244) is loaded into C, and the second (1) into B, giving (1 * 256) + 244 = 500. This is a machine code convention - in a two byte number, the low byte is dealt with first, then the high byte. LD BC, number is a three byte instruction: the first byte is the opcode, followed by two operand bytes. There are similar instructions to load the HL and DE registers, one at a time, or as a pair.
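The low-byte-first convention is easy to get wrong when working out the numbers by hand, so here is a small Python sketch that does it for you. It uses Python's standard struct module, in which the "<" format means low byte first; the function names ld_bc and address_from_operands are my own.

```python
import struct

def ld_bc(value):
    """Return the three bytes of LD BC,value: opcode 1, then low byte, then high byte."""
    return list(struct.pack("<BH", 1, value))

def address_from_operands(low, high):
    """Rebuild the 16-bit number from two operand bytes as they appear in memory."""
    return struct.unpack("<H", bytes([low, high]))[0]

print(ld_bc(500))                      # [1, 244, 1]
print(ld_bc(30000))                    # [1, 48, 117]
print(address_from_operands(244, 1))   # 500
```

The same ordering applies wherever a two byte number follows an opcode, so the address 30000 always appears in memory as 48 followed by 117.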
Load the contents of HL into BC
In example four, the contents of the HL register are loaded into BC before return. It is possible to load the contents of any one register into another, but there are no instructions which move one register pair to another. The result you see printed on PRINT USR 30000 may well vary, as this depends on the contents of HL at the time of calling the routine.
PEEK an address in memory
|Assembly language|Machine code (decimal)|
|---|---|
|LD B,0|6, 0|
|LD A,(30000)|58, 48, 117|
|LD C,A|79|
|RET|201|
The final example mimics the BASIC instruction PRINT PEEK 30000. The B register is loaded with 0, and the A register is loaded with the contents of the byte at address 30000. The brackets around 30000 in the assembly language instruction LD A, (30000) mean 'the contents of'. As there is no instruction LD C, (30000), we have to load this byte value into A first, then transfer it to C with LD C, A. Address 30000 contains the first byte of our machine code routine, so you should see 6 printed on the screen.
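If you'd like to check the five results without risking a crash, the routines can be stepped through by a very small simulator. The Python sketch below is nothing like a full Z80 emulator: it handles only the few opcodes needed here, and the function names load and usr are invented for the purpose. Two assumptions should be stated loudly. First, the byte listings for examples one and four are taken to be RET alone (201) and LD B,H / LD C,L / RET (68, 77, 201) respectively. Second, the opcode values 68, 77 and 79 (for LD C,A) are the standard Z80 encodings rather than figures quoted in the text.

```python
def load(code, address=30000):
    """Return a 64K memory image with `code` POKEd in from `address`."""
    memory = bytearray(65536)
    memory[address:address + len(code)] = bytes(code)
    return memory

def usr(memory, address, hl=0):
    """Run the routine at `address` and return BC, as PRINT USR address would show."""
    # On entry BC holds the USR argument; HL is whatever the caller supplies.
    reg = {"A": 0, "B": address // 256, "C": address % 256,
           "H": hl // 256, "L": hl % 256}
    pc = address

    def fetch():
        nonlocal pc
        value = memory[pc]
        pc += 1
        return value

    while True:
        opcode = fetch()
        if opcode == 201:                  # RET: hand BC back to BASIC
            return 256 * reg["B"] + reg["C"]
        elif opcode == 6:                  # LD B,n
            reg["B"] = fetch()
        elif opcode == 14:                 # LD C,n
            reg["C"] = fetch()
        elif opcode == 1:                  # LD BC,nn (low byte first)
            reg["C"] = fetch()
            reg["B"] = fetch()
        elif opcode == 58:                 # LD A,(nn)
            low = fetch()
            high = fetch()
            reg["A"] = memory[256 * high + low]
        elif opcode == 68:                 # LD B,H
            reg["B"] = reg["H"]
        elif opcode == 77:                 # LD C,L
            reg["C"] = reg["L"]
        elif opcode == 79:                 # LD C,A
            reg["C"] = reg["A"]
        else:
            raise ValueError(f"opcode {opcode} is not handled by this sketch")

print(usr(load([201]), 30000))                           # 30000
print(usr(load([6, 1, 14, 50, 201]), 30000))             # 306
print(usr(load([1, 244, 1, 201]), 30000))                # 500
print(usr(load([68, 77, 201]), 30000, hl=12345))         # 12345
print(usr(load([6, 0, 58, 48, 117, 79, 201]), 30000))    # 6
```

A routine with no RET does not hang this sketch; it stops with an error at the first byte it does not recognise, which is rather kinder than the real machine.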
So far, the machine code examples have not been earth-shattering, but, as it is said, from acorns (small a!) mighty oaks do grow. So, if you've got this far, and followed most of what I have said, then in the next issue, I'll be introducing more of the instruction set, and have a few more examples, some which might just produce a "wow".