Working Microcode


Overview

Here's a sketch of the basic block diagram:

Index

	Registers & ALUs
	Internal Busses
	Clocks
	Microcode Overview
	Microcode Listing
	Microcode Sequencer
	Fetch/Execute Sequence

Hardware design

Schematics are here.

Registers & ALUs

In most of the similar projects that I've seen, registers have been built by using TTL register files or simply SRAM. The register file devices are somewhat exotic and difficult to find, and don't have as much flexibility as I'd like. Using SRAM just didn't seem esthetically right to me, so I ended up designing with D flip-flops. For situations in which the register has a single output to one of the internal busses, I use a pair of 74374 octal D flip-flops with 3-state output. Examples of this are A, B, C, DP and SSP. In cases in which the output of the register has only one non-bus destination, I use 74273 octal D flip-flops with 2-state output. Examples of this are the Instruction Register (IR) and the Page Table Base (PTB). There are also a few cases in which I need both 2-state and 3-state outputs. Here I also use 74273s for the basic register, and then provide a 3-state connection to the bus with a pair of 74241 octal 3-state non-inverting buffers. Examples of this are MAR, SP and PC.

For the main integer unit ALU, I originally designed in four of the old 74181 4-bit ALU parts with a 74182 lookahead carry generator. Later, though, I moved to a variant of the '181 - the 74F381. It is pretty much the same, except that it reduces the number of functions provided to the 8 useful ones (allowing me to drive the ALU with fewer microcode bits). The final configuration is three 74F381s, one 74F382 and the 74F182 lookahead carry generator. The 74F382 is used as the most significant nibble in the ALU. It is the same as the 74F381, except that instead of providing carry generation output, it produces an overflow signal.

For left-shift functionality, I simply add the operand to itself. For right-shift, I'll have an alternate bus driver hanging off the ALU output with a hard-wired 1-bit shift to the right. There will be no direct rotate support - you'll have to do a bit-test first to detect the state of the "rotated" bit, do a normal shift and finally OR in the rotated bit. In addition to the alternate right-shift bus driver, there will also be an alternate bus driver for sign extension hanging off the ALU output. So, the ALU is going to be a bit messy, with three distinct 3-state output connections to the Z bus:

16-bit straight through (no shift, word operation)

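Sign extended (no shift, byte operation; bits ALU[0..7] are a copy of ALU[8]. Bits in this list appear to be numbered from the most significant end, so ALU[0..7] is the high byte and ALU[8] is the sign bit of the low byte.)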

Right shift (Z[0] gets 0, Z[1] gets ALU[0], Z[2] gets ALU[1] ... and so on)
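The three Z bus connections, plus the add-to-self left shift and the bit-test/shift/OR rotate described above, can be sketched in a few lines of Python. This is only a behavioral model, not the hardware: the function names are made up for illustration, and the code uses the conventional numbering in which bit 0 is the least significant bit.

```python
MASK16 = 0xFFFF

def z_straight(alu):
    """16-bit straight through: word operation, no shift."""
    return alu & MASK16

def z_sign_extend(alu):
    """Byte operation: the high byte of Z copies the sign bit of the low byte."""
    low = alu & 0xFF
    return (low | 0xFF00) if (low & 0x80) else low

def z_right_shift(alu):
    """Hard-wired 1-bit shift right: a 0 enters at the most significant bit."""
    return (alu & MASK16) >> 1

def shift_left(operand):
    """Left shift performed by adding the operand to itself."""
    return (operand + operand) & MASK16

def rotate_right(operand):
    """Rotate built from a bit test, a normal shift, and an OR."""
    rotated_bit = operand & 1              # bit test on the bit about to fall off
    shifted = z_right_shift(operand)       # normal right shift
    return (shifted | 0x8000) if rotated_bit else shifted
```

For example, z_sign_extend(0x0080) gives 0xFF80, and rotate_right(0x0001) gives 0x8000.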

Internal Busses

There are 3 internal busses connecting the registers and ALUs. Two busses, L and R, correspond to the left and right operands of an ALU operation. The other bus, Z, is the output of the ALU. To keep microcode mistakes from damaging the machine, only one entity can drive a bus at a time (with one exception, see note below). Each operand bus has a corresponding "Enable" field in the microcode, which feeds into a field decoder. The decoder's outputs drive the 3-state enables of the various devices hanging off that bus.

If only the low byte of the Z bus is being driven by the ALU, then we automatically drive the high byte of the Z bus with a copy of bit 7 of the low byte. This provides the sign extend functionality.

The possible drivers for the various busses follow. Note that MDR can drive either or both operand busses.

R Bus

	MDR
	Immediate (from microcode - not instruction)

L Bus

	MDR
	A
	B
	C
	DP
	SP
	SSP
	TPC
	PC
	MAR
	Fault code (i.e. the output of the priority encoder)

Z Bus

ALU/Sign extend circuitry
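As a rough illustration of why a decoded enable field protects the bus, here is a small Python sketch. The field encoding (the position of each driver in the lists) is an assumption made for the example; the real encoding is whatever the microcode listing defines. The name decode_enable is hypothetical.

```python
# Driver order is a hypothetical encoding, chosen only for illustration.
L_BUS_DRIVERS = ["MDR", "A", "B", "C", "DP", "SP", "SSP",
                 "TPC", "PC", "MAR", "FAULT_CODE"]
R_BUS_DRIVERS = ["MDR", "IMMEDIATE"]

def decode_enable(field, drivers):
    """Turn an enable field value into one enable line per driver.

    Exactly one line is active for any valid field value, so no microcode
    word can turn on two drivers of the same bus at once.
    """
    if not 0 <= field < len(drivers):
        # A hardware decoder would simply enable nothing for an unused code;
        # the sketch raises instead to make the mistake visible.
        raise ValueError("unused enable code: %d" % field)
    return {name: index == field for index, name in enumerate(drivers)}
```

Because each operand bus has its own field and its own decoder, MDR can be selected on the L bus and the R bus in the same microinstruction without any bus ever having two drivers.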

Clocks

There is one master clock, CLK_S, which is generated on the front panel logic card and travels to the control card. There, it is combined with microcode bits to generate edge-sensitive signals and drive the microcode sequencer state machine. In essence, the second half of each clock cycle (starting with CLK_S going high) is devoted to generating control signals. The first half of the following cycle is devoted to marshalling data based on the control signals, which are clocked on the rising edge of the various edge-sensitive signals.

In retrospect, this design was too simple. As a software guy, I had a tendency to think of the clock pulse causing everything to happen at the same time. In fact, propagation delays mean that the clock edge reaches different parts of the system at very different times. As a result, there are places in Magic-1 where I had to specifically select fast (74Fxx) vs slow (74LSxx) parts to sidestep timing issues. In general, this is not a good thing - and could have been avoided if I had a series of overlapping clocks to select from.

Microcode

In other examples of homebuilt CPUs that I've seen, remarkably little microcode was necessary. This is largely the result of having fairly wide and regular instruction encodings. In this way, bit patterns within the instruction word itself can be used to generate the control signals (RISC-like). At the other extreme, you could use the instruction word simply as an index to the starting address of the corresponding microcode program used to carry it out, and not use any portion of the instruction bits to assist control signal generation. What I've done is something close to the latter. Wherever convenient, I'm using instruction bits to assist signal generation, but primarily I'm using the instruction byte as a direct index into the microcode.

My original plan had me using the 8-bit opcode as an index into a PROM of starting addresses in the main microcode store. However, that added a lookup latency in the control signal generation path, so I decided to just burn more PROM bits and have the opcode be a direct starting address.

So, for microcode I'm using five 512x8-bit PROMs. The low half of the PROM will be devoted to the first microinstruction of each instruction. Each microinstruction contains a "next" field, which will route microexecution into an appropriate spot in the other half of the microcode store. Within the sequencer there is the ability to conditionally branch. I recently also allowed a 1-deep subroutine call/return mechanism, but decided to drop it when it became clear that I had lots of unused space in the microcode PROMs. Not especially elegant, but it certainly simplified the sequencer. Oh, there's also a special microinstruction for fetching, and lots of nasty logic dealing with faults and interrupts. Interrupts will be recognized at instruction boundaries. Faults will immediately suppress clocking of results and will transfer control to a fault microcode sequence at the beginning of the next T cycle.
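To make the layout concrete: five 512x8 PROMs read in parallel give 512 microinstructions of 40 bits each, with addresses 0x000-0x0FF holding the first microinstruction of each opcode and 0x100-0x1FF holding everything else. The sketch below models that in Python. Which PROM supplies which bits of the word is an assumption made for illustration, and the function names are hypothetical.

```python
PROM_COUNT = 5      # five PROMs read in parallel
PROM_DEPTH = 512    # 512 locations each, so a 9-bit microcode address
PROM_WIDTH = 8      # 8 bits per location

MICROWORD_BITS = PROM_COUNT * PROM_WIDTH   # 40 control bits per microinstruction

def read_microword(proms, address):
    """Assemble one microinstruction from the same address in every PROM.

    Assumes PROM 0 supplies the least significant byte; the real bit
    assignment is defined by the microcode listing, not by this sketch.
    """
    if not 0 <= address < PROM_DEPTH:
        raise ValueError("microcode address out of range")
    word = 0
    for index, prom in enumerate(proms):
        word |= (prom[address] & 0xFF) << (PROM_WIDTH * index)
    return word

def is_opcode_entry(address):
    """Low half of the store: first microinstruction of each opcode."""
    return 0 <= address < 0x100
```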

As far as the parts used, I'm going with 74S472s, which are expensive and hard to find. For this reason, I put together an EPROM daughter card to try things out before I burn the real PROMs. The daughter card uses fast 60ns 27C256 EPROMs and also provides a hex display to show the address of the next microinstruction. [Note: as of this writing, I'm still using the EPROM daughter card and am inclined to just keep using it permanently].

So far, the part of the M1 design that I'm most embarrassed about is the large amount of microcode I'm using. I can see how I could significantly reduce it (in particular, by factoring in T-state pulses to give particular bits different meaning during different T-states). However, given that I'm going to be hand-wiring every connection, I think it best to trade off microcode bits for reduced complexity.

Here's the current state of the M1 Microcode.

Microcode Sequencer

	Each microinstruction has an 8-bit "next" field, which tells which microinstruction follows.
	If (next==0x00), then the next microinstruction address is the 4-bit output of a 16-line priority encoder or'd with 0x100. The lowest-priority value is the address of the fetch microinstruction. The other values represent traps and interrupts, and the encoder value will vector control off to the appropriate interrupt or trap handling microcode. The fetch line is tied active, and so will take effect if there are no traps or interrupts pending.
	If (next==0xff), then the next microinstruction address is the value of the IR (instruction register). In other words, the value of the 8-bit opcode is treated as a direct index into the microcode store.
	Otherwise, the next field is or'd with 0x100 and that value is the address of the next microinstruction.
	Which of the above three cases is used is determined by two control lines - MISC[INIT_INST] and a logical line which says whether next equals 0x00. INIT_INST is low active, and is asserted only during the fetch microinstruction.
	Next==0x00 normally happens at the end of each sequence of microinstructions which represents an M1 instruction. However, we also want to interrupt normal execution in the event of a trap, reset or interrupt. In the interrupt case, we want to recognize the interrupt only at M1 instruction boundaries. That will happen normally the next time next == 0x00. For traps and reset, though, we need to break the flow immediately - even in the middle of a microcode instruction sequence. In these cases, there is some glue logic which will assert the asynchronous clear line of the 8-bit register holding next, resetting it to 0x00. When that happens, we in effect treat these exceptional events as if they were regular instruction boundaries. The different microcode vectors for each trap or interrupt case can then handle the cleanup for any needed state rollback or fault state collection.
	Conditional microcode branches are handled using the same mechanism as the trap's next reset scheme. If a conditional microcode branch is indicated and the condition is not met, next is reset just as it would have been had there been a trap. Care was taken when writing the microcode to ensure that no traps were possible during a microinstruction which indicated a conditional branch, so there is no ambiguity.
	The conditional logic is handled by computing the various branch conditions based on the current values of the MSW condition bits. Keep in mind when looking at the logic that when a condition is met and the machine instruction branch is taken, we don't take the microinstruction branch. The branch microcode is structured so that if the branch is not to be taken, the microcode sequence aborts before it finishes. If the branch is to be taken, the microcode continues on to load the target address into PC and MAR.
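The rules above boil down to a small next-address function. Here is a minimal Python sketch of it. The function and parameter names are hypothetical, and it keys the IR case off next==0xff as described in the list, whereas the hardware uses the MISC[INIT_INST] line to make that choice.

```python
def next_address(next_field, ir, encoder, trap=False,
                 cond_branch=False, cond_met=True):
    """Pick the address of the following microinstruction.

    next_field: 8-bit "next" field of the current microinstruction
    ir:         8-bit opcode held in the instruction register
    encoder:    4-bit priority encoder output (0 means plain fetch)
    """
    # A trap or reset, or a conditional branch whose condition is not met,
    # clears the register holding next, forcing the 0x00 case.
    if trap or (cond_branch and not cond_met):
        next_field = 0x00

    if next_field == 0x00:
        return 0x100 | (encoder & 0xF)   # fetch, or a trap/interrupt vector
    if next_field == 0xFF:
        return ir & 0xFF                 # opcode is a direct microcode index
    return 0x100 | next_field            # ordinary continuation, upper half
```

With nothing pending, next_address(0x00, 0x34, 0) returns 0x100, the fetch microinstruction, and next_address(0xFF, 0x34, 0) returns 0x34, the first microinstruction for that opcode.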

Fetch/Execute Sequence

M1 features a single fetch/decode clock cycle, followed by 1 or more execute clock cycles. A lot goes on in the fetch cycle. First, note that we assume that the Memory Address Register (MAR) has been loaded with the next instruction address in T-1, and we require that all instructions set up PC for the next one. Following is a typical instruction fetch/decode/execute cycle, an immediate add of #32 to the accumulator, or "ADD.8 A,#32". To try to show the sequence of events, I'll tag the steps with [L] for low system clock, [R] for rising edge, [H] for high clock and [F] for falling system clock.

Clock 0

	[F] Fetch control signals (microcode address 0x100) clocked in and start flowing out into the system.
	[L] Asynchronous reset clears MDR; memory byte addressed by MAR and the code page table is read and flows onto DBUS; MAR and an immediate 1 are fed into the ALU and are added and fed to Z bus
	[R] Memory byte (0x34 - opcode for add.8 a,#u8) is clocked into IR; the current microcode instruction's NEXT field of 0xff designates the start of an instruction decode cycle, which means take the value in the IR as the next microinstruction; current value of MAR is stored into TPC; incremented MAR is clocked into both PC and MAR (note that on the same edge, MAR's output is clocked into TPC, and MAR gets a new value. The circuitry should ensure that the edge copying MAR's old value arrives sufficiently before the edge clocking in a new value)
	[H] The microcode sequencer goes to work, and the various muxes use the value of the IR (the 0x34 byte read from memory) as a direct opcode index into the microcode store. Microcode address 0x34 is selected and read.

Clock 1

	[F] The first microcode instruction for the add.8 a,#32 - address 0x34, is retrieved and the control signals begin flowing out to the system. This microinstruction will cause an immediate byte to be read and both PC and MAR advanced.
	[L] Memory byte addressed by MAR and code page table is read (#32 - hex 0x20) from memory and flows onto the DBUS; MAR and an immediate 1 are fed into the ALU and are added and fed to the Z bus
	[R] The immediate byte, #32, is clocked into the low byte of MDR and the incremented MAR is stored into both PC and MAR. If no traps are pending, the value of the current microinstruction's NEXT field, 0x10, is or'd with 0x100 to select the next microinstruction to execute.
	[H] The microcode sequencer goes to work, and the microcode instruction at address 0x110 is read from PROM.

Clock 2

	[F] The second microcode instruction for the add.8 a,#32 - address 0x110, is fetched and the control signals begin flowing out to the system. This microinstruction will cause the contents of the low byte of MDR to be added to the low byte of A and the result stored in the low byte of A.
	[L] The contents of register A flow onto the L bus, the contents of MDR go to the R bus, the ALU is set for an add, and the results flow onto the Z bus.
	[R] The contents of the low byte of the Z bus are clocked into the low byte of register A. The various condition codes are clocked into their appropriate spots in the MSW. The value of the current microinstruction's NEXT field, 0x00, causes the microcode sequencer to select the output of the priority encoder to choose the next microinstruction. Assuming no traps or interrupts are pending, the encoder will return 0x0, which is or'd with 0x100 to yield 0x100 - the address of the fetch microinstruction.
	[H] The microcode sequencer goes to work, and the fetch microinstruction at address 0x100 is read from PROM.
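The three clocks above can be replayed with a toy Python model. It is a deliberately simplified sketch: memory is a flat list (no page table), condition codes are not modeled, traps and interrupts never occur, and the function name run_add8_imm is invented for the example. The microcode addresses (0x100, 0x34, 0x110) and NEXT values (0xff, 0x10, 0x00) are the ones given in the walkthrough.

```python
def run_add8_imm(memory, a=0, start=0):
    """Replay fetch plus the two execute clocks of ADD.8 A,#imm."""
    s = {"A": a, "PC": start, "MAR": start, "TPC": 0, "IR": 0, "MDR": 0}
    trace = []
    upc = 0x100                          # begin at the fetch microinstruction
    for _ in range(3):
        trace.append(upc)
        if upc == 0x100:                 # Clock 0: fetch
            s["MDR"] = 0                 # asynchronous reset clears MDR
            s["IR"] = memory[s["MAR"]]   # opcode byte clocked into IR
            s["TPC"] = s["MAR"]          # old MAR saved before it changes
            s["PC"] = s["MAR"] = (s["MAR"] + 1) & 0xFFFF
            nxt = 0xFF
        elif upc == 0x34:                # Clock 1: read the immediate byte
            s["MDR"] = (s["MDR"] & 0xFF00) | memory[s["MAR"]]
            s["PC"] = s["MAR"] = (s["MAR"] + 1) & 0xFFFF
            nxt = 0x10
        else:                            # Clock 2 (0x110): byte add into A
            low = (s["A"] + s["MDR"]) & 0xFF
            s["A"] = (s["A"] & 0xFF00) | low
            nxt = 0x00
        # Sequencer, with the priority encoder assumed to output 0.
        if nxt == 0x00:
            upc = 0x100
        elif nxt == 0xFF:
            upc = s["IR"]
        else:
            upc = 0x100 | nxt
    trace.append(upc)                    # address selected for the next clock
    return s, trace
```

Calling run_add8_imm([0x34, 0x20], a=5) visits microcode addresses 0x100, 0x34, 0x110 and lands back on 0x100, leaving 37 in A and PC pointing just past the two-byte instruction.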
