I haven't published on this topic in a while, but that doesn't mean the project is dead. Quite the contrary.
Since my last post, I've completely re-implemented the ORG M1 core logic and hardware architecture from scratch. The instruction set has been enriched tremendously with opcodes for conditionals, routines, comparisons, bitwise operations, and more. System memory has also been expanded. At first I went to four channels so that each physical page read (32 bits) would fetch exactly one machine instruction. That raised the question of how the machine could write instructions to memory when all it could do was load or store 24-bit registers. The most elegant solution was to move to a six-channel serial design with "virtual" address doubling, so that each 48-bit hardware page can be written by two 24-bit STORE operations. Machine instructions are still fetched one per hardware page, ignoring the least significant 16 bits of each page. The PC increments by two per instruction, since it refers to the "virtual" doubled addresses.
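To make the address doubling concrete, here is a small Python model of it. It is only a sketch under assumptions the post doesn't spell out: an even virtual address selects the upper 24 bits of a 48-bit page, an odd one selects the lower 24 bits, and the 32-bit instruction is the top 32 bits of the page. `PagedMemory` and `write_instruction` are hypothetical names for illustration, not part of the actual ORG M1 design.

```python
# Hypothetical model of ORG M1 "virtual" address doubling.
# ASSUMPTION: even virtual address -> upper 24 bits of a page, odd -> lower 24 bits.
WORD_BITS = 24                      # width of a register / STORE operand
WORD_MASK = (1 << WORD_BITS) - 1


class PagedMemory:
    def __init__(self, num_pages):
        # Each entry holds one 48-bit hardware page (six channels x 8 bits).
        self.pages = [0] * num_pages

    def _locate(self, vaddr):
        # Two virtual addresses map onto each hardware page.
        page, half = divmod(vaddr, 2)
        shift = WORD_BITS if half == 0 else 0
        return page, shift

    def store(self, vaddr, value):
        """STORE a 24-bit word into one half of a hardware page."""
        page, shift = self._locate(vaddr)
        cleared = self.pages[page] & ~(WORD_MASK << shift)
        self.pages[page] = cleared | ((value & WORD_MASK) << shift)

    def load(self, vaddr):
        """LOAD a 24-bit word from one half of a hardware page."""
        page, shift = self._locate(vaddr)
        return (self.pages[page] >> shift) & WORD_MASK

    def fetch(self, pc):
        """Fetch one 32-bit instruction per page, ignoring the low 16 bits.
        pc is a virtual address, so it advances by 2 per instruction."""
        return self.pages[pc // 2] >> 16


def write_instruction(mem, pc, instr):
    """Write a 32-bit instruction with two 24-bit STOREs (hypothetical helper)."""
    mem.store(pc, instr >> 8)                # upper 24 bits of the instruction
    mem.store(pc + 1, (instr & 0xFF) << 16)  # low byte; the other 16 bits are ignored


mem = PagedMemory(4)
write_instruction(mem, 2, 0xDEADBEEF)
print(hex(mem.fetch(2)))
```

Under these assumptions, writing an instruction takes exactly the two STOREs described above: the first carries the instruction's upper 24 bits, and the second carries its low byte in the top of a 24-bit word whose remaining 16 bits land in the ignored part of the page.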
As of this week I've obtained some MRAM to replace the FeROM + SRAM design currently (sort of) in use. First, let me complain about the bad part of what I'm doing: magnetoresistive RAM is rather expensive. For ages I couldn't afford to buy the parts for my homebrew system. On Monday, however, I placed an order at Digikey for six 1 Mb serial MRAM devices. How much? US$54. That comes out to about $0.07/KB. A kilobyte of hard drive space, by comparison, costs just shy of five micro-cents.
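The per-kilobyte figure is easy to check. This sketch assumes "1 Mb" means 1 Mbit (2^20 bits) per device and that "KB" means 1024 bytes; it only covers the MRAM side, not the hard drive comparison.

```python
# Cost per KiB of the MRAM order described above.
DEVICES = 6
BITS_PER_DEVICE = 2**20   # ASSUMPTION: "1 Mb" = 1 Mbit = 2^20 bits
PRICE_USD = 54.00

total_bytes = DEVICES * BITS_PER_DEVICE // 8   # 786,432 bytes
total_kib = total_bytes / 1024                 # 768 KiB
cost_per_kib = PRICE_USD / total_kib

print(f"{total_kib:.0f} KiB for ${PRICE_USD:.2f} -> ${cost_per_kib:.4f}/KiB")
```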
Now, let me wax uninvitedly about the awesome part of what I'm doing. MRAM is one of the universe's gifts to mankind. Data is stored magnetically on a silicon die. It behaves just like SRAM, except that each bit is held in a magnetic state rather than in a powered transistor latch, and that magnetic state is non-volatile. Win! There are no appreciable endurance limits, no write delays, and bits can be written without erasure. It's a complete replacement for both system memory and data storage at the same time. No ROM is required anymore unless you want a protected area of memory that a buggy program can't clobber.
The six-channel design can deliver up to 240 Mbps of throughput in a sequential transfer, but with one-byte reads per transaction, as I'm doing now, most of the cycles are wasted on overhead. That will be changing soon. I intend to put the devices in sequential mode and leave the CS lines low, simply clocking in new bits as new instructions are needed. This can continue until the address needs to change, which happens whenever a LOAD, STORE, or JUMP-family opcode is encountered.
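A rough clock-count model shows why streaming helps. This is a sketch under several assumptions not stated in the post: each channel is a serial memory using the common 25-series READ sequence (an 8-bit command followed by a 24-bit address), each hardware page costs 8 data clocks (one byte per channel, all six in parallel), and every LOAD, STORE, or JUMP-family opcode forces the next fetch to start a new READ. The 40 MHz clock is an assumption chosen because it reproduces the 240 Mbps figure; "ADD" is a placeholder opcode, and only instruction-fetch clocks are counted.

```python
# Rough SPI clock-count model for instruction fetch (assumptions labeled).
CHANNELS = 6
SPI_CLOCK_HZ = 40_000_000            # ASSUMPTION: matches 240 Mbps / 6 channels
PEAK_BPS = CHANNELS * SPI_CLOCK_HZ   # peak sequential throughput

COMMAND_CLOCKS = 8    # ASSUMPTION: 25-series READ command byte...
ADDRESS_CLOCKS = 24   # ...followed by a 24-bit address
DATA_CLOCKS = 8       # one byte per channel per 48-bit hardware page
SETUP = COMMAND_CLOCKS + ADDRESS_CLOCKS

# Opcode families the post says force an address change.
REDIRECTS = {"LOAD", "STORE", "JUMP"}


def clocks_per_byte_reads(trace):
    """Current scheme: every fetch is its own READ transaction."""
    return len(trace) * (SETUP + DATA_CLOCKS)


def clocks_streaming(trace):
    """Planned scheme: CS stays low and bytes stream until the address changes."""
    clocks = 0
    need_address = True
    for op in trace:
        if need_address:
            clocks += SETUP          # issue a fresh READ command + address
            need_address = False
        clocks += DATA_CLOCKS        # clock in the next page
        if op in REDIRECTS:
            need_address = True      # the next fetch must restart the stream
    return clocks


trace = ["ADD"] * 9 + ["JUMP"]       # hypothetical straight-line loop body
print(clocks_per_byte_reads(trace), clocks_streaming(trace))
```

For a ten-instruction loop body ending in a jump, the per-byte scheme spends 40 clocks on every fetch, while streaming pays the 32-clock setup once and then 8 clocks per instruction. The longer the runs between LOAD, STORE, and JUMP opcodes, the closer the fetch cost gets to the 8-clock minimum.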
Theoretically, this new architecture should let me execute up to about 21 million instructions per second, compared with the ~500 KIPS the machine manages now. That's a very, very nice boost to performance, roughly 42-fold!