Thursday, November 17, 2011

SVGA Output Test (Now With More Smoke!)

Last night marked my first glorious step toward the glamorous world of VGA-style output.  After months of off-and-on research into how VGA timings work, as well as the physical layer required to make it happen, I made my first attempt at bringing my ideas into the real world.

It didn't work.

Specifically, it didn't work because when I went to program the CPLD I was going to use as a video logic chip with my hello-world-vga-pattern-test configuration, I connected the power wires backwards.  Unfortunately, this had the effect of frying both the CPLD and the oscillator attached to it within seconds.  I didn't have the right frequency oscillator anyway, however, so it was only ever going to be an oscilloscope waveform proofing run.

I've got the proper 40 MHz oscillator on order.  I should be back in business early next week.  Hopefully, my test of 800x600 @ 60 Hz will prove successful.
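For reference, here is a minimal sketch of why 40 MHz is the right oscillator for this mode.  The porch and sync widths below are the commonly published VESA numbers for 800x600 @ 60 Hz, not values taken from my CPLD configuration, and the function names are made up for illustration.

```python
# Commonly published VESA timing for 800x600 @ 60 Hz (assumed here, not
# read out of the actual CPLD configuration).
PIXEL_CLOCK_HZ = 40_000_000

# Horizontal timing, in pixel clocks.
H_VISIBLE, H_FRONT_PORCH, H_SYNC, H_BACK_PORCH = 800, 40, 128, 88
# Vertical timing, in scan lines.
V_VISIBLE, V_FRONT_PORCH, V_SYNC, V_BACK_PORCH = 600, 1, 4, 23

H_TOTAL = H_VISIBLE + H_FRONT_PORCH + H_SYNC + H_BACK_PORCH  # pixels per line
V_TOTAL = V_VISIBLE + V_FRONT_PORCH + V_SYNC + V_BACK_PORCH  # lines per frame


def line_rate_hz():
    """Horizontal scan rate: one line every H_TOTAL pixel clocks."""
    return PIXEL_CLOCK_HZ / H_TOTAL


def refresh_rate_hz():
    """Vertical refresh rate: one frame every V_TOTAL lines."""
    return line_rate_hz() / V_TOTAL


def hsync_active(x):
    """True while the horizontal counter x is inside the sync pulse."""
    start = H_VISIBLE + H_FRONT_PORCH
    return start <= x < start + H_SYNC


def vsync_active(y):
    """True while the vertical counter y is inside the sync pulse."""
    start = V_VISIBLE + V_FRONT_PORCH
    return start <= y < start + V_SYNC


def visible(x, y):
    """True when the counters point at a displayable pixel."""
    return x < H_VISIBLE and y < V_VISIBLE


if __name__ == "__main__":
    print(f"{H_TOTAL} x {V_TOTAL} total, "
          f"{line_rate_hz() / 1000:.3f} kHz line rate, "
          f"{refresh_rate_hz():.2f} Hz refresh")
```

The video logic is essentially two counters (one per axis) plus comparisons like these, so a different oscillator frequency changes the refresh rate directly.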

Saturday, October 1, 2011

Milled PCBs for the M1

For a long time I've been etching circuit boards as needed using ferric chloride.  The process is simple enough (though tedious):  Print a mask using a regular laser printer, melt the toner onto a blank copper clad board, remove the paper, and eat away the non-masked areas of the board by dipping it in ferric chloride.

I mentioned in my last post that I had ordered six 1 Mb MRAM ICs for the M1.  The trouble comes in how to assemble the memory for the system.  I had originally planned to put down six DFN pads in FreePCB and let the board auto-route, but with the clearances that I have to work with due to the limits of my PCB manufacturing method it just wasn't working.  The router was choking on quad-channel memory, let alone six-channel.

Those thoughts have prompted me to investigate (once again) building a small CNC mill for PCB manufacture via trace isolation.  It's more or less the mechanical equivalent to what I do now chemically.  The benefits are numerous, however, in terms of labor required and resolution achieved.

The mask & etch process goes like this:
  1. Put on clean, un-powdered latex or nitrile gloves and wet-sand the surface of a copper clad board first with ~150 grit until the copper loses its initial texture and fingerprints are no longer visible.
  2. Wet-sand the surface with 200-220 grit until the copper becomes smooth again.
  3. Wipe the board with a clean, oil-free cloth of some sort while under water until the cloth quits accumulating copper dust.  DO NOT TOUCH THE SURFACE.
  4. Remove from water and wipe dry with a different clean, oil-free cloth.  Leave out in the sun to dry completely.  Store in a plastic bag with a desiccant, wrapped in a clean, oil-free cloth until needed.
  5. Print a photomask onto photopaper with a laser printer.  Repeat if necessary until a good print (free of pinholes, scratches) is achieved.  DO NOT TOUCH THE PRINTED SURFACE.
  6. Carefully scribe the paper to roughly fit the outline of the printed area or the PCB.  They're usually the same size.
  7. Melt-transfer the mask to the copper board by putting the printed side of the mask against the copper board, applying even pressure, and heating to 250-300F (depending on toner brand and series) for 2-3 minutes.  20 PSI for 2m45s at 275F seems to work well for black, genuine Samsung toner.  An industrial heat press or a lamination machine works.
  8. Soak the paper in water until soft.  Scrub the paper with a stiff plastic brush until it is wholly removed, leaving just toner.  If your transfer wasn't perfect or if you scrub -too- hard you'll start to lose flakes of toner around edges.  This will happen to a small degree even on a "perfect" transfer.  Some areas will also have run more than others, reducing realized resolution.
  9. Double-glove using nitrile.  This chemical is bad for your health.  Etch the board by immersing it in ferric chloride and agitating the solution for 5-20 minutes while the board etches.  Remove after the minimum amount of time required to remove all unwanted copper.  Excess time leads to undercutting of traces since the walls of copper tracks and pads aren't covered by toner.
  10. Rinse and clean thoroughly with water.  Ferric chloride is toxic.
  11. Remove the toner mask with acetone while wearing butyl rubber gloves.  Rinse the board again.
  12. Drill all necessary holes one-by-one with a drill press.  Go slowly and carefully to get the best hole alignment possible.
Milling removes all of those steps.  All of them.  They are instead replaced by:
  1. Export gcode from your CAD package for your design.  This is used by the mill controller to route the engraving bit around.
  2. Export a drill file from your CAD package.  This is used by the mill controller for drilling the holes.
  3. Mount a copper clad board to the work table and define the "home" position.
  4. Chuck in an engraving bit, press start, and go have a beer.
  5. Change the engraving bit for a drill bit.  Press resume.  More beer.
That's it.  No careful pre-cleaning is required, no handling concerns exist, no toxic chemicals or protective equipment are needed, no manual labor during production is needed, etc.  You just press a few buttons and come back to find a superior board.

In a few months, I'm hoping to have my small mill project finished so that I may resume work on the M1.

Thursday, September 15, 2011

Six Channels of MRAM Goodness

I haven't published on this topic in a while, but that doesn't mean the project is dead.  Quite the contrary.

Since my last post, I've completely re-implemented the ORG M1 core logic and hardware architecture from scratch.  The instruction set has been enriched tremendously with the addition of opcodes enabling conditionals, routines, comparisons, bitwise operations, and more.  System memory has also been expanded.  At first I went to four channels so that one machine instruction could be fetched for every physical page read (32 bits).  After that I was presented with the issue of how to let the machine write instructions to memory when all it can do is load or save 24-bit registers.  The most elegant solution was to go to a six-channel serial design with "virtual" address doubling, so that each hardware page (48 bits) can be saved by two 24-bit STORE operations.  Machine instructions are still fetched just one per hardware page, ignoring the least significant 16 bits in each page.  The PC increments by two per instruction, referring to the "virtual" doubled addresses.
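A minimal software sketch of the address doubling might look like the following.  This is a model for illustration, not the actual hardware logic: the class and method names are hypothetical, and the choice that an even virtual address maps to the upper 24 bits of a page is an assumption.

```python
PAGE_BITS = 48   # one hardware page: six 8-bit channels read in parallel
WORD_BITS = 24   # register width handled by LOAD and STORE
INSTR_BITS = 32  # machine instruction width

WORD_MASK = (1 << WORD_BITS) - 1


class SixChannelMemory:
    """Toy model of 48-bit hardware pages addressed as 24-bit virtual words."""

    def __init__(self, page_count):
        self.pages = [0] * page_count

    @staticmethod
    def _locate(vaddr):
        # Two virtual addresses share one hardware page.
        page, half = divmod(vaddr, 2)
        # ASSUMPTION: even virtual address = upper 24 bits of the page.
        shift = WORD_BITS if half == 0 else 0
        return page, shift

    def store(self, vaddr, value):
        """Write one 24-bit word without disturbing the other half."""
        page, shift = self._locate(vaddr)
        kept = self.pages[page] & ~(WORD_MASK << shift)
        self.pages[page] = kept | ((value & WORD_MASK) << shift)

    def load(self, vaddr):
        """Read one 24-bit word."""
        page, shift = self._locate(vaddr)
        return (self.pages[page] >> shift) & WORD_MASK

    def fetch(self, pc):
        """Fetch the instruction in the page that pc refers to.

        The instruction is the upper 32 bits; the least significant
        16 bits of the page are ignored.
        """
        return self.pages[pc >> 1] >> (PAGE_BITS - INSTR_BITS)


if __name__ == "__main__":
    mem = SixChannelMemory(page_count=4)
    # Writing the 32-bit instruction 0xDEADBEEF takes two 24-bit stores.
    mem.store(2, 0xDEADBE)
    mem.store(3, 0xEF0000)
    pc = 2
    print(hex(mem.fetch(pc)))
    pc += 2  # the PC steps by two virtual addresses per instruction
```

Under this model a program can write an instruction into memory using nothing wider than the 24-bit registers, which is the whole point of going from four channels to six.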

As of this week I've obtained some MRAM to replace the FeROM + SRAM design currently (sort-of) employed.  First, let me complain about the bad part of what I'm doing.  Magnetoresistive RAM is rather expensive.  I couldn't afford to actually buy the parts for my homebrew system for ages.  Monday, however, I placed an order at Digikey for six 1 Mb serial MRAM devices.  How much?  USD $54.  That comes out to about $0.07/KB.  That much space on a hard drive runs just shy of five micro-cents.

Now, let me wax uninvitedly about the awesome part of what I'm doing.  MRAM is one of the universe's gifts to mankind.  Data is stored as a magnetic field on a silicon die.  It's completely like SRAM except that the data isn't in the form of an electrostatic charge.  The magnetic field happens to be non-volatile.  Win!  There are no appreciable endurance limits, there are no write delays, bits are writable without erasure, etc.  It's a complete replacement for both system memory and data storage at the same time.  No more ROM is required unless you're just after having a protected area of memory that doesn't get screwed up by a buggy program.

The six-channel design can yield up to 240 Mbps of throughput in a sequential transfer, but with one-byte reads per transaction like I'm using now, most of the cycles are wasted on overhead.  That will be changing soon.  I intend to set the devices in sequential mode and leave the CS lines low, just clocking in new bits as new instructions are wanted.  This can continue until the address needs to change, which happens any time a LOAD, STORE, or JUMP family opcode is encountered.

Theoretically, this new architecture should allow me to execute up to about 21 million instructions per second compared to the current ~500 KIPS the machine is capable of.  That's a very, very nice boost to performance!

Tuesday, July 19, 2011

Divide to Conquer

Division has been a concern of mine for quite some time now. Being able to divide numbers is an important part of everyday calculations, after all. You have fifty dollars. How many ten dollar items can you buy? That's the sort of thing you'd want to be able to figure out using a computer and could reasonably expect the system to be capable of. Yet, if you don't have a computer with a DIV instruction, you can't directly figure that particular class of problem out.

Multiplication presents a similar sort of problem. The M1 contains logic for doing a multiply instruction intelligently, however. If you use a ripple carry adder architecture and do some clever bit fiddling, you magically get a product from two inputs. The same can be done for division, but I've not yet figured it out.

I've decided to take the grade-school math route until I figure out the more advanced methods I'm after, taking advantage of the fact that division is functionally just a loop of subtraction operations. Keep track of how many times you were able to subtract the divisor from the dividend and you end up with the quotient. Whatever is left over that's too small for the divisor is your remainder.

There are obvious tradeoffs to this approach. First of all, the time to complete it is wholly dependent upon how large the quotient ends up being. The quotient interestingly ends up being a measure of system cycles consumed by the bulk of the operation. 40 / 5 would take 100 nanoseconds at the M1's current clock speed while 2,000,000 divided by 1,000 would take 25 microseconds (25,000 ns).
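As a rough sketch, the subtraction loop and its cost look like this in software. The function names are hypothetical, and the 12.5 ns per subtraction step is an assumption inferred from the two timings quoted above (100 ns for a quotient of 8), not a measured figure.

```python
# ASSUMPTION: time per subtract-and-count step, inferred from the figures
# above (40 / 5 gives a quotient of 8 in 100 ns).
STEP_NS = 12.5


def divide(dividend, divisor):
    """Divide by repeated subtraction; returns (quotient, remainder).

    Only handles a non-negative dividend and a positive divisor.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("expected dividend >= 0 and divisor > 0")
    quotient = 0
    remainder = dividend
    # Keep subtracting until what is left is too small for the divisor.
    while remainder >= divisor:
        remainder -= divisor
        quotient += 1
    return quotient, remainder


def estimated_time_ns(dividend, divisor):
    """Loop time estimate: one step per unit of quotient."""
    quotient, _ = divide(dividend, divisor)
    return quotient * STEP_NS


if __name__ == "__main__":
    print(divide(50, 10))                      # fifty dollars, ten dollar items
    print(estimated_time_ns(40, 5))            # small quotient, fast
    print(estimated_time_ns(2_000_000, 1000))  # larger quotient, slower
```

The run time scales with the quotient rather than with the size of the operands, so dividing a huge number by another huge number is cheap, while dividing a huge number by 1 is the worst case.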

While not an ideal solution, it's important enough for this instruction to be there that I'll include this suboptimal method of obtaining quotients and remainders pending the development of a better method.

Monday, April 18, 2011

UARTs

As it turned out, the issue I had in my last post was power related.  The ground path for the charge pump capacitors on the transceiver I was using was too long.  I corrected this and my skew issues went away, interestingly.  With this came clean data at 115.2 Kbps--the maximum I could do over traditional serial.

I took that opportunity to start playing with flash memory, but I'll talk more about that later.  The real thing I want to get on the record is that I later used a USB UART transceiver instead of an RS-232 one.  I've been playing with it for a while now, and it seems to have an issue where the connection will seize until power is cycled on the transceiver IC itself.  Resetting via the reset pin doesn't fix it.

From what I can tell, it's a glitch in the MCP2200 UART IC when blink mode is enabled for the RX/TX LED output pins.  If the pins are disabled or set to toggle instead, this doesn't happen.  The really curious part, and what makes troubleshooting it difficult, is that the problem only manifests at high bitrates.  It's likely to work with blink mode enabled at 1200 bps or even 9600 bps.

Wednesday, April 6, 2011

Bit Skew, Oh How I Dislike Thee

Things have gotten pretty interesting ever since I finished up the basic instruction set of the ORG M1 CPU.  I still enjoy watching the indicator lights on the memory interface that merrily twiddle away when crunching through memory instructions (LOAD and SAVE).

Now that everything is working and I've done programs as large as 13 instructions by setting bitlines in the CPU itself that emulate reading a program out of memory, it's time to load larger programs into the system's non-volatile storage.  To facilitate this, I chose to add an RS-232 serial port to the system and load programs straight from my PC.  That's where I'm currently getting hung up.

It seems that I'm registering bits improperly no matter what I do.  I can't seem to figure this one out (yet).  I've managed to get the system to clock in data correctly at about 300 baud, but there's no reason why 115.2 Kbps shouldn't work too.  The UART I'm using (Sipex SP3232ECP) is capable of a quarter megabit per second when using one channel, and the card in my PC is supposedly good for 1 Mbps or better.
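To show what "registering bits" involves, here is a minimal sketch of a receiver that samples each bit at its midpoint.  It assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit) and an idealized line with no noise or clock drift.  None of this is taken from the M1's actual receive logic, and the function names are made up for illustration.

```python
def bit_period_us(baud):
    """Duration of one bit on the wire, in microseconds."""
    return 1_000_000 / baud


def encode_8n1(byte, samples_per_bit):
    """Build an idealized logic-level waveform for one 8N1 frame."""
    bits = [0]                                   # start bit (line goes low)
    bits += [(byte >> i) & 1 for i in range(8)]  # data bits, LSB first
    bits += [1]                                  # stop bit (line returns high)
    line = []
    for bit in bits:
        line.extend([bit] * samples_per_bit)
    return line


def decode_8n1(line, samples_per_bit):
    """Recover one byte by sampling the middle of each bit cell."""
    start = line.index(0)        # falling edge marks the start bit
    half = samples_per_bit // 2
    value = 0
    for i in range(8):
        # Skip the start bit, then land in the centre of data bit i.
        sample_at = start + samples_per_bit * (i + 1) + half
        value |= line[sample_at] << i
    stop_at = start + samples_per_bit * 9 + half
    if line[stop_at] != 1:
        raise ValueError("framing error: stop bit not high")
    return value


if __name__ == "__main__":
    print(f"{bit_period_us(115200):.2f} us per bit at 115.2 Kbps")
    idle = [1] * 5  # line idles high before the frame
    waveform = idle + encode_8n1(0x5A, samples_per_bit=16)
    print(hex(decode_8n1(waveform, samples_per_bit=16)))
```

Sampling in the middle of the bit cell leaves the most margin for edge skew in either direction.  At 300 baud a bit lasts over 3 ms, so a timing error that would wreck a bit of under 9 microseconds at 115.2 Kbps goes unnoticed.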

I've done many sorts of diagnostics counting bit transitions and such but to no avail.  In the past I've used this same transceiver to send data at 115.2 Kbps just fine.  I'm as yet unsure why RX is being such a bitch.

Sunday, April 3, 2011

ORG M1's First Program

Now that's quite an exciting turn of events.  This weekend I was able to get the "ORG M1" CPU to the point where it was able to run its first two programs!

The first "program" is only really sort of a program.  What was executed was entirely hardwired into the CPU itself, but it was able to perform the following automatically:
  1. Power on
  2. Wait 250 million cycles (2.5 seconds)
  3. Load the A register of an 8-bit adder with the value stored in page 0 of the attached F-RAM (loaded earlier by hand using switches)
  4. Load the B register of the same adder with the byte in page 1
  5. Display the Y register of aforementioned adder on the UI panel (LEDs)
  6. Stop
This was exciting and all, but it was only half of a program, truth be told.  There were no opcodes and the program wasn't loaded and executed dynamically from memory.  It was an important first step, however.

Further excitement came with today's developments:  The first program executed using opcodes.  The storage holding the program was again registers in the CPU itself, but this was only because the first program's function was itself to write data into memory containing an actual program.  :)

After some frustrating events where the CPU repeatedly refused to do anything other than enter an exception state and angrily cause the UI to blink rapidly at me, I was finally able to get the program to execute automatically and run to completion.

There's still more work to be done as only a couple of opcodes are implemented, but once the LOAD and STORE opcodes are finished I can move the code into non-volatile F-RAM instead of relying on hard-coded bit lines in the CPU itself.