The Nintendo Entertainment System, known as the Famicom in Japan, will definitely go down as one of the most popular game consoles in history. It’ll also go down as one of the most cloned; today, rather than looking at NOAC (Nintendo-on-a-chip) or discrete clones from the ’90s, we’ll take a look at what 2019 has to offer. Or, er, a few years before actually, but they’re still selling it, so it counts!

Our weapon of choice

To compare these systems, we’re going to use the “Nicole Simulator”, a simple BASIC program that prints “nya” 10,000 times. This can be timed with a stopwatch, is repeatable, and though it isn’t the best benchmark, it at least… well, er, it’s what I used in other blog posts so I’m not stopping now!

But wait, you say! How can you do a benchmark that requires a keyboard? Easy! Nintendo released a keyboard for the Famicom in Japan, along with a BASIC software cartridge, known as “Family BASIC”. This is a custom Hudson BASIC, rather than the Microsoft-derived BASICs you usually see, but that doesn’t matter here.

Here’s our keyboard:

The Family BASIC keyboard

And here’s our program:

A BASIC program that prints nya 10,000 times

The Control Group

To make this a fair comparison, I have two authentic ’80s Famicoms to serve as the baseline. They’re connected to different monitors, both over composite video. First off, the Sharp Twin Famicom.

My Sharp Twin Famicom. It's Red.

And its tag team partner, the machine I affectionately call “Famicom-chan”: a Nintendo Famicom that suffered from a botched composite mod, which I managed to fix. (See my Twitter for more details on that.)

A Nintendo Famicom, somewhat yellowed

Both ran the program described above, and we got the following results:

| System | Time to 10k nya (mm:ss) |
| --- | --- |
| Sharp Twin Famicom | 8:21 |
| Nintendo Famicom | 8:20 |

Note that I fully expect their times to be the same; the one-second difference can be attributed to my sloppy timing. We’ll call it 8:20, which is 500 seconds, and that’s awfully convenient.

That’s 0.05 seconds per nya. The 6502-like Ricoh 2A03 CPU of the Famicom runs at 1.79 MHz. That gives us a whopping 89,500 clock cycles per nya. Let’s compare that to some other systems:

| Computer | CPU | Clock speed | Time to 10k nya (mm:ss) |
| --- | --- | --- | --- |
| Apple ][plus | MOS 6502 | 1 MHz | 3:22 |
| Tandy 1000 HX | Intel i8088 | 7.16 MHz | 4:39 |
| Tandy 1000 HX | NEC V20 | 7.16 MHz | 4:05 |
| Sega SC-3000 | Zilog Z80 | 4.00 MHz | 9:54 |
| Tandy Color Computer 2 | Motorola 6809A | 895 kHz | 1:47 |
| Nintendo Famicom | Ricoh 2A03 | 1.79 MHz | 8:20 |

It’s a respectable performance, given it’s a console with 2KB of work RAM; faster than the Sega SC-3000, which was actually intended to be a computer.
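For the curious, here’s a small Python sketch of the same cycles-per-nya arithmetic applied to every row of the table. The clock speeds and times are copied from the table; the function names are just ones I made up for this example, and the result ignores everything that differs between these CPUs besides raw clock speed.

```python
# Clock speeds (Hz) and times (mm:ss) are copied from the table above.
SYSTEMS = [
    ("Apple ][plus", 1.00e6, "3:22"),
    ("Tandy 1000 HX (i8088)", 7.16e6, "4:39"),
    ("Tandy 1000 HX (V20)", 7.16e6, "4:05"),
    ("Sega SC-3000", 4.00e6, "9:54"),
    ("Tandy Color Computer 2", 895e3, "1:47"),
    ("Nintendo Famicom", 1.79e6, "8:20"),
]
NYAS = 10_000


def to_seconds(mmss):
    """Convert a stopwatch reading like '8:20' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def cycles_per_nya(clock_hz, mmss, nyas=NYAS):
    """Clock cycles spent on each printed 'nya'."""
    return clock_hz * to_seconds(mmss) / nyas


for name, clock_hz, mmss in SYSTEMS:
    print(f"{name:24} {cycles_per_nya(clock_hz, mmss):>10,.0f} cycles per nya")
```

Cycles per nya says more about the BASIC interpreter and the video hardware than about the CPU itself, so take the comparison with a grain of salt.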

Now, what are we comparing it to?

The Challenger!

Our challenger is an “FPGA” console. FPGA stands for Field-Programmable Gate Array, and when you consider that a processor is, at its heart, just a collection of logic gates, you can see that this is basically a chip you can program to behave like another chip. This is perfect for enthusiast consoles, which can’t reach the sales volume needed to justify a custom chip. It also allows for firmware updates.

A Retro USB AVS

And there it is. The RetroUSB AVS. It’s a really nice machine, with a stylish case, micro-USB power and HDMI output. Admittedly, though, that sleek design was clearly made assuming you’d spend most of your time playing NES games. For a Famicom owner like myself, well, that lid can easily get in the way.

A Retro USB AVS with its plastic lid propped open by the tall Family BASIC cartridge

And before I say anything else, I feel like I have to stress that I’m very, very happy with the AVS. The image quality is great, despite “only” being 720p. Look how crisp those pixels are. (And yes, I like crisp pixels; blame growing up with the Game Boy.)

A Retro USB AVS video output showing colored squares

So anyway, this will surely give an accurate score in the Nicole Simulator!

| System | Time to 10k nya (mm:ss) |
| --- | --- |
| Sharp Twin Famicom | 8:21 |
| Nintendo Famicom | 8:20 |
| RetroUSB AVS | 8:31 |

So, the AVS is about 10 seconds slower than the Nintendo systems in our benchmark. But why? As an FPGA console designed by enthusiasts, it should be perfect, right? Sure, no one will ever notice a 2% slowdown, but why is it there?

Let’s talk about video standards

The Nintendo Famicom and NES use the Ricoh 2C02 “Picture Processing Unit”. This chip only outputs composite video; there is no internal RGB, it’s composite through and through. (This is why NES RGB mods require user-provided palettes) It was only ever intended for an analog output.

The analog world is tricky. Every part has to have some leeway; capacitors have variance, even crystals aren’t perfect, and when you’re capturing a signal over the air using 1950s antenna technology, all sorts of things can go wrong. Composite inputs have to take all of that into account. Therefore, it’s not a big deal when you learn that the 2C02 outputs video at 60.10 Hz, even though the NTSC standard used in North America should run at 59.94 Hz.

Digital video, meanwhile, is a lot stricter. Digital signals generally either exist and are perfect, or aren’t readable at all. A lot of modern TVs expect anything that comes into the HDMI inputs to be perfect. Therefore, you want to make sure you’re giving it something as close as possible to 59.94 Hz. (Why is it such a strange number? Well, I could write a whole blog post on that… Long story short, blame the NTSC.)

That’s a difference of about 0.27%: the Famicom runs slightly faster than the standard. The NES has a single master clock that’s divided down to get all the other frequencies the system needs. So to slow the video down while keeping everything synchronized, you need to slow the processor down too.
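Here’s a rough Python sketch of what that scaling would mean for the benchmark. It assumes the AVS slows everything by exactly the ratio of the two frame rates quoted above, which is my guess at how it works rather than anything RetroUSB has documented; the names are made up for this example.

```python
FAMICOM_HZ = 60.10       # 2C02 frame rate quoted above
NTSC_HZ = 59.94          # NTSC frame rate quoted above
CPU_HZ = 1.79e6          # 2A03 clock speed quoted earlier
BASELINE_SECONDS = 500   # Famicom time for 10,000 nya


def slowdown_ratio(native_hz, target_hz):
    """How much longer everything takes when the frame rate is lowered."""
    return native_hz / target_hz


ratio = slowdown_ratio(FAMICOM_HZ, NTSC_HZ)
scaled_cpu_hz = CPU_HZ / ratio            # CPU slows with the master clock
scaled_seconds = BASELINE_SECONDS * ratio

print(f"speed difference:        {(ratio - 1) * 100:.2f}%")
print(f"scaled CPU clock:        {scaled_cpu_hz / 1e6:.4f} MHz")
print(f"expected benchmark time: {scaled_seconds:.1f} s")
```

Under those assumptions, the clock change alone adds only a little over a second to a 500-second run.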

Wait, is that all?

Honestly, I thought that timing slowdown would be the main reason for the slower performance. But the numbers don’t add up. Why would it take ~91,000 clock cycles per nya? That’s an increase of 1,500 cycles!

One thing to note is that I left the hotkeys enabled for the AVS menu while running this test. That means the AVS needs to take some time each frame (and with 0.05 seconds per nya, we have three frames) to read the joypad. The NES joypads use a shift register. To make a long story short, a shift register is a chip that can convert data from being parallel (requiring multiple wires) to serial (requiring one wire). This means you don’t need a separate wire on the controller port for each button.

I’m going to assume that the AVS runs its controller-checking code on the processor core it already has: the Ricoh 2A03. This core already exists, and is already wired to the controllers. Therefore, to read from the NES controllers, it needs to do the following:

  1. Write to the shift register latch to have it save the current values of the 8 parallel bits
  2. Write again to get the shift register to be in the mode to send the bits over the line
  3. Read the values one bit at a time
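To make those three steps concrete, here’s a toy Python model of the controller’s shift register. It’s a sketch of the behavior, not of the AVS firmware (which I haven’t seen); the `Joypad` class and `read_buttons` function are hypothetical names, and the button order is the standard NES one.

```python
# Standard NES button order: the first bit read back is A.
BUTTONS = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]


class Joypad:
    """Toy model of an 8-bit parallel-in, serial-out shift register."""

    def __init__(self):
        self.pressed = set()   # names of the buttons currently held
        self.strobe = False
        self.shift = 0

    def _latch(self):
        # Capture all 8 buttons at once (the parallel side).
        self.shift = sum(
            1 << i for i, name in enumerate(BUTTONS) if name in self.pressed
        )

    def write(self, value):
        # Bit 0 of the written value controls the latch line.
        self.strobe = bool(value & 1)
        if self.strobe:
            self._latch()

    def read(self):
        if self.strobe:
            # While the latch is held high, every read just returns A.
            self._latch()
            return self.shift & 1
        bit = self.shift & 1   # the serial side: one bit per read
        self.shift >>= 1
        return bit


def read_buttons(pad):
    pad.write(1)   # step 1: latch the current button states
    pad.write(0)   # step 2: release the latch so bits can shift out
    return [name for name in BUTTONS if pad.read()]   # step 3: 8 reads


pad = Joypad()
pad.pressed = {"A", "Start"}
print(read_buttons(pad))
```

The point is that step 3 needs eight separate reads, one per button, which is where most of the time goes.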

On the 6502, one way you might write a byte is to load it to the “accumulator”, a register in the processor (one command, LDA), and then write it (STA).

An “immediate mode” LDA (using a value we know, rather than having to look for memory) takes 2 cycles on the 6502 processor. (My source is here) STA takes 4 cycles. So it should take 12 cycles to do the first two steps.

Looking at some example code for step 3 on Nesdev, we note that LDA is slower when reading from the controller port (4 cycles with absolute addressing), and they use an LSR on the accumulator (2 cycles), a ROL on a zero-page variable (5 cycles), and the BCC branch instruction (3 cycles when it branches). These are used as a loop over the 8 bits (about 112 cycles).

So all three steps take about 124 cycles. And due to a bug with the PCM sound, you actually need to do this (at least) twice per frame. We’ll call it roughly 250 cycles per frame. Combine that with the time needed to actually use the value, the time it takes to restore everything to the state needed for the game, and to switch out of this system mode, and I think we’ve found our 500 cycles per frame!
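As a sanity check on that target, here’s the per-frame budget worked out in Python. The 1,500 extra cycles and 0.05 seconds per nya come from the text above; the 60 Hz frame rate is a rounded assumption, and the function names are my own.

```python
EXTRA_CYCLES_PER_NYA = 1_500   # ~91,000 - 89,500, from the text above
SECONDS_PER_NYA = 0.05         # 500 s / 10,000 nya
FRAME_RATE_HZ = 60             # assumption: frame rate rounded to 60 Hz


def frames_per_nya(seconds_per_nya=SECONDS_PER_NYA, frame_rate_hz=FRAME_RATE_HZ):
    """Whole video frames that go by while one 'nya' is printed."""
    return round(seconds_per_nya * frame_rate_hz)


def overhead_per_frame(extra_cycles=EXTRA_CYCLES_PER_NYA):
    """Extra cycles the AVS has to be spending on each frame."""
    return extra_cycles / frames_per_nya()


def leftover_cycles(joypad_cycles_per_frame):
    """Budget left for everything other than reading the joypad."""
    return overhead_per_frame() - joypad_cycles_per_frame


print(frames_per_nya())       # frames per nya
print(overhead_per_frame())   # extra cycles needed per frame
```

Passing a per-frame joypad estimate to `leftover_cycles` shows how much of the budget is left for using the value and switching in and out of the system mode.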

In conclusion…

So, is this a damning indictment of the AVS? Not at all! It’s a really nice machine, is more affordable and has better TV compatibility than an RGB or HD-modded NES, and it’s fun. You can disable the hotkey, and as far as I know there will be no slowdown beyond the tiny amount needed for wider TV compatibility.

But it does show that if you really care about exact timing, you need to know what your machine is doing. Am I going to notice a 2% slowdown? Nope. Will a speedrunner? Maybe!

What really matters, of course, is that we get two more rows for the Nicole Simulator timing chart.

| Computer | CPU | Clock speed | Time to 10k nya (mm:ss) |
| --- | --- | --- | --- |
| Apple ][plus | MOS 6502 | 1 MHz | 3:22 |
| Tandy 1000 HX | Intel i8088 | 7.16 MHz | 4:39 |
| Tandy 1000 HX | NEC V20 | 7.16 MHz | 4:05 |
| Sega SC-3000 | Zilog Z80 | 4.00 MHz | 9:54 |
| Tandy Color Computer 2 | Motorola 6809A | 895 kHz | 1:47 |
| Nintendo Famicom | Ricoh 2A03 | 1.79 MHz | 8:20 |
| RetroUSB AVS | Simulated 2A03 | 1.79 MHz | 8:30 |