Foreword
It’s that time. Are you excited? I may not be an artist, and the VB may have its limitations… and while I may not have had a project ready in time for the Virtual Fall deadline, I do know me some software development. So here comes my next project, and you’re invited!
I hack video games. And I love hacking video games. But to hack video games, I need some pretty hefty tools. And these tools come from emulators, which all too often don’t actually have the tools I need to hack video games. Basically what I’m saying is, I want to make an emulator with a feature-rich suite of debugging tools that I can use to hack video games. That’s where I’m coming from, and that’s the overall goal of the project.
But why stop there? Why don’t we make this the official Planet Virtual Boy emulator, where Planet Virtual Boy members work together to make it happen? That would be awesome! So that’s why I started this thread. I want to document the development project for something like this, and I want as many people to be involved as possible. I think this can really become something great.
Selecting a Language
There are various programming languages out there that can be used to make emulators. I just had a chat conversation with morintari about how I wanted to use Java instead of C, but I’m going back on that. And here’s why.
Java comes with a lot of built-in packages to handle things like user interfaces, networking, graphics, audio, controller input, etc. These are things that, if the project were written in C, we’d have to go through and learn/figure out for every single target platform we want the emulator to run on. The only real downside to Java in this regard is that it won’t run on portable systems like the Nintendo 3DS (which is perfect for VB emulation), and that to me sounded like an acceptable loss.
Buuuuuut, something from the back of my mind remembered something, so I looked into it. It’s called Simple DirectMedia Layer, or SDL for short, and it’s a lot better than it was years ago when I first looked into it. The short version is this: it’s a library for the C programming language that implements cross-platform functions for user interfaces, networking, graphics, audio, controller input, etc. Basically, it does everything I’d considered using Java for, and it works with C. That’s a win-win as far as I’m concerned.
For the record, anything with any variant of the GPL is, as far as I’m concerned, qualified as “GNUtered” and I will refuse to use it. SDL used to be LGPL, but now uses the zlib license, meaning I’m totally on board with using it. The source code of the project will still be released. But it’s the fact that it doesn’t have to be that makes me happy.
So this is gonna be a C project. In time, this can even be used on the 3DS itself. Some of it will be advanced topics, but since VB homebrew is done in C to begin with, I trust we’ll have plenty of help making it happen.
System Features
In its simplest form, an emulator does what the real system does: it runs programs, accepts input, and produces output. The simplest emulator will be a window with some graphics, will play sounds, and will accept user controller input. As far as all that’s concerned, there’s a small list of distinct components that will go into this project:
Memory Controller
From the CPU’s perspective, there’s this big region of memory addresses 32 bits wide. The way the Virtual Boy works is that certain addresses are magical hardware ports that are used to configure different components of the system, such as graphics and audio. The process of rerouting CPU activity to its respective targets is going to be the job of the memory controller.
CPU
The main think-box of the system, the CPU reads in program code, does something with it, then stores a result somewhere. Everything passes through the CPU, so it’s arguably the most important component of the entire system. The CPU gets its information from the memory bus, though, so it can’t exist without the memory controller first existing.
VIP
The Virtual Image Processor is the video hardware for the Virtual Boy. There’s more memory allocated for this component than there is for system RAM, but hey, that’s not a bad thing! The VIP is configured by the CPU, and produces pictures 50 times every second, one for each eye. This will in some form or another wind up on your computer’s monitor.
VSU
The Virtual Sound Unit is… well, it’s an actual sound unit. There are five channels that can play tones from a program-defined wave table, and one channel that plays random noise. It’s configured by the CPU, but it produces no output that the program can use. VSU output is entirely for the human ear to process. After some magic happens, this will come out of your computer’s speakers.
Game Pad
If you want to control your program, you use a controller. It’s the human interface component: when you push a button, the CPU hears about it and processes the program accordingly. Through some means, keyboard, mouse or joystick input will get mapped into this virtual component.
Timer
There is a high-frequency timer present in the Virtual Boy that can be configured to run at 100μs per tick or 20μs per tick. A reload value is specified by the CPU, which is then reduced by 1 every tick. When the timer reaches zero, it notifies the CPU via an interrupt, causing an execution break. Implementation of this component will simply make use of the host system’s own high-frequency timer.
Link Port
Communications can occur between two Virtual Boy systems using a link cable. DogP has done a wonderful job producing such a cable, and I want to make it worth his while by fully supporting link functions in this emulator. How does an emulator handle a link port? Well, one way is to run two instances of the emulator and use inter-process communication. The other way is through networking, such as LAN or internet communication.
Debugging Features
This is the part that has me totally stoked. Since an emulator has to keep track of all the various program and memory states for the entire emulated system, it’s a fairly trivial matter to produce additional software features that allow the user to tap into the inner workings. For someone like me, this is essential before the really fun stuff can happen. So without further ado, here’s what I have in mind…
Disassembler
Generally speaking, CPUs do their thing by reading in some small number of bytes at a time (typically one to four bytes) then, depending on the values of those bytes, doing something. Each time it does this, it’s called executing an instruction (the bytes themselves define the instruction). Byte values in a Virtual Boy program can be converted into a human-readable format, and that’s what a disassembler does. I also want there to be trace and step-over functions, which will allow the user to see what happens in the system one instruction or one function at a time.
Options should be available regarding disassembler output. The official NEC notation places the destination register last, which contrasts with the more familiar convention of putting it first. Registers can be named as well, such as r3 being the stack pointer SP. Registers themselves can also be referred to by different conventions such as r16 or $16. There are more examples, but there’s no need to force a particular notation on everyone. But for the love of beans, there needs to be a way to copy text from the disassembler to paste into a text editor in another window!
Additional disassembler output would come in the form of comments off to the right of the instructions themselves. These comments, where appropriate, will indicate to the user whether, for instance, game pad memory is being accessed, or if it’s a call to a known function, etc.
Furthermore, I’d like there to be a little pane on the disassembler that gives information on the instruction about to be executed. It will give a sentence description of what it does, what its format is, and what the before-and-after states will be. I mean, if you come across a “CAXI 28E4h[r12], r22”, it’s not necessarily apparent exactly what’s going to happen.
Lastly, the disassembler should have the option to assemble at a given address, even so far as overwriting the contents of ROM with the new program code.
Breakpoints
Let’s say you find a function with the disassembler, and you need to know what part of the program uses that function. Or, say you know where in VIP memory a particular graphic is stored, and you want to know where in the program the memory actually gets written there. This is done via breakpoints, both the execution and memory access varieties. This is in many ways an extension of the disassembler. During an emulator break, the step features of the disassembler become useful.
Breakpoints can often be paired with conditions. For example, say there’s a particular memory address that’s accessed a lot, but you’re only interested if a particular value gets written there. That’s a conditional breakpoint. I want to expand on this a bit, allowing breakpoints for particular CPU instructions, as well as particular pixel patterns in video memory, to name a few.
Function Logging
Programs written in C and compiled–such as every Virtual Boy program not coded by hand by yours truly–share a particular characteristic: every line of code is in a function, and every function has a beginning and an end. VUCC and gcc alike use the JAL or JMP instructions to call functions, and always use JMP [r31] to return. This makes it especially easy for a debugger to detect function calls, which can then be used to keep a running log of every function used by a program. Once the starting address of a function is known, it’s an easy matter to find the end by following branches until the longest execution path is found.
Upon inspection, the user should be able to give names to functions, as well as specify parameter lists and name the return value. The whole log plus user modifications can be saved and loaded in the interface.
Note: Recompilers rely on this characteristic to translate emulated code into native code. They cache the new native functions for the next time the emulated function is called.
Memory Hex Editor
The entire CPU memory bus can be viewed in a hex editor interface. Even read-only bytes can be modified here, such as overwriting particular instructions in ROM in order to test things. Individual values can be interpreted as signed or unsigned, 8-, 16- or 32-bit, and integer or floating-point. New values can also be supplied in those formats, rather than entering the hexadecimal values directly.
Video and Sound Memory Viewers
While the hex editor can technically display the same information, the sound and video memory represent things that are a little more useful to humans if displayed as graphics. Pixel patterns can be displayed with various palettes, and wave memory can be shown as a graph of the actual wave (as well as have a button to listen to the tone).
The video viewer itself should pick apart each subset of the VIP, including the individual palettes, characters, backgrounds, window arrangements, objects, and even the LED pattern table. The final scene with all layered windows can toggle individual windows on or off, which can propagate to the main game display for when you want to not show the health bar or whatever.
All video and sound data should be modify-able in this interface as well as the hex editor.
Cheat Engine
You’ve seen this a hundred times by now. In addition to forcing a particular value in RAM, you can search for such values by comparing what RAM values have changed since the previous search (or specify the value if you happen to know what it is). I’d also like to see Game Genie-esque ROM patches for when program modifications are desired.
Invitation
We’ll need some hosting of sorts, and maybe a repository. I can use Dropbox for now as needed, but if there is more interest in this project, we can set up something better. But hey, if you’re interested, shout out! There’s plenty of work to be done, and I’m all fired up to make it happen.
Sounds awesome, nice write-up! I’d definitely want to play some part in this. But since you’re talking about the importance of various notation options for the debugger, can we please (optionally) start calling “windows” “worlds” again? 😉
Sure, why not. We’ll add an option to switch between “Window”, “World”, “Donut” and “David Wise” (-:
The First Rule
The most important rule you can know about software design is this: Don’t assume the project requirements will never change. The second most important rule is this: Do your best. Just gotta drive that point home. Making assumptions and hard-coding things to the letter of the project spec is a very bad idea.
Case in point: City of Heroes wasn’t designed to have colorable powers, and it came back and bit them hard. Don’t Starve wasn’t designed for online play, and now they have to go back and add it in. My Little Pony: Friendship is Magic didn’t plan for a fourth season, so now they’re stuck with Twilight Sparkle with wings.
Good software design will afford future changes, be it modifying existing features or adding new ones later on. Bad software design will get the job done no matter how much has to be set in stone to make it happen. I aim for this emulator project to be good. Therefore, I want to avoid hard-coding assumptions.
Core Context
The first assumption we’re going to avoid making is that the emulator will only emulate one game on one system at a time. I mean, think about it… Pokémon Red and Pokémon Crystal can trade with each other, right? Not that we’ll necessarily set up the software to run multiple games simultaneously, but there should be a way to do it nonetheless. That way, if at some point down the line we decide to do so, there will already be the groundwork to build on.
To simplify matters, this project will consist primarily of an emulation core–a library that does all the processing of the Virtual Boy, but doesn’t do anything specific regarding the system that the emulator is running on. For instance, it will create the image buffers for each video frame, but it won’t actually display them on the screen (something that happens differently depending on the device or operating system). This way, the core code can be re-used on all manner of platforms and architectures without requiring any modifications at all to make it work.
There is an approach to software implementation known as object oriented programming. While I’ve heard it defined as specifically linking data and methods together in the programming language, I prefer to think of it as a technique. At the machine code level, a C++ class is just a group of functions that accept a struct pointer as their first argument, and that’s absolutely something you can do in C.
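Just so we’re all picturing the same thing, here’s a tiny sketch of that technique. The names mean nothing; it’s only the shape of it:

// The "class" is a plain struct holding the object's state...
typedef struct {
    int count;
} Counter;

// ...and the "methods" are ordinary functions that take a pointer to it as their first argument.
void counterReset(Counter *counter) {
    counter->count = 0;
}

void counterAdd(Counter *counter, int amount) {
    counter->count += amount;
}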
Putting both of these things together, the emulation core will be designed to work with an instanced context. The core context will contain all of the state information for the emulation core: CPU registers, RAM contents, etc. (Serializing the core context and writing it out as a file is the same thing as a “savestate”, which can be loaded to resume from an earlier state.) If multiple systems are to be emulated, then multiple core contexts can be instantiated. Simple as that.
Efficiency
Making the “best” program is often a matter of opinion, but there are some things that are objectively better than others. For example, consider writing an instruction processor for the CPU. Each instruction can have arbitrary behavior, so some means needs to be present to select how to process each one. A naïve approach would use a bunch of if statements, or perhaps a switch block:
// if statements:
if (opcode == ADD) cpuAdd(operands);
if (opcode == JMP) cpuJmp(operands);
if (opcode == XOR) cpuXor(operands);
// etc.

// switch block:
switch (opcode) {
    case ADD: cpuAdd(operands); break;
    case JMP: cpuJmp(operands); break;
    case XOR: cpuXor(operands); break;
    // etc.
}
Emulators by their nature often deal with bit sequences that have a number of different behaviors depending on the values of the bits. If there’s conditional code run for every one of those, it can slow things down substantially. So for the sake of efficiency, conditional execution should be kept to a minimum.
In particular, we can eliminate most of the conditional execution by using the bit values directly to execute code. This is done with something called a function pointer, which unfortunately isn’t a beginner-friendly programming concept. The basic gist of it is that the program knows the address in memory of a function, and can call that function by address rather than by name. This gives the program the ability to run a function out of an array or some kind of hashtable or something:
cpuExecute[opcode](operands);
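Fleshed out just a little, the idea looks like this. The opcode numbers and handler names below are purely illustrative, not the real instruction set:

#include <stdint.h>

// One handler per 6-bit opcode, all sharing the same prototype.
typedef void (*OpcodeHandler)(uint32_t operands);

void cpuAdd(uint32_t operands) { /* ... */ }
void cpuJmp(uint32_t operands) { /* ... */ }
void cpuXor(uint32_t operands) { /* ... */ }

// In practice every one of the 64 slots gets filled, even if only with an
// "illegal instruction" handler, so there's never a NULL to trip over.
static OpcodeHandler cpuExecute[64] = {
    [0x01] = cpuAdd,
    [0x06] = cpuJmp,
    [0x0E] = cpuXor,
};

void execute(uint8_t opcode, uint32_t operands) {
    cpuExecute[opcode](operands); // no ifs, no switch
}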
Other programming languages can accomplish the same thing through other constructs, such as delegates or interfaces.
As you’ll see as this project progresses, we can use function pointers for quite a few things; not just CPU instructions. If there’s a way to make the program run itself without having to inspect data as it passes through, that’ll be the way to go.
Plug It In
The second assumption we’ll avoid making is that every function of the Virtual Boy has to work exactly the way it does on the real system. Wait, what? Isn’t that the point of an emulator? Yes, most of the time. But there may be cases where the application wants to override a behavior (such as with custom breakpoint handling).
As such, every conceivable thing (where appropriate) should be override-able, most often by using function pointers. If the application needs to override something, it can simply specify a new function via address and things will continue to work.
Memory Controller
The third assumption we’re going to avoid making is that any given memory address has a specific behavior associated with it. A perfect example is the 0x04000000-0x04FFFFFF region, which is allocated as the cartridge expansion area. There doesn’t exist any cartridge that uses this area, meaning for all intents and purposes there is no behavior associated with it. But what if at some point MineStorm makes a cartridge that uses it? We’ll want a way to add that to the emulation core, right?
Fortunately, there are some assumptions we can make regarding the system. For instance, the memory bus is 27 bits wide, meaning anything at or above 0x08000000 are mirrors of the lowest 27 bits. We also know that the hardware functions are separated depending on the uppermost 3 bits, meaning 0x00000000, 0x01000000, 0x02000000, etc. all refer to different hardware components. Basically, the top-level memory logic looks like this: 1) Take the uppermost three bits, 2) execute the function that handles that region. And then we write some functions to handle those regions.
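In code, picking the region really is just a shift and a mask. Something along these lines, where the handler table is hypothetical for now:

// 1) Take the uppermost three bits of the (27-bit) address
int region = address >> 24 & 7;
// 2) Execute that region's handler, e.g. regionRead[region](context, address)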
Individual subsystem components should be responsible for processing memory accesses. For example, the functions that access VIP memory should be located in the VIP module. That way, as emulation core modules are introduced, they can simply be plugged into the memory controller module and it will Just Plain Work™ out of the box.
There are five kinds of reads on Virtual Boy:
* Signed 8-bit
* Unsigned 8-bit
* Signed 16-bit
* Unsigned 16-bit
* 32-bit
And three kinds of writes:
* 8-bit
* 16-bit
* 32-bit
From the debugger’s point of view, there are two ways to read: one for what the emulated CPU would see, and one for what’s actually there. What’s the difference? Namely VSU memory: it’s always read back as 0x00 to the CPU, but the wave memory isn’t forgotten after being written. I’ll refer to these as “read” and “debug read” respectively. The memory controller needs to support both.
In the interest of allowing the application to hijack core functionality, every read and write function should be called as a pointer, and the application can supply new function addresses. The debugger can set up a memory read breakpoint by overriding the behavior of one of the memory controller’s read functions, for instance.
In the interest of cycle accuracy (to be covered in my next post), each read and write function should also return the number of cycles taken to process the access.
Once the memory controller is set up, a cartridge ROM and cartridge RAM controller can be made, then the CPU can start to exist.
Guy, count me in to help (even though I might not be the best help). I’m just about to be out on summer break after my first year as a high school math teacher. Now I need to get my programming chops back so I can teach CS. I’m advanced enough in C to know what a function pointer is but not so advanced as to know how to use it for an emulator. But I’m willing and ready to learn! When can we start?
This is absolutely fascinating and I’m super glad to see people working on it. Unfortunately I cannot contribute — I got a C+ in my one and only C++ class back in 2000, haha. I wish I knew enough to help, but I really, really don’t.
Guy Perfect: You mentioned something about cartridge expansion space. Could that be used by something equivalent to a SuperFX chip (which I believe itself is a custom ARM chip) for faster 3D? Or just anything, right? It’s expansion! haha. I know we’ll probably never do anything like that, especially considering the difficulty by which new carts are made, but it’s really fascinating to know it’s there.
That’ll probably be the last post from me in this thread — I’ll be reading every post because it’s interesting and well-written, but I don’t want to get in the way of the real work. Good luck everyone! This is a great project!
This sounds like fun. I’ve been wanting to try to develop an emulator but the project always seems too overwhelming for what little time I have. I wouldn’t mind pitching in a little. My C skills are probably average but I’m a quick learner and an expert Googler 🙂
Guy writes V810 machine code by hand in hex, so I figure he’s the best possible person for this job. I don’t know how the heck an emulator is programmed, but I assume Guy knows his stuff about that too.
I would like to update the gccvb compiler to a newer version of GCC, even though Guy hates GNU. I do most of my work on GNU/Linux (Grrrr GNU) and my optimizations do not work. I’ve started looking into the patches made to binutils and gcc that are included on this website. This might be a worthwhile project for me to learn a little more about the V810 processor instructions.
As for the SDL, there are two major versions of SDL out these days. SDL 2 came out a little while ago and I don’t really know much about it. SDL in any version is great for handling graphics, sound, and input. Most resources online are about SDL version < 2. All I know is that part of my issue doing VB work in the past was no good way to debug my programs. Mednafen is about as good as it gets and I think Guy has found Mednafen to be lacking in many aspects as far as emulation and debugging go.
Yeti_dude wrote:
I’m just about to be out on summer break after my first year as a high school math teacher. Now I need to get my programming chops back so I can teach CS.
Well happy mathing! This project oughtta be just what you need, then.
Yeti_dude wrote:
When can we start?
As soon as I finish collecting my thoughts. Part of my reasons for these walls of text is so I can make sure I don’t forget anything before starting. (-:
jrronimo wrote:
Unfortunately I cannot contribute — I got a C+ in my one and only C++ class back in 2000, haha. I wish I knew enough to help, but I really, really don’t.
The actual programming is only a small part of the project, despite the fact that the whole goal is to produce the program… More work goes into design than implementation, so if you can come up with ideas, you can definitely pitch in.
jrronimo wrote:
Guy Perfect: You mentioned something about cartridge expansion space. Could that be used by something equivalent to a SuperFX chip (which I believe itself is a custom ARM chip) for faster 3D? Or just anything, right?
Correct, but it can only interact with the CPU. Video and audio and what-not aren’t accessible to the cartridge directly. That’s not to say it can’t render images in its own internal buffer and have the CPU load it into video memory, though.
Greg Stevens wrote:
I’ve been wanting to try to develop an emulator but the project always seems too overwhelming for what little time I have.
No worries. We can all be overwhelmed together. (-:
Yeti_dude wrote:
I would like to update the gccvb compiler to a newer version of GCC, even though Guy hates GNU.
For the record, I have no qualm with GNUtered software. Debian is my preferred Linux distribution, and gcc my preferred compiler. It’s GNUtered source code I take issue with, especially if it tries to creep into my source code.
I really need to get with dasi and push DevkitV810 out into the open. It uses the latest gcc, and libvue was rebuilt from the ground up. Plus, it’s fully documented. (-:
Yeti_dude wrote:
As for the SDL, there are two major versions of SDL out these days. SDL 2 came out a little while ago and I don’t really know much about it.
For the time being, you know more about SDL than I do. (-: Alls I know is that it can do what I need it to do and isn’t GNUtered, so I’ll look into it later when the time comes.
Yeti_dude wrote:
Mednafen is about as good as it gets and I think Guy has found Mednafen to be lacking in many aspects as far as emulation and debugging go.
You could…. say that, yes. Nothing against Mednafen, though. For what it is, they did a great job on it.
Cycle Accuracy
Cycle accuracy was more important for older systems like the NES, where fancy scrolling effects relied on perfect CPU-video timing; on the Virtual Boy that kind of trick may well be impossible, but there’s no reason we shouldn’t support it anyway. The VB’s CPU runs at 20MHz, meaning 20,000,000 cycles per second. Each CPU instruction takes one or more cycles to execute. Therefore, efficient code will use cheaper instructions to get the job done.
Other than the CPU, some hardware components operate on their own accord. The VIP will render frames 50 times a second, or if you will, one frame every 400,000 cycles. The high-frequency timer running at 20μs per tick means 400 cycles per tick. The VSU outputs audio at 41,700hz, meaning somewhere between 479 and 480 cycles per sample. In order for everything to stay synchronized, everybody needs to be operating on the same time scale.
The CPU cycle is fortunately the shortest interval in the system. No other hardware component operates at a frequency anywhere close to the CPU’s, so we can use “one CPU cycle” as the atomic unit of time.
It’s important to note that while the emulation core components will be synchronized relative to one another, they may not be synchronized with real time. One CPU cycle in the emulation core, being on an emulator and all, is pretty much guaranteed to take an amount of time way smaller than 1 / 20,000,000 of a second. The only real-time synchronization we need to ensure is that the video and audio outputs match those of the real system.
As mentioned earlier, each CPU instruction may take one or more cycles. So one easy way to synchronize emulation core components is to execute one instruction, record how many cycles it took, then ask each component “do some processing for X cycles”.
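Here’s a rough sketch of what that synchronization loop might look like. Every name in it is a placeholder for whatever we end up calling things:

#include <stdint.h>

typedef struct vueContext vueContext; // stand-in for the real core context

// Placeholders for functions their respective modules will provide.
int32_t cpuExecuteNext(vueContext *context); // runs one instruction, returns its cycle count
void    vipAdvance    (vueContext *context, int32_t cycles);
void    vsuAdvance    (vueContext *context, int32_t cycles);
void    timerAdvance  (vueContext *context, int32_t cycles);

// One video frame is 400,000 CPU cycles (20,000,000 / 50).
void emulateFrame(vueContext *context) {
    int32_t remaining = 400000;
    while (remaining > 0) {
        int32_t cycles = cpuExecuteNext(context); // one instruction at a time
        vipAdvance(context, cycles);              // video catches up
        vsuAdvance(context, cycles);              // audio catches up
        timerAdvance(context, cycles);            // timer catches up
        remaining -= cycles;
    }
}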
There are some unknowns in Virtual Boy, however:
* Floating-point instructions have variable cycle counts, and NEC doesn’t tell us how they’re determined.
* The bit string instructions have variable cycle counts, and all we have are best-case, cache-hit numbers.
* Speaking of the cache, we have no idea how its records are populated or replaced.
* The VIP is a mysterious beast, and figuring out how long it takes to draw whatever pixels is a months-long project in its own right.
* Time taken to display VIP frames varies depending on the situation of the system. This is the reason for a frame clock signal on the link port, to keep both systems synchronized.
In the interest of simplifying the entire process, I suggest we do one thing slightly differently from the actual system: only supply user input once per frame (as opposed to whenever the emulated program asks for it). This will enable the emulation core to operate one frame at a time, like a decent God-fearing emulator, and simplify the task of the driver application.
Braaaaaains
The CPU is a very simple module, considering. All it needs to do is read in instructions from the memory controller, process them to produce output (potentially writing back to the memory controller), and return the number of cycles it took.
The NVC (the Virtual Boy’s customized V810 processor) is a RISC architecture, standing for “reduced instruction set computing”. What that means is that the CPU only contains the bare minimum number of instructions to satisfy its desired functionality (mostly; some arithmetic operations can use either immediate or register data). Like most RISC systems, it does this through the load/store approach, where data is loaded into registers, manipulated, then written back to system memory.
What’s the alternative? They call it CISC, complex instruction set computing, and it’s a play on RISC. Other architectures like x86 have various addressing modes for each operation. You might have individual instructions for each of the following:
* Add from immediate
* Add from register
* Add from memory
* Add one
* Add from an address pointed to by this place in memory
* Add from an address pointed to by this register
* Add from an address pointed to by this place in memory plus the value in this register
* So on
* So forth
Virtual Boy has two:
* Add from immediate
* Add from register
If you need to add with the contents of system memory, you have to load it first. In fact, the “add from immediate” instruction is a bit superfluous: you can easily load a literal value into a register using one instruction, then add the value in that register using a solitary addition instruction. But the immediate version makes some things easier.
NVC/V810 instructions are either 16 or 32 bits in size. 16 bits are always read, and the uppermost 6 bits are then examined to determine which instruction it is. Depending on the instruction, an additional 16 bits may be read (there is no instruction that is sometimes 16 bits and sometimes 32 bits; it’s one or the other). Processing instructions looks like this:
* Read 16 bits, and examine the highest 6 bits as the opcode.
* All opcodes less than 101000 are 16-bit instructions.
* All opcodes greater than or equal to 101000 are 32-bit instructions.
* If the highest 3 bits are 100 (or, if you will, if the six bits are 100xxx), the instruction is a conditional branch
* If the opcode is 011111, then it’s a bit string instruction and a sub-opcode field is present
* If the opcode is 111110, then it’s a floating-point instruction and a sub-opcode field is present
Pinky, are you pondering what I’m pondering? If we can read in 16 bits from the memory controller, and parse out that 6-bit opcode as a number, couldn’t we use that as the index into an array of function pointers? Why yes, we can! That’s the example I used in my previous post, too, and it makes CPU processing a lot simpler.
The same goes for conditional branches, bit string instructions and floating-point instructions. Within their respective handler functions, additional arrays of function pointers can be used. Branches always use one of 16 conditions, and the bitstring/floating-point sub-opcode field works the same as the primary opcode field. Function pointers make life easy!.. Well, easier.
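For the curious, here’s roughly what that first decoding step shakes out to. The structure and function names are simplified stand-ins, not the real core types:

#include <stdint.h>

// Simplified stand-ins for the real core context and bus functions.
typedef struct {
    uint32_t pc;
} CpuState;

uint16_t busReadU16(CpuState *cpu, uint32_t address); // placeholder bus read

typedef void (*Handler)(CpuState *cpu, uint16_t bits, uint16_t extra);
extern Handler handlers[64]; // one entry per 6-bit opcode

void cpuStep(CpuState *cpu) {
    uint16_t bits   = busReadU16(cpu, cpu->pc);  // every instruction starts with 16 bits
    uint8_t  opcode = bits >> 10;                // the highest 6 bits select the handler
    int      is32   = opcode >= 0x28;            // opcodes 101000 and up take a second halfword
    uint16_t extra  = is32 ? busReadU16(cpu, cpu->pc + 2) : 0;
    cpu->pc += is32 ? 4 : 2;                     // advance past the instruction first...
    handlers[opcode](cpu, bits, extra);          // ...so jumps and branches can overwrite pc
}

The conditional branch, bit string and floating-point handlers would then do their own lookups in smaller tables of their own.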
Instruction decoding can actually be done with function pointers per instruction format, but that’s a tad too technical for the scope of this post. We’ll go over that in greater depth when implementing the CPU.
The fourth assumption we’ll avoid making is that each CPU instruction has fixed behavior. Again, if we want to add a debugger to this project, it’ll need a means for plugging into the emulation core (breakpoints don’t happen on their own, after all). And another layer of function pointers can make that happen. The core context will itself have an array of function pointers for each instruction. The application can then change these directly by supplying addresses to new instruction handlers. These new handlers, once their custom tasks are done, can themselves then call the default functions supplied by the emulation core!
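As a sketch of how the application side might look (the handler prototype and the opcode slot are hypothetical; the real API will take shape as we go):

#include <stdint.h>

// Hypothetical shapes, just to show the chaining idea.
typedef struct vueContext vueContext;
typedef void (*VueHandler)(vueContext *context, uint16_t bits, uint16_t extra);

struct vueContext {
    VueHandler handlers[64]; // one per opcode, filled in by the core
};

void LogFunctionCall(vueContext *context); // whatever the debugger wants to do

static VueHandler defaultJal; // hang on to the core's original handler

static void myJalHook(vueContext *context, uint16_t bits, uint16_t extra) {
    LogFunctionCall(context);         // the custom behavior happens first...
    defaultJal(context, bits, extra); // ...then fall through to the core's default
}

void installHook(vueContext *context) {
    defaultJal = context->handlers[0x2B];  // hypothetical slot for JAL
    context->handlers[0x2B] = &myJalHook;
}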
I know it might be kinda confusing right now, but trust me, this is going to give a lot of flexibility to the emulation core and won’t sacrifice speed due to unwarranted overhead. Considering the benefits this brings to the table, it’s well worth the time planning it out from the outset.
See Code Run. Run, Code, Run.
The first two modules of this project are going to be the memory controller and the instruction implementation. That is, the bus emulator and the CPU emulator. Both of those things consist primarily of logic and data, and not so much in the things-you-can-see category. So how do we even know that we did it right? Well, you’re gonna need some kind of output for debugging your own programs. Some may call this a test program, but I’m partial to referring to it as the driver (not to be confused with a device driver). We need a program to drive the code we write so we can make sure we did it right.
As corrections often need to be made, you could say that after you write code, you have to right code… And that process is called debugging. Not to toot my own horn, but I rarely make logic or implementation errors: most of my problems are either spelling things wrong, making syntax errors or using the wrong variable names (especially after renaming a variable). Still, that’s something that will be caught during debugging, even if the code happens to run without crashing.
This particular project presents a somewhat more challenging environment than a typical closed-shop development run. Where in a production environment you can hack together the sloppiest, ugliest, most unholy assortment of lines of code imaginable in order to verify your code is working, that really won’t fly here where we’re putting the project up on a pedestal for presentation to the general public. We’re going to have to debug in style.
So the way I see it, if we’re going through the effort to make our driver program for debugging fancy, why not actually make the debugger that will be in use by the finished product? You know, that disassembler and hex editor and video memory viewer and stuff. Those can be what we make to verify that our emulator is working correctly!
Again, the first two modules of this project are going to be the memory controller and the CPU. The memory controller is simple enough that it can be verified algorithmically: a few simple test cases of reads and writes is all it will take to ensure it’s working correctly, and no human actually has to inspect it for that to happen. I’m comfortable testing the memory controller with a few lines of code. The CPU, on the other hand, has a lot going on. It has a bunch of state information, and the program that it’s running needs to be seen to make sure execution is flowing properly. This looks like a job for *da dada DAAAAAAH!* the disassembler!
The disassembler is a GUI component, meaning before we can begin this I’m going to have to do some research on SDL. |-: Gimme a couple of days to look into it, I promise it’ll be worth it.
Following is an image mockup I made a couple years ago when I first got into Virtual Boy. Like, when I first got into Virtual Boy. What started out as a feasibility case study for porting Mega Man 9 to NES evolved into an actual consideration for Virtual Boy emulation. This was before I joined Planet Virtual Boy, and in fact was long enough ago that Photobucket hadn’t started putting unique suffixes on filenames:
Forgive the red and black. That was totally tongue-in-cheek. Still, the general layout I have in mind is… well, pretty self-explanatory. That’s what I’d like to see in the disassembler, give or take a few things.
When the memory controller loads the ROM file, and the CPU begins doing its thing, the disassembler will display the machine instructions, the CPU monitor will show the status flags, and the controls will allow one instruction to be executed at a time, verifying whether things work properly. And that, I think, would make for a very handsome first step towards a full-fledged emulator.
This is a first draft. Please feel free to pitch in any ideas or suggestions you may have.
Attachments: [disassembler interface mockup image]
Fail Faster
I made this its own post because of how important it is.
I’d like everyone to take a few minutes to watch the following video. Granted, it is presented in the context of video game design, but it works for so many other things in life too… not the least of which is software development:
There is an oft-quoted bit of wisdom said by the famous author Ernest Hemingway:
The first draft of anything is shit.
Whether it be design, implementation, or in some cases even testing and debugging, the first pass shouldn’t be expected to be the last. You can’t expect to be able to say “Today, I will implement this. I will be done by tomorrow, at which time I will implement that.” I tried that a fair number of times in my past, and there’s always something that comes up. Sometimes a problem turns out to be more difficult than I expected. Sometimes halfway through I think of things that would make it better. Sometimes I get everything exactly the way I wanted it and decide that, in the end, I don’t really like it that way.
And sometimes, I get to a point in a project, and learn enough about how it should be put together, that I just start the whole darn thing over from scratch. This happens a number of times during the course of development. It’s the #1 reason I don’t work as a professional developer: there’s no way I can, in good conscience, work on a schedule. I need to do it right, and the first draft of anything is shit. I need to fail early and often to make sure what comes out on the other end is the best that I can make it. I won’t settle for something that “works well enough,” even if it means delaying the product’s final release.
My name needs to live up to me, after all. (-:
It’s going to happen, so be ready for it. As well as development may go, and as much progress is made, it’s pretty much guaranteed that I’ll eventually call for a do-over. That’s not to say that progress will be thrown away; quite the opposite. All of what we learn, and a great deal of the code and assets that we produce, will be reincorporated into the new project. But if I can look at the code base and think, “You know, it’d really be better if XYZ”, I’ll rebuild it in order for XYZ. I won’t be all tyrannical about it, but at the same time I do ask for the patience and understanding of anyone else involved that it’s something I’ll need to do on a basic level. That’s a part of who I am.
It’s going to be a fun ride, I guarantee you that. But it will also be a bumpy one. Here’s to a new emulator!
Guy Perfect wrote:
The #1 reason why I do not work as a professional developer:
Not a professional developer? HA, could have fooled me!
SDL is looking quite nice. For years I’ve had a sort of cross-platform API of my own in the works, which I set up so I could run my programs on both Windows and Linux. SDL takes more or less the same approach as I did, except they did a lot more and support way more systems. (-:
I spent my morning before work, and my entire evening, working with the SDL source code to produce a static library without much trouble. See, the libraries it comes with require this pesky little SDL2.dll on Windows, which I didn’t want.
Well, turns out they already distributed the static libraries in the exact same archive as the DLL. They’re in a different folder, but they’re there: .a files instead of .lib files. So not only am I now intimately familiar with the SDL modules and making libraries with gcc & friends, but I’ve got SDL linking straight into my executables with no need for a DLL!
Since ld libraries don’t themselves import any of their dependencies, it’s necessary to link everything that SDL needs when using its static libraries directly. For my own reference, here’s what a command looks like that does the job (provided the static libraries are in the compiler’s lib directory):
gcc main.c -o test.exe -lSDL2main -lSDL2 -lWinmm -lOle32 -lGDI32 -lImm32 -lVersion -loleaut32
And since I’m not one to include all the symbols just for the sake of because, there is a wee problem doing it this way.
See, SDL2main requires WinMain() to be present (a typical Windows entry point), and it actually implements its own then calls the application’s own main() function after taking care of some initialization. It may not be very elegant, but I won’t argue with the guys who did things the way I was doing it except better. (-:
Anyhoo, WinMain() is defined in src/main/windows/SDL_windows_main.c. In order to get that module to actually stick around by the time SDL2main needs to grab ahold of it, the program should reference one of the functions in that file. It doesn’t have to be pretty; just something to get it to work without including every darn function in the entire library.
For whatever reason, referencing WinMain() itself doesn’t seem to do the trick, but there’s another function in there called console_main() that works just fine. The following gets the job done just swell-ly by sticking this function in the main C source file:
// Import WinMain() for use with libSDL2main on Windows
#ifdef _WIN32
void MainHack() {
    console_main(0, NULL);
}
#endif
That MainHack() probably shouldn’t be called, but it doesn’t have to be. Since the main module requires it, the module containing WinMain() will be included, and everything compiles without a hitch.
Perhaps tomorrow, I can do something constructive!
I’ve got SDL up and running now, and I keep giggling at just how similar it is to the API I developed over the years. I’ve only been at it for, like, 3 or 4 years, but the SDL project started way back in 1997. The fact that the method those guys settled on after all their research and experimentation happens to be the same as the one I came up with… that’s satisfying. (-:
This post doesn’t go into much detail, because everything below is a topic of discussion in its own right. If anyone is interested in delving deeper into the topics I’m about to discuss, lemme know and we can spin up another thread for it.
OpenGL
OpenGL is an industry-standard graphics library that, as the name implies, is open and free to use (from the vendor’s standpoint; software development for stuff like this is always free). It’s supported by all modern video hardware, and is even in use on non-desktop systems such as phones and video game systems (Nintendo 64 and newer, for instance). SDL has direct support for OpenGL rendering contexts linking up with the windows it creates, making it just dandy to use for this project.
There are three main reasons I want to use OpenGL:
* It’s available everywhere. SDL may support Direct3D, but Linux and Mac don’t. Everything supports OpenGL.
* It’s hardware-accelerated. This means the software can do its processing, and let the video hardware handle displaying things to the user.
* It supports stereoscopic displays. I don’t have a 3D monitor, but do any of you guys? It’d be awesome if both the left and right images from the emulated Virtual Boy can be properly red, without the need for 3D glasses. Of course the anaglyph mode will be there for us puny mortals, but stereoscopy is a thing now!
OpenGL operates on a context in much the same way our emulation core will. This context is bound on a system level and is often linked with the video hardware. It’s called the rendering context, and it has some interesting properties. Among them, the available features of OpenGL that can be used by the rendering context depend on the features available in the hardware. The device driver itself actually exposes this functionality to the application.
Legacy versions of OpenGL had “fixed functionality” and like a billion functions to manage its parameters. Starting with OpenGL 3.0, practically all of the fixed functionality was deprecated and the application now specifies its own shader programs. Any video card worth its salt will still support the old OpenGL stuff, but shaders give a lot more flexibility.
The problem is, in the event shaders aren’t available on a system (such as my crappy old netbook I keep by my bed), some kind of fallback to an older version of OpenGL is appropriate. Therefore, at least two graphics engines will need to be made: one for shaders, one for fixed functionality.
Threads and Blocking
On operating systems, each program exists as a process: a virtual memory context under which machine code is executed by the CPU. This can work thanks to a hardware memory controller that is configured by the OS and makes the CPU think the contents of any given range of addresses are whatever the OS wants them to be. Processes in turn spawn threads, which are individual bits of code that run side-by-side. When only one processor is present, the operating system makes threads take turns. When multiple CPUs or multiple cores are present, multiple threads can actually execute at the exact same time.
Care must be taken when accessing the same memory from multiple threads. After all, you don’t want to read memory in one thread while it’s also being modified by another thread. That can cause crashes. There are various synchronization mechanisms available to prevent this from happening, and they’re all controlled by the operating system.
Threads make the world go ’round, but they’re not something you’ll encounter in everyday development (unless you do a lot of developing every day). SDL provides some simple threading features, and we’ll want to make use of them for a particular reason: blocking is something we’ll want to do when appropriate. When a thread blocks, it relinquishes itself to the OS indefinitely: it ceases to continue until such a time that the OS is ready for it to resume. This is useful for a number of reasons.
The chief situation is window events. Window events come and go, but in the grand scheme of things, they’re not very frequent. They’ll happen when you move the mouse, or press a key on the keyboard, or resize the window or whatever. A naïve game may just check for events in a loop, then proceed with graphics code when all events are processed. This is inefficient because the program will try to soak up all available CPU time. If you can cause the window thread to block and just wait until the OS has an event ready for it, that frees up CPU time for other programs running on the system.
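With SDL, blocking on window events is pretty much a one-liner. Something like this (a bare-bones sketch, not our actual event loop):

#include <SDL2/SDL.h>

// SDL_WaitEvent() blocks the calling thread until the OS delivers an event,
// so this loop uses essentially no CPU while the window sits idle.
void eventLoop(void) {
    SDL_Event event;
    int running = 1;
    while (running && SDL_WaitEvent(&event)) {
        switch (event.type) {
            case SDL_QUIT:
                running = 0;
                break;
            case SDL_KEYDOWN:
                /* feed the game pad module, etc. */
                break;
        }
    }
}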
Ours is not the only program the user is allowed to use. We should make the effort to use our CPU time wisely. Virtual Boy games failed to do this: they sit in a loop checking a global variable until the video hardware finishes a frame and the variable is updated in an interrupt handler. If they’d have just used the CPU’s halt functionality, I bet they could get the same battery life out of just three or four batteries, instead of six.
Blocking also plays a part in networking, file I/O and probably some other things that aren’t in my brain for the time being. Logically, we can’t block for all those things, or nothing will ever get done! That’s why we need multiple threads. For each place we can block (which is something we want to do), it needs to be in its own thread so the other parts of the program will continue to work.
Networking
SDL doesn’t implement a networking API. And it doesn’t have to, thanks to our good friends who wrote the POSIX specification. POSIX is a standardized set of operating system functions and structures, and while Microsoft is notorious for flipping the bird at standards, one thing they did decide on is the POSIX sockets API.
Linux and UNIX also implement POSIX, and Mac OS nowadays runs on UNIX, so it’s available there too. All we need is some means for exposing the API to the application and it’ll be good to go on all of our target platforms. I understand there’s an SDL extension that you can use that does this, but if need be, I can do it myself. It’s mind-numbingly easy.
Sockets can block when waiting for data to come in. We’ll want to do this to an extent, but with a timeout: we don’t want to wait forever if no data will ever arrive.
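The usual POSIX way to do that is select() (poll() works too). Roughly like so, with a timeout value of whatever we decide is reasonable; on Windows the include is winsock2.h, but the call itself is the same:

#include <sys/select.h>
#include <sys/time.h>

// Wait up to 'milliseconds' for data to arrive on 'sock'.
// Returns 1 if readable, 0 on timeout, -1 on error.
int waitForData(int sock, int milliseconds) {
    fd_set readable;
    struct timeval timeout;
    FD_ZERO(&readable);
    FD_SET(sock, &readable);
    timeout.tv_sec  = milliseconds / 1000;
    timeout.tv_usec = (milliseconds % 1000) * 1000;
    return select(sock + 1, &readable, NULL, NULL, &timeout);
}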
Okie dokie, I think we’re about ready to get this thing started now. I’ve got my serpentine Pokémon team trained up (you’re invited to participate), Drake Von Vladstone was pogo’d back to the grave, and the Snowmads have been forced to retreat. Once the crispy snacks arrive in the mail, I’ll be able to save Eagleland from a villain attacking from the future.
Yes, I do have a 3DS and a Wii U. What gave it away?
The Source of This Madness
I’ve set up a Dropbox folder for the project right now. If we can get something better set up at some point, that would be swell, but for now this does the job. I’m calling it “pvbe”, short for “Planet Virtual Boy Emulator”, and in the future we can come up with a catchy product name. (-:
Within there are two child folders: “repository” and “archives”. The repository folder contains all of the code, assets and tools used in the project. For now, it’s just a .c file and a couple of compile scripts.
The archives folder contains only .zip files named chronologically. They’re all-in-one copies of the entire repository folder. That’s just for the sake of backup in case something goes horribly wrong and we can roll it back. Rudimentary version control.
The program itself does this:
Just a spinning box, and it’s not even red-on-black! But it is a cross-platform OpenGL window, which is the key point of interest. With this, we can get started on the emulator proper.
Regarding SDL
I’m trying to get in touch with the SDL guys to see what their intentions were with regards to distributing SDL with software. The way it looks right now is that SDL itself needs to be installed as a runtime requirement, then software that uses it… uses it. I was never one too fond of third-party distributables just for running simple programs, but I trust these guys, it’s not a huge installation, and it’s not .NET or Java. All-in-all, I’d be okay requiring the user to install SDL before running the emulator.
SDL was designed in such a way that new features can be introduced without breaking compatibility with older software. Combine this with its, like, 4MB of runtime binaries and it really doesn’t sound too bad to me.
Now let’s get this show on the road!
Attachments: [screenshot of the spinning OpenGL test window]
Coding Conventions
These are flexible regulations, and can be modified depending on the preferences of people who will be actively working on the project. But for starters, these are the rules I personally stick to:
* Function names are camelCase, where the name itself begins with the library name. Example: vueFunction()
* Static functions (non-library) are typically TitleCase, though the names can mutate as appropriate. Example: DoSomething();
* Variables are all-lowercase, and multiple words are separated by underscores. Example: int some_variable;
* Object instance structures are named like functions. This is not ambiguous since you can’t declare a variable as a type that is the name of a function. Example: vueContext
* Multi-field data type structures are named all-uppercase, with an underscore following the library name. Example: VUE_INSTRUCTION
* Library-defined constants are all-uppercase. Example: VUE_CPU_NOP
* Opening braces appear on the same line as the beginning of the construct, both for functions and statements.
* Indents are defined as four spaces per level, not one tab. This ensures the same appearance on any editor.
* Lines may not exceed 79 characters, since console displays are 80 characters wide and sometimes a newline is added automatically after column 80.
The Core Context
I’m thinking the emulation core, for the sake of this particular project, should be called “vue” after the Nintendo project name. Any other software incorporating “the vue library” will therefore be incorporating the emulation core.
The highest-level core context structure, then, will simply be called vueContext. It’ll look something like this at this point in the project:
typedef struct {
    void *tag;
} vueContext;
What’s with “tag”? That’s for application-defined data. You never know when it might be useful, so it’s a good idea to toss it in.
The core API will contain functions for creating and deleting core contexts:
int vueCreateContext(vueContext *context);
int vueDeleteContext(vueContext *context);
I’m returning integers here for the sake of reporting errors back to the application. If everything is successful, the return value will be zero, indicating no errors. Otherwise, the exact value returned can be used by the application to decide how to handle the situation according to what went wrong.
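Usage from the application side would look something like this (what the nonzero error values actually mean is still up in the air):

#include <stdio.h>

int runEmulator(void) {
    vueContext context;
    int err = vueCreateContext(&context);
    if (err != 0) {
        fprintf(stderr, "Couldn't create core context (error %d)\n", err);
        return err;
    }
    /* ... process frames, show output ... */
    return vueDeleteContext(&context);
}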
The main operation of the emulation core will be to process one video frame at a time, taking into account all system operations that occur during that 1/50 of a second period. Once that’s done, the application can retrieve the video and audio output to give to the user. The application then provides controller input to the core and another frame is processed. This will look something like this:
vueInput(&context, GetJoystickOrWhatever());
vueProcessFrame(&context, vbuffer, abuffer);
DoSomeOpenGL(vbuffer);
DoSomeAudio(abuffer);
DoEvents(&SystemWindow);
It’s not at all likely to look exactly like that, but it’s the general idea. You do one frame, then accept the output. In fact, there may be additional functions for accepting video and audio buffers, and perhaps one display at a time can be accepted for video. Scanline stride should be taken into account too in case a power-of-two texture needs to be created (as is the case on GL 1.1, meaning a 512×256-pixel texture). The audio sampling rate and bit format need to be considered as well.
And that’s just about it for the emulation core. Some other features like save/load state and reset will be in there, but in the grand scheme of things, using the emulation core in an emulator project will be a piece of cake.
The Memory Controller
I got pretty well along the way towards this section, and the post was getting pretty big. Gimme a little while to collect my thoughts and I’ll make a new post just for this concept.
This post contains a coding challenge! Read through for more info.
Right. So. The memory controller.
From the Top
The job of the memory controller is to emulate the hardware memory bus on the Virtual Boy. This specifically refers to the state of affairs within the system from the perspective of the CPU: the addresses that are available, and the significance of the memory at those locations.
For the hardware, addresses are only 27 bits wide. Bits 27 to 31 are thrown away. Masked out. Tossed in the bit bucket. Of the remaining bits, only the uppermost three of them matter to the memory controller. They can be one of eight different values, which correspond with hardware modules:
0 = VIP (Video)
1 = VSU (Audio)
2 = Hardware Control (Timer, Game Pad, Wait Control, Link)
3 = Not used
4 = Cartridge expansion (currently not used)
5 = System RAM
6 = Cartridge RAM
7 = Cartridge ROM
To emulate the memory bus is to emulate this division of address regions. But that’s easy to do: given a 32-bit address, the region number can be calculated as (address >> 24 & 7).
There needs to be a means within the emulation core API to read and write values to every address accessible to the CPU, which in turn means that each emulation core module needs to implement exactly what those reads and writes mean. After all, some data may not be write-able (ROM), and some may perform tasks on write (timer, VIP control, etc.).
As mentioned before, there are five kinds of reads and three kinds of writes. I’m thinking the emulation core library makes the following available to the application:
int8_t   vueRead8  (vueContext *context, uint32_t address);
uint8_t  vueReadU8 (vueContext *context, uint32_t address);
int16_t  vueRead16 (vueContext *context, uint32_t address);
uint16_t vueReadU16(vueContext *context, uint32_t address);
int32_t  vueRead32 (vueContext *context, uint32_t address);
void     vueWrite8 (vueContext *context, uint32_t address, int8_t  value);
void     vueWrite16(vueContext *context, uint32_t address, int16_t value);
void     vueWrite32(vueContext *context, uint32_t address, int32_t value);
While it’s true that the NVC can process 32-bit floating-point values, the bit patterns that get stored in the registers can easily be treated like integers. Therefore, there won’t be read and write functions for floating-point. It’s also the case that signed and unsigned 32-bit values aren’t necessarily the same thing, but since there’s no sign-extension going on for 32-bit values, only one read type is supplied.
The debugger can use these, the CPU module will use these, a cheat engine can use these, etc. They’re the highest-level means for memory access, and they take care of all the work.
Delegation
Emulation core modules can come and go. What happens if at some point we add in something for the cartridge expansion area? We’ll need a way to put that into the core without rewiring the whole thing. So for that, we’ll want to set up a system that affords such development. After all, we don’t want to assume that the project requirements will never change, or else we’ll spend years with heroes using blue lightning and villains using red lightning.
Function pointers come to the rescue here. We can set up 8 handlers for each access type (one for each region of CPU addresses), then initialize them all to harmless dummy functions. As emulation core modules are implemented, they can then simply implement their own read/write functions and update the function pointers in the memory controller. Smart!
These are function pointer types with the same prototypes as the functions listed above:
typedef int8_t   (*VUE_READ8)  (vueContext *, uint32_t);
typedef uint8_t  (*VUE_READU8) (vueContext *, uint32_t);
typedef int16_t  (*VUE_READ16) (vueContext *, uint32_t);
typedef uint16_t (*VUE_READU16)(vueContext *, uint32_t);
typedef int32_t  (*VUE_READ32) (vueContext *, uint32_t);
typedef void     (*VUE_WRITE8) (vueContext *, uint32_t, int8_t);
typedef void     (*VUE_WRITE16)(vueContext *, uint32_t, int16_t);
typedef void     (*VUE_WRITE32)(vueContext *, uint32_t, int32_t);
The & operator in C, while it can be used for bitwise AND, can also be used to retrieve the address of a variable. Pointers are often wizardry for people more accustomed to things like .NET or virtually any scripting language, but they’re part of how computers work at the CPU level, so bear with me.
The address-of operator & can also be used on function names to the same extent. If you have a function called main() in your program, then “&main” is the address of that function. And if you have a variable whose type is a function pointer with the same prototype as main(), you can store its address with “variable = &main;”
I won’t be giving a function pointer tutorial at this time, but it’s important to know what you’re looking at. In this case, the type defined as VUE_READ8 is a function pointer with the same prototype as vueRead8(), so you can use it with a variable and call it indirectly like this:
VUE_READ8 reader_proc = &vueRead8; // Assign the address of the function
reader_proc(context, address);     // Call the function using the pointer variable
Typecasting pointers doesn’t have any effect on compiled machine code; it’s simply a way to let the compiler know what’s valid and what isn’t. While it may reduce readability to some extent, I’m going to opt to cast these function pointers to type “void *” in some cases because it will greatly simplify the implementation of certain parts of this module.
Since we want to allow the application to inject itself into the memory controller, we’ll need a layer of abstraction between functions like vueRead8() and the functions implemented by the emulation core components. For instance, the path of execution might look like this:
vueRead8() -> debugger’s read8() -> default read8() -> ROM module’s read8()
We’ve already got vueRead8(), and the ROM module’s read8() will come along at a later time when the ROM module exists. The debugger’s read8() is purely hypothetical at this point, as it won’t exist in the emulation core. That means what’s left is that we have to A) make it so the debugger’s read8() can be used, and B) define that default read8().
The “default” memory access functions will be functions that process addresses into one of those 8 memory regions, then forward the accesses to the corresponding emulation core modules for further processing. This whole chain of command is called delegation.
Building a Bridge
It’s established that we want to do the following:
* Create library API functions for reading and writing against the CPU bus
* The API functions call other functions by pointer for handling the access request
* Default handler functions are provided when no hijacking is required
* The default handlers process addresses to route memory accesses to the appropriate emulation core modules
* Each emulation core module implements its own versions of the memory access functions
This actually isn’t as complicated as it sounds. It’s harder to say than it is to implement. (-:
For hijacking specifically, let’s start with an example of how I’d like to see it in action. Let’s say we have a debugger setting up a read breakpoint. That might look something like this:
// Persistent variable to hold the address of the original read8 handler
static VUE_READ8 read8_old;

// ... then, down in the hijacking code:

// Custom debugger function to use in place of vueRead8()
int8_t read8_debug(vueContext *context, uint32_t address)
{
    // Put the emulation core in break status if the address matches
    if (address == break_address)
        vueBreak(context);

    // Use the original handler to finish processing the read
    return read8_old(context, address);
}

// ... meanwhile, somewhere in the initializer:
read8_old = vueMemHijack(context, VUE_MEM_READ8, &read8_debug);
In this example, “read8_old” stores the value of the original read8 handler. This is useful because, after intercepting a read request, the injected function can merely call the original function to do the heavy lifting of processing the memory access. Smart!
To support this, the vueMemHijack() function should return the previous value of the handler function pointer for that access type. Not only does this expose the address of the default handlers to hijacking applications, but it also enables multiple levels of hijacking to take place, where each subsequent hijacker finishes up by calling the hijacker that came before it. Passing a value of NULL as the new handler address, on the other hand, will revert back to the default handler.
Another type of hijacking I’ll want to implement is vueCPUHijack() for CPU instructions. The second parameter is a symbolic name for exactly which process to hijack. In the example above, I used VUE_MEM_READ8, but later on we might see stuff like VUE_CPU_MUL and so-forth.
Unlike CPU instructions, memory accesses aren’t all of identical type: each access type has a function pointer type all to its own. This means they can’t be used interchangeably in an array… or can they? Turns out they can, to an extent, but not without a little typecasting. That’s where void * comes in handy (even if it makes things a little less clear).
This is what happens to the core context structure:
// CPU bus memory region index constants
#define VUE_BUS_VIP     0
#define VUE_BUS_VSU     1
#define VUE_BUS_HWCTRL  2
#define VUE_BUS_UNUSED  3
#define VUE_BUS_CARTEXP 4
#define VUE_BUS_SYSRAM  5
#define VUE_BUS_CARTRAM 6
#define VUE_BUS_CARTROM 7

// Memory controller handler index constants
#define VUE_MEM_READ8   0
#define VUE_MEM_READU8  1
#define VUE_MEM_READ16  2
#define VUE_MEM_READU16 3
#define VUE_MEM_READ32  4
#define VUE_MEM_WRITE8  5
#define VUE_MEM_WRITE16 6
#define VUE_MEM_WRITE32 7

typedef struct {
    void *access[8];     // Hijackable memory access handler pointers
    void *memctrl[8][8]; // Final memory access handler pointers
    void *tag;           // Application-defined data
} vueContext;
What’s going on here is that “access” is an array of void *, or untyped pointers, where we’ll store one function pointer for each memory access type #defined up above (5 reads and 3 writes). Reading an unsigned 16-bit value evaluates to 3 (since VUE_MEM_READU16 is defined as 3), so you’d access the handler function’s address via access[3]. Naturally, being of type void *, you’d have to first cast this to type VUE_READU16 before you could call the function.
“memctrl” is another list of typeless function pointers, but this time with two subscripts. Indexing this field looks like this: memctrl[address_region][access_type]… For instance, if I wanted to read an unsigned 16-bit value (VUE_MEM_READU16 = 3) from ROM (VUE_BUS_CARTROM = 7), I’d use memctrl[7][3], and of course I’d cast it to type VUE_READU16.
… Wow, I could spend all day trying to make this clearer. Let’s take a look at some code and, provided you can make heads or tails of it, it oughtta clear things up. (-:
// Library API function for reading a signed 8-bit value
int8_t vueRead8(vueContext *context, uint32_t address)
{
    return ( (VUE_READ8) context->access[VUE_MEM_READ8] )(context, address);
}

// Hidden, default read handler for a signed 8-bit value
static int8_t DefRead8(vueContext *context, uint32_t address)
{
    return ( (VUE_READ8) context->memctrl[address >> 24 & 7][VUE_MEM_READ8] )(context, address);
}
Here, vueRead8() is the library API function available to the application. It indirectly calls the function stored in vueContext.access[VUE_MEM_READ8]. In some cases, that might be the default handler, or in others, it might be a function in a debugger or whatever.
DefRead8() is the default handler, and it is not available to the application. Rather than using vueContext.access, it uses vueContext.memctrl, then calls a function indirectly from that depending on the address being accessed. In some cases, that might be a useless dummy function, or in others, it might be a function in an emulation core module.
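For the curious, here’s a rough sketch of what vueMemHijack() could look like under this scheme. Only DefRead8() has actually been shown so far, so the other default handler names in the table below are hypothetical stand-ins:

// Addresses of the default handlers, in the same order as the VUE_MEM_*
// constants. Everything other than DefRead8 is a hypothetical stand-in.
static void * const DefaultHandlers[8] = {
    (void *) &DefRead8,  (void *) &DefReadU8, (void *) &DefRead16,  (void *) &DefReadU16,
    (void *) &DefRead32, (void *) &DefWrite8, (void *) &DefWrite16, (void *) &DefWrite32
};

// Sketch of vueMemHijack(): install the new handler for the given access type
// and hand back whatever was installed before it
void * vueMemHijack(vueContext *context, int type, void *handler)
{
    void *previous = context->access[type];

    // Passing NULL reverts the access type to its default handler
    context->access[type] = (handler != NULL) ? handler : DefaultHandlers[type];

    return previous;
}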
Registration for Dummies
I’ve mentioned the “dummy handlers” a couple of times, but what do those look like? They look like this:
static int8_t DummyRead8(vueContext *context, uint32_t address)
{
    return 0;
}
These are used within vueCreateContext() to assign initial values to vueContext.memctrl, and nowhere else for any other purpose. You never want to have uninitialized pointers lying around, especially in cases like this where there’s a possibility they could be accessed even though there’s never a reason to (like reading from the 0x03000000-0x03FFFFFF range on the CPU bus). They’re totally harmless if used, but they’re also totally useless. This makes them handy as placeholders as well as stopgaps.
All CPU bus range indexes within vueContext.memctrl are initialized to dummy handlers, and they’re only ever changed when an emulation core module comes along that supports memory accesses within a given range. So let’s say the ROM module comes along. The ROM bus address range is 0x07000000 to 0x07FFFFFF, so it would therefore supply handler addresses for each index “x” of vueContext.memctrl[7][x]. Or, rather, vueContext.memctrl[VUE_BUS_CARTROM][x].
The question now is, how exactly does an emulation core module “come along” to implement memory controller handlers? Glad you asked, person-who-knows-the-right-questions! This is done through the module registering itself with the memory controller. Let’s say you have the following lines of code in the ROM module:
static const void *handlers[] = { &Read8, &ReadU8, &Read16, etc. };

vueMemRegister(context, handlers);
Exactly where this function gets used, well, I haven’t decided yet. I suppose a default setup can be configured within vueCreateContext() itself, but I don’t want to preclude future modules that can do other things, or external code that works with the emulation core. There’s an NES emulator called Nintendulator that allows “mappers” to be supported through DLLs. I’d like for this Virtual Boy emulator to support something similar (even if we don’t use that feature).
What I’ll probably end up doing for now is include something like vueCreateDefaultContext() that uses default mappings for the memory controller as well as accepts a ROM data buffer.
Coding Challenge!
While I’m getting the mechanics of the memory controller implemented, who thinks they can implement the ROM module? The ROM module is a self-contained object file with a header containing the following API definitions:
// Ownership constants for ROM buffers
#define VUE_ROM_REFERENCE 0
#define VUE_ROM_INHERIT   1
#define VUE_ROM_COPY      2

// ROM module instance struct
typedef struct {
    int       own_data; // Determines whether the object owns its data buffer
    uint32_t  size;     // Number of ROM bytes
    uint32_t  mask;     // The bit mask for bus addresses
    uint8_t  *data;     // The actual ROM data
} vueROM;

// Function prototypes
int      vueROMCreate  (vueROM *rom, uint8_t *data, uint32_t size, int mode);
int      vueROMDelete  (vueROM *rom);
uint32_t vueROMGetSize (vueROM *rom);
int      vueROMRegister(vueContext *context);
For vueROMCreate():
* rom – Pointer to a vueROM structure to initialize.
* data – A byte buffer containing the ROM data.
* size – The number of bytes in the ROM data.
* mode – The ownership mode for the ROM object:
  * VUE_ROM_REFERENCE – The buffer is not owned, but merely read from
  * VUE_ROM_INHERIT – The buffer is given to the API and must be deallocated locally
  * VUE_ROM_COPY – A copy of the buffer is made locally by the API (and must be deallocated)
For vueROMDelete():
* rom – Pointer to a vueROM structure to uninitialize.
For vueROMGetSize():
* rom – Pointer to a vueROM structure to get the number of bytes from.
For vueROMRegister():
* context – Pointer to a vueContext structure to configure the memory controller for.
As a matter of formality, vueROMRegister() should use vueMemRegister() rather than modify vueContext.memctrl directly. This is important to guarantee compatibility and promote loose coupling. Remember, we’re not here to make assumptions about how other modules work!
If vueROMCreate() is specified to copy the ROM data, you’ll need to use malloc() and memcpy(), and vueROMDelete() will have to use free().
Since we, the Virtual Boy community, haven’t decided on a ROM file format yet (since we don’t really need one), keep in mind that Virtual Boy programs are always some power of 2 bytes in size, and any addresses higher than the size of the data get masked off and produce mirrors of the data.
This is important for when the CPU initializes its program counter to 0xFFFFFFF0; it will wind up specifying the sixteenth-to-last byte in the ROM data in all cases.
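As a hint for anyone taking the challenge, the mirroring boils down to a single AND against the mask field. A rough sketch, where GetROM() is a made-up way of fetching the vueROM tied to the context, and the mask assumes the size really is a power of 2:

// Read a signed 8-bit value from cartridge ROM, honoring address mirroring.
// GetROM() is a hypothetical accessor for the vueROM attached to the context.
static int8_t Read8(vueContext *context, uint32_t address)
{
    vueROM *rom = GetROM(context);

    // mask = size - 1, so any address beyond the ROM size wraps back around
    return (int8_t) rom->data[address & rom->mask];
}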
Last post for a while, I promise. (-:
I’ve implemented the memory controller module, uploaded it to the Dropbox folder, and created a new snapshot archive. main.c was moved back into the repository, an include directory was added with vue.h, and src now contains memctrl.c.
One change I made was that I decided to make the memory controller its own, true-blue module complete with object file, hence memctrl.h. I wasn’t able to totally decouple it from the core module, however: it still has to access the core context’s function pointer array directly.
In doing so, there’s a new object type: vueMemCtrl. It looks like this now:
// Object structure for memory controllers
typedef struct {
    void *access[8];   // Hijackable memory access handler pointers
    void *route[8][8]; // Handlers routing to core modules
} vueMemCtrl;

// Primary object structure for emulation core contexts
struct vueContext {
    vueMemCtrl  memctrl; // Memory controller
    void       *tag;     // Application-defined data
};
vueContext was reworked a bit (typedef removed, name moved before the curly brace) in order for those function pointer types to be able to forward-reference it. It’s otherwise the same.
I encourage you to look at vue.h and memctrl.c to see how it all goes together. Should help with implementing the ROM module too.
And with that, I’ll give it a rest for a couple of days. Hopefully someone steps up to the plate with the ROM module!
Saw this a couple of days ago but then got struck down with computer troubles for a while… and now literally need to head out the door for the weekend so haven’t gotten to read the last few posts yet, but just wanted to jump in while I can and say that this sounds awesome and I’d love to be a part of it!
Do Over #1!
Told ya the first draft wasn’t worth beans. (-: Let’s try this again, with pictures. I’m going to reiterate what I already said, but it should be better this time.
Also, I’ll be rewriting the code I wrote accordingly.
*Ahem*
The Memory Controller
Real Virtual Boy units have a memory bus with 32 bits’ worth of accessible memory addresses. The bus is split into 8 sections, each of which represents a different hardware component. Addresses are actually only 27 bits wide; bits 27 to 31 are simply masked off and produce mirrors of the 27-bit memory range. The uppermost three bits that remain, bits 24, 25 and 26, indicate which of the 8 bus sections the address belongs to:
0 = VIP (video)
1 = VSU (audio)
2 = Hardware Control
3 = (Not used)
4 = Cartridge Expansion (currently not used)
5 = System RAM
6 = Cartridge RAM
7 = Cartridge ROM
In the emulator project, the job of the memory controller module is to mimic the behavior of the memory bus. Any time something needs to access system memory or components from the CPU’s perspective, it needs to go through the memory controller. Since the CPU isn’t the only thing that might need to do this–debuggers and cheat engines need it as well–a common library-level API should be set up to handle routing of access requests.
Unlike last time, let’s try a diagram to demonstrate what’s going on!
Wow, that was way easier than last time!
Access Types
Virtual Boy effectively has 8 different ways to access memory:
* 8-bit signed read
* 8-bit unsigned read
* 16-bit signed read
* 16-bit unsigned read
* 32-bit read
* 8-bit write
* 16-bit write
* 32-bit write
The reason for signed and unsigned reads should be self-explanatory: since the CPU registers are 32 bits, smaller data types need to be made big enough to fit. An unsigned read fills in the extra bits with zeroes, whereas a signed read makes a copy of the highest bit and propagates it all the way to the left.
For the sake of simplicity, let’s pretend we’re reading an 8-bit value into a 16-bit register.
Bit pattern: 11010010
I trust you know how binary works. Thus, you should see how this bit pattern can represent the decimal number 210. Extending with zeroes to form 16 bits gives us 0000000011010010 = 210.
Signed, on the other hand, works on a principle called two’s complement. If you have bits 11111111 and try to add 1, you get 00000000 because the value is only 8 bits, right? Well, in the same way, 00000000 minus 1 is 11111111. As it turns out, using two’s complement in this way means that signed and unsigned values behave identically for addition and subtraction from the CPU’s perspective.
The leftmost bit, though, the 2^7 digit… Since that’s a 1 in this example, if we propagate that to the left when extending to 16 bits, we get 1111111111010010 = -46.
Since 32-bit reads don’t need extension, there is no differentiation between signed and unsigned.
Since writing values may reduce the number of bits, there’s no need to distinguish between signed and unsigned there either.
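If you’d rather see that in code than in prose, here’s a tiny standalone check of both kinds of extension (the casts do all the work):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t raw = 0xD2;                     /* bit pattern 11010010 = 210 */

    uint16_t unsigned_read = raw;           /* zero-extend: 0x00D2 =  210 */
    int16_t  signed_read   = (int8_t) raw;  /* sign-extend: 0xFFD2 =  -46 */

    printf("%d %d\n", unsigned_read, signed_read); /* prints "210 -46" */
    return 0;
}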
The Doing Over
In the previous draft, I said that debug-reads weren’t necessary because the only place that would matter is the waveform memory, which could have *urk* hard-coded handling in the hex editor. Well, I forgot about something: CPU cycle counts. The V810 bus takes a different number of cycles for load/store operations depending on how many of the same operation happened consecutively in previous instructions. Not to mention the fact that different hardware components operate at different speeds (VIP memory is slower than cartridge RAM). So not only am I going back on what I said about debug-reads, but we’ll also need debug-writes so debugger features don’t screw up the cycle counts of the emulated CPU.
Additionally, I went over all that to-do regarding making the memory controller hijackable. Well, I’m going back on that as well! The reason this time is because it’s totally redundant. If you want to intercept, say, an unsigned 8-bit read, then just hijack the IN.B instruction in the CPU module. That’s the only instruction capable of performing that kind of read, and hijacking that won’t cause debuggers to hijack themselves when they use the library-level memory API.
To accommodate CPU cycles, all memory controller read/write functions will return an integer representing the number of cycles taken. Read functions will now have an additional parameter for passing a reference to the variable where the loaded data should be put. So without any further ado, let’s talk about that!
Building a Bridge
When the CPU or debugger or whatever calls a library-level API function to access memory, the memory controller needs to route the request to the appropriate emulation core module according to the bus section the address falls into (numbered 0 through 7 from earlier). Further, since we don’t want to preclude any expansions to the emulation core later on (such as if we implement something for the cartridge expansion area), we’ll want the exact bus section handlers to be dynamic.
This can be accomplished with function pointers. Function pointers are variables that represent functions by way of storing the addresses in memory where those functions are located. A function can be called indirectly by using a pointer to it in the same way it can be called directly by name.
The memory controller needs to have 8 groups of 8 handlers: one for each bus section, and one for each access type. Initially, the handlers will point to dummy functions that don’t do anything. As emulation core modules come along, they can register themselves with the memory controller to support additional bus sections.
The Code
Once again, the code is located here:
https://www.dropbox.com/sh/3w3ql9ybasien76/AADcp_lpfgKOmoliAsnJYV09a
I’ve modified a few things from last time:
* Files were reorganized again. Now, all the emulation core code is in a libvue subdirectory, which contains src and include.
* The memory controller hijacking feature was removed.
* All memory accesses now return int: the number of cycles taken.
* Read functions now pass a variable by reference to store the loaded value (see the sketch after this list).
* All memory access functions now have a “debug” parameter that signals a debug access.
* The memory controller function pointer table’s subscripts were reversed. They are now [access][bus], as that results in faster code once compiled (details are available on request).
* Memory access functions were renamed as “vueMemRead*” and “vueMemWrite*” to enforce a consistent naming convention across the API.
* The code was updated and compiles correctly.
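Putting those bullet points together, the reworked prototypes presumably look something along these lines. This is a sketch assembled from the change list, not a copy-paste from the actual vue.h:

// Reads: return the number of CPU cycles taken and store the loaded value
// through the value pointer; debug accesses must not disturb cycle counting
int vueMemRead8  (vueContext *context, uint32_t address, int8_t   *value, int debug);
int vueMemReadU8 (vueContext *context, uint32_t address, uint8_t  *value, int debug);
int vueMemRead16 (vueContext *context, uint32_t address, int16_t  *value, int debug);
int vueMemReadU16(vueContext *context, uint32_t address, uint16_t *value, int debug);
int vueMemRead32 (vueContext *context, uint32_t address, int32_t  *value, int debug);

// Writes: return the number of CPU cycles taken
int vueMemWrite8 (vueContext *context, uint32_t address, int8_t  value, int debug);
int vueMemWrite16(vueContext *context, uint32_t address, int16_t value, int debug);
int vueMemWrite32(vueContext *context, uint32_t address, int32_t value, int debug);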
Mkay then. It’s been two weeks, and nobody went for the ROM module. Not that I expected anyone would, but I figured I’d toss in something easy just in case someone wanted to step up to the plate. (-:
On that note… The cartridge!
Three Times the Charm
Of the eight sections of the memory bus, three of them are allocated to the cartridge:
* 7 = Cartridge ROM
* 6 = Cartridge RAM
* 4 = Cartridge Expansion
7 – Cartridge ROM
ROM is, by definition, Read-Only Memory. The intent is that data which should never change, such as program data, simply can’t be changed, which prevents corruption. It also typically means cheaper circuits, since writable memory is more complex than memory that can only be read.
Per this thread, I don’t want to try and support All The Things™ in some feeble attempt to foresee the future. I’ve decided the best way to keep this project going is to just implement something that makes sense, and if at some point we develop new hardware for Virtual Boy, we can update the emulator accordingly.
Writing to a ROM address won’t do anything. On older systems such as NES, where 32KB of ROM addresses are allocated, additional ROM banks could be selected by configuring mapper circuits in the cartridge itself. This was done by writing into ROM addresses, but did not in and of itself alter the data at those addresses.
6 – Cartridge RAM
RAM stands for Random Access Memory, which doesn’t make a lot of sense on its own. The name harks back to those really old tape storage devices, where reading an arbitrary byte meant seeking forwards and backwards (a slow operation), so efficient code read sequentially whenever possible. “Random access” refers to the ability to read any arbitrary byte without that overhead.
These days, the term “RAM” is used to refer to memory that can be written to and used as scratch memory for a program. On NES, where only 2KB of system memory is available, cartridge memory was useful because it had 8KB of addresses allocated to it, and many games did use cartridge RAM as scratch memory.
Typically, cartridges with RAM (both NES and Virtual Boy, and a great many other devices) had some means of preservation–usually a battery. This made it so that the memory stored in the cartridge was still there after powering off the system. And that was instrumental to the art of the save game.
4 – Cartridge Expansion
Wait, if ROM is 7 and RAM is 6, wouldn’t expansion be 5 instead of 4? One would think so, but system RAM got the 5 slot on the bus. This makes me think that the cartridge expansion section was added in as an afterthought once the initial memory map was drafted.
What’s it for? Anything you want. Maybe a game needs more than 16MB for program data, and uses the expansion area as another ROM bank. Maybe the cartridge contains some audio hardware (the audio signal passes through the cartridge before going to the speakers) and the expansion area accesses the control registers. Maybe… maybe… well, you get the idea.
As no known hardware exists at this time that uses the cartridge expansion area on Virtual Boy, we don’t need to supply a driver for it. It can remain with those dummy memory access handlers. (I’ll have to double-check to make sure values read from this region are indeed 0x00)
Mirroring and You
For an address range to be mirrored means that it can be accessed by more than one physical address. In the case of the VB’s memory bus, which is only 27 bits wide, the value of the upper 5 bits is meaningless. 0x00000000 accesses the same memory as 0x08000000, 0x50000000 and 0xF8000000.
Addresses within cartridge ROM and cartridge RAM are likewise mirrored: memory circuits must be some power of 2 bytes in size. That is to say, they decode some exact number of address bits. Any bits above that range are, just like the upper bits of the system bus, meaningless and produce mirrors of the lower addresses.
The reason this matters is part of the V810 specification: the address of the first instruction to be executed by the CPU after a system reset is 0xFFFFFFF0. Thanks to address mirroring, provided the ROM is some power of 2 bytes in size, the address 0xFFFFFFF0 will always be the sixteenth-to-last byte in the ROM.
Consider Wario Land, which is 2MB in size, using 21 address bits out of the 24 available:
* System address 0xFFFFFFF0 loses its top 5 bits to produce a mirror of 0x07FFFFF0, which is in the cartridge ROM area.
* The remaining bits, 0xFFFFF0, lose the top 3 bits to produce a mirror of 0x1FFFF0, which is the sixteenth-to-last byte of Wario Land.
* Therefore, system address 0xFFFFFFF0 maps to ROM address 0x1FFFF0 after all mirroring takes place.
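In code, that whole chain is just two AND operations. A quick sanity check of the numbers above:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t address = 0xFFFFFFF0;               /* CPU reset vector            */
    uint32_t on_bus  = address & 0x07FFFFFF;     /* drop bits 27-31             */
    uint32_t in_rom  = address & (0x200000 - 1); /* mirror into 2MB of ROM data */

    assert(on_bus == 0x07FFFFF0);
    assert(in_rom == 0x001FFFF0);
    return 0;
}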
While it’s not strictly a requirement, RAM also typically mirrors. I don’t know about commercial games, but the FlashBoy Plus at least has 8KB of memory (13 bits), and testing has shown that it does mask off bits above that and produce mirrors.
Local File Data
A file needs to be loaded for cartridge ROM, and a file will need to be loaded and saved for cartridge RAM.
Ta dah! The shortest section you’ll see in any of these posts. (-:
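That said, for anyone wiring up the application side, here’s one perfectly ordinary way (plain C stdio, nothing Virtual Boy-specific, error handling kept minimal) to pull a ROM file into a buffer that can then be handed to vueROMCreate():

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Load an entire file into a freshly malloc'd buffer; returns NULL on failure.
// The caller owns the buffer, so VUE_ROM_INHERIT would be the natural mode.
uint8_t * LoadFile(const char *path, uint32_t *size_out)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return NULL;

    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    fseek(file, 0, SEEK_SET);

    uint8_t *data = malloc((size_t) size);
    if (data != NULL && fread(data, 1, (size_t) size, file) != (size_t) size) {
        free(data);
        data = NULL;
    }

    fclose(file);
    if (data != NULL)
        *size_out = (uint32_t) size;
    return data;
}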
The Code
Changes to the code include the following:
* Cartridge RAM and cartridge ROM APIs implemented in ramrom.c
* System WRAM also implemented in core.c