Jul - General Game/ROM Hacking - Food For Thought... (NES Emulation)
Peardian
Posted on 05-15-12 07:22:09 PM
Yeah, ZSNES has a nice rewind system. I don't know how it works exactly, but it doesn't take up a lot of memory or ruin performance. From what I can tell, it periodically makes a rewind point every some-number of seconds, while tossing really old ones. And of course, this includes things like activating cheats and loading save states, so it's handy if you accidentally press the wrong button.
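The "make a rewind point periodically, toss the oldest" behavior maps naturally onto a fixed-size ring buffer of snapshots. Below is a minimal sketch in C. It assumes a hypothetical fixed-size `State` blob, and it leaves the push interval to the caller: every few seconds gives coarse rewinding, every frame gives smooth rewinding. This is not ZSNES's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-size save state; a real emulator would serialize
   CPU registers, RAM, PPU state, and so on into something like this. */
typedef struct {
    unsigned char data[64];
} State;

#define REWIND_SLOTS 8  /* how many rewind points to keep */

typedef struct {
    State slots[REWIND_SLOTS];
    int head;   /* index where the next snapshot will be written */
    int count;  /* number of valid snapshots, at most REWIND_SLOTS */
} RewindBuffer;

void rewind_init(RewindBuffer *rb) {
    rb->head = 0;
    rb->count = 0;
}

/* Store a snapshot; once the buffer is full, the oldest one is overwritten. */
void rewind_push(RewindBuffer *rb, const State *s) {
    rb->slots[rb->head] = *s;
    rb->head = (rb->head + 1) % REWIND_SLOTS;
    if (rb->count < REWIND_SLOTS)
        rb->count++;
}

/* Take back the most recent snapshot. Returns false if none are left. */
bool rewind_pop(RewindBuffer *rb, State *out) {
    if (rb->count == 0)
        return false;
    rb->head = (rb->head + REWIND_SLOTS - 1) % REWIND_SLOTS;
    *out = rb->slots[rb->head];
    rb->count--;
    return true;
}
```

Pushing every frame instead of every few seconds trades memory for smoothness. A real implementation would likely need to compress the snapshots, which this sketch does not attempt.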


Of course, you guys probably knew all that already.

____________________
-Peardian-

"Kindness is the language which the deaf can hear and the blind can see." -Mark Twain


GuyPerfect
Posted on 05-16-12 01:30:50 AM
The short story is that I've ultimately decided to put off NES emulation for now, since I'd like to get a chance to run some code on the real system to see what makes it tick. There are a handful of unknowns right now that I'd like to tackle first hand to make sure the emulator I'd make actually behaves the same as the system. You know, since that's what started all this.

Instead, I've turned my attention to... a different system. A bigger system: one with up to 16 MB of ROM and RAM, which suddenly makes the mapping/breakpoint approach we've been talking about a bit hefty in terms of memory usage. I don't really want to have 64 MB hanging around, you know, just in case (and that's at one byte per address; we're talking pointers here), so something a little less speedy is in order to reduce the footprint. Except that also eats into performance, and yadda-yadda-yadda...

On this system, the execution time of individual instructions varies, in some specific cases depending on external interactions. Timing depends on bus size, conditional behavior, and even how many times you execute the same thing in a row. The CPU is also typically halted while waiting for VBlank. All of this means that precise CPU cycle timing isn't anywhere near as critical as it is on NES. \o/

Much of the previous discussion still applies, and I think it'd work well in the general theme of emulator development. There'd be an all-encompassing "core" module that has other modules plugged into it such as the CPU, ROM mapping, I/O handlers... Even though the system in question doesn't do bank switching, it's still important to handle reads/writes into cartridge addresses, so everything that applies to NES seems to apply here as well.

Still, when it comes to a 4 GB address space, I'm open to ideas in regards to how to implement breakpoints.

Joe
Posted on 05-16-12 03:57:16 AM
Originally posted by GuyPerfect
Instead, I've turned my attention to... a different system.
Very specific.
Originally posted by GuyPerfect
Still, when it comes to a 4 GB address space, I'm open to ideas in regards to how to implement breakpoints.
Use a self-balancing binary search tree to store the breakpoints. Each memory access would involve traversing the tree in search of a breakpoint, but the reduced memory usage should be worth the speed hit.
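Breakpoints change rarely but are looked up constantly. A sorted array searched with binary search therefore gives the same O(log n) lookup a balanced tree would, with far less code. The sketch below swaps that structure in for the tree. `BreakpointSet`, `bp_add` and `bp_hit` are hypothetical names, and the 64-entry limit is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BREAKPOINTS 64

/* Breakpoint addresses, kept in ascending order. */
typedef struct {
    uint32_t addr[MAX_BREAKPOINTS];
    int count;
} BreakpointSet;

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Rare operation: append, then re-sort. */
bool bp_add(BreakpointSet *set, uint32_t addr) {
    if (set->count >= MAX_BREAKPOINTS)
        return false;
    set->addr[set->count++] = addr;
    qsort(set->addr, (size_t)set->count, sizeof set->addr[0], cmp_u32);
    return true;
}

/* Hot path: O(log n) binary search on every memory access. */
bool bp_hit(const BreakpointSet *set, uint32_t addr) {
    int lo = 0, hi = set->count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (set->addr[mid] == addr)
            return true;
        if (set->addr[mid] < addr)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}
```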

I'm going to guess that the CPU has some sort of virtual address mapping; occasionally, it is useful to be able to set breakpoints within a virtual address rather than a physical address. How that might work depends a bit more on exactly which CPU you are referring to.

Rena
Posted on 05-16-12 09:58:38 AM
Originally posted by Peardian
Yeah, ZSNES has a nice rewind system. I don't know how it works exactly, but it doesn't take up a lot of memory or ruin performance. From what I can tell, it periodically makes a rewind point every some-number of seconds, while tossing really old ones.
Trouble is, "every few seconds" doesn't make for nice smooth rewinding, and it can be annoying. You press rewind and jump back several seconds into the middle of a manoeuvre that requires precision, so you screw that up too, hit rewind again and go back even further... Some, like nesDS, are very smooth; it looks like the game is actually running in reverse while you hold the button. That makes it much more useable, since you can go back to a specific point.

GuyPerfect
Posted on 05-16-12 01:31:07 PM (last edited by GuyPerfect at 05-16-12 01:31:53 PM)
Originally posted by Joe
Very specific.
I'm not sure the world is ready to hear it yet.

The system has never been fully emulated, and it has a bit of a divided fan base. Half the people would say "Here, you should do this and that," and the other half would say "Don't waste your time." For now, just know that it's a nondescript 32-bit system that largely operates on 16-bit instructions and has double-buffered video.

Most of the system's functionality has been reverse-engineered by a handful of people, including one particularly enterprising fellow. However, there's still no comprehensive resource that pools all of the technical aspects into one spot (there's a web site that's sorta good, though), so I'm thinkin' I'll take the mantle of the writer of the sacred tech scroll and do something like what the Nocash guy did with GBATEK and Everynes.

I'm gonna start on that today.


Originally posted by Joe
I'm going to guess that the CPU has some sort of virtual address mapping; occasionally, it is useful to be able to set breakpoints within a virtual address rather than a physical address. How that might work depends a bit more on exactly which CPU you are referring to.
ROM is fixed within a predetermined memory range and cannot change: what's on the cart is what the system sees.

I spoke with someone with a bit more experience in this area than I, and he suggested a binary tree as well, but more strongly recommended a simple hash table, since the total number of breakpoints in play at any given time will be fairly small (fewer than 10 in most cases). The impact on performance is minimal that way, and the amount of extra memory needed is minuscule, growing only as the number of breakpoints does.
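Here is a minimal open-addressing hash set for a handful of breakpoint addresses. The table size, the multiplicative hash constant (Knuth's 2654435761), and linear probing are all assumptions chosen for simplicity, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BP_TABLE_BITS 6
#define BP_TABLE_SIZE (1u << BP_TABLE_BITS)   /* 64 slots, far more than needed */

typedef struct {
    uint32_t addr[BP_TABLE_SIZE];
    bool used[BP_TABLE_SIZE];
    unsigned count;
} BpHash;

/* Multiplicative hash: keep the top BP_TABLE_BITS bits of the product. */
static unsigned bp_slot(uint32_t addr) {
    return (uint32_t)(addr * 2654435761u) >> (32 - BP_TABLE_BITS);
}

void bph_clear(BpHash *h) {
    for (unsigned i = 0; i < BP_TABLE_SIZE; i++)
        h->used[i] = false;
    h->count = 0;
}

/* Insert with linear probing; the table is kept at most half full. */
bool bph_add(BpHash *h, uint32_t addr) {
    if (h->count >= BP_TABLE_SIZE / 2)
        return false;
    unsigned i = bp_slot(addr);
    while (h->used[i]) {
        if (h->addr[i] == addr)
            return true;                     /* already present */
        i = (i + 1) & (BP_TABLE_SIZE - 1);
    }
    h->used[i] = true;
    h->addr[i] = addr;
    h->count++;
    return true;
}

/* Hot path: called on every memory access. */
bool bph_contains(const BpHash *h, uint32_t addr) {
    unsigned i = bp_slot(addr);
    while (h->used[i]) {                     /* stops at the first empty slot */
        if (h->addr[i] == addr)
            return true;
        i = (i + 1) & (BP_TABLE_SIZE - 1);
    }
    return false;
}
```

Each lookup costs one multiply, one shift, and usually a single comparison. Removing a breakpoint is rare, so clearing the table and re-adding the remaining ones avoids tombstone handling.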

Joe
Posted on 05-16-12 09:01:39 PM (last edited by Joe at 05-16-12 09:03:59 PM)
Originally posted by GuyPerfect
The system has never been fully emulated, and it has a bit of a divided fan base. Half the people would say "Here, you should do this and that," and the other half would say "Don't waste your time."
Preserving technology is not a waste of time.
Originally posted by GuyPerfect
ROM is fixed within a predetermined memory range and cannot change: what's on the cart is what the system sees.
This is also true of the N64, but the cartridge is relatively slow, so games run from RAM and copy data from the ROM as needed. Some (such as GoldenEye 007) use virtual addresses rather extensively, so being able to set breakpoints on virtual addresses can be very useful.

herp derp forgot to reply to the thing i was going to reply to edit:
Originally posted by GuyPerfect
The impact on performance is minimal that way, and the amount of extra memory needed is minuscule, growing only as the number of breakpoints does.
The binary tree would only contain addresses at which breakpoints are set. I don't know how a mostly-empty tree compares to a hash table in terms of performance, but it should be similar in memory usage.

Rena
Posted on 05-17-12 12:00:14 AM
Using a hash table means having to hash the address on every memory access? Mupen64plus uses a linked list of entries (start address, end address, breakpoint type such as read/write/etc.), sorted by start address to avoid having to search the entire list in most cases. It also optimizes this check with an array of memory access functions, one for each block of some fixed number of bytes: if a breakpoint is set in that block, its entry points to a function that checks for breakpoints before performing the access, and if not, it points to one that skips the check. That way, for ranges that don't have a breakpoint (i.e. most of them), no time is wasted checking for them.
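The per-block function pointer idea can be sketched as follows. It illustrates the technique described above and is not Mupen64plus's actual code. The 1 MB block size, the 256-byte `fake_memory` standing in for the bus, and all names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SHIFT 20                          /* 1 MB blocks (arbitrary) */
#define NUM_BLOCKS  (1u << (32 - BLOCK_SHIFT))  /* 4096 blocks cover 4 GB */
#define MAX_BPS     16

typedef uint8_t (*ReadFn)(uint32_t addr);

static uint8_t fake_memory[256];    /* stand-in for the real bus */
static uint32_t bp_addrs[MAX_BPS];
static int bp_count;
static int bp_hits;                 /* counts hits; a debugger would pause instead */
static ReadFn read_table[NUM_BLOCKS];

static uint8_t bus_read(uint32_t addr) {
    return fake_memory[addr & 0xFF];
}

/* Fast path: this block has no breakpoints, so skip the check entirely. */
static uint8_t read_fast(uint32_t addr) {
    return bus_read(addr);
}

/* Slow path: only installed for blocks that contain a breakpoint. */
static uint8_t read_checked(uint32_t addr) {
    for (int i = 0; i < bp_count; i++)
        if (bp_addrs[i] == addr)
            bp_hits++;
    return bus_read(addr);
}

void mem_init(void) {
    for (uint32_t b = 0; b < NUM_BLOCKS; b++)
        read_table[b] = read_fast;
    bp_count = 0;
    bp_hits = 0;
}

/* Editing breakpoints is rare, so it can afford to patch the table. */
void bp_set(uint32_t addr) {
    assert(bp_count < MAX_BPS);
    bp_addrs[bp_count++] = addr;
    read_table[addr >> BLOCK_SHIFT] = read_checked;
}

/* Every guest read goes through one indirect call. */
uint8_t mem_read(uint32_t addr) {
    return read_table[addr >> BLOCK_SHIFT](addr);
}
```

Reads in blocks without breakpoints never consult the breakpoint list. That follows the point below that the no-breakpoint case is the one worth optimizing.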

Keep in mind that the cost of editing the breakpoint list, no matter how it's done, is going to be minuscule compared to the cost of iterating it, since the former is done very rarely, and the latter potentially millions of times per second. So it's always worth optimizing to reduce the time required to iterate it. It's also going to be very rare that you actually hit a breakpoint, and typically execution will stop when you do, so it's much more important to handle the no-breakpoint case quickly even at the expense of making the hit-breakpoint case slow.

One thing I want to do in my emulator (something development versions of Mupen64plus do as well) is provide an API for breakpoints to Lua scripts. Then a script can set a breakpoint, and when it hits, do something such as log to the console and continue execution. If you do this, then the scenario changes slightly - you might be hitting a few breakpoints per frame, and not stopping execution. But still, it's very unlikely that this has any significant performance impact compared to that of checking for them in the first place.
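In C terms, a script-driven breakpoint is just an address plus a callback that decides whether execution stops. The sketch below uses hypothetical names throughout. A Lua binding would wrap something like `BreakAction`, but no particular scripting API is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t pc;
    uint32_t r[32];
} Cpu;   /* hypothetical register file */

/* Returns true to stop execution, false to keep running. */
typedef bool (*BreakAction)(Cpu *cpu, void *user);

typedef struct {
    uint32_t addr;       /* execute address to watch */
    BreakAction action;  /* what to do when it is reached */
    void *user;          /* per-breakpoint data for the action */
} Breakpoint;

/* Run the actions of all breakpoints at the current PC. */
bool check_exec_breakpoints(const Breakpoint *bps, int n, Cpu *cpu) {
    bool stop = false;
    for (int i = 0; i < n; i++)
        if (bps[i].addr == cpu->pc && bps[i].action(cpu, bps[i].user))
            stop = true;
    return stop;
}

/* Example action in the spirit of a script: log, count, and continue. */
bool log_and_continue(Cpu *cpu, void *user) {
    int *hits = user;
    (*hits)++;
    printf("breakpoint at %08X, r10 = %08X\n", (unsigned)cpu->pc, (unsigned)cpu->r[10]);
    return false;
}
```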

I find myself wondering if the native CPU/OS's breakpoint functions can be used here, and if that would be any faster...

GuyPerfect
Posted on 05-17-12 01:12:19 AM
Originally posted by Joe
Preserving technology is not a waste of time.
Oh, very well.

I spent, like, all day documenting the instruction set, and I'm still not done (haven't even touched on the bit string operations)... But the entire rest of it is reorganized in an emulation-friendly manner, so you can take a look at it from this mysterious link right here.

Originally posted by Joe
This is also true of the N64, but the cartridge is relatively slow, so games run from RAM and copy data from the ROM as needed.
Ah, I see what you mean. It all works the same: that's just an execute breakpoint on a RAM address. Shouldn't throw any kinks in the works.

Rena
Posted on 05-17-12 02:50:59 AM
Ah, I didn't know that thing had so much processing power.

Having a heck of a time figuring out what's wrong with my ADC/SBC instructions. Some flags aren't being set properly...

GuyPerfect
Posted on 05-17-12 04:19:10 AM
Carry is like a ninth bit in an 8-bit register, and pertains to unsigned operations.

For ADC, the value of the bit itself is also added as input (as 1) to the result. Something like this:

result = op1 + op2 + carry;
if (result > 255) carry = 1; else carry = 0;
result &= 0xFF;

For SBC, at least on NES, everything's backwards. Carry acts as an inverted borrow: an extra 1 is subtracted if the carry bit is clear, and carry is set afterward if the value did not wrap around below zero:

result = op1 - op2 - (1 - carry);
if (result < 0) carry = 0; else carry = 1;
result &= 0xFF;

Overflow, on the other hand, signals a corruption of the sign bit if data spills into it during a signed operation. It occurs when the value exceeds the maximum or minimum value and wraps around to the other side.

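For ADC, treat the operands and the result as signed bytes (-128 to 127), with carry being the incoming carry. Both limits have to be checked:

sresult = (int8_t)op1 + (int8_t)op2 + carry;
if (sresult > 127 || sresult < -128) overflow = 1; else overflow = 0;
result = sresult & 0xFF;

For SBC:

sresult = (int8_t)op1 - (int8_t)op2 - (1 - carry);
if (sresult > 127 || sresult < -128) overflow = 1; else overflow = 0;
result = sresult & 0xFF;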
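Putting the carry and overflow rules together, here is a self-contained sketch of binary-mode ADC and SBC (the NES's 2A03 has no decimal mode). Overflow uses the common sign-bit test: it is set when both inputs have the same sign and the result's sign differs. SBC is implemented as ADC of the inverted operand, which gives the same result, carry, and overflow as A - M - (1 - C). The `Cpu6502` struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t a;   /* accumulator */
    uint8_t c;   /* carry flag, 0 or 1 */
    uint8_t v;   /* overflow flag, 0 or 1 */
    uint8_t z;   /* zero flag */
    uint8_t n;   /* negative flag */
} Cpu6502;

/* ADC: A = A + M + C, binary mode only. */
void op_adc(Cpu6502 *cpu, uint8_t m) {
    unsigned sum = cpu->a + m + cpu->c;       /* at most 511 */
    uint8_t result = (uint8_t)sum;
    cpu->c = sum > 0xFF;                      /* unsigned carry out of bit 7 */
    /* Signed overflow: inputs share a sign, and the result's sign differs. */
    cpu->v = ((~(cpu->a ^ m) & (cpu->a ^ result)) & 0x80) != 0;
    cpu->a = result;
    cpu->z = result == 0;
    cpu->n = (result & 0x80) != 0;
}

/* SBC: A = A - M - (1 - C), which equals A + ~M + C in 8 bits. */
void op_sbc(Cpu6502 *cpu, uint8_t m) {
    op_adc(cpu, (uint8_t)~m);
}
```

For example, 0x50 + 0x50 gives 0xA0 with V set and C clear, and 0xD0 - 0x70 (carry set) gives 0x60 with both C and V set.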

Joe
Posted on 05-17-12 04:21:16 AM
Originally posted by GuyPerfect
this mysterious link right here
Wow, that looks a lot like MIPS.

...Except for virtual address translation. I guess all the stuff I said doesn't apply here. Maybe if you decide to write an emulator for the N64?

GuyPerfect
Posted on 05-17-12 11:47:50 PM (last edited by GuyPerfect at 05-17-12 11:48:26 PM)
So I was talking to someone about how to handle flag detection without access to the host CPU's flags. We decided the best approach is to promote operands to larger data types for the computation and then truncate the result before handing it back to the emulator. The alternative is a series of if statements, which in practice takes more host CPU cycles than the conversion between data types does.

For integers (32-bit ones, in my case), converting both operands to 64-bit integers and truncating the result to 32 bits will be faster than checking each of the situations in which the overflow flag can become set. Plus, it's easier that way: the larger data type has room for the bits that would otherwise spill over, so the carry and overflow conditions can be read straight out of the wide result instead of being inferred.
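Here is a sketch of the promotion approach for a 32-bit add. It assumes the guest wants zero, sign, overflow and carry flags; the `Flags` struct and `add32` are hypothetical names. Carry comes from bit 32 of the unsigned 64-bit sum, and overflow is a range check on the signed 64-bit sum.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int z, s, ov, cy;   /* zero, sign, overflow, carry */
} Flags;

/* 32-bit add computed in 64 bits so that no information is lost. */
uint32_t add32(uint32_t a, uint32_t b, Flags *f) {
    uint64_t wide_u = (uint64_t)a + b;                  /* unsigned view */
    int64_t wide_s = (int64_t)(int32_t)a + (int32_t)b;  /* signed view (two's complement) */
    uint32_t result = (uint32_t)wide_u;                 /* truncate for the guest */

    f->cy = (wide_u >> 32) != 0;                        /* carried out of bit 31 */
    f->ov = wide_s < INT32_MIN || wide_s > INT32_MAX;   /* signed sum did not fit */
    f->z = result == 0;
    f->s = (int)(result >> 31);
    return result;
}
```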

In the case of floating-point, promoting the 32-bit singles to 64-bit doubles and doing the calculations there can provide a means to detect floating-point overflow as well as loss of precision. The other guy also informs me that floats on the x86 instruction set are actually extended to 80 bits so that double exponentiation won't lose precision, so if that's how the processor does it, then we can do it that way too.

Some personal notes for floating-point processing:
  • Single binary format is signed, e8s23.
  • If e is all 1s and s is all 0s, it represents infinity (reserved operand exception).
  • If e is all 1s and s is not all 0s, it represents a NaN (reserved operand exception).
  • If e is all 0s and s is all 0s, it represents zero.
  • If e is all 0s and s is not all 0s, it represents a denormal number (reserved operand exception).
  • Double binary format is signed, e11s52.
  • After processing, if the unbiased exponent (e - 1023) is greater than 127, it can't fit in a single (overflow exception).
  • After processing, if the value is nonzero and the unbiased exponent is less than -126, it's smaller than the smallest normal single (underflow).
  • After processing, if any of the low 29 bits of s are set, it can't fit in a single (loss of precision).
// Conversion macros

#define Int32AsDouble(x) ( (double) *(float *)(&x) )
#define DoubleAsInt64(x) ( *(long long *)(&x) )

// Variables in some make-believe floating-point division implementation
long long result_bits, temp64;
double result;
int temp32;

// reg1 and reg2 are the 32-bit integer register values

// Check for division by zero
if ( !(reg1 & 0x7FFFFFFF) ) {

    // Invalid operation exception
    if ( !(reg2 & 0x7FFFFFFF) ) {
        PSW.FIV = 1;
        return EX_INVALIDOP;
    }

    // Zero division exception
    PSW.FZD = 1;
    return EX_ZERODIVIDE;
}

// Check for a reserved operand (should be checked for both operands)
temp32 = reg2 & 0x7F800000;
if ( (temp32 == 0x7F800000) || (!temp32 && (reg2 & 0x007FFFFF)) ) {
    PSW.FRO = 1;
    return EX_RESERVED;
}

// Perform the calculation
result = Int32AsDouble(reg2) / Int32AsDouble(reg1);
result_bits = DoubleAsInt64(result);

// Check for overflow: the unbiased exponent must not exceed 127
temp64 = ((result_bits >> 52) & 0x7FF) - 1023;
if (temp64 > 127) {
    PSW.FOV = 1;
    return EX_OVERFLOW;
}

// Check for underflow and precision degradation
if (result != 0.0 && temp64 < -126)
    PSW.FUD = 1; // Underflow: smaller than the smallest normal single
if (result_bits & 0x000000001FFFFFFF)
    PSW.FPR = 1; // Loss of precision on significand truncation

// Store the result as a single
*(float *)(&reg2) = (float) result;

return EX_OK;
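The bit-pattern rules in the notes can be written as helpers that work on raw bits, which sidesteps host floating-point behavior entirely. This sketch assumes IEEE 754 single and double layouts (exponent bias 127 and 1023), and the enum, struct, and function names are hypothetical. Whether a double fits in a normal single is a range check on the unbiased exponent (-126 to 127).

```c
#include <assert.h>
#include <stdint.h>

typedef enum { F32_ZERO, F32_NORMAL, F32_DENORMAL, F32_INFINITY, F32_NAN } F32Class;

/* Classify a single-precision value from its raw bits (sign, e8, s23). */
F32Class classify_single(uint32_t bits) {
    uint32_t e = (bits >> 23) & 0xFF;
    uint32_t s = bits & 0x007FFFFF;
    if (e == 0xFF)
        return s ? F32_NAN : F32_INFINITY;
    if (e == 0)
        return s ? F32_DENORMAL : F32_ZERO;
    return F32_NORMAL;
}

typedef struct { int overflow, underflow, inexact; } NarrowFlags;

/* Checks on a double's raw bits before narrowing it to a single. */
NarrowFlags check_narrow(uint64_t bits) {
    NarrowFlags fl = {0, 0, 0};
    int e = (int)((bits >> 52) & 0x7FF) - 1023;           /* unbiased exponent */
    int is_zero = (bits & 0x7FFFFFFFFFFFFFFFull) == 0;
    if (!is_zero && e > 127)
        fl.overflow = 1;                                   /* above the largest single */
    if (!is_zero && e < -126)
        fl.underflow = 1;                                  /* below the smallest normal single */
    if (bits & 0x1FFFFFFFull)
        fl.inexact = 1;                                    /* low 29 of 52 fraction bits lost */
    return fl;
}
```

This only flags conditions. It does not model how the result is rounded when narrowing.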

Joe
Posted on 05-18-12 02:04:19 AM
Originally posted by GuyPerfect
The other guy also informs me that floats on the x86 instruction set are actually extended to 80 bits so that double exponentiation won't lose precision, so if that's how the processor does it, then we can do it that way too.
The x87 FPU has 80-bit precision only when all intermediate values stay in FPU registers, and only when the FPU is not in a compatibility mode for a lower precision. There's really no guarantee the compiler will set this up properly. Furthermore, if you decide to compile a 64-bit version, it won't use x87 at all in favor of SSE2.

Originally posted by GuyPerfect
// Conversion macros

#define Int32AsDouble(x) ( (double) *(float *)(&x) )
#define DoubleAsInt64(x) ( *(long long *)(&x) )

Technically you should use unions instead of pointer casts; otherwise you run the risk of crashing with an unaligned access.

Also "int64_t" is prettier than "long long" and "display:inline-block;" works like "float:left;" without messing up quotes. Anything else for me to complain about?

Rena
Posted on 05-18-12 04:22:25 AM
That sure looks slow... I hope you'll be able to provide a faster mode that does use the native CPU flags/ops where available?

GuyPerfect
Posted on 05-18-12 06:44:54 AM (last edited by GuyPerfect at 05-18-12 06:45:33 AM)
Oy, what a long day. Who knew documenting registers and exceptions would take so long? Of course, half of what I'm doing is research against a document that is frustratingly brief at times under the assumption that you already know what it's talking about, so I guess the progress is good all things considered.

Once again, the mysterious link
Furthermore, with special flavors

Someone at least do some cursory proofreading on that or something, purty please. I'm focusing more on getting the information down and haven't spent a lot of time revising it or making sure it's easy to read and adequately conveys the information therein.

Originally posted by Joe
Technically you should use unions instead of pointer casts; otherwise you run the risk of crashing with an unaligned access.
Duly noted, though I thought floats and ints could be read from any location? Either way, I like the union idea. I hadn't even thought of doing it that way.

Originally posted by Joe
Also "int64_t" is prettier than "long long"
It's also safer, because long long has only a loosely-defined size and format in the C specification. I'd already planned to use it, but used long long in the mockup because more people are likely to know what it is.

Originally posted by Rena
That sure looks slow... I hope you'll be able to provide a faster mode that does use the native CPU flags/ops where available?
There are currently no plans to optimize the code for any one CPU architecture--the implementation will look pretty close to that. The goal is to keep the code portable so it can run on anything.

For floating-point operations in particular, it's important to be able to detect overflows, reserved operands and indefinites in order to ensure proper emulation (they raise exceptions; there are also flags for underflow and precision degradation), and there's no guarantee that the host system can even signal that kind of thing to the emulator.

Also, since the time I did that mockup, I've learned that the reserved operand exception should be checked before the invalid operation and zero division exceptions, even if it means the code will run more slowly in certain edge cases. Yeah, I'm that hardcore.

GuyPerfect
Posted on 05-18-12 11:51:52 PM (last edited by GuyPerfect at 05-19-12 03:12:37 AM)
Okay, I've removed the red document and instead added a link to the white one that shows up when you have JavaScript enabled. Clicking it will change the colors for the whole document. It looks great on iPad.

I've also formatted the document to add page breaks in appropriate locations for printing.

New updates include documentation of the bit string instructions and the instruction cache, along with some appendix-style lists of instructions. There are still a few tweaks to make here and there, but at this point I've pretty much gotten the entire CPU documented.

I dare say that I'm at a point where I can make an emulation module for the CPU. Here's what I have in mind (though I won't be touching it for a few days):

All instructions are at least 16 bits wide, and the high 6 bits of the first 16-bit unit (bytes are read low byte first) represent the opcode field. Except for Bcond, in which case only the high 3 bits are the opcode. Depending on the value of the opcode, there may be 16 more bits to read for the instruction, and some instructions contain a sub-opcode field.

For Bcond in particular, the high 3 bits are (high bit first) 100, and no other opcode begins with those three bits.

So what I can do is set up a stupidly simple hash table for opcodes: just use an array of 64 function pointers, and assign the address of the corresponding instruction-processing function to the index of each opcode. For all indexes that correspond with opcodes that begin with 100, which would be 0x20-0x27, they can each just point to the generic Bcond interpreter. All other opcodes will point to their 6-bit handlers, except for the three opcodes that are invalid--those will point to a function that triggers an exception.

Alternately, I could read opcodes of 7 bits, which is sufficient for knowing exactly which conditional branch is used by the instruction with a 100 opcode (there are 4 condition code bits immediately after the opcode). The holes in the hash table/pointer array could be filled in with duplicates of the pointers to the handlers of the high 6 bits of the opcode, meaning the 7 bit value is opcode+junk.

Even more alternately, I could extend that to 8 bits to prevent any unnecessary processing on the high byte. The hash table/pointer array would contain 256 entries when it really only needs 57 (the conditional branches are compacted into one instruction). It will contain 4 copies of most instructions, and 2 copies of each of the conditional branches. However, in the grand scheme of things, it's an array with only 256 entries, and the benefit is that instruction processing functions can be called directly by using a byte from the program data as the index in the list.
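Here is a sketch of the 256-entry variant. It assumes the layout described above: instructions are stored low byte first, the top 6 bits of the high byte are the opcode, and for the 100 prefix the following 4 bits are the condition code. Only a couple of handlers are filled in, and `op_example_00` and the other names are hypothetical. Every other slot points at a placeholder that reports an illegal opcode.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Cpu Cpu;
typedef int (*Handler)(Cpu *cpu, uint16_t halfword);

struct Cpu {
    int last_op;    /* demo only: records which handler ran */
    int last_cond;  /* demo only: decoded branch condition */
};

static Handler dispatch[256];

/* Placeholder for opcodes that are invalid or not yet implemented. */
static int op_illegal(Cpu *cpu, uint16_t hw) {
    (void)hw;
    cpu->last_op = -1;
    return -1;
}

/* One handler covers every conditional branch; the condition is bits 12-9. */
static int op_bcond(Cpu *cpu, uint16_t hw) {
    cpu->last_op = 0x20;
    cpu->last_cond = (hw >> 9) & 0xF;
    return 0;
}

/* Hypothetical handler standing in for 6-bit opcode 000000. */
static int op_example_00(Cpu *cpu, uint16_t hw) {
    (void)hw;
    cpu->last_op = 0x00;
    return 0;
}

void build_dispatch(void) {
    for (int i = 0; i < 256; i++) {
        int opcode = i >> 2;              /* top 6 bits of the high byte */
        if ((opcode >> 3) == 0x4)         /* 100xxx: conditional branch */
            dispatch[i] = op_bcond;
        else if (opcode == 0x00)
            dispatch[i] = op_example_00;
        else
            dispatch[i] = op_illegal;
    }
}

/* Halfwords are stored low byte first, so code[1] is the high byte. */
int execute(Cpu *cpu, const uint8_t *code) {
    uint16_t hw = (uint16_t)(code[0] | (code[1] << 8));
    return dispatch[code[1]](cpu, hw);
}
```

Because the low bit of the high byte belongs to the branch displacement, each condition occupies two table slots, and each 6-bit opcode occupies four.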


EDIT:
The document will now only apply the red-on-black color scheme to the screen, saving the black-on-white for printing regardless of the color mode. Also, adding something for the search portion of the URL will cause it to load as red by default (something like "stsvb.html?red").

Who knew technical writing could be so FUN!

Joe
Posted on 05-19-12 05:32:22 AM (last edited by Joe at 05-19-12 05:32:47 AM)
Originally posted by GuyPerfect
Originally posted by Joe
Technically you should use unions instead of pointer casts; otherwise you run the risk of crashing with an unaligned access.
Duly noted, though I thought floats and ints could be read from any location? Either way, I like the union idea. I hadn't even thought of doing it that way.
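SSE2 has alignment requirements of up to 16 bytes. In fact, you aren't supposed to do that even when there are no alignment restrictions. Reading an object through a pointer to an incompatible type violates C's strict aliasing rules, which makes it undefined behavior.

It turns out that abusing unions for type punning is also murky: C++ leaves it undefined, C99 (as amended) and C11 allow it, and GCC explicitly supports it either way. The manual explains it pretty well.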
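Besides unions, the other portable option is `memcpy`. It copies the object representation without any aliasing concerns, and optimizing compilers generally turn it into a plain register move. This sketch assumes 32-bit `float` and 64-bit `double`, checked with C11 `static_assert`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* memcpy copies the bytes, so no object is read through the wrong type. */
float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

double f64_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

uint64_t f64_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Same idea as the mockup's Int32AsDouble, without the pointer cast. */
double reg_as_double(uint32_t reg) {
    return (double)f32_from_bits(reg);
}
```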

GuyPerfect
Posted on 05-19-12 08:50:44 PM (last edited by GuyPerfect at 05-19-12 10:20:15 PM)
Okay, I've got some ideas for how I want the CPU module to pan out...

• No static memory will be used. I thought long and hard on this, and decided that the benefits of being able to emulate multiple systems (for instance, when a link cable is used) far outweigh the performance benefits that would result from using global variables. It would be a non-trivial task to convert the CPU engine from using static memory to using the heap for use as an object, and the teensy boost in performance with static memory just isn't worth that. Even in the case of local variables in functions, reading memory from a stack-relative address isn't terribly slower than just referencing a global variable directly. I mean, come on: Virtual Boy runs at all of 20 MHz. Super-optimal performance is non-critical, but flexibility of use with the code has some significant worth.

• The emulation core itself will be the central module, called Core. To get the CPU up and running, at least the CPU, ROM and RAM modules need to be implemented and plugged into the Core. The Core will support Read and Write operations for the CPU, which translates addresses to their corresponding modules (ROM, RAM, input, video, etc.). Preferably, memory ranges can be redirected to their corresponding modules directly, without sending the requested address through a series of if statements to figure out where it should go.

• Virtual Boy ROM images, to my knowledge, do not specify the capacity of the in-cart RAM chip. The maximum usable size of the chip on the system bus is 16 MB, and I'd rather not allocate that up front just in case. Instead, I can start with a smaller value like 1 MB or even 512 KB and, in the event a RAM access is requested in a higher range, the block can simply be reallocated. This can only happen a handful of times at most during a single session, so it won't be a performance hit.

• As outlined in my previous post, a list of 256 function pointers will be used to immediately direct CPU execution/disassembly to the corresponding handler using the second byte of the next instruction. Additionally, a second table will be used to fetch and decode instructions before the CPU executes them, rather than have the fetch/decode/execute all done by the instruction handler. It's a six-of-one, half-a-dozen-of-the-other scenario, but performing the fetch/decode in centralized code will at the least decrease the compiled size of the executable. This is possible because the formats of all instructions are known based on the value of the same byte that's used to determine which instruction to execute.

• Since the exact cycles for some instructions aren't known in all cases, uh... I'm just going to use a fixed value, probably the average of the minimum and maximum execution time for most instructions. Timing on Virtual Boy is driven by interrupts anyway, and the CPU is actually halted to wait for VBlank. It won't be totally accurate as an emulator, but it will still be pretty close.

• For bit string instructions, the program counter might not be incremented. Yeah, you read that correctly. Since those instructions can potentially take a long time to complete (by design) and also need to be abortable by interrupts, I figure the best way to handle them is to process one word at a time, taking 3 cycles for each one, and only increment PC when the entire process completes (then add something like 30-40 cycles at the end). You know, the way they behave on the real system. As far as the Core is concerned, it's just executing as normal. Since the bit string instructions are specified to use r26-r30 as working registers for exactly this reason, the approach makes good sense and doesn't require any special support for the bit string instructions.
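Routing addresses to modules without a chain of if statements can be done with a small table indexed by the top address bits. This sketch assumes, purely for illustration, that address bits 24-26 select one of eight regions, with region 6 as cartridge RAM and region 7 as ROM. Those assignments, the names, and the open-bus value of 0xFF are placeholders. Cartridge RAM starts at 512 KB and doubles on demand, as suggested in the bullet about in-cart RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Core Core;
typedef uint8_t (*RegionRead)(Core *core, uint32_t addr);

struct Core {
    RegionRead read_region[8];  /* indexed by address bits 24-26 */
    uint8_t *cart_ram;          /* grown on demand */
    uint32_t cart_ram_size;
    const uint8_t *rom;
    uint32_t rom_mask;          /* ROM size minus one; size must be a power of two */
};

static uint8_t read_open_bus(Core *core, uint32_t addr) {
    (void)core; (void)addr;
    return 0xFF;                /* placeholder value for unmapped regions */
}

static uint8_t read_rom(Core *core, uint32_t addr) {
    return core->rom[addr & core->rom_mask];   /* ROM mirrors across its region */
}

/* Make sure cart RAM covers offset, doubling its size as needed. */
static int cart_ram_reserve(Core *core, uint32_t offset) {
    if (offset < core->cart_ram_size)
        return 1;
    uint32_t size = core->cart_ram_size ? core->cart_ram_size : 512u * 1024;
    while (size <= offset)
        size *= 2;
    uint8_t *p = realloc(core->cart_ram, size);
    if (p == NULL)
        return 0;
    memset(p + core->cart_ram_size, 0, size - core->cart_ram_size);
    core->cart_ram = p;
    core->cart_ram_size = size;
    return 1;
}

static uint8_t read_cart_ram(Core *core, uint32_t addr) {
    uint32_t offset = addr & 0x00FFFFFF;       /* offset within the 16 MB region */
    if (!cart_ram_reserve(core, offset))
        return 0xFF;
    return core->cart_ram[offset];
}

void core_init(Core *core, const uint8_t *rom, uint32_t rom_size) {
    for (int i = 0; i < 8; i++)
        core->read_region[i] = read_open_bus;
    core->read_region[6] = read_cart_ram;      /* hypothetical assignment */
    core->read_region[7] = read_rom;           /* hypothetical assignment */
    core->cart_ram = NULL;
    core->cart_ram_size = 0;
    core->rom = rom;
    core->rom_mask = rom_size - 1;
}

void core_free(Core *core) {
    free(core->cart_ram);
    core->cart_ram = NULL;
    core->cart_ram_size = 0;
}

/* One table lookup replaces a chain of range checks. */
uint8_t core_read(Core *core, uint32_t addr) {
    return core->read_region[(addr >> 24) & 7](core, addr);
}
```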

And then, this is where things get interesting. I also want to introduce some debugging functionality into the CPU and Core.

• As repeatedly mentioned earlier in the thread, memory breakpoints are a big issue. The types of breakpoints I want to support are Read, Write and Execute, meaning the Core will need to be able to intercept addresses and check them against a list of active breakpoints... in the most efficient way possible. There are too many addresses to just keep a big list of flags for each one, and processing of addresses that don't have breakpoints on them should happen in as little time as possible. Buuut, since Core will actually be implementing Read, Write and Execute functions for the purpose of normal processing, it's just a matter of hijacking what's already there.

• I want breakpoints to support complex conditions (at the very least being able to check the contents of registers and memory), and in some form or another an action to take--be it the Single Step routine (see below) baked into the debugger or a custom action given by the user. Maybe not Lua exactly, but something.

• Single Step (aka Step In), Step Over and Step Out functions are also on the list. Single Step is a cakewalk: just execute one instruction at a time during a break (perhaps most easily implemented with execute breakpoints on every following instruction). But Step Over and Step Out, uh... Yeah, this might sound weird, but the V810 family doesn't have a stack. As such, there is no instruction for returning from functions that I can trap to detect stepping out. What I'll probably have to do is keep track of the n most recent return addresses set by the JAL instruction. For Step Out, just put an execute breakpoint on the most recent. For Step Over, put a breakpoint on the instruction after the JAL in question. Exactly how many levels to support is tricky, though, because there's nothing about the system that says all functions have to return.

• The debugger should sport a disassembler so the user knows what the crap the program is trying to do. It should also allow on-the-fly modification of all registers during a break, for debugging specific scenarios without having to make them occur naturally. Status flags should take the form of check boxes, since typing in a whole 32-bit value when all you want to do is set the Z flag isn't my idea of productive.

• Like I brought up in the first post, I'd like some way to keep track of function calls as they occur in the program. The calls are easy: that's what JAL is for. But detecting returns is a bit more involved during execution, and virtually impossible from the instructions in a disassembly alone. I imagine most returns take the form of JMP r31, but I suppose any JMP could qualify, since control flow within a function normally uses the conditional branches instead. I'll have to think on it. Either way, every JAL target address will be added to a list of function entry points for further review.
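The resumable bit string idea from the bit string bullet above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the planned implementation: the Cpu struct, the toy mem array, bitstring_copy_step and the cycle constants are all hypothetical; the r28 (length), r29 (destination) and r30 (source) roles are an assumption about the V810 convention; and the copy ignores the bit offset registers, so only word-aligned strings are handled. The point it shows is that all progress lives in registers, so PC can stay put and an interrupt can be taken between slices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state, for illustration only. */
typedef struct {
    uint32_t r[32];
    uint32_t pc;
    uint64_t cycles;
} Cpu;

/* Assumed working-register roles (verify against the V810 documentation). */
#define R_LEN 28 /* bits remaining */
#define R_DST 29 /* destination word address */
#define R_SRC 30 /* source word address */

#define WORD_CYCLES   3  /* cost per word processed */
#define FINISH_CYCLES 35 /* assumed overhead, picked from the 30-40 estimate */

/* Toy word memory; a real Core would go through its Read/Write functions. */
static uint32_t mem[64];

/* Runs one slice of a simplified, word-aligned bit string copy.
   Returns true once the copy is done and PC has moved past the instruction. */
static bool bitstring_copy_step(Cpu *cpu, uint32_t insn_size) {
    if (cpu->r[R_LEN] != 0) {
        /* This sketch only handles word-aligned strings. */
        assert(cpu->r[R_SRC] % 4 == 0 && cpu->r[R_DST] % 4 == 0);
        uint32_t bits = cpu->r[R_LEN] < 32 ? cpu->r[R_LEN] : 32;
        uint32_t mask = bits == 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
        uint32_t *dst = &mem[cpu->r[R_DST] / 4];
        uint32_t src = mem[cpu->r[R_SRC] / 4];
        *dst = (*dst & ~mask) | (src & mask); /* copy only the low `bits` bits */
        cpu->r[R_LEN] -= bits;
        cpu->r[R_SRC] += 4;
        cpu->r[R_DST] += 4;
        cpu->cycles += WORD_CYCLES;
    }
    if (cpu->r[R_LEN] == 0) {
        cpu->pc += insn_size; /* only now does PC advance */
        cpu->cycles += FINISH_CYCLES;
        return true;
    }
    return false; /* PC unchanged: the same instruction resumes next step */
}
```

The Core's main loop just keeps executing whatever PC points at, so it needs no special case for these instructions; insn_size is passed in rather than hard-coded.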

So hey, uh, any feedback or ideas are welcome!
Joe
Posted on 05-20-12 12:38:01 AM (last edited by Joe at 05-20-12 12:43:11 AM)
Originally posted by GuyPerfect
• Since the exact cycles for some instructions aren't known in all cases, uh... I'm just going to use a fixed value, probably the average of the minimum and maximum execution time for most instructions. Timing on Virtual Boy is driven by interrupts anyway, and the CPU is actually halted to wait for VBlank. It won't be totally accurate as an emulator, but it will still be pretty close.
Sounds like we need some reverse-engineering to be done. A friend of mine actually has a Virtual Boy, but no flashcart or anything.
Originally posted by GuyPerfect
• As repeatedly mentioned earlier in the thread, memory breakpoints is a big issue. The types of breakpoints I want to support are Read, Write and Execute, meaning the Core will need to be able to intercept addresses and check them against a list of active breakpoints... in the most efficient way possible.
I'm telling you, man, binary trees.
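The logarithmic-lookup idea can be sketched like this. It is a minimal sketch, not the thread's implementation, and it swaps the literal tree for a sorted array searched with the C standard library's bsearch, which gives the same O(log n) lookup as a balanced binary tree while staying compact. The Breakpoint, BreakpointSet, bp_sort and bp_hit names and the BP_* flags are hypothetical, and it assumes at most one entry per address; conditions and actions could later hang off each entry.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { BP_READ = 1, BP_WRITE = 2, BP_EXEC = 4 };

typedef struct {
    uint32_t addr;
    uint8_t kinds; /* bitwise OR of BP_READ / BP_WRITE / BP_EXEC */
} Breakpoint;

typedef struct {
    Breakpoint *items; /* kept sorted by addr */
    size_t count;
} BreakpointSet;

static int bp_compare(const void *a, const void *b) {
    uint32_t x = ((const Breakpoint *)a)->addr;
    uint32_t y = ((const Breakpoint *)b)->addr;
    return (x > y) - (x < y);
}

/* Re-sort after the debugger edits the list (rare, so O(n log n) is fine). */
static void bp_sort(BreakpointSet *set) {
    qsort(set->items, set->count, sizeof *set->items, bp_compare);
}

/* Called from the Core's Read/Write/Execute paths: O(log n) per access. */
static bool bp_hit(const BreakpointSet *set, uint32_t addr, uint8_t kind) {
    if (set->count == 0)
        return false; /* cheapest path: no breakpoints active at all */
    Breakpoint key = { addr, 0 };
    const Breakpoint *found = bsearch(&key, set->items, set->count,
                                      sizeof *set->items, bp_compare);
    return found != NULL && (found->kinds & kind) != 0;
}
```

With no breakpoints set, each access costs a single comparison; with n breakpoints, roughly log2(n) comparisons, and memory use grows with the number of breakpoints rather than the size of the address space.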
Originally posted by GuyPerfect
But Step Over and Step Out, uh... Yeah, this might sound weird, but the V810 family doesn't have a stack.
[...]
I imagine most returns take the form of JMP r31, but I suppose any JMP would qualify, since conditional branches are used within functions.
This sounds exactly like MIPS, where one register will point to the stack and r31 will be used for returning from functions. Even handwritten assembly tends to follow the ABI out of convenience. Does the V810 have an ABI? Even if it doesn't, Nintendo may have chosen to enforce one.

GuyPerfect
Posted on 05-20-12 01:24:28 AM (last edited by GuyPerfect at 05-20-12 01:28:14 AM)
Woo hoo! The document is up to 40 pages now, and the CPU info is all done. I read through the entire thing on my tablet earlier (using the red color scheme because shut up), and made a few revisions/corrections. It looks pretty good to me now: everything seems fully explained and correct, and I don't think I left anything out. But I welcome feedback on it.

I've also added a credits section (I had to give the document a name for this), some example CPU program code, and information on the cartridges/ROM format. So hey, check it out!

http://perfectkiosk.net/guyperfect/stsvb.html

Originally posted by Joe
Sounds like we need some reverse-engineering to be done. A friend of mine actually has a Virtual Boy, but no flashcart or anything.
I hope he has a microscope, too. The clock timings in question are on the lowest level inside the processing core. For example, two floating-point multiplications with different operands may take a different number of cycles to complete. Even if we did have all the specifics on this, it'd probably bog down the emulator just trying to get the cycle counts right.

The variance in cycle counts is fairly small, however. For the aforementioned floating-point multiplication... okay, never mind, 8-30 cycles is kind of a broad range. Still, I'm likely to just implement it as 19 (the midpoint) no matter what the operands are.
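The midpoint plan can be kept entirely out of the hot path by precomputing one fixed cost per instruction. This is a hypothetical sketch: the CycleRange table layout and the build_cost_table helper are assumptions, and the only real data is the 8-30 cycle range quoted above for floating-point multiplication (MULF.S), which gives the 19 cycles mentioned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timing entry: best and worst case for one instruction. */
typedef struct {
    const char *mnemonic;
    uint8_t min_cycles;
    uint8_t max_cycles;
} CycleRange;

static const CycleRange timings[] = {
    { "MULF.S", 8, 30 }, /* range from the post; other entries would follow */
};

#define NUM_TIMINGS (sizeof timings / sizeof timings[0])

/* Filled once at startup; the emulator then charges fixed_cost[i] per execution. */
static uint8_t fixed_cost[NUM_TIMINGS];

/* Midpoint of the range, rounded down, ignoring the actual operands. */
static uint8_t fixed_cycles(const CycleRange *range) {
    return (uint8_t)((range->min_cycles + range->max_cycles) / 2);
}

static void build_cost_table(void) {
    for (size_t i = 0; i < NUM_TIMINGS; i++)
        fixed_cost[i] = fixed_cycles(&timings[i]);
}
```

Per-instruction timing then becomes a single table read, at the cost of being off by up to half the range on any given multiplication.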

Originally posted by Joe
I'm telling you, man, binary trees.
I'm not disagreeing with you.

Originally posted by Joe
This sounds exactly like MIPS, where one register will point to the stack and r31 will be used for returning from functions. Even handwritten assembly tends to follow the ABI out of convenience. Does the V810 have an ABI? Even if it doesn't, Nintendo may have chosen to enforce one.
The document mentions a handful of registers used by the assembler and the gcc compiler they made for it. r3 is used as the stack pointer, and since you have to load an address into a register in order to jump to it, I'd be extremely surprised if JMP r31 wasn't both A) the only time that register is used and B) used every time the program needs to return from a function.
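If JMP r31 really is the return idiom, the "n most recent return addresses" idea for Step Out and Step Over can be a small ring buffer that drops the oldest entry when full, so functions that never return can't grow it without bound. A minimal sketch, assuming JAL is a 4-byte instruction (so r31 receives PC + 4); CallTracker, on_jal, on_jmp_r31, step_out_target and the depth of 64 are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_DEPTH 64 /* how many recent return addresses to remember */

typedef struct {
    uint32_t ret[CALL_DEPTH]; /* ring buffer of return addresses */
    size_t top;               /* index of the next free slot */
    size_t depth;             /* number of valid entries, at most CALL_DEPTH */
} CallTracker;

static size_t newest_index(const CallTracker *t) {
    return (t->top + CALL_DEPTH - 1) % CALL_DEPTH;
}

/* On JAL at `pc`: the callee will return to pc + 4 (assumed JAL size). */
static void on_jal(CallTracker *t, uint32_t pc) {
    t->ret[t->top] = pc + 4;
    t->top = (t->top + 1) % CALL_DEPTH;
    if (t->depth < CALL_DEPTH)
        t->depth++; /* when full, the oldest entry is silently overwritten */
}

/* On JMP r31: count it as a return only if it matches the newest entry. */
static bool on_jmp_r31(CallTracker *t, uint32_t target) {
    if (t->depth == 0 || t->ret[newest_index(t)] != target)
        return false; /* unmatched jump: leave the tracked calls alone */
    t->top = newest_index(t);
    t->depth--;
    return true;
}

/* Step Out: where to place the temporary execute breakpoint, if known. */
static bool step_out_target(const CallTracker *t, uint32_t *addr) {
    if (t->depth == 0)
        return false;
    *addr = t->ret[newest_index(t)];
    return true;
}
```

Step Over needs no history at all: when the instruction under the cursor is a JAL, put the execute breakpoint at its address + 4. Matching only the newest entry is the simplest policy; a more forgiving one could search deeper for the target to recover from calls that never returned.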