@blitter
Registered December 14, 2007
Active 7 months ago
161 Replies made

Probably fake. At 0:53 you can see the left half of the Goomba’s sprites disappear as it moves offscreen. This happens naturally on the NES, but it is a rendering artifact very unlikely to survive an engine rewrite, which a port to the VB would require.

cr1901 wrote:
I HATE to be the one to bring this up, but perhaps it’s time that some of us take a look at GCC internals to see what’s going wrong? I’m taking a bit of a break from VB coding (call it “guilt that I’m letting my other code rot”) anyway, and I probably could take a look if I had some code that is known to generate bad jumps.

If somebody wanted to take a look at the GCC 4 patches, I know of a few spots that look suspicious:

– The line return "movhi hi(%1),%.,%0\n\tmovea lo(%1),%0,%0"; in output_move_single, line 2255 or thereabouts in gcc-4.4.2-vb.patch. 32-bit loads are always encoded this way, even if the high word doesn’t change between consecutive loads. This line also shows up elsewhere in that function for handling other such loads.

– The line sprintf (buff, "mov r31,r10\n\tmovhi hi(%s), r0, r11\n\tmovea lo(%s), r11, r11\n\tjal .+4\n\tadd 4, r31\n\tjmp r11", name, name); in construct_save_jarl, line 3797 or so, also in gcc-4.4.2-vb.patch. This isn’t bogus code (it works), but I don’t think we need to be doing long jumps this way, since jal takes a 26-bit signed displacement, which if my math is correct means a jump of up to almost 32 MB in either direction (a 64 MB window in total), well more than we need on the VB.

– Prologue and epilogue function generation. Building with -O3 or -Os in gccVB 4.8 (part of dasi’s devkitV810 WIP) is totally broken here, generating unnecessary epilogue functions that clobber lp, leading to subroutines that in my testing return to address 0 (the first framebuffer), causing a crash. This can also happen occasionally in gccVB 4.4.2, though I haven’t been able to create a minimal example yet.

– Lines 837-849 in binutils-2.20-vb.patch, beginning with “HOWTO (R_V810_9_PCREL,”: This might be what’s causing the bad jump logic, creating relative jump addresses that are multiples of 0x400000 from what they should be, eventually wrapping around to the beginning of the address space and crashing. Not sure why the entry for 9-bit branches uses 26 for bitsize and type ‘long.’ I posted some code that exhibits this bug a while ago– http://www.planetvb.com/modules/newbb/viewtopic.php?post_id=26069#forumpost26069 — would be happy if somebody took a good hard look at what’s going on. (EDIT: It might just be a linker order problem, but would be nice to know for sure.)
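As a quick check on the displacement arithmetic in the list above, here is a minimal C sketch of how far a signed PC-relative field of a given width can reach, plus the usual sign-extension step. The function names are made up for illustration and are not taken from the gccVB or binutils patches, and any alignment constraint on the low bit of the displacement is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest byte offset reachable with a signed displacement of `bits` bits. */
static int32_t disp_min(unsigned bits)
{
    assert(bits >= 2 && bits <= 31);
    return -(int32_t)(1u << (bits - 1));
}

/* Highest byte offset reachable with a signed displacement of `bits` bits. */
static int32_t disp_max(unsigned bits)
{
    assert(bits >= 2 && bits <= 31);
    return (int32_t)(1u << (bits - 1)) - 1;
}

/* Sign-extend the low `bits` bits of `raw` to a full 32-bit offset. */
static int32_t sign_extend(uint32_t raw, unsigned bits)
{
    assert(bits >= 2 && bits <= 31);
    uint32_t mask = (1u << bits) - 1u;   /* keep only the field itself   */
    uint32_t sign = 1u << (bits - 1);    /* position of the sign bit     */
    raw &= mask;
    return (int32_t)(raw ^ sign) - (int32_t)sign;
}
```

With these, a 26-bit field covers -33554432 to +33554431 bytes (about 32 MiB each way), while a 9-bit field covers only -256 to +255. That is why a relocation entry describing a 9-bit branch with a 26-bit size looks worth a second look.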

🙂

Some of it yes, some of it no. I’ve thought of this– in fact I wrote a tool that post-processes my ROM to poke addresses directly into the assembly, saving runtime lookups– but any processing that adds or removes instructions would be non-trivial since that would throw any other instructions that deal with addresses completely out of whack.

Greg Stevens wrote:
I’m pretty sure blitter was referring to the “precompiled version” that DaVince mentioned, which is 2.95. But he can correct me if I am wrong. However, I would like to note that the version with VBDE, as blitter and I have pointed out in other posts, doesn’t compile to the most optimal code for whatever reason. Whether it’s the patches or something inherent in gccvb 4 is probably still an outstanding question. I set up a Windows VM and installed Cygwin just to compile with the 2.95 version because the compiled code was roughly 5 times faster than with the newer version. Things like using “inline” on a function, which should cause the compiler to inject the function inline, still produce normal function jump and return code in VBDE. Of course I don’t know enough about gcc to even guess at where to look for that kind of stuff.

By default I don’t think gccvb 4 builds with any optimizations. Building with -O3 or sometimes -Os can generate really performant code. However like we’ve mentioned there are still some outstanding bugs. gccvb 4 also has the ability to strip out code that isn’t used, resulting in a smaller .text section and therefore more room for other goodies in .data and .rodata.

DaVince wrote:
I guess you’re talking about this? I cloned the Git repo and at least now I understand what libgccVB is for. And that I need v810-gcc, which fails to compile (at least the version included in the gccVB 2.95 source).

Was referring to the compiler suite, but since it’s mentioned, the libgccvb headers I use are based on a really old set. I basically only use them for the equates and const mappings– setting up the column table is done in my crt0.s and the remaining functionality I rewrite to fit whatever project I’m working on.

I can’t speak for that ancient version of GCC since I’ve never bothered with it, but the executables for gccVB 4 are prefixed with “v810-” so they don’t interfere with the system version. In any case, the make_v810.sh script installs everything to /opt/gccvb so it’s contained in its own directory, but that’s easily changed by editing the script yourself.

There’s a bit of an effort to stabilize gccVB 4 (It’s functional, but has a few outstanding issues) so that’s probably why it’s not in the Tools area yet. If you’re feeling adventurous you can search the forums for the patches and build it yourself…

Took a look tonight at the gcc 4.4.2 patch that’s floating out there, and I think I might have an idea of what’s causing this: in output_move_single…

	return "movhi hi(%1),%.,%0\n\tmovea lo(%1),%0,%0";

That line occurs several times, once for each case where a 32-bit quantity needs to be loaded, and it always emits those two instructions as a pair. So the compiler never has a chance to optimize away the extra instruction. To me it looks either like a bug, or like a case it simply doesn’t bother optimizing by design. I’m leaning toward the former, as it’s clearly suboptimal code. Anybody with knowledge of GCC have any ideas how to fix it?

Thanks M.K., that should work for WRAM accesses.

Upon closer inspection I see that this pattern is also applied to other areas of memory. I found a simple example using hardware registers:

movhi 0x200, r0, r10
movea 0x20, r10, r10
ld.b [r10], r11
mov 5, r12
andi 0xFF, r11, r11
ori 0x10, r11, r11
st.b r11, [r10]
movhi 0x200, r0, r11
movea 0x18, r11, r11
st.b r12, [r11]
movhi 0x200, r0, r11
movea 0x1C, r11, r11
st.b r0, [r11]

This is the assembly generated with -Os for:

HW_REGS[TCR] |= TIMER_20US;
HW_REGS[TLR] = 0x05;
HW_REGS[THR] = 0x00;

The instruction ‘movhi 0x200, r0, r11’ is executed twice even though the high half of the address never changes in between, making the second one unnecessary. In fact r10 still holds 0x02000020 at that point, so both later stores could have been addressed relative to it. Is this something that can be worked around (without writing it by hand in asm) or a bug in GCC/v810?
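One thing that might be worth trying from C is to route all three accesses through a single base pointer, so the compiler has an obvious candidate to keep in one register. This is only a sketch under assumptions: timer_setup is a hypothetical helper, the offsets and the 0x10 flag are read off the listing above rather than taken from any particular header, and nothing guarantees gccVB will actually emit displacement-addressed stores for it (if the call is inlined with a constant base it may well regenerate the same movhi/movea pairs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and flag value as they appear in the listing above (movea 0x20,
   0x18, 0x1C and ori 0x10); check them against your own headers. */
enum { TLR = 0x18, THR = 0x1C, TCR = 0x20 };
enum { TIMER_20US = 0x10 };

/* Hypothetical helper: the register block base comes in as a parameter,
   so every access below is "base + small constant offset". */
static void timer_setup(volatile uint8_t *regs)
{
    assert(regs != NULL);
    regs[TCR] |= TIMER_20US;  /* same read-modify-write as HW_REGS[TCR] |= ... */
    regs[TLR] = 0x05;
    regs[THR] = 0x00;
}
```

On hardware this would be called with the register base implied by the listing, e.g. timer_setup((volatile uint8_t *)0x02000000). Comparing the -Os output of this version against the original is the only way to know whether it helps.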

I’ve built gccVB 4 under OS X, both PPC and Intel, and that combined with Eclipse is how I do all my VB development. The FlashBoy software, though, is relegated to a PC with what I believe is a flaky motherboard, so I’ll have to find a solution for that one of these days. As far as I know the FlashBoy software is Windows-only, so you’d either have to use WINE/CrossOver or write one yourself. It seems somebody has figured out the protocol: http://www.planetvb.com/modules/newbb/viewtopic.php?topic_id=3673&post_id=8666#forumpost8666

HorvatM wrote:
I’m surprised there isn’t the same amount of myths/criticism/superstition surrounding it as the VB, which, IMO, is for now simply a better product. Maybe because it’s got John Carmack behind it? Gunpei Yokoi apparently wasn’t a good enough celebrity.

For one thing, Oculus is making a big deal out of the fact that they are developer kits and there are warnings and guidelines everywhere about keeping latency down, using the proper projections, crafting the experience to minimize disorientation, etc. They have a whole team at Oculus dedicated to this kind of cognitive research, whereas with the VB Nintendo just put together documentation basically saying “This is how 3D works on the VB, good luck!”

The default IPD on the Rift is pretty reasonable– 64mm, which according to statistical research is the military average– and unlike the VB, the field of view is much *much* larger. That last bit alone is a big part of why the Rift is getting such widespread praise– no other consumer-focused headset before it has been able to achieve such immersion. I love the VB too, but there’s no denying that the Rift provides a much better VR experience. The VB by comparison is just a toy. There really shouldn’t be a comparison.

retronintendonerd, are you a developer? Getting a DK2 now if you’re not a developer would be a waste of money, as the new Crescent Bay makes it obsolete in every way. (I played with it at OC last weekend.) Odds are, given what happened with the DK1->DK2 switch, another SDK refactoring is coming, and by that time the DK2 won’t be compatible with anything anymore. I’d hold off until CV1.

I tell people unfamiliar with the system that it’s the processing power of a SNES with a SuperFX chip combined with the color depth and sound hardware of the Game Boy. Not quite 100% accurate, but close enough and mostly gets the point across. 🙂

My only request is the ability to fill the VB’s entire ROM address space. I don’t care what it looks like; I want a flash cart that I can use to build larger games. 🙂

Out of curiosity, are you working on a project that uses some of the aforementioned unemulated features? If so I’m very curious what for. (And would love such projects to lead to Mednafen bug fixes)

Unless your project relies on features that Mednafen currently struggles with, I still don’t see why testing against hardware so often is necessary. Mednafen is more than good enough for constant iteration on most projects.

If you encounter a bug in Mednafen that needs fixing, do what I’ve done and write a patch for Ryphecha. That’s the beauty of open source. 😉

One feature request: importing of MIDI files. There are already a lot of excellent MIDI sequencers out there (some commercial-grade), and I for one don’t relish the thought of learning a new composing tool. 😉

Thanks Guy Perfect!

In my first post I tried adding a fade in/out to my sound files as you suggested but that had no effect on the popping. However, what you said got me to take a closer look at the raw sound data, and that got me thinking (always a scary thing). My mixer was resetting LRV to 0 when there were no samples to play, and my sound data is converted from unsigned 8-bit PCM to unsigned 4-bit. *But,* unsigned PCM is more like “offset PCM,” centered around 0x80, not 0 as I had incorrectly assumed. So… I added a tiny bit of code that resets LRV to 0x88 when no samples are present, and now the pops are gone, both on hardware and in Mednafen. 🙂
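To make the arithmetic behind that fix concrete, here is a small C sketch of the conversion described above. The function names are invented for illustration, and it assumes the same 4-bit level is written to both nibbles of LRV, which is what the 0x88 value suggests. It is not the actual mixer code.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8-bit PCM is centered on 0x80, so that value is "silence". */
#define PCM8_SILENCE 0x80u

/* Reduce an unsigned 8-bit sample to 4 bits by keeping the top nibble. */
static uint8_t pcm8_to_level4(uint8_t sample8)
{
    return (uint8_t)(sample8 >> 4);
}

/* Pack one 4-bit level into both halves of an LRV-style byte
   (assumption: left and right get the same level). */
static uint8_t lrv_from_level4(uint8_t level4)
{
    assert(level4 <= 0x0F);
    return (uint8_t)((level4 << 4) | level4);
}

/* Value to write when no samples are playing: the converted midpoint,
   not 0, so the output does not jump away from the rest level. */
static uint8_t idle_lrv(void)
{
    return lrv_from_level4(pcm8_to_level4(PCM8_SILENCE));
}
```

Here idle_lrv() works out to 0x88, whereas resetting to 0 corresponds to the most negative sample value, which is where the step (and the pop) comes from.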

Using SSTOP for SFX silence isn’t an option for me since I have music playing in the other channels.

Ah, good question. No, I only set SSTOP when I initialize the DC waveform, then I reset SSTOP and from then on just leave LRV of each channel at 0 unless there are samples played.

I almost never use dev mode– most of my development happens against Mednafen, and my game consumes nearly 100% of the 16Mb of the FlashBoy anyway, so dev mode isn’t too effective for me. Why do you need to test against hardware so often? Is there a bug in Mednafen requiring hardware for accurate results?