Working on my project, building with GCC 4, I have finally run into the issue shown in the attached screenshot and described in this post: http://www.planetvb.com/modules/newbb/viewtopic.php?post_id=10819#forumpost10819 . What in C is an innocent call to a function I’ve defined compiles to a jump to 0x0800027c, somewhere in VRAM, where Mednafen happily executes whatever is there. =P
I’m willing to look into it and dig into the GCC code, but I haven’t the first idea where to start. Any ideas?
Some more info for anyone other than me who may be looking at this:
Running ‘v810-objdump -t’ on the ELF file reveals that the symbol table is correct. In fact, the bogus address highlighted in the screenshot above is nowhere to be found in the symbol table. I’m inclined to believe, therefore, that this might be a linker issue, though the linker script I’m using works fine in gccVB 2, so this is a weak theory.
Also, for what it’s worth, the difference between the bogus address and the one in the symbol table is 16766404 (0xFFD5C4) bytes, almost 16 MiB.
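For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the bogus address is the 0x0800027c quoted at the top of the thread; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Address the bogus jump lands on, as quoted in the first post
   (assumed to be the one highlighted in the screenshot). */
#define BOGUS_TARGET  0x0800027cu
/* Reported distance between the bogus address and the symbol-table address. */
#define REPORTED_DIFF 16766404u

/* Symbol-table address implied by the two numbers above. */
uint32_t implied_symbol_address(void)
{
    return BOGUS_TARGET - REPORTED_DIFF;
}

/* How far the reported difference falls short of a full 16 MiB (0x1000000). */
uint32_t shortfall_from_16mib(void)
{
    return 0x1000000u - REPORTED_DIFF;
}
```

Under that assumption the implied symbol address sits inside the 0x07xxxxxx ROM range, and the difference is 0x2A3C bytes short of 16 MiB.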
I’m definitely interested in this topic (and in squashing the bug); I’m just too dumb to add anything of use.
…although, I don’t think your “broken linker” theory is all that weak. In my experience, “what works in GCC2 should work in GCC4” is far from a valid assumption. But, again, I don’t know enough about the inner workings of either to say for sure, or to help fix it if that’s the problem.
Is there a certain pattern of code that triggers the bug, or is it just when a project gets to a certain size? Have you got a minimal example? Have you checked to see if anyone using another GCC4 target has encountered a similar bug?
I’m reading this as well. I have just no idea how to help you. I find it interesting though.
Like RunnerPack, I don’t know much about the behind-the-scenes stuff, but I’m always up for learning something new. Any chance you can post some source code or the ROM so I have something to reproduce the error with? I don’t have anything that produces this error (yet). I wouldn’t mind poking around in my free time.
More progress!
I think I’ve found a pattern to the addresses that do get generated. Basically, jal is jumping way too far ahead, in multiples of 0x400000, but sometimes still landing in a mirror of the ROM, where everything seemingly works fine. However, as the code in the mirrored ROM continues to execute and the logic returns to that same jal instruction, it points to an address yet another 0x400000 ahead. Eventually this takes the code path somewhere beyond the 0x07xxxxxx range of the ROM, causing a crash.
I’ve attached a ROM of a trimmed-down version of my current WIP. Start it paused in Mednafen’s debugger, then look at the instructions at addresses 0x07001d5e, 0x07401d5e, 0x07801d5e, and so on.
I’m not going to post the source to my project right now, but I’ll try to put together a minimal example with source code in the next few days.
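The mirroring described above can be sketched as a simple address mask. This assumes the cartridge ROM is visible across 0x07000000 to 0x07FFFFFF and that a power-of-two ROM repeats throughout that window; the 128 KiB size and all helper names are assumptions for illustration, not taken from the ROM itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROM_WINDOW_BASE 0x07000000u  /* assumed start of the ROM window */
#define ROM_WINDOW_END  0x07FFFFFFu  /* assumed last address of the window */
#define ROM_SIZE        0x20000u     /* assumed 128 KiB ROM, a power of two */
#define BAD_STEP        0x400000u    /* extra distance each bogus jal adds */

/* True if the address falls inside the (assumed) ROM window. */
bool in_rom_window(uint32_t addr)
{
    return addr >= ROM_WINDOW_BASE && addr <= ROM_WINDOW_END;
}

/* Offset into the ROM image that a mirrored address resolves to. */
uint32_t rom_offset(uint32_t addr)
{
    return addr & (ROM_SIZE - 1u);
}

/* How many BAD_STEP-sized jumps it takes to leave the ROM window. */
unsigned steps_until_outside(uint32_t addr)
{
    unsigned steps = 0;
    while (in_rom_window(addr)) {
        addr += BAD_STEP;
        steps++;
    }
    return steps;
}
```

With these assumptions, 0x07001d5e, 0x07401d5e and 0x07801d5e all resolve to ROM offset 0x1d5e, which would explain why the same instruction shows up at each address, and only a handful of 0x400000 steps fit before the address leaves the window.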
It seems really odd that the linker would come up with a jump of around 4M for a ROM that’s only 128k in size… I found the first odd jump at 700005ee. If I understand how the linker should work, I would think it would never calculate an address larger than the total ROM size. Can you post your linker script, vb.ld?
Hmm, well this is a pleasant surprise…
At your suggestion I went to grab the vb.ld file to attach to this post, but figured I should first try making a minimal version of that too. What’s attached is the result of that: a minimal linker script that actually seems to fix the problem for the moment. At least, I don’t see any bogus jal instructions.
Hopefully I can close the book on this bug, but I’ll keep my eyes open.
Yep, I spoke too soon. The new linker script I attached above certainly helps, but it does not solve the problem. Seeing bogus jal instructions again with destinations in the 0x08xxxxxx range. At this point I can’t think of anything I can do outside of digging into the linker code, so I’ll continue forward there.
Here is the linker script I’ve been using. I pulled it from one of the threads on this site but don’t remember which one. I would try it out and see if you still have the odd addresses showing up. Since making a change to it corrected some of the issues I would think at least part of the problem is the linker script. You might have more than one issue though.
Hi Greg, that’s the linker script I started with before trimming it down to what I posted earlier. Using the version you have actually makes it worse, but it’s apparent to me that it’s not just the linker script that’s the problem.
I’m actually suspicious of the following from the binutils patch file:
+  /* A PC relative 9 bit branch. */
+  HOWTO (R_V810_9_PCREL,             /* Type. */
+         2,                          /* Rightshift. */
+         2,                          /* Size (0 = byte, 1 = short, 2 = long). */
+         26,                         /* Bitsize. */
+         TRUE,                       /* PC_relative. */
+         0,                          /* Bitpos. */
+         complain_overflow_bitfield, /* Complain_on_overflow. */
+         v810_elf_reloc,             /* Special_function. */
+         "R_V810_9_PCREL",           /* Name. */
+         FALSE,                      /* Partial_inplace. */
+         0x00ffffff,                 /* Src_mask. */
+         0x00ffffff,                 /* Dst_mask. */
+         TRUE),                      /* PCrel_offset. */
+
+  /* A PC relative 22 bit branch. */
+  HOWTO (R_V810_26_PCREL,            /* Type. */
+         2,                          /* Rightshift. */
+         2,                          /* Size (0 = byte, 1 = short, 2 = long). */
+         26,                         /* Bitsize. */
+         TRUE,                       /* PC_relative. */
+         7,                          /* Bitpos. */
+         complain_overflow_signed,   /* Complain_on_overflow. */
+         v810_elf_reloc,             /* Special_function. */
+         "R_V810_26_PCREL",          /* Name. */
+         FALSE,                      /* Partial_inplace. */
+         0x03ffffff,                 /* Src_mask. */
+         0x03ffffff,                 /* Dst_mask. */
+         TRUE),                      /* PCrel_offset. */
If one of the branches is 9-bit, why is Bitsize 26?
Likewise, why does the comment say “A PC relative 22 bit branch” when Type and Name are both R_V810_26_PCREL?
I think the 22 is a typo. According to the V810 manual, there are 9-bit and 26-bit branch instructions. As for the 9-bit branch having a “bitsize” of 26, I have no idea. It should probably have a “size” of 1 (short) too, since the 9-bit branch instructions are 16 bits wide. The “Rightshift” value will probably also need correction.
Unless that “HOWTO” structure is documented somewhere, it’s probably going to take some experimentation to figure out what the proper settings are.
Have fun!
I would take a whack at it, but I don’t know if I have the right gcc installed to recompile gccvb from source…
I’m also curious about the HOWTO structure. If the src and dst masks cover just the address part of the instruction, it makes sense. But if the masks are for the full instruction, I would think the mask should be 0xffffffff. The mask on the 9-bit entry doesn’t cover just 9 bits, so I’m scratching my head over this as well.
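As a quick check on the mask question, here is a small sketch that counts how many bits each mask from the patch covers and builds the mask an n-bit field would need. It assumes the masks are meant to cover only the displacement field, which is my reading rather than something confirmed here; the helper names are made up.

```c
#include <stdint.h>

/* Number of set bits in a mask, i.e. how wide a field it covers. */
unsigned mask_width(uint32_t mask)
{
    unsigned width = 0;
    while (mask != 0) {
        width += mask & 1u;
        mask >>= 1;
    }
    return width;
}

/* Mask covering the low `bits` bits of a word (bits in 1..32). */
uint32_t low_mask(unsigned bits)
{
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}
```

By this count, 0x00ffffff on the R_V810_9_PCREL entry covers 24 bits, while a plain 9-bit field would be 0x1ff; 0x03ffffff on the R_V810_26_PCREL entry covers exactly 26 bits. Whether the 9-bit entry should really use 0x1ff also depends on the rightshift and bitpos settings, so treat this as arithmetic only.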
I did find some documentation on the reloc_howto_type structure and the HOWTO macro in this document: http://www.chameleon.synth.net/files/developer/pdf/gnu/bfd.pdf (page 40)
I kind of explained why I helped develop patches for GCC 4 in this post http://www.planetvb.com/modules/newbb/viewtopic.php?post_id=13509#forumpost13509 but for me it came down to two major things: much better code optimization, and support for the .incbin directive for easily bundling data blobs into the ROM. Plus there’s the principle of the thing: it makes very little sense to continue supporting a 12-year-old compiler when its base code is still actively maintained and the v810 patches can be updated to support the latest version.
Whelp, tried my hand at fixing up that HOWTO macro in binutils, and after recompiling gccVB and building a fresh ROM…
…
… no change.
I’m scratching my head at this point, stumped. Back to the drawing board. For what it’s worth, I’ve attached a patch containing the changes I made.
Finding a project that exhibits this bug is kind of hit-or-miss, and I don’t want to post the code of what I showed earlier since it’s my contest entry, but I was lucky enough to find the bug in a test harness I’ve developed over the past couple weeks.
Attached is a zip file containing a work in progress of my game’s music player, source included. When compiled with GCC 4.4.2 and -O3, the call to initISR at 0x07000392 is encoded with a target of 0x07400310 instead of the correct 0x07000310. Since initISR is only called once, this bug doesn’t cause a crash in this case, but if this were a function called multiple times there would almost certainly be a crash.
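To put numbers on that call, here is a minimal sketch of the displacement arithmetic. It assumes jal adds a sign-extended 26-bit displacement to the address of the jal instruction itself (my reading of the V810 instruction format, not something verified against this build); the helper names are made up.

```c
#include <stdint.h>

#define DISP26_MASK 0x03FFFFFFu  /* low 26 bits: the displacement field */

/* Encode the distance from pc to target as a 26-bit two's-complement field. */
uint32_t encode_disp26(uint32_t pc, uint32_t target)
{
    return (target - pc) & DISP26_MASK;
}

/* Sign-extend a 26-bit field and add it to pc, as jal is assumed to do. */
uint32_t jump_target(uint32_t pc, uint32_t field)
{
    uint32_t disp = field & DISP26_MASK;
    if (disp & 0x02000000u)    /* bit 25 is the sign bit */
        disp |= ~DISP26_MASK;  /* extend the sign into bits 26..31 */
    return pc + disp;
}
```

For the call at 0x07000392, the correct target 0x07000310 needs a field of 0x03FFFF7E, while the bogus target 0x07400310 corresponds to 0x003FFF7E. The two differ only in bits 22 to 25, so the numbers are at least consistent with a negative displacement losing its top four bits. Running the bogus field again from the mirror at 0x07400392 gives 0x07800310, which matches the "another 0x400000 ahead" pattern. This is arithmetic on the reported addresses, not a confirmed cause.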
Hopefully this helps any of you trying to track this down (*waves at dasi*)
blitter, your music sample is phenomenal! =D Does this mean a composer can use the enclosed program to make music that sounds like this? I know you say this is only a work in progress and it has flaws, but I think it’s phenomenal! I’ve tried it out on Mednafen and on real hardware and I love the results of both! Kudos! =D