Original Post

I was wondering if anyone knows what’s the format for these “.CL” files? More specifically, the “LD-V810.CL” and “MD-V810.CL” files.
I know the “LD-xxx” file is used by CPARSE (the parser) and the “MD-xxx” files are used by CGRIND (the code generator??), but they made absolutely no sense whatsoever when I viewed them with a hex editor.

I did some research on my own, but I haven’t been able to find much. One thing I did manage to find, however, was the following small addition between “GMAIN.CL” (95-03-28) and “GMAIN.CL” (95-07-10):
The (95-07-10) version has the following bytes inserted at offset 0x0411: “AF B3 AC FD FF F7 D1 AF B3 AC F6 F6 FF F7 FD F2 D1”
Other files had way too many differences, so I chose not to identify what the differences were as it would take too long.

**Now, here’s some context as to why I am asking such a question**
I’ve been reverse-engineering the SNES game Earthbound for quite a while, and I’ve noticed the code looks pretty much like the output of a C compiler.
However, it doesn’t look like the output of any known C compiler for the 65816 at the time (WDC816CC or ZARDOZ). It looks like something COMPLETELY different and custom, even with its own calling convention!
Recently, I’ve found out about VUCC, and naturally I was very curious, so I opened VUCC.EXE in a text editor to search for strings. To my surprise, I found the following string:
“Copyright (c) 1994 HAL Laboratory, inc.”

It seems like HAL Laboratory, the developers of MOTHER 2 (the Japanese title for Earthbound), had also helped develop VUCC… But interestingly enough, HAL isn’t credited in either CGRIND.EXE or CPARSE.EXE, which strikes me as odd, since VUCC is just a convenience tool that runs JUNCPP (pre-processor), then CPARSE (parser), then finally CGRIND (code generator?).
And then, when I investigated CGRIND.EXE, something REALLY surprised me. I was able to find strings representing instruction opcodes for the 65816 processor, including pseudo-instructions like “mem8”, “mem16”, “idx8” and “idx16” (although the correct ISAS syntax for these pseudo-instructions seems to be “OFF16A”, “ON16A”, “OFF16I” and “ON16I”). Some more interesting strings in CGRIND are “peep-convxz816” and “peep-merge816”, which look like optimization steps for 65816 code generation.
Now I know this might sound crazy, considering MOTHER 2 began development in 1992, but I do believe HAL used some sort of “LD-816.CL” and “MD-816.CL” files for developing MOTHER 2 for the SNES. And I do believe it is possible to “re-create” these files somehow, although with a ton of effort and work…

TL;DR: HAL Laboratory possibly used (or even developed!) CPARSE.EXE and CGRIND.EXE with custom “LD-816.CL” and “MD-816.CL” files to generate 65816 machine code from C source code to develop MOTHER 2 (which is Earthbound in America), and I want to know if anyone knows about this “.CL” format, to know if it’s even possible to create such files that allow for compiling C source code for the 65816.

I would appreciate any info, even if it’s just guesses about the format.

2 Replies

If you’ve looked inside the executables, you have probably seen what looks like Lisp source code. My guess is that at least CPARSE and CGRIND are partially written in Lisp, and “.CL” may stand for “Common Lisp” or “compiled Lisp”.

If you look at the included V810 assembly files, you can see that they all start with the “ISV810” directive. If you look inside ISAS.EXE or ISAS4G.EXE, you can see that it also contains the strings “IS65” (65816?), “ISSFX” (Super FX?), “ISDMG” (Game Boy?) and (what looks like) mnemonics for those architectures.

So yes, I think the VUCC system was not specifically developed for compiling V810 code, but that V810 support was simply added as just another target architecture. The executables also seem to support a few undocumented command line options. You can start with trying to get ISAS to assemble some 65816 code and if that works, try to reverse engineer the .CL files to reconstruct the 65816 code generator (easier said than done, I know).

Thanks for the lead, this will certainly help.

That is correct, ISAS can target multiple architectures just fine: 65816 (IS65), SPC700 (ISSND), Super FX (ISSFX), V810 (ISV810)

The main problem with reverse-engineering this file format is that these are extremely old DOS programs, which I have no experience in reversing. I did understand that they use a “DOS Extender 4GW”, but that was about it.
And looking at the .CL files in a hex editor didn’t help… The only noticeable thing I did find is that every byte in the .CL files has its MSB set (0x80). Getting rid of the MSB and opening it as a text file also doesn’t reveal any text strings, so I really don’t know where to go next…
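
For anyone who wants to repeat that check, here is a minimal Python sketch. The helper names `all_msb_set` and `strip_msb` are made up for illustration, and the sample is only the 17 bytes quoted in the original post, not a whole file.

```python
def all_msb_set(data: bytes) -> bool:
    """Return True if every byte has its top bit (0x80) set."""
    return all(b & 0x80 for b in data)


def strip_msb(data: bytes) -> bytes:
    """Clear the top bit of every byte, leaving 7-bit values."""
    return bytes(b & 0x7F for b in data)


# The 17 bytes quoted in the original post (GMAIN.CL, offset 0x0411).
sample = bytes.fromhex("AF B3 AC FD FF F7 D1 AF B3 AC F6 F6 FF F7 FD F2 D1")

print(all_msb_set(sample))
print(strip_msb(sample))
```

On this sample every byte does have the top bit set, and the stripped bytes start with `/3,}`, a jumble rather than readable text, which is consistent with the observation above.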

I’m not really familiar with Lisp, but does it operate on some sort of VM with its own bytecode? And even if that was the case, that wouldn’t make much sense as there are absolutely no text strings in these files…

EDIT: My double-post reply with everything I learned about “.CL” files and the link between VUCC and the C compiler used in MOTHER 2 seems to have gone into limbo, so I’ll just put it here with an edit…

I know this is a very old topic and a double-post but… I didn’t want to leave a forum post unanswered for eternity when I finally got my answer.

So, whoever might be searching about this, for whatever reason it may be, I finally know how to read the “.CL” files, and there’s new concrete evidence that VUCC was derived from the C compiler used for MOTHER 2

The answer was really simple all along: Just XOR every byte with 0xFF then add 0x20. You get fully readable LISP source code in ASCII (although no line breaks whatsoever, so you have to format the code yourself for better readability)

This simple Python script does the trick:
(please let this code block work… I don’t really know html for this)
```
# Very hackily put together

import os
import sys

# The output files go into ./out, so make sure it exists first
os.makedirs('out', exist_ok=True)

for filename in sys.argv[1:]:
    # Pass the file names without the ".CL" extension
    with open(f'{filename}.CL', 'rb') as cl_file:
        data = cl_file.read()

    out_data = []
    for c in data:
        c = (c ^ 0xFF) + 0x20

        # Needs & 0xFF because some bytes end up as 0x0116. What's up with that?
        out_data.append(c & 0xFF)

    with open(f'out/{filename}.txt', 'wb') as out_file:
        out_file.write(bytes(out_data))

    print(f'Done with {filename}!')

print('\nDone with every file!!!!')
```
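
As a quick sanity check of the XOR-then-add rule, the same transform can be applied to the 17 bytes quoted in the original post. This is only a sketch: `decode_cl` and `sample` are names picked for illustration, not anything from the VUCC tools.

```python
def decode_cl(data: bytes) -> bytes:
    """XOR each byte with 0xFF, add 0x20, and keep only the low 8 bits."""
    return bytes(((b ^ 0xFF) + 0x20) & 0xFF for b in data)


# The 17 bytes quoted in the original post (GMAIN.CL, offset 0x0411).
sample = bytes.fromhex("AF B3 AC FD FF F7 D1 AF B3 AC F6 F6 FF F7 FD F2 D1")

print(decode_cl(sample).decode("ascii"))
```

This prints `pls" (Npls)) ("-N`, which is only a fragment because the bytes come from the middle of a file, but the parentheses and quotes are what you would expect from Lisp source. As for the values above 0xFF: the sum only overflows when the input byte is below 0x20, and 0x0116 specifically corresponds to an input byte of 0x09.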

Regarding the supposed existence of “LD-816.CL” and “MD-816.CL” and the link to MOTHER 2, evidence has finally been found!!

Excerpt from “CPARSE.CL”:
```
(defun sfcp (ccodef &optional debugger)
  (test-parser ccodef "ld-65816.cl" debugger)
)
```

Excerpt from “CGRIND.CL”:
```
(defun sfcc (ccodef)
  (test-cgrind ccodef "cparse-65816" "md-65816.cl" "-usefulenum")
)
```
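
Since the decoded files come out without any line breaks, a small helper can save some of the manual formatting. The following is a rough sketch, not a real Lisp pretty-printer: `indent_sexprs` is a made-up name, it simply starts a new indented line before every opening parenthesis outside a string literal, and it assumes the input has no comments or character literals such as `#\(`.

```python
def indent_sexprs(text: str, width: int = 2) -> str:
    """Naively break a one-line Lisp dump into one nested form per line."""
    out = []
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Copy string contents untouched, honouring backslash escapes.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "(":
            if out:
                # Drop trailing spaces, then start a new indented line.
                while out and out[-1] == " ":
                    out.pop()
                out.append("\n" + " " * (depth * width))
            out.append(ch)
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


one_line = '(defun sfcp (ccodef &optional debugger) (test-parser ccodef "ld-65816.cl" debugger))'
print(indent_sexprs(one_line))
```

Run on the `sfcp` definition flattened to a single line, this puts the parameter list and the `test-parser` call on their own lines, each indented by two spaces. Anything fancier (aligning arguments, handling comments) would need a proper reader.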

Evidence of link to the compiler used in MOTHER 2:
https://web.archive.org/web/20070317223659/http://jp.franz.com/base/seminar/2005-11-18/SeminarNov2005-Abe.pdf
https://web.archive.org/web/20170531070139/http://cl-www.msi.co.jp/reports/wblcl.pdf
https://www.4gamer.net/games/999/G999905/20151225009/

  • This reply was modified 3 years, 6 months ago by Catador. Reason: edit instead of double-post
  • This reply was modified 3 years, 6 months ago by Catador. Reason: test markup. was replaced with "`", so might as well try triple ```

 
