|I see, thank you!
I only know that the tree comes from the huge array at 0x30004: the code reads three words over and over again, always discarding the third one, and once it's gone past the FF FFs, it does the same again but applies ASL twice to both words.
I don't know if I'm making much sense; I'm writing this in a hurry lmao
|It's some kind of prefix code (most likely a Huffman code, but you can't tell without decompressing everything and checking whether the tree is optimal). Your "compression keys" are an unnecessarily large binary tree.
I was hoping you knew more about where the tree comes from, since it might be nice to replace it to improve the compression ratio.
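As a starting point for replacing the tree, here's a minimal Python sketch that builds a Huffman tree in the in-RAM layout described elsewhere in this thread. It assumes the decoder only cares about the following:
- Each node is 4 bytes.
- The leaf for byte value v sits at offset 4×v and has $FFFF as its first word.
- Each internal node holds two child offsets. Bit 0 takes the first word and bit 1 takes the second.
- The root is at offset 0x7F8.

With all 256 byte values present, the 255 merges fill nodes 256 to 510, so the root lands on node 510 (0x7F8 / 4) and the table is 0x7FC bytes. `build_tree` is a hypothetical name. This only produces the RAM image, not whatever three-word format the ROM copy at 0x30004 uses.

```python
import heapq
import struct
from collections import Counter

def build_tree(corpus: bytes) -> bytes:
    """Hypothetical: build a Huffman node table in the layout read at $7E:95CE.

    Node n lives at byte offset 4*n. Nodes 0-255 are leaves (byte value n),
    nodes 256-510 are internal, and the root ends up as node 510 (offset 0x7F8).
    """
    counts = Counter(corpus)
    # Give every byte value a leaf (count + 1) so any input stays encodable
    # and exactly 255 merges happen.
    heap = [(counts[v] + 1, v) for v in range(256)]
    heapq.heapify(heap)
    children = {}
    next_node = 256
    while len(heap) > 1:
        w0, a = heapq.heappop(heap)
        w1, b = heapq.heappop(heap)
        children[next_node] = (a, b)      # a = bit 0 (first word), b = bit 1
        heapq.heappush(heap, (w0 + w1, next_node))
        next_node += 1
    table = bytearray()
    for node in range(511):
        if node < 256:
            # The decoder only checks the first word for $FFFF; filling the
            # second word with $FFFF too is a guess.
            table += struct.pack("<HH", 0xFFFF, 0xFFFF)
        else:
            left, right = children[node]
            table += struct.pack("<HH", left * 4, right * 4)
    return bytes(table)
```

Every byte value gets a +1 count so it always has a leaf. That costs a little compression ratio but guarantees any file can still be encoded.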
|Sure thing, Joe.
The basic gist of it is this:
Step 1. Load 0x7F8 to register X.
Step 2. Read the next bit of the compressed data. Each compressed byte is treated as a bitmask: it gets ASL'd eight times and the code checks whether the carry bit is set after each shift, so $AF would be 1010 1111 = true false true false true true true true. A new byte is only fetched once all eight bits of the current one have been used.
Step 3. If the bit is set (true), increase register X by 2 and go to step 4. If the bit is not set (false), don't touch register X and go straight to step 4.
Step 4. Load the 16-bit word at $7E:95CE + X into register Y.
Step 5. Copy register Y to register X (they're now the same).
Step 6. Load the 16-bit word at $7E:95CE + X into register Y.
Step 7. Is register Y now $FFFF? If not, go back to step 2. If it is, go to step 8.
Step 8. Shift register X right twice (i.e. divide it by 4) and write the result to the output stream.
Step 9. Exit if all bytes have been written. Go back to step 1 if not.
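The nine steps map almost one-to-one onto a small loop. This is a sketch, not the unpakkun code. It assumes `tree` is a copy of the RAM block starting at $7E:95CE, and offsets into it play the role of register X. The bit state carries over between output bytes, just as the accumulator does in the real routine.

```python
def decompress(data: bytes, tree: bytes, out_size: int, root: int = 0x7F8) -> bytes:
    """Decode out_size bytes, walking the node table `tree` (a $7E:95CE dump)."""
    def word(offset: int) -> int:
        # 16-bit little-endian read, like LDY $95CE,x
        return tree[offset] | (tree[offset + 1] << 8)

    out = bytearray()
    pos = 0          # index of the next compressed byte
    bits_left = 0    # bits remaining in the current byte
    current = 0
    while len(out) < out_size:
        x = root                          # Step 1
        while True:
            if bits_left == 0:            # Step 2: fetch a new byte when needed
                current = data[pos]
                pos += 1
                bits_left = 8
            bits_left -= 1
            bit = (current >> 7) & 1      # the bit ASL shifts into carry
            current = (current << 1) & 0xFF
            if bit:                       # Step 3
                x += 2
            x = word(x)                   # Steps 4-5: follow the child pointer
            if word(x) == 0xFFFF:         # Steps 6-7: $FFFF marks a leaf
                break
        out.append((x >> 2) & 0xFF)       # Step 8
    return bytes(out)                     # Step 9
```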
That's a simplified version of the real decompression code found at $C0:0800.
Here's a quick annotated snippet of the real code. (It might be better to just set a breakpoint at $C0:0800, step through it and pay close attention to what's being done to the registers.)
$C0/0800 E2 20    SEP #$20                  ; 8-bit accumulator
$C0/0802 A6 0B    LDX $0B      [$00:0F6E]   ; Load $7F8 (tree root) to X
$C0/0804 EB       XBA                       ; Swap A's low and high bytes: low byte is now the bit counter
$C0/0805 D0 0F    BNE $0F      [$0816]      ; Counter not zero (bits left in this byte): skip the fetch
$C0/0807 A7 19    LDA [$19]    [$CC:4C04]   ; Load next compressed byte to A
$C0/0809 EB       XBA                       ; Move it to A's high byte
$C0/080A A9 08    LDA #$08                  ; Bit counter = 8
$C0/080C C2 20    REP #$20                  ; 16-bit accumulator
$C0/080E E6 19    INC $19      [$00:0F7C]   ; Advance the 16-bit source pointer
$C0/0810 E2 20    SEP #$20                  ; 8-bit accumulator
$C0/0812 D0 02    BNE $02      [$0816]      ; Pointer didn't wrap to $0000: skip $0814 (not executed, so not shown)
$C0/0816 3A       DEC A                     ; Bit counter -= 1
$C0/0817 EB       XBA                       ; Data byte back to A's low byte
$C0/0818 0A       ASL A                     ; Shift A left; top bit goes into carry
$C0/0819 90 02    BCC $02      [$081D]      ; If carry not set, branch to $081D
$C0/081B E8       INX                       ; If carry set, X += 2 (first INX)
$C0/081C E8       INX                       ; (second INX)
$C0/081D BC CE 95 LDY $95CE,x  [$7E:9DC8]   ; Load word at $95CE+X to Y (child pointer)
$C0/0820 BB       TYX                       ; Transfer Y to X
$C0/0821 BC CE 95 LDY $95CE,x  [$7E:9DC2]   ; Load word at $95CE+X to Y
$C0/0824 C0 FF FF CPY #$FFFF                ; Is Y now $FFFF (leaf)?
$C0/0827 D0 DB    BNE $DB      [$0804]      ; If not, go back to $0804
$C0/0829 C2 20    REP #$20                  ; $FFFF found: 16-bit accumulator
$C0/082B A8       TAY                       ; Save A (bit counter + data byte) in Y
$C0/082C 8A       TXA                       ; Transfer X to A
$C0/082D 4A       LSR A                     ; Shift A right twice (X / 4)
$C0/082E 4A       LSR A                     ; (second shift)
$C0/082F 87 15    STA [$15]    [$7E:3DE0]   ; Store A to output stream (2-byte store, pointer only advances by 1)
$C0/0831 98       TYA                       ; Restore A from Y
$C0/0832 E6 15    INC $15      [$00:0F78]   ; Output pointer += 1
$C0/0834 C6 1D    DEC $1D      [$00:0F80]   ; Bytes remaining -= 1
$C0/0836 D0 C8    BNE $C8      [$0800]      ; If not zero, go back to $0800
EDIT: how do i format the code snippet more nicely guhhh
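The trickiest part of the listing is how A does double duty. XBA swaps the bit counter and the current data byte between the low and high halves of A, and the BNE right after the XBA at $0804 tests the counter. Below is a register-level Python sketch (hypothetical name `decompress_regs`). It assumes `keys` is a dump of $7E:95CE to $7E:9DC9, and it only models the offset part of the $19 pointer, ignoring the bank carry at $0814.

```python
def decompress_regs(src: bytes, keys: bytes, count: int) -> bytes:
    """Mirror of $C0:0800, with A split into its low byte and high byte (B)."""
    def ldy(x: int) -> int:              # LDY $95CE,x (16-bit, little-endian)
        return keys[x] | (keys[x + 1] << 8)

    a_lo = a_hi = 0                      # accumulator halves
    src_pos = 0                          # $19: source pointer (offset only)
    out = bytearray()
    while count:                         # $0800
        x = 0x7F8                        # LDX $0B
        while True:
            a_lo, a_hi = a_hi, a_lo      # $0804 XBA: low byte is now the counter
            if a_lo == 0:                # BNE not taken: fetch a byte
                a_lo = src[src_pos]      # LDA [$19]
                a_lo, a_hi = a_hi, a_lo  # XBA: data byte to the high half
                a_lo = 8                 # LDA #$08
                src_pos += 1             # INC $19
            a_lo -= 1                    # $0816 DEC A
            a_lo, a_hi = a_hi, a_lo      # XBA: data byte back in the low half
            carry = a_lo >> 7            # ASL A
            a_lo = (a_lo << 1) & 0xFF
            if carry:
                x += 2                   # INX; INX
            x = ldy(x)                   # LDY $95CE,x / TYX
            if ldy(x) == 0xFFFF:         # LDY $95CE,x / CPY #$FFFF
                break
        out.append((x >> 2) & 0xFF)      # LSR; LSR; STA [$15]; INC $15
        count -= 1                       # DEC $1D / BNE
    return bytes(out)
```

Since A is saved and restored around the store (TAY/TYA), the half-used data byte and its counter survive into the next output byte, which is why `a_lo`/`a_hi` live outside the outer loop.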
Right now you might be thinking that the code makes no sense on its own, because it reads values from a magic region starting at $7E:95CE so it can compare them to $FFFF and decide when to write to the output. And you're right! I call this region the "compression keys", and to use it I simply made a binary file that contains everything from $7E:95CE to $7E:9DC9 (in my first program iteration I just copied the bytes straight out of Geiger's debugger into a new binary file, but that's not wise).
Even though these crucial bytes are in RAM, the values in that region never change, so you don't have to worry about that. The bytes end up in that region by a simple transfer from the ROM at 0x30004.
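The size of that range is worth a quick calculation. $7E:9DC9 - $7E:95CE + 1 = 0x7FC = 2044 bytes, which is 511 nodes of 4 bytes each: exactly 256 leaves plus 255 internal nodes. The starting X of 0x7F8 is node 510, the last one. That's consistent with a full binary tree over all 256 byte values.

Below is a sketch that walks such a dump and lists the bit string leading to each output byte. `leaf_codes` and `kraft_sum` are hypothetical helpers. The Kraft sum equals 1 only if every branch ends in a leaf.

```python
from fractions import Fraction

def leaf_codes(tree: bytes, root: int = 0x7F8) -> dict:
    """Map each output byte value to its bit path from the root ("0" = first word)."""
    def word(offset: int) -> int:
        return tree[offset] | (tree[offset + 1] << 8)

    codes = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for bit, slot in (("0", node), ("1", node + 2)):
            child = word(slot)
            if word(child) == 0xFFFF:         # same leaf test as CPY #$FFFF
                codes[child >> 2] = prefix + bit
            else:
                stack.append((child, prefix + bit))
    return codes

def kraft_sum(codes: dict) -> Fraction:
    """Equals 1 when every branch of the prefix code ends in a leaf."""
    return sum((Fraction(1, 2 ** len(c)) for c in codes.values()), Fraction(0))
```

On a real 0x7FC-byte dump you'd expect 256 entries from `leaf_codes`, with the shortest codes on the most common byte values if the tree really is Huffman-built.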
Finally, here's the tool I used for my decompression: https://www.dropbox.com/s/kx9vgvkasm7vb9w/unpakkun_v2.py?dl=0
• It's super verbose if you set the debugmode flag to True. Might help.
• It extracts everything.
• The filenames are the entry headers (e.g. 775-25EBE8-427A-427A-0.bin, five parts: index, offset, outsize, insize, compressed).
• It can't recompress files.
• There are no prompts, so by default you must have an unheadered Sutte Hakkun ROM named sh.smc in the same folder you run the script from.
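Since the tool can't recompress, here's a minimal encoder sketch that inverts the tree walk. It collects the root-to-leaf bit path for every byte value, concatenates the paths MSB first and pads the last byte with zeros. It assumes the following:
- `tree` is the same $7E:95CE dump.
- The decoder starts each file on a fresh byte (bit counter at zero).
- `compress` is a hypothetical name.

A re-inserted file would also need its compressed size updated in the file table.

```python
def compress(data: bytes, tree: bytes, root: int = 0x7F8) -> bytes:
    """Hypothetical encoder for the $C0:0800 decoder, using an existing tree dump."""
    def word(offset: int) -> int:
        return tree[offset] | (tree[offset + 1] << 8)

    # Root-to-leaf bit path for every byte value ("0" = first word, "1" = second).
    codes = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for bit, slot in (("0", node), ("1", node + 2)):
            child = word(slot)
            if word(child) == 0xFFFF:
                codes[child >> 2] = prefix + bit
            else:
                stack.append((child, prefix + bit))

    bits = "".join(codes[b] for b in data)   # KeyError if a byte has no leaf
    bits += "0" * (-len(bits) % 8)            # pad the last byte with zeros
    # The decoder shifts bits out MSB first, so each 8-char chunk is one byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```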
Originally posted by Raccoon Sam
This is the part where I'd write more about the compression format but I never wrote a comprehensive specification of the algorithm, just a decompressor
Would you be able to share that decompressor? I think I know what it's doing, but I'm not all that great at reading 65816 code.
|Ooh, I just found out about this game the other day and wanted to check it out. Sweet!
Originally posted by Raccoon Sam
Link to IPS
Just a suggestion, but I would advise using a BPS patch, as they work on both headered and unheadered ROMs.
|Link to IPS
NOTE: The patch is for an unheadered Sutte Hakkun (NP) ROM!
Well, it's done. After about a decade I've finally managed to get the translation to a state I'm comfortable releasing.
I've loved Sutte Hakkun since I first played it sometime in 2004. AGTP announced their work on a translation sometime in 2005 I think, and after seeing the project stagnate, I began my own work in late 2007. In fact, my first progress video is still on YouTube ( LINK )
Back then I didn't know any Japanese and I didn't really know how to hack ROMs either, but I managed to translate parts that had obvious meanings (menus, title, etc.) and didn't require tilemap editing.
I learned my way around a hex editor, got into tilemap editing and made a second progress report much later, sometime in 2009 ( LINK ). Although I had made some more progress, I couldn't get much further because I still had no script to work with, only Google Translate, Tile Molester and a hex editor.
Sometime in 2012, someone called Veal Gins saw the latter video and sent me an e-mail with a full translated script. I was stoked and absolutely sure I could finish the translation very soon. I worked a lot with the hack during this time (notably with the Hakkun's Hut articles) but could never get the parts with compressed graphics done or nail the tutorial dialogs down. I aimed for a 2014 release (as seen in the teaser at LINK ) but ultimately I couldn't.
Enter 2017: I'm much better at English (better translation), I know some Python (to automate the tedious parts) and some 65816 (I cracked the compression format), and I had stumbled onto something very interesting*. These things finally let me finish the hack.
It's been many years since I began working on this project, so although everything(?) is translated, the hack is an unholy amalgamation of kinda-engrishy proto-translations with questionable graphics hack workarounds from 2007 me, but also neat, well-thought-out tilemap repoints and perfect English from the person I am today.
If I did it all again, I'd do a lot of things differently (I hate the title screen, there are grammar errors in a few places, my workflow was inefficient), but nonetheless, here we are.
I guess I'm done. Some screenshots:
So yeah enjoy I guess. Hit me up if you encounter any bugs.
*) This might need its own thread, but the "interesting" thing I mentioned is a massive array ($3EA0 bytes, $10 bytes per entry, so 1002 entries in total) located at 0x30BFE. A similar array is present in the other games in the Sutte Hakkun series (the several Satellaview versions). Each entry is laid out as follows:
pp pp pp pp | dsize | csize | cflag | uu uu uu uu uu uu
00 4C CC 00 | 00 40 | C9 12 | 01 00 | FF FF FF FF FF FF
pp = pointer to data
dsize = decompressed size
csize = compressed size
cflag = is the data compressed?
uu = unused
(dsize and csize are the same if cflag is 0)
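Below is a sketch for reading the table. It assumes the fields are little-endian, like everything else on the SNES. Read that way, the sample entry gives:
- pointer $CC:4C00 (0x00CC4C00)
- dsize $4000
- csize $12C9
- cflag 1

That looks plausible: the compressed size is smaller than the decompressed one, and the decompression trace earlier in the thread reads from $CC:4C04. `read_toc` and `TocEntry` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

TOC_OFFSET = 0x30BFE                 # file offset of the table (unheadered ROM)
ENTRY_SIZE = 0x10
ENTRY_COUNT = 0x3EA0 // ENTRY_SIZE   # 1002 entries

class TocEntry(NamedTuple):
    pointer: int   # SNES address, e.g. 0xCC4C00 = $CC:4C00
    dsize: int     # decompressed size
    csize: int     # compressed size (same as dsize when cflag == 0)
    cflag: int     # 1 = compressed

def read_toc(rom: bytes) -> List[TocEntry]:
    """Hypothetical TOC reader; assumes little-endian fields."""
    entries = []
    for i in range(ENTRY_COUNT):
        # 4-byte pointer + three 2-byte fields; the last 6 bytes are unused.
        fields = struct.unpack_from("<IHHH", rom, TOC_OFFSET + i * ENTRY_SIZE)
        entries.append(TocEntry(*fields))
    return entries
```

Usage would be something like `read_toc(open("sh.smc", "rb").read())` on a clean, unheadered ROM.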
I checked out every pointer and the data they point to. Frankly, the ROM hacking implications here are absolutely amazing. You might already realise that this massive array is essentially a file lookup table and it encompasses the whole (rest of the) ROM; everything from 0x34A9E to 0x300000 (pretty much everything except the game code, some smaller stuff and the player graphics). A Table Of Contents, if you will.
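Two bits of arithmetic back this up. First, the table ends at 0x30BFE + 0x3EA0 = 0x34A9E, exactly where the file data starts. Second, to compare pointers with file offsets, this sketch assumes the HiROM mapping: banks $C0-$FF map linearly onto the file, so $CC:4C00 is offset 0x0C4C00. The code running at $C0:0800 and the data read from $CC:4C04 suggest HiROM, but that's an inference, not something confirmed here. Both function names are hypothetical.

```python
TOC_START = 0x30BFE
TOC_SIZE = 0x3EA0
DATA_START = TOC_START + TOC_SIZE   # 0x34A9E: data begins right after the table
DATA_END = 0x300000

def hirom_to_offset(address: int) -> int:
    """Map a $C0-$FF bank SNES address to an unheadered file offset (HiROM assumption)."""
    bank = address >> 16
    if not 0xC0 <= bank <= 0xFF:
        raise ValueError(f"${address:06X} is not in banks $C0-$FF")
    return address & 0x3FFFFF

def data_region_ok(pointer: int, size: int) -> bool:
    """True if a TOC entry's data lies inside DATA_START..DATA_END."""
    start = hirom_to_offset(pointer)
    return DATA_START <= start and start + size <= DATA_END
```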
Someone should write a general-purpose file extractor/inserter. Instead of worrying about pointers, weird workarounds and editing a single blob (the ROM), you could edit real honest-to-god binary files, insert them wherever and simply update the TOC instead.
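The "update the TOC instead" part could look something like the sketch below: copy the file into free space, then rewrite that entry's pointer and sizes. `insert_file` is hypothetical and rests on three assumptions:
- The fields are little-endian.
- The ROM uses the HiROM mapping (file offset 0 = $C0:0000).
- The six unused bytes can be left alone.

```python
import struct

TOC_OFFSET = 0x30BFE   # file offset of the table of contents
ENTRY_SIZE = 0x10      # bytes per entry

def insert_file(rom: bytearray, index: int, payload: bytes, file_offset: int,
                dsize: int, compressed: bool) -> None:
    """Hypothetical inserter: copy payload into free space and repoint entry `index`."""
    end = file_offset + len(payload)
    if end > len(rom):
        raise ValueError("payload does not fit inside the ROM image")
    if not compressed and dsize != len(payload):
        raise ValueError("uncompressed files must have dsize == csize")
    rom[file_offset:end] = payload
    pointer = 0xC00000 | file_offset   # HiROM address of the new data
    struct.pack_into("<IHHH", rom, TOC_OFFSET + index * ENTRY_SIZE,
                     pointer, dsize, len(payload), int(compressed))
```

The free space at the end of the ROM would be the natural place for `file_offset`.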
After extracting the compressed and uncompressed files and having a look at them, I was even more impressed: everything is super neat. Things are in a very logical order, even the level format can be understood by a baby, tilemaps are clean, graphics pages value readability over minimising tile redundancy... and everything is its own file. Every stage, every physmap, every tilemap, every palette, every graphics page. Additionally, the ROM is huge, so there's lots of free space at the end of it anyway.
Alas, although I know Python, I don't think I know it well enough to author a real editor.
If you're a competent programmer and like this fantastic puzzle game, you should definitely look into this. Sutte Hakkun is a super easy game to hack given the proper tools and it's a shame it hasn't seen much love in the ROM hacking scene.
If you're going to dive deeper into this and do some research, I advise you to work on a clean ROM instead of my translation. I repointed several entries in the TOC so you won't get 'pure' files out. I never wrote down the addresses where the TOCs are in the Satellaview games, but I can confirm that they are there.
This is the part where I'd write more about the compression format but I never wrote a comprehensive specification of the algorithm, just a decompressor (yes, I stepped through the decompression code in Geiger's SNES9X disassembler and translated the 65816 to highly unoptimised Python. It works but I'm still not 100% sure what makes it tick. But even so, it's not a terribly complex compression scheme.)
Lastly, for TCRF folks: I quickly scoured through the output files, and although there are a few unused graphics, there's nothing terribly interesting. There might be unused tilemaps, though!