Jul - Game Development/Mod Projects - Datamijn
Sanqui
Posted on 05-04-18 05:07:22 AM
I'm working on an awesome binary data description language and YOU get to be a part of it!

Datamijn is primarily a domain-specific language for describing binaries. It's meant to be as concise as possible. You know what the data looks like and you just want YAML out of it. Datamijn won't make you write any boilerplate.

A datamijn definition file is natural to write and read. Example:


position {
    x u8
    y u8
}

_start {
    version u16
    positions [8]position
}

In this .dm file, we describe a type, "position", consisting of two bytes (x and y). We then describe how to parse the binary from the beginning: _start reads a 16-bit version followed by an array of 8 positions.
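
For comparison, here is roughly what that definition saves you from writing by hand. This is only a sketch in Python, not how datamijn works internally. The function name parse_start is made up for illustration, and little-endian byte order for u16 is an assumption, since the post doesn't say which one datamijn uses.

```python
import struct

def parse_start(data: bytes) -> dict:
    """Hand-written equivalent of the _start block: a u16, then 8 positions."""
    # Assumption: u16 is little-endian; the definition itself doesn't say.
    (version,) = struct.unpack_from("<H", data, 0)
    positions = []
    offset = 2
    for _ in range(8):
        # Each position is two unsigned bytes: x, then y.
        x, y = struct.unpack_from("<BB", data, offset)
        positions.append({"x": x, "y": y})
        offset += 2
    return {"version": version, "positions": positions}

# Made-up sample data: version 1, two filled-in positions, six zeroed ones.
sample = bytes([1, 0]) + bytes([5, 6, 26, 12]) + bytes(12)
result = parse_start(sample)
```

Every field needs its own offset bookkeeping here, which is exactly the boilerplate the .dm file leaves out.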

Is position only used once? No need to pollute your namespace. In datamijn, you can always define a type anonymously.


version u16
positions [8]{
    x u8
    y u8
}

In this example, we haven't defined _start, so datamijn starts parsing the file from the first field.

A binary file parsed with such a definition might end up looking like this pleasant YAML.


version: 1
positions:
- x: 5
  y: 6
- x: 26
  y: 12
...
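
To show how parsed fields map onto that output, here is a small Python sketch that prints a parsed record in the same shape. The to_yaml helper is hypothetical and only handles this one layout (scalars plus a list of flat records); a real tool would presumably lean on a proper YAML library instead.

```python
def to_yaml(record: dict) -> str:
    """Render scalars and lists of flat records as block-style YAML."""
    lines = []
    for key, value in record.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            for item in value:
                first = True
                for k, v in item.items():
                    # The first field of each list item carries the "- " marker;
                    # the remaining fields are indented to line up with it.
                    prefix = "- " if first else "  "
                    lines.append(f"{prefix}{k}: {v}")
                    first = False
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)

parsed = {"version": 1, "positions": [{"x": 5, "y": 6}, {"x": 26, "y": 12}]}
text = to_yaml(parsed)
print(text)
```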


What can datamijn actually do?

Here's a more complex example in which I'm parsing a bunch of data from Final Fantasy 1... including strings!

ff1.dm:

ff1.yaml:


string_ptr is of particular note: it involves reading a NES pointer, doing some calculations to convert it into an absolute pointer within the NES ROM, and then reading a zero-terminated string at that location. The idea is that you won't even have to write this pointer math: nesptr should be a part of the standard library!
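
Here is the kind of pointer math that means, sketched in Python. None of this is taken from ff1.dm. The 16-byte iNES header and little-endian 16-bit pointers are standard for NES ROMs, but the bank layout (16 KiB banks mapped at $8000) and the function names are assumptions for illustration, and the right bank number depends on the game and its mapper.

```python
import struct

INES_HEADER_SIZE = 16  # standard iNES header length
BANK_SIZE = 0x4000     # assumption: 16 KiB PRG banks
BANK_BASE = 0x8000     # assumption: the bank is mapped at CPU address $8000

def nes_ptr_to_offset(ptr: int, bank: int) -> int:
    """Convert a CPU address inside a given bank to an offset in the ROM file."""
    return INES_HEADER_SIZE + bank * BANK_SIZE + (ptr - BANK_BASE)

def read_string_ptr(rom: bytes, ptr_offset: int, bank: int) -> bytes:
    """Read a 16-bit pointer, follow it, and return the bytes up to the 0 terminator."""
    (ptr,) = struct.unpack_from("<H", rom, ptr_offset)  # 6502 pointers are little-endian
    start = nes_ptr_to_offset(ptr, bank)
    end = rom.index(0, start)  # position of the terminating zero byte
    return rom[start:end]

# Made-up ROM: header plus one bank, with a pointer to $8010 stored at $8000.
rom = bytearray(INES_HEADER_SIZE + BANK_SIZE)
struct.pack_into("<H", rom, 16, 0x8010)
rom[32:36] = b"\x8a\x8b\x8c\x00"
raw = read_string_ptr(bytes(rom), 16, 0)
```

The result is left as raw bytes, since turning them into text needs the game's own character table.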

The end goal is that you'll describe a ROM according to what you know... and get all of the data out of it, without having to write boilerplate or possibly even touching a programming language. Of course, this is all a lie, because if you're decompressing some data using datamijn, you'll probably have to write a for loop or two. But the idea is that it should all be super concise and intuitive.

Support for writing is also on the roadmap, but only as a second-class citizen. It's likely there will be some limitations.

Datamijn isn't complete and I won't be releasing it until I deem it usable, but it's already quite powerful and evolves with my own needs.

Is datamijn interesting to you? Do you have any questions or ideas? I'll probably follow this up with some thoughts on how to do decompression, but this should be enough for now.
marrub

Posted on 07-06-18 03:56:39 AM
I'm all for more utilities like QuickBMS. Exploding binary data into readable (and processable) text data is invaluable in reverse-engineering and this seems like a pretty darn good thing for doing that.