Each actor's data is stored in a separate file. EnJj's data is in `data/overlays/actors/z_en_jj.data.s`, for example. At some point in the decompilation process we need to convert this raw data into recognisable information for the C to use.
2. wait until the data appears in functions, extern it, then import it at the end
Sometimes something between these two is appropriate: wait until the largest or strangest bits of data appear in functions, get some typing information out of that, and then import it, but for now, let's stick to both of these.
Both approaches have their advantages and disadvantages.
## Data first
<!-- Fig shows how to do this in his video. -->
This way is good for smaller actors with little data. It is not really suitable for EnJj, for example, because of the enormous section of data labelled as `D_80A88164`.
### Example: `EnTg`
We give a simple example of this approach with a small NPC actor, EnTg, that is, the spinning couple.
The data file looks like
<details>
<summary>
Large code block, click to show
</summary>
```
.include "macro.inc"
# assembler directives
.set noat # allow manual use of $at
.set noreorder # don't insert nops after branches
.set gp=64 # allow use of 64-bit general purpose registers
We transfer this data into the actor file by pretending it is an array of words. The InitVars have already been processed and inserted into the C file, so just need to be uncommented. Data cannot change order, so the two pieces above the InitVars must stay there. At the end of this process, the top of the file will look like
Copy this in below `D_80B18910`, delete the original words of data, change the name back to `D_80B18910`, and put `sCylinderInit` commented out above it:
A quick check of the diff shows that we just need to put the action function set to last, and it matches.
Following the function tree as usual, we find the only other place any data is used is in `func_80B1871C`. From its use in `EnTg_Draw`, we realise that this is a `PostLimbDraw` function. Giving mips2c the correct prototype, it comes out as
which clearly doesn't like the words we fed it. We see that `sp18` should be a `Vec3f` from the cast in the `Matrix_MultVec3f`, so the last three words are padding (a `Vec3f` has size `0xC`, and it's not using it like an array), and we can convert it to
(we can see from the assembly doing `lw` and `sw` rather than `lwc1` and `swc1` that it is doing a struct copy, rather than setting it componentwise).
## Extern and data last
Externing is explained in detail in the document about the [Init function](beginning_decomp.md). To summarize, every time a `D_address` appears that is in the data file, we put a
```C
extern UNK_TYPE D_address;
```
at the top of the file, in the same order that the data appears in the data file. We can also give it a type if we know what the type actually is (e.g. for colliders, initchains, etc.), and convert the actual data and place it commented-out under the corresponding line. This means we don't have to do everything at once at the end.
Once we have decompiled enough things to know what the data is, we can import it. The advantage of doing it this way is we should know what type everything is already: in our work on EnJj, for example, we ended up with the following data at the top of the file
and the only thing we don't know about is the cutscene data `D_80A88164`.
*Before doing anything else, make sure `make` gives `OK`.*
First, we tell the compiler to ignore the original data file. To do this, open the file called `spec` in the main directory of the repository, and search for the actor name. You will find a section that looks like
Find-and-replace `D_80A88CB4` and `D_80A88CE0` by `sCylinderInit` and `sInitChain` respectively. Notice the naming scheme: static data symbols always start with `s` in our style. (Unless they are inlined, but we'll worry about that later.)
We still need to deal with the cutscene data. This is special: because it's so large, it goes in its own file. Make a new file called `z_en_jj_cutscene_data.c` in the same directory as `z_en_jj.c`. Include the actor's header file and `z64cutscene_commands.h`, and put `// clang-format off` and `// clang-format on` in the file (this is so that our formatting script doesn't wreak havoc with the formatting of the cutscene macros). Thus the contents of the file is currently
```C
#include "z_en_jj.h"
#include "z64cutscene_commands.h"
// clang-format off
// clang-format on
```
Finally, we have a script to convert the cutscene data into macros, namely `csdis.py` in the tools folder. We can just give it the VRAM address that the `D_address` is referring to, and it will output the cs macros:
<details>
<summary>
(Very) long code block, click to view
</summary>
```
$ ./tools/csdis.py 80A88164
ovl_En_Jj: Rom 00E3E3D0:00E3F9E0 VRam 80A87800:80A88E10 Offset 000964
Copy this into the file we just made (given the length, you may prefer to `>` it into a file and copy it from there, rather than the terminal itself). Save and close that file: we won't need it any more.
To replace the `extern`, because the data is in a separate file, we include the file in a particular way:
```C
#include "z_en_jj_cutscene_data.c" EARLY
```
(`EARLY` is required specifically for cutscene data. See [the definition of the CutsceneData struct](../include/z64cutscene.h) for exactly why.)
Lastly, uncomment the InitVars block that's been sitting there the whole time. The data section of the file now looks like
That should be everything, and we should now be able to `make` without the data file with no issues
But running `make`, we get the dreaded Error 1:
```
md5sum: WARNING: 1 computed checksum did NOT match
make: *** [Makefile:172: all] Error 1
```
Oh no! What went wrong?
To find out what went wrong, we need to use `firstdiff.py`. This tells us where our ROM starts to differ:
```
$ ./firstdiff.py
First difference at ROM addr 0x144F4, gDmaDataTable (RAM 0x80016DA0, ROM 0x12F70, build/asm/dmadata.o)
Bytes: 00:E3:F9:D0 vs 00:E3:F9:E0
Instruction difference at ROM addr 0xE3ED48, En_Jj_InitVars (RAM 0x80A88140, ROM 0xE3ED10, build/src/overlays/actors/ovl_En_Jj/z_en_jj.o)
Bytes: 40:00:00:00 vs 00:F0:00:00
Instruction difference at ROM addr 0xE3F900, D_80A88D40 (RAM 0x80A88D30, ROM 0xE3F900, build/data/overlays/actors/z_en_jj.reloc.o)
Bytes: 00:00:09:40 vs C4:89:80:00
Instruction difference at ROM addr 0xE3F9D4, En_Js_SetupAction (RAM 0x80A88E00, ROM 0xE3F9D0, build/src/overlays/actors/ovl_En_Js/z_en_js.o)
Bytes: AC:85:02:8C vs 00:00:00:00
Instruction difference at ROM addr 0xE3F9D8, EnJs_Init (RAM 0x80A88E08, ROM 0xE3F9D8, build/src/overlays/actors/ovl_En_Js/z_en_js.o)
Bytes: 27:BD:FF:B0 vs 00:00:00:00
Instruction difference at ROM addr 0xE3FAFC, EnJs_Destroy (RAM 0x80A88F2C, ROM 0xE3FAFC, build/src/overlays/actors/ovl_En_Js/z_en_js.o)
Bytes: 27:BD:FF:E8 vs 8F:B0:00:34
Over 1000 differing words, must be a shifted ROM.
Map appears to have shifted just before D_80A88D40 (build/data/overlays/actors/z_en_jj.reloc.o) -- in En_Jj_InitVars?
```
Ignore the first line: `gDmaDataTable` is always different if the ROM is shifted. The useful lines are usually the next line, and the guess it makes at the end.
To fix this, we use a binary diff program. A suitable one is `vbindiff`: run it on the baserom and the zelda_whatever one the compiler generates:
In this, press `g` to open up goto position, and paste in the address `0xE3ED10` from the first important line of the `first_diff` output. This gives us the following:
![vbindiff for data](images/vbindiff_data_1.png)
Notice that the numbers in the bottom pane is all shifted one word to the left. We therefore need some extra padding somewhere. The real issue is where. Thankfully the guess at the bottom gives us a hint: let's try just under `InitVars`. Just put a padding variable straight after them:
Some symbols in the data have been decompiled wrongly, being incorrectly separated from the previous symbol due to how it was accessed by the actor's functions. However, most of these have now been fixed. Some more detail is given in [Types, structs and padding](types_structs_padding.md) If you are unsure, ask!
## Inlining
After the file is finished, it is possible to move some static data into functions. This requires that:
1. The data is used in only one function
2. The ordering of the data can be maintained
Additionally, we prefer to keep larger data (more than a line or two) out of functions anyway.
A .bss contains data that is uninitialised (actually initialised to `0`). For most actors all you need to do is declare it at the top of the actor file without giving it a value, once you find out what type it is.