diff --git a/docs/assets/extraction.md b/docs/assets/extraction.md
new file mode 100644
index 0000000000..f8d060fb34
--- /dev/null
+++ b/docs/assets/extraction.md
@@ -0,0 +1,8 @@
+Assets are not committed to the repo; instead, they are extracted from the ROM files as part of `make setup`.
+
+Assets are extracted to `extracted/VERSION/assets` (for example `extracted/ntsc-1.0/assets` for the `ntsc-1.0` version), based on the descriptions stored in xml files in `assets/xml/`.
+
+For details on the contents of the xml files, see [the assets xml specification file](../../tools/assets/descriptor/spec.md).
+
+The extraction tool can use [rich](https://github.com/Textualize/rich) if installed to make output prettier.
+If you are reading output or errors produced during extraction, consider installing rich for a better experience: `.venv/bin/python3 -m pip install rich`
diff --git a/tools/assets/descriptor/README.md b/tools/assets/descriptor/README.md
new file mode 100644
index 0000000000..6336e0d417
--- /dev/null
+++ b/tools/assets/descriptor/README.md
@@ -0,0 +1 @@
+This package serves as an abstraction layer wrapping assets xml files.
diff --git a/tools/assets/descriptor/spec.md b/tools/assets/descriptor/spec.md
index 2e248e4ee2..1988a7225c 100644
--- a/tools/assets/descriptor/spec.md
+++ b/tools/assets/descriptor/spec.md
@@ -1,7 +1,45 @@
+This document describes the expected structure of xml files describing assets.
+
+# Top elements
+
+## `Root`
+
+```xml
+<Root>
+    ...
+</Root>
+```
+
+This is the root element in the file, containing exclusively `<File>` and `<ExternalFile>` elements as direct children.
+
+## `File`
+
+```xml
+<File Name="..." Segment="...">
+    ...
+</File>
+```
+
+A `<File>` contains resource elements as children.
+
+- Required attributes: `Name`
+- Optional attributes: `Segment`
+
+`Name` is the name of the baserom file from which the data is to be extracted.
+
+`Segment` (decimal) is the segment number for the file.
## `ExternalFile`
-For example, if config.yml contains
+```xml
+
+```
+
+Declares that the `<File>`s in the xml may reference symbols from an external file.
+
+The external file is located by matching its name against the list of assets in the version's `config.yml`.
+
+For example, if `baseroms/gc-eu/config.yml` contains
```yml
assets:
@@ -9,6 +47,57 @@ assets:
xml_path: assets/xml/objects/gameplay_keep_pal.xml
```
-then `` refers to that gameplay_keep entry.
+then an `<ExternalFile>` element whose name matches `gameplay_keep` refers to that entry, which uses the `gameplay_keep_pal.xml` xml file when extracting assets for version `gc-eu`.
-----------
+
+# Resource elements
+
+Resource elements describe resources. Each resource is a piece of data corresponding to exactly one symbol.
+
+Two attributes are required on all resource elements: `Name` and `Offset`.
+
+- `Name` is the name of the symbol associated with the resource.
+- `Offset` is the location of the resource, in bytes from the start of the file data.
+
+## `Blob`
+
+```xml
+
+```
+
+Unstructured binary data.
+
+- Required attributes: `Size`
+
+`Size` is the size of the binary blob in bytes.
+
+## `DList`
+
+```xml
+
+```
+
+A display list.
+
+- Optional attributes: `Ucode`, `RawPointers`
+
+`Ucode` (defaults to `f3dex2`) selects the graphics microcode used to disassemble the display list. It may be `f3dex` or `f3dex2`.
+
+`RawPointers` (defaults to an empty value) is a comma-separated list of values the display list uses as raw pointers ("hex" instead of a symbol). The purpose of this attribute is to silence extraction warnings.
+
+## `Mtx`
+## `Texture`
+## `Array`
+## `Scene`
+## `Room`
+## `Collision`
+## `Cutscene`
+## `Path`
+## `Skeleton`
+## `LimbTable`
+## `Limb`
+## `Animation`
+## `CurveAnimation`
+## `LegacyAnimation`
+## `PlayerAnimation`
+## `PlayerAnimationData`