(De)Serialize#
Native Format#
Plain String Nodes#
Assuming we have a tree like this:
Tree<'fixture'>
├── 'A'
│   ├── 'a1'
│   │   ├── 'a11'
│   │   ╰── 'a12'
│   ╰── 'a2'
╰── 'B'
    ├── 'a11'  <- a clone node here
    ╰── 'b1'
        ╰── 'b11'
We can serialize this tree in an efficient JSON based text format:
tree.save(path)
# or
with open(path, "w") as fp:
    tree.save(fp)
Reading is as simple as:
tree = Tree.load(path)
# or
with open(path, "r") as fp:
    tree = Tree.load(fp)
Additional data can be stored:
meta = {"foo": "bar"}
tree.save(path, meta=meta)
and retrieved like so:
meta = {}
tree = Tree.load(path, file_meta=meta)
assert meta["foo"] == "bar"
The result will be written as a compact list of (parent-index, data) tuples.
The parent index starts with #1, since #0 is reserved for the system root node.
Note how the 2nd occurrence of ‘a11’ only stores the index of the first
instance:
{
  "meta": {
    "$generator": "nutree/0.5.1",
    "$format_version": "1.0",
    "foo": "bar"
  },
  "nodes": [
    [0, "A"],
    [1, "a1"],
    [2, "a11"],
    [2, "a12"],
    [1, "a2"],
    [0, "B"],
    [6, 3],
    [6, "b1"],
    [8, "b11"]
  ]
}
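To illustrate the format, here is a small stdlib-only sketch that decodes the node list above back into parent/child relations. The dict-based bookkeeping is our own illustration, not nutree's API:

```python
# Decode the compact (parent-index, data) node list shown above.
# Index 0 is the implicit system root; an integer in the data slot is a
# back-reference to an earlier node (a clone).
nodes = [
    [0, "A"], [1, "a1"], [2, "a11"], [2, "a12"], [1, "a2"],
    [0, "B"], [6, 3], [6, "b1"], [8, "b11"],
]

labels = {0: "<root>"}   # node index -> data
children = {}            # parent index -> list of child data
for idx, (parent, data) in enumerate(nodes, start=1):
    if isinstance(data, int):     # clone: re-use an earlier node's data
        data = labels[data]
    labels[idx] = data
    children.setdefault(parent, []).append(data)

print(children[0])  # ['A', 'B']
print(children[6])  # ['a11', 'b1']  -- 'a11' is the clone under 'B'
```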
Arbitrary Objects#
Assuming we have a tree with data objects like this:
Tree<'company'>
├── Node<'Department<Development>', data_id=125578508105>
│   ├── Node<'Person<Alice, 23>', data_id={123-456}>
│   ├── Node<'Person<Bob, 32>', data_id={234-456}>
│   ╰── Node<'Person<Charleen, 43>', data_id={345-456}>
╰── Node<'Department<Marketing>', data_id=125578508063>
    ├── Node<'Person<Charleen, 43>', data_id={345-456}>
    ╰── Node<'Person<Dave, 54>', data_id={456-456}>
In order to (de)serialize arbitrary data objects, we need to implement mappers:
def serialize_mapper(node, data):
    if isinstance(node.data, Department):
        data["type"] = "dept"
        data["name"] = node.data.name
    elif isinstance(node.data, Person):
        data["type"] = "person"
        data["name"] = node.data.name
        data["age"] = node.data.age
        data["guid"] = node.data.guid
    return data

def deserialize_mapper(parent, data):
    node_type = data["type"]
    if node_type == "person":
        data = Person(name=data["name"], age=data["age"], guid=data["guid"])
    elif node_type == "dept":
        data = Department(name=data["name"])
    return data
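Since the mappers are plain functions over dicts, their logic can be exercised in isolation. The following sketch uses minimal dataclass stand-ins for Person and Department (the real classes may differ) to show that serializing and deserializing round-trips cleanly:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the data classes used in the tree above.
@dataclass
class Department:
    name: str

@dataclass
class Person:
    name: str
    age: int
    guid: str

def serialize(obj) -> dict:
    # Same shape as serialize_mapper, but operating on a bare object
    if isinstance(obj, Department):
        return {"type": "dept", "name": obj.name}
    return {"type": "person", "name": obj.name, "age": obj.age, "guid": obj.guid}

def deserialize(data: dict):
    if data["type"] == "person":
        return Person(name=data["name"], age=data["age"], guid=data["guid"])
    return Department(name=data["name"])

alice = Person("Alice", 23, "{123-456}")
assert deserialize(serialize(alice)) == alice
```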
When we call
tree.save(path, mapper=serialize_mapper)
the above tree would be written as
{
  "meta": {
    "$generator": "nutree/0.5.1",
    "$format_version": "1.0"
  },
  "nodes": [
    [0, { "type": "dept", "name": "Development" }],
    [1, { "type": "person", "name": "Alice", "age": 23, "guid": "{123-456}" }],
    [1, { "type": "person", "name": "Bob", "age": 32, "guid": "{234-456}" }],
    [1, { "type": "person", "name": "Charleen", "age": 43, "guid": "{345-456}" }],
    [0, { "type": "dept", "name": "Marketing" }],
    [5, 4],
    [5, { "type": "person", "name": "Dave", "age": 54, "guid": "{456-456}" }]
  ]
}
Similarly, load a tree from disk:
tree = Tree.load(path, mapper=deserialize_mapper)
Compact Format#
File size can be reduced by using a compact format that removes redundancy: keys like "type" or "name" are repeated for every node. We can pass a key_map argument to save() in order to shorten the key names:
key_map = {
    "type": "t",
    "name": "n",
    "age": "a",
    "guid": "g",
}
tree.save(path, mapper=serialize_mapper, key_map=key_map)
The result will look like this:
{
  "meta": {
    "$generator": "nutree/0.7.0",
    "$format_version": "1.0",
    "$key_map": { "type": "t", "name": "n", "age": "a", "guid": "g" }
  },
  "nodes": [
    [0, { "t": "dept", "n": "Development" }],
    [1, { "t": "person", "n": "Alice", "a": 23, "g": "{123-456}" }],
    [1, { "t": "person", "n": "Bob", "a": 32, "g": "{234-456}" }],
    [1, { "t": "person", "n": "Charleen", "a": 43, "g": "{345-456}" }],
    [0, { "t": "dept", "n": "Marketing" }],
    [5, 4],
    [5, { "t": "person", "n": "Dave", "a": 54, "g": "{456-456}" }]
  ]
}
Still, some values like "dept" or "person" are repeated. We can pass a value_map argument to save() in order to replace repeating values for a distinct key with an index into a list of values. Note that value_map expects unmapped key names, i.e. 'type' instead of 't':
value_map = {
    "type": ["dept", "person"],
}
tree.save(path, mapper=serialize_mapper, key_map=key_map, value_map=value_map)
The result will look like this:
{
  "meta": {
    "$generator": "nutree/0.7.0",
    "$format_version": "1.0",
    "$key_map": { "type": "t", "name": "n", "age": "a", "guid": "g" },
    "$value_map": { "type": ["dept", "person"] }
  },
  "nodes": [
    [0, { "t": 0, "n": "Development" }],
    [1, { "t": 1, "n": "Alice", "a": 23, "g": "{123-456}" }],
    [1, { "t": 1, "n": "Bob", "a": 32, "g": "{234-456}" }],
    [1, { "t": 1, "n": "Charleen", "a": 43, "g": "{345-456}" }],
    [0, { "t": 0, "n": "Marketing" }],
    [5, 4],
    [5, { "t": 1, "n": "Dave", "a": 54, "g": "{456-456}" }]
  ]
}
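Conceptually, load() reverses both mappings before calling the deserialize mapper. A stdlib-only sketch of that decoding step (our own illustration, not nutree's actual implementation):

```python
# Decode one compact node row using the $key_map and $value_map entries
# from the file's meta block.
key_map = {"type": "t", "name": "n", "age": "a", "guid": "g"}
value_map = {"type": ["dept", "person"]}

# Invert the key map: short key -> full key
inv_key_map = {short: full for full, short in key_map.items()}

def decode(row: dict) -> dict:
    out = {}
    for short, value in row.items():
        full = inv_key_map.get(short, short)
        if full in value_map and isinstance(value, int):
            value = value_map[full][value]  # index -> original value
        out[full] = value
    return out

print(decode({"t": 1, "n": "Alice", "a": 23, "g": "{123-456}"}))
# {'type': 'person', 'name': 'Alice', 'age': 23, 'guid': '{123-456}'}
```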
Note
The value_map is only useful for keys that have a limited number of distinct values. If the number of distinct values is close to the number of nodes, the value_map will actually increase the file size.
By default, key_map is set to True, which expands to key_map = {"data_id": "i", "str": "s"}. There is no default for value_map.
For a TypedTree the defaults are different:
key_map = {"data_id": "i", "str": "s", "kind": "k"}
value_map = {
    "kind": [<distinct `kind` values>]
}
Using Derived Classes#
Instead of passing mapper functions and args, we can also use a derived class:
class MyTree(TypedTree):
    DEFAULT_KEY_MAP = TypedTree.DEFAULT_KEY_MAP | {"type": "t", "name": "n", "age": "a"}
    DEFAULT_VALUE_MAP = {"type": ["person", "dept"]}

    def calc_data_id(tree, data):
        if hasattr(data, "guid"):
            return data.guid
        return hash(data)

    def serialize_mapper(self, node: Node, data: dict):
        if isinstance(node.data, fixture.Department):
            data["type"] = "dept"
            data["name"] = node.data.name
        elif isinstance(node.data, fixture.Person):
            data["type"] = "person"
            data["name"] = node.data.name
            data["age"] = node.data.age
        return data

    @staticmethod
    def deserialize_mapper(parent: Node, data: dict):
        node_type = data["type"]
        if node_type == "person":
            data = fixture.Person(
                name=data["name"], age=data["age"], guid=data["data_id"]
            )
        elif node_type == "dept":
            data = fixture.Department(name=data["name"], guid=data["data_id"])
        return data

tree = MyTree(name="MyTree")
...
tree.save(path)
...
tree.load(path)
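The calc_data_id override above simply prefers a stable guid attribute over Python's built-in hash(). The idea in isolation (Thing is a hypothetical stand-in class):

```python
# Sketch of the calc_data_id logic: objects with a `guid` attribute get a
# stable id that survives save/load; everything else falls back to hash().
class Thing:
    def __init__(self, guid):
        self.guid = guid

def calc_data_id(data):
    if hasattr(data, "guid"):
        return data.guid
    return hash(data)

assert calc_data_id(Thing("{123-456}")) == "{123-456}"
assert calc_data_id("plain string") == hash("plain string")
```

Using the GUID instead of hash() matters for clones: two loads of the same file will assign the same data_id, so references stay stable across processes.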
Compressed Format#
Tree.save() accepts a compress argument that can be set to True. Tree.load() can detect if the input file has a compression header and will decompress automatically. Note that this works independently of the file extension:
tree.save(path, compress=True) # default is False
tree_2 = Tree.load(path, auto_uncompress=True) # default is True
assert tree.compare(tree_2) == 0
compression defines an optional compression method. Possible values are zipfile.ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2, and ZIP_LZMA. True uses the default compression zipfile.ZIP_BZIP2. The default False disables compression and stores plain JSON.
Though mileage may vary, ZIP_DEFLATED is usually the fastest compression method, while ZIP_LZMA is the most effective but slowest. ZIP_BZIP2 is somewhere in the middle.
Load times of (un)compressed files are often not affected by the compression method.
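The size trade-off between the methods can be checked with a quick stdlib experiment; the payload here is synthetic, so absolute numbers will differ for real trees:

```python
import io
import json
import zipfile

# A repetitive JSON payload, similar in shape to the node list format
payload = json.dumps(
    [[i, {"t": 1, "n": f"node-{i}"}] for i in range(1000)]
).encode()

sizes = {}
for method, name in [
    (zipfile.ZIP_STORED, "STORED"),
    (zipfile.ZIP_DEFLATED, "DEFLATED"),
    (zipfile.ZIP_BZIP2, "BZIP2"),
    (zipfile.ZIP_LZMA, "LZMA"),
]:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("tree.json", payload)
    sizes[name] = buf.tell()
    print(f"{name:8s} {sizes[name]:6d} bytes")
```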
(De)Serialize as List of Dicts#
Note
While converting a tree to/from a dict is handy at times, for standard (de)serialization the save() / load() API is recommended.
to_dict_list() converts a tree to a list of (potentially nested) dicts. We can pass the result to json.dump():
with open(path, "w") as fp:
    json.dump(tree.to_dict_list(), fp)
The result will look similar to this:
[
  {
    "data": "A",
    "children": [
      { "data": "a1", "children": [{ "data": "a11" }, { "data": "a12" }] },
      { "data": "a2" }
    ]
  },
  {
    "data": "B",
    "children": [{ "data": "b1", "children": [{ "data": "b11" }] }]
  }
]
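The nested structure can also be traversed directly, e.g. with a small recursive generator that yields all node data in depth-first order (a stdlib-only sketch, not part of the nutree API):

```python
# The dict-list structure shown above, as a Python literal
dict_list = [
    {"data": "A", "children": [
        {"data": "a1", "children": [{"data": "a11"}, {"data": "a12"}]},
        {"data": "a2"},
    ]},
    {"data": "B", "children": [{"data": "b1", "children": [{"data": "b11"}]}]},
]

def walk(nodes):
    """Yield node data depth-first; leaf nodes have no 'children' key."""
    for node in nodes:
        yield node["data"]
        yield from walk(node.get("children", []))

print(list(walk(dict_list)))
# ['A', 'a1', 'a11', 'a12', 'a2', 'B', 'b1', 'b11']
```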
Reading can then be implemented using from_dict():
with open(path, "r") as fp:
    obj = json.load(fp)
tree = Tree.from_dict(obj)
See also
This example tree only contains plain string data. Read Working with Objects on how to (de)serialize arbitrary objects.