(De)Serialize¶
TL;DR
Nutree provides methods to serialize and deserialize tree structures as JSON.
Arbitrary objects can be stored and restored by implementing mappers.
The native format supports compact formatting and optional zipping.
Native Format¶
Plain String Nodes¶
Assuming we have a tree like this:
Tree<'fixture'>
├── 'A'
│ ├── 'a1'
│ │ ├── 'a11'
│ │ ╰── 'a12'
│ ╰── 'a2'
╰── 'B'
├── 'a11' <- a clone node here
╰── 'b1'
╰── 'b11'
We can serialize this tree in an efficient JSON based text format:
tree.save(path)
# or
with open(path, "w") as fp:
tree.save(fp)
Reading is as simple as:
tree = Tree.load(path)
# or
with open(path, "r") as fp:
tree = Tree.load(fp)
Additional data can be stored:
meta = {"foo": "bar"}
tree.save(path, meta=meta)
and retrieved like so:
meta = {}
tree = Tree.load(path, file_meta=meta)
assert meta["foo"] == "bar"
The result will be written as a compact list of (parent-index, data) tuples.
The parent index starts with #1, since #0 is reserved for the system root node.
Note how the 2nd occurrence of ‘a11’ only stores the index of the first
instance:
{
"meta": {
"$generator": "nutree/0.5.1",
"$format_version": "1.0",
"foo": "bar"
},
"nodes": [
[0, "A"],
[1, "a1"],
[2, "a11"],
[2, "a12"],
[1, "a2"],
[0, "B"],
[6, 3],
[6, "b1"],
[8, "b11"]
]
}
Arbitrary Objects¶
Assuming we have a tree with data objects like this:
Tree<'company'>
├── Node<'Department<Development>', data_id=125578508105>
│ ├── Node<'Person<Alice, 23>', data_id={123-456}>
│ ├── Node<'Person<Bob, 32>', data_id={234-456}>
│ ╰── Node<'Person<Charleen, 43>', data_id={345-456}>
╰── Node<'Department<Marketing>', data_id=125578508063>
├── Node<'Person<Charleen, 43>', data_id={345-456}>
╰── Node<'Person<Dave, 54>', data_id={456-456}>
In order to (de)serialize arbitrary data objects, we need to implement mappers:
def serialize_mapper(node, data):
if isinstance(node.data, Department):
data["type"] = "dept"
data["name"] = node.data.name
elif isinstance(node.data, Person):
data["type"] = "person"
data["name"] = node.data.name
data["age"] = node.data.age
data["guid"] = node.data.guid
return data
def deserialize_mapper(parent, data):
node_type = data["type"]
if node_type == "person":
data = Person(name=data["name"], age=data["age"], guid=data["guid"])
elif node_type == "dept":
data = Department(name=data["name"])
return data
When we call
tree.save(path, mapper=serialize_mapper)
the above tree would be written as
{
"meta": {
"$generator": "nutree/0.5.1",
"$format_version": "1.0",
},
"nodes": [
[0, { "type": "dept", "name": "Development" }],
[1, { "type": "person", "name": "Alice", "age": 23, "guid": "{123-456}" }],
[1, { "type": "person", "name": "Bob", "age": 32, "guid": "{234-456}" }],
[1, { "type": "person", "name": "Charleen", "age": 43, "guid": "{345-456}" }],
[0, { "type": "dept", "name": "Marketing" }],
[5, 4],
[5, { "type": "person", "name": "Dave", "age": 54, "guid": "{456-456}" }]
]
}
Similarly load a tree from disk:
tree = Tree.load(path, mapper=deserialize_mapper)
Compact Format¶
File size can be reduced by using a compact format that removes redundancy:
Keys like "type" or "name" are repeated for every node.
We can pass a key_map argument to save() in order
to shorten the key names:
key_map = {
"type": "t",
"name": "n",
"age": "a",
"guid": "g",
}
tree.save(path, mapper=serialize_mapper, key_map=key_map)
The result will look like this:
{
"meta": {
"$generator": "nutree/0.7.0",
"$format_version": "1.0",
"$key_map": { "type": "t", "name": "n", "age": "a", "guid": "g" }
},
"nodes": [
[0, { "t": "dept", "n": "Development" }],
[1, { "t": "person", "n": "Alice", "a": 23, "g": "{123-456}" }],
[1, { "t": "person", "n": "Bob", "a": 32, "g": "{234-456}" }],
[1, { "t": "person", "n": "Charleen", "a": 43, "g": "{345-456}" }],
[0, { "t": "dept", "n": "Marketing" }],
[5, 4],
[5, { "t": "person", "n": "Dave", "a": 54, "g": "{456-456}" }]
]
}
Still some values like "dept" or "person" are repeated.
We can pass a value_map argument to save()
in order to replace repeating values for a distinct key with an index into a
list of values. Note that value_map expects unmapped key names, i.e. ‘type’
instead of ‘t’:
value_map = {
"type": ["dept", "person"]
}
tree.save(path, mapper=serialize_mapper, key_map=key_map, value_map=value_map)
The result will look like this:
{
"meta": {
"$generator": "nutree/0.7.0",
"$format_version": "1.0",
"$key_map": { "type": "t", "name": "n", "age": "a", "guid": "g" },
"$value_map": { "type": ["dept", "person"] }
},
"nodes": [
[0, { "t": 0, "n": "Development" }],
[1, { "t": 1, "n": "Alice", "a": 23, "g": "{123-456}" }],
[1, { "t": 1, "n": "Bob", "a": 32, "g": "{234-456}" }],
[1, { "t": 1, "n": "Charleen", "a": 43, "g": "{345-456}" }],
[0, { "t": 0, "n": "Marketing" }],
[5, 4],
[5, { "t": 1, "n": "Dave", "a": 54, "g": "{456-456}" }]
]
}
Note
The value_map is only useful for keys that have a limited number of
distinct values.
If the number of distinct values is close to the number of nodes, the
value_map will actually increase the file size.
By default key_map is set to True which expands to
key_map = {"data_id": "i", "str": "s"}.
There is no default for value_map.
For a TypedTree the defaults are different:
key_map = key_map = {"data_id": "i", "str": "s", "kind": "k"}
value_map = {
"kind": [<distinct `kind` values>]
}
Using Derived Classes¶
Instead of passing mapper functions and args, we can also use a derived class:
class MyTree(TypedTree):
DEFAULT_KEY_MAP = TypedTree.DEFAULT_KEY_MAP | { "type": "t", "name": "n", "age": "a" }
DEFAULT_VALUE_MAP = {"type": ["person", "dept"]}
def calc_data_id(tree, data):
if hasattr(data, "guid"):
return data.guid
return hash(data)
def serialize_mapper(self, node: Node, data: dict):
if isinstance(node.data, fixture.Department):
data["type"] = "dept"
data["name"] = node.data.name
elif isinstance(node.data, fixture.Person):
data["type"] = "person"
data["name"] = node.data.name
data["age"] = node.data.age
return data
@staticmethod
def deserialize_mapper(parent: Node, data: dict):
node_type = data["type"]
print("deserialize_mapper", data)
if node_type == "person":
data = fixture.Person(
name=data["name"], age=data["age"], guid=data["data_id"]
)
elif node_type == "dept":
data = fixture.Department(name=data["name"], guid=data["data_id"])
print(f"deserialize_mapper -> {data}")
return data
tree = MyTree(name="MyTree")
...
tree.save(path)
...
tree.load(path)
Compressed Format¶
Tree.save() accepts a compress argument that can be set to True.
Tree.load() can detect if the input file has a compression header and will
decompress automatically. Note that this works independently from the file
extension:
tree.save(path, compress=True) # default is False
tree_2 = Tree.load(path, auto_uncompress=True) # default is True
assert tree.compare(tree_2) == 0
compression defines an optional compression method.
Possible values are: zipfile.ZIP_STORED, .ZIP_DEFLATED, .ZIP_BZIP2, and .ZIP_LZMA.
True is uses the default compression zipfile.ZIP_BZIP2.
The default False disables compression and stores as plain json.
Though mileage may vary, ZIP_DEFLATED is usually the fastest compression method,
while ZIP_LZMA is the most effective but slower. ZIP_BZIP2 is somewhere in the
middle.
Load times of (un)compressed files is often not affected by the compression
method.
(De)Serialize as List of Dicts¶
Note
While converting a tree to/from a dict is handy at times,
for standard (de)serialization the save() /
load() API is recommended.
to_dict_list() converts a tree to a list of
- potentially nested - dicts.
We can pass the result to json.dump():
with open(path, "w") as fp:
json.dump(tree.to_dict_list(), fp)
The result will look similar to this:
[
{
"data": "A",
"children": [
{ "data": "a1", "children": [{ "data": "a11" }, { "data": "a12" }] },
{ "data": "a2" }
]
},
{
"data": "B",
"children": [{ "data": "b1", "children": [{ "data": "b11" }] }]
}
]
Reading can then be implemnted using from_dict():
with open(path, "r") as fp:
obj = json.load(fp)
tree = Tree.from_dict(obj)
See also
This example tree only contains plain string data. Read Working with Objects on how to (de)serialize arbitrary objects.
See also
Read Graphs on how to create graphical outputs in PNG or SVG format.