I’m trying to make a model for categorizing some objects.
I already tried using django-mptt to easily retrieve related categories, and now I’m looking at different solutions to find the best one.
I can’t figure out, though, what the main differences are between Materialized Path, Adjacency List and Nested Set. Wikipedia didn’t give me a short answer; all I know is that mptt is probably Nested Set…
Can anyone explain it to me in a few words?
It’s easier to explain with examples than with a few words. Consider the sample tree, storing names:
William
    Jones
    Blake
        Adams
    Tyler
Joseph
    Miller
    Flint
Materialized Path means each node stores its full path, encoded in a single value. For instance, you can store each node's index at every level, using a dot as a delimiter:
Name     Path
William  1
Jones    1.1
Blake    1.2
Adams    1.2.1
Tyler    1.3
Joseph   2
Miller   2.1
Flint    2.2
So, Adams knows its full name is William Blake Adams, because it has the 1.2.1 path, pointing to the 1 node at the first level (William), to the 1.2 node at level 2 (Blake), and to the 1.2.1 node at level 3 (Adams).
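The lookup described above can be sketched in a few lines of Python. This is only an illustration on the example table: the `PATHS` dict and the `full_name` and `descendants` helpers are hypothetical names, not part of django-mptt or Treebeard.

```python
# Rows from the example table: name -> materialized path.
PATHS = {
    "William": "1",
    "Jones": "1.1",
    "Blake": "1.2",
    "Adams": "1.2.1",
    "Tyler": "1.3",
    "Joseph": "2",
    "Miller": "2.1",
    "Flint": "2.2",
}

# Reverse lookup so a path can be turned back into a name.
NAME_BY_PATH = {path: name for name, path in PATHS.items()}


def full_name(name):
    """Return the names from the root down to `name`, read off its path."""
    parts = PATHS[name].split(".")
    # Every prefix of the path ("1", "1.2", "1.2.1") is the path of an ancestor.
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [NAME_BY_PATH[p] for p in prefixes]


def descendants(name):
    """Return every node whose path starts with `name`'s path plus a dot."""
    prefix = PATHS[name] + "."
    return [n for n, p in PATHS.items() if p.startswith(prefix)]


print(full_name("Adams"))      # ['William', 'Blake', 'Adams']
print(descendants("William"))  # ['Jones', 'Blake', 'Adams', 'Tyler']
```

Note that both directions need only string operations on the stored path, which in a database usually translates to a prefix match on one column.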
Adjacency List means the tree is stored by keeping a link to some adjacent nodes. For instance, you can store who is the parent and who is the next sibling.
Name     Parent   Next
William  null     Joseph
Jones    William  Blake
Blake    William  Tyler
Adams    Blake    null
Tyler    William  null
Joseph   null     null
Miller   Joseph   Flint
Flint    Joseph   null
Notice that it could be as simple as just storing the parent, if we don’t need to keep the children of a node ordered. Now Adams can get all his ancestors recursively until null to find his full name.
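Here is a minimal sketch of that ancestor walk, storing only the parent link. The `PARENT` dict and the helper names are made up for the example, and the walk is written as a loop rather than a recursive call, which amounts to the same thing.

```python
# Parent links from the example table (None plays the role of null).
PARENT = {
    "William": None,
    "Jones": "William",
    "Blake": "William",
    "Adams": "Blake",
    "Tyler": "William",
    "Joseph": None,
    "Miller": "Joseph",
    "Flint": "Joseph",
}


def ancestors(name):
    """Follow parent links upward until null; nearest ancestor comes first."""
    chain = []
    parent = PARENT[name]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def full_name(name):
    """Root-to-node list of names, e.g. William, Blake, Adams."""
    return list(reversed(ancestors(name))) + [name]


print(ancestors("Adams"))  # ['Blake', 'William']
print(full_name("Adams"))  # ['William', 'Blake', 'Adams']
```

Each step of the loop is one lookup, so against a database table the cost grows with the depth of the node unless the database can do the recursion for you.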
Nested Sets means you store each node with a pair of indexes, usually a left and a right value, assigned as you traverse the tree in DFS order, so you know its descendants fall between those values. Here’s how the numbers would be assigned to the example tree:
1 William 10
    2 Jones 3
    4 Blake 7
        5 Adams 6
    8 Tyler 9
11 Joseph 16
    12 Miller 13
    14 Flint 15
And it would be stored as:
Name     Left  Right
William  1     10
Jones    2     3
Blake    4     7
Adams    5     6
Tyler    8     9
Joseph   11    16
Miller   12    13
Flint    14    15
So, now Adams can find its ancestors by querying who has a lower left AND a higher right value, and sort them by the left value.
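That query is just a pair of comparisons, as this small sketch on the example values shows. `BOUNDS`, `ancestors` and `descendants` are illustrative names; a real implementation would run the same conditions as a database filter.

```python
# (left, right) values from the example table.
BOUNDS = {
    "William": (1, 10),
    "Jones": (2, 3),
    "Blake": (4, 7),
    "Adams": (5, 6),
    "Tyler": (8, 9),
    "Joseph": (11, 16),
    "Miller": (12, 13),
    "Flint": (14, 15),
}


def ancestors(name):
    """Nodes with a lower left AND a higher right value, sorted by left."""
    left, right = BOUNDS[name]
    found = [n for n, (l, r) in BOUNDS.items() if l < left and r > right]
    return sorted(found, key=lambda n: BOUNDS[n][0])


def descendants(name):
    """Nodes whose interval lies strictly inside this node's interval."""
    left, right = BOUNDS[name]
    found = [n for n, (l, r) in BOUNDS.items() if l > left and r < right]
    return sorted(found, key=lambda n: BOUNDS[n][0])


print(ancestors("Adams"))      # ['William', 'Blake']
print(descendants("William"))  # ['Jones', 'Blake', 'Adams', 'Tyler']
```

Unlike the adjacency list walk, neither direction needs recursion: one range condition returns the whole set.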
Each model has its strengths and weaknesses. Choosing which one to use really depends on your application, the database you’re using and what kind of operations you’ll be doing most often. You can find a good comparison here.
The comparison mentions a fourth model that isn’t very common (I don’t know of any other implementation but my own) and is very complicated to explain in a few words: Nested Interval via Matrix Encoding.
When you insert a new node in a nested set, you have to re-enumerate every node that comes after it in the traversal. The idea behind nested intervals is that there’s an infinite number of rational numbers between any two integers, so you could store the new node as a fraction between its previous and next nodes. Storing and querying fractions can be troublesome, and this leads to the matrix encoding technique, which transforms those fractions into a 2×2 matrix, so that most operations can be done with some matrix algebra that isn’t obvious at first sight but can be very efficient.
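To make the renumbering cost concrete, here is a rough sketch that adds a hypothetical node, Carter, as the last child of Blake in the example nested set and counts the rows that change. The second part shows the bare idea of fitting a rational interval into the gap instead. It is a naive illustration with Python's `fractions` module, not the matrix encoding itself, and all function names are invented for the example.

```python
from fractions import Fraction

# (left, right) values from the nested set example.
BOUNDS = {
    "William": (1, 10),
    "Jones": (2, 3),
    "Blake": (4, 7),
    "Adams": (5, 6),
    "Tyler": (8, 9),
    "Joseph": (11, 16),
    "Miller": (12, 13),
    "Flint": (14, 15),
}


def insert_last_child(bounds, parent, name):
    """Return a new mapping with `name` added as the last child of `parent`.

    Every left or right value at or beyond the parent's right value is
    shifted by 2 to open a slot for the new node.
    """
    pivot = bounds[parent][1]
    shifted = {}
    for node, (left, right) in bounds.items():
        new_left = left + 2 if left >= pivot else left
        new_right = right + 2 if right >= pivot else right
        shifted[node] = (new_left, new_right)
    shifted[name] = (pivot, pivot + 1)
    return shifted


def changed_rows(before, after):
    """Names of existing nodes whose stored values had to be rewritten."""
    return [node for node in before if before[node] != after[node]]


def gap_interval(low, high):
    """Take the middle third of the open gap (low, high) as a new interval."""
    low, high = Fraction(low), Fraction(high)
    third = (high - low) / 3
    return (low + third, high - third)


after = insert_last_child(BOUNDS, "Blake", "Carter")
print(after["Carter"])              # (7, 8)
print(changed_rows(BOUNDS, after))  # 6 of the 8 existing rows

# With rational bounds, Carter fits between Adams's right (6) and
# Blake's right (7) without touching any other row.
print(gap_interval(6, 7))           # (Fraction(19, 3), Fraction(20, 3))
```

The fractions get longer with every insert into the same gap, which is part of why storing them directly is awkward.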
Treebeard has very straightforward implementations of each one of Materialized Path, Nested Sets and Adjacency Lists, with no redundancy. django-mptt actually uses a mix of Nested Sets and Adjacency Lists, since it also keeps a link to the parent and can use it both to query children more efficiently and to rebuild the tree in case it gets messed up by someone.
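To show why keeping the parent link makes a rebuild possible, here is a small sketch that recomputes the left and right values from parent links alone with a DFS. It is not django-mptt's actual code; `PARENT` and `rebuild` are hypothetical names, and sibling order is assumed to be the order in which rows appear.

```python
# Parent links for the example tree; dict order doubles as sibling order.
PARENT = {
    "William": None,
    "Jones": "William",
    "Blake": "William",
    "Adams": "Blake",
    "Tyler": "William",
    "Joseph": None,
    "Miller": "Joseph",
    "Flint": "Joseph",
}


def rebuild(parent_of):
    """Recompute (left, right) for every node from its parent link."""
    # Group children under their parent, keeping the input order.
    children = {}
    for name, parent in parent_of.items():
        children.setdefault(parent, []).append(name)

    bounds = {}
    counter = 0

    def visit(name):
        nonlocal counter
        counter += 1
        left = counter               # assigned on the way down
        for child in children.get(name, []):
            visit(child)
        counter += 1
        bounds[name] = (left, counter)  # right assigned on the way back up

    for root in children.get(None, []):
        visit(root)
    return bounds


bounds = rebuild(PARENT)
print(bounds["William"])  # (1, 10)
print(bounds["Adams"])    # (5, 6)
print(bounds["Joseph"])   # (11, 16)
```

The result matches the nested set table above, so a corrupted pair of left and right columns can be regenerated as long as the parent column is intact.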