Categories and Subcategories
The adjacency model
The fundamental structure of the adjacency model is a one-to-many relationship between a parent entry and its child entries. As with any one-to-many relationship, the child entries carry a foreign key to their parent. What makes the adjacency model different is that the parent and child entries are both stored in the same table.
create table categories
( id       integer     not null primary key
, name     varchar(37) not null
, parentid integer     null
, foreign key parentid_fk (parentid)
      references categories (id)
);
Here's some sample data that might populate this table, and we should be able to get an idea of the parent-child relationships (if not grasp the entire hierarchy) just by looking at the data:
Terms commonly used with the adjacency model include tree, root, node, subtree, leaf, path, depth and level. There can be one or more trees in the table, and the parent foreign key is NULL for each tree's root node. A root node is therefore at the "top" of its tree. A node is any entry, while a leaf is any node that has no children, i.e. for which there exists no other node having that node as its parent. A subtree is the portion of the tree "under" any node. The depth of a subtree is the maximum number of levels beneath that node. These may not be official terminology definitions, but they work for me.
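To make the terms concrete, here is a minimal sketch that loads a few rows and picks out the roots and the leaves. It assumes SQLite driven from Python's sqlite3 module, and the rows are hypothetical, invented for this sketch rather than taken from the article's sample data. The foreign key clause also drops the constraint name, since SQLite does not accept a name in that position.

```python
import sqlite3

# Hypothetical rows, invented for this sketch: (id, name, parentid).
ROWS = [
    (1, "books", None), (2, "music", None),
    (3, "fiction", 1), (4, "nonfiction", 1),
    (5, "mystery", 3), (6, "scifi", 3),
    (7, "cozy", 5), (8, "jazz", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer
    , foreign key (parentid) references categories (id)
    )
    """
)
conn.executemany("insert into categories values (?, ?, ?)", ROWS)

# Root nodes: the parent foreign key is NULL.
roots = [name for (name,) in conn.execute(
    "select name from categories where parentid is null order by name"
)]

# Leaf nodes: no other row names this row as its parent.
leaves = [name for (name,) in conn.execute(
    """
    select node.name
      from categories as node
     where not exists
           ( select 1
               from categories as child
              where child.parentid = node.id )
     order by node.name
    """
)]

print("roots: ", roots)
print("leaves:", leaves)
```

With these rows there are two trees (books and music), and the deepest path, books to cozy, spans four levels.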
Why is it called a tree when it grows down from the "root" which is at the top? Mere convention.
Now let's see how a tree or hierarchy can be used to implement a category/subcategory structure.
Working with categories and subcategories
Using the adjacency model to implement categories and subcategories can be reduced to two simple steps:
- manage the hierarchical data
- display the hierarchical data
Managing the hierarchy is nothing special. Just look again at the table layout. There's a primary key column (id) and a foreign key referencing it (parentid). Other than that, it's a dead simple table. Use INSERT, UPDATE, and DELETE as with any other table. Whether we actually declare the foreign key on parentid, which is necessary for referential integrity, is secondary to the basic design. (Referential integrity means that the parent row should exist before the child row referencing it is inserted, and so on. See the article Relational Integrity in the Resources below.)
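Here is a rough sketch of what the declared foreign key buys us while we manage the table. It assumes SQLite through Python's sqlite3 module, where foreign keys are enforced only after "pragma foreign_keys = on". The rows and the try_sql helper are hypothetical, and other databases will report violations in their own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("pragma foreign_keys = on")
conn.execute(
    """
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer
    , foreign key (parentid) references categories (id)
    )
    """
)
# Hypothetical rows, parents inserted before their children.
conn.executemany(
    "insert into categories values (?, ?, ?)",
    [(1, "books", None), (2, "music", None),
     (3, "fiction", 1), (4, "mystery", 3)],
)


def try_sql(sql):
    """Run one statement and report whether the foreign key allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# A child whose parent does not exist is refused.
orphan_insert = try_sql("insert into categories values (9, 'orphan', 99)")
# A parent that still has children cannot be deleted.
parent_delete = try_sql("delete from categories where id = 3")
# Moving a subtree is an ordinary UPDATE of the foreign key.
move_subtree = try_sql("update categories set parentid = 2 where id = 3")
# A leaf can be deleted freely.
leaf_delete = try_sql("delete from categories where id = 4")

print(orphan_insert, parent_delete, move_subtree, leaf_delete)
```

Note that moving "fiction" under "music" takes a single UPDATE: the whole subtree follows, because the child rows still point at the same parent id.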
Displaying the hierarchy takes more effort, but is not difficult. Categories and subcategories can be handled in HTML in many ways. Current best practice is to use nested unordered lists. For further information, see Listamatic: one list, many options in the Resources below.
Displaying all categories and subcategories: site maps and navigation bars
To display the hierarchy, we must first retrieve it. The following method uses as many LEFT OUTER JOINs as necessary to cover the depth of the deepest tree. For our sample data, the deepest tree has four levels, so the query references the categories table four times: once for the root level, plus three self-joins. Each join goes "down" a level from the node above it. The query begins at the root nodes.
select root.name  as root_name
     , down1.name as down1_name
     , down2.name as down2_name
     , down3.name as down3_name
  from categories as root
left outer
  join categories as down1
    on down1.parentid = root.id
left outer
  join categories as down2
    on down2.parentid = down1.id
left outer
  join categories as down3
    on down3.parentid = down2.id
 where root.parentid is null
order
    by root_name
     , down1_name
     , down2_name
     , down3_name
Notice how the WHERE clause ensures that only paths from the root nodes are followed. This query produces the following result set:
Each row in the result set represents a distinct path from a root node to a leaf node. Notice how the LEFT OUTER JOIN, when extended "below" the leaf node in any given path, returns NULL (representing the fact that there was no node below that node, i.e. satisfying that join condition).
As we can see, this result set contains all our original categories and subcategories. If the categories and subcategories are being displayed on a web site, this query can therefore be used to generate the complete site map. An abbreviated query, that goes down only a certain number of levels from the roots, regardless of whether there may be nodes at deeper levels, can be used for the site's navigation bar.
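As a sketch of both uses, the following runs the site map query and a two-level abbreviation of it for a navigation bar. It assumes SQLite through Python's sqlite3 module and a handful of hypothetical rows (not the article's sample data) whose deepest tree also has four levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer
    , foreign key (parentid) references categories (id)
    )
    """
)
# Hypothetical rows, invented for this sketch: (id, name, parentid).
conn.executemany(
    "insert into categories values (?, ?, ?)",
    [(1, "books", None), (2, "music", None),
     (3, "fiction", 1), (4, "nonfiction", 1),
     (5, "mystery", 3), (6, "scifi", 3),
     (7, "cozy", 5), (8, "jazz", 2)],
)

# Full site map: one row per root-to-leaf path, padded with NULLs.
SITE_MAP_SQL = """
    select root.name  as root_name
         , down1.name as down1_name
         , down2.name as down2_name
         , down3.name as down3_name
      from categories as root
    left outer join categories as down1 on down1.parentid = root.id
    left outer join categories as down2 on down2.parentid = down1.id
    left outer join categories as down3 on down3.parentid = down2.id
     where root.parentid is null
     order by root_name, down1_name, down2_name, down3_name
"""

# Navigation bar: the same idea, stopped after two levels.
NAV_SQL = """
    select root.name  as root_name
         , down1.name as down1_name
      from categories as root
    left outer join categories as down1 on down1.parentid = root.id
     where root.parentid is null
     order by root_name, down1_name
"""

site_map = conn.execute(SITE_MAP_SQL).fetchall()
nav_bar = conn.execute(NAV_SQL).fetchall()

for row in site_map:
    print(row)
```

In the full result, a path that ends early (such as books, nonfiction) comes back with None in the remaining columns, which is how Python reports the NULLs produced by the outer joins.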
We can display this sample data using nested unordered lists like this:
What's the easiest way to transform the result set into the nested ULs? In ColdFusion, we use nested CFOUTPUT tags, with the GROUP= parameter on all but the innermost list. Very straightforward indeed. In other scripting languages, as the saying goes, your mileage may vary. Take comfort in the fact that once you've coded it, you will never have to change your site map page again.
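For scripting languages without a grouping construct, one possible approach is to fold the path rows into a nested structure first and then render it. The sketch below is in Python. The rows are hypothetical stand-ins for a site map result set, and build_tree and render_ul are illustrative names, not part of any library. The recursion happens in memory on a result set that was fetched once; no query runs inside it.

```python
from html import escape

# Hypothetical site map rows, as one query would return them:
# one root-to-leaf path per row, padded with None for NULL.
rows = [
    ("books", "fiction", "mystery", "cozy"),
    ("books", "fiction", "scifi", None),
    ("books", "nonfiction", None, None),
    ("music", "jazz", None, None),
]


def build_tree(path_rows):
    """Fold path rows into nested dicts, dropping the NULL padding."""
    tree = {}
    for row in path_rows:
        level = tree
        for name in row:
            if name is None:
                break
            # Reuse the entry if an earlier row already created it.
            level = level.setdefault(name, {})
    return tree


def render_ul(tree, indent=0):
    """Render nested dicts as nested unordered lists."""
    if not tree:
        return ""
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    for name, children in tree.items():
        if children:
            lines.append(f"{pad}  <li>{escape(name)}")
            lines.append(render_ul(children, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{escape(name)}</li>")
    lines.append(f"{pad}</ul>")
    return "\n".join(lines)


html = render_ul(build_tree(rows))
print(html)
```

Because Python dicts keep insertion order, the list items appear in the order the query sorted them.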
What if the hierarchy is more than, say, three or four levels deep? What if it's fifteen levels deep? My response to this question is threefold.
First, a query with fifteen self-joins may be a little more tedious to code but most assuredly will not present any difficulty to your database engine.
Second, in certain databases such as Oracle and DB2, recursion is built in, so you can go as many levels deep as you wish. But don't fool yourself: the coding required to display an arbitrary number of levels is no picnic either. Do not make the mistake of simulating recursion by coding a script module that calls itself and runs a query on each call, because from the database perspective, this is a series of calls (a query in a loop) and the performance will reflect this.
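To give a feel for built-in recursion, here is a sketch using a recursive common table expression. It assumes SQLite's WITH RECURSIVE, run through Python's sqlite3 module, as a stand-in: the syntax differs between products (Oracle, for instance, also offers CONNECT BY), so treat it as an illustration rather than something to paste into another engine. The rows, the " > " separator and the column names depth and path are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer
    , foreign key (parentid) references categories (id)
    )
    """
)
# Hypothetical rows, invented for this sketch: (id, name, parentid).
conn.executemany(
    "insert into categories values (?, ?, ?)",
    [(1, "books", None), (2, "music", None),
     (3, "fiction", 1), (4, "nonfiction", 1),
     (5, "mystery", 3), (6, "scifi", 3),
     (7, "cozy", 5), (8, "jazz", 2)],
)

# One statement walks every tree, however deep it goes.
RECURSIVE_SQL = """
    with recursive tree (id, name, depth, path) as
    ( select id, name, 1, name
        from categories
       where parentid is null          -- start at the root nodes
      union all
      select child.id
           , child.name
           , tree.depth + 1
           , tree.path || ' > ' || child.name
        from categories as child
      inner join tree
          on child.parentid = tree.id  -- step down one level
    )
    select name, depth, path
      from tree
     order by path
"""

nodes = conn.execute(RECURSIVE_SQL).fetchall()
for name, depth, path in nodes:
    print("  " * (depth - 1) + name)
```

The result has one row per node rather than one row per path, and the depth column is enough to indent the output. Turning those rows into properly nested lists still takes some care.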
Third, if you have a tree that goes more than three or four levels deep, you may have difficulty conveying this structure satisfactorily in a visual way. You may want to go back and re-think how you expect your users to actually navigate through the hierarchy. Sometimes the best solution is simply to show no more than three levels, with some sort of visual cue that there are further levels below the nodes shown.
The path to the root: the breadcrumb trail
Retrieving the path from any given node, whether it is a leaf node or not, to the root at the top of its path, is very similar to the site map query. Again, we use LEFT OUTER JOINs, but this time we go "up" the tree from the node, rather than "down."
select node.name as node_name
     , up1.name  as up1_name
     , up2.name  as up2_name
     , up3.name  as up3_name
  from categories as node
left outer
  join categories as up1
    on up1.id = node.parentid
left outer
  join categories as up2
    on up2.id = up1.parentid
left outer
  join categories as up3
    on up3.id = up2.parentid
order
    by node_name
Here's the result set from this query:
Here each row in the result set is a single path, one for every node in the table. On a web site, such a path is often called a breadcrumb trail. (This name is somewhat misleading, because it suggests that it might represent how the visitor arrived at the page, which is not always the case. The accepted meaning of breadcrumb is simply the path from the root.)
In practice, we'd have a WHERE clause that would specify a single node. Without one, the results above are, in effect, all of the breadcrumbs in the table.
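A single-node version might be sketched like this, again assuming SQLite through Python's sqlite3 module and hypothetical rows. The breadcrumb function and the " > " separator are illustrative choices, and a real site would more likely look the node up by id than by name. The function reads the columns from right to left and skips the NULLs, so the trail runs from the root down to the node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer
    , foreign key (parentid) references categories (id)
    )
    """
)
# Hypothetical rows, invented for this sketch: (id, name, parentid).
conn.executemany(
    "insert into categories values (?, ?, ?)",
    [(1, "books", None), (2, "music", None),
     (3, "fiction", 1), (4, "nonfiction", 1),
     (5, "mystery", 3), (6, "scifi", 3),
     (7, "cozy", 5), (8, "jazz", 2)],
)

# The "up" query, restricted to one node by a WHERE clause.
BREADCRUMB_SQL = """
    select node.name as node_name
         , up1.name  as up1_name
         , up2.name  as up2_name
         , up3.name  as up3_name
      from categories as node
    left outer join categories as up1 on up1.id = node.parentid
    left outer join categories as up2 on up2.id = up1.parentid
    left outer join categories as up3 on up3.id = up2.parentid
     where node.name = ?
"""


def breadcrumb(name):
    """Return the root-to-node trail for one category, or '' if unknown."""
    row = conn.execute(BREADCRUMB_SQL, (name,)).fetchone()
    if row is None:
        return ""
    # Columns run node, up1, up2, up3: reverse them and drop the NULLs.
    return " > ".join(part for part in reversed(row) if part is not None)


print(breadcrumb("scifi"))
```

Like the site map query, this one has a fixed number of joins, so it reaches at most three levels above the node.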
To display a breadcrumb trail in the normal fashion, from root to node, just display the result set columns in reverse order, and ignore the nulls. For example, let's say we run the above query for the category "companion" and get this:
The breadcrumb would look like this:
Resources
- Listamatic: one list, many options
  The power of CSS when applied to the lowly UL.
- Trees in SQL by Joe Celko
  The nested set model, alternative to the adjacency list model.
- Storing Hierarchical Data in a Database
  Modified Preorder Tree Traversal method.
- Relational Integrity
  Primary and foreign keys and stuff like that.