Posts Tagged searching
Splitted Key Tree
Looking around for a simple and efficient searching algorithm for IP addresses, I found the ‘Trie‘. The basics of this data structure was nearly what I needed. But I wanted a solution for generic objects and key. I named it the splitted key tree! I implemented only the functionality for adding and searching only. The design is based on the K.I.S.S.-design.
- We have a value V which can be transformed into a key K.
(If V1=V2 then it must be K(V1)=K(V2). In other words: if 2 value objects are equal, their keys must be equal too.)
- A key K can be splitted in several sub-keys.
- These sub-keys are stored in a hierarchically order. The path through the hierarchically structure is the ‘key-path’
- The value is stored in the end-node of its key-path (isLeaf).
Characteristics (same as tries):
- The data structure can easily be traversed for searching.
- If m is the maximum number of sub-keys of plenty of values, the maximum hierarchy of the tree is m. Only m steps are needed to put a new value into the tree, and only m steps are required to find out, whether a value object is stored inside this tree or not.
- The construction of the tree can be done in linear memory requirements and linear runtime. Because this just depends on the number of sub-keys of the main key of the value.
Let’s have a look at a simple example with strings:
Ok, that’s a very easy implementation, but you can see the principle very well. In this example a string (the value object) is splitted into its characters, which build the sub-keys. Lets see how the string ‘ana’ will be stored in the data structure, if ‘abc’ is already in there:
- Value: ‘ana’, 1st key: ‘a’, rest: ‘na’. Root has already an ‘a’-node, take it.
- 1st key: ‘n’, rest: ‘a’. The ‘a’-node hasn’t any child-node with the key ‘n’. Create it and take it.
- 1st key: ‘a’, rest: ” The ‘n’-node hasn’t any child-node with the key ‘a’. Create it and take it. Because of the empty rest, the value object has to be stored in the current node.
The search algorithm traverses the tree of sub-keys. Each value object itself is stored in the end-node of its ‘key-path’. The maximum search steps is the number of the characters of the string.
The splitted key tree is very efficient for objects with a fix length of sub-keys. Lets focus on IPv4 (e.g. 192.168.178.1). Each IPv4 can be splitted into its 5 byte arrays which build the sub-keys. Each IPv4 needs 5 steps to put it into the data structure and 5 steps to find it. A very effective searching.
The source code can be downloaded here. The file has to be renamed to ‘jsplitttedtree-0-1.zip’.
To understand the code you may have a look at the junit-tests at first.
The implementation has only one interface and one class:
- JSKTree.java: Implements the functionality to build the tree and to search within.
- KeyBuilder.java: This interface has to be implemented by the object which handles the transformation of the value to a key and the splitting of the key in its sub-keys.