If you could use only one data structure, which one would you choose? A hash table? While it supports the basic insert, find, and remove operations, it doesn’t keep the elements in sorted order. Therefore, it can’t efficiently perform some tasks that are frequently encountered, such as finding the minimum element or producing an ordered list of all elements. The standard Java packages don’t contain a data structure for ordered data.

What would you require of this ideal, sole structure? It should be easy to use (and preferably easy to implement); it should be able to hold an object of any class (as long as you provide a method for comparing objects); it should be efficient in terms of both speed and memory; and it should be thread safe.

The randomized search tree (“treap”), devised by C.R. Aragon and R. Seidel and
described in “Randomized Search Trees” (*Algorithmica*,
16(4/5):464-497, 1996), fulfills all of these requirements,
and it offers the functionality of a general-purpose sorting routine,
priority queue, hash table, stack, or standard queue.

The data structure consists of a balanced tree.
Therefore, an element is never more than about
log_{2} *n* steps away,
where *n* is the number of elements in the treap and
log_{2} *n* is the binary logarithm
(you may think of log_{2} *n*
as the number of bits needed to represent the number *n*
in binary notation).

Being able to access the elements in the treap in log *n* time
is a big improvement over some of the most common basic data structures:

InsertFindMinUnsorted array 1 n n Sorted array n log n 1 Unsorted linked list 1 n n Sorted linked list n n 1 Hash table 1 1 n Treap log n log n log n

The treap is reasonably space efficient as well.
Each element is stored in a node in a tree,
and each node contains an integer and two references,
one to the left subtree and one to the right subtree.
Hence, to store *n* elements in a treap, we need *n* integers
and 2*n* references. The standard hash-table implementation in
`java.util`

uses *n* integers for the hash values and
*n* references for the linked lists.
Furthermore, it uses an array with a size that is typically close to *n*.
For example, with a load factor of 1.0, the hash table will use the same amount
of extra space as a treap if the table is full and more space
if it’s only partly filled. One problem with the `java.util`

hash-table implementation is that the array doesn’t shrink.
Even if you remove all but one of the elements,
the size of the array is still proportional to the maximum
number of elements that were stored in the table during its lifetime.
The treap doesn’t have this problem.

Java doesn’t have a general method for comparing two elements.
Therefore, you must make sure that an object that will act
as a key in a treap implements a comparison method.
To assure this, the treap package contains an interface,
`Ordered`

, that declares the abstract method
`compareTo()`

.

/* An interface for ordered objects. A class that is to be used * as a key in a treap must implement this interface. */ public interface Ordered { /* Compares two ordered objects. Returns the value 0 if this object equals * the argument; a value less that 0 if this object is less than argument; * and a value larger than 0 if this object is greater than the argument. */ int compareTo(Ordered anotherOrderedObject); }

Any class that is to be used as a key in the treap must implement this interface.

To illustrate, I’ll write a simple class, `Year`

,
that implements the interface `Ordered`

.
You are free to design the class as you see fit.
The only requirement is that you implement the `compareTo()`

.
This is easy in this case: You just compare the integers representing the years.
You need to use an explicit cast to tell the compiler
that the argument is an instance of the class `Year`

,
because the comparison method implements the abstract method
`compareTo()`

, whose argument is of type `Ordered`

.

import order.*; import java.util.Enumeration; class Year implements Ordered { int year; Year(int y) { year = y; } public int compareTo(Ordered y) { if (year > ((Year) y).year) return 1; else if (year < ((Year) y).year) return -1; else return 0; } public String toString() { return Integer.toString(year); } } class TreapDemo { public static void main(String[] args) { Treap treap = new Treap(); treap.put(new Year(1935), "Elvis Presley"); treap.put(new Year(1926), "Chuck Berry"); treap.put(new Year(1941), "Bob Dylan"); treap.put(new Year(1936), "Roy Orbison"); treap.put(new Year(1915), "Muddy Waters"); Enumeration k = treap.keys(); Enumeration e = treap.elements(); while (k.hasMoreElements()) { System.out.print(k.nextElement() + ": "); System.out.println(e.nextElement()); } } }

With this out of the way, you can use the treap
in the same way as a Java hash table.
In this example, you use objects of the class `Year`

as keys,
and strings as values. The call to the method `keys()`

returns an enumeration of the treap. The enumeration is the standard
Java enumeration class, whose only purpose is to return
the keys of the treap in sorted order.
Similarly, the method `elements()`

returns an enumeration of the values.

If you want to add an element to the middle of an array, you have to move a large number of other elements to make room for the new one. The most common way to avoid this problem is to use a linked list, where the elements don’t reside at consecutive memory positions but are instead linked together: Each element in the list has a reference to the next one. In a linked list, it’s possible to insert an element in between two other elements just by changing two references. However, in a linked list, you can’t directly access an element in the middle of the list, you have to follow a large number of links to find it. One standard technique to overcome this problem with arrays and linked lists is to use a search tree.

A search tree consists of a number of nodes. Each node contains a key and has two references, one to the left subtree and one to the right:

E / \ B H / / \ A F K

Note that the tree is ordered. All keys in the left subtree are smaller than the key in the node, and all keys in the right subtree are larger. This property holds for every node in the tree. Therefore, it’s possible to search the tree in a simple manner. For example, to check if the key F is present in the tree, you start by comparing F to the key in topmost node (the root) of the tree. F is larger than E and you know that F must be in the right subtree (if it’s in the tree at all).

Next compare F to H, the root of the right subtree. It’s smaller, and you turn to the left, where you find the key you are looking for. You can use the same technique to insert a new node. If the key of the new node isn’t already present in the tree, the search will take you to a position at the bottom of the tree where the new node can be inserted.

But you still have a potential problem. The algorithm above is efficient only if the tree is well balanced. For example, if each node in the tree has only one child, the structure will behave just like a linked list: You have to visit every node in the tree to find the one at the bottom. There are plenty of algorithms available that try to keep a search tree balanced. Most of them are rather complicated. The treap offers a simple solution.

You will use rotations (the simplest possible rebalancing operation) to help keep the tree balanced. The idea is to change the form of the tree without disturbing the order of the keys. The left tree in this figure is transformed into the right one by a “right rotation.”

B A / \ / \ A <-> B / \ / \

This operation is easy to implement – only two references need to be updated, and the order of the keys is preserved. The corresponding balancing operation that transforms the right tree into the left one is called a “left rotation.”

Now comes the stroke of genius.
You want the tree to look as if it was constructed from a random sequence of updates.
(As R. Sedgewick and P. Flajolet describe in *An Introduction to the Analysis of Algorithms*,
Addison-Wesley, 1996, random trees are nicely balanced.
The expected length of an average search path is roughly
1.4 log_{2} *n* - 2,
and the expected length of the longest search path is about
4.3 log_{2}* n*.)
To achieve this, you assign a random priority number to each node
before you insert it into the tree,
and you make sure that each node in the tree has higher priority than its children.
If you do this, the tree will have the same form as if the nodes
had been added in priority order (that is random order).

The insertion algorithm is best explained by an example.

(a) E3 (b) E3 / \ / \ B5 H7 B5 H7 / / \ / / \ A6 F9 K8 A6 G2 K8 \ / G2 F9 (c) E3 (d) G2 / \ / \ B5 G2 E3 H7 / / \ / \ \ A6 F9 H7 B5 F9 K8 \ / K8 A6

In Figure A the node G (with priority 2) has just been added to the treap. It has been placed in the correct position with respect to the ordering of the keys, but it violates the priority requirement: It has higher priority than its parent. To fix this you perform a left rotation, Figure B, to move it higher up in the tree. G still has higher priority than its parent and we perform a right rotation; see Figure C. Finally, G is rotated up to the root of the tree; Figure D. Now the tree fulfills both the key order and the priority order requirements.

At first this algorithm may seem hard to implement. It looks as if you would have to walk up the tree, which is hard since the references point in the other direction. However, this problem is easily overcome if you formulate the insertion algorithm recursively

iftree is emptyreturnnodeifnode.key < tree.key Insert node into tree.leftiftree.prio > tree.left.prio rotate tree to the rightelseInsert node into tree.rightiftree.prio > tree.right.prio rotate tree to the leftreturntree

Compared to those of other balanced search tree schemes, the delete operation is also simple.

- If the node to be deleted is at the very bottom of the tree, you remove it directly and no rebalancing is necessary.
- If the node has only one nonempty subtree, you remove the node and replace it with the root of the subtree.
- If the node has two nonempty subtrees, you look at the priority numbers of the roots of the subtrees. If the left subtree has the highest priority, you rotate the root of this subtree to the left; otherwise, rotate the root of the other subtree to the right. This will move the node to be deleted further down the tree and you have reduced the original problem to a smaller one that may be solved by calling the delete algorithm recursively.

The only piece of code in the Java treap implementation that is
somewhat complicated is the enumeration – the object that returns
the nodes in the tree in sorted order.
A simple solution would be to have the enumeration store the latest key
that was returned and find the next one by searching the tree starting at the root.
To traverse the complete tree in this fashion would require at least
*n* log *n* time.
But it’s possible to do it in linear time.
The idea is to use an explicit stack to keep track not only of the current key
but also the path from the root of the tree down to this node.
To find the next node, you continue down the tree to the right
(pushing the nodes onto the stack).
If there are no elements in the right subtree,
you instead backtrack on the path using the stack.
The code is complicated because you need to check
whether you arrived at the current node from the left or from the right.

To turn the treap into a really useful general-purpose data structure, you should add some more functionality, such as methods for reporting the size of the treap, removing the minimum (or maximum) element, and creating a clone or a string representation of the treap. You should also wrap up the classes and interfaces in a package and add Java style documentation. This is all straightforward pedestrian programming. The complete package is available electronically.

There are many ways to further optimize the code. Most of them will yield only minor improvements. For example, eliminating the recursion from the insert and delete methods will make the code harder to maintain and the gain is likely to be only marginal, since simple function calls are efficiently implemented on most modern architectures.

The one optimization that is most likely to significantly improve
the speed is to minimize the cost of the comparisons.
A drastic way to do this is to use integers, rather than objects, as keys.
The modification of the code is slight.
All variables of type `Ordered`

should be changed
to `int`

, and all calls to `compareTo()`

should be replaced by an integer comparison.
This modification will have to be done manually,
since Java doesn’t (yet) offer a way to write code that operates
on both basic data types and objects.

The failure to use an appropriate data structure to maintain an ordered dynamic set of elements is one major reason why so many programs are slow. This problem can be solved by using the right data structure (and the treap is a very good choice).