
\documentclass[twocolumn]{article}

\title{The Dynamic Binary Search Tree}
\author{}

\begin{document}

\maketitle

\section{Introduction}

Binary trees are an important data organization in which searching for
an element is performed in $O(\log{n})$ steps. Dynamic trees, where new
elements can be added and removed, may become unbalanced, and thus more
sophisticated binary tree implementations have been suggested to keep
the tree within $O(\log{n})$ limits.

Our goal is to keep the height of the tree small. A balanced
tree fulfills this goal, but a tree need not necessarily be balanced to have
its height controlled.

An algorithm for an efficient implementation of such a dynamic tree is
described, along with comparisons with existing dynamic tree
implementations.

\section{Description of the tree}

The basic idea is to use a simple binary search tree and set a limit on
its height. Let us use for the rest of this study the number 32 as the
height limit. This restricts the total number of nodes to no more than
$2^{32}$, as long as duplicate elements are not allowed.
Note that, due to the logarithmic nature of the complexity,
in real-life applications a balanced tree will never reach such a height.

Supposing this condition can be fulfilled efficiently, locating an element
in the tree is performed in no more than 32 steps, since
$32 \geq \log_2{n}$ for any realistic $n$. A lookup may actually take a few
more steps than in a perfectly balanced tree, but in practice the
difference is unimportant, and therefore searching is as fast as in any
balanced tree.

Removing an element from the tree can easily be done without a
balancing operation, as long as removing the element does not increase the
height of the tree. This condition can be implemented.
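The lookup mentioned above is the ordinary binary search tree descent. A
minimal sketch follows; it is illustrative only and not part of the sample
code of the appendix, assuming the node layout used there together with a
{\tt strcmp}-style {\tt compare()}:

```c
#include <stddef.h>
#include <string.h>

/* Node layout as in the sample code of the appendix. */
typedef struct tnode node;
struct tnode {
    char *data;
    node *less, *more;
};

/* Application-specific comparison; plain strcmp is assumed here. */
static int compare (const char *a, const char *b)
{
    return strcmp (a, b);
}

/* Locate an element by the ordinary binary search tree descent.
   With the height of the tree limited to 32, the loop body runs
   at most 32 times. */
static node *find (node *rt, const char *data)
{
    node *np = rt;
    while (np) {
        int c = compare (np->data, data);
        if (c == 0) return np;
        np = (c > 0) ? np->less : np->more;
    }
    return NULL;
}
```

No special handling of the height limit is needed during a search; the
limit only matters when inserting.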
If the node to be removed has two child nodes, the procedure is to find
the biggest among the smaller elements, or the smallest among the bigger
ones, and replace the node to be removed with this element. Sample code is
given in \ref{app:rmv}.

Adding a new element to the tree is the most important operation. While
inserting the element, the number of comparisons performed is counted. The
simple rule that the tree must be rebalanced when this number reaches 32 is
very efficient.

Balancing the tree is accomplished in $n$ steps. However, the frequency at
which the tree needs to be balanced again decreases as $\frac{1}{2^n}$.
As a result, adding a new element is in the great majority of cases done
in fewer than 32 steps, and rebalancing becomes ever more rare as the
number of nodes increases.\footnote{The worst case, where a lot of elements
arriving in sorted order have to be inserted into an empty tree, is
addressed in \ref{init}.}

\subsection{Balancing the tree}

The binary tree can be entirely reconstructed in $n$ steps using a simple
algorithm. First, an array of $n$ pointers is allocated and all the
elements of the tree are placed in this array in sorted order. Recall that
an always-balanced tree implementation requires extra memory for the node
structures, so the extra allocation of the array is still efficient;
furthermore, in modern multithreaded database systems an operation which
alters part of the data is given full control of the hardware, and the
burst in memory usage is in this case performed safely.

The next step is to reset the root of the tree and start inserting elements
in the order that results in a perfectly balanced tree. This is
done by first inserting the middle element $N/2$ of the array, then
elements $N/4$ and $3 N / 4$, then $N/8$, $3 N / 8$, $5 N / 8$,
and so on.
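For $N = 2^k - 1$ elements this insertion order can be generated with two
nested loops. The following sketch is illustrative and not part of the
sample code; it writes the indices of the sorted array in the order
described above ({\tt balanced\_order} is a hypothetical helper name):

```c
#include <stddef.h>

/* Write the indices 0..n-1 of a sorted array in the order that yields
   a perfectly balanced binary search tree when the elements are
   inserted one by one: first n/2, then n/4 and 3n/4, then n/8, 3n/8,
   5n/8, 7n/8, and so on.  Assumes n = 2^k - 1. */
static void balanced_order (long long n, long long *out)
{
    long long i, j;
    size_t pos = 0;
    for (i = 2; i <= n + 1; i *= 2)   /* one pass per tree level */
        for (j = 1; j < i; j += 2)    /* odd numerators only     */
            out [pos++] = n * j / i;
}
```

For example, for $n = 7$ the order produced is 3, 1, 5, 0, 2, 4, 6. When
$N$ is not of the form $2^k - 1$, the remaining nodes must be handled
separately, which is what the exclusion rule described next addresses.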
By this procedure $2^k - 1$ nodes can be inserted, where $k$ is the integer
for which $2^k \leq N < 2^{k + 1}$. For the remaining $N-(2^k -1)$
nodes, if any, various algorithms of $O(n)$ steps can be applied. The one
used in the sample code is to proceed normally, as if there were
$2^{k+1}$ nodes, and exclude the elements $i$ for which
$i \cdot 2^k \bmod N$ is greater than or equal to $N-(2^k -1)$, or zero;
in other words, to exclude those numbers $a$ for which there is a number
$b$ such that
\begin{equation}
\left\lfloor \frac{a N}{2^{k+1}} \right\rfloor =
\left\lfloor \frac{b N}{2^k} \right\rfloor
\: , \;\; 0 < b < 2^k
\end{equation}

The complexity of the above algorithm is linear, provided that redundant
comparisons are avoided when inserting nodes into the new tree.

A sample implementation is provided in \ref{code:balance}.

\section{Characteristics of the tree}

The main feature of this tree is that after it has been balanced, the
more nodes it contains, the more new elements will be distributed over
external nodes, and thus it becomes much more unlikely to become
``unbalanced''.

\begin{itemize}
\item
Memory requirements for this tree are minimal. The standard memory needed
is as much as for any simple binary tree, and the same as for a doubly
linked list. Memory usage bursts occur when
balancing is performed, but this is generally a rare and very fast action.
Memory used by recursion is bounded, since the tree height is limited
to 32.
\item
Locating an element takes fewer than 32 steps. Generally, after
balancing has been performed, locating an element is a matter of $\log{n}$
steps.
\item
Removing an element is also a matter of fewer than 32 steps.
However, no rotations are performed, and therefore removing an
element is possibly faster than in an always-balanced tree.
\item
Adding an element is in most cases a matter of fewer than 32 very
fast steps.
Balancing requires $n$ simple steps; it is rare and becomes even more rare
as the tree grows.

In the long term, for a growing tree, the number of nodes $n$
divided by the number of steps spent on balancing tends to be
independent of $n$.
\end{itemize}

Tests have shown that the tree needs to balance twice for
1,000,000 elements arriving in random order: once after about 20,000
elements and once again at about 400,000.

\section{Database initialization}
\label{init}

One case where the suggested tree would prove inefficient is inserting a
lot of elements in sorted order into an \emph{empty} tree.
If the tree is not empty and has been balanced before, the arriving
elements will be distributed over the external nodes.

This case can occur when initially loading the data from
storage devices. The workaround for this problem is similar to the way the
tree is balanced: the (sorted) elements have to be inserted into the tree
in the order that makes a balanced tree; first the middle element, then
$N/4$, $3N/4$, and so on.

Once a perfectly balanced tree is constructed, there should be no worry
about this worst case occurring, unless most of the elements are removed.

Generally, in cases where a long sequence of elements is to be inserted
into the tree, if the tree has few nodes it may be preferable to sort the
sequence of elements and then insert the right elements first.
This procedure is worth the tradeoff if the arriving elements are almost
sorted, in which case special sorting algorithms can be applied.

Another solution is to make the elements really random, by using a random
tree. In a \emph{random tree}, the comparison of elements is heads or
tails.
After the sequence of elements has been inserted into a random tree (which
will never be unbalanced), they can be read in order and sent to the dbs
tree.

\section{Conclusions}

The tree structure with the characteristics that have been presented is
efficient for specific database requirements.

When the database is idle, or the specific tree is not used, it consumes
the least memory compared to other balanced tree structures.
That is the primary reason for the design of this structure.

When the tree is accessed just for locating an element, it is as fast as
any balanced tree.

Theoretically, balanced tree implementations which perform rotations after
a node is inserted or removed are better. In practice, and more
specifically in a database environment, limiting the height of the tree to
a `small' number is more than adequate.
Apart from memory efficiency, the suggested tree structure may possibly
behave faster as well; the worst-case scenario will either occur at the
initialization of the database or else can easily be prevented.

\onecolumn
\section{Sample code}

A sample implementation in C of the presented tree is provided.
For the sample code, the simple definition of a node with a pointer to
data is used. The function {\tt compare()} is the application-specific
function which compares the data of two nodes.

\begin{verbatim}
#define MAX_HEIGHT 32
typedef struct tnode node;
struct tnode {
    char *data;
    node *less, *more;
};
node *root;
\end{verbatim}

\subsection{Removing a node}
\label{app:rmv}

When removing an element which has two children,
both the smallest of the bigger elements and the
biggest of the smaller elements are located. By convention, the deeper one
is selected to replace the node to be removed.
\begin{verbatim}
void remove (node *n)
{
    node *np, *nl, *nr, *nlp, *nrp;
    int i, j, isroot;

    if (!(isroot = (np = root) == n))
        for (;;)
            if (compare (np->data, n->data) > 0)
                if (np->less == n) break;
                else np = np->less;
            else
                if (np->more == n) break;
                else np = np->more;

    if (!(n->less && n->more))
    {
        if (isroot) root = (n->less) ? n->less : n->more;
        else
            if (np->less == n) np->less = (n->less) ? n->less : n->more;
            else np->more = (n->less) ? n->less : n->more;
        return;
    }
    for (i = 0, nlp = NULL, nl = n->less; nl->more; i++)
        nlp = nl, nl = nl->more;
    for (j = 0, nrp = NULL, nr = n->more; nr->less; j++)
        nrp = nr, nr = nr->less;
    if (i >= j)
    {
        if (isroot) root = nl;
        else
            if (np->less == n) np->less = nl;
            else np->more = nl;
        if (nlp)
        {
            nlp->more = nl->less;
            nl->less = n->less;
        }
        nl->more = n->more;
    } else {
        if (isroot) root = nr;
        else
            if (np->less == n) np->less = nr;
            else np->more = nr;
        if (nrp)
        {
            nrp->less = nr->more;
            nr->more = n->more;
        }
        nr->less = n->less;
    }
}
\end{verbatim}

\subsection{Balancing}
\label{code:balance}

The algorithm for reconstructing a balanced tree.
{\tt alloca} is a stack allocation mechanism.

Special care should be taken that the variables {\tt i, j, D, k} in
{\tt balance()} are 64-bit integers. If not, the algorithm will fail for
trees with more than about 65,000 nodes.
\begin{verbatim}
unsigned int suint;
node **snpp;

int insert (node *n);

void count_nodes (node *n)
{
    ++suint;
    if (n->less) count_nodes (n->less);
    if (n->more) count_nodes (n->more);
}
void tree_to_array (node *n)
{
    if (!n) return;
    tree_to_array (n->less);
    *snpp++ = n;
    tree_to_array (n->more);
}
void balance ()
{
    node **npp;
    long long int i, j, D, k, k2, k3;

    /** Generate a sorted array **/
    suint = 0;
    count_nodes (root);
    npp = snpp = (node**) alloca (sizeof (node*) * suint);
    tree_to_array (root);
    /** Insert the top 2^k - 1 nodes **/
    root = npp [suint / 2];
    for (i = 4; i <= suint + 1; i *= 2)
        for (j = 2; j < i; j += 4)
        {
            k2 = suint * j / i;
            npp [k2]->less = npp [suint * (j - 1) / i];
            npp [k2]->more = npp [suint * (j + 1) / i];
        }
    /** Test whether there are nodes left **/
    if ((k = suint + 1 - i / 2) == 0)
    {
        for (i /= 2, j = 1; j < i; j += 2)
            k2 = suint * j / i,
            npp [k2]->less = npp [k2]->more = NULL;
        return;
    }
    /** Proceed normally but for specific nodes **/
    for (j = 2; j < i; j += 4)
    {
        k3 = suint * j / i;
        D = (k2 = suint * (j - 1) / i) * (i / 2) % suint;
        if (D >= k || D == 0) /* k2 excluded */
            npp [k3]->less = NULL;
        else { /* k2 inserted */
            npp [k3]->less = npp [k2];
            npp [k2]->less = npp [k2]->more = NULL;
        }
        D = (k2 = suint * (j + 1) / i) * (i / 2) % suint;
        if (D >= k || D == 0)
            npp [k3]->more = NULL;
        else {
            npp [k3]->more = npp [k2];
            npp [k2]->less = npp [k2]->more = NULL;
        }
    }
    insert (npp [0]);
}
\end{verbatim}

\subsection{Inserting a node}

The procedure is similar to that of a simple binary search tree.
\begin{verbatim}
int insert (node *n)
{
    node *np;
    int h;

    n->less = n->more = NULL;
    if (!(np = root))
    {
        root = n;
        return 0;
    }
    for (h = 0;; h++)
        if (compare (np->data, n->data) > 0)
            if (!np->less) { np->less = n; break; }
            else np = np->less;
        else
            if (!np->more) { np->more = n; break; }
            else np = np->more;
    if (h == MAX_HEIGHT)
    {
        balance ();
        return 0;
    } else if (h > MAX_HEIGHT) {
        /* If that ever happens, please send me a mail and I will send
           you $10,000 for having a tree with 4 billion nodes; or make
           MAX_HEIGHT 34 and go to 16 billion. */
        printf ("Tree full. Reached 2^%i nodes\n", MAX_HEIGHT);
        return -1;
    }
    return h;
}
\end{verbatim}

\end{document}