Red–black tree

Red–black tree
Red–black tree
Type	Tree
Invented	1978
Invented by	Leonidas J. Guibas and Robert Sedgewick
Complexities in big O notation
Space
Space complexity
Space
Time complexity
Function
Search
Insert
Delete

In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.^[1]

When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently.

The (re-)balancing is not perfect, but guarantees searching in $O(\log n)$ time, where $n$ is the number of entries in the tree. The insert and delete operations, along with tree rearrangement and recoloring, also execute in $O(\log n)$ time.^[2]^[3]

Tracking the color of each node requires only one bit of information per node because there are only two colors. The tree does not contain any other data specific to it being a red–black tree, so its memory footprint is almost identical to that of a classic (uncolored) binary search tree. In some cases, the added bit of information can be stored at no added memory cost.

History

In 1972, Rudolf Bayer^[4] invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2–3–4 trees or even 2–3 trees.^[5]

In a 1978 paper, "A Dichromatic Framework for Balanced Trees",^[6] Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree.^[7] The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC.^[8] Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.^[9]

In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete operations.^[10]

In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only 4 unbalanced cases and one default balanced case.^[11]

The original algorithm used 8 unbalanced cases, but Cormen et al. (2001) reduced that to 6 unbalanced cases.^[1] Sedgewick showed that the insert operation can be implemented in just 46 lines of Java code.^[12]^[13] In 2008, Sedgewick proposed the left-leaning red–black tree, leveraging Andersson’s idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later this restriction was added, making new trees more like 2–3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.^[14]^[15]

Terminology

Example of a red–black tree

Figure 1: ... with explicit NIL leaves

Figure 2: ... with implicit left and right docking points

A red–black tree is a special type of binary search tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers (as e.g. the numbers in figures 1 and 2). The nodes carrying keys and/or data are frequently called "internal nodes", but to make this very specific they are also called non-NIL nodes in this article.

The leaf nodes of red–black trees ( NIL in figure 1) do not contain keys or data. These "leaves" need not be explicit individuals in computer memory: a NULL pointer can—as in all binary tree data structures— encode the fact that there is no child node at this position in the (parent) node. Nevertheless, by their position in the tree, these objects are in relation to other nodes that is relevant to the RB-structure, it may have parent, sibling (i.e., the other child of the parent), uncle, even nephew node; and may be child—but never parent—of another node. It is not really necessary to attribute a "color" to these end-of-path objects, because the condition "is NIL or BLACK" is implied by the condition "is NIL" (see also this remark).

Figure 2 shows the conceptually same red–black tree without these NIL leaves. To arrive at the same notion of a path, one must notice that e.g., 3 paths run through the node 1, namely a path through 1_left plus 2 added paths through 1_right, namely the paths through 6_left and 6_right. This way, these ends of the paths are also docking points for new nodes to be inserted, fully equivalent to the NIL leaves of figure 1.

Instead, to save a marginal amount of execution time, these (possibly many) NIL leaves may be implemented as pointers to one unique (and black) sentinel node (instead of pointers of value NULL).

As a conclusion, the fact that a child does not exist (is not a true node, does not contain data) can in all occurrences be specified by the very same NULL pointer or as the very same pointer to a sentinel node. Throughout this article, either choice is called NIL node and has the constant value NIL.

The black depth of a node is defined as the number of black nodes from the root to that node (i.e. the number of black ancestors). The black height of a red–black tree is the number of black nodes in any path from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the black depth of any leaf node).^[16]^{: 154–165} The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a NIL node shall be set to 0, because its subtree is empty as suggested by figure 2, and its tree height is also 0.

Properties

In addition to the requirements imposed on a binary search tree the following must be satisfied by a red–black tree:^[17]

Every node is either red or black.
All NIL nodes (figure 1) are considered black.
A red node does not have a red child.
Every path from a given node to any of its descendant NIL nodes goes through the same number of black nodes.
(Conclusion) If a node N has exactly one child, it must be a red child, because if it were black, its NIL descendants would sit at a different black depth than N's NIL child, violating requirement 4.

Some authors, e.g. Cormen & al.,^[17] claim "the root is black" as fifth requirement; but not Mehlhorn & Sanders^[16] or Sedgewick & Wayne.^[15]^{: 432–447} Since the root can always be changed from red to black, this rule has little effect on analysis. This article also omits it, because it slightly disturbs the recursive algorithms and proofs.

As an example, every perfect binary tree that consists only of black nodes is a red–black tree.

The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast, the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.

The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height $h$ of the tree, this upper bound on the height allows red–black trees to be efficient in the worst case, namely logarithmic in the number $n$ of entries, i.e. $h\in O(\log n)$ , which is not the case for ordinary binary search trees. For a mathematical proof see section Proof of bounds.

Red–black trees, like all binary search trees, allow quite efficient sequential access (e.g. in-order traversal, that is: in the order Left–Root–Right) of their elements. But they support also asymptotically optimal direct access via a traversal from root to leaf, resulting in $O(\log n)$ search time.

Analogy to 2–3–4 trees

Red–black trees are similar in structure to 2–3–4 trees, which are B-trees of order 4.^[18] In 2–3–4 trees, each node can contain between 1 and 3 values and have between 2 and 4 children. These 2–3–4 nodes correspond to black node – red children groups in red-black trees, as shown in figure 3. It is not a 1-to-1 correspondence, because 3-nodes have two equivalent representations: the red child may lie either to the left or right. The left-leaning red-black tree variant makes this relationship exactly 1-to-1, by only allowing the left child representation. Since every 2–3–4 node has a corresponding black node, invariant 4 of red-black trees is equivalent to saying that the leaves of a 2–3–4 tree all lie at the same level.

Despite structural similarities, operations on red–black trees are more economical than B-trees. B-trees require management of vectors of variable length, whereas red-black trees are simply binary trees.^[19]

Applications and related data structures

Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry are based on red–black trees, and the Completely Fair Scheduler and epoll system call of the Linux kernel use red–black trees.^[20]^[21] The AVL tree is another structure supporting $O(\log n)$ search, insertion, and removal. AVL trees can be colored red–black, and thus are a subset of red-black trees. The worst-case height of AVL is 0.720 times the worst-case height of red-black trees, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL to RB ratios between 0.677 and 1.077, median at 0.947, and geometric mean 0.910.^[22] The performance of WAVL trees lie in between AVL trees and red-black trees.^{[citation needed]}

Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of red–black trees requires $O(\log n)$ space for each insertion or deletion, in addition to time.

For every 2–3–4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2–3–4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2–3–4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2–3–4 trees just before red–black trees, even though 2–3–4 trees are not often used in practice.

In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree^[23] by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–black trees can be made isometric to either 2–3 trees,^[24] or 2–3–4 trees,^[23] for any sequence of operations. The 2–3–4 tree isometry was described in 1978 by Sedgewick.^[6] With 2–3–4 trees, the isometry is resolved by a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and moves to the parent node.

The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–black trees as part of its data structure.^[25]

As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different elements with colliding hashcodes, a red–black tree is used. This results in the improvement of time complexity of searching such an element from $O(m)$ to $O(\log m)$ where $m$ is the number of elements with colliding hashcodes.^[26]

Operations

The read-only operations, such as search or tree traversal, on a red–black tree require no modification from those used for binary search trees, because every red–black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red–black tree, the restoration of which is called rebalancing so that red–black trees become self-balancing. It requires in the worst case a small number, $O(\log n)$ in Big O notation, where $n$ is the number of objects in the tree, on average or amortized $O(1)$ , a constant number,^[27]^: 310 ^[16]^: 158 of color changes (which are very quick in practice); and no more than three tree rotations^[28] (two for insertion).

If the example implementation below is not suitable, other implementations with explanations may be found in Ben Pfaff’s^[29] annotated C library GNU libavl (v2.0.3 as of June 2019).

The details of the insert and removal operations will be demonstrated with example C++ code, which uses the type definitions, macros below, and the helper function for rotations:

// Basic type definitions:

enum color_t { BLACK, RED };

struct RBnode {     // node of red–black tree
  RBnode* parent;   // == NIL if root of the tree
  RBnode* child; // == NIL if child is empty
    // The index is:
    //   LEFT  := 0, if (key < parent->key)
    //   RIGHT := 1, if (key > parent->key)
  enum color_t color;
  int key;
};

#define NIL   NULL // null pointer  or  pointer to sentinel node
#define LEFT  0
#define RIGHT 1
#define left  child
#define right child

struct RBtree { // red–black tree
  RBnode* root; // == NIL if tree is empty
};

// Get the child direction (∈ { LEFT, RIGHT })
//   of the non-root non-NIL  RBnode* N:
#define childDir(N) ( N == (N->parent)->right ? RIGHT : LEFT )
Zdroj:https://en.wikipedia.org?pojem=Red–black_tree
Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok. Podrobnejšie informácie nájdete na stránke Podmienky použitia.

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]