Binary Search Trees: What are they and how they work
Jul 28, 2024
9 min read
Binary Search Trees (BSTs) are a fundamental data structure in computer science. They provide efficient methods for storing, searching, and managing data. In this article, we'll explore what BSTs are, how they work, and how to implement them.
What is a Binary Search Tree?
A Binary Search Tree is a binary tree where each node has at most two children, referred to as the left child and the right child. For each node:
- The left subtree contains only nodes with values less than the node's value.
- The right subtree contains only nodes with values greater than the node's value.
This property makes BSTs useful for operations like searching, insertion, and deletion.
Okay, now that the basics are covered, let's see how to code and work with a BST! I'll use Python for this example, just because it's very popular and looks like pseudo-code, but the concepts are the same for other programming languages.
Code definition
To create a BST data structure, we just need to model the nodes of the graph. Each node will have a value, and pointers to the left and right children, like so:
class Node:
def __init__(self, value):
self.value = value
self.left = None
self.right = None
So to create a tree, we just create the root node with a value:
tree = Node(10) # 10 is the root node
And that's it! Now, let's see how to implement the basic operations of a BST. We don't like anemic models, so we'll implement the insertion, search, and deletion operations as methods of the Node
class. Let's go!
Basic Operations
As I like to learn by example, let's take a look at how to implement a BST in Python.
Insertion
To insert a value into a BST, we start at the root and compare the value to be inserted with the current node's value. If the value is less, we move to the left child; if greater, we move to the right child. We repeat this process until we find an empty spot where the new value can be inserted.
def insert(self, root, key):
# If the root is None, create a new node with the key and return it
if root is None:
return Node(key)
else:
# If the key is already present in the tree, return the root
if root.value == key:
return root
# If the key is greater than the root's value, insert it in the right subtree
if root.value < key:
root.right = self.insert(root.right, key)
# If the key is less than the root's value, insert it in the left subtree
else:
root.left = self.insert(root.left, key)
# Return the root after insertion
return root
Let's see this in action!
tree = Node(10)
tree.insert(tree, 5)
tree.insert(tree, 15)
tree.insert(tree, 3)
tree.insert(tree, 7)
tree.insert(tree, 12)
tree.insert(tree, 18)
Search
To search for a value in a BST, we start at the root and compare the value with the current node's value. If the value matches, we return the node. If the value is less, we search the left subtree; if greater, we search the right subtree. We repeat that algorithm until we find the node or the node or we end in a leaf node.
def search(self, root, key):
# Base case: return the node if it's found or None if it's not
if root is None or root.value == key:
return root
# If the key is greater than the root's value, search in the right subtree
if root.value < key:
return self.search(root.right, key)
# If the key is less than the root's value, search in the left subtree
return self.search(root.left, key)
This will return the node we're looking for, or None
if the node is not found.
node = tree.search(tree, 12)
print(node and node.value) # 12
Deletion
Deleting a node from a BST involves three main cases:
- The node to be deleted is a leaf node.
- The node to be deleted has one child.
- The node to be deleted has two children.
If the node to be deleted is a leaf node, we just need to remove the node from the tree. However, if the node has one or two children, we need to handle the deletion differently. For a node with one child, we replace the node with its child. For a node with two children, we find the inorder successor (the smallest node in the right subtree), replace the node's value with the inorder successor's value, and then delete the inorder successor.
The inorder successor is the smallest node in the right subtree of the node to be deleted (i.e., the leftmost node in the right subtree, or the following node if we order them from lowest to highest).
def delete(self, root, key):
# Base case: if the root is None, return None
if root is None:
return root
# If the key to be deleted is smaller than the root's key,
# then it lies in the left subtree
if key < root.value:
root.left = self.delete(root.left, key)
# If the key to be deleted is greater than the root's key,
# then it lies in the right subtree
elif key > root.value:
root.right = self.delete(root.right, key)
# If the key is the same as the root's key, then this is the node
# to be deleted
else:
# Node with only one child or no child
if root.left is None:
return root.right
elif root.right is None:
return root.left
# Node with two children: Get the inorder successor
# (smallest in the right subtree)
temp = self.minValueNode(root.right)
# Copy the inorder successor's content to this node
root.value = temp.value
# Delete the inorder successor
root.right = self.delete(root.right, temp.value)
return root
def minValueNode(self, node):
# Loop to find the leftmost leaf
current = node
while current.left is not None:
current = current.left
return current
Let's see this in action, this time, with a bigger tree. This is how our BST initially looks like:
Now, let's delete the node with value 5
.
tree = tree.delete(tree, 5)
The resulting tree is a follows:
Caveats
The efficiency of a BST depends on its balance. If the tree is unbalanced, the search, insertion, and deletion operations can take O(n) time, where n is the number of nodes in the tree. Deletions can affect the balance of the tree, but also insert orders. THe resulting tree will differ depending on the order of insertions. For example:
tree = Node(10)
tree.insert(tree, 5)
tree.insert(tree, 15)
tree.insert(tree, 3)
tree.insert(tree, 7)
tree.insert(tree, 12)
tree.insert(tree, 18)
tree.insert(tree, 6)
tree.insert(tree, 8)
tree.insert(tree, 2)
tree.insert(tree, 4)
will produce a tree like this:
But if we change the order of insertion like so:
tree = Node(10)
tree.insert(tree, 18)
tree.insert(tree, 15)
tree.insert(tree, 12)
tree.insert(tree, 8)
tree.insert(tree, 7)
tree.insert(tree, 6)
tree.insert(tree, 5)
tree.insert(tree, 4)
tree.insert(tree, 3)
tree.insert(tree, 2)
We end up with a quite different tree:
Sometimes, it might be needed to balance the tree when there is a lot of unbalanced nodes, like in the last example. A simple way to improve the balance of a tree is the following:
def balanceBST(self, root):
# Helper function to store the in-order traversal of the BST
def storeInOrder(node, inorder_nodes):
if not node:
return
storeInOrder(node.left, inorder_nodes)
inorder_nodes.append(node.value)
storeInOrder(node.right, inorder_nodes)
# Helper function to build a balanced BST from sorted nodes
def buildBalancedBST(nodes, start, end):
if start > end:
return None
mid = (start + end) // 2
node = Node(nodes[mid])
node.left = buildBalancedBST(nodes, start, mid - 1)
node.right = buildBalancedBST(nodes, mid + 1, end)
return node
# Store the in-order traversal of the BST
inorder_nodes = []
storeInOrder(root, inorder_nodes)
# Build and return the balanced BST
return buildBalancedBST(inorder_nodes, 0, len(inorder_nodes) - 1)
Conclusion
Binary Search Trees are a powerful tool for managing ordered data. They provide efficient algorithms for insertion, search, and deletion, making them a fundamental component in computer science. Understanding BSTs and their operations is crucial for any programmer looking to master data structures and algorithms.
All the code from this article can be found on Github.