Contents Menu Expand Light mode Dark mode Auto light/dark, in light mode Auto light/dark, in dark mode Skip to content
Advanced Python
Logo
Advanced Python
  • Revisit Python Basics
  • Building a Python Package
  • Set up a GitHub Repository
  • Create a Folder Structure
  • Writing a new Function
  • Function parameters
  • Variable Scope
  • Generators
  • Map-Filter-Reduce
  • Decorators
  • Collections
  • Comprehensions
  • Enumerations
  • Demographics
  • Refactoring
  • Structure of a Python script
  • Command-line Arguments
  • Modules
  • Packages
  • Namespaces
  • Classes
  • Inheritance
  • Object Composition
  • Class Diagrams
  • Operator Overloading
  • Properties, classmethods and staticmethods
  • Abstract Base Classes
  • Decorator Classes
  • Metaclasses
  • OOP Principles
  • Static Code Checks
  • Test Automation
  • Continuous Integration
  • Debugging
  • Interactive Python Debugger
  • Handling Exceptions
  • Warnings
  • Logging
  • Measuring Performance
  • Concurrency
  • The Dungeon Explorer Game
  • The Prototype
  • Code Reviews
  • Factorials
  • Sorting Algorithms
  • Dice
  • Tennis
  • Memory
  • Magic Square
  • Josephus’ Problem
  • Binary Search
  • Tree Traversal
  • Graph Traversal
  • Backpack Problem
  • Chained List
  • Traveling Salesman
  • Blockchain
  • Decorators with a Metaclass
  • Project: Implement a Game
  • Hall of Fame
    • Breakout
    • Tetris
  • Links
Back to top
View this page
Edit this page

Binary Search¶

🎯 Measure the performance of an algorithm.

A search problem¶

Suppose you have a list of 1 million numbers:

from random import randint, seed

seed(42)
N = 1_000_000
ids = [randint(1_000_000, 9_000_000) for i in range(N)]

We want to know at which position the number 8997173 occurs – or whether it occurs at all.

Brute-force solution¶

Of course, you could loop through all the numbers:

def search(query, ids):
    """brute-force search"""
    for i in ids:
        if i == query:
            return i
    return -1

search(8997173, ids)

Because of the for loop, the time cost of this function grows linearly. If you search twice as much data, it will take twice as long. The Big-O-Notation for this would be O(N).

Recursive solution¶

An alternative approach is binary search, one of the most fundamental recursive algorithms:

def binary_search(query, ids, start, stop):
    """recursive binary search"""
    if ids[start] == query:
        return start
    elif stop - start <= 1:
        return -1

    split = start + (stop - start) // 2
    if ids[split] > query:
        return binary_search(query, ids, start, split)
    else:
        return binary_search(query, ids, split, stop)

ids.sort()
binary_search(8997173, ids, 0, len(ids))

The function requires the data to be sorted. The search itself runs in O(log N) or logarithmic time. If you search twice as much data, only one extra recursion is necessary.

Visual Explanation¶

BINÄRY SEARCH

source: idea-instructions.com, CC-BY-NC-SA 4.0

Challenge¶

Measure the time it takes both algorithms to run on 1 million numbers. In IPython/Jupyter, you can use the magic function %time:

In [2]: %time search(8997173, ids)

Compare the time required by the brute-force and binary implementations. Then increase the size of the data by factor 10 and repeat the measurement.

Questions¶

  • Which of the searches is faster?

  • How much time does the sorting take?

  • When does binary search pay off?

  • For what data would the brute-force algorithm be faster?

Next
Tree Traversal
Previous
Josephus’ Problem
Copyright © 2024, Kristian Rother
Made with Sphinx and @pradyunsg's Furo
On this page
  • Binary Search
    • A search problem
    • Brute-force solution
    • Recursive solution
    • Visual Explanation
    • Challenge
      • Questions