Skip to main content

Command Palette

Search for a command to run...

Day 1: Big-O Analysis and Essential Python Collections

Updated
7 min read
Day 1: Big-O Analysis and Essential Python Collections
S

Developer based in India. Passionate learner and blogger. All blogs are basically Notes of Tech Learning Journey.

Welcome to this blog collection where we cover the DSA with Python. This is a scheduled 45 day plan, keeping in mind that we might lose the hands-on skill of programming as we may tend to over-rely on AI for our code related tasks.

As an AI Engineer myself, I decided to restart my daily learnings around the topic of DSA keeping python in the center stage. So lets begin.

Before solving any DSA problem, it is essential to have a basic understanding of two things:

  1. How efficient your code is (Big-O Analysis)

  2. Which Python data structure to use (Collections)

Most interview failures occur not because the candidate cannot solve the problem, but because they choose the wrong data structure.

For example

# Slow
if x in my_list:

# Fast
if x in my_set:

Both produce the same result, but one may take thousands of times longer on large datasets.


Why should we even care?

In AI systems, you constantly work with:

  • Vocabulary lookups

  • Feature mappings

  • Cache systems

  • Retrieval pipelines

  • Token IDs

  • Graph relationships

These all rely heavily on:

  • Hash tables (dict)

  • Sets (set)

  • Queues (deque)

  • Priority queues (heapq)

Understanding them will improve both interview performance and real-world engineering.


What is Big O?

Big-O measures how an algorithm grows as input size increases.

Suppose if you declare a variable as n = 10, and some time later, you declare as n = 100000. In this case, will your program become:

  • 10x slower?

  • 100x slower?

  • 1,000,000x slower?

Big-O helps answer that.

Complexity Name Good?
O(1) Constant Excellent
O(log n) Logarithmic Excellent
O(n) Linear Good
O(n log n) Linearithmic Good
O(n²) Quadratic Usually Bad
O(2ⁿ) Exponential Very Bad

Lets see a few examples of various complexities.

Example 1: O(1)

arr = [10, 20, 30]

print(arr[1])

Python immediately knows where index 1 is. No searching needed. So as a result. the time complexity stands O(1).

Example 2: O(n)

arr = [3,5,2,9,1]

for x in arr:
    print(x)

In this case, it is required to visit each and every element in the array for printing purpose. So it depends on number of elements in the array. So here, time complexity is O(n).

Example 3: O(n²)

for i in range(n):
    for j in range(n):
        pass

For every element, we visit every other element. It is a very time consuming process though.


Essential Python Collections

Python interviews heavily rely on four structures:

dict
set
deque
heapq

We will see all these Python collections one by one.

Dictionary : dict

It usually stores key value pairs.

student = {
    "name": "Alice",
    "age": 22
}

# To access
student["name"]

# Output 
# Alice

# Time Complexity : O(1)

But what if dictionaries aren't even there?

students = [
    ("Alice", 22),
    ("Bob", 25)
]

# You will need to search the student manually
# Time complexity : O(n)

# Using Dictionary
students["Alice"]

# Then time complexity becomes : O(1)

Example:

student_scores = {
    "Alice": 95,
    "Bob": 88,
    "Charlie": 91
}

print(student_scores["Bob"])

# Output: 88

Set

Set stores unique values.

numbers = {1,2,3}

Suppose, 1,000,000 numbers exist and we need to know about 999999.

We will consider 2 options here.

# Using list
999999 in numbers_list   # O(n) 

# Using set
999999 in numbers_set    # O(1)

It drastically reduces the time complexity.

visited = {1, 2, 3}

print(2 in visited)   # True
print(10 in visited)  # False

Deque

It basically means, double ended queue, in which we can insert / remove from both ends efficiently.

Before using deque, you will need to import it.

from collections import deque

How is it different then?

Lets say we have a list and we have to pop an element, lets say arr.pop(0), this actually shifts every thing after popping. This is not the case with deque.

from collections import deque

queue = deque()

queue.append(10)  # this pops out 
queue.append(20)
queue.append(30)

print(queue.popleft())  # 10

Heap / Priority Queue

Heap automatically keeps smallest element on top. These are typically useful for:

  • Top K problems

  • Scheduling

  • AI search algorithms

  • Beam Search

Before running, you might need to run this:

import heapq

An example:

import heapq

nums = [7, 1, 5, 3]

heapq.heapify(nums)

print(heapq.heappop(nums))

# Output : 1

This is just a refresher, now lets jump to the real problems. For the first day, let us start with a simple problem.

Question : Given a list of integers, determine whether any number appears more than once.

Example : [1 , 2 , 3 , 4 , 5 , 6 , 7 ] => False

This means we need to deal with unique numbers here, so set seems to be the good option.

  • If number already seen → duplicate found.

  • Else add it to set.

# Function checks whether duplicates exist
def contains_duplicate(nums):

    # Create an empty set
    # This will store numbers we have already seen
    seen = set()

    # Visit every number in the list one by one
    for num in nums:

        # If num already exists in the set,
        # we have found a duplicate
        if num in seen:
            return True

        # Otherwise remember this number
        # for future comparisons
        seen.add(num)

    # If loop finishes,
    # no duplicate was found
    return False


# Test Case 1
numbers1 = [1, 2, 3, 4]

# Test Case 2
numbers2 = [1, 2, 3, 1]

# Print results
print("Test Case 1:", contains_duplicate(numbers1))
print("Test Case 2:", contains_duplicate(numbers2))

# Output
# Test Case 1: False
# Test Case 2: True

A simple function which utilizes sets to determine if the number is repeated or not. Loop runs "n" times, while for each set lookup, it takes O(1). So the total time complexity is O(n).

Space Complexity is also O(n); as set may hold all items.


Few questions to be ready for:

Q1) Why sets in place of lists?

Set lookup : O(1), list lookup : O(n)

Q2) Difference between dict and set?

dict contains key value pairs, set contains only value.

Q3) Why is dictionary lookup O(1)?

Because Python uses a hash table internally.

Q4) When would you use deque?

When frequent insertions/removals occur at the beginning of a sequence.

Q5) When would you use heap?

When needed these kinds of elements : maximum, minimum, top-k, priority


Cheatsheet of the day

Big-O

O(1)      Best
O(log n)  Great
O(n)      Good
O(n log n) Good
O(n²)     Expensive

Dictionary

d = {}

d["a"] = 1

d["a"]

"a" in d

Average:

O(1)

Set

s = set()

s.add(10)

10 in s

Average:

O(1)

Deque

from collections import deque

q = deque()

q.append(x)

q.popleft()

Both:

O(1)

Heap

import heapq

heapq.heappush(heap, x)

heapq.heappop(heap)

Complexity:

O(log n)

Conclusion:

Today you learned:

  • What Big-O means

  • Why efficiency matters

  • Dictionaries (dict)

  • Sets (set)

  • Deques (deque)

  • Heaps (heapq)

  • How to solve a duplicate-detection problem in O(n) using a set

These four Python structures will appear throughout the next 44 days. Mastering them now will make every later topic easier.

Homework

  1. Solve:

    • Contains Duplicate

    • Valid Anagram

    • Two Sum (attempt, don't worry if you can't finish)

  2. Memorize:

    • dict

    • set

    • deque

    • heapq

  3. Be able to answer:

    • Why is set lookup O(1)?

    • When should I use a heap?

    • Why is deque better than list.pop(0)?

Tomorrow (Day 2) we'll begin Arrays & Hash Maps, including the famous Two Sum problem and the hash-map pattern that powers a large fraction of coding interviews.