Day 1: Big-O Analysis and Essential Python Collections

Welcome to this blog collection where we cover the DSA with Python. This is a scheduled 45 day plan, keeping in mind that we might lose the hands-on skill of programming as we may tend to over-rely on AI for our code related tasks.

As an AI Engineer myself, I decided to restart my daily learnings around the topic of DSA keeping python in the center stage. So lets begin.

Before solving any DSA problem, it is essential to have a basic understanding of two things:

How efficient your code is (Big-O Analysis)
Which Python data structure to use (Collections)

Most interview failures occur not because the candidate cannot solve the problem, but because they choose the wrong data structure.

For example

# Slow
if x in my_list:

# Fast
if x in my_set:

Both produce the same result, but one may take thousands of times longer on large datasets.

Why should we even care?

In AI systems, you constantly work with:

Vocabulary lookups
Feature mappings
Cache systems
Retrieval pipelines
Token IDs
Graph relationships

These all rely heavily on:

Hash tables (dict)
Sets (set)
Queues (deque)
Priority queues (heapq)

Understanding them will improve both interview performance and real-world engineering.

What is Big O?

Big-O measures how an algorithm grows as input size increases.

Suppose if you declare a variable as n = 10, and some time later, you declare as n = 100000. In this case, will your program become:

10x slower?
100x slower?
1,000,000x slower?

Big-O helps answer that.

Complexity	Name	Good?
O(1)	Constant	Excellent
O(log n)	Logarithmic	Excellent
O(n)	Linear	Good
O(n log n)	Linearithmic	Good
O(n²)	Quadratic	Usually Bad
O(2ⁿ)	Exponential	Very Bad

Lets see a few examples of various complexities.

Example 1: O(1)

arr = [10, 20, 30]

print(arr[1])

Python immediately knows where index 1 is. No searching needed. So as a result. the time complexity stands O(1).

Example 2: O(n)

arr = [3,5,2,9,1]

for x in arr:
    print(x)

In this case, it is required to visit each and every element in the array for printing purpose. So it depends on number of elements in the array. So here, time complexity is O(n).

Example 3: O(n²)

for i in range(n):
    for j in range(n):
        pass

For every element, we visit every other element. It is a very time consuming process though.

Essential Python Collections

Python interviews heavily rely on four structures:

dict
set
deque
heapq

We will see all these Python collections one by one.

Dictionary : dict

It usually stores key value pairs.

student = {
    "name": "Alice",
    "age": 22
}

# To access
student["name"]

# Output 
# Alice

# Time Complexity : O(1)

But what if dictionaries aren't even there?

students = [
    ("Alice", 22),
    ("Bob", 25)
]

# You will need to search the student manually
# Time complexity : O(n)

# Using Dictionary
students["Alice"]

# Then time complexity becomes : O(1)

Example:

student_scores = {
    "Alice": 95,
    "Bob": 88,
    "Charlie": 91
}

print(student_scores["Bob"])

# Output: 88

Set

Set stores unique values.

numbers = {1,2,3}

Suppose, 1,000,000 numbers exist and we need to know about 999999.

We will consider 2 options here.

# Using list
999999 in numbers_list   # O(n) 

# Using set
999999 in numbers_set    # O(1)

It drastically reduces the time complexity.

visited = {1, 2, 3}

print(2 in visited)   # True
print(10 in visited)  # False

Deque

It basically means, double ended queue, in which we can insert / remove from both ends efficiently.

Before using deque, you will need to import it.

from collections import deque

How is it different then?

Lets say we have a list and we have to pop an element, lets say arr.pop(0), this actually shifts every thing after popping. This is not the case with deque.

from collections import deque

queue = deque()

queue.append(10)  # this pops out 
queue.append(20)
queue.append(30)

print(queue.popleft())  # 10

Heap / Priority Queue

Heap automatically keeps smallest element on top. These are typically useful for:

Top K problems
Scheduling
AI search algorithms
Beam Search

Before running, you might need to run this:

import heapq

An example:

import heapq

nums = [7, 1, 5, 3]

heapq.heapify(nums)

print(heapq.heappop(nums))

# Output : 1

This is just a refresher, now lets jump to the real problems. For the first day, let us start with a simple problem.

Question : Given a list of integers, determine whether any number appears more than once.

Example : [1 , 2 , 3 , 4 , 5 , 6 , 7 ] => False

This means we need to deal with unique numbers here, so set seems to be the good option.

If number already seen → duplicate found.
Else add it to set.

# Function checks whether duplicates exist
def contains_duplicate(nums):

    # Create an empty set
    # This will store numbers we have already seen
    seen = set()

    # Visit every number in the list one by one
    for num in nums:

        # If num already exists in the set,
        # we have found a duplicate
        if num in seen:
            return True

        # Otherwise remember this number
        # for future comparisons
        seen.add(num)

    # If loop finishes,
    # no duplicate was found
    return False


# Test Case 1
numbers1 = [1, 2, 3, 4]

# Test Case 2
numbers2 = [1, 2, 3, 1]

# Print results
print("Test Case 1:", contains_duplicate(numbers1))
print("Test Case 2:", contains_duplicate(numbers2))

# Output
# Test Case 1: False
# Test Case 2: True

A simple function which utilizes sets to determine if the number is repeated or not. Loop runs "n" times, while for each set lookup, it takes O(1). So the total time complexity is O(n).

Space Complexity is also O(n); as set may hold all items.

Few questions to be ready for:

Q1) Why sets in place of lists?

Set lookup : O(1), list lookup : O(n)

Q2) Difference between dict and set?

dict contains key value pairs, set contains only value.

Q3) Why is dictionary lookup O(1)?

Because Python uses a hash table internally.

Q4) When would you use deque?

When frequent insertions/removals occur at the beginning of a sequence.

Q5) When would you use heap?

When needed these kinds of elements : maximum, minimum, top-k, priority

Cheatsheet of the day

Big-O

O(1)      Best
O(log n)  Great
O(n)      Good
O(n log n) Good
O(n²)     Expensive

Dictionary

d = {}

d["a"] = 1

d["a"]

"a" in d

Average:

O(1)

Set

s = set()

s.add(10)

10 in s

Average:

O(1)

Deque

from collections import deque

q = deque()

q.append(x)

q.popleft()

Both:

O(1)

Heap

import heapq

heapq.heappush(heap, x)

heapq.heappop(heap)

Complexity:

O(log n)

Conclusion:

Today you learned:

What Big-O means
Why efficiency matters
Dictionaries (dict)
Sets (set)
Deques (deque)
Heaps (heapq)
How to solve a duplicate-detection problem in O(n) using a set

These four Python structures will appear throughout the next 44 days. Mastering them now will make every later topic easier.

Homework

Solve:
- Contains Duplicate
- Valid Anagram
- Two Sum (attempt, don't worry if you can't finish)
Memorize:
- dict
- set
- deque
- heapq
Be able to answer:
- Why is set lookup O(1)?
- When should I use a heap?
- Why is deque better than list.pop(0)?

Tomorrow (Day 2) we'll begin Arrays & Hash Maps, including the famous Two Sum problem and the hash-map pattern that powers a large fraction of coding interviews.

Command Palette

Why should we even care?

What is Big O?

Essential Python Collections

Cheatsheet of the day

Big-O

Dictionary

Set

Deque

Heap

Homework

Comments