Problem Solving with Algorithms and Data Structures - Chapter 2 - Analysis

Table Of Contents

This book is amazing, last section of the last chapter blew my mind1.

2.1. Objectives

  • To understand why algorithm analysis is important.
  • To be able to use “Big-O” to describe execution time.
  • To understand the “Big-O” execution time of common operations on Python lists and dictionaries.
  • To understand how the implementation of Python data impacts algorithm analysis.
  • To understand how to benchmark simple Python programs.

2.2. What Is Algorithm Analysis

Write readable code.

perf.py:

import time

def perf_wrapper(method,num):
    start = time.time()
    theSum = method(*num)
    end = time.time()
    return theSum, end-start

def sumOfN(n):
    theSum = 0
    for i in range(1,n+1):
        theSum = theSum + i
    return theSum

def sumOfN_op(n):
    theSum = n * (n + 1) / 2
    return theSum

def sum_perf_check(method, num):
    for i in range(5):
        print("Sum is %d required %10.7f seconds" % perf_wrapper(method,num))

print("Before the algorithm is optimized:")
check_set=[10**n for n in range(2,8)]
for num in check_set:
    sum_perf_check(sumOfN,[num])

print("After the algorithm is optimized:")
for num in check_set:
    sum_perf_check(sumOfN_op,[num])

Output of perf.py:

Before the algorithm is optimized:
Sum is 5050 required  0.0000110 seconds
Sum is 5050 required  0.0000076 seconds
Sum is 5050 required  0.0000069 seconds
Sum is 5050 required  0.0000069 seconds
Sum is 5050 required  0.0000069 seconds
Sum is 500500 required  0.0000796 seconds
Sum is 500500 required  0.0000989 seconds
Sum is 500500 required  0.0000939 seconds
Sum is 500500 required  0.0001440 seconds
Sum is 500500 required  0.0001280 seconds
Sum is 50005000 required  0.0013959 seconds
Sum is 50005000 required  0.0008447 seconds
Sum is 50005000 required  0.0008399 seconds
Sum is 50005000 required  0.0008893 seconds
Sum is 50005000 required  0.0013807 seconds
Sum is 5000050000 required  0.0081816 seconds
Sum is 5000050000 required  0.0075161 seconds
Sum is 5000050000 required  0.0074959 seconds
Sum is 5000050000 required  0.0074937 seconds
Sum is 5000050000 required  0.0074308 seconds
Sum is 500000500000 required  0.0759306 seconds
Sum is 500000500000 required  0.0759892 seconds
Sum is 500000500000 required  0.0759845 seconds
Sum is 500000500000 required  0.0772898 seconds
Sum is 500000500000 required  0.0759218 seconds
Sum is 50000005000000 required  0.7791717 seconds
Sum is 50000005000000 required  0.7657561 seconds
Sum is 50000005000000 required  0.7915492 seconds
Sum is 50000005000000 required  0.7761276 seconds
Sum is 50000005000000 required  0.7596998 seconds
After the algorithm is optimized:
Sum is 5050 required  0.0000050 seconds
Sum is 5050 required  0.0000010 seconds
Sum is 5050 required  0.0000010 seconds
Sum is 5050 required  0.0000007 seconds
Sum is 5050 required  0.0000005 seconds
Sum is 500500 required  0.0000012 seconds
Sum is 500500 required  0.0000007 seconds
Sum is 500500 required  0.0000007 seconds
Sum is 500500 required  0.0000007 seconds
Sum is 500500 required  0.0000005 seconds
Sum is 50005000 required  0.0000007 seconds
Sum is 50005000 required  0.0000005 seconds
Sum is 50005000 required  0.0000007 seconds
Sum is 50005000 required  0.0000005 seconds
Sum is 50005000 required  0.0000005 seconds
Sum is 5000050000 required  0.0000007 seconds
Sum is 5000050000 required  0.0000010 seconds
Sum is 5000050000 required  0.0000007 seconds
Sum is 5000050000 required  0.0000007 seconds
Sum is 5000050000 required  0.0000007 seconds
Sum is 500000500000 required  0.0000005 seconds
Sum is 500000500000 required  0.0000005 seconds
Sum is 500000500000 required  0.0000007 seconds
Sum is 500000500000 required  0.0000007 seconds
Sum is 500000500000 required  0.0000005 seconds
Sum is 50000005000000 required  0.0000005 seconds
Sum is 50000005000000 required  0.0000010 seconds
Sum is 50000005000000 required  0.0000005 seconds
Sum is 50000005000000 required  0.0000005 seconds
Sum is 50000005000000 required  0.0000005 seconds

2.3 Big-O Notation

It's a method to characterize an algorithm’s efficiency in terms of execution time.

Example:

a=5
b=6
c=10
for i in range(n):
   for j in range(n):
      x = i * i
      y = j * j
      z = i * j
for k in range(n):
   w = a*k + 45
   v = b*b
d = 33

T(n) = 3 + 3n2 + 2n + 1 = 3n2 + 2n + 4

The fragment of the code: O(n2)

2.4. An Anagram Detection Example

  • Solution 1: Checking Off O(n2)
  • Solution 2: Sort and Compare O(nlogn) or O(n2) due to sort
  • Solution 3: Brute Force (Never good)
  • Solution 4: Count and Compare O(n)

2.5. Performance of Python Data Structures

In later chapters you will see some possible implementations of both lists and dictionaries and how the performance depends on the implementation.

2.6. Lists

list_perf.py:

from timeit import Timer
def test1():
    l = []
    for i in range(1000):
        l = l + [i]

def test2():
    l = []
    for i in range(1000):
        l.append(i)

def test3():
    l = [i for i in range(1000)]

def test4():
    l = list(range(1000))

t1 = Timer("test1()", "from __main__ import test1")
print("concat ",t1.timeit(number=1000), "milliseconds")
t2 = Timer("test2()", "from __main__ import test2")
print("append ",t2.timeit(number=1000), "milliseconds")
t3 = Timer("test3()", "from __main__ import test3")
print("comprehension ",t3.timeit(number=1000), "milliseconds")
t4 = Timer("test4()", "from __main__ import test4")
print("list range ",t4.timeit(number=1000), "milliseconds")

Result of list_perf.py:

concat  1.9500927650005906 milliseconds
append  0.11545232799835503 milliseconds
comprehension  0.062062365002930164 milliseconds
list range  0.0495129260016256 milliseconds
OperationBig-O Efficiency
index []O(1)
index assignmentO(1)
appendO(1)
pop()O(1)
pop(i)O(n)
insert(i,item)O(n)
del operatorO(n)
iterationO(n)
contains (in)O(n)
get slice [x:y]O(k)
del sliceO(n)
set sliceO(n+k)
reverseO(n)
concatenateO(k)
sortO(n log n)
multiplyO(nk)

popperf.py:

import timeit
popzero = timeit.Timer("x.pop(0)",
                       "from __main__ import x")
popend = timeit.Timer("x.pop()",
                      "from __main__ import x")

x = list(range(2000000))
print("x.pop(0) execution time: %fs" % popzero.timeit(number=1000))

x = list(range(2000000))
print("x.pop() execution time: %fs" % popend.timeit(number=1000))

Result of popperf.py:

x.pop(0) execution time: 2.256239s
x.pop() execution time: 0.000111s

popperf_2.py:

from timeit import Timer

popzero = Timer("x.pop(0)",
                "from __main__ import x")
popend = Timer("x.pop()",
               "from __main__ import x")
print("pop(0)   pop()")
for i in range(1000000,100000001,1000000):
    x = list(range(i))
    pt = popend.timeit(number=1000)
    x = list(range(i))
    pz = popzero.timeit(number=1000)
    print("%15.5f, %15.5f" %(pz,pt))

Result of popperf_2.py:

pop(0)   pop()
        0.94833,         0.00010
        2.04151,         0.00011
        3.24372,         0.00011
        4.37107,         0.00011
        5.42274,         0.00012
        6.68484,         0.00011
        7.64100,         0.00011
        8.23122,         0.00011
        9.99637,         0.00011
       11.55539,         0.00012
       12.92389,         0.00026
       13.31022,         0.00012
       14.33949,         0.00011
       16.14405,         0.00011
       17.22286,         0.00014
       17.30630,         0.00013
       18.30070,         0.00014
       19.25390,         0.00013
       20.32429,         0.00017
       21.64347,         0.00014
       22.62368,         0.00013
       23.68417,         0.00013
       24.95661,         0.00011
       25.84395,         0.00011
       27.02810,         0.00011
       29.19760,         0.00011
       30.58301,         0.00011
       31.83124,         0.00011
       32.58386,         0.00016
       33.32165,         0.00011
       35.25702,         0.00011
       36.03567,         0.00011
       36.56381,         0.00012
       38.86707,         0.00011
       39.97921,         0.00011
       39.82716,         0.00011
       41.63275,         0.00011
       43.04904,         0.00011
......

2.7. Dictionaries

operationBig-O Efficiency
copyO(n)
get itemO(1)
set itemO(1)
delete itemO(1)
contains (in)O(1)
iterationO(n)

dict_perf.py:

import timeit
import random

for i in range(10000,1000001,20000):
    t = timeit.Timer("random.randrange(%d) in x"%i,
                     "from __main__ import random,x")
    x = list(range(i))
    lst_time = t.timeit(number=1000)
    x = {j:None for j in range(i)}
    d_time = t.timeit(number=1000)
    print("%d,%10.3f,%10.3f" % (i, lst_time, d_time))

Result of dict_perf.py:

10000,     0.119,     0.001
30000,     0.366,     0.002
50000,     0.612,     0.002
70000,     0.914,     0.003
90000,     1.193,     0.003
110000,     1.381,     0.003
130000,     1.708,     0.003
150000,     1.984,     0.002
170000,     2.283,     0.003
190000,     2.660,     0.003
210000,     2.880,     0.002
230000,     2.918,     0.002
250000,     1.782,     0.001
270000,     1.881,     0.001
290000,     2.034,     0.001
310000,     2.122,     0.001
330000,     2.254,     0.001
350000,     2.473,     0.002
370000,     2.502,     0.001
390000,     2.844,     0.002
410000,     3.044,     0.001
430000,     3.268,     0.001
450000,     3.170,     0.002
470000,     3.398,     0.002
490000,     3.607,     0.002
510000,     3.740,     0.001
530000,     3.844,     0.002
550000,     4.131,     0.001
570000,     4.102,     0.002
590000,     4.418,     0.002
610000,     4.413,     0.002
630000,     4.829,     0.002
650000,     4.717,     0.003
670000,     5.037,     0.001
690000,     5.114,     0.002
710000,     5.599,     0.002
730000,     5.634,     0.002
750000,     5.669,     0.002
770000,     5.701,     0.001
790000,     5.745,     0.002
810000,     5.985,     0.002
830000,     6.167,     0.001
850000,     6.496,     0.002
870000,     6.966,     0.002
890000,     6.906,     0.001
910000,     6.333,     0.001
930000,     7.055,     0.001
950000,     6.709,     0.001
970000,     7.176,     0.002
990000,     6.942,     0.001

Dictionary is consitently faster than list.

The only dictionary operations that are not O(1) are those that require iteration.