This post covers abit of everything that improved my understanding of computer science and Python.
If you’re like me, someone from a non tradional STEM background (Psychology) and have never taken a formal computer science class then you may not notice huge gaps in your knowledge. This was me before being exposed to data structure and algorithms. Even the most obvious things did not occur to me because much of it has been abstracted in an interpreted language such as Python. This means the lower level implementation is hidden. For example Python’s built in list is written in C where the source code is actually compiled to machine code.
Lets take another simple example. If you want to search for a number in a list in Python, what could you do. Your may be thinking of using Python’s ‘in’ operator.
array = [2, 3, 3, 7, 9]
7 in array While this does provide you with the solution, how does ‘in’ actually work? Python had to somehow look for 7 in the array. These are the abstractions that exist within Python that you do not often see in other languages.
Linear Search
The search algorithm could be implemented with a single for loop also known as linear search. This is because the operations it takes to find the target value is dependent on stepping to the next pointer N number of times until the length of the array has been traversed.
def linearSearch(array, target):
for i in range(len(array)):
if array[i] == target:
return True
return FalseIn Python you actually have access to the data itself. The example below is considered as more Pythonic but both examples do the exact same thing.
def linearSearch(array, target):
for i in array:
if i == target:
return True
return FalseIn linear search you would expect it to take a long time to find a target that presents itself at the end of an extremely long array. This is therefore also known as the naive solution / brute force approach to a problem, as it is often the most obvious but does not perform well with lots of data. But this shows the implementation of how the ‘in’ operator works, resulting in searching for an item in a list taking n time (with n being the length of the array). Searching for an item using ‘in’ is actually constant O(1) for a hashtable also known as a dictionary due to the properties of the data structure.
As we are working with arrays, how do we find a better solution than linear search.
Binary Search
This was quite puzzling to solve because how could you find all the numbers in an array without iterating over all of the data. After some research, I discovered that there were specific algorithms dedicated to solving this, with one of the best known being Binary Search. If you undertake online courses such as Harvard’s famous CS50 then you may be familiar with the alogithm as it is a foundational concept in introductory classes.
One of the properties of a sorted array of integers is that it allows for binary search. If an array is not sorted, you can use Python’s array.sort()method which is also an abstraction of sorting algorithms (you may be noticing a trend here) which is out of the scope of this post i.e. I am not that familiar with them. Anyway, sorted arrays make it easier to search for a value because we can ignore data that is above or below our guess of where the data is located.
For example in the array earlier [2, 3, 3, 7, 9], if we were without any information as to the numbers in the array, we can take a random guess that 7 is in the middle. 3 is discovered. With the knowledge that the array is sorted, we can ascertain that 7 will be located on the right hand side of the middle index as it is greater than 3. This would mean that any number that is below the middle index could be ignored. This is the power of binary search. We keep taking random guesses and updating the span of the array that is within possibility until we find out whether or not the number exists.
Code Implementation
For efficiency we take the middle point as our random guess through the use of a left and right pointer, these pointers will also be the head and tail of the array that is being updated based on whether the target value is above or below the middle point.
def binarySearch(array, target):
left = 0
right = len(array) - 1
while left <= right:
middle = (left + right) // 2
if target == array[middle]:
return True
elif target < array[middle]:
right = middle - 1
elif target > array[middle]:
left = middle + 1
return False Comparisons
While our example does not illustrate the efficiency of Binary Search, a larger array shows Binary Search to be much more efficient, with the number of operations plateauing as n increases.
Comparing y = n and y = log(n) with matplotlib