Friday, March 28, 2014

Random Sort

Below is my implementation of random sort, which you may find interesting :D

import random

def is_sorted(l):
    """Return True iff l is in non-decreasing order."""
    # Any adjacent pair that descends means the list is not sorted.
    for i in range(len(l) - 1):
        if l[i] > l[i + 1]:
            return False
    return True
   
def rand(l):
    """Shuffle l randomly until it comes out sorted (a form of bogosort).

    Note: the dictionary below assumes the elements of l are distinct.
    """
    length = len(l)
    if is_sorted(l):
        return l
    else:
        # Track which elements have already been placed in the shuffle.
        dic = {key: True for key in l}
        count = 0
        l2 = []
        while count < length:
            ran = random.randint(0, length - 1)
            if dic[l[ran]]:          # this element has not been placed yet
                l2.append(l[ran])
                count += 1
                dic[l[ran]] = False
        return rand(l2)
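
For example (small inputs only, since the expected number of shuffles grows factorially):

print(rand([3, 1, 2]))   # [1, 2, 3]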

It is kind of exciting when you think that the best case runtime of this algorithm, O(n) for an already sorted list, is as good as it gets, while the worst case is probably the worst: the expected number of shuffles is on the order of n!, and there is no guaranteed upper bound at all.
         
     

Tuesday, March 25, 2014

Closest Pair of Points

The aim of this algorithm is to find the two points with the smallest distance between them. Note that we assume the points lie on a 2D plane. The algorithm I have developed is a divide and conquer algorithm.

0. Append the points to a list
1. Sort the list by x-coordinate (presumably with mergesort), O(nlogn), done only once
2. Divide the list into two halves at the midpoint of the x-coordinate values
3. Recursively divide each half the same way, for logn levels of division
4. At each level, find the closest pair on each side, then check those against the pairs that straddle the dividing line

The distance between two points is computed with the Pythagorean theorem: d = sqrt((x1 - x2)^2 + (y1 - y2)^2).
The runtime of this algorithm turns out to be O(nlogn) if we use the Master Theorem that was introduced in the previous blog entry.
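
To make the steps concrete, here is a minimal sketch of the idea (a simplified version, not necessarily the exact code I timed). Points are assumed to be (x, y) tuples, and for simplicity the strip of points near the dividing line is re-sorted by y inside every call, which technically makes this variant O(nlog^2 n); the true O(nlogn) version maintains a y-sorted list throughout.

import math

def dist(p, q):
    # Pythagorean theorem for the distance between two 2D points.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair(points):
    """Return (distance, p, q) for the closest pair among >= 2 (x, y) tuples."""
    return _closest(sorted(points))    # sort by x once, O(nlogn)

def _closest(pts):
    n = len(pts)
    if n <= 3:    # base case: brute force very small lists
        return min((dist(p, q), p, q)
                   for i, p in enumerate(pts) for q in pts[i + 1:])
    mid = n // 2
    mid_x = pts[mid][0]
    # Best pair lying entirely inside the left or the right half.
    best = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # A closer pair straddling the line must sit within best[0] of it.
    strip = sorted((p for p in pts if abs(p[0] - mid_x) < best[0]),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:]:
            if q[1] - p[1] >= best[0]:    # too far apart vertically
                break
            best = min(best, (dist(p, q), p, q))
    return best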

The obvious brute-force method, which checks every pair of points, takes O(n^2).
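
For comparison, here is a minimal sketch of that brute-force check, reusing dist from above:

def closest_pair_brute(points):
    # Check all n*(n-1)/2 pairs of points.
    return min((dist(p, q), p, q)
               for i, p in enumerate(points) for q in points[i + 1:])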

Now I will compare the runtime of my algorithm with the brute-force approach below:
10000 pairs
Bruteforce: 0:00:08.874909
Divide and Conquer: 0:00:00.581441
50000 pairs
Bruteforce: 0:00:09.365906
Divide and Conquer: 0:00:04.014672

You might notice the leap in the runtime of the divide and conquer algorithm. That is caused mostly by the mergesort that sorts our list before the actual pair-finding algorithm starts working.
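
For the curious, here is a minimal sketch of how such timings can be taken (a hypothetical harness, not necessarily the exact one I used), assuming the closest_pair and closest_pair_brute sketches above:

from datetime import datetime
import random

points = [(random.random(), random.random()) for _ in range(10000)]

start = datetime.now()
closest_pair_brute(points)
print('Bruteforce:', datetime.now() - start)        # prints a timedelta

start = datetime.now()
closest_pair(points)
print('Divide and Conquer:', datetime.now() - start)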




Sunday, February 16, 2014

Sorting Algorithms

We have recently learned about sorting algorithms in class.

Here is a list of them:
-Insertion Sort
-Selection Sort
-Merge Sort
-Quicksort

Insertion and selection sort are pretty straightforward in how they work, but they both take O(n^2) time.
The last two are both divide-and-conquer algorithms, which I introduced in the past month. The main idea behind them is to divide the input list into smaller parts and sort those parts; afterwards, the parts are combined. For mergesort, the combination step is done by a helper function that merges two sorted halves.
Mergesort has a worst-case runtime of O(nlogn); quicksort is O(nlogn) on average, but its worst case is O(n^2).
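
To illustrate, here is a minimal sketch of mergesort with its merge helper (my own version, not the exact one from class):

def mergesort(l):
    """Return a new sorted copy of l."""
    if len(l) <= 1:    # base case: 0 or 1 elements are already sorted
        return l
    mid = len(l) // 2
    return merge(mergesort(l[:mid]), mergesort(l[mid:]))

def merge(left, right):
    """Combine two sorted lists into one sorted list in linear time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])    # at most one of these is non-empty
    result.extend(right[j:])
    return result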

A common method for finding the runtime of divide and conquer algorithms is the Master Theorem. I will cite the Master Theorem (http://en.wikipedia.org/wiki/Master_theorem) from Wikipedia:

T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1

  • n is the size of the problem.
  • a is the number of subproblems in the recursion.
  • n/b is the size of each subproblem. (Here it is assumed that all subproblems are essentially the same size.)
  • f(n) is the cost of the work done outside the recursive calls, which includes the cost of dividing the problem and the cost of merging the solutions to the subproblems.

1) If k < log_b{a}, then T(n) = O(n^{log_b{a}}).
2) If k = log_b{a}, then T(n) = O(n^k logn).
3) If k > log_b{a}, then T(n) = O(n^k).
where k is the exponent of the asymptotic runtime of f(n), i.e. f(n) = Θ(n^k)
Source: http://www.cs.toronto.edu/~liudavid/csc236/resources/notes.pdf

Now I will apply the Master Theorem on merge sort.
Merge sort:
- Divide the lists into halves
- Sort each of them separately
- Merge the halves together

Define a function T(n) that represents the runtime of merge sort based on a list of size n. Then we can recursively define T(n) by using the given information above.

T(n) = 2T(n/2) + cn        (cn is a result of merging that costs linear time)

Now, let's compare this to the special form introduced above.
a = 2, b = 2, k = 1
Then we know that k = log_b{a}
Therefore we will be using 2)
Then T(n) = O(nlogn)

In my following blog post I will be talking more about efficiency and a sorting algorithm I have worked on with my friend.

Binary Trees

In the previous week we learned about binary trees. A binary tree is, not surprisingly, a specific kind of the abstract data type called a tree. Below I will give a recursive definition of a binary tree:

- A binary tree is either empty or it consists of a root node.
- A root node has two children: a left subtree and a right subtree.
- Each subtree is itself a binary tree, so it is either empty or has a root node of its own.

So, a tree will have a root, which may or may not have a left and a right child, and those children may in turn be parents of subtrees of their own. We also learned three ways of traversing a tree: preorder, postorder and inorder traversal.
Here is how each of them traverse recursively:

Preorder
  1. root
  2. left
  3. right
Postorder
  1. left
  2. right
  3. root
Inorder
  1. left
  2. root
  3. right
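
To make this concrete, here is a minimal sketch using a hypothetical Node class (the names are my own, not the ones from class):

class Node:
    """A binary tree node; left and right are Node instances or None."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def preorder(node):
    # root, then left subtree, then right subtree
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)

def postorder(node):
    # left subtree, then right subtree, then root
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]

def inorder(node):
    # left subtree, then root, then right subtree
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

tree = Node(2, Node(1), Node(3))
print(inorder(tree))    # [1, 2, 3]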

-----------more will be added soon-----------

Few Divide and Conquer Examples

Hello! Today I will be giving two examples of the divide and conquer algorithm paradigm. The first is an everyday example that we all know, and the second is a relatively harder problem that becomes much easier with divide and conquer.

I will start with the everyday example. Think about the last time you used the index section of a book to find a keyword or the time when you looked through your phone's contact list to find a specific person. In both of the cases the words were alphabetically sorted.

A brute force approach in this case would be to start from the beginning of the index or the contact list and proceed reading every single word until we find the specific one we are looking for. However, since we know that the list of words is sorted, we actually apply a divide and conquer algorithm ourselves!
Think of it this way: instead of searching for whole words, you quickly scan based on the first two or three letters. Our problem is thereby divided into sub-problems. We skip pages until the first letter matches, then skip some more until the second matches, and so on for as many letters as we care to check. This is a recursive solution: we started from the leftmost letter of the words and recursively solved the matching problem for each one. After matching an adequate number of letters, we combine the solutions to the divided problems, and that combination of matches is exactly the word we were looking for in the beginning.
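
In code, this letter-by-letter narrowing is essentially binary search. Here is a minimal sketch of the idea, simplified to compare whole words against a sorted list rather than going letter by letter:

def binary_search(words, target):
    """Return the index of target in the sorted list words, or -1 if absent."""
    low, high = 0, len(words) - 1
    while low <= high:
        mid = (low + high) // 2
        if words[mid] == target:
            return mid
        elif words[mid] < target:    # discard the left half
            low = mid + 1
        else:                        # discard the right half
            high = mid - 1
    return -1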
If you think of it this way, we inadvertently solve many problems in our everyday life with several different algorithm paradigms! Now we will try to solve a harder problem.


Let's say you have a list called A. Consider all the possible list slices that could be taken from A and find the list slice with the maximum sum of elements. Notice that the sum of an empty list is 0 and the sum of a list with one element is that element itself. Also notice that if every number is positive, the whole list is the slice with the largest sum, so the problem only gets interesting when we have negative numbers inside our list.

Now let's revise our steps for divide and conquer:
1- Divide problem into sub-problems
2- Solve sub-problems recursively
3- Combine solutions from 2nd step to get the final answer

We should now think about the recursive idea. It is always easier to deal with half a list than a full list. So let's assume that we divide a list in half, find the maximum sum on the left side, and then the maximum sum on the right side. If we continue this thought process recursively, we will eventually reach many separate list slices of length 1, and we already know that the maximum sum of a list of length 1 is the only element in the list.
Let's try to trace this recursive idea. Assume that we have a list of size 2. Then the list will be separated into two lists of one element each, and our main list's maximum will be the maximum of one of those lists. Now assume that we have a list of size 4. Then we will have two lists of size 2, and we do not need to trace further since we already did that in the previous case: our main list's maximum will be the maximum of one of the two length-2 lists, which in turn comes from one of the four one-element lists.
Even though the recursive idea is correct, we are missing something here. What if the maximum slice includes a few elements from both the left side and the right side of the list? This adds a third sum to check for every sub-list, so we need to think about the best way to deal with it. We already know the two maximum sums on either side of the list; let us call them max left and max right, and let us call the third one max middle. We know for a fact that max middle will not include any number that comes before the sub-list that makes up max left in our original list, and likewise that it will not include any number that comes after max right. Why? Because any number before max left or after max right would only decrease the sum, which is exactly why those numbers were not included in max left and max right in the first place.
Now we know a little more about the interval in which to look for max middle: it lies between the sub-list that makes up max left and the sub-list that makes up max right. We can go further and say that max middle has two components: max middle left and max middle right. We can run a simple for loop in each of our recursive calls to calculate max middle left and max middle right, and then add them up to find max middle. Since we are constantly mentioning positions in the list, it is a good idea to carry index information with each recursive call.
Now I will type down the general idea:
We will take a list of size n and divide it into two lists of size n/2. In each recursive call we will divide the list in half and calculate the max left, max right and max middle associated with it. Each recursive call will return the maximum of max left, max right and max middle, because, as we discussed earlier, the maximum subsequence of a sub-list comes in handy when we are looking for the maximum subsequence of the main list.

I think I have discussed my algorithm enough, so I will share my Python code to solve the problem, where A is our input list.
def MSS(first, last, A):
    """Return the maximum slice sum of A[first:last+1].

    Precondition: A is a list of numbers; first and last are
    inclusive indices into A.
    Postcondition: the maximum subsequence sum is returned.
    """
    if first > last:        # empty slice
        return 0
    if first == last:       # one element
        return A[first]
    else:
        middle = (first + last) // 2
        # Best sum of a slice that ends exactly at middle.
        last_sum_left = A[middle]
        temp_sum = 0
        for i in range(middle, first - 1, -1):
            temp_sum += A[i]
            if temp_sum > last_sum_left:
                last_sum_left = temp_sum
        # Best sum of a slice that starts exactly at middle + 1.
        last_sum_right = A[middle + 1]
        temp_sum = 0
        for i in range(middle + 1, last + 1):
            temp_sum += A[i]
            if temp_sum > last_sum_right:
                last_sum_right = temp_sum
        # A slice crossing the middle is the best left tail plus the best right head.
        max_middle = last_sum_left + last_sum_right
        return max(MSS(first, middle, A), max_middle, MSS(middle + 1, last, A))
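
For example, on a classic test instance:

A = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(MSS(0, len(A) - 1, A))    # 6, the sum of the slice [4, -1, 2, 1]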



Thursday, February 6, 2014

Different ways of recursion - divide and conquer

As Danny taught us in class, the way to understand recursion is similar to the principle of induction in mathematics. The way to trace recursive code is to trace easier cases first and then substitute their results into our case.

The principle of induction is used to prove various properties in mathematics. There are two steps in the proof: the base case and the induction step. In the base case we show that the property holds for a small natural number, and in the induction step we show why the property holding for a natural number n implies that it also holds for n+1. And that is also the key point with recursion.

It is practically impossible to trace recursive code of any real depth, since humans cannot think as systematically as computers. So solving the problem for an easier case and seeing the connection between a higher case and our case is enough to understand the code.

Now that I have talked about the recursion more, I can share one of the algorithm paradigms based on recursion:

Divide and conquer algorithm

Divide and conquer algorithms are mostly used with multiple recursive steps to solve a difficult problem. As its name suggests, a hard problem is divided into smaller sub-problems which can be solved and then combined to reach the answer of the harder problem. Here are the steps to a generic divide and conquer algorithm:

1- Divide problem into sub-problems
2- Solve sub-problems recursively
3- Combine solutions from 2nd step to get the final answer

I will give a few divide and conquer algorithm examples in my following post!

Recursion

Recursion is one of the most widely used tools in the field of computer science. Recursion essentially means applying a procedure that is defined in terms of itself. Sometimes it is much easier to define a function recursively than in closed form, even though it may take a little bit of practice to get used to recursion.

Recursive functions are defined by base cases and recursive cases. Base cases simply tell the function when to stop and recursive cases indicate how to do the repetition. One of the recursive functions that we are all used to is the factorial. I will define the factorial function below:

def factorial(x):
    # Base cases: 0! and 1! are both 1.
    if x == 0 or x == 1:
        return 1
    # Recursive case: x! = (x - 1)! * x
    return factorial(x - 1) * x

Another use of recursion can be seen in fibonacci numbers. The definition of fibonacci numbers includes two base cases and one recursive step just like the factorial. Fibonacci numbers are defined as the following: the first fibonacci number is equal to 1, the second fibonacci number is also equal to 1 and any other fibonacci number is equal to the sum of the two previous fibonacci numbers.
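
Following the same pattern as the factorial, a minimal recursive sketch in Python might look like this:

def fibonacci(n):
    # Base cases: the first and second fibonacci numbers are both 1.
    if n == 1 or n == 2:
        return 1
    # Recursive case: the sum of the two previous fibonacci numbers.
    return fibonacci(n - 1) + fibonacci(n - 2)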

In the following blogs I will be talking about different strategies/paradigms in which recursion is the main agent. I will start with the algorithmic paradigm called the divide and conquer method.