Thursday, August 4, 2011

Maximum Subsequence

Six interns sat in front of their laptops, enthusiasm flashing on their faces brighter than the laptop light. And why not, it was the first day of their internship and they were eager to start off. Their wait ended when their professor entered and, without wasting a single moment, gave them a problem to solve. He called it the maximum subsequence problem. It sounded trivial enough.

Given a sequence of numbers in an array, can you find a contiguous sequence of numbers that gives you the maximum sum? If all the numbers are positive, then the maximum subsequence is the whole array. The problem arises when we encounter negative numbers. Completing the problem statement: if all the numbers turn out to be negative, then the maximum subsequence sum is taken as zero.
If you are still not clear about the problem, it is equivalent to placing two sticks in the array at positions such that the sum of the elements between them is the highest over all possible placements.
Within no time they had an algorithm ready. The only thing on their minds was to solve it as quickly as possible; they never really gave any thought to bettering the algorithm.
Again, brute force came to their rescue: simply compute the sum for all pairs of positions, and the one with the highest sum is the answer. It turns out the complexity is quadratic. Well, at least the problem was solved.
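To make the brute-force idea concrete, here is a minimal Python sketch of my own (the function name and the sample array are illustrations, not from the post): for every starting position it extends a running sum to every ending position, which is exactly what makes it quadratic.

def max_subsequence_bruteforce(a):
    """Try every starting position; extend the running sum to every end position."""
    best = 0  # all-negative arrays give 0, as the problem statement demands
    for start in range(len(a)):
        running = 0
        for end in range(start, len(a)):
            running += a[end]          # sum of a[start..end]
            best = max(best, running)  # remember the best sum seen so far
    return best

print(max_subsequence_bruteforce([-2, 11, -4, 13, -5, 2]))  # 20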
But the professor was adamant that the problem could be tackled in linear time.
Yup, that's the professor! :)
How can we claim that a linear-time algorithm exists for this?
The awesome argument follows. Forget about algorithms for a while and think this way.
Say there is an oracle which gives you the answer for any array of numbers you hand it. But there's a problem: it takes only $n-1$ inputs. Oops, we have $n$ numbers in the array. But think for a while; there has to be a way in which it can be used.
I'll just give the oracle the whole array except for the last element, and the oracle returns the answer for the $n-1$ elements. Now we have two things in hand: one is the last element which was left out, and the other is the maximum subsequence sum for the $n-1$ elements. Can we, with these two components, infer the actual maximum subsequence sum of the whole array? And guess what, yes we can!
There are just two scenarios:
1. The last element is negative. This is the easiest one: whatever the oracle gave becomes the final answer.
2. The last element is positive. Two cases arise here:
  1. The maximum subsequence given by the oracle ends at the $(n-1)$th element. In this case, since the last number is positive, we include the $n$th element too in the maximum subsequence and arrive at the final answer.
  2. The maximum subsequence given by the oracle does not end at the $(n-1)$th element. This is the trickiest one: the best subsequence involving the $n$th element might be very different. So what we do is find the maximum subsequence sum that includes the last ($n$th) element; in simpler words, the maximum sum you can obtain with one end fixed at the last position. Compare this with the subsequence sum given by the oracle; whichever is greater fetches us the final answer.
So the oracle is making our task very easy. But you may wonder where this discussion is heading. How do you think the oracle will solve the problem? Don't you think the oracle, having $n-1$ elements, can in turn give all of its elements except one to another oracle?
Doesn't this thought trigger something? It's nothing but decrease and conquer told differently! And as you can see, this can be done in linear time.
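Written out in code, this decrease-and-conquer pass is the linear algorithm usually credited to Jay Kadane. The sketch below is my own minimal Python version of the argument above (not the professor's exact formulation); ending_here plays the role of "the best sum that is forced to end at the current element".

def max_subsequence_linear(a):
    """Decrease and conquer: extend the answer for the first i-1 elements by one element."""
    best = 0          # answer for the elements seen so far (zero if all are negative)
    ending_here = 0   # best sum of a subsequence that ends at the current element
    for x in a:
        ending_here = max(ending_here + x, 0)  # a negative running sum is never worth keeping
        best = max(best, ending_here)          # compare with the oracle's answer so far
    return best

print(max_subsequence_linear([-2, 11, -4, 13, -5, 2]))  # 20, in a single pass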
Interestingly, this problem has a history too. It first arose in a two-dimensional form and, in order to simplify it, was converted into a one-dimensional form. First a cubic algorithm was formulated, after which a quadratic one and an $O(n \log n)$ one followed. The best part: the linear algorithm was designed in less than a minute by a person attending a seminar where this problem was described.

How does it really matter?
You may always say "my computer is very fast, so it doesn't really matter." Okay, what's the harm in comparing anyway? Let's say a computer is as fast as 10 GHz; that is, it can perform $10^{10}$ steps in one second.

$N$    $N^{2}$
$10$    $10^{2}$
$100$    $10^{4}$
$10^{3}$    $10^{6}$
$10^{4}$    $10^{8}$
$10^{5}$    $10^{10}$
$10^{9}$    $10^{18}$

Let's calculate the time for the last instance.
Time taken by the linear one: $10^{9}/10^{10} = 0.1$ seconds.
Time taken by the quadratic one: $10^{18}/10^{10} = 10^{8}$ seconds.
Converting that into years, it is a little over 3 years!!!!!!
You see what a difference it makes: one takes less than a second to solve the problem whereas the other takes years. And this is just between linear and quadratic; imagine the scenario for cubic and beyond.
Of course it is inevitable to use those when you can't find a linear-time algorithm or when no such algorithm exists. But an effort has to be made to reduce the complexity, and one way of doing better is to use the appropriate algorithm design techniques.
"Science is what we understand well enough to explain it to a computer." Technology may advance even more tomorrow and new programming languages will be formed, but independent of all this, the essence of algorithms will always remain the same. That's what makes it the most important part of computer science. Dijkstra was right when he said, "Computer Science is no more about computers than astronomy is about telescopes."
What do you think is the real foundation of computer science? :)

So what happened to our six friends? Well, they had the best beginning and carried on through what became an amazing two months of learning.
Moral of the story: between a fast computer and a fast algorithm, always go for the latter. You know why :):)

Nikitha Shenoy

HOW SIMILAR ARE WE ;)



The evolution of life on this planet is based around DNA replication. Over 400 million years, glitches in DNA replication have led to genetic mutations, leading in turn to a variety of life forms. Molecular evidence suggests that between 8 and 4 million years ago, first the gorillas and then the chimpanzees split off from the line leading to humans. So how similar are we and our closest pal, the chimpanzee? The answer must obviously lie in the DNA. Will it surprise you to hear that you can extract this answer from DNA strands using algorithms? ;) Good, then this post will let you know how. :) The power of algorithms!! :)
Consider a DNA strand each from two different organisms. Let the strands be X = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA and Y = GTCGTTCGGAATGCCGTTGCTCTGTAAA. As is our aim, let us try finding the similarity between X and Y. Similarity can be defined in different ways. If one strand is a substring of the other, then we can say that they are similar; as you can see, neither X nor Y is a substring of the other. Another way: if the number of changes required to turn one strand into the other is small, then the strands are similar. And yet another way, the one we are interested in, is to find a subsequence common to both whose length is the largest. Okay, now the major thought in most of your minds will be "how on earth do we know what you mean by subsequence, forget about it being 'common'". Right, isn't it? :D So now let's get into learning what a subsequence is.



A subsequence of a sequence is nothing but a sequence obtained by deleting zero or more elements. Confused!? As always, examples to the rescue.
Given a sequence X = [A,B,C,D,B], Z = [B,D,B] is one of its subsequences, with index sequence [2,4,5].
Given two sequences, a subsequence common to both is called a common subsequence.
Eg: Given X = [ A,B,C,B,D,A,B] ,Y=[B,D,C,A,B,A] some of the common subsequences are
1. X = [ A , $\bf{B}$ , C , B , D , $\bf{A}$ , $\bf{B}$ ] ,
Y = [ $\bf{B}$ , D , C , $\bf{A}$ , $\bf{B}$ , A ]
$\rightarrow$ [B,A,B]
2. X = [ A , $\bf{B}$ , $\bf{C}$ , B , D , $\bf{A}$ , $\bf{B}$ ] ,
Y = [ $\bf{B}$ , D , $\bf{C}$ , $\bf{A}$ , $\bf{B}$ , A ]
$\rightarrow$ [B,C,A,B]
and so on.
Of these common subsequences, the one with maximum length is called the LONGEST COMMON SUBSEQUENCE [LCS]. As you can notice, the LCS that can be obtained from the given X and Y is of length 4 and is not unique: [B,C,A,B] and [B,D,A,B] are both common subsequences of length 4, hence both are LCSs.
Now that we understand what an LCS is, finding a common subsequence of maximum length between X and Y shows us to what extent they are similar. So our aim is: given two sequences X = [ $x_1$ , $x_2$ ,.., $x_m$ ] and Y = [ $y_1$ , $y_2$ ,.., $y_n$ ], find a maximum-length common subsequence of X and Y.
Before jumping into finding the LCS we need to know one more thing. Given a sequence X = GMAIL,
$X_1$ = G
$X_2$ = GM
$X_3$ = GMA
$X_4$ = GMAI
$X_5$ = GMAIL
are also subsequences of X. They are called prefixes, denoted by the name of the sequence followed by a subscript indicating the number of characters the prefix contains. A prefix is the sequence with its end deleted. To be precise, we define the $i$th prefix of X, for i = 0,1,2,…,m, as $X_i$ = [ $x_1$ , $x_2$ ,.., $x_i$ ]; $X_0$ is the empty string. Another point worth mentioning: LCS(X,Y) gives the actual sequence whereas lcs(X,Y) denotes the length of the longest common subsequence.
A straightforward method of finding LCS(X,Y) is to take all the subsequences of X, check whether each is also a subsequence of Y, and keep the longest. If X is of length $m$ then there are $2^m$ subsequences, so this method takes exponential time, a drawback when $m$ is large. Brute force is ruled out. What other technique can we apply? Divide and conquer, eh? Okay, let's give it a shot.
Consider X = MAN456 and Y = 456SIN. Using divide and conquer we would find the LCS of MAN and 456, and of 456 and SIN, separately and combine them. The LCS of each pair is the empty string; combining them we get LCS(X,Y) as the empty string too, which is wrong because LCS(X,Y) = 456. We cannot simply split the given inputs X and Y into halves, find their LCSs, and concatenate the results of the first half and the second half to get the right answer. So the divide-and-conquer technique, used naively, does not work here. [Hirschberg has given an algorithm based on this technique which reduces the space complexity of the LCS problem; a link to it is given towards the end.]
Instead of blindly trying out different techniques, let us look into the properties of this LCS.
Suppose the two sequences end with the same character, e.g. NEWTON and NEUTRON. Then the LCS will definitely contain the common characters coming at the end. So it is better if we find the LCS of the prefixes obtained by deleting these common last characters, and then append the deleted characters to it.
1. In NEWTON and NEUTRON, go on removing the last common letters. We end up with NEWT and NEUTR.
2. Now find the LCS of NEWT and NEUTR, which by inspection is NET.
3. Then all we have to do is append the deleted common letters, i.e. ON.
4. So we get the LCS as NETON.

So what we basically did here is find the LCS of 'prefixes' and then append the common ending elements. By doing so we reduce the length of the sequences whose LCS is to be found. Now that's a plus point: finding the LCS by shortening the sequences.

The above property can be generalized:
1. For X = [ $x_1$ , $x_2$ ,.., $x_m$ ] and Y = [ $y_1$ , $y_2$ ,.., $y_n$ ], if $x_m$ = $y_n$,
then LCS(X,Y) = LCS( $X_{m-1}$ , $Y_{n-1}$ ) followed by $x_m$.
Remember, $X_{m-1}$ and $Y_{n-1}$ are prefixes and $x_m$ denotes the $m^{th}$ element of X.

Now what if X and Y do not end with the same character? Then LCS(X,Y) is the longer of
LCS( $X_{m-1}$ , $Y$ ) and LCS( $X$ , $Y_{n-1}$ ).
Consider this example: X = PQRSTUV, Y = QRSVW. The LCS of the two sequences either ends with V or it does not.
Case 1: The LCS ends with V. Then W does not appear in the LCS, so the LCS is nothing but LCS( $X_m$ , $Y_{n-1}$ ).

Case 2: The LCS does not end with V. Then we can remove the last element of X, leading us to LCS(X , Y) = LCS( $X_{m-1}$ , $Y_n$ ).
Whatever the case, LCS(X , Y) should be the longest. Hence the longer of
LCS( $X_{m-1}$ , $Y_n$ ) and LCS( $X_m$ , $Y_{n-1}$ ) is the answer.
What if the length of either X or Y is 0? Then the LCS length is definitely 0. No doubts there, right? :D

As we can infer from the above discussion, LCS(X , Y) can be found from the LCSs of smaller subproblems. The solution to the main problem depends on solutions to smaller subproblems, which in turn depend on even smaller subproblems, and these subproblems overlap. Now which technique can we apply to solve such a problem? Dynamic programming would be the right answer…

Let's begin the solving process. What say?... Let c[i,j] contain the length of
LCS( $X_i$ , $Y_j$ ), i.e. lcs( $X_i$ , $Y_j$ ). Using the observations made above we can write:

c[i,j] = 0 $\Rightarrow$ if i = 0 or j = 0
c[i,j] = c[i-1,j-1] + 1 $\Rightarrow$ if i,j > 0 and $x_i$ = $y_j$
c[i,j] = max{ c[i,j-1] , c[i-1,j] } $\Rightarrow$ if i,j > 0 and $x_i \neq y_j$

Notice that in this recursive formula we are restricting which subproblems to consider: when $x_i$ = $y_j$ we solve the subproblem LCS( $X_{i-1}$ , $Y_{j-1}$ ), and the other subproblems under the other conditions… Using the bottom-up approach we can find the LCS of all prefix pairs $X_i$ and $Y_j$, from shortest to longest. We iterate through these prefix pairs, filling a rectangular table with the information needed to continue and to eventually reconstruct an LCS. Consider another table b[i,j] which points to the subproblem used to compute c[i,j]. Combining the tables into one, we can find the solution.
Solving an example makes the matter clear once and for all.
Let X = [A, B, C, B, D, A, B] and Y = [B, D, C, A, B, A]. Writing X as the row header and Y as the column header, we calculate c[i,j] for all i,j along with b[i,j]. The first row in the table contains lcs( $X_0$ , $Y_j$ ), which is 0 everywhere. Filling the next row,
c[1,0] = 0
To find c[1,1]: here $x_1$ = B and $y_1$ = B, and since $x_1$ = $y_1$,
c[1,1] = c[0,0] + 1 = 1 $\Rightarrow$ since c[0,0] was used to compute c[1,1], b[1,1] points towards the cell c[0,0].



Going on to c[1,2]: here $x_1$ = B and $y_2$ = D. Since $x_1 \neq y_2$ we take the maximum:
c[1,2] = max{ c[0,2] , c[1,1] } = 1 $\Rightarrow$ c[1,1] was used to compute c[1,2], hence b[1,2] points to the cell c[1,1].
What if c[i-1,j] and c[i,j-1] turn out to be equal? Some follow the convention of making b[i,j] point to both cells c[i-1,j] and c[i,j-1]; here we will just choose one of them.
Proceeding this way we get the table shown above, which contains lcs( $X_i$ , $Y_j$ ) and the pointers.

But how do we get the LCS itself? Simple!! Just trace back from the bottom-right cell following the direction of the arrows. Wherever $x_i$ = $y_j$ the arrow is $\nwarrow$; that means the element in the $i^{th}$ position of X and the $j^{th}$ position of Y is common, so it definitely forms part of the LCS, and we print it out. Following the arrows and printing elements wherever we find $\nwarrow$ gives the LCS in reverse order; reverse it to get the LCS we are looking for. From the table above, LCS(X,Y) turns out to be [B, C, B, A] of length 4. There, that's how we do it. Here is an algorithm for the same.

LCS-LENGTH(X, Y)
1. m $\leftarrow$ length[X]
2. n $\leftarrow$ length[Y]
3. for i $\leftarrow$ 1 to m
4. $\quad$ do c[i, 0] $\leftarrow$ 0
5. for j $\leftarrow$ 0 to n
6. $\quad$ do c[0, j] $\leftarrow$ 0
7. for i $\leftarrow$ 1 to m
8. $\quad$ do for j $\leftarrow$ 1 to n
9. $\quad\quad$ do if $x_i$ = $y_j$
10. $\quad\quad\quad$ then c[i, j] $\leftarrow$ c[i − 1, j − 1] + 1
11. $\quad\quad\quad\quad$ b[i, j] $\leftarrow$ “ $\nwarrow$ ”
12. $\quad\quad\quad$ else if c[i − 1, j] $\geq$ c[i, j − 1]
13. $\quad\quad\quad\quad$ then c[i, j] $\leftarrow$ c[i − 1, j]
14. $\quad\quad\quad\quad\quad$ b[i, j] $\leftarrow$ “ $\uparrow$ ”
15. $\quad\quad\quad\quad$ else c[i, j] $\leftarrow$ c[i, j − 1]
16. $\quad\quad\quad\quad\quad$ b[i, j] $\leftarrow$ “ $\leftarrow$ ”
17. return c and b
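For readers who prefer running code, here is a small Python sketch of the same dynamic program (my own 0-indexed translation, not from any paper): it fills the c table bottom-up and then traces back from the bottom-right cell to recover one LCS.

def lcs(X, Y):
    """Return one longest common subsequence of X and Y via the c[i][j] table."""
    m, n = len(X), len(Y)
    # c[i][j] = length of an LCS of the prefixes X[:i] and Y[:j]
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Trace back, collecting matched elements in reverse order.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

print(lcs(list("ABCBDAB"), list("BDCABA")))  # ['B', 'C', 'B', 'A'], length 4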

If the sizes of the inputs are $m$ and $n$, then there are m $\times$ n entries in the table, and each entry takes O(1) time to compute. Printing out the LCS by retracing can be done in O(m + n) time, so overall the running time of this algorithm is O(mn) and it requires O(mn) space. If the input sizes m and n are equal, the algorithm is quadratic. We can reduce the asymptotic space requirement of LCS-LENGTH: notice that the table is computed row by row, each row depending only on the row above, which brings the space needed for the lengths down to linear. But if we want to reconstruct the LCS, then we would require the complete table occupying quadratic space. Luckily, Hirschberg has provided a linear-space algorithm which combines the dynamic programming we used above with the classic divide-and-conquer approach. If you are interested in knowing more, here is a link for the same.

http://www.cs.zju.edu.cn/people/yedeshi/Alinearspace-p341-hirschberg.pdf

LCS can be applied in comparing two DNA sequences for similarity, two English words for spelling, two Java files for repeated code, gas chromatography, bird-song analysis and so on. An equivalent problem to LCS is the "minimum edit distance" problem, where the legal operations are insert and delete. The minimum edit distance to transform X into Y is achieved by doing |X| − lcs(X,Y) deletes and |Y| − lcs(X,Y) inserts. It is the basis of the Unix "diff" command (a file comparison program that outputs the difference between two files), where X and Y are files and the elements of X and Y are lines of text.
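As a tiny illustration of that relation, here is a sketch of mine (the helper name lcs_length is my own) that recomputes the LCS length and derives the edit distance for the earlier MAN456 / 456SIN example:

def lcs_length(X, Y):
    """Length-only version of the LCS table from the previous section."""
    c = [[0] * (len(Y) + 1) for _ in range(len(X) + 1)]
    for i in range(1, len(X) + 1):
        for j in range(1, len(Y) + 1):
            c[i][j] = c[i-1][j-1] + 1 if X[i-1] == Y[j-1] else max(c[i-1][j], c[i][j-1])
    return c[len(X)][len(Y)]

X, Y = "MAN456", "456SIN"
L = lcs_length(X, Y)                                   # LCS is "456", so L = 3
print(len(X) - L, "deletes and", len(Y) - L, "inserts")  # 3 deletes and 3 inserts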



Remember our initial DNA strand problem of X=ACCGGTCGAGTGCGCGGAAGCCGGCCGAA and Y=GTCGTTCGGAATGCCGTTGCTCTGTAAA. Their LCS turns out to be GTCGTCGGAAGCCGGCCGAA


And what about our pal the chimpanzee?? How similar are we? :) I know most of you are waiting for that answer. Well, here it is: human and chimpanzee DNA are found to be about 96% similar... :) Now that's very close… So the next time someone shouts and calls you a chimpanzee or a gorilla in anger, they are almost right, so tell them "Yup, that's right!! Do you know HOW SIMILAR WE ARE??"









Shobitha Shetty

FACE TO FACE!!!

Facebook. It needs no introduction. It has become the grand face of the social networking era. From a school-going kid to a retired man, everyone is facebooking. Its home page is one of the most frequently visited webpages on the entire internet. Many brands make their biggest impact through this famous digital marketing tool. With its amazing applications, Facebook has acquired over 500 million users. So, what is the connection with 'Algorithinking'?

As soon as you log in, the news feed is the first page that you see. Ever wondered how the news feed shows you relevant content? Do you think it's some random event? If Facebook showed you every piece of content generated by your friends, well, you know what would happen. Especially when it comes to brands, Facebook ensures that specific brands are seen by the right people through news feeds. Facebook recently gave us some insight into the algorithm behind 'News Feed Optimization': the EdgeRank algorithm.

There are two distinct parts to Facebook's news feed: 1. 'Top news' and 2. 'Most recent'. The news feed that you get to see is, by default, the top news. According to Facebook, top news is "based on an algorithm [that] uses factors such as how many friends are commenting on a post to aggregate content that [a person] will find interesting. It displays stories based on their relevance rather than in chronological order."
On a basic level, there exists something called an object.
Facebook views all the inputs as "objects", and those "objects" may be status updates, video links, pictures, etc.
When you have an 'object' in your news feed and some other user interacts with it (say he/she comments on a status update), they are creating what is known as an Edge.

The EdgeRank formula:
$\sum_{\text{edges } e} U_e W_e D_e$
where, for each edge $e$, $U_e$ is the affinity score between the viewing user and the edge's creator, $W_e$ is the weight of that edge type, and $D_e$ is the time-decay factor.
The image below shows the news feed optimization expression; it is a snapshot of the algorithm, posted by a researcher at MIT.


There are three main components related to an Edge: affinity, weight and time.
Affinity: You will have more affinity towards some friends on Facebook than others. The more often you like, tag, comment on, view, and click on a friend's objects, the higher the so-called affinity score between you and that friend. Affinity is a one-way path: the more you interact with a friend's posts, the higher the probability that your news feed will show that friend's posts.
For example, I like The Beatles and I frequently visit their page and comment on their posts. So my news feed has their content because of the high affinity I have towards them on Facebook.

Weight: This measure deals with the value of an object's edges. The more edges an object has, the more its weight and so the higher its EdgeRank. One edge differs from another based on whether it was created by commenting on, liking or tagging the post.
It is natural that more weight is given to an edge that is a comment on the object than to one that is a like.

Time: Your news feed is most likely to contain today's news rather than yesterday's, hence Facebook adds this criterion to its EdgeRank. Affinity and weight are not the only criteria Facebook looks into, because time is a factor that keeps changing the way your pages look. For example, if Karnataka's CM was replaced on Wednesday and it was the top news everyone was commenting on, say it received some 500 comments, it still won't be on your news feed after a week despite the large number of comments. The 'time decay score' is the major factor playing its role here and the reason our news feed keeps showing the latest news.
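Just to make the formula concrete, here is a toy Python sketch of my own. The affinity numbers, the weights and the exponential decay are all made up for illustration, since the real scoring Facebook uses is not public; the sketch simply sums affinity times weight times time-decay over an object's edges.

import math

def edge_rank(edges, now_hours):
    """Toy EdgeRank: sum of affinity * weight * time-decay over an object's edges."""
    score = 0.0
    for e in edges:
        decay = math.exp(-(now_hours - e["created_hours"]) / 24.0)  # made-up decay curve
        score += e["affinity"] * e["weight"] * decay
    return score

# A status update with one comment from a close friend and two likes from acquaintances.
edges = [
    {"affinity": 0.9, "weight": 4.0, "created_hours": 1.0},   # comment, recent, high affinity
    {"affinity": 0.2, "weight": 1.0, "created_hours": 30.0},  # like, a day old
    {"affinity": 0.3, "weight": 1.0, "created_hours": 5.0},   # like, recent
]
print(round(edge_rank(edges, now_hours=32.0), 3))  # a single relevance score for ranking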

Now, is this really important?

If it weren't showing you relevant information, the chances of you missing something really important would be high. Or you might just become bored with all kinds of news, all the time, both relevant and irrelevant, which might become one of the reasons for Facebook's breakdown!

When it comes to business and brands, a user's affinity towards your brand is your key ingredient for success. If you want your brand's content on almost everyone's news feed, then you have to create content that earns a high EdgeRank on all three components.
A white paper has been written explaining Facebook's EdgeRank algorithm; you can refer to it for more information.
http://forms.buddymedia.com/whitepaper-form_facebooks-edgerank.html

Happy facebooking :-)


Shruti Ranade

Monday, August 1, 2011

In Quest of the Best !!


Jerry, a cute looking mouse, extremely intelligent (stupid at times!!), is eating cheese. He hears some crackling noise outside and goes out to check. He reads a note that says,
“ Dear Jerry,
Please take care of Tuffy over the Thanksgiving holiday. P.S : he loves to eat! “ :P

Jerry, being a little lazy, frowns at the idea of spending a nice holiday with Tuffy. A little later, "A nanny!!" he exclaims to himself, and thinks, "I can always hire a nanny as long as I have enough cheese to pay her ;)". But Jerry is a perfectionist. He wants to hire 'The Best' nanny for his dear nephew and is quite excited about the idea. But on what basis would he know who the best was? He had an answer for that too: "Whoever has scored the highest marks in 'Mouse-keeping' school will get this post." By the way, you have to know this: Jerry loves bossing around. Ah...ha…you guessed it right!! That is another reason why he wanted to 'hire' someone :P

Much to Jerry’s delight, he saw a bunch of them waiting for the post. Soon he murmured to himself, “Dude it is not going to be easy to choose the best”. He planned to do the following:

(Are you wondering "Is Jerry not smart enough to find out the highest scorer among the lot?" Well, let me tell you: Jerry has no clue about anyone's marks. Only when a candidate enters the room for the interview does he get to know the score, and right then she has to be either hired or rejected, with no other option left. Also assume that the 'Mouse-keeping' school doesn't reveal who the highest scorer is.)

Ask them to form a queue and let them in one by one in the order A, B, C and so on.
Blindly hire 'A' and 'pay' some cheese.
Look at 'B'. If 'B' has a score better than the previously chosen one, i.e. 'A', then he fires 'A' (along with which the paid cheese is gone :P) and hires 'B'. Not to forget, he has to pay 'B' cheese for hiring her.

He plans to do this until he has interviewed everyone, in order to find the best of the lot.

"Oh dear God!! Won't I run out of cheese if I do that? What if I feel restless in the middle of the whole process??" Yes… finally… he realizes how much he loves cheese and how lazy he is!! So he contacts his good old mathematician friend Mr.X to help him out with this problem.
Mr.X thinks over the problem a little and speaks to Jerry. The conversation goes like this.

Mr. X : I can help you if you are ready to compromise a little.
Jerry : Oh ! I don’t mind as long as it finds me The Best nanny and I get a larger share of the cheese.
Mr.X : Ok, I can't promise you that I will find the best nanny, but I promise to save a large amount of your cheese and get you a nanny closest to the best.
Jerry : Nice. Alright then, give out the solution.
Mr.X : Given $n$ members, find the best among the first $k$ of them by the hiring and firing process. Let us call her $\beta$. Hire $\beta$. Now find the first person among $k+1$ to $n$ who beats $\beta$ with her score, say $B1$. There you are!! She is the nanny you want to hire.
Jerry (with a weird look) : Really?? You think this is gonna work?? And how do I know $k$?
Mr.X : Well, $k$ = number of people attending the interview ($n$) divided by the exponential constant ($e$).
Jerry : Oh great !! I will try that out.
Hurray!! He verifies that the method actually works!! He hires $B1$ and she really does take very good care of Tuffy.

As usual, happy Ending of the story. :) :) :)

But wait a minute, what exactly happened behind the scenes? How was he so sure this would work out? Any miracle? By the way, who is this Mr.X?
Mr.X is none other than 'Probability'!!!!!

Let us see how probability finds us a solution for this!! Summarizing the method:

1. A $k$ is found. This $k$ is the point at which you can AFFORD to feel restless.
2. What is the use of this $k$?
Hire the best person $\beta$ in the range 1 to $k$ by the 'hiring and firing' process. Then find the first person in the range $k+1$ to $n$ who beats $\beta$; let us call her $B1$. There is a high chance of $B1$ being the actual best. If this works, then you save a lot of cheese and time, and yet you get the best or someone very close to the best. But how?? Perplexed??

The point to be noted is that the actual best has to occur after $k$ for this to work, and most of the time it does!! So proceeding this way is worth a try. But the question is: how do we find a $k$ for which this method works?

$B_i$ : probability of the best candidate occurring in the $i$th position, i.e. $\frac{1}{n}$
$W_i$ : probability of $\beta$ occurring in the right place, i.e. that the best of the first $i-1$ candidates falls within the first $k$ positions, which is $\frac{k}{i-1}$
$\beta$ can occur in any one of the first $k$ positions, and position $k$ has to occur before $i$.

Hence, for the method to succeed at position $i$, both these conditions need to be satisfied. As the two events are independent, the probabilities have to be multiplied; by the product rule, the probability of this happening is $B_i W_i$.

The position $i$ of the best has to occur after $k$, i.e. anywhere between $k+1$ and $n$, so we sum:

$\sum_{i=k+1}^{n}\; W_i B_i$

$\sum_{i=k+1}^{n}\frac{k}{i-1}\, \frac{1}{n}$

$=\ \frac{k}{n}\sum_{i=k}^{n-1}\; \frac{1}{i}$


We know that
$\ln n \approx 1\: + \frac{1}{2}\: + \frac{1}{3}\: + \frac{1}{4}+\cdots +\frac{1}{n}$

so the success probability is approximately
$f(k) = \frac{k}{n}\;\left ( \ln n - \ln k \right )$



(Click on the image to get a clear view)

A tangent is drawn at the peak. As the slope is 0, f'(k) is equated to 0 in order to find the value of k.

f '(k) = 0

f '(k) = $\frac{\ln n}{n} - \frac{1}{n}\left( k \cdot \frac{1}{k} + \ln k \right)$

$\frac{\ln n}{n} - \frac{1}{n} - \frac{\ln k}{n} = 0$
$\frac{1}{n} \left(\ln \frac{n}{k} - 1 \right) = 0$
$\ln \frac{n}{k} - 1 = 0$
$\ln \frac{n}{k} = 1$
$\frac{n}{k} = e$
$k = \frac{n}{e}$

Wow, is that it?? At $k = \frac{n}{e}$ there is a high probability of the chosen candidate being the 'Best' (or very close to it), which is our aim. You can afford to stop your hire-and-fire phase at $k$ and still not compromise much on your principle of finding the 'Best'.
If there are $n$ candidates for the interview, you can stop the hire-and-fire phase at $n/e$ and still find the best with high probability.
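Since the post already leans on Python-generated outputs, here is a small simulation sketch of my own (the random score generation and the "take the last one if nobody beats $\beta$" convention are assumptions matching the examples below): it estimates how often the $n/e$ rule hires the single best candidate, which classically comes out to roughly $1/e \approx 37\%$.

import math
import random

def ne_rule_picks_best(n):
    """One interview with random distinct scores; True if the n/e rule hires the best."""
    scores = random.sample(range(10 * n), n)   # n distinct marks in random order
    k = int(n / math.e)                        # observe the first k, hire none of them for keeps
    beta = max(scores[:k])                     # best seen in the observation phase
    for s in scores[k:]:
        if s > beta:                           # first one to beat beta gets hired
            return s == max(scores)
    return scores[-1] == max(scores)           # nobody beat beta: stuck with the last one

trials = 100000
wins = sum(ne_rule_picks_best(100) for _ in range(trials))
print(wins / trials)   # hovers around 0.37, i.e. about 1/e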


Not convinced?? Let us work out the given problem:
(A Python-generated output :P you can trust this :P)
Work this out on your own and tally it with the answers given; that way, you will get more acquainted with the method.
Let us say there are 100 candidates for the interview (assuming that there is only one 'Best'). Their marks are as follows (these marks are 'random'ly generated by Python):

Let's follow Jerry's method, i.e. the hiring and firing process. That way, the total number of hirings is 5. But even once you reach '100' you don't know whether that is the best or not; you end up interviewing everyone with the hope of finding someone better than the one you have already found.
The main aim of the proposed method is to avoid interviewing everyone and yet find the best one. Basically, what it tells you is that you can afford to get restless at $k$.
According to the method proposed,

1. K = n/e = 100/2.718 ≈ 36 (represented by the red line)
2. Number of hirings before k = 4
3. The obtained best within k = $\beta$ = 98
4. The 'Best' chosen after k = 100!!!! Which is the actual best!!

*************************************************************************************

Consider this case where you get a $B1$ closest to the actual best:
1. K = n/e = 100/2.718 ≈ 36 (represented by the red line)
2. Number of hirings before k = 6
3. The obtained best within k = $\beta$ = 91
4. The 'Best' chosen after k = 98 (represented with yellow)!!!! Which is closest to the actual best!!

*************************************************************************************
Worst case: if the actual best occurs within the range 1 to k, then the method fails.
1. K = n/e = 100/2.718 ≈ 36 (represented by the red line)
2. Number of hirings before k = 6
3. The obtained best within k = $\beta$ = 100 (Disaster :P)
4. The 'Best' chosen after k = 17 (represented with blue)!!!! If none of the values beats $\beta$ then by default the last value is chosen, to show that you have traversed the complete list.

*************************************************************************************

Probability at its 'Best' ;) Hope you have your eyebrows raised ;) If not, you need to go through it once again ;)

FYI: The class of algorithms where the input is fed in a serial fashion and you don't have prior knowledge of the entire input at the time of processing is called Online Algorithms. Also, this problem is the same as 'The Secretary Problem'. You can google it and take a look :) :)

Shruthi R Nayak



FOR THE SAKE OF PANCAKE!

It was 12 in the afternoon and I was badly hungry. "One more hour until lunch break!" I thought. One of the greatest teachers I strongly admire started off with his lecture on some concepts of theoretical computer science. As expected, we were all drowned in the world of his logic and were enjoying his lecture. It was as though my hypothalamus ceased to function and I forgot I was hungry. After a while he presented a problem to us that reminded me of my hunger once again!

It was about the yummiest south Indian food, the dosa. The problem goes like this. A chef in some famous south Indian diner is preparing lots and lots of dosas, all of different sizes, and is piling them up. But before delivering them, he wants to rearrange them such that the smallest one is on the top of the stack and the largest at the bottom. He has only one spatula, and he can insert it anywhere in the stack, lift the portion above the spatula, flip it, and place it back on the stack. If there are $n$ dosas, what is the minimum number of flips the chef needs to rearrange the stack? This is the Indian version of the famous pancake problem.



Jacob Goodman, writing under the name Harry Dweighter, posed the pancake problem in 1975. He proposed this scenario of a chef rearranging pancakes with the question of the minimum number of flips needed to obtain a correctly arranged stack. One might wonder about the need to formulate such a problem; it has applications in genetics, in sorting out genome sequences.


The simplest solution to this problem requires $2n-2$ flips for $n$ pancakes. This can be explained with an example. The chef can place his spatula below the largest pancake and flip the portion above, which brings the largest pancake to the top; he then flips the whole stack so that the largest ends up at the bottom. He can then do the same with the second largest pancake within the remaining stack, and so on for $n-1$ pancakes, because the last one left will be the smallest and it will evidently be on the top; hence no flip is required for the smallest. From this argument you can see that each pancake is associated with two flips.


$2(n-1)+ 0= 2n-2$


The $0$ in the above equation is the number of flips needed once only one pancake remains ($n=1$); for the remaining $n-1$ pancakes, the number of flips is $2(n-1)$, i.e. $2n-2$ flips in total.


The above upper bound can be reduced further to $2n-3$. Suppose only 2 pancakes are left: at most one flip is required to rearrange them. So if there are $n$ pancakes, apply the procedure mentioned above until only 2 remain, i.e. for $n-2$ pancakes, and this fetches you:


$2(n-2)+1=2n-3$ flips.


Here the $+1$ is for the final two pancakes (the $n=2$ case), and the remaining $n-2$ pancakes, each associated with 2 flips, account for $2(n-2)$ flips, giving a total of $2n-3$ flips.
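Here is a small Python sketch of that trivial procedure (my own illustration, not from any paper): it repeatedly flips the largest unplaced pancake to the top and then flips it down into place, skipping unnecessary flips, so it never uses more than $2n-3$ flips.

def pancake_sort(stack):
    """Sort so the smallest ends up on top (index 0); return (sorted stack, number of flips)."""
    stack = list(stack)          # stack[0] is the top pancake
    flips = 0

    def flip(k):                 # reverse the top k pancakes with one spatula move
        nonlocal flips
        stack[:k] = reversed(stack[:k])
        flips += 1

    for size in range(len(stack), 1, -1):    # place the largest remaining pancake each round
        pos = stack.index(max(stack[:size]))
        if pos != size - 1:                  # skip if it is already at the bottom of the unsorted part
            if pos != 0:
                flip(pos + 1)                # bring it to the top...
            flip(size)                       # ...then flip it down into position
    return stack, flips

print(pancake_sort([2, 5, 3, 7, 6, 1, 4]))   # sorted in 9 flips, within the 2n - 3 = 11 bound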


It occurred to some that the above algorithm can be improved and some attempts were made in the same.


In 1979, an algorithm was proposed by the very famous Bill Gates of Microsoft fame (perhaps the only technical paper Gates ever came up with!) and Christos H. Papadimitriou. Their sorting procedure came with an upper bound of $\frac{5n+5}{3}$ flips, and they also proved a lower bound of $\frac{17n}{16}$ flips.


Terminologies required to understand Gates’ and Papadimitriou’s algorithm:



$\sigma$: the permutation of the numbers from $1$ to $n$ that you are provided with. The numbers in this permutation signify the sizes of the pancakes: the larger the number, the larger the pancake.


For example, $\sigma$ = (5 1 4 2 3) denotes the arrangement of pancakes in which the largest, number 5, is on the top of the stack and the third largest is at the bottom, and so on. The aim is to rearrange it to get the identity permutation, which is $i$ = (1 2 3 4 5).


$S_n$ is the set of all permutations of numbers ranging from $1$ to $n$.



Adjacency: For a given stack of pancakes, an adjacency is an adjacent pair of pancakes in the stack such that there is no pancake whose size is intermediate between the two. If the largest pancake is at the bottom, this also counts as one extra adjacency.


For example, if $\sigma$ = (2 5 3 7 6 1 4), then (7,6) is an adjacency, because they are adjacent to each other in the given $\sigma$ and there is no pancake of size between 7 and 6.


Consecutive: Two pancakes are said to be consecutive if they are adjacent to each other in the identity permutation.

For example, given $\sigma$ = (3 5 1 2 4): 1 is consecutive to 2, 2 is consecutive to 3, 3 is consecutive to 4, 4 is consecutive to 5, and 5 is consecutive to 1.


A block: Sticking to the definition in the paper, a block is a maximal-length sublist $x=\sigma_i \sigma_{i+1}{.......}\sigma_{j-1} \sigma_j=y$ such that there is an adjacency between $\sigma_p$ and $\sigma_{p+1}$ for all $i\leq p < j$.


End points: The initial and final elements of the block are called end points.


Singleton: An element that does not belong to any block is called as a singleton.


For example, consider $\sigma$ = (3 4 5 1 6 7 2).

In the above permutation:

  1. (3 4 5) is a block, since each adjacent pair in it forms an adjacency, and so is (6 7). (6 7) is an adjacency as well as a block, since it is evident that every adjacency is also a block.

  2. 1 and 2 are singletons, since they do not belong to any block.
A table of cases is given in the paper whose link is below; you can refer to it for the cases used in what follows:
http://etd.ohiolink.edu/send-pdf.cgi/Armstrong%20Alyssa.pdf?wuhonors1242223287

In the following example, $B$ denotes the initial element of the permutation, i.e. the leftmost element. The separation between blocks and singletons is marked by the symbol $\_$. If $B$ begins a block, we denote that block as B$\sim$C. We do the same for the rightmost element of the initial block and denote the element consecutive to it as $D$; the element consecutive to $B$ itself (outside the block) is denoted $A$. All remaining elements between blocks and singletons are denoted by $\_$.

Example: let $\sigma=(3 4 1 5 2)$:




  • B=3

  • ( 3 4) is a block denoted by B$\sim$C.

  • A=2, since the element consecutive to 3 ( other than 4 which is already in the block) is 2.

  • D=5, since 4 is the rightmost element of the block and 5 is consecutive to 4 which is also a singleton.

  • So we denote this permutation as B$\sim$C$\_$D$\_$A.

Having these definitions in mind, we can proceed with the following example:


Let $\sigma \in S_7$, i.e., $\sigma$ = (2 3 4 7 6 1 5), and the identity permutation $i_7$ = (1 2 3 4 5 6 7):


According to Gates' algorithm:




  • Begins with the block (2 3 4); 2 is consecutive to 1, which is also a singleton. Hence, by case 4, we have
    B=2, A=1
    So, we reverse at 6 to get ( 6 7 4 3 2 1 5).

  • Now the permutation begins with the block (6 7). 6 is consecutive to 5, which is a singleton. So, by case 4:
    B=6, A=5:
    We reverse at 1 to get ( 1 2 3 4 7 6 5).

  • Now, the initial block is ( 1 2 3 4) and 1 is consecutive to 7 of the block ( 7 6 5). By case 5 of the table,
    B=1, A=7
    So, we reverse at 4 to get ( 4 3 2 1 7 6 5).

  • Now, we got a single block, in such cases we go for the trivial algorithm,
    The largest is 7, so reverse at 7 to get ( 7 1 2 3 4 6 5) and further reverse at 5 to get ( 5 6 4 3 2 1 7).

  • Again we have got a single block, apply the trivial algorithm,
    Reverse at 6 to get ( 6 5 4 3 2 1 7) and at 1 to get (1 2 3 4 5 6 7).
Thus, we got the identity permutation after 7 reversals.

Using this algorithm, Gates and Papadimitriou established an upper bound of roughly $1.667n$ flips. In 1997, Mohammad H. Heydari and I. Hal Sudborough improved the lower bound to $\frac{15n}{14}$ and computed the pancake numbers up to 13.



The above image shows the team of UT Dallas computer science students and their faculty adviser who improved upon a longstanding solution to the pancake problem.


An interesting variant of the problem, also proposed by Gates and Papadimitriou, concerns pancakes that are burnt on one side: here the pancakes must be sorted with the burnt side down.


Applications:



  1. The pancake problem is relevant to the construction of networks of parallel processors, in which pancake sorting can provide an effective routing algorithm between the processors.


  2. Analysis of genomes: In evolutionary processes, changes in DNA sequences can cause a new species to separate from an existing one, resulting in diversified life-forms. If you want to reconstruct such changes, you can look into the genome sequences of related species and look for events where a certain sequence was deleted and replaced with its reverse. Analyzing the transformation from one species to another is thus analogous to the problem of finding the shortest series of reversals.








KMP











Legendary computer scientist Donald Knuth, the 'K' of KMP.
"People think that computer science is the art of geniuses but the actual reality is the opposite, just many people doing things that build on each other, like a wall of mini stones."-Knuth
I never really gave a thought to string matching until, of course, I opened the 1313-page Cormen ebook and pressed Ctrl+F to search for "String Matching" :P. When you discover things like these you automatically start looking for them. So the next time I used the search option to look for a mail, or used the find-and-replace button, or wondered what PageRank would do if there had never been an algorithm for searching strings, KMP was always on my mind :-)
String matching: the name says it all. It is simply finding occurrences of a pattern in a given text. Why do we need this? Other than the things already mentioned, string matching also finds application in identifying patterns in a DNA sequence.
They say a problem well formulated is half solved. Keeping the solving part aside for a while, let's concentrate on the formulating part. We assume that the text is an array $T[1 .. n]$ of length $n$ and that the pattern is an array $P[1 .. m]$ of length $m$.
And we say pattern P occurs with shift $s$ in text T if

$P[1 .. m] = T[s+1 .. s+m]$
If P occurs with shift $s$ in T, then we call $s$ a valid shift; otherwise we call $s$ an invalid shift. The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
For most problems there is always a naive method of solving them; one can call it The Brute Force. But why call it naive? It's just the easiest way out and not necessarily (most of the time :P) the best. Using this method, to find the occurrences of the pattern we check all possible shifts: at each shift we compare characters one by one until a mismatch occurs, and if all the characters match we print the match.
Google goggles
Say the pattern here is “goggles”
We can easily instruct the computer to do the following
Shift 1-
Google goggles
goggles
Shift 2-Google goggles
goggles
Shift 3- Google goggles
goggles

Shift 4-Google goggles
goggles

Shift 5- Google goggles
goggles
Shift 6-Google goggles
goggles
Shift 7- Google goggles
goggles
Shift 8-Google goggles
goggles
Thus we say the pattern occurs at shift 8.
So what are we actually doing? Simply trying to match the pattern at all possible shifts. The algorithm is simple enough:

NAIVE-STRING-MATCHER(T, P)
1. $n = T.length$ // store the lengths
2. $m = P.length$
3. for s = 0 to n - m // try every possible shift
4. if $P[1 .. m] == T[s+1 .. s+m]$
5. print "Pattern occurs with shift" s
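A direct Python rendering of the naive matcher (my own sketch, using 0-indexed strings rather than the 1-indexed arrays above) looks like this:

def naive_string_matcher(T, P):
    """Report every shift s at which P occurs in T (0-indexed)."""
    n, m = len(T), len(P)
    for s in range(n - m + 1):          # one candidate shift at a time
        if T[s:s + m] == P:             # compare up to m characters
            print("Pattern occurs with shift", s)

naive_string_matcher("Google goggles", "goggles")  # shift 7 (i.e. shift 8 if you count from 1)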
Just take a moment and think here: what can be the worst case? If you can't think of one, take this example, modifying the first example a tiny bit:

T=Google google
P=Googles

In this case only the first shift changes. During the first shift you happily go on comparing as character after character matches, only to be disappointed at the end when the last character turns out to be a mismatch ('s' is compared against a space). *Damn!* We can thus conclude that the naive pattern-searching algorithm does not work well in cases where we see many matching characters followed by a mismatching character.
Hence, in the worst case we go through all the shifts, which is $n-m+1$, and in each shift we end up comparing all $m$ characters. The time complexity of this algorithm is therefore $O((n-m+1)m)$.
In 1987, two computer scientists fresh from their Turing Awards, Michael Rabin and Richard Karp, looked at this problem and gave an altogether different approach. Their main idea was to represent the data in a different way, using powerful computer science techniques like preprocessing and hashing. They converted both the pattern and the substrings of the text into numbers, so the task of string matching became simply checking equality between numbers: the algorithm matches the hash value of the pattern with the hash value of the current substring of the text, and only if the hash values match does it start matching individual characters. Though the problem seems simplified after the preprocessing, the main hinge is the time taken to generate the hash values for the substrings of the text. Nevertheless, its average-case running time is better, though its worst case turns out to be the same as the naive method.
How can we say there is scope for improvement over the naive method?
From our example, we know that at shift 1 the first two characters "Go" match. Can you not then say confidently that shift 2 won't work? And this was just a small example. When dealing with thousands of characters to match a pattern against, you can imagine the information wasted every time a mismatch occurs with this method. The question that pops up now is whether it is possible to skip some shifts when we are confident from the previous shifts that they won't work.
You must have heard the proverb "The only real mistake is the one from which we learn nothing", or in this case it should be "The only real mistake is a mismatch from which we learn nothing :P".
It is this ingenious idea that led to what we call KMP.

It was first published by Donald E. Knuth, James H. Morris and Vaughan R. Pratt as "Fast Pattern Matching in Strings" in the SIAM Journal on Computing.
So how do we go about formulating this idea? What they asked was just this: given that pattern characters P[1 .. q] match text characters T[s+1 .. s+q],

what is the least shift s' > s such that for some k < q,
P[1 .. k] = T[s'+1 .. s'+k]?
You first take the pattern and calculate the value of $k$ for all cases: when the first character matches, when the first two characters match, when the first three match, and so on, up to when all the characters match. Now we need to store this information for reference. Since we are not going to modify the data and access should be easy, we go for an array data structure. To make it appear more refined, let's call this array the prefix function. Come on, let's calculate the prefix function for the following.
P=Goggles

A[1] = the case where only the first character matches. Here we just shift by one position:
$q-k=1$
$1-k=1$
$k=0$

A[2] = the first two characters "go" match. Now we are sure that shifting by one position won't work, as g would be matched against o:
$2-k=2$
$k=0$
A[3] = when "gog" matches. Though shifting by one won't work, shifting by 2 may, as it matches g with g:
$3-k=2$
$k=1$
Similarly,
A[4]=1
A[5]=0
A[6]=0
Formulating the previous example(according to the algo below)
Shift 1- Google goggles
goggles
2 characters match, so A[2] = k = 0.
Shift by $q-k$ positions, where $q$ is the number of characters matched:
2 - 0 = 2 positions.
Shift 2-Google goggles
goggles
No characters match; simply move to the next shift.
Shift 3-Google goggles
goggles
1 character has matched, and A[1] = 0.
1 - 0 = 1
Shift by 1.
Shift 4-Google goggles
goggles
No characters match; move to the next.
Shift 5-Google goggles
goggles.
Shift 6-Google goggles
goggles
And we are done :)
We can infer one thing: the best case is when $k$ turns out to be 0. That is, say $q$ characters have matched before the mismatch and $k$ turns out to be zero; then we can jump straight from shift s to shift s+q, skipping everything in between. Look at shift 1 in the above example, for instance.
Given the trivial example above, one may argue that both methods are the same, but imagine the larger scenario where we skip all the wasteful shifts.
There are essentially two parts to the algorithm: one does the necessary preprocessing and the other does the matching.

KMP-MATCHER(T, P)
1. $n = T.length$
2. $m = P.length$
3. A = COMPUTE-PREFIX-FUNCTION(P) // returns the array containing the values of k
4. $q = 0$ // number of characters matched so far
5. for i = 1 to n // scan the text from left to right
6. while q > 0 and $P[q+1] \neq T[i]$
7. $q = A[q]$ // next character does not match
8. if $P[q+1] == T[i]$
9. q = q + 1 // next character matches
10. if q == m // is all of P matched?
11. print "Pattern occurs with shift" i - m
12. q = A[q] // look for the next match

COMPUTE-PREFIX-FUNCTION(P)
1. $m = P.length$
2. let A[1 .. m] be a new array
3. $A[1] = 0$
4. $k = 0$
5. for q = 2 to m
6. while k > 0 and $P[k+1] \neq P[q]$
7. k = A[k]
8. if $P[k+1] == P[q]$
9. k = k + 1
10. A[q] = k
11. return A

In the first part we take the pattern and compute the values needed; this is done in O(m) time.

Now give a thought to the second component. We learn from each mismatch and shift accordingly. What happens when the worst-case scenario of the naive method occurs, i.e. almost all but the last character of the pattern matches? We shift by the ideal amount without wasting time on useless shifts. Thus the matching can be done in at most O(n) time.
Total time complexity = O(m+n), compared to the worst case of the naive method, i.e. O(mn).
Reducing from quadratic to linear time!!! Isn't that awesome :-)
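If you would like to play with it, here is a compact Python sketch of both pieces (my own 0-indexed translation of the pseudocode above, so the prefix values and the reported shifts start from 0):

def compute_prefix_function(P):
    """A[q] = length of the longest proper prefix of P that is also a suffix of P[:q+1]."""
    m = len(P)
    A = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = A[k - 1]            # fall back to the next shorter border
        if P[k] == P[q]:
            k += 1
        A[q] = k
    return A

def kmp_matcher(T, P):
    """Print every 0-indexed shift at which P occurs in T."""
    A = compute_prefix_function(P)
    q = 0                           # number of pattern characters matched so far
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = A[q - 1]            # reuse what the mismatch taught us
        if P[q] == c:
            q += 1
        if q == len(P):             # all of P matched
            print("Pattern occurs with shift", i - len(P) + 1)
            q = A[q - 1]            # keep looking for further matches

kmp_matcher("Google goggles".lower(), "goggles")  # shift 7 (lowercased to mirror the example)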
A great idea, and the world follows; that's just what happened here. If you think you too would have thought that way, then go ahead: there are plenty of research problems out there waiting to be solved, and if you succeed, who knows, I may end up blogging about your algorithm too one day :)


Nikitha Shenoy