Wednesday, July 27, 2011

CHAIN MATRIX MULTIPLICATION

“Neo: What is the Matrix?
Trinity: The answer is out there, Neo, and it's looking for you, and it will find you if you want it to’’… Do you guys remember this famous quote from the movie ‘The Matrix’? Yes? No?
Anyway, if Neo had bothered to ask us the same question instead of asking Trinity, most of us would have given the answer we have known since 4th grade… ‘A matrix is a rectangular arrangement of elements in rows and columns. These elements can be numbers, expressions or symbols’… Don’t you agree?
Among the various operations that can be applied to matrices, let us concentrate on matrix multiplication. Given 2 matrices, everybody knows how to multiply them. If we need to multiply matrix A with matrix B, the first thing we do is check whether the number of columns in A equals the number of rows in B. If not, the multiplication is invalid. Each entry of the product AB is nothing but the dot product of the corresponding row of A with the corresponding column of B. Here is an algorithm for matrix multiplication.


MATRIX_MULTIPLICATION(A,B)
1.if columns(A) != rows(B)
2.$\hspace{0.2in}$ return “invalid”
3.else for i $\leftarrow$ 1 to rows(A)
4.$\hspace{0.2in}$$\hspace{0.2in}$for j $\leftarrow$ 1 to columns(B)
5.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$do C[i, j] $\leftarrow$ 0
6.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$for k $\leftarrow$ 1 to columns(A)
7.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$do C[i, j] $\leftarrow$ C[i, j] + A[i, k] $\times$ B[k, j]
8.return C
The basic operation in the above algorithm is in the innermost ‘for’ loop,
i.e. C[i, j] = C[i, j] + A[i, k] $\times$ B[k, j], which contains both an addition and a scalar multiplication. Since multiplication is costlier than addition, we can safely say that the cost of multiplying 2 matrices depends on the number of such scalar multiplications. So what is the cost? Here is how you find it. If A is a matrix of order p$\times$q and B is of order q$\times$r, then to find each C[i, j] you have to perform q scalar multiplications [True, isn’t it? Check out the innermost for loop. It executes columns(A) times, which is nothing but q times], and there are p$\times$r elements in the product C. Hence in total there are pqr scalar multiplications, so the cost of computing the product AB is pqr.
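The pqr count is easy to check by running the algorithm above and tallying the multiplications as they happen. Here is a minimal Python sketch (the name `matmul_count` is mine, not from the post):

```python
def matmul_count(A, B):
    """Multiply a p x q matrix A by a q x r matrix B (lists of lists).

    Returns the product C and the number of scalar multiplications,
    which should always equal p * q * r."""
    p, q = len(A), len(B)
    r = len(B[0])
    if len(A[0]) != q:
        raise ValueError("invalid: columns(A) != rows(B)")
    C = [[0] * r for _ in range(p)]
    mults = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):          # innermost loop: q multiplications per C[i][j]
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

A = [[1, 2, 3], [4, 5, 6]]              # 2 x 3
B = [[1, 0], [0, 1], [1, 1]]            # 3 x 2
C, mults = matmul_count(A, B)
print(C, mults)                          # [[4, 5], [10, 11]] 12  (= 2*3*2)
```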
This is when you have 2 matrices to be multiplied. What if you have a sequence of, say, ‘N’ matrices to be multiplied? Given a chain of matrices ABCD, how do you multiply them? Some of us may start with whichever pair looks easiest, some may just multiply the matrices in the order they have been presented, and others may follow some other pattern. Group them any way you like, it doesn’t matter (“do you agree? Please remember your answer to this”), matrix multiplication is associative. Multiply it either as (AB)(CD) or (A(B(CD))) or (((AB)C)D) or ((A(BC))D) or (A((BC)D)). We get the same product.

Now check this out. Consider the problem of multiplying the chain ABC where A is of order 10$\times$100, B is of order 100$\times$5 and C is of order 5$\times$50. We can multiply ABC by grouping it in 2 different ways, (A(BC)) and ((AB)C). People always say to try applying what you have studied. Come on, let us give it a shot. A little while back we learnt that the cost of multiplying 2 matrices is pqr. Remember!!! Applying this to find the cost, or number of scalar multiplications, of (A(BC)) we get the following:
1. Cost of multiplying BC is 100 $\times$ 5 $\times$ 50 = 25000
2. Next matrix A of order 10 $\times$ 100 is multiplied with product BC of order 100 $\times$ 50 and the cost turns out to be 10 $\times$ 100 $\times$ 50 = 50000
3. Total cost of multiplying (A (BC)) is : 25000+50000 = 75000
Following the same procedure the cost of multiplying ((AB)C) is:
10 $\times$ 100 $\times$ 5 + 10 $\times$ 5 $\times$ 50 = 7500.
Did you see!!! The same chain of matrices multiplied to obtain the same product, but the cost of multiplying (A(BC)) is 10 times higher than the cost of multiplying ((AB)C). Everyone who agreed earlier that the grouping doesn’t matter, you were wrong. It does matter, and now you know why. [There are 2 ice-creams, same flavour, same amount, same everything, just that the chocolate layer in one comes before the vanilla layer in the other, and the two cost differently. Which one do you choose? P.S: Imagine both combos give the same yummy taste.]
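The two totals above take only a couple of lines to reproduce. A quick Python check (variable names are mine):

```python
def cost(p, q, r):
    # Scalar multiplications needed to multiply a p x q matrix by a q x r matrix.
    return p * q * r

# A is 10x100, B is 100x5, C is 5x50.
# (A(BC)): first BC (100x5 times 5x50 -> 100x50), then A times that product.
cost_a_bc = cost(100, 5, 50) + cost(10, 100, 50)   # 25000 + 50000
# ((AB)C): first AB (10x100 times 100x5 -> 10x5), then that product times C.
cost_ab_c = cost(10, 100, 5) + cost(10, 5, 50)     # 5000 + 2500
print(cost_a_bc, cost_ab_c)                         # 75000 7500
```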





The chain-matrix multiplication problem is nothing but this: given a chain <$A_1$,$A_2$,…$A_n$> of n matrices, where for i = 1, 2, …, n the matrix $A_i$ has dimensions $p_{i-1}$ $\times$ $p_i$, fully parenthesize the product $A_1$$A_2$…$A_n$ so that the number of scalar multiplications is minimized. So now we have to find the optimal way of parenthesizing the matrices so that the number of scalar multiplications, and hence the cost, is minimum. The time spent on finding the optimal parenthesization is well repaid by the time saved later on the actual multiplication. How do we find the optimal parenthesization? The first idea that strikes our mind is to find all the different ways of grouping, or parenthesizing, the chain of matrices and to find the cost in each case; then choose the grouping with the minimum cost. That sounds right, and it is right. So let us find out the number of ways of grouping the chain of matrices. Let n be the matrix count. If P(n) represents the number of ways of grouping, then

P(1) = 1 [Because there is only one matrix and there is only one way to parenthesize it]

Now consider ABC. You can parenthesize it as (A(BC)), wherein you are actually parenthesizing 2 sub-products A and BC, or as ((AB)C), wherein the sub-products are AB and C. You can notice that we are always splitting the chain into 2 sub-products. Let us call this split position ‘k’, where for $A_{i}$...$A_{j}$ we have i $\leq$ k < j. For n=3, the split can happen at position 1 or 2; therefore k = 1, 2 for n=3. After splitting into sub-products, each sub-product is again a chain of matrices which again needs to be parenthesized, and the process goes on recursively. To generalize, the recurrence relation is:

P(1) = 1 for n = 1

P(n) = $\sum_{k=1}^{n-1}$ P(k)P(n-k) for n $\geq$ 2

[Splitting at position k leaves a chain of k matrices on the left and n-k matrices on the right, and the two halves are parenthesized independently.]

The solution to the above relation turns out to be $\Omega$($2^n$). This means the number of ways of grouping is exponential in ‘n’, so if we apply the first idea that struck our mind (hope you remember it… i.e. finding all possible groupings), which is nothing but the brute-force method, we won’t end up with an efficient algorithm. There must be some other way out.

Let us look a little deeper into the grouping method. As mentioned before, by parenthesizing we get 2 sub-products which again need to be grouped, and that grouping should also be optimal. That is, while grouping we get sub-problems, and each sub-problem must be solved by treating it as another chain-matrix multiplication problem. So can we say that an optimal solution to the problem is obtained by finding optimal solutions to the sub-problems? You bet we can!!! Let us keep this as one of the observations.

OK, moving on to the next observation. Given ABCD, if we apply the brute-force method then we have to check the cost of all possible groupings of ABCD. Say we first try (AB)(CD). This includes the cost of computing AB and CD. Next we try (((AB)C)D). This again includes finding the cost of multiplying AB!!
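To see the blow-up for yourself, here is a small Python sketch that counts the groupings for small n using the split-position idea (split a chain of n matrices after matrix k, then parenthesize the two halves independently); the function name `P` mirrors the notation above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P(n):
    """Number of ways to fully parenthesize a chain of n matrices.

    P(1) = 1; P(n) = sum over split positions k of P(k) * P(n - k)."""
    if n == 1:
        return 1
    return sum(P(k) * P(n - k) for k in range(1, n))

print([P(n) for n in range(1, 7)])   # [1, 1, 2, 5, 14, 42]
```

These are the Catalan numbers, which grow exponentially in n, confirming that trying every grouping is hopeless for long chains.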
Instead of recalculating it, we can minimize the work if we just store the already computed value… [I think by now most of you know which method to apply to solve this problem $\smile$]. So, as we can observe, there are only a few distinct sub-problems. Let’s pen down the observations:

1. An optimal solution to the problem is obtained by finding optimal solutions to the sub-problems.
2. There are only a few distinct sub-problems.

From the above discussion and observations we can conclude that dynamic programming can be applied to solve the chain-matrix multiplication problem… Store the calculated values, find the cost for different values of k, and choose the k that results in the minimum cost… There you go. That is the main funda.

For 1 $\leq$ i $\leq$ j $\leq$ n, let m[i, j] denote the minimum number of scalar multiplications required to find the product $A_{i...j}$ [from now on the product of $A_{i}$....$A_{j}$ is denoted $A_{i...j}$]. If the final optimal solution of $A_{i}$....$A_{j}$ involves splitting into $A_{i}$....$A_{k}$ and $A_{k+1}$…$A_{j}$, then the total minimum cost equals the sum of the minimum costs of parenthesizing $A_{i...k}$ and $A_{k+1...j}$, plus the cost of multiplying $A_{i...k}$ by $A_{k+1...j}$. That multiplication costs $p_{i-1}$$p_k$$p_j$, where $A_i$ has dimensions $p_{i-1}$ $\times$ $p_i$. Based on this we can develop the following recurrence relation:

m[i, j] = 0 if i = j [since there is only one matrix, no multiplication is required]

m[i, j] = $\underset{i \leq k < j}{min}$ {m[i, k] + m[k+1, j] + $p_{i-1}$$p_k$$p_j$} otherwise.

For a particular i and j there are j − i different values of k. Out of these, one k gives the minimum cost, which is stored in m[i, j]. But then we lose the value of k at which the split or grouping should occur for the cost to be least of all. So let us also keep a table s[i, j] which stores the value of k for which m[i, j] is minimum.
Using the knowledge we have gained so far, let us try building an efficient algorithm using dynamic programming that computes the optimal cost in tabular, bottom-up fashion. [Try to be like Uncle Scrooge when you are developing an algorithm. Pick the one which costs the least $\smile$. Just as in the ice-cream example.]

We will require 2 tables, one for m[i, j] and another for s[i, j]. The table should be filled in such a manner that when m[i, j] is calculated using the recurrence relation, m[i, k] and m[k+1, j] have already been calculated. In both cases the corresponding chain lengths are less than j − i + 1. Hence, the algorithm should fill the table in increasing order of chain length.

As is always the case, however much someone blabbers, nothing explains as well as examples do... So consider the problem of multiplying the following matrices with these dimensions:

• $A_1$: 30 $\times$ 35
• $A_2$: 35 $\times$ 15
• $A_3$: 15 $\times$ 5
• $A_4$: 5 $\times$ 10
• $A_5$: 10 $\times$ 20
• $A_6$: 20 $\times$ 25

Based on what was said above, [please be sure that you understood the above points. If not, just go back and read the above para… Come on, don’t be lazy. Off you go… $\smile$] to find the optimal grouping of $A_1$$A_2$$A_3$$A_4$$A_5$$A_6$ we compute m[i, j] and the corresponding s[i, j] in the following order:

m[1,1], m[2,2], m[3,3], …, m[6,6]
m[1,2], m[2,3], …, m[5,6]
m[1,3], …, m[4,6]
m[1,4], …, m[3,6]
m[1,5], m[2,6]
m[1,6]

If we start filling the table in this order, every m[i, i] is 0, and together they make up one row. Next comes m[1,2]:

m[1,2] = m[1,1] + m[2,2] + $p_0$$p_{1}$$p_{2}$ = 0 + 0 + 15750 $\Rightarrow$ [notice that m[1,1] and m[2,2] are taken from the table]
s[1,2] = 1 $\Rightarrow$ [only choice]

Similarly we compute all sub-products of chain length 2. Next we go for sub-products of chain length 3, thus filling the next row of table m, and so on, until the table reaches its final form as shown below.


The table stores the cost of computing each sub-product along with the corresponding split position, so there is no need for recalculation. What we wanted is the minimum cost of multiplying the given 6 matrices, and it is nothing but
m[1,6]= 15125 with split at position s[1,6]=3.
So given $A_1$$A_2$$A_3$$A_4$$A_5$$A_6$ how do you parenthesize it? Check out the table s[i, j]
s[1,6]=3 $\Rightarrow$ ( $A_1$$A_2$$A_3$ ) ( $A_4$$A_5$$A_6$ )
s[1,3]=1,s[4,6]=5 $\Rightarrow$ ( $A_1$ ( $A_2$$A_3$ ) ) ( ( $A_4$$A_5$ ) $A_6$)
Done!!! That’s how you do the grouping. Wasn’t that simple!!!
Now we are left with only one job… writing the algorithm. I think you guys can easily build it up using the concepts above. Anyway, the algorithm is given below.
MATRIX-CHAIN-ORDER(p) [ p = <$p_0$, $p_1$, …, $p_n$> is the array of dimensions; n = length(p) − 1 ]
1.for i $\leftarrow$ 1 to n
2.$\hspace{0.2in}$do m[i, i ] $\leftarrow$ 0
3.for d $\leftarrow$ 2 to n [ d is the chain length]
4.$\hspace{0.2in}$do for i $\leftarrow$ 1 to n − d + 1
5.$\hspace{0.2in}$$\hspace{0.2in}$do j $\leftarrow$ i + d − 1
6.$\hspace{0.2in}$$\,$m[i, j ] $\leftarrow$ $\infty$
7.$\hspace{0.2in}$$\hspace{0.2in}$$\,$for k $\leftarrow$ i to j − 1
8.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$do q $\leftarrow$ m[i, k]+m[k + 1, j ]+$p_{i-1}$$p_k$$p_j$
9.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$$\,$if q < m[i, j ]
10.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$then m[i, j ] $\leftarrow$ q
11.$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$$\hspace{0.2in}$$\,$s[i, j ] $\leftarrow$ k
12.return m and s

The time complexity of this algorithm is O($n^3$), which is way better than the exponential time taken by the brute-force method. Voila!!! Uncle Scrooge will be way too happy with this business!!
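The pseudocode above translates almost line for line into Python. Here is a sketch that runs the worked example from this post, plus a small helper (my own addition, not in the pseudocode) that rebuilds the grouping from the s table:

```python
import math

def matrix_chain_order(p):
    """Bottom-up chain-matrix ordering. p[i-1] x p[i] are the dimensions
    of matrix A_i; there are n = len(p) - 1 matrices in the chain.

    Returns m (minimum scalar-multiplication costs) and s (best split points),
    both indexed from 1 as in the pseudocode."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for d in range(2, n + 1):                # d is the chain length
        for i in range(1, n - d + 2):
            j = i + d - 1
            m[i][j] = math.inf
            for k in range(i, j):            # try every split position
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def parenthesize(s, i, j):
    # Rebuild the optimal grouping of A_i..A_j from the split table s.
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + parenthesize(s, i, k) + parenthesize(s, k + 1, j) + ")"

p = [30, 35, 15, 5, 10, 20, 25]              # the six matrices from the example
m, s = matrix_chain_order(p)
print(m[1][6])                               # 15125
print(parenthesize(s, 1, 6))                 # ((A1(A2A3))((A4A5)A6))
```

The three nested loops make the O($n^3$) running time plain to see, and the output matches the table results above: cost 15125 with the top-level split at k = 3.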

So here was a brief [OK, actually not so brief…] explanation of chain matrix multiplication. Hope it was helpful. So remember, it is good to be like Uncle Scrooge while building an algorithm. Happy multiplying!!!! $\smile$












Shobitha Shetty


7 comments:

  1. A great start !! :) Nice that you have explained from the scratch..:) If a person reading this knows that something called 'Matrix' exists, then he is eligible to read and understand this :)
    First look at the blog made me feel it is too lengthy..but as i went on reading, i didn't realise that i had already finished half the content.. :) There was a flow..beautifully woven :)
    Ordering of the matrices matters when it comes to matrix multiplication --- Nicely proved :)
    Is this an example of divide and conquer? If yes, include it in the title of your blog :)
    Can make the pictures more interesting :)
    Formatting and Enumeration can be done to make it more attractive :)
    Sir was right in telling that you have good presentation skills ;)
    The blog was interactive :)
    Phrase of the blog : How much ever someone blabbers nothing explains as best as examples do :)

  2. matrix movie fans are sure to love this blog and i am a fan too :) :)A clear explanation of all basic concepts starting from what a matrix is.nice example of chocolate and vanilla ice-creams :P
    "Try to be like uncle scrooge", funny :)
    Quoting nayak,"First look at the blog made me feel it is too lengthy..but as i went on reading, i didn't realise that i had already finished half the content.. :) "...
    as you have mentioned the examples did make a clear way for understanding
    From exponential to n-cubed...is definitely this blogs high point :)

  3. A very nice start:)This blog refreshed the memory of the talk you had given.Nicely shown how the grouping matters.loved that icecream example given.
    hadnt realized there was already one blog on dyanamic programming before reading this :)
    One more picture towards the end and perhaps a little formatting would just make it perfect:)
    and very nice explanation with all the examples.can see the amount of effort taken:)
    "happy multiplying"-Neat! :)

  4. @nayak,shruti,niki: thnks for ur frank inputs..really appreciate it..vill update d blog accordingly..:)
    @nayak:whether its an example of divide and conquer..its not because the solution is arrived at by taking optimal substructure and overlapping property..:) So it comes under dynamic programming..:)

  5. @shobitha

    article was great.

    i first saw this thing on the 'Cormen' book.
    then i thought..haven't we had enough of matrices to last an entire lifetime...?

    and then i promptly forgot about it.

    this article,..lengthy though it was,..has revoked my interest in matrices. they are an amazing field.
    thanks to your algo,..i am starting to read stuff about matrix again. finished pretty much everything in Cormen related to it.

    your neo conversation was a good prelude..but a little inaccurate.

    in other words, short words....this article rocks.

    good work,
    chetan.

  6. @chetan : Thnks a lot..:) i do agree its lengthy.. :) gud to hear about ur regaining interest in matrix.. :) keep goin .. :)

    1. Can you share your email id i want to discuss something with you. please. my email id h.farhad007@gmail.com
