Wednesday 27 January 2016

Basic Question of Dynamic Programming : Matrix Chain Multiplication

Matrix Multiplication

A quick recap of what matrix multiplication is (although you should know it already :P ) :






For R (p x q) times T (q x s), each entry of the product M is a row of R dotted with a column of T, e.g. :

M11 = r11×t11  +  r12×t21  +  r13×t31
M21 = r21×t11  +  r22×t21  +  r23×t31



So basically, given two matrices A of dimension p x q and B of dimension r x s, they can be multiplied iff q = r. The resultant matrix C has dimension p x s.

The time to compute the matrix C is O(p*q*s): to get one entry of C we multiply the q elements of a row of A with the q elements of a column of B, which takes q time. One row of A is multiplied with each of the s columns of B, which takes q*s time. As the number of rows in A is p, this happens p times. Hence the total complexity is p*q*s.
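In code, that cost is just the classic triple loop — a minimal sketch (int entries assumed, function name is mine):

```cpp
#include <vector>

// A is p x q, B is q x s: each of the p*s entries of C costs
// q multiply-adds, giving the p*q*s total from the text.
std::vector<std::vector<int>> multiply(const std::vector<std::vector<int>>& A,
                                       const std::vector<std::vector<int>>& B) {
    int p = A.size(), q = B.size(), s = B[0].size();
    std::vector<std::vector<int>> C(p, std::vector<int>(s, 0));
    for (int i = 0; i < p; i++)               // each of the p rows of A
        for (int j = 0; j < s; j++)           // each of the s columns of B
            for (int k = 0; k < q; k++)       // q multiply-adds per entry
                C[i][j] += A[i][k] * B[k][j];
    return C;
}
```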


MATRIX CHAIN MULTIPLICATION :


What the hell is this "chain"?


Let's say there are n matrices, where the dimension of the ith matrix is p(i-1) x p(i). It's something like this :

1st :  p0 x p1
2nd : p1 x p2
3rd : p2 x p3
4th : p3 x p4
...

You get the basic idea.

So observe that you can multiply any ith matrix with the (i+1)th matrix, because the number of columns of the ith is the same as the number of rows of the (i+1)th.
Matrix chain multiplication is multiplying all these matrices together to give a final matrix of dimension p0 x pn.

M1 x M2 x M3 x M4... 

As only two matrices can be multiplied at one time, there can be many orders in which these can be multiplied. Like if we have 4 matrices, there are these 5 combinations :

( M1( M2( M3M4 ) ) )
( ( M1( M2M3) ) M4 )
( M1 ( (M2M3) M4) )
( (M1M2) (M3M4) )
( ( (M1M2) M3 ) M4 )

The cost of computation of each of these combinations will be different and obviously we want the minimum cost.

The number of combinations can be computed as such :

P(n)    = 1                                              , if n = 1
        = summation of ( P(k)P(n-k) ),   k = [1, n-1]    , if n > 1

If we compute all the combinations of parentheses, the count grows exponentially (these are the Catalan numbers, roughly 4^n / n^(3/2)), which would take billions of years to run for large values of n.
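Here's a quick sketch of that recurrence in code (the function name countParens is mine), computing P(n) bottom-up:

```cpp
#include <vector>

// The recurrence above: P(1) = 1 and
// P(n) = sum of P(k)*P(n-k) for k = 1..n-1 (the Catalan numbers).
long long countParens(int n) {
    std::vector<long long> P(n + 1, 0);
    P[1] = 1;
    for (int m = 2; m <= n; m++)        // build up from smaller chains
        for (int k = 1; k < m; k++)     // split the chain of m matrices at k
            P[m] += P[k] * P[m - k];
    return P[n];
}
```

For example, P(4) = 5: there are exactly 5 ways to parenthesize 4 matrices.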


So Here's where Dynamic Programming kicks in like a boss!


The 2 conditions for DP are optimal substructure and overlapping subproblems.
Keeping this in mind, we approach the problem using DP.

0. To calculate A i..j optimally, we divide it into 2 subproblems and recursively find the solution.

1. Consider a range from i to j, say A i..j. We break it into 2 parts and multiply them, i.e. first compute A i..k and A k+1..j, then multiply those to get A i..j.

A i..j = (A i..k) * (A k+1..j)

2. For some value of k between i and j-1, the cost of A i..k * A k+1..j will be minimum. To find that minimum, we calculate the cost for each k from i to j-1.


3. Now finding A i..k is a subproblem of the same form, and can similarly be solved by recursion, just as we solved A i..j: break it into parts A i..r and A r+1..k and take the minimum of the possibilities.

Recursify it !

 
So if we say m[i..j] stores the minimum cost of computing A i..j, then we can say :

m[i..j] = min over k in [i, j-1] of ( m[i..k] + m[k+1..j] + p(i-1)*p(k)*p(j) ),   with m[i..i] = 0

Store the best k for each pair i..j in s[i..j], so the optimal parenthesization can be rebuilt later.

So simple! isn't it? :P
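To make the m and s tables concrete, here's a sketch that fills both bottom-up and then rebuilds the best parenthesization from s (the names optimalParens and buildParens are mine, not from the post; same p[] convention as above):

```cpp
#include <climits>
#include <string>
#include <vector>

// m[i][j] = minimum cost of A i..j; s[i][j] = the k that achieved it.
// Matrix i has dimensions p[i-1] x p[i], so p holds n-1 matrices' dims.
std::string buildParens(const std::vector<std::vector<int>>& s, int i, int j) {
    if (i == j) return "M" + std::to_string(i);       // a single matrix
    int k = s[i][j];                                  // best split point
    return "( " + buildParens(s, i, k) + " " + buildParens(s, k + 1, j) + " )";
}

std::string optimalParens(const std::vector<int>& p) {
    int n = p.size();
    std::vector<std::vector<int>> m(n, std::vector<int>(n, 0));
    std::vector<std::vector<int>> s(n, std::vector<int>(n, 0));
    for (int L = 2; L < n; L++) {                     // chain length
        for (int i = 1; i <= n - L; i++) {
            int j = i + L - 1;
            m[i][j] = INT_MAX;
            for (int k = i; k < j; k++) {             // try every split
                int q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (q < m[i][j]) { m[i][j] = q; s[i][j] = k; }
            }
        }
    }
    return buildParens(s, 1, n - 1);
}
```

For p = {1, 2, 3, 4} this returns ( ( M1 M2 ) M3 ), i.e. multiply the 1x2 and 2x3 matrices first.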

 
 
 Simple code of the recursive function :

 int func(int p[], int i, int j) {

              if (i == j) return 0;    // a single matrix needs no multiplication
              int mn = INF;

              for (int c = i; c < j; c++) {    // try every split point
                     mn = min ( func(p, i, c) + func(p, c+1, j) + p[i-1] * p[c] * p[j] ,
                                mn ) ;
              }
             return mn;
       }


BUT the same subproblems are being computed over and over again! As can be seen in the diagram above. Computing the same thing repeatedly blows up the complexity. We can cache the subproblems when we first solve them, to reuse later without recomputing, or use bottom-up DP like this :

int MatrixChainOrder(int p[], int n)
{
 
    /* For simplicity of the program, one extra row and one extra column are
       allocated in m[][].  0th row and 0th column of m[][] are not used */
    int m[n][n];
 
    int i, j, k, L, q;
 
    /* m[i,j] = Minimum number of scalar multiplications needed to compute
       the matrix A[i]A[i+1]...A[j] = A[i..j] where dimension of A[i] is
       p[i-1] x p[i] */
 
    // cost is zero when multiplying one matrix.
    for (i = 1; i < n; i++)
        m[i][i] = 0;
 
    // L is chain length. 
    for (L=2; L<n; L++)  
    {
        for (i=1; i<n-L+1; i++)
        {
            j = i+L-1;
            m[i][j] = INT_MAX;
            for (k=i; k<=j-1; k++)
            {
                // q = cost/scalar multiplications
                q = m[i][k] + m[k+1][j] + p[i-1]*p[k]*p[j];
                if (q < m[i][j])
                    m[i][j] = q;
            }
        }
    }
 
    return m[1][n-1];
}

* GeeksForGeeks * ( Too lazy to write again :P )
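The "cache the subproblems" option mentioned earlier can be sketched as a top-down memoized version of the same recursion (solve, memo and matrixChainMemo are my names, not from the post):

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Same recursion as func above, plus a memo table so each subchain
// A i..j is solved only once instead of exponentially many times.
int solve(const std::vector<int>& p, int i, int j,
          std::vector<std::vector<int>>& memo) {
    if (i == j) return 0;                     // single matrix: zero cost
    if (memo[i][j] != -1) return memo[i][j];  // reuse a cached answer
    int mn = INT_MAX;
    for (int k = i; k < j; k++)               // try every split point
        mn = std::min(mn, solve(p, i, k, memo) + solve(p, k + 1, j, memo)
                              + p[i - 1] * p[k] * p[j]);
    return memo[i][j] = mn;
}

int matrixChainMemo(const std::vector<int>& p) {
    int n = p.size();
    std::vector<std::vector<int>> memo(n, std::vector<int>(n, -1));
    return solve(p, 1, n - 1, memo);
}
```

Like the bottom-up version this runs in O(n^3): each (i, j) pair is computed once, and each takes O(n) work.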


