matrix multiplication

程序员文章站 2022-07-15 12:29:32

...

文章目录

Overview of Matrix Multiplication

For Matrix Multiplication A ( i , j ) ∗ B ( j , k ) = C ( i , k ) A(i,j) * B(j,k) = C(i,k) A(i,j)∗B(j,k)=C(i,k)：

( i , k ) = ( i , j 1 ) ( j 1 , k ) + ( i , j 2 ) ( j 2 , k ) + . . . + ( i , j n ) ( j n , k ) (i,k) = (i,j_1)(j_1,k) + (i,j_2)(j_2,k)+...+(i,j_n)(j_n,k) (i,k)=(i,j1)(j1,k)+(i,j2)(j2,k)+...+(i,jn)(jn,k)

The i i i row of the A matrix and the k k k column of the B matrix are multiplied and then summed. The rows and rows of A and the columns and columns of B do not affect each other and can be parallelized.

For non-parallel matrix multiplication, the complexity is O ( n 3 ) O(n^3) O(n3)

For matrix multiplication using map reduce，speed up can be:

S = 1 1 M + 1 N S = \frac{1}{\frac{1}{M} + \frac{1}{N}} S=M1+N11

How to implement Map & Reduce

Take the following matrix multiplication as an example:
1 2 3 4 × 5 6 7 8 \begin{matrix} 1&2\\ 3&4 \end{matrix} \times \begin{matrix} 5&6\\ 7&8 \end{matrix} 1324×5768

The basic idea is to use the coordinates of the final result (matrix C) as the key, so that the row of A and the column of B can be determined, and the product of the corresponding elements of the row and column can be further summed.

Specific method 1

The first method is to find each element of each row of matrix A and the corresponding element of each column of matrix B, combine them in pairs, and use the position of their final result as their key. In the reduce phase, according to this representation, all the values at the same position are multiplied and added.

Mapper：

For matrix A： k e y : ( i , k ) key: (i,k) key:(i,k) , v a l u e : ( A , j , A i j ) value: (A, j, A_{ij}) value:(A,j,Aij)

For matrix B： k e y : ( i , k ) key: (i,k) key:(i,k) , v a l u e : ( B , j , A j k ) value: (B, j, A_{jk}) value:(B,j,Ajk)

Reducer：

For all ( i , k ) = = > S u m m a t i o n ( A i j ∗ B j k ) (i,k) ==> Summation(A_{ij}*B_{jk}) (i,k)==>Summation(Aij∗Bjk) for all j j j

# k, i, j computes the number of times it occurs.
# Here all are 2, therefore when k=1, i can have 
# 2 values 1 & 2, each case can have 2 further
# values of j=1 and j=2. Substituting all values 
# in formula

k=1  i=1  j=1   ((1, 1), (A, 1, 1)) 
          j=2   ((1, 1), (A, 2, 2))   
     i=2  j=1   ((2, 1), (A, 1, 3))
          j=2   ((2, 1), (A, 2, 4)) 

k=2  i=1  j=1   ((1, 2), (A, 1, 1))
          j=2   ((1, 2), (A, 2, 2))   
     i=2  j=1   ((2, 2), (A, 1, 3))
          j=2   ((2, 2), (A, 2, 4))

Computing the mapper for Matrix B

i=1  j=1  k=1   ((1, 1), (B, 1, 5)) 
          k=2   ((1, 2), (B, 1, 6))   
     j=2  k=1   ((1, 1), (B, 2, 7))
          j=2   ((1, 2), (B, 2, 8)) 

i=2  j=1  k=1   ((2, 1), (B, 1, 5))
          k=2   ((2, 2), (B, 1, 6))   
     j=2  k=1   ((2, 1), (B, 2, 7))
          k=2   ((2, 2), (B, 2, 8))

Therefore computing the reducer:

# We can observe from Mapper computation 
# that 4 pairs are common (1, 1), (1, 2),
# (2, 1) and (2, 2)
# Make a list separate for Matrix A & 
# B with adjoining values taken from 
# Mapper step above:

(1, 1) =>Alist ={(A, 1, 1), (A, 2, 2)}
        Blist ={(B, 1, 5), (A, 2, 7)}
        Now Aij x Bjk: [(1*5) + (2*7)] =19 -------(i)

(1, 2) =>Alist ={(A, 1, 1), (A, 2, 2)}
        Blist ={(B, 1, 6), (A, 2, 8)}
        Now Aij x Bjk: [(1*6) + (2*8)] =22 -------(ii)

(2, 1) =>Alist ={(A, 1, 3), (A, 2, 4)}
        Blist ={(B, 1, 5), (A, 2, 7)}
        Now Aij x Bjk: [(3*5) + (4*7)] =43 -------(iii)

(2, 2) =>Alist ={(A, 1, 3), (A, 2, 4)}
        Blist ={(B, 1, 6), (A, 2, 8)}
        Now Aij x Bjk: [(3*6) + (4*8)] =50 -------(iv)

From (i), (ii), (iii) and (iv) we conclude that
((1, 1), 19)
((1, 2), 22)
((2, 1), 43)
((2, 2), 50)

Specific method 2

The basic data block size processed by the mapper of the second method is larger, which is a whole row of data. Then the search and operation of the corresponding elements are handed over to the reducer.

Mapper：

For matrix A： k e y : ( i , k ) key: (i,k) key:(i,k) , v a l u e : ( A , i , [ . . . . ] ) value: (A, i, [....]) value:(A,i,[....])

For matrix B： k e y : ( i , k ) key: (i,k) key:(i,k) , v a l u e : ( B , k , [ . . . . ] ) value: (B, k, [....]) value:(B,k,[....])

The difference from the above is that the key-value pairs sent by the mapper directly send an entire row of the A matrix or an entire column of the B matrix to the reducer.

Reducer:

Multiply the key value to the same corresponding row and corresponding column and then sum them.

Specific method 3

Divide the matrix into n blocks, each block contains n rows (n columns). Then perform operations between blocks.

At this time, the key is no longer the row number and column number represented by i , k i,k i,k, but the block number.

Java code

According to the data files I have, I chose method 2.

The format of the data file is as follows:

Matrix A（1024 * 1024）：
matrix multiplication
L is the identifier of matrix A, the number is the row number i i i, and the rest is the value of each element of the matrix, the length is 1024.

Matrix B（1024 * 1024）：
matrix multiplication

This data file is matrix B that has been transposed, which is convenient for data reading and manipulation. Therefore, each row here represents each column of matrix B.

R is the identifier of matrix B, and the number at the second position is the column number j j j,

below are the main part of the code：

// <mapper>
public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, Text>{

    private Logger logger = Logger.getLogger(Mapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m"));
        int p = Integer.parseInt(conf.get("p"));
        String line = value.toString();
        String[] indicesAndValue = line.split(" ", 3);
        Text outputKey = new Text();
        Text outputValue = new Text();
        
        for (int i = 0; i < indicesAndValue.length; ++i) {
            System.out.println(indicesAndValue[i]);
        }
        if (indicesAndValue[0].equals("L")) {
            try {
                for (int k = 1; k <= p; k++) {  // 这个范围内的 k 都要用到这个 i 所在的行
                    outputKey.set(indicesAndValue[1] + "," + k); // (i, k)
                    String[] values = Arrays.copyOfRange(indicesAndValue, 2, indicesAndValue.length);
                    outputValue.set("L," + indicesAndValue[1] + "," + Arrays.toString(values)); // (L, lineno, [....])
                    context.write(outputKey, outputValue);
                }
            } catch (Exception e) {
                logger.error(e.getMessage());
                logger.error("--------Left--------");
                logger.error(indicesAndValue);
            }
        }
        else {
            try {
                for (int i = 1; i <= m; i++) {
                    outputKey.set(i + "," + indicesAndValue[1]); // (i, k)
                    String[] values = Arrays.copyOfRange(indicesAndValue, 2, indicesAndValue.length);
                    outputValue.set("R," + indicesAndValue[1] + "," + Arrays.toString(values));  // (R, lineno, [....])
                    context.write(outputKey, outputValue);
                }
            } catch (Exception e) {
                logger.error(e.getMessage());
                logger.error("--------Right--------");
                logger.error(indicesAndValue);
            }
        }
    }
}

// <reducer>
public class Reducer extends org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, Text>{
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        String[] value;
        HashMap<String, String> hashA = new HashMap<String, String>();
        HashMap<String, String> hashB = new HashMap<String, String>();
        for (Text val : values) {
            value = val.toString().split(",");
            if (value[0].equals("L")) { // hashA: (i,k),[]
                hashA.put(key.toString(), value[2]);
            }
            else {                      // hashB: (i,k),[]
                hashB.put(key.toString(),value[2]);
            }
        }
        int result = 0;
        if (hashA.containsKey(key.toString()) && hashB.containsKey(key.toString())) {
            String Row = hashA.get(key.toString());
            String Column = hashB.get(key.toString());
            String[] row = Row.replaceAll("\\[", "").replaceAll("\\]", "").split(" ");
            String[] column = Column.replaceAll("\\[", "").replaceAll("\\]", "").split(" ");

            for (int i = 0; i < row.length; ++i) {
                byte[] bytes = parseHexBinary(row[i]);
                byte[] bytes1 = parseHexBinary(column[i]);
                ByteBuffer wrapped = ByteBuffer.wrap(bytes); // big-endian by default
                ByteBuffer wrapped1 = ByteBuffer.wrap(bytes1);
                int r = wrapped.getInt();
                int c = wrapped1.getInt();

                result += r * c;
            }
        }
        if (result != 0) {
            System.out.printf("%d\n", result);
            // key 应该是 (i, k)
            context.write(null, new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}

github address

Result analysis

The following results are obtained by running on a distributed cluster.

When the number of mappers is 3 and the number of reducers is 1, the experimental results are as follows:

The time consumed by the CPU is 1426690ms = 142.669s

Since the number of reducers is set to 1, there is no parallel operation in the reduce phase.

The total time consumed by Map is: 251578ms = 251.578s

The total time consumed by the Reducer is: 920940ms = 920.94s

Since we assign the multiplication and addition operations of the matrix to the reducer, the mapper is mainly responsible for the correspondence between the rows and columns of the matrix, so the reducer takes more time.

matrix multiplication

When the number of mappers is 3 and the number of reducers is 8, the experimental results are as follows:

matrix multiplication

The total time consumed by Map is: 262805 = 262.805s

The total time consumed by the Reducer is: 1653154ms = 1653.154s

Increasing the number of reducers did not increase the execution speed. The reason may be that when the amount of data is small, one reducer can complete all the work. The addition of reducers increases the time consumption of collaborative work between different reducers and synchronization of all data blocks.

Data file preprocessing

The original data file uses the hexadecimal ASCII code, so it needs to converted to an integer

Hex and bytes and Int conversion

How to read ASCII data similar to hexadecimal format in data file?

Python：

Val = int(struct.unpack("I", binascii.a2b_hex(Matrix[i][k][2:]))[0])

By converting hex to binary, and then using the unpack method to interpret the binary data.

Java：

byte[] bytes = parseHexBinary(row[i]);
ByteBuffer wrapped = ByteBuffer.wrap(bytes); // big-endian by default 
int r = wrapped.getInt();

Read the byte stream, and then use ByteBuffer to convert to an integer.

How to modify the number of mapper and reducer

REF

Note that it is not recommended to actively modify the number of mappers.

Number of mappers set to run are completely dependent on

File Size
Block Size

The MapReduce framework in Hadoop allows us only to

suggest the number of Map tasks for a job
demand that it provide n reducers.

Reference

http://dblab.xmu.edu.cn/blog/820-2/

https://www.geeksforgeeks.org/matrix-multiplication-with-1-mapreduce-step/

https://github.com/shask9/Matrix-Multiplication-Hadoop

https://www.jianshu.com/p/c6c543e4975e

https://docs.python.org/zh-cn/3/library/binascii.html

https://docs.python.org/zh-cn/3/library/struct.html

https://www.journaldev.com/776/string-to-array-java

https://*.com/questions/6885441/setting-the-number-of-map-tasks-and-reduce-tasks

https://*.com/questions/28861620/hadoop-map-reduce-total-time-spent-by-all-maps-in-occupied-slots-vs-total-time

matrix multiplication

文章目录

Overview of Matrix Multiplication

How to implement Map & Reduce

Specific method 1

Specific method 2

Specific method 3

Java code

Result analysis

Data file preprocessing

Hex and bytes and Int conversion

How to modify the number of mapper and reducer

Reference

Android中Matrix用法实例分析

python的dataframe和matrix的互换方法

如何使用Matrix对bitmap的旋转与镜像水平垂直翻转

Matrix AI (MAN)人工智能生态平台赋能智能医疗

Android中利用matrix 控制图片的旋转、缩放、移动

Android变形(Transform)之Matrix用法

IE矩阵Matrix滤镜旋转与缩放及如何结合transform

numpy matrix和array的乘和加实例

android.graphics.Matrix类用法分析

C#使用Matrix执行缩放的方法