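MPI and Parallel Computing (4): Data Types

# 1. Predefined Data Types

MPI supports heterogeneous computing, that is, running one program across different computer systems, where each machine may come from a different vendor and run a different operating system. MPI addresses the resulting interoperability problem by providing predefined data types and by fixing how each of them corresponds to a type in the host language.

The predefined MPI data types are listed below: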

MPI data type (C binding)	C data type
MPI_CHAR	char
MPI_SHORT	short
MPI_INT	int
MPI_LONG	long
MPI_UNSIGNED_CHAR	unsigned char
MPI_UNSIGNED_SHORT	unsigned short
MPI_UNSIGNED	unsigned int
MPI_UNSIGNED_LONG	unsigned long
MPI_FLOAT	float
MPI_DOUBLE	double
MPI_LONG_DOUBLE	long double
MPI_BYTE	(none)
MPI_PACKED	(none)

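However, for point-to-point communication, a buffer holding a sequence of identical basic data types is often not enough. We frequently need to send messages that mix data types (for example, an integer count followed by a sequence of real numbers), and we frequently need to send noncontiguous data (for example, a sub-block of a matrix).

MPI provides the pack/unpack functions for sending noncontiguous data. Before sending, the user explicitly packs the data into a contiguous buffer; after receiving, the user unpacks it from that contiguous buffer. These functions do make it possible to send noncontiguous data, but the approach is inflexible and inefficient, since every value is copied an extra time on each side. They are described here for compatibility with older libraries and code.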

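int MPI_Pack(const void *inbuf, int incount, MPI_Datatype datatype, void *outbuf, int outsize, int *position, MPI_Comm comm)

inbuf: start address of the input buffer
incount: number of items in the input buffer
datatype: data type of each input item
outbuf: start address of the output buffer
outsize: size of the output buffer, in bytes
position: current position in the output buffer, in bytes (advanced past the packed data on return)
comm: communicator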

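int MPI_Unpack(const void *inbuf, int insize, int *position, void *outbuf, int outcount, MPI_Datatype datatype, MPI_Comm comm)

inbuf: start address of the input buffer
insize: size of the input buffer, in bytes
position: current position in the input buffer, in bytes (advanced past the unpacked data on return)
outbuf: start address of the output buffer
outcount: number of items to unpack into the output buffer
datatype: data type of each output item
comm: communicator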

Example 1: Pack/Unpack

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mpi.h"

#define MASTER 0
#define STRLEN 25

int main(int argc, char* argv[])
{
  int rank;
  int size;
  int position;

  char message[BUFSIZ];  /* contiguous staging buffer for the packed data */

  float  value;          /* value to send */
  char   name[STRLEN];   /* assigned name */
  int    param;          /* additional parameter */

  MPI_Init( &argc, &argv );
  MPI_Comm_size( MPI_COMM_WORLD, &size );
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );

  if (size < 2) {
    if (rank == MASTER)
      fprintf(stderr, "This example needs at least 2 processes.\n");
    MPI_Finalize();
    return EXIT_FAILURE;
  }

  if (rank == MASTER) {

    value = 10;
    memset(name, 0, sizeof name);  /* all STRLEN bytes are packed, so clear them first */
    snprintf(name, STRLEN, "My Name");
    param = 1;

    position = 0;
    /* now let's pack all those values into a single message */
    MPI_Pack(&value, 1, MPI_FLOAT, message, BUFSIZ,
             &position, MPI_COMM_WORLD);
    /* position has been advanced to the first free byte in the message */
    MPI_Pack(name, STRLEN, MPI_CHAR, message, BUFSIZ,
             &position, MPI_COMM_WORLD);
    /* position has been advanced again */
    MPI_Pack(&param, 1, MPI_INT, message, BUFSIZ,
             &position, MPI_COMM_WORLD);

    /* position now holds the packed size, so send only that many bytes */
    MPI_Send(message, position, MPI_PACKED, 1, 1, MPI_COMM_WORLD);
  }
  else if (rank == 1) {  /* only rank 1 is sent a message */

    MPI_Recv(message, BUFSIZ, MPI_PACKED, MASTER, 1, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    position = 0;
    MPI_Unpack(message, BUFSIZ, &position, &value, 1,
               MPI_FLOAT, MPI_COMM_WORLD);
    /* Note that we must know the length of the string to expect here! */
    MPI_Unpack(message, BUFSIZ, &position, name, STRLEN,
               MPI_CHAR, MPI_COMM_WORLD);
    MPI_Unpack(message, BUFSIZ, &position, &param, 1,
               MPI_INT, MPI_COMM_WORLD);

    printf("rank %d:\t%d %.1f %s\n", rank, param, value, name);
  }

  MPI_Finalize();

  return EXIT_SUCCESS;
}

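# 2. Derived Data Types

MPI provides a comprehensive and powerful set of constructor functions for defining derived data types. A derived data type is an abstract structure that describes how data is laid out, not the data itself.
A derived data type can be described by a type map, a general description consisting of a sequence of <base type, displacement> pairs, written as:

<base type 1, displacement 1>, <base type 2, displacement 2>, ..., <base type n, displacement n>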

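In a derived data type, a base type can be any predefined MPI data type or another derived data type, so type definitions can be nested.
In the figure, the shaded regions are the space occupied by base types; the remaining space may be left empty deliberately or may be padding for alignment.
The base type says which data types the type map contains, and the displacement gives where that base type starts within the type map. Displacements may be positive or negative and need not be in increasing or decreasing order. The list of all base types in a type map is called the type signature of the data type, written as:

type signature = {base type 1, base type 2, ..., base type n}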

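Combined with the base address of a data buffer, a type map describes how the data in a communication buffer is laid out.
The predefined data types are special cases of this general form. For example, MPI_INT is a predefined data type handle whose type map is {(int, 0)}: a single entry with base type int and displacement 0. The other basic data types are analogous. The extent of a data type is the span from the first byte to the last byte occupied by the entries of its type map, rounded up to satisfy alignment requirements.
That is, if a data type has the type map

$$\mathrm{typemap} = \{(type_0, disp_0), \ldots, (type_{n-1}, disp_{n-1})\},$$

then its lower bound is defined as

$$\mathrm{lb}(\mathrm{typemap}) = \min_j \, disp_j, \quad j = 0, \ldots, n-1,$$

its upper bound as

$$\mathrm{ub}(\mathrm{typemap}) = \max_j \, \bigl(disp_j + \mathrm{sizeof}(type_j)\bigr), \quad j = 0, \ldots, n-1,$$

and its extent as

$$\mathrm{extent}(\mathrm{typemap}) = \mathrm{ub}(\mathrm{typemap}) - \mathrm{lb}(\mathrm{typemap}) + \varepsilon.$$

Because different types have different alignment requirements, $\varepsilon$ is the smallest non-negative increment that rounds the extent up to the next multiple of the strictest alignment requirement among the types in the type signature.
For example, let type = {(double, 0), (char, 8)} and assume that a double must be stored at an address that is a multiple of 8. The occupied span is 9 bytes, so the extent is 16 (9 rounded up to the next multiple of 8). A data type consisting of a char immediately followed by a double also has an extent of 16.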
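MPI provides the following functions for constructing and managing derived data types:

Function	Meaning
MPI_Type_contiguous	Defines a type made of elements of the same data type
MPI_Type_vector	Defines a type made of blocks of elements, with a constant stride between blocks
MPI_Type_indexed	Defines a type made of blocks of elements, with block lengths and displacements given as arguments
MPI_Type_create_struct	Defines a type made of elements of different data types
MPI_Type_commit	Commits a derived data type
MPI_Type_free	Frees a derived data type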

(1) The simplest constructor is MPI_Type_contiguous, which replicates a data type into contiguous locations.

int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)

count: number of replications
oldtype: old data type
newtype: new (derived) data type

Example 2: Using MPI_Type_contiguous

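#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, i;
    MPI_Status status;
    MPI_Datatype type;
    int buffer[100];

    MPI_Init(&argc, &argv);

    /* 100 consecutive ints; the base type must match the buffer's element type */
    MPI_Type_contiguous(100, MPI_INT, &type);
    MPI_Type_commit(&type);

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0)
    {
        for (i = 0; i < 100; i++)
            buffer[i] = i;  /* fill the buffer so that defined data is sent */
        MPI_Send(buffer, 1, type, 1, 123, MPI_COMM_WORLD);
    }
    else if (myrank == 1)
    {
        MPI_Recv(buffer, 1, type, 0, 123, MPI_COMM_WORLD, &status);
        printf("rank 1 received buffer[0]=%d buffer[99]=%d\n", buffer[0], buffer[99]);
    }

    MPI_Type_free(&type);
    MPI_Finalize();
    return 0;
}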

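(2) MPI_Type_vector is a more general constructor that replicates a data type into equally spaced blocks. Each block is formed by concatenating the same number of copies of the old data type, and the spacing between the starts of consecutive blocks is a multiple of the extent of the old data type.

int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype)

count: number of blocks
blocklength: number of elements in each block
stride: number of elements between the starts of consecutive blocks
oldtype: old data type
newtype: new (derived) data type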

Example 3: Using MPI_Type_vector

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define SIZE 4

/* Send each column of a matrix to a different process */
int main(int argc, char *argv[])
{
    int numtasks, rank, source = 0, tag = 1, i;
    float a[SIZE][SIZE] = {
        { 1.0,  2.0,  3.0,  4.0},
        { 5.0,  6.0,  7.0,  8.0},
        { 9.0, 10.0, 11.0, 12.0},
        {13.0, 14.0, 15.0, 16.0}};
    float b[SIZE];

    MPI_Status stat;
    MPI_Datatype columntype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    MPI_Type_vector(SIZE /* number of blocks: one per row */,
                    1    /* one element per block */,
                    SIZE /* stride: the next column element is SIZE floats later */,
                    MPI_FLOAT, &columntype);
    MPI_Type_commit(&columntype);

    if (numtasks == SIZE) {
        if (rank == 0) {
            /* column i starts at &a[0][i] */
            for (i = 1; i < numtasks; i++)
                MPI_Send(&a[0][i], 1, columntype, i, tag, MPI_COMM_WORLD);
            /* rank 0 keeps column 0; MPI_Sendrecv avoids blocking on a send to itself */
            MPI_Sendrecv(&a[0][0], 1, columntype, 0, tag,
                         b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
        } else {
            /* the column arrives as SIZE contiguous floats */
            MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
        }
        printf("rank= %d  b= %3.1f %3.1f %3.1f %3.1f\n",
               rank, b[0], b[1], b[2], b[3]);
    } else if (rank == 0) {
        printf("Must specify %d processors. Terminating.\n", SIZE);
    }

    MPI_Type_free(&columntype);
    MPI_Finalize();
    return 0;
}

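(3) MPI_Type_indexed replicates the old data type into a sequence of blocks (each block is a concatenation of the old data type), where each block can contain a different number of copies and have a different displacement. All block displacements are multiples of the extent of the old type.

int MPI_Type_indexed(int count, const int array_of_blocklengths[], const int array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype)

count: number of blocks
array_of_blocklengths: number of elements in each block
array_of_displacements: displacement of each block, in multiples of the extent of oldtype
oldtype: old data type
newtype: new (derived) data type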

Example 4: Using MPI_Type_indexed

#include "mpi.h"
#include <stdio.h>
#define NELEMENTS 6

int main(int argc, char *argv[])
{
    int numtasks, rank, source = 0, tag = 1, i;
    int blocklengths[2], displacements[2];
    float a[16] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
                   9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
    float b[NELEMENTS];

    MPI_Status stat;
    MPI_Datatype indextype;   // handle for the derived type

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    blocklengths[0] = 4;    // first block: 4 elements
    blocklengths[1] = 2;    // second block: 2 elements
    displacements[0] = 5;   // first block starts at index 5 (value 6.0)
    displacements[1] = 12;  // second block starts at index 12 (value 13.0)

    // create and commit the indexed derived data type
    MPI_Type_indexed(2, blocklengths, displacements, MPI_FLOAT, &indextype);
    MPI_Type_commit(&indextype);

    if (rank == 0) {
        // task 0 sends one element of indextype to every other task
        for (i = 1; i < numtasks; i++)
            MPI_Send(a, 1, indextype, i, tag, MPI_COMM_WORLD);
        // task 0 also takes a copy; MPI_Sendrecv avoids blocking on a send to itself
        MPI_Sendrecv(a, 1, indextype, 0, tag,
                     b, NELEMENTS, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
    } else {
        // the 6 selected floats arrive as a contiguous array
        MPI_Recv(b, NELEMENTS, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
    }
    printf("rank= %d  b= %3.1f %3.1f %3.1f %3.1f %3.1f %3.1f\n",
           rank, b[0], b[1], b[2], b[3], b[4], b[5]);

    // free the datatype when done using it
    MPI_Type_free(&indextype);
    MPI_Finalize();
    return 0;
}

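(4) MPI_Type_create_struct is the most general type constructor. It lets the programmer define a new data type from a fully specified map of component data types.

int MPI_Type_create_struct(int count, const int array_of_blocklengths[], const MPI_Aint array_of_displacements[], const MPI_Datatype array_of_types[], MPI_Datatype *newtype)

count: number of blocks
array_of_blocklengths: number of elements in each block
array_of_displacements: byte displacement of each block
array_of_types: data type of each block
newtype: new (derived) data type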

Example 5: Using MPI_Type_create_struct

#include "mpi.h"
#include <stdio.h>
#define NELEM 25

int main(int argc, char *argv[])
{
    int numtasks, rank, source = 0, tag = 1, i;

    typedef struct
    {
        float x, y, z;
        float velocity;
        int n, type;
    } Particle;
    Particle p[NELEM], particles[NELEM];
    MPI_Datatype particletype, oldtypes[2];  // required variables
    int blockcounts[2];

    // MPI_Aint is the address-sized integer type used by
    // MPI_Type_get_extent and MPI_Type_create_struct
    MPI_Aint offsets[2], lb, extent;

    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    // set up the description of the 4 MPI_FLOAT fields x, y, z, velocity
    offsets[0]     = 0;
    oldtypes[0]    = MPI_FLOAT;
    blockcounts[0] = 4;

    // set up the description of the 2 MPI_INT fields n, type;
    // the offset is computed from the extent of MPI_FLOAT, which assumes
    // the compiler inserts no padding between the floats and the ints
    MPI_Type_get_extent(MPI_FLOAT, &lb, &extent);
    offsets[1]     = 4 * extent;
    oldtypes[1]    = MPI_INT;
    blockcounts[1] = 2;

    // define the structured type and commit it
    MPI_Type_create_struct(2, blockcounts, offsets, oldtypes, &particletype);
    MPI_Type_commit(&particletype);

    // task 0 initializes the particle array and then sends it to each task
    if (rank == 0)
    {
        for (i = 0; i < NELEM; i++)
        {
            particles[i].x        = i * 1.0;
            particles[i].y        = i * -1.0;
            particles[i].z        = i * 1.0;
            particles[i].velocity = 0.25;
            particles[i].n        = i;
            particles[i].type     = i % 2;
        }
        for (i = 1; i < numtasks; i++)
            MPI_Send(particles, NELEM, particletype, i, tag, MPI_COMM_WORLD);
        // task 0 also takes a copy; MPI_Sendrecv avoids blocking on a send to itself
        MPI_Sendrecv(particles, NELEM, particletype, 0, tag,
                     p, NELEM, particletype, source, tag, MPI_COMM_WORLD, &stat);
    }
    else
    {
        // all other tasks receive particletype data
        MPI_Recv(p, NELEM, particletype, source, tag, MPI_COMM_WORLD, &stat);
    }

    printf("rank= %d   %3.2f %3.2f %3.2f %3.2f %d %d\n", rank, p[3].x, p[3].y, p[3].z,
           p[3].velocity, p[3].n, p[3].type);

    // free the datatype when done using it
    MPI_Type_free(&particletype);
    MPI_Finalize();
    return 0;
}

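There is one problem with the displacements here. Computing them by hand is tedious, and it is fragile: although this is increasingly rare, the sizes of some types vary between systems and operating systems, so hard-coded values can cause trouble. A cleaner approach is the offsetof macro from the standard library (declared in stddef.h in C and in cstddef in C++). It returns a size_t (implicitly convertible to MPI_Aint) giving the byte offset of the named member. The displacement table can then be defined as:
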
MPI_Aint displacements[2] = {offsetof(Particle, x), offsetof(Particle, n)};