Contents:
Introduction
Notation
Example
Summary
Indirect addressing is a fundamental operation in many numerical and scientific algorithms. Instead of accessing array elements with loop indices directly, indirect addressing uses the element of one array (sometimes called an index table or indirection table) as indices for another array. These index tables can store either static information (such as the neighbors of points in an unstructured mesh), or dynamic information (such as the sorting order of the elements in a vector).
This tutorial shows how to perform indirect addressing in POOMA, and discusses some of the subtleties and complexities that arise when indirection and multithreading are combined.
Suppose that the array X contains the following values:
a | b | c | d |
while the array J contains:
3 | 0 | 1 | 2 |
A program could re-order the elements of X while copying them into another array Y using the following loop:
for (int i=0; i<4; ++i) { Y = X(J(i)); }
The effect of this would be to fill Y with the following values:
d | a | b | c |
POOMA allows this operation to be expressed more economically, simply by using the array J as a subscript on the array X directly:
Y = X(J);
Indirect addressing on the source side of an assignment is sometimes called "pull" addressing, since the index array's values are being used to "pull" values from the source into the destination. POOMA also supports "push" addressing, in which the index array is used on the destination side of the assignment. The syntax for this is simply:
Y(J) = X;
which is equivalent to the loop:
for (int i=0; i<4; ++i) { Y(J(i)) = X; }
This operation would fill Y with:
b | c | d | a |
So long as J is a strict permutation of the indices 0...N-1, this will have the same effect as the loop shown above. The effects of this statement if J has repeated or missing elements are discussed in the next section.
One-dimensional arrays of integers can be used as subscripts to one-dimensional arrays of any other type, but how are multi-dimensional arrays to be subscripted? In POOMA, the answer is to use arrays whose elements are of type Loc. An Array<Loc<2>>, for example, can be used to re-order the elements of a two-dimensional array, since each element of the index array can act as a coordinate in the data array. Similarly, an Array<Loc<3>> can be used to subscript a 3-dimensional array of any type. Future releases of POOMA will support higher-dimensional indirect addressing as well.
Indirect addressing is a very powerful tool, but must be used carefully. The most important consideration is that the order of data movement during indirection is not defined. If indirection is performed using an index table that sends many values to a single location, for example, then there is no way to predict which of those values will be written into that location.
However, indirect addressing can always be used to read values safely, and to thereby perform a scatter operation. Suppose a source array S contains the values [3.14, 2.71], while an index array IA contains the values [0,1,0,0,1,1]. The expression S(IA) yields:
[3.14, 2.71, 3.14, 3.14, 2.71, 2.71]
and can always be used safely on the right-hand side of an expression. This works because the domain of a(b) is the domain of b. In expressions like a(b) = c, the domains of c and b have to match, but the domain of a can be arbitrary.
The example for this tutorial is a 1-dimensional Fast Fourier Transform (FFT) that shuffles data using indirect addressing. This FFT implementation is not efficient---it recalculates trigonometric constants repeatedly, for example, rather than pre-calculating and storing them---but it does illustrate the power of indirection.
The source for this example is included in the release in the file examples/Indirection/FFT/FFT.cpp. The main body of this program initializes POOMA, creates and fills an array of complex values, transforms it, and prints the result of that transform, as shown below:
138 int main(int argc, char* argv[]) 139 { 140 Pooma::initialize(argc, argv); 141 142 int size = 16; 143 144 Array<1, complex<double>, Brick> a(size); 145 146 int i; 147 for (i = 0; i < size; ++i) 148 { 149 a(i) = sin(4*i*Pi/size); 150 } 151 152 std::cout << a << std::endl; 153 154 fft(a); 155 156 std::cout << a << std::endl; 157 158 Pooma::finalize(); 159 return 0; 160 }
The key statement is on line 154, where the fft() function is invoked. The program contains two overloaded versions of this function. The first version, shown below, determines the level of the FFT (i.e. the number of recursive steps the calculation requires), then invokes the second version, which actually performs the calculations. (Note that in this simple example, the input array's length is required to be a power of two. Also, Pi is defined as a static const double equal to the value of pi computed from the expression acos(-1.0) because some compilers do not define mathematical constants such as M_PI in the <math.h> header file.)
117 void fft(const Array<1, complex<double>, Brick> &array) 118 { 119 int size = array.domain().size(); 120 121 // Determine size as power of 2 122 int level = -1; 123 while (size > 1) 124 { 125 PAssert(!(size & 1)); 126 ++level; 127 size /= 2; 128 } 129 130 if (level >= 0) 131 fft(array, level); 132 }
The second version of fft() does the real number-crunching. If the computation has reached its final stage, odd and even elements are combined directly (lines 106-111). If the computation is still recurring, the elements are shuffled, a half-sized transform is applied on each subsection, and the results are combined (lines 100-102). All of these operations use indirect addressing to move data values around. Most of the rest of the program can be viewed as infrastructure needed to make this data movement simple and efficient.
083 void fft(const Array<1,complex<double>,Brick> &array, int level) 084 { 085 Interval<1> domain = array.domain(); 086 087 if (level > 0) 088 { 089 ConstArray<1, int, IndexFunction<LeftMap> > left(domain); 090 ConstArray<1, int, IndexFunction<RightMap> > right(domain); 091 ConstArray<1, int, IndexFunction<ShuffleMap> > shuffle(domain); 092 ConstArray<1, complex<double>, IndexFunction<TrigFactor> > trig(domain); 093 094 left.engine().setFunctor(LeftMap(level)); 095 right.engine().setFunctor(RightMap(level)); 096 shuffle.engine().setFunctor(ShuffleMap(level)); 097 trig.engine().setFunctor(TrigFactor(level)); 098 099 // Shuffle values, compute n/2 length ffts, combine results. 100 array = array(shuffle); 101 fft(array, level-1); 102 array = array(left) + trig*array(right); 103 } 104 else 105 { 106 int size = domain.size(); 107 Range<1> left (0, size-2, 2), 108 right(1, size-1, 2); 109 110 array(left) += array(right); 111 array(right) = array(left) - 2.0 * array(right); 112 } 113 }
The shuffling step on line 100 uses an indirection array called shuffle to pull values into the right positions. This array, which is declared on line 91, is a ConstArray of integers. Instead of storing the values, however, the array calculates them on the fly using an IndexFunction engine, which is bound to the array on line 96. The IndexFunction engine works as one would expect: having been specialized with a user-defined class with an overloaded operator(), the engine transforms an index i into some new value by calling that operator(). In this case, the specializing class is ShuffleMap, which is shown below:
003 struct ShuffleMap 004 { 005 ShuffleMap(int n = 0) 006 : degree_m(n) 007 { 008 nbit_m = 1 << n; 009 mask1_m = nbit_m - 1; 010 mask2_m = ~(nbit_m | mask1_m); 011 } 012 013 int operator()(int i) const 014 { 015 return 016 (mask2_m & i) 017 | ( (mask1_m & i) << 1 ) 018 | ( (nbit_m & i) ? 1 : 0 ); 019 } 020 021 int nbit_m, mask1_m, mask2_m, degree_m; 022 };
Similar engines are used to select and combine elements of the arrays after the sub-FFTs have been performed. These use the overloaded operator() in the classes LeftMap and RightMap, shown below:
029 struct LeftMap 030 { 031 LeftMap(int n = 0) 032 : nbit_m(~(1 << n)) 033 { } 034 035 int operator()(int i) const 036 { 037 return (nbit_m & i); 038 } 039 040 int nbit_m; 041 }; 042 043 struct RightMap 044 { 045 RightMap(int n = 0) 046 : nbit_m(1 << n) 047 { } 048 049 int operator()(int i) const 050 { 051 return (nbit_m | i); 052 } 053 054 int nbit_m; 055 };
Finally, an IndexFunction engine is also used to calculate the trigonometric weights used in combining. This IndexFunction is an extreme example of trading time for space: it does not store anything, but repeatedly recalculates factors on demand.
065 struct TrigFactor 066 { 067 TrigFactor(int n = 0) 068 : n_m(1 << n) 069 { } 070 071 complex<double> operator()(int i) const 072 { 073 return complex<double>(cos(Pi*i/n_m), sin(Pi*i/n_m)); 074 } 075 076 int n_m; 077 };
Efficient support for indirect addressing---the use of the values
in one array to change the way another array's elements are
accessed---is one of the features that characterizes full-strength
numerical libraries. This release of POOMA supports indirect
addressing in both "push" and "pull" modes using conventional
data-parallel syntax, without compromising the performance of regular
index operations.
[Prev] | [Home] | [Next] |