The map/reduce pattern combines a map and reduce operation in a single pattern. In a map/reduce operation, firstly one or more input data sets are mapped by applying a transformation operation. Then, the results are combined by means of a reduction operation.
The interface to the map/reduce pattern is provided by function
grppi::map_reduce()
. As all functions in GrPPI, this function takes as its
first argument an execution policy.
grppi::map_reduce(exec, other_arguments...);
Note: A map/reduce could be also expressed by the composition of a map and a reduce. However, map/reduce may fuse both stages, allowing for extra optimizations.
There are several variants:
- Unary map/reduce: A map/reduce taking a single input sequence.
- N-ary map/reduce: A map/reduce taking multiple input sequences that are combined during the map stage.
There are two central elements of a map/reduce: the Transformer used for the map stage, and the Combiner used for the reduce stage.
A Transformer may be a UnaryTransformer or a MultiTransformer.
A UnaryTransformer is any C++ callable entity that takes a data item and
transforms it. The input type and the output type may differ. Thus, a unary
transformer op
is any operation that, given an input x
of type T
and
output type U
, makes valid the following:
U res = op(x);
A MultiTransformer is any C++ callable entity that takes data items, one of
each input sequence, and transforms them into an output value. The input types
and the output type may differ. Thus, a multi-transformer op
is any operation
that, given the inputs x1, x2, ... , xN
of types T1, T2, ... , TN
and an output
type U
, makes valid the following:
U res = op(x1, x2, ..., xN);
A Combiner is any C++ callable entity, that is able to combine two values
into a single value. A Combiner cmb
is any operation taking two values x
and
y
of types T
and U
and returning a combined value of type T
, making valid the
following:
T x;
U y;
T res = cmb(x,y);
An unary map/reduce takes a single data set and performs consecutively the map and the reduce stages, returning the reduced value.
There are two interfaces for the unary map/reduce:
- A range based interface.
- An iterator based interface.
The range based interface specifies sequences as ranges.
A range is any type satisfying the grppi::range_concept
.
In particular, any STL sequence is a range.
Thus, the only sequence provided to grppi::map_reduce
is:
- The input data set is specified by a range.
Example: Transforms a sequence of strings to its corresponding double values and computes the addition of those values.
vector<string> v { "1.0", "2.0", "3.5", "0.25" };
auto res = grppi::map_reduce(exec,
v,
0.0,
[](string s) { return stod(s); },
[](double x, double y) { return x+y; }
);
// res == 6.75
The iterator based interface specifies sequences in terms of iterators (following the C++ standard library conventions):
- The input data set is specified by two iterators (or by an iterator and a size).
A unary map/reduce also requires an identity value for the Combiner.
Example: Transforms a sequence of strings to its corresponding double values and computes the addition of those values.
vector<string> v { "1.0", "2.0", "3.5", "0.25" };
auto res = grppi::map_reduce(exec,
begin(v), end(v),
0.0,
[](string s) { return stod(s); },
[](double x, double y) { return x+y; }
);
// res == 6.75
auto res = grppi::map_reduce(exec,
begin(v), v.size(),
0.0,
[](string s) { return stod(s); },
[](double x, double y) { return x+y; }
);
// res == 6.75
A n-ary map/reduce takes multiple data sets and performs consecutively the map and reduce stages, returning the reduced value.
There are two interfaces for the unary map/reduce:
- A range based interface.
- An iterator based interface.
The range based interface specifies sequences as ranges.
A range is any type satisfying the grppi::range_concept
.
In particular, any STL sequence is a range.
Thus, the sequences provided to grppi::map_reduce
are:
- Each input data set is specified by a range.
Example: Compute scalar product between two vectors of doubles.
v = get_first_vector();
w = get_second_vector();
std::vector<double> r(v.size());
auto res = grppi::map_reduce(exec,
v, w, r,
0.0,
[](double x, double y) { return x*y; },
[](double x, double y) { return x+y; },
);
The iterator based interface specifies sequences in terms of iterators (following the C++ standard library conventions):
- The input data sets are specified by a tuple of iterators to the start of each sequence
- Additionally, the size of those sequences is specified by either an iterator to the end of the first sequence or by an integral value.
- The output data set is specified by an iterator to the start of the output sequence.
Example: Compute scalar product between two vectors of doubles.
v = get_first_vector();
w = get_second_vector();
std::vector<double> r(v.size());
auto res = grppi::map_reduce(exec,
std::make_tuple(begin(v), begin(w)), end(v),
begin(w),
0.0,
[](double x, double y) { return x*y; },
[](double x, double y) { return x+y; },
);
Example: Count word appearances in text lines.
vector<string> words = get_words();
auto result = grppi::map_reduce(ex,
words, map<string,int>{},
[](string word) -> map<string,int> { return {{word, 1}}; },
[](map<string,int> & lhs, const map<string,int> & rhs) -> map<string,int> & {
for (auto & w : rhs) {
lhs[w.first]+= w.second;
}
return lhs;
}
);