ak.argcartesian and performance of record arrays vs non-record arrays in ak.cartesian #575

Superharz · 2020-12-07T14:17:14Z

Let a and b be jagged arrays with the same outer (axis = 0) length.
It is possible to do:

index = ak.argcartesian({"x": a, "y": b}, axis = 1, nested = False)
x       = a[index['x']]
y       = b[index['y']]

However, it would also be nice if this worked with nested = True.
At the moment, that results in:

ValueError: too many jagged slice dimensions for array

Superharz (Author) · 2020-12-07T14:38:25Z

This is motivated by having arrays a1, a2 and a3 that are identical in shape and differ only in their values, and arrays b1, b2 and b3 that are likewise identical in shape to each other (but not to the a arrays) and differ only in their values.
I need:

a1_b1 = ak.cartesian({"x": a1, "y": b1}, axis = 1, nested = True)
x1    = a1_b1['x']
y1    = a1_b1['y']
a2_b2 = ak.cartesian({"x": a2, "y": b2}, axis = 1, nested = True)
x2    = a2_b2['x']
y2    = a2_b2['y']
a3_b3 = ak.cartesian({"x": a3, "y": b3}, axis = 1, nested = True)
x3    = a3_b3['x']
y3    = a3_b3['y']

That works but seems slow, and I hope to speed it up by doing:

index = ak.argcartesian({"x": a1, "y": b1}, axis = 1, nested = True)
x1 = a1[index['x']]
y1 = b1[index['y']]
x2 = a2[index['x']]
y2 = b2[index['y']]
x3 = a3[index['x']]
y3 = b3[index['y']]

Which doesn't work (for me).

jpivarski (Maintainer) · 2020-12-07T14:38:52Z

This doesn't work for nested=True because of what nested=True means:

>>> a = ak.Array([[1, 2, 3], [], [4, 5]])
>>> b = ak.Array([["a", "b"], ["c"], ["d", "e"]])
>>> index1 = ak.argcartesian({"x": a, "y": b}, nested=False)
>>> index2 = ak.argcartesian({"x": a, "y": b}, nested=True)
>>> index1.tolist()
[[{'x': 0, 'y': 0},
  {'x': 0, 'y': 1},
  {'x': 1, 'y': 0},
  {'x': 1, 'y': 1},
  {'x': 2, 'y': 0},
  {'x': 2, 'y': 1}],
 [],
 [{'x': 0, 'y': 0},
  {'x': 0, 'y': 1},
  {'x': 1, 'y': 0},
  {'x': 1, 'y': 1}]]
>>> index2.tolist()
[[[{'x': 0, 'y': 0}, {'x': 0, 'y': 1}],
  [{'x': 1, 'y': 0}, {'x': 1, 'y': 1}],
  [{'x': 2, 'y': 0}, {'x': 2, 'y': 1}]],
 [],
 [[{'x': 0, 'y': 0}, {'x': 0, 'y': 1}],
  [{'x': 1, 'y': 0}, {'x': 1, 'y': 1}]]]
>>> index1["x"].tolist()
[[0, 0, 1, 1, 2, 2], [], [0, 0, 1, 1]]
>>> index2["x"].tolist()
[[[0, 0], [1, 1], [2, 2]], [], [[0, 0], [1, 1]]]

index2["x"] doesn't have the same "shape" (in an extended-beyond-NumPy sense) as a. We could explicitly broadcast a to get the same shape:

>>> ak.broadcast_arrays(a, index2["x"])[0].tolist()
[[[1, 1], [2, 2], [3, 3]], [], [[4, 4], [5, 5]]]
>>> index2["x"].tolist()
[[[0, 0], [1, 1], [2, 2]], [], [[0, 0], [1, 1]]]

but it still doesn't work:

>>> ak.broadcast_arrays(a, index2["x"])[0][index2["x"]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/awkward/highlevel.py", line 946, in __getitem__
    return ak._util.wrap(self._layout[where], self._behavior)
ValueError: in ListArray64 attempting to get 2, index out of range

because, for example, the 2 in [2, 2] is beyond all the indexes in the corresponding [3, 3] in the broadcasted a (there's only 0 and 1). The indexes refer to positions in the dimension one level higher: by broadcasting a, we've nested it one level more deeply than the indexes were intended for.
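
To make the mismatch concrete, here is a small sketch that uses plain Python lists as a stand-in for the first entry of a and of index2["x"] from the session above. It is only a model of the indexing rule, not how Awkward Array implements jagged slicing, and the names a_row, index_x and broadcast_row are made up for illustration.

```python
# First entry of `a` and of index2["x"], written as plain lists.
a_row = [1, 2, 3]
index_x = [[0, 0], [1, 1], [2, 2]]

# Intended meaning: every index points into a_row itself.
intended = [[a_row[i] for i in group] for group in index_x]
print(intended)  # [[1, 1], [2, 2], [3, 3]]

# After broadcasting, a_row has become one two-element list per group,
# and each index group is applied to the matching inner list instead.
broadcast_row = [[1, 1], [2, 2], [3, 3]]
failed = False
try:
    [[inner[i] for i in group] for inner, group in zip(broadcast_row, index_x)]
except IndexError:
    # The group [2, 2] asks for position 2 of [3, 3], which only has 0 and 1.
    failed = True
print(failed)  # True
```

The first two groups happen to succeed because 0 and 1 are valid positions in a two-element list; only the third group exposes the problem.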

jpivarski (Maintainer) · 2020-12-07T14:44:33Z

An easier way to deal with this conceptually, with possibly better performance as well, is to ak.zip all your "a" arrays into one package and all your "b" arrays into another, then do a single ak.cartesian on both groups.

>>> many_a = ak.zip({"a1": a, "a2": a, "a3": a})
>>> many_b = ak.zip({"b1": b, "b2": b, "b3": b})
>>> ak.cartesian({"x": many_a, "y": many_b}).tolist()
[[{'x': {'a1': 1, 'a2': 1, 'a3': 1}, 'y': {'b1': 'a', 'b2': 'a', 'b3': 'a'}},
  {'x': {'a1': 1, 'a2': 1, 'a3': 1}, 'y': {'b1': 'b', 'b2': 'b', 'b3': 'b'}},
  {'x': {'a1': 2, 'a2': 2, 'a3': 2}, 'y': {'b1': 'a', 'b2': 'a', 'b3': 'a'}},
  {'x': {'a1': 2, 'a2': 2, 'a3': 2}, 'y': {'b1': 'b', 'b2': 'b', 'b3': 'b'}},
  {'x': {'a1': 3, 'a2': 3, 'a3': 3}, 'y': {'b1': 'a', 'b2': 'a', 'b3': 'a'}},
  {'x': {'a1': 3, 'a2': 3, 'a3': 3}, 'y': {'b1': 'b', 'b2': 'b', 'b3': 'b'}}],
 [],
 [{'x': {'a1': 4, 'a2': 4, 'a3': 4}, 'y': {'b1': 'd', 'b2': 'd', 'b3': 'd'}},
  {'x': {'a1': 4, 'a2': 4, 'a3': 4}, 'y': {'b1': 'e', 'b2': 'e', 'b3': 'e'}},
  {'x': {'a1': 5, 'a2': 5, 'a3': 5}, 'y': {'b1': 'd', 'b2': 'd', 'b3': 'd'}},
  {'x': {'a1': 5, 'a2': 5, 'a3': 5}, 'y': {'b1': 'e', 'b2': 'e', 'b3': 'e'}}]]

Beyond that, Numba is generally faster than these array-at-a-time functions, though the array-at-a-time functions are generally more convenient, so Numba would be the second thing to try.

Superharz (Author) · 2020-12-07T14:45:40Z

Thank you for the fast response.
Could this be related to #555 ?

jpivarski (Maintainer) · 2020-12-07T14:55:41Z

#555 is about adding a way to create new arrays-of-lists. It's something one can already do from the low level (directly manipulating layouts), but it would provide a high-level way to do it, motivated by JaggedArray.fromcounts in Awkward 0.x. I don't personally see the connection to this.

I've relabeled this issue as a question, because I think it was about technique, not new functionality. Zipping all of your arrays would let you do a single ak.cartesian, which might solve your speed problem. If this is a promising avenue, please close the issue. Thanks!

Superharz (Author) · 2020-12-08T00:01:18Z

Okay, thank you for the clarification.
Your idea of zipping the 3 arrays works like a charm and is about 5 times faster.
However, a new speed problem arises:

eta_cross = ak.cartesian({"x": eta, "y": eta}, axis = 1, nested = True)
phi_cross = ak.cartesian({"x": phi, "y": phi}, axis = 1, nested = True)
et_cross  = ak.cartesian({"x": et , "y": et }, axis = 1, nested = True)

The above takes 15s

eta_diff  = eta_cross['y'] - eta_cross['x']
phi_diff  = phi_cross['y'] - phi_cross['x']

The above takes 1s
Now the new approach:

eta_phi_et = ak.zip({"eta": eta, "phi": phi, "et": et})
eta_phi_et_cross = ak.cartesian({"x": eta_phi_et, "y": eta_phi_et}, axis = 1, nested = True)

The above takes 3s

eta_phi_et_x, eta_phi_et_y = ak.unzip(eta_phi_et_cross)
eta_x, phi_x, _ = ak.unzip(eta_phi_et_x)
eta_y, phi_y, _ = ak.unzip(eta_phi_et_y)

The above takes almost no time.

eta_diff  = eta_y - eta_x
phi_diff  = phi_y - phi_x

The above suddenly takes 11s

The new approach is faster at crossing but for some reason much slower at calculating the difference, which doesn't make sense to me, as eta_y and eta_x should be identical to eta_cross['y'] and eta_cross['x'].

jpivarski (Maintainer) · 2020-12-08T00:42:32Z

The trade-off is due to a difference in technique. When computing the Cartesian product (as well as several other operations) of records, we don't descend through every field of the record, computing the Cartesian product of each field. We create an IndexedArray of those records. I'll demonstrate.

For the Cartesian product of two record arrays,

>>> one = ak.Array([[{"x": 1}, {"x": 2}, {"x": 3}], [], [{"x": 4}, {"x": 5}]])
>>> two = ak.Array([[{"y": 1}, {"y": 2}], [{"y": 3}], [{"y": 4}, {"y": 5}]])
>>> ak.cartesian([one, two]).tolist()
[[({'x': 1}, {'y': 1}),
  ({'x': 1}, {'y': 2}),
  ({'x': 2}, {'y': 1}),
  ({'x': 2}, {'y': 2}),
  ({'x': 3}, {'y': 1}),
  ({'x': 3}, {'y': 2})],
 [],
 [({'x': 4}, {'y': 4}),
  ({'x': 4}, {'y': 5}),
  ({'x': 5}, {'y': 4}),
  ({'x': 5}, {'y': 5})]]

The result is a ListOffsetArray of RecordArray of IndexedArrays, with the unduplicated data inside the IndexedArrays.

>>> ak.cartesian([one, two]).layout
<ListOffsetArray64>
    <offsets><Index64 i="[0 6 6 10]" offset="0" length="4" at="0x562c068de5d0"/></offsets>
    <content><RecordArray>
        <field index="0">
            <IndexedArray64>
                <index><Index64 i="[0 0 1 1 2 2 3 3 4 4]" offset="0" length="10" at="0x562c06903df0"/></index>
                <content><RecordArray>
                    <field index="0" key="x">
                        <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x562c069f74e0"/>
                    </field>
                </RecordArray></content>
            </IndexedArray64>
        </field>
        <field index="1">
            <IndexedArray64>
                <index><Index64 i="[0 1 0 1 0 1 3 4 3 4]" offset="0" length="10" at="0x562c068e3ee0"/></index>
                <content><RecordArray>
                    <field index="0" key="y">
                        <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x562c06a3e660"/>
                    </field>
                </RecordArray></content>
            </IndexedArray64>
        </field>
    </RecordArray></content>
</ListOffsetArray64>

The index of the IndexedArray has the duplication for the Cartesian product (0 0 1 1 2 2 3 3 4 4 and 0 1 0 1 0 1 3 4 3 4)—essentially what you were using the argcartesian for earlier—and the data within that are the original 1 2 3 4 5. Whenever you access a field of the new record, it goes through this IndexedArray indirection.

By contrast, if you did a Cartesian product of non-record arrays:

>>> ak.cartesian([ak.Array([[1, 2, 3], [], [4, 5]]), ak.Array([[1, 2], [3], [4, 5]])]).layout
<ListOffsetArray64>
    <offsets><Index64 i="[0 6 6 10]" offset="0" length="4" at="0x562c06a43ef0"/></offsets>
    <content><RecordArray>
        <field index="0">
            <NumpyArray format="l" shape="10" data="1 1 2 2 3 3 4 4 5 5" at="0x562c0688a7d0"/>
        </field>
        <field index="1">
            <NumpyArray format="l" shape="10" data="1 2 1 2 1 2 4 5 4 5" at="0x562c068e3ee0"/>
        </field>
    </RecordArray></content>
</ListOffsetArray64>

there are no IndexedArrays and the innermost NumpyArray data are duplicated.

Adding this IndexedArray layer to most operations with RecordArrays was a performance upgrade this summer: issue #204, PR #261. However, as you point out, it is a tradeoff. If you have records with few fields and frequently access the results (in most or all fields), then you'd rather do ak.cartesian separately and then ak.zip (for no IndexedArray). If you have records with many fields and access the results infrequently or only access a small subset of the results, then you'd rather ak.zip and then ak.cartesian together (to get an IndexedArray).
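
The trade-off can be modeled with plain NumPy, reusing the index and data values printed in the two layouts above. This is only an analogy for the IndexedArray indirection, not Awkward Array's actual implementation; the function access_x and the eager_* names are hypothetical.

```python
import numpy as np

# Unduplicated data and Cartesian indexes, copied from the layouts above.
data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([1, 2, 3, 4, 5])
index_x = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
index_y = np.array([0, 1, 0, 1, 0, 1, 3, 4, 3, 4])

# Record-array case (model): only the indexes were computed up front,
# so the gather through the index happens each time a field is accessed.
def access_x():
    return data_x[index_x]

# Non-record case (model): the gather is paid once, up front.
eager_x = data_x[index_x]
eager_y = data_y[index_y]

# Both routes give the same numbers; they differ in when the gather is paid.
diff_on_access = data_y[index_y] - access_x()
diff_eager = eager_y - eager_x
print(diff_eager)
```

In this model, a field that is never accessed never pays for its gather, while a field that is accessed repeatedly pays for it every time.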

We were motivated to make the change because (1) many physics records are wide, and a given analysis typically doesn't use all of those fields and (2) it's also fairly common for records to contain VirtualArrays for lazy-reading of data. The IndexedArray indirection not only delays duplication, it also prevents unaccessed fields from being eagerly read. So we were seeing cases in which dozens of fields were "Cartesianed" and then ignored, as well as cases in which fields were unnecessarily read from disk, "Cartesianed," and ignored.

What you're seeing in your examples is just illustrating that you have to pay the price eventually: either up front or upon access. Since you will be using all of the fields of your record, there's not a strong advantage to one case or the other. We could imagine complicating the interface, adding compute-once caches to the IndexedArrays, but there are diminishing returns in that: the array-at-a-time interface is a balance of fast and convenient (at least, faster than Python loops and more convenient than writing a function in Numba). If you really need speed for a given application, you should probably turn to Numba. That avoids the intermediate arrays that duplicate the data altogether: a traditional for loop will win on a CPU. (It's not as clear on a GPU, but the GPU implementations are far from ready.) This is the same reason Numba usually beats NumPy.

But this isn't a concession: the ability to mix Awkward Array operations with the occasional Numba-accelerated loop was part of the plan. It enables "non-premature optimization": you try it in Awkward operations because they're the most convenient, then replace just the hot-spots with Numba, leaving everything else intact.
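
As a rough sketch of what such a hot-spot replacement could look like for the pairwise differences discussed earlier in the thread, the loop below works on a flat NumPy array of values plus an offsets array marking where each event starts and stops. That flat representation, the function name pairwise_diff, and the sample values are assumptions made for illustration; how to obtain the two arrays from an Awkward Array is not shown here. Numba is treated as optional so the sketch also runs (slowly) without it.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the loop if Numba is installed
except ImportError:
    def njit(func):  # fallback: run the same loop as plain Python
        return func

@njit
def pairwise_diff(content, offsets):
    """For each event, compute content[k] - content[j] for all pairs (j, k)."""
    n_events = len(offsets) - 1
    # First pass: count the output entries (n * n per event).
    total = 0
    for i in range(n_events):
        n = offsets[i + 1] - offsets[i]
        total += n * n
    out = np.empty(total, dtype=np.float64)
    out_offsets = np.zeros(n_events + 1, dtype=np.int64)
    # Second pass: fill the differences without building crossed arrays first.
    pos = 0
    for i in range(n_events):
        start = offsets[i]
        stop = offsets[i + 1]
        for j in range(start, stop):
            for k in range(start, stop):
                out[pos] = content[k] - content[j]
                pos += 1
        out_offsets[i + 1] = pos
    return out, out_offsets

# Three events with 3, 0 and 2 values (made-up numbers).
eta = np.array([0.5, 1.0, 2.0, -1.0, 1.0])
offsets = np.array([0, 3, 3, 5])
diff, diff_offsets = pairwise_diff(eta, offsets)
print(diff_offsets)
```

The loop writes each difference directly into the output, so no crossed copies of eta are materialized along the way.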
