How to deal with complex combinatorics? #153

DraTeots · 2020-03-09T10:25:03Z

Dear Jim, David, Peter, and other Awkward developers,

Let me cite an example from the Awkward Array presentation:

For events with at least three leptons (electrons or muons) and a same-flavor opposite-sign lepton pair, find the same-flavor opposite-sign lepton pair with a mass closest to 91.2 GeV;

I can't figure out an effective way of doing this with uproot, Awkward Arrays, and pandas. How does one do it without PartiQL?

In a sequential, event-by-event world, this is fairly easy. Somewhere I found an example that uses Numba-accelerated for loops to process events one by one. It is slow (even with Numba), and I believe there might be a more efficient way, maybe not as elegant as with PartiQL, but still. There are cross and choose, but they work well for that two-muon example and are hard to scale.
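For reference, the event-by-event logic for the quoted problem can be sketched in plain Python as below. This is only an illustration of the selection, not the Numba example mentioned above and not a fast implementation; the per-lepton dictionary layout (`flavor`, `charge`, `px`, `py`, `pz`, `E`), the helper names, and the toy momenta are all assumptions made up for this sketch.

```python
import itertools
import math

Z_MASS = 91.2  # GeV, the target mass quoted in the problem statement


def invariant_mass(a, b):
    """Invariant mass (GeV) of two particles given as dicts with E, px, py, pz."""
    energy = a["E"] + b["E"]
    px = a["px"] + b["px"]
    py = a["py"] + b["py"]
    pz = a["pz"] + b["pz"]
    mass_squared = energy**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(mass_squared, 0.0))  # clamp small negative rounding errors


def best_sfos_pair(leptons, target=Z_MASS):
    """Return (mass, lepton_a, lepton_b) for the same-flavor opposite-sign pair
    closest to `target`, or None if the event has fewer than three leptons or
    no such pair."""
    if len(leptons) < 3:
        return None
    best = None
    for a, b in itertools.combinations(leptons, 2):
        same_flavor = a["flavor"] == b["flavor"]
        opposite_sign = a["charge"] * b["charge"] < 0
        if same_flavor and opposite_sign:
            mass = invariant_mass(a, b)
            if best is None or abs(mass - target) < abs(best[0] - target):
                best = (mass, a, b)
    return best


def lepton(flavor, charge, px, py, pz, E):
    """Hypothetical helper that builds one lepton record."""
    return {"flavor": flavor, "charge": charge, "px": px, "py": py, "pz": pz, "E": E}


# Toy events with made-up, massless-lepton kinematics.
events = [
    [  # three muons; the first two are back to back
        lepton("mu", +1, 0.0, 0.0, 45.6, 45.6),
        lepton("mu", -1, 0.0, 0.0, -45.6, 45.6),
        lepton("mu", -1, 10.0, 0.0, 0.0, 10.0),
    ],
    [  # only two leptons: fails the "at least three" requirement
        lepton("e", +1, 0.0, 0.0, 45.6, 45.6),
        lepton("e", -1, 0.0, 0.0, -45.6, 45.6),
    ],
    [  # three leptons but no same-flavor opposite-sign pair
        lepton("e", +1, 20.0, 0.0, 0.0, 20.0),
        lepton("e", +1, 0.0, 20.0, 0.0, 20.0),
        lepton("mu", -1, 0.0, 0.0, 20.0, 20.0),
    ],
]

results = [best_sfos_pair(event) for event in events]
```

Only the first toy event yields a pair; the other two return `None` because they fail the lepton-count or the same-flavor opposite-sign requirement.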

Let me put a simplified version of the question here:

Imagine one has Pythia output (or output from any other generator) with particle parameters stored as subarrays for each event. So we have some Awkward Arrays with fields like pdg, px, py, pz, E.

How does one select one particle with some cuts and another particle with other cuts, and then build an effective mass of their combinations?
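To make the simplified question concrete, here is a plain-Python reference sketch of the intended result: two selections with their own cuts, then the effective (invariant) mass of every cross combination, event by event. The event layout (a dict of parallel lists named `pdg`, `px`, `py`, `pz`, `E`), the function names, the PDG codes, the pT cut, and the toy numbers are assumptions chosen for illustration; this shows the semantics only and is not an array-at-a-time solution.

```python
import itertools
import math


def select(event, pdg, min_pt):
    """Indices of particles in one event with the given PDG code and pT > min_pt."""
    return [
        i
        for i in range(len(event["pdg"]))
        if event["pdg"][i] == pdg
        and math.hypot(event["px"][i], event["py"][i]) > min_pt
    ]


def pair_masses(event, first, second):
    """Invariant mass of every (first, second) index combination in one event."""
    masses = []
    for i, j in itertools.product(first, second):
        energy = event["E"][i] + event["E"][j]
        px = event["px"][i] + event["px"][j]
        py = event["py"][i] + event["py"][j]
        pz = event["pz"][i] + event["pz"][j]
        mass_squared = energy**2 - px**2 - py**2 - pz**2
        masses.append(math.sqrt(max(mass_squared, 0.0)))
    return masses


# Toy input with made-up numbers (not physical kaon kinematics).
events = [
    {
        "pdg": [321, -321, 321, 211],
        "px": [3.0, -3.0, 0.1, 1.0],
        "py": [0.0, 0.0, 0.0, 1.0],
        "pz": [4.0, -4.0, 2.0, 0.0],
        "E": [5.0, 5.0, 2.1, 1.5],
    },
    {"pdg": [], "px": [], "py": [], "pz": [], "E": []},  # empty event
]

# One list of masses per event: K+ candidates crossed with K- candidates.
masses = [
    pair_masses(event, select(event, 321, 0.5), select(event, -321, 0.5))
    for event in events
]
```

The result is a jagged list with one sublist per event, which is the shape an array-at-a-time version would also need to produce.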

jpivarski · 2020-03-09T11:29:21Z

Maintainer

This is a rather open-ended question. I'll start by ruling out Pandas, since it simply wasn't designed for this sort of question, and that's why we're developing Awkward. Also, uproot is just for reading the data.

Are you asking about solving Benchmark 8 with only array-at-a-time primitives? (That's the problem you quoted.) Or about applying cuts and combinatorics in general? The latter is a broad subject in need of a tutorial, and the former was held up as an example of why array-at-a-time operations shouldn't be the only way to work. Even if someone finds a clever solution or we add more primitives to make that problem easier, busy physicists shouldn't be forced to solve puzzles or find the magic primitive when they have real work to do. PartiQL is conceptually event-at-a-time (though it may be implemented as array-at-a-time: see AwkwardQL), and Numba is imperative.

How recently have you tried operating on Awkward Arrays in Numba, and with what version? A week or two ago, I rewrote the Numba lowering of Awkward 1.0 to be much more lightweight. I haven't had the chance to do any performance tests on it, but in principle, it should be many times faster. (I noticed that the old version was scaling with the size of the data structure, and that's because it was pass-by-value. I've since reworked it as a cursor object that navigates over the structure but copies nothing, reading only on demand.)

jpivarski · 2020-03-10T21:25:08Z

Maintainer

This question may just be asking for better documentation, which is definitely in our plans. Should it remain open now?

DraTeots · 2020-03-11T03:50:15Z

Author

Jim, thank you! Yes, this question is basically about better documentation. Please close the ticket if that is convenient for you.
Could I also ask about this "Benchmark 8" and where it is located? (I searched scikit-hep.) Or maybe, for now, you have older or newer presentations with some examples?
To clarify what we are trying to achieve:

I'm one of the core software developers for the Electron Ion Collider (EIC) at Jefferson Lab.

We created a framework which consists of C++ chunks (such as Geant4, track reconstruction, etc.) glued together with Python. So one can easily configure and run everything in Python and JupyterLab (with widget GUIs and the like). Data exchange (and output) between those C++ parts is done with flattened ROOT files. One of our requirements for those files is that it should be possible to process them with uproot as conveniently as possible.

For many years there were just a small number of people doing analysis and work for the upcoming EIC (mainly at Jefferson Lab and BNL). But recently that has changed. There was an announcement about CD0 for the EIC, along with the beginning of Yellow Report studies initiated by the EIC user group, in which a lot of new users from universities and other labs will try to do their analyses of EIC physics and detectors. Result: ~150 new users are trying to tame our software for the first time.

So it is actually we (our team) who desperately need to fill the gap in documentation and examples. There are a lot of analyses that can be done directly from those flattened ROOT files. And we are stuck with a number of questions like: "Can we analyze that in Jupyter? How do we analyze this?" And as I said before, I can't find a good answer to some of these questions. So any new information, talks, or examples that could satisfy our users would be much appreciated.

Would you, or maybe other developers, be interested in giving a talk for the Electron Ion Collider User Group?

I also talked with one of the EIC UG conveners and JLab management today, and there is a full green light for any activities and collaboration to advocate for new tools for the Nuclear Physics community at JLab and EIC, which could also be discussed.

If you are interested, my email is romanov@jlab.org

jpivarski · 2020-03-11T12:07:36Z

Maintainer

One of the IRIS-HEP projects was to develop a suite of typical analysis problems to test the expressiveness of new domain-specific languages. Awkward isn't exactly a language (at best, you could call it an embedded DSL), but the benchmarks applied. Most of them were quite easy, but number 8 was especially hard. Here's the full list:

https://github.com/iris-hep/adl-benchmarks-index

(That site also has solutions in various languages.)

My take-away from that is that the array-at-a-time interface is generally useful but not one-size-fits-all. There have to be multiple ways to solve a problem using the same data structures with as little switching overhead (i.e. hoops for the user to jump through) as possible. PartiQL was one idea of how this can be done, and @lgray is interested enough to develop that into a real product, AwkwardQL. Numba is another. They're complementary: AwkwardQL is declarative and Numba is imperative.

I'd be very interested in presenting this to the EIC user group. I started giving tutorial-style talks at the LPC when Uproot was first introduced, which I think has helped a lot because it sets the context for where to look for solutions, and a community of people in close proximity who can lead each other in the right direction. Documentation is still necessary, but documentation without context is limited.

I'll follow up by email, but the most significant variable is "when?" Awkward 1 isn't complete, but it's very close and there are some early adopters tinkering with it. The combinatorics functions haven't been written, but I'd estimate that it would take a few days to write them, rounding up to a week. So now is an interesting time: would a tutorial be entirely about the new version, or would it pragmatically include the old version?

For reference, the most recent tutorial I gave on columnar analysis is this one:

https://github.com/jpivarski/2019-07-29-dpf-python

which covers NumPy (on its own), Uproot, and the original Awkward.

audiya · 2020-10-04T06:23:47Z

Hi, I'm a ROOT newbie. I'm interested in your topic on how one selects one particle with some cuts, another particle with other cuts, and builds an effective mass of the combinations.

Are there any tutorials available for this topic? I found one tutorial which uses RDataFrame, but I'm more interested in whether there's a tutorial for this using standard ROOT.
Thanks!

jpivarski · 2020-10-04T14:05:42Z

Maintainer

Perhaps this:

https://github.com/jpivarski-talks/2020-04-08-eic-jlab/blob/master/2020-04-08-eic-jlab-EVALUATED.ipynb

Or this:

https://github.com/jpivarski-talks/2020-06-08-uproot-awkward-columnar-hats/blob/master/evaluated/02-columnar-analysis-awkward-array.ipynb

Or this:

https://github.com/jpivarski-talks/2020-07-13-pyhep2020-tutorial/blob/master/tutorial-EVALUATED.ipynb

DraTeots · 2020-10-04T19:47:15Z

Author

To help a bit: the video of the first tutorial is located here:
https://www.youtube.com/watch?v=FoxNS6nlbD0

The video of the PyHEP tutorial is here:
https://www.youtube.com/watch?v=ea-zYLQBS4U&feature=youtu.be

And the full timetable of PyHEP (with videos) is here:
https://indico.cern.ch/event/882824/timetable/#20200713.detailed
