Proposition | New form of Zip: MultiZip #1077
Comments
Btw, how come you chose to use the word "Equi" in Zip? Why not "Equal" or some such?
This is already available as the operator
Not really the same thing. In theory you could use MultiZip to transpose (even fully transpose, with default values filling the gaps) by yielding the source array, but that's not its primary goal. MultiZip zips a variable number of streams into one stream.
That was a very, very long time ago. See the following issues for historical context and discussions on
IIRC, I took inspiration from a similar name being used/proposed in Python. While Python eventually took another route and added it as an option to
For background and discussions, see:
Transpose starts with an iterator per source sequence (MoreLINQ/MoreLinq/Transpose.cs, line 64 in 088df95) and then yields an array of elements gathered from each before proceeding with the next (MoreLINQ/MoreLinq/Transpose.cs, line 94 in 088df95). It does this until all iterators are exhausted (MoreLINQ/MoreLinq/Transpose.cs, lines 90 to 91 in 088df95).
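For readers without the linked source at hand, here is a minimal C# sketch of the behavior just described: one iterator per source, one array per round, with rows shrinking as sources run dry. It is a hypothetical re-creation (TransposeSketch is an illustrative name), not MoreLINQ's actual Transpose code, and it omits argument validation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TransposeSketch
{
    // Illustrative re-creation of the described behavior; not MoreLINQ's source.
    public static IEnumerable<T[]> Transpose<T>(IEnumerable<IEnumerable<T>> source)
    {
        // One iterator per source sequence.
        var enumerators = source.Select(s => s.GetEnumerator()).ToList();
        try
        {
            while (true)
            {
                var row = new List<T>();
                for (var i = 0; i < enumerators.Count; i++)
                {
                    if (enumerators[i].MoveNext())
                    {
                        row.Add(enumerators[i].Current);
                    }
                    else
                    {
                        // Exhausted: dispose it and drop it from further rounds.
                        enumerators[i].Dispose();
                        enumerators.RemoveAt(i);
                        i--;
                    }
                }

                if (row.Count == 0)
                    yield break; // all iterators are exhausted

                yield return row.ToArray();
            }
        }
        finally
        {
            // Dispose whatever iterators are still live (e.g. on early termination).
            foreach (var e in enumerators)
                e.Dispose();
        }
    }
}
```

Applied to the jagged matrix discussed later in this thread, this yields [10, 20, 30, 40], [11, 31, 41], [32, 42] and then the single elements 43 through 46, which is exactly the "keep going until everything is exhausted" behavior, as opposed to a zip that stops at the shortest sequence.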
The one difference is that a zip-like method will generally stop at the shortest sequence. I find that it helps to discuss with code examples. Do you have a case of
This doesn't compose well and can be a source of surprises and bugs, so it's best to design this differently. Have a look at PR #856, which added a distinct version of
Well, the transpose does not have a
Well, one of the examples where we used this was loan transaction processing: it was fed account payment streams grouped by month/year and produced a stream of monthly analytics. It had about 30,000 input streams and produced one (as I said, it's a form of a reduce pattern). Hence the reusable result array. We can make it default to true if that is a concern, although unless you are projecting the result array into TResult itself, it doesn't really matter if it's reused; MultiZip itself does not project those rows like Transpose does. This is a heavy-duty cruncher by design, where a variable number of sources would be generated dynamically. We mostly used it in Longest mode. It's not really meant for a small number of fixed, manually created streams. Btw, Strict does sound better, IMHO.
Technically,
Suppose the following:

    var matrix = new[]
    {
        new[] { 10, 11 },
        new[] { 20 },
        new[] { 30, 31, 32 },
        new[] { 40, 41, 42, 43, 44, 45, 46 },
    };
Now, admittedly, both of these require you to know one of the two dimensions, either in terms of expected columns or rows.
Good to hear about real use cases, and thanks for sharing. I'm not surprised, to be honest. I've dealt with gigabytes of data processing in constant memory with (More)LINQ, so I can appreciate the scale and dynamic nature of the sources you might be dealing with.
If you're working with large arrays on the order of 30,000 elements, then they'll easily end up on the large object heap (LOH). They won't be reclaimed by younger-generation collections, and the LOH isn't compacted by default (although this can now be controlled in a limited way), which can lead to fragmentation. Reusing the array certainly makes sense if
If we decide to add this, it seems to me that this will have a design similar to
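A quick back-of-the-envelope check of the LOH point, assuming a 64-bit process, reference-type elements and the default 85,000-byte large object threshold: a per-step array holding one value for each of 30,000 sources occupies roughly 30,000 × 8 = 240,000 bytes, well past the threshold, so every freshly allocated one goes straight to the LOH. The sketch below (LohCheck is an illustrative name) observes this through GC.GetGeneration, which reports LOH objects as generation 2.

```csharp
using System;

public static class LohCheck
{
    // Approximate payload of an object[] of the given length: one pointer-sized
    // reference per element (8 bytes in a 64-bit process). Ignores the array header.
    public static long ApproxPayloadBytes(int length) => (long)length * IntPtr.Size;

    // Allocates a fresh object[] and reports its GC generation. Arrays at or above
    // the large object threshold (85,000 bytes by default) are allocated on the LOH,
    // which GC.GetGeneration reports as generation 2.
    public static int GenerationOfFreshArray(int length) => GC.GetGeneration(new object[length]);
}
```

Reusing a single such array across steps avoids allocating a new LOH object for every output element, which is the motivation for reusing the result array in the proposal.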
Yeah, naming is hard. People used to get confused by
@atifaziz, as instructed
I used this in financial analytics. It's a form of a single Reduce step of the Map/Reduce pattern. It implements Zip, but over a variable number of sources, with the caveat that they must all be of the same type.
The current values of all source streams are packed into an array and sent to the caller-supplied resultSelector reducer, which processes them into a TResult.
It takes several parameters: sourceList is the list of streams, resultSelector is the reducer, and missingSourceAction and operatingMode define how the cycling is done (see below). Ultimately, I've adopted the naming scheme you used in the three separate Zip* methods you already have for a fixed number of streams of varying types. The defaults are to remove missing sources and to run in Shortest mode (exit as soon as the first source is exhausted).
immutableResultSource controls allocation: by default the result arrays are reused, which can make a difference in scenarios with a large number of sources. For most uses the array can be reused; if the caller wants to store the arrays somewhere, they can flip this to true.
resultSelector is also supplied with a bool map indicating which elements were sourced from streams and which are default(TSource) for exhausted streams in the Longest mode of operation.
Explanation of enums:
The version I have does not support async, but that could be added. Also, several overloads could be added to support different forms of result selector.
OK, let me know. I can post this relatively quickly; I already have the code and just need to write some tests.
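To make the proposal concrete, here is a minimal C# sketch of the behavior described above. It is not the poster's code, which was not included. The MultiZipMode enum and its Shortest, Longest and Strict values are assumptions based on the modes mentioned in this thread. missingSourceAction is not modeled: the sketch hard-codes the described default by skipping missing sources, which it assumes means null ones. Strict is assumed to throw when the sources differ in length, in the spirit of EquiZip. MultiZipSketch is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical modes, named after the ones discussed in this thread.
public enum MultiZipMode
{
    Shortest, // stop as soon as any source is exhausted
    Longest,  // continue until all sources are exhausted, padding with default(TSource)
    Strict    // throw if the sources turn out to have different lengths
}

public static class MultiZipSketch
{
    public static IEnumerable<TResult> MultiZip<TSource, TResult>(
        IEnumerable<IEnumerable<TSource>> sourceList,
        Func<TSource[], bool[], TResult> resultSelector,
        MultiZipMode operatingMode = MultiZipMode.Shortest,
        bool immutableResultSource = false)
    {
        // Validate eagerly; the iterator below runs lazily.
        if (sourceList == null) throw new ArgumentNullException(nameof(sourceList));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return Iterate();

        IEnumerable<TResult> Iterate()
        {
            // Assumed default: missing (null) sources are skipped.
            var enumerators = sourceList.Where(s => s != null)
                                        .Select(s => s.GetEnumerator())
                                        .ToArray();
            var live = Enumerable.Repeat(true, enumerators.Length).ToArray();
            TSource[] values = null;
            bool[] present = null;
            try
            {
                while (true)
                {
                    // Allocate once and reuse, unless the caller asked for fresh arrays per step.
                    if (values == null || immutableResultSource)
                    {
                        values = new TSource[enumerators.Length];
                        present = new bool[enumerators.Length];
                    }

                    var anyPresent = false;
                    var anyMissing = false;
                    for (var i = 0; i < enumerators.Length; i++)
                    {
                        if (live[i] && enumerators[i].MoveNext())
                        {
                            values[i] = enumerators[i].Current;
                            present[i] = true;
                            anyPresent = true;
                        }
                        else
                        {
                            live[i] = false; // never call MoveNext on this source again
                            values[i] = default(TSource);
                            present[i] = false;
                            anyMissing = true;
                        }
                    }

                    if (!anyPresent) yield break; // every source is exhausted
                    if (anyMissing && operatingMode == MultiZipMode.Shortest) yield break;
                    if (anyMissing && operatingMode == MultiZipMode.Strict)
                        throw new InvalidOperationException("Sources have different lengths.");

                    yield return resultSelector(values, present);
                }
            }
            finally
            {
                foreach (var e in enumerators) e.Dispose();
            }
        }
    }
}
```

With the jagged matrix from earlier in the thread, Shortest yields a single step (10, 20, 30, 40), Longest yields seven steps with the bool map marking exhausted rows as false, and Strict throws. The trade-off behind immutableResultSource shows up when the selector returns the values array itself: with the default of false, every step hands back the same array instance, so callers that keep the arrays should pass true.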