MaxBy and MinBy equivalent #895

glrusso · 2024-04-17T13:14:08Z


Hello,
As pointed out here, the Maximum and Minimum operators are limited to value types. @RolandPheasant mentioned that such operations are achievable by other means, and the issue was closed.

I'd like to ask what you think the best way to implement these operators would be.

In the aggregation snippet it is suggested to implement custom operators as follows:

source.Connect().ToCollection().Select(items=>...);

This is simple, but it has the downside of iterating over the whole collection each time there's any change, whereas the existing operators look at individual changes and only iterate over the collection when needed.

What would be the best approach, in your opinion?
Is it possible to evaluate the operator only when an observable fires, e.g. when the property of interest changes?

Many thanks

Answered by JakenVeina

Apr 24, 2024


JakenVeina (Maintainer) · 2024-04-18T03:47:16Z

This is simple, but it has the downside of iterating over the whole collection each time there's any change, whereas the existing operators look at individual changes and only iterate over the collection when needed.

While it's true that this is not optimal, it's also true that calculating a min or max often requires iterating the entire collection anyway, even when you know what the single-item change was, e.g. when the item being removed is the previous max. So, depending on your scenario, you may not see any meaningful difference between the optimal approach and the "good enough" approach.

If you're interested in the optimal approach, you can try using the .ForAggregation() operator. I'm not completely sure how it works, but at a glance, it should publish either an Add or a Remove action for each item in the stream, which you can then combine with .InvalidateWhen() to force the whole aggregation to reset when needed (i.e. all items in the collection will re-fire as an Add).

Personally, the combination of those operators just strikes me as the same thing that an IObservable<IChangeSet<>> stream already gives you, but more complicated. If it were me, I'd just do something like...

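using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive;
using System.Reactive.Linq;
using DynamicData;

// Extension methods must live in a static class; the class name is illustrative.
public static class MaximumExtensions
{
    public static IObservable<TObject> Maximum<TObject, TKey>(
                this    IObservable<IChangeSet<TObject, TKey>>  source,
                        IComparer<TObject>                      comparer)
            where TObject : notnull
            where TKey : notnull
        => Observable.Create<TObject>(observer =>
        {
            // Local copy of the collection, so the max can be re-aggregated when needed.
            var itemsByKey = new Dictionary<TKey, TObject>();
            TObject max = default!;     // only meaningful while hasMax is true
            var hasMax = false;

            return source.SubscribeSafe(Observer.Create<IChangeSet<TObject, TKey>>(
                onNext:         changes =>
                {
                    foreach (var change in changes)
                    {
                        switch (change.Reason)
                        {
                            case ChangeReason.Add:
                                itemsByKey.Add(change.Key, change.Current);
                                if (!hasMax || comparer.Compare(change.Current, max) > 0)
                                {
                                    max = change.Current;
                                    hasMax = true;
                                    observer.OnNext(max);
                                }
                                break;

                            case ChangeReason.Refresh:
                                // The item was mutated in place, so its previous state is unknown.
                                itemsByKey[change.Key] = change.Current;
                                RecalculateMax();
                                break;

                            case ChangeReason.Remove:
                                itemsByKey.Remove(change.Key);
                                if (comparer.Compare(max, change.Current) is 0)
                                    RecalculateMax();
                                break;

                            // DynamicData's reason is Update (quoted as "Replace" in the follow-up below).
                            case ChangeReason.Update:
                                itemsByKey[change.Key] = change.Current;
                                if (comparer.Compare(max, change.Previous.Value) is 0)
                                    RecalculateMax();
                                break;
                        }
                    }
                },
                onError:        observer.OnError,
                onCompleted:    observer.OnCompleted));

            // Re-aggregates the whole set and publishes only if the max actually changed.
            void RecalculateMax()
            {
                if (itemsByKey.Count is 0)
                {
                    hasMax = false;     // empty set: nothing to publish
                    return;
                }
                var hadMax = hasMax;
                var oldMax = max;
                max = itemsByKey.Values.Max(comparer)!;     // Max(IComparer<T>) requires .NET 6+
                hasMax = true;
                if (!hadMax || comparer.Compare(oldMax, max) is not 0)
                    observer.OnNext(max);
            }
        });
}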

Is it possible to evaluate the operator only when an observable fires, e.g. when the property of interest changes?

You can achieve this by putting an .AutoRefresh() operator on that property upstream of the aggregation, provided the aggregation supports Refresh, as my example does. If you want to be more direct, you could write an operator such as...

public static IObservable<TObject> Maximum<TObject, TKey, TComparisonKey>(
        this    IObservable<IChangeSet<TObject, TKey>>      source,
                Func<TObject, IObservable<TComparisonKey>>  comparisonKeySelector)
    where TObject : notnull
    where TKey : notnull
    where TComparisonKey : IComparable<TComparisonKey>

or

public static IObservable<TObject> Maximum<TObject, TKey, TComparisonKey>(
        this    IObservable<IChangeSet<TObject, TKey>>      source,
                Func<TObject, IObservable<TComparisonKey>>  comparisonKeySelector,
                IComparer<TComparisonKey>                   comparer)
    where TObject : notnull
    where TKey : notnull


glrusso (Author) · 2024-04-18T13:32:18Z

@JakenVeina, your answer was super helpful for understanding the possibilities I have, thank you.

I do have a couple of questions, and would really appreciate it if you could find the time to answer them.

What is the reason for recalculating everything on refresh?
Edit: Sorry, I now understand your intent, but please see the issue below.

case ChangeReason.Refresh:
  itemsByKey[change.Key] = change.Current;
  RecalculateMax();
  break;

I don't know if it is intended, but TransformWithInlineUpdate seems to convert changes whose reason is Update into Refresh changes; this would cause RecalculateMax to be called for every update.

From the source code:

case ChangeReason.Update:
  InlineUpdate(cache, change);
  break;

private void InlineUpdate(ChangeAwareCache<TDestination, TKey> cache, Change<TSource, TKey> change)
{
    var previous = cache.Lookup(change.Key)
                            .ValueOrThrow(() => new MissingKeyException($"{change.Key} is not found."));
    if (exceptionCallback is not null)
    {
        try
        {
            updateAction(previous, change.Current);
        }
        catch (Exception ex)
        {
            exceptionCallback(new Error<TSource, TKey>(ex, change.Current, change.Key));
        }
    }
    else
    {
        updateAction(previous, change.Current);
    }

    cache.Refresh(change.Key);
}

On a related note, since ForAggregation ignores Refresh, wouldn't the use of TransformWithInlineUpdate and the library Maximum cause issues? I'll try to see if this is actually an issue in some sample code.
In the code you provided, am I right to assume that calling RecalculateMax - if needed - only once at the end of the loop would be more efficient?
Wouldn't this condition pass even when removing an item whose value is the same as the current maximum, even if they reference different objects?

case ChangeReason.Remove:
  itemsByKey.Remove(change.Key);
  if (comparer.Compare(max, change.Current) is 0)
      RecalculateMax();
  break;

Is this meant to be ChangeReason.Update? If so, shouldn't it follow what is done for ChangeReason.Add?

case ChangeReason.Replace:
  itemsByKey[change.Key] = change.Current;
  if (comparer.Compare(max, change.Previous.Value) is 0)
      RecalculateMax();
  break;

Again, many thanks for your help. I'm learning a lot.


glrusso (Author) · Apr 18, 2024

On a related note, since ForAggregation ignores Refresh, wouldn't the use of TransformWithInlineUpdate and the library Maximum cause issues? I'll try to see if this is actually an issue in some sample code.

I was able to replicate the issue:

using DynamicData;
using DynamicData.Aggregation;
using DynamicDataRightJoin;

int? max = null;

var cache = new SourceCache<Foo, string>(foo => foo.Id);

var sub = cache
    .Connect()
    .TransformWithInlineUpdate(
        foo => new TransformedFoo(foo.Id)
        {
            TransformedValue = foo.Value
        },
        (transformedFoo, foo) =>
        {
            transformedFoo.TransformedValue = foo.Value;
        })
    .Maximum(transformedFoo => transformedFoo.TransformedValue)
    .Subscribe(newMax =>
    {
        max = newMax;
    });

cache.AddOrUpdate(new Foo("0")
{
    Value = 0
});

Console.WriteLine(max); // 0

cache.AddOrUpdate(new Foo("0")
{
    Value = 1
});

Console.WriteLine(max); // Expected: 1, Actual: 0

sub.Dispose();

If I understood correctly, this is because:

TransformWithInlineUpdate calls cache.Refresh(change.Key) when updating inline, even if the change reason was Update. Is this intended?
ForAggregation ignores changes whose reason is Refresh.

JakenVeina (Maintainer) · 2024-04-21T19:23:20Z

What is the reason for recalculating everything on refresh?

Basically, because it's impossible for us to know WHAT the change was, so there's no way we could do a check to see whether a full refresh is needed or not, like we can for adds and removes and such. The idea behind a Refresh is that it's a way for the upstream to tell us that an item has been mutated, not just replaced with a new item with slightly different state. Even if we were to go pull the item being refreshed from itemsByKey, we can't use that to figure out what the prior state of the item was because that reference potentially has already been mutated to the new state.
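A minimal illustration of that point, assuming the aggregation keeps items by reference in a dictionary. `StrongBox<int>` (from System.Runtime.CompilerServices) stands in for a mutable item, and the key and values are made up. Once the upstream has mutated the object, looking it up again only shows the new state:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// The aggregation's local copy stores a reference to the item, not a snapshot of its state.
var itemsByKey = new Dictionary<string, StrongBox<int>>();
var item = new StrongBox<int>(5);
itemsByKey["a"] = item;

// Upstream mutates the object in place and then publishes a Refresh for key "a".
item.Value = 9;

// Looking the item up again cannot recover the old value: both names point to one object.
Console.WriteLine(itemsByKey["a"].Value);                         // 9
Console.WriteLine(object.ReferenceEquals(itemsByKey["a"], item)); // True
```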

I don't know if it is intended, but TransformWithInlineUpdate seems to convert changes whose reason is Update into Refresh changes.

Yes, that's intended. The main usage scenario for that operator is transforming immutable Model objects from upstream into mutable ViewModel objects consumed by a View layer, where it's generally very beneficial to avoid ViewModel churn. In other words, turning changes that swap out entire objects (an Update) into changes that just mutate an existing object (a Refresh) is the entire goal, because it prevents View-layer bindings from having to be reconstructed frequently.

I could provide a more concrete example, if you're curious, or aren't following.

On a related note, since ForAggregation ignores Refresh, wouldn't the use of TransformWithInlineUpdate and the library Maximum cause issues? I'll try to see if this is actually an issue in some sample code.

Yup.

In the code you provided, am I right to assume that calling RecalculateMax - if needed - only once at the end of the loop would be more efficient?

Yeah, that would be a good optimization. Surprised I didn't spot it myself.

Wouldn't this condition pass even when removing an item whose value is the same as the current maximum, even if they reference different objects?

Yes, it would, that's intentional. It's entirely possible that a dataset can have two different values in it that are equivalent, and happen to be the max. Removing one of them COULD mean that you don't have to re-calculate the max, because you already know that there's another item with the same value, but that right there is the key: you have to KNOW that there's another item with the same value. In the snippet I wrote, we don't know that. It probably wouldn't be terribly complicated to add a variable maxCount to track how many "max" items there currently are, and that would eliminate the occasional need to re-iterate the whole set, when a max is removed. It would mean you'd no longer get to leverage IEnumerable<T>.Max(), since whenever you need to re-calculate the max, you also need to keep a count. That's probably not a significant issue, I'd say IEnumerable<T>.Max() is unlikely to be heavily optimized over just writing your own foreach.
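A minimal, library-free sketch of the maxCount idea, assuming integer values keyed by string. The names (`itemsByKey`, `maxCount`, `OnAdd`, `OnRemove`, `Rescan`) are hypothetical, and this is not DynamicData's implementation. Removing an item equal to the max only triggers a full rescan when it was the last item holding that value, and the rescan computes the max and its count in a single foreach instead of using `Max()`:

```csharp
using System;
using System.Collections.Generic;

var itemsByKey = new Dictionary<string, int>();
int max = 0;        // current max; meaningful only while itemsByKey is non-empty
int maxCount = 0;   // how many items currently share the max value
int rescans = 0;    // counts full iterations, to show when they happen

void OnAdd(string key, int value)
{
    itemsByKey.Add(key, value);
    if (itemsByKey.Count == 1 || value > max)
    {
        max = value;
        maxCount = 1;
    }
    else if (value == max)
    {
        maxCount++;
    }
}

void OnRemove(string key)
{
    if (!itemsByKey.Remove(key, out var value) || value != max)
        return;             // removing a non-max item never changes the max
    if (--maxCount > 0)
        return;             // another item still holds the max value
    Rescan();               // the last max item left: re-aggregate the whole set
}

// One pass that finds both the max and how many items share it.
void Rescan()
{
    rescans++;
    maxCount = 0;
    foreach (var value in itemsByKey.Values)
    {
        if (maxCount == 0 || value > max)
        {
            max = value;
            maxCount = 1;
        }
        else if (value == max)
        {
            maxCount++;
        }
    }
}

OnAdd("a", 5);
OnAdd("b", 5);
OnAdd("c", 3);
OnRemove("a");   // "b" still holds 5, so no rescan
OnRemove("b");   // the last item at 5 is gone, so the rescan finds 3
Console.WriteLine($"max={max}, count={maxCount}, rescans={rescans}");  // max=3, count=1, rescans=1
```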

Is this meant to be ChangeReason.Update? If so, shouldn't it follow what is done for ChangeReason.Add?

Yeah, ChangeReason.Update.

And yes, it's doing (effectively) both a remove and an add, so it should include the logic for both. That'd be a bug. There should be an else condition there for when the new value is greater than the current max. When it's less, there's still no need to do anything.
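A library-free sketch of that fix, combined with the earlier suggestion of re-aggregating at most once per change set: Update is treated as a Remove of the previous value plus an Add of the new one. It assumes int values keyed by string and models a change set as a list of delegates. All names (`OnUpdate`, `ApplyChangeSet`, `published`, and so on) are hypothetical, not DynamicData API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var itemsByKey = new Dictionary<string, int>();
int? max = null;                    // null while the set is empty
bool needsRescan = false;           // set by changes that may have removed the max
var published = new List<int?>();   // stand-in for observer.OnNext

void OnAdd(string key, int value)
{
    itemsByKey.Add(key, value);
    if (max is null || value > max) max = value;
}

void OnRemove(string key)
{
    if (itemsByKey.Remove(key, out var value) && value == max) needsRescan = true;
}

// Update = Remove(previous) + Add(current), so both halves of the logic apply.
void OnUpdate(string key, int current)
{
    var previous = itemsByKey[key];
    itemsByKey[key] = current;
    if (previous == max && current < previous)
        needsRescan = true;         // the max item shrank: another item may now be larger
    else if (max is null || current > max)
        max = current;              // the new value beats the current max
    // otherwise the max is unchanged
}

// Process one change set, re-aggregating at most once, at the end.
void ApplyChangeSet(params Action[] changes)
{
    var before = max;
    needsRescan = false;
    foreach (var change in changes) change();
    if (needsRescan)
        max = itemsByKey.Count > 0 ? (int?)itemsByKey.Values.Max() : null;
    if (max != before) published.Add(max);
}

ApplyChangeSet(() => OnAdd("a", 5), () => OnAdd("b", 3));
ApplyChangeSet(() => OnUpdate("a", 1), () => OnRemove("b"));   // a single rescan at the end
ApplyChangeSet(() => OnUpdate("a", 7));                        // the "else" branch: no rescan
Console.WriteLine(string.Join(", ", published));               // 5, 1, 7
```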


glrusso (Author) · Apr 23, 2024

Basically, because it's impossible for us to know WHAT the change was, so there's no way we could do a check to see whether a full refresh is needed or not, like we can for adds and removes and such.

Even if we were to go pull the item being refreshed from itemsByKey, we can't use that to figure out what the prior state of the item was because that reference potentially has already been mutated to the new state.

Couldn't we treat the Refresh as Remove and then Add?

If the object currently holding the max has been mutated, then max and change.Current reference the same object, so this condition under Remove will pass and max will be recalculated. Otherwise, we just check whether the current Value is greater than our Max.Value.

if (comparer.Compare(max, change.Current) is 0)
    RecalculateMax();

Wouldn't this fix the issue of using this operator together with TransformWithInlineUpdate?

I could provide a more concrete example, if you're curious, or aren't following.

Yes please, I make extensive use of TransformWithInlineUpdate in my application precisely because of what you said. It would be super helpful to better grasp the idea and mechanisms behind it.

Yup.

Thanks, I opened an issue. I'm really interested in the problem.

Yeah, that would be a good optimization. Surprised I didn't spot it myself.

Ok, nice!

Yes, it would, that's intentional. It's entirely possible that a dataset can have two different values in it that are equivalent, and happen to be the max. Removing one of them COULD mean that you don't have to re-calculate the max, because you already know that there's another item with the same value, but that right there is the key: you have to KNOW that there's another item with the same value. In the snippet I wrote, we don't know that. It probably wouldn't be terribly complicated to add a variable maxCount to track how many "max" items there currently are, and that would eliminate the occasional need to re-iterate the whole set, when a max is removed. It would mean you'd no longer get to leverage IEnumerable<T>.Max(), since whenever you need to re-calculate the max, you also need to keep a count. That's probably not a significant issue, I'd say IEnumerable<T>.Max() is unlikely to be heavily optimized over just writing your own foreach.

Ok, many thanks for the explanation.

When dealing with Refresh or other changes, would checking whether change.Current/Previous reference the same object as our max lead to errors?

And yes, it's doing (effectively) both a remove and an add, so it should include the logic for both. That'd be a bug. There should be an else condition there for when the new value is greater than the current max. When it's less, there's still no need to do anything.

Thanks! I'm putting everything together to write my custom operator.

JakenVeina (Maintainer) · 2024-04-24T03:06:28Z

Couldn't we treat the Refresh as Remove and then Add?

I suppose you're right, in that the Refresh case can be refined a little further. We can know whether or not the item was previously the max, and when that happens, the only way to determine what the new max should be is to re-aggregate the whole set. However, if the object wasn't the previous max, we only need to check if it exceeds the previous max. So...

case ChangeReason.Refresh:
    itemsByKey[change.Key] = change.Current;
    switch (comparer.Compare(change.Current, max))
    {
        case 0:
            RecalculateMax();
            break;
        case int comparison when comparison > 0:
            max = change.Current;
            observer.OnNext(max);
            break;
    }
    break;

At least, I think so. Ultimately, back all this up with tests.
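A library-free sketch of the refined Refresh branch, with a few checks of the kind suggested above. It relies on reference semantics: the refreshed item is the same object stored in the dictionary (and possibly in `max`), so comparing it to `max` yields 0 when it is the max itself (or ties with it). Ties trigger a rescan that is correct but occasionally unnecessary. `StrongBox<int>` stands in for a mutable ViewModel, and the helper names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// StrongBox<int> is a mutable reference type, standing in for a mutable ViewModel.
var itemsByKey = new Dictionary<string, StrongBox<int>>();
StrongBox<int>? max = null;
var rescans = 0;

int Compare(StrongBox<int> x, StrongBox<int> y) => x.Value.CompareTo(y.Value);

void RecalculateMax()
{
    rescans++;
    max = itemsByKey.Values.MaxBy(item => item.Value);   // Enumerable.MaxBy requires .NET 6+
}

void OnAdd(string key, StrongBox<int> current)
{
    itemsByKey.Add(key, current);
    if (max is null || Compare(current, max) > 0)
        max = current;
}

// The refined Refresh branch: the item was mutated in place, so only its new state is known.
void OnRefresh(string key, StrongBox<int> current)
{
    itemsByKey[key] = current;
    switch (Compare(current, max!))
    {
        case 0:                                   // it is (or ties with) the max: re-aggregate
            RecalculateMax();
            break;
        case int comparison when comparison > 0:  // a non-max item overtook the max
            max = current;
            break;
        // a negative result means it is still below the max: nothing to do
    }
}

var a = new StrongBox<int>(10);
var b = new StrongBox<int>(4);
OnAdd("a", a);
OnAdd("b", b);

b.Value = 12; OnRefresh("b", b);   // b overtakes a: no rescan
a.Value = 20; OnRefresh("a", a);   // a overtakes b: no rescan
a.Value = 1;  OnRefresh("a", a);   // the max itself shrank: it compares equal to itself, so rescan
Console.WriteLine($"max={max!.Value}, rescans={rescans}");   // max=12, rescans=1
```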

Wouldn't this fix the issue of using this operator together with TransformWithInlineUpdate?

Which operator? The proof-of-concept version of .Maximum() that I wrote, and we've been talking about? This operator doesn't have any issue with .TransformWithInlineUpdate().

If you're instead talking about .ForAggregation() the fix is to add support for Refresh() to .ForAggregation(). And yeah, perhaps that could be implemented by having .ForAggregation() treat Refresh as a Remove followed by an Add. That sounds like it's equivalent to the optimal approach for Max and Min aggregations, but it might cause poor performance for other aggregations, compared to implementing some kind of native support for Refresh within the operator.

I'd defer to @RolandPheasant for considering any changes to .ForAggregation() because I don't fully understand the design decisions made there. Like I mentioned before, it rather looks like exactly what IChangeSet<> already provides, but not as detailed.

When dealing with Refresh or other changes, would checking whether change.Current/Previous reference the same object as our max lead to errors?

Not sure what you mean. Like, are you proposing calling ReferenceEquals() first, as an optimization, before using the comparer? I'd have to refresh myself on how that behaves for boxed value types, but at minimum, it would hurt performance for value types, by causing boxing in the first place. That could in theory provide a performance boost, for reference types, but only when the item in question is the current max, and in real-world scenarios, that seems like a really rare case. Rare enough that you'd never see a measurable difference at the high level.
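On the boxed-value-type question: `object.ReferenceEquals` takes two `object` parameters. So when `T` is a value type, each non-null argument is boxed into its own new object, and the call returns false even when both arguments are the same variable. A ReferenceEquals shortcut would therefore never fire for value types, while still paying for the boxing. A small check, using a hypothetical helper `SameReference<T>`:

```csharp
using System;

// Hypothetical helper: in generic code, ReferenceEquals boxes value-type arguments.
static bool SameReference<T>(T x, T y) => object.ReferenceEquals(x, y);

var obj = new object();
int n = 7;

Console.WriteLine(SameReference(obj, obj));  // True: a single reference-type instance
Console.WriteLine(SameReference(42, 42));    // False: each argument gets its own box
Console.WriteLine(SameReference(n, n));      // False: even the same variable is boxed twice
```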

I make extensive use of TransformWithInlineUpdate in my application precisely because of what you said. It would be super helpful to better grasp the idea and mechanisms behind it.

So, I once made a version of Battleship in WPF, using System.Reactive, back before I got involved with DynamicData. This is an excellent example of how ViewModel churn can kill an application.

public class GameBoardViewModelBase<TBoardPositionViewModel>
    : INotifyPropertyChanged
{
    public ImmutableArray<TBoardPositionViewModel> BoardPositions { get; }
}

<ItemsControl
        Grid.Column="1"
        Grid.Row="1"
        ItemsSource="{Binding BoardPositions}"
        ItemTemplate="{Binding BoardPositionTemplate, RelativeSource={RelativeSource AncestorType=UserControl}}">
</ItemsControl>

(abbreviated and adjusted, for brevity)

Here we have my implementation of the game board, in the form of a Grid containing 100 separate controls, one for each tile on the 10x10 board. Each tile has its own TBoardPositionViewModel which drives things like what ship segment is currently in that tile, what its orientation is, whether or not the tile has been shot at, and whether it was a hit or miss, etc.

Even if I were to write this project with DynamicData today, I wouldn't have GameBoardViewModelBase.BoardPositions be an ObservableCollection<> driven by DynamicData, it would still just be an ImmutableArray<> because the SIZE of the board doesn't change, only its contents, so we can create 100 VMs for the tiles immediately on startup, assign each one to a particular grid position, and then each VM itself can monitor for changes within that tile, and publish notifications about changes.

In an earlier revision of this project, I wasn't doing this; I was rebuilding every VM every time there was a game state change, and it absolutely TANKED performance. The reason is that each tile has half a dozen bindings to its ViewModel, and swapping out the VM itself means recompiling ALL those bindings for the ENTIRE board. Publishing one or two change notifications instead would only be seen by the one or two affected bindings and would only cause a re-evaluation of the bound value, rather than a full re-compilation.

Let's look at a simple example, in code...

public record MyItem
{
    public required int Id { get; init; }
    public required string Name { get; init; }
    public required string Description { get; init; }
}

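using var itemsSource = new SourceCache<MyItem, int>(static item => item.Id);

// Sort takes an IComparer; SortExpressionComparer comes from DynamicData.Binding.
// Bind() only forwards the stream, so Subscribe() is what produces the disposable subscription.
using var itemsBinding = itemsSource
    .Connect()
    .Filter(...)
    .Sort(SortExpressionComparer<MyItem>.Ascending(static item => item.Name))
    .Bind(out var items)
    .Subscribe();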

This would be the "bad" way to do it. It's not nearly as bad as my Battleship example, because using DynamicData here at least means that when one item is changed, we don't rebuild the bindings for EVERY item, just that one. And there are probably only two bindings on each item, for Name and Description, so it's probably not that big of a deal. But because MyItem is an immutable record, in order to change, say, the name of an item, we'd have to make a copy of the whole object, with only the .Name value being different, and Replace that into the source collection. That would trigger a full replace of the item in the View layer, which would re-compile the .Name and .Description bindings. We can do better.

public record MyItem
{
    public required int Id { get; init; }
    public required string Name { get; init; }
    public required string Description { get; init; }
}

public class MyItemViewModel
    : INotifyPropertyChanged
{
    // Setters that raise PropertyChanged are omitted for brevity.
    public event PropertyChangedEventHandler? PropertyChanged;

    public required int Id { get; init; }
    public required string Name { get; set; }
    public required string Description { get; set; }
}

using var itemsSource = new SourceCache<MyItem, int>(static item => item.Id);

using var itemsBinding = itemsSource
    .Connect()
    .Filter(...)
    .TransformWithInlineUpdate(
        transformFactory:   static item => new MyItemViewModel()
        {
            Id              = item.Id,
            Name            = item.Name,
            Description     = item.Description
        },
        // The existing ViewModel comes first, the new upstream item second.
        updateAction:       static (viewModel, item) =>
        {
            viewModel.Name          = item.Name;
            viewModel.Description   = item.Description;
        })
    .Sort(SortExpressionComparer<MyItemViewModel>.Ascending(static viewModel => viewModel.Name))
    .Bind(out var items)
    .Subscribe();

With .TransformWithInlineUpdate(), a MyItemViewModel is only created upon the initial Add for each key. Update and Refresh changes are both handled by invoking updateAction, allowing us to push the new state of the item into the ViewModel and let the ViewModel publish targeted notifications about exactly what changed. For those updates, the operator also publishes only Refresh changes downstream, as the items themselves don't change. This still allows a downstream .Sort(), since the Refresh changes cause the mutated items to be re-sorted, and it plays well with .Bind(), which ignores Refresh changes.

Thanks! I'm putting everything together to write my custom operator.

Feel free to PR it, when you've got it working.


glrusso (Author) · 2024-04-26T12:11:13Z

At least, I think so. Ultimately, back all this up with tests.

Thanks, will do.

Which operator? The proof-of-concept version of .Maximum() that I wrote, and we've been talking about? This operator doesn't have any issue with .TransformWithInlineUpdate().

Yeah, sorry, let me clarify. I was referring to the PoC of Maximum you wrote, and the issue with TransformWithInlineUpdate I had in mind was what I wrote previously: "TransformWithInlineUpdate seems to convert changes whose reason is Update into Refresh changes; this would cause RecalculateMax to be called for every update."

If you're instead talking about .ForAggregation() the fix is to add support for Refresh() to .ForAggregation(). And yeah, perhaps that could be implemented by having .ForAggregation() treat Refresh as a Remove followed by an Add. That sounds like it's equivalent to the optimal approach for Max and Min aggregations, but it might cause poor performance for other aggregations, compared to implementing some kind of native support for Refresh within the operator.
I'd defer to @RolandPheasant for considering any changes to .ForAggregation() because I don't fully understand the design decisions made there. Like I mentioned before, it rather looks like exactly what IChangeSet<> already provides, but not as detailed.

Interesting matter, I'd also like to know more about .ForAggregation() and when and why to use it.

Not sure what you mean. Like, are you proposing calling ReferenceEquals() first, as an optimization, before using the comparer? I'd have to refresh myself on how that behaves for boxed value types, but at minimum, it would hurt performance for value types, by causing boxing in the first place. That could in theory provide a performance boost, for reference types, but only when the item in question is the current max, and in real-world scenarios, that seems like a really rare case. Rare enough that you'd never see a measurable difference at the high level.

OK, I agree with what you said. I don't intend to use the operator with value types, but yes, the additional complexity may not be worth the price.

So, I once made a version of Battleship in WPF, using System.Reactive, back before I got involved with DynamicData. This is an excellent example of how ViewModel churn can kill an application ...

Many thanks for the explanation, it was super clear and helpful.

Feel free to PR it, when you've got it working.

Got it! For now, I've avoided writing my custom operator and used the library's .Maximum() with structs that wrap my objects. That works, but I'd love to get rid of the wrappers and have less code to maintain.

Thank you for all the helpful information. I believe I now have everything I need to implement the operator.

