Fast and correct #22

rinde · 2024-04-18T09:02:56Z

First of all, thank you for reading this. I know this PR is very big and contains a lot of breaking changes. I will try my best to explain the major changes in this description. The changes in this PR are the result of an iterative development process over the past 2 months. At my company we have been linking to this PR (and it's successor) in an actively developed codebase for the past month and that usage has informed a lot of the changes. If something is still unclear, please reach out.

Fast and correct

As we discussed before, the current API has issues with speed and soundness. The design put forward in this PR fixes both.

In the new design, the non-empty collection essentially encapsulates the corresponding possibly empty collection, for example:

pub struct NEVec<T> {
    inner: Vec<T>,
}

The implementation of NEVec forwards all calls to the underlying Vec while guaranteeing non-emptiness. This simplified inner representation also greatly simplifies a lot of complexity that previously existed. Take for example Vec::insert():

pub fn insert(&mut self, index: usize, element: T) {
        let len = self.len().get();
        assert!(index <= len);

        if index == 0 {
            let head = std::mem::replace(&mut self.head, element);
            self.tail.insert(0, head);
        } else {
            self.tail.insert(index - 1, element);
        }
 }

This is now:

pub fn insert(&mut self, index: usize, element: T) {
        self.inner.insert(index, element);
}

Advantages of this approach are:

a lower chance of introducing bugs
a lower chance of introducing inefficiencies
lower maintenance burden (less code)

Non-emptiness is enforced by ensuring that an NEVec can never be constructed without any elements (quite similar to before), and by ensuring that the inner data structure can not be accessed (i.e. it is properly encapsulated).

For NEVec I added a benchmark in benches/vec.rs that tests that performance of NEVec is equal to Vec. Because of this new simplified design I belief that the other datastructures are also on par with their possibly empty counterpart. Additionally, the codebase at work gets regularly benchmarked and there we found no performance impact with this new version (we mostly use NEVec and NEIndexMap), this is contrary to the current version which slowed our code down noticeably.

A sound `NonEmptyIterator`

As discussed in #16 there are some problems with the current implementation, most notably:

Calling the methods next() and first() in the incorrect order could lead to data loss
Calling any of NonEmptyIterators method after calling next() at least once, leads to unexpected results because the first value is always included even if the iterator has already passed it.

Additionally I noticed that to implement the NonEmptyIterator a lot of repetitive boiler plate code is required.

In this PR, the NonEmptyIterator looks as follows:

pub trait NonEmptyIterator: IntoIterator {
    fn next(self) -> (Self::Item, Self::IntoIter)
    where
        Self: Sized,
    {
        let mut iter = self.into_iter();
        (iter.next().unwrap(), iter)
    }

    // all other methods
}

Note that NonEmptyIterator is bound by IntoIterator and that all its methods are already implemented, this means that it is a marker trait and therefore trivial to implement. This allows for very simple implementations of non-empty iterators, take for example Iter of Vec:

pub struct Iter<'a, T: 'a> {
    iter: std::slice::Iter<'a, T>,
}

impl<T> NonEmptyIterator for Iter<'_, T> {}

impl<'a, T> IntoIterator for Iter<'a, T> {
    type Item = &'a T;

    type IntoIter = std::slice::Iter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter
    }
}

This is the entire implementation.

All NonEmptyIterators are safe to use when they are created by a non-empty collection because in that case it is guaranteed that the underlying iterator (std::slice::Iter<'a, T> in the above example) is non-empty. In case an implementor would make a mistake and create a NonEmptyIterator that is in fact empty, a panic would occur when calling next(). This means that such mistakes are easy to detect and test for.

Other changes

New features

In our codebase we converted some datastructures to their non-empty equivalent. During this refactoring we found missing functionality in this library. To fill the gaps, I created two new features:

itertools: adds the NonEmptyItertools trait similar to the Itertools trait for Iterator. So far it only contains a few methods that we needed. It is easy to extend this implementation to cover more of itertools in the future.
either: this feature adds the NEEither enum, which is a non-empty variant of the Either enum.

Linting, formatting, and `justfile`

Cargo.toml now contains configuration for both Rust lints and Clippy lints. I've updated the code where needed to satisfy the lints. In order to ensure that lints don't get violated accidentally, they are automatically checked as part of CI.

I've pinned the Rust toolchain to the most recent version (at the time of writing). This is to ensure that the build is stable. If we don't do this, it could be that a new version of Rust is released with new lints and that will break the build which will prohibit any updates/releases until it is fixed. Pinning the toolchain allows for choosing the time when to update to the new Rust version, which avoids breaking the build.

I added a justfile which can be run by the just runner. I've defined several useful commands for the project such as test, fmt, and lint. I've also updated CI to use these definitions.

The fmt command formats all Rust code. There is a rustfmt.toml file that configures the formatter. To run the code formatter a version of Rust nightly is required, but this introduces no dependency on nightly for users of the library. I've added just install-nightly for easy installing the correct version. Part of the lint command is also a formatting check, this ensures that all code in the repository follows consistent formatting. You will see some minor changes in the diff of this PR that fixes some formatting inconsistencies.

Breaking changes

I updated the changelog with every noteworthy change.

Follow-up

I've continued my work in a follow-up PR.

fosskers · 2024-06-10T05:43:18Z

Hi, thank you for this. I'm back from my trip. I will be addressing this after finishing up some Aura work and releasing Beta 3 of that. Thanks again for your patience.

rinde · 2024-06-10T08:45:40Z

Thanks for the transparency!

fosskers · 2024-08-09T23:39:15Z

Hi, I've completed the major project I had been working on, and it's time to return to this. In the interim you would have noticed that I've patched the library a few times for work purposes, and there are some conflicts now. We'll need to resolve those.

Conflicts aside, I will begin to review this in earnest. To confirm: this is the PR I should be looking at? What's the status of the other two PRs?

rinde · 2024-09-06T10:07:51Z

Hi @fosskers , yes this is the PR to review first. There is already a follow-up PR that we use internally but I think it makes sense to address that one separately once we reached consensus on this one.

I'll have a look at the changes that happened on master and try to port them over to this PR.

rinde · 2024-09-06T14:42:23Z

I just updated this PR to be in sync with master again. I removed the group_by function temporarily because I ran out of time to fix it. I think what needs to happen is to re-add it but in the new itertools file now. I think that it should be possible to wrap the itertools implementation to avoid having to essentially re-implement the group_by function from itertools. I hope this gets clear once you have reviewed the rest of this PR. I may have some time to work on the group_by sometime in the coming weeks, but it might also be instructive for you to re-implement it in the new structure that I'm proposing in this PR.

I'm back from parental leave now, and I hope we can use the coming weeks to push this over the finish line to avoid doing double work again. I'm happy to answer any questions you may have.

fosskers · 2024-09-08T07:20:26Z

Thanks a lot, I will be going over this fully soon.

rinde · 2025-01-22T17:11:30Z

Hi @fosskers , are you planning to follow-up on this? It's totally fine if you don't have the interest or time, I just would like to know because in that case I will probably publish this (and several follow-ups) in a separate repository.

fosskers · 2025-01-22T20:39:23Z

I do, I'm sorry to not have followed up yet.

rinde added 30 commits April 4, 2024 15:38

benchmark NEVec

473123d

add benchmark for map

0788234

add alternative implementation of NEVec and NonEmptyIterator

7551232

update bench with nevec2

a3f2448

refactor NonEmptyIterator to ensure correctness

5235078

refactor NEIndexMap for speed and update iter

6e51b82

update arrays iter

9f1b0d7

refactor NEVec for speed and update iter

cc1dacf

update NEVec to new API

6db8a01

refactor NEIndexMap for speed and update iter

abcb8fe

update iter

f78166f

improve NEVec

f633926

refactor NESet for speed and update iter

153c696

refactor NEMap for speed and update iter

23dbbed

update iter for arrays

0d60a66

update examples in lib

8abf033

refactor NESlice for speed and update iter

1b4d24b

add rust lints and fix code

0232b73

add justfile with fmt, lint, and test targets

89a7dab

apply consistent formatting

b36ef13

add clippy lints and fix all violations

b1922a9

build docs with all features

43c3a47

Merge branch 'benchmark' into fast-and-correct

5731cda

remove NEVec2

ccb2eed

Merge branch 'master' into fast-and-correct

04038d9

improve docs and changelog

6d56eb4

fix testcase

00a0b67

add info about bugfix to changelog

93534b3

improve justfile

1d1c8fc

add extension to either

7a3bbb9

rinde added 13 commits May 17, 2024 15:28

document justfile

6c171bb

tweaks

7ee41f4

update changelog

5bd0d20

add readme commands to justfile

6a6e28b

add readme check to CI

217caff

fix lint

9e8b755

fix ci

092667e

split CI in multiple jobs

d5327df

rename CI

85c9cca

update readme

677aeaa

improve lib doc

0c289cc

update readme

48db401

allow redundant explicit links fir cargo rdme

7eff28a

rinde marked this pull request as ready for review May 21, 2024 14:40

fosskers mentioned this pull request Jun 10, 2024

Adding iterators? fosskers/cl-transducers#8

Open

rinde added 7 commits September 6, 2024 14:17

Merge branch 'master' into fast-and-correct

06e4d4d

fix impl<K, V, S> FromNonEmptyIterator<(K, V)> for HashMap<K, V, S>

6dba154

remove group_by (for now)

95aa76a

cleanup

2bae0ae

adjust map

47e2359

fix set

daa9fce

adjust sorting impls in NEVec

536a13d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast and correct #22

Fast and correct #22

rinde commented Apr 18, 2024 •

edited

Loading

fosskers commented Jun 10, 2024

rinde commented Jun 10, 2024

fosskers commented Aug 9, 2024 •

edited

Loading

rinde commented Sep 6, 2024

rinde commented Sep 6, 2024

fosskers commented Sep 8, 2024

rinde commented Jan 22, 2025

fosskers commented Jan 22, 2025

Fast and correct #22

Are you sure you want to change the base?

Fast and correct #22

Conversation

rinde commented Apr 18, 2024 • edited Loading

Fast and correct

A sound NonEmptyIterator

Other changes

New features

Linting, formatting, and justfile

Breaking changes

Follow-up

fosskers commented Jun 10, 2024

rinde commented Jun 10, 2024

fosskers commented Aug 9, 2024 • edited Loading

rinde commented Sep 6, 2024

rinde commented Sep 6, 2024

fosskers commented Sep 8, 2024

rinde commented Jan 22, 2025

fosskers commented Jan 22, 2025

rinde commented Apr 18, 2024 •

edited

Loading

A sound `NonEmptyIterator`

Linting, formatting, and `justfile`

fosskers commented Aug 9, 2024 •

edited

Loading