Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2728R7 #1

Merged
merged 1 commit into from
Oct 7, 2024
Merged

P2728R7 #1

merged 1 commit into from
Oct 7, 2024

Conversation

ednolan
Copy link
Member

@ednolan ednolan commented Sep 22, 2024

No description provided.

@ednolan ednolan force-pushed the p2728r7 branch 5 times, most recently from 70f178b to b279f17 Compare September 27, 2024 19:00
#define UTFVIEW_CODE_UNIT_VIEW_HPP

#include <UtfView/detail/concepts.hpp>
#include <boost/stl_interfaces/iterator_interface.hpp>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change needed here, but just a heads-up. I've pulled iterator_interface from consideration for C++26, because I was to see what it looks like to do it using the new reflection language feature we'll get in 26. Sine this paper is also probably targeting 29, this is maybe a non-issue; I fully intend to resurrect the iterator_interface paper for 29.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll see how the scheduling goes, but I ideally was hoping to target 26 because P2871 and P2873 are targeting 26 and I wanted this to replace the removed std::codecvt facets. That's not a showstopper, though; if it turns out I have to wait for 29, I'll leave iterator_interface in, and if I do manage to target 26, I can take out iterator_interface at the expense of verbosity.

class as_char8_t_view : public std::ranges::view_interface<as_char8_t_view<V>> {
V base_ = V(); // @*exposition only*@

template <bool Const>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these exposition-only? Most (all?) C++ views have nested iterator and sentinel templates.

Copy link
Member Author

@ednolan ednolan Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have nested exposition-only iterator and sentinel templates, as far as I can tell: https://en.cppreference.com/w/cpp/ranges/elements_view/iterator

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! So they are. I think I just misremembered.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the correct way to get the types is iterator_t and sentinel_t which are just decltype(ranges::begin(r)) and similarly for end. So the types don't need to have publicly visible names as nested types of the view.

Copy link
Collaborator

@tzlaine tzlaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, except that I don't have that much confidence in the tests. It looks like you don't cover:

  • back-and-forth iteration using the transcoding iterator in bidirectional mode;
  • the exact place that replacement characters are produced by the transcoding iterators; and
  • the combinatoric coverage of value category+as_utfN.

For the last one that would include foo | as_utfN, where foo is a borrowed/unborrowed range, and its value category is rvalue/lvalue+const/non-const.

I looked pretty closely at the implementation, and a lot less closely at the tests. I may have missed these things.

On a design level, I think this version of things is simpler than the previous one (since it drops the unpacking logic). That's a marked improvement, IMO. We do lose compatibility in some rare cases by not having the CPO, but . I don't think we'll miss it.

The only other things to note are the lack of direct support for working with NTBSs, and direct support for working with std::string{,_view} without going to char8_t first. The former is fine, given how easy null_term is to use and recommend. The latter... could I see an example of exactly how bad it will be? Did I maybe miss that somewhere?

friend constexpr bool operator==(I const& it, null_sentinel_t) {
return *it == std::iter_value_t<I>{};
}
template <std::forward_iterator I>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both? If you get rid of the forward overload, and get rid of the not-forward constraint on the input overload, the only thing you lose is passing an iterator by reference instead of by value. Are the two here just for that small optimization?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a big thing; I'm essentially just wondering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the only reason why. The paper has the following language about it:

The null_sentinel_t's operator== has a separate overload for input iterators that takes the iterator by reference instead of by value. We want to take input iterators by reference because they are not required to be copyable. However, for forward iterators, we want to take by value because otherwise we incur a double indirection (e.g. int* const& it) that compilers may not optimize.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks.

class as_char8_t_view : public std::ranges::view_interface<as_char8_t_view<V>> {
V base_ = V(); // @*exposition only*@

template <bool Const>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! So they are. I think I just misremembered.


} // namespace detail

inline constexpr detail::null_term_impl null_term;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta say, I like this a lot more than I thought I would. :)

@ednolan
Copy link
Member Author

ednolan commented Sep 27, 2024

This looks really good, except that I don't have that much confidence in the tests. It looks like you don't cover:

  • back-and-forth iteration using the transcoding iterator in bidirectional mode;
  • the exact place that replacement characters are produced by the transcoding iterators; and

Those two are both tested by the function run_test_case_impl in to_utf_view.t.cpp. There's a set of test cases that maps a range of input code units to a range of pairs of output code units and error codes (lines 263-451). Each test case is run backwards and forwards to ensure that the result matches expectations in both cases, and there are many tests that produce replacement characters.

The repo implements automated test coverage, which is at 100%-- this is misleading because if a template isn't instantiated it doesn't count towards the total, but to_utf_view.hpp still has excellent coverage: https://coveralls.io/builds/70047022/source?filename=include%2FUtfView%2Fto_utf_view.hpp

  • the combinatoric coverage of value category+as_utfN.

For the last one that would include foo | as_utfN, where foo is a borrowed/unborrowed range, and its value category is rvalue/lvalue+const/non-const.

I've added this to my TODO list.

The only other things to note are the lack of direct support for working with NTBSs, and direct support for working with std::string{,_view} without going to char8_t first. The former is fine, given how easy null_term is to use and recommend. The latter... could I see an example of exactly how bad it will be? Did I maybe miss that somewhere?

There is direct support for working with NTBSs if they're in the form of non-decayed string literals; that's implemented by the else if constexpr (std::is_bounded_array_v<T>) { of to_utf_impl (to_utf_view.hpp:1129). I had to take out support for pointers because RangeAdaptorClosureObject constrains its parameters to be ranges. "std::string{,_view} without going to char8_t first" is still supported. I even added an expanded/updated rationale for continuing to support char and wchar_t in the paper: https://www.ednolan.com/D2728R7.html#discussion-of-whether-transcoding-views-should-accept-ranges-of-char-and-wchar_t.

@tzlaine
Copy link
Collaborator

tzlaine commented Sep 28, 2024

This looks really good, except that I don't have that much confidence in the tests. It looks like you don't cover:

  • back-and-forth iteration using the transcoding iterator in bidirectional mode;
  • the exact place that replacement characters are produced by the transcoding iterators; and

Those two are both tested by the function run_test_case_impl in to_utf_view.t.cpp. There's a set of test cases that maps a range of input code units to a range of pairs of output code units and error codes (lines 263-451). Each test case is run backwards and forwards to ensure that the result matches expectations in both cases, and there are many tests that produce replacement characters.

Right. You still may be missing some coverage though if you only go all the way through the seqeunce each time (forward or backward). I had tests that, for N code points would do something like:

for (i: (0, N)) {
  for (j: (0, i]) forward_one_step()
  for (j: (0, i]) backward_one_step()
}

The result was checked at each step. In the initial implemenation, I did find a problem or two doing this.

The only other things to note are the lack of direct support for working with NTBSs, and direct support for working with std::string{,_view} without going to char8_t first. The former is fine, given how easy null_term is to use and recommend. The latter... could I see an example of exactly how bad it will be? Did I maybe miss that somewhere?

There is direct support for working with NTBSs if they're in the form of non-decayed string literals; that's implemented by the else if constexpr (std::is_bounded_array_v<T>) { of to_utf_impl (to_utf_view.hpp:1129). I had to take out support for pointers because RangeAdaptorClosureObject constrains its parameters to be ranges. "std::string{,_view} without going to char8_t first" is still supported. I even added an expanded/updated rationale for continuing to support char and wchar_t in the paper: https://www.ednolan.com/D2728R7.html#discussion-of-whether-transcoding-views-should-accept-ranges-of-char-and-wchar_t.

It's fine to me to remove direct support for NTSBs, but references to arrays are not NTSBs.

@ednolan ednolan force-pushed the p2728r7 branch 2 times, most recently from 8b8b837 to dae4300 Compare September 28, 2024 18:59
@ednolan ednolan force-pushed the p2728r7 branch 11 times, most recently from 9c0c210 to d68becd Compare October 7, 2024 05:35
@ednolan ednolan merged commit 64f7f6c into main Oct 7, 2024
4 checks passed
@ednolan ednolan deleted the p2728r7 branch October 8, 2024 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants