utfview conditional borrowing
ednolan committed Sep 21, 2024
1 parent dfc3a01 commit 43693b8
Showing 3 changed files with 71 additions and 22 deletions.
25 changes: 25 additions & 0 deletions include/UtfView/to_utf_view.hpp
@@ -1162,4 +1162,29 @@ inline constexpr detail::to_utf_impl<char32_t> to_utf32;

/* PAPER: } */

template <class V>
inline constexpr bool std::ranges::enable_borrowed_range<utfview::to_utf8_view<V>> =
std::ranges::enable_borrowed_range<V>;

template <class V>
inline constexpr bool std::ranges::enable_borrowed_range<utfview::to_utf16_view<V>> =
std::ranges::enable_borrowed_range<V>;

template <class V>
inline constexpr bool std::ranges::enable_borrowed_range<utfview::to_utf32_view<V>> =
std::ranges::enable_borrowed_range<V>;

/* PAPER: namespace std::ranges { */
/* PAPER: */
/* PAPER: template <class V> */
/* PAPER: inline constexpr bool enable_borrowed_range<std::uc::to_utf8_view<V>> = enable_borrowed_range<V>; */
/* PAPER: */
/* PAPER: template <class V> */
/* PAPER: inline constexpr bool enable_borrowed_range<std::uc::to_utf16_view<V>> = enable_borrowed_range<V>; */
/* PAPER: */
/* PAPER: template <class V> */
/* PAPER: inline constexpr bool enable_borrowed_range<std::uc::to_utf32_view<V>> = enable_borrowed_range<V>; */
/* PAPER: */
/* PAPER: } */

#endif // UTFVIEW_TO_UTF_VIEW_HPP
64 changes: 42 additions & 22 deletions paper/P2728.md
@@ -278,25 +278,8 @@ In short, rejecting `char` and `wchar_t` forces you to write "`| as_char8_t`"
everywhere you want to use a `std::string` with the interfaces proposed in
this paper.

SG-16 has previously expressed support for rejecting `char` and `wchar_t`.
[Here](https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2023.md#april-12th-2023)
is the relevant SG-16 poll:

*UTF transcoding interfaces provided by the C++ standard library should
operate on charN_t types, with support for other types provided by adapters,
possibly with a special case for char and wchar_t when their associated
literal encodings are UTF.*

+----+---+---+---+----+
| SF | F | N | A | SA |
+====+===+===+===+====+
| 6  | 1 | 0 | 0 | 1  |
+----+---+---+---+----+

(This paper ignores the "possibly with a special case for char and
wchar_t when their associated literal encodings are UTF" part. Making the
evaluation of a concept change based on the literal encoding seems like a
flaky move; the literal encoding can change TU to TU.)
SG-16 has previously expressed strong support for rejecting `char` and
`wchar_t`, as can be observed in the polling history section.

The feeling in SG-16 was that the `charN_t` types are designed to represent
UTF encodings, and `char` is not. A `char const *` string could be in any one
@@ -5241,6 +5224,19 @@ namespace std::uc {
inline constexpr @*unspecified*@ to_utf32;
}
namespace std::ranges {
template <class V>
inline constexpr bool enable_borrowed_range<std::uc::to_utf8_view<V>> = enable_borrowed_range<V>;
template <class V>
inline constexpr bool enable_borrowed_range<std::uc::to_utf16_view<V>> = enable_borrowed_range<V>;
template <class V>
inline constexpr bool enable_borrowed_range<std::uc::to_utf32_view<V>> = enable_borrowed_range<V>;
}
```

The exposition-only concept `@*to-utf-view-iterator-optimizable*@` is true if
@@ -5255,7 +5251,7 @@ down.
The iterator type of `@*to-utf-view-impl*@` is
`@*utf-iterator*@`. `@*utf-iterator*@` is an iterator that transcodes from
UTF-N to UTF-M, where N and M are each one of 8, 16, or 32. N may equal
M.

`@*utf-iterator*@` uses a mapping between character types and UTF encodings,
which is that `char` and `char8_t` correspond to UTF-8, `char16_t`
@@ -5393,10 +5389,10 @@ expression-equivalent to:

- Otherwise, if `T` is an array type of known bound, then:

- If the array extent is nonzero and the last element of the array is zero,
then
`V(std::ranges::subrange(std::ranges::begin(E), --std::ranges::end(E)))`
- Otherwise,
`V(std::ranges::subrange(std::ranges::begin(E), std::ranges::end(E)))`

- Otherwise, `V(std::views::all(E))`
@@ -5915,6 +5911,30 @@ exception of `empty_view<T>{} | to_utfN`, the following are always true:

Add the feature test macro `__cpp_lib_unicode_transcoding`.

## Relevant Polls/Minutes

TODO

[Here](https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2023.md#april-12th-2023)
is the relevant SG-16 poll:

*UTF transcoding interfaces provided by the C++ standard library should
operate on charN_t types, with support for other types provided by adapters,
possibly with a special case for char and wchar_t when their associated
literal encodings are UTF.*

+----+---+---+---+----+
| SF | F | N | A | SA |
+====+===+===+===+====+
| 6  | 1 | 0 | 0 | 1  |
+----+---+---+---+----+

(This paper ignores the "possibly with a special case for char and
wchar_t when their associated literal encodings are UTF" part. Making the
evaluation of a concept change based on the literal encoding seems like a
flaky move; the literal encoding can change TU to TU.)


## Design notes

None of the proposed interfaces is subject to change in future versions of
4 changes: 4 additions & 0 deletions src/UtfView/tests/to_utf_view.t.cpp
@@ -1140,6 +1140,10 @@ constexpr bool empty_test() {
return true;
}

static_assert(std::ranges::borrowed_range<to_utf8_view<std::string_view>>);
static_assert(std::ranges::borrowed_range<to_utf16_view<std::string_view>>);
static_assert(std::ranges::borrowed_range<to_utf32_view<std::string_view>>);

CONSTEXPR_UNLESS_MSVC bool utf_view_test() {
if (!input_iterator_test(std::initializer_list<char8_t>{u8'x'})) {
return false;