C++ Proposal for encoded strings
- Document number: PxxxxRy
- Maks Mazurov <[email protected]>
- Target audience: LEWG, LWG
The goal is to add compile-time information about the encoding in use to the string classes, along with conversion functions.
- Provide a more flexible replacement for the deprecated `std::wstring_convert`.
- You already have `std::string` and/or `std::string_view` all around your code. Why copy to a third-party encoding-aware string when you can operate on the standard string?
- ...
Note: Declarations are written as if they were inside the `std` namespace (`std::` is omitted). `...` stands for all other template arguments.
- Add an `Encoding` template argument to `std::basic_string` and `std::basic_string_view`.
template<
class CharT,
class Traits = char_traits<CharT>,
class Allocator = allocator<CharT>,
class Encoding = string::default_encoding
> class basic_string;
template<
class CharT,
class Traits = char_traits<CharT>,
class Encoding = string_view::default_encoding
> class basic_string_view;
- Add a template member function `to_encoding` to `std::basic_string` and `std::basic_string_view`.
template<class TargetEncoding>
basic_string<..., TargetEncoding> to_encoding() const;
Note: `to_encoding` may return a `std::basic_string` with a different `CharT`. This is required to support UTF-16 and similar encodings.
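The idea of a `to_encoding` member that may change `CharT` can be sketched outside the standard library. The block below is only an illustration under stated assumptions: `encoded_string`, `utf8_tag`, `utf32_tag` and the free function `convert` are hypothetical names that are not part of this proposal, and the UTF-8 decoder is deliberately minimal (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical encoding tags; each one names the code unit type it uses.
struct utf8_tag  { using char_type = char; };
struct utf32_tag { using char_type = char32_t; };

// Minimal UTF-8 -> UTF-32 decoder, selected by the pair of tags.
// Assumption: structural validation only (no overlong/surrogate checks).
inline std::u32string convert(const std::string& in, utf8_tag, utf32_tag) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        // Sequence length is determined by the high bits of the lead byte.
        std::size_t len = lead < 0x80          ? 1
                        : (lead >> 5) == 0x06  ? 2
                        : (lead >> 4) == 0x0E  ? 3
                        : (lead >> 3) == 0x1E  ? 4 : 0;
        if (len == 0 || i + len > in.size())
            throw std::invalid_argument("malformed UTF-8");
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("malformed UTF-8");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical stand-in for basic_string<..., Encoding>.
template<class Encoding>
class encoded_string {
public:
    using char_type    = typename Encoding::char_type;
    using storage_type = std::basic_string<char_type>;

    encoded_string() = default;
    explicit encoded_string(storage_type s) : data_(std::move(s)) {}

    const storage_type& str() const { return data_; }

    // The result may use a different code unit type than *this.
    template<class TargetEncoding>
    encoded_string<TargetEncoding> to_encoding() const {
        return encoded_string<TargetEncoding>(
            convert(data_, Encoding{}, TargetEncoding{}));
    }

private:
    storage_type data_;
};
```

With these definitions, `encoded_string<utf8_tag>` stores `char` and `to_encoding<utf32_tag>()` yields a string of `char32_t`, which is the kind of `CharT` change the note above describes.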
- `std::ascii` 7-bit ASCII encoding.
- `std::native` System native encoding.
- `std::wide` System native encoding for wide characters.
- `std::utf8` UTF-8 (RFC 3629) encoding tag.
- `std::utf16` UTF-16 (RFC 2781) encoding tag.
- `std::utf32` UTF-32 encoding tag.
- `std::string::default_encoding` Implementation-defined encoding; can be any of those specified above or another, unrelated encoding. Not required to be the same between program runs.
- `std::string_view::default_encoding` Must be the same as `std::string::default_encoding`.
Implementations can provide additional encodings.
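One way to picture the tags listed above is as empty types that only distinguish string instantiations at compile time. The following is a rough sketch, not the proposed wording: the namespace `sketch`, the `tagged_string` struct, the `concat` helper, and the choice of `utf8` as the default encoding are all assumptions made for illustration.

```cpp
#include <string>
#include <type_traits>

namespace sketch {

// Empty tag types; they carry no data and exist only to select an encoding.
struct ascii {};
struct utf8 {};
struct utf16 {};
struct utf32 {};

// Assumption: this sketch picks utf8 as the implementation-defined default.
using default_encoding = utf8;

// Hypothetical stand-in for a string that carries an Encoding argument.
template<class CharT, class Encoding = default_encoding>
struct tagged_string {
    using encoding = Encoding;
    std::basic_string<CharT> value;
};

// Concatenation only accepts operands with the same encoding tag, so mixing
// encodings is rejected at compile time (template deduction fails).
template<class CharT, class Encoding>
tagged_string<CharT, Encoding> concat(const tagged_string<CharT, Encoding>& a,
                                      const tagged_string<CharT, Encoding>& b) {
    return {a.value + b.value};
}

// Strings that differ only in their tag are distinct types.
static_assert(!std::is_same<tagged_string<char, utf8>,
                            tagged_string<char, ascii>>::value,
              "different tags give different types");

}  // namespace sketch
```

Because the tag is part of the type, passing a `tagged_string<char, ascii>` and a `tagged_string<char, utf8>` to `concat` does not compile, while two strings with the same tag concatenate as usual.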
An implementation is allowed to store information about an encoding in static fields of its tag.
Example:
Suppose we have an encoding conversion library that identifies encodings by string names.
An encoding tag is then allowed to carry a static field (preferably constexpr) holding that name, here called library_encoding_name.
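struct win1251 {
    // Name under which the conversion library knows this encoding.
    static constexpr const char* library_encoding_name = "cp1251";

    template<class TargetEncoding>
    static basic_string<..., TargetEncoding> to_encoding(const char* sptr, size_t length) {
        // library_convert stands for the library's conversion routine; its exact
        // signature is not specified here. The input buffer is passed along and
        // the converted string is returned.
        return library_convert(TargetEncoding::library_encoding_name,
                               win1251::library_encoding_name, sptr, length);
    }
};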
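The name-based dispatch in the example above can be tried with a small self-contained sketch. Everything here is hypothetical: `library_convert` is a toy stand-in for a third-party library (it implements only Latin-1 to UTF-8 so that the block stays short), and `latin1_named`, `utf8_named` and `convert_via_library` are illustrative names, not part of the proposal.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Toy stand-in for a library that selects encodings by string name.
// Assumption: only "latin1" -> "utf-8" is supported in this sketch.
inline std::string library_convert(const char* to, const char* from,
                                   const char* data, std::size_t length) {
    if (std::strcmp(from, "latin1") != 0 || std::strcmp(to, "utf-8") != 0)
        throw std::invalid_argument("unsupported conversion");
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Latin-1 code points 0x80..0xFF map to two-byte UTF-8 sequences.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Encoding tags that carry the library's name for the encoding.
struct latin1_named {
    static constexpr const char* library_encoding_name = "latin1";
};
struct utf8_named {
    static constexpr const char* library_encoding_name = "utf-8";
};

// The tags' static fields are the only link between the type system and
// the name-based library interface.
template<class Source, class Target>
std::string convert_via_library(const std::string& in) {
    return library_convert(Target::library_encoding_name,
                           Source::library_encoding_name,
                           in.data(), in.size());
}
```

A call such as `convert_via_library<latin1_named, utf8_named>(text)` resolves both encoding names at compile time from the tags, so the caller never spells the library's string identifiers directly.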