Lightweight UTF-8-based string library with some modern improvements
The class utf::string describes dynamic, contiguous storage for a sequence of UTF-8-encoded characters.
- Dynamic length;
- Method chaining;
- Fixed "unsigned size_type problem": utf::string::size_type is ptrdiff_t, unlike the STL's size_t;
- Non-owning inner type for viewing and iteration: utf::string::view (also utf::string_view)...
- ...and rights to view and change are completely divided between strings and views by design!
- Download the library source;
- #include the utf8string.hpp file in your C++ project;
- Enjoy!
⚠️ Note that the library requires C++17 support
See more examples in the sample.cpp source file.
- Creating the string:
// Constructing via const char*
utf::string MyString1 { "Amazing chest ahead" };
// Using std::initializer_list with integral code points
auto MyString2 = utf::string::from_unicode({ 'L',0xf6,'w','e', 'L',0xe9,'o','p','a','r','d' });
// From a vector of bytes (already encoded in UTF-8)
auto MyString3 = utf::string::from_bytes({ 'B','y','t','e','s' });
// As multiple copies of the character
utf::string MyString4 { 0xA2, 10 }; // == "¢¢¢¢¢¢¢¢¢¢"
// From an std::string
auto MyString5 = utf::string::from_std_string("Evil is evil");
- Iterating over the characters:
utf::string Line { "Il buono, il brutto, il cattivo" };
// Using C++20 init-for
for (auto view = Line.chars(); auto ch : view)
{
std::cout << ch << std::endl; // prints chars' code points
}
- Chaining:
utf::string Line { "Mr Dursley was the director of a firm called Grunnings" };
// Remove all spaces
Line.clone().remove(' ');
/* or */
Line.clone().remove_if(utf::is_space /* handles over 20 different Unicode spaces */ );
// Cut the last word off
std::cout <<
Line.first(Line.chars().reverse().find_if(utf::is_space).as_forward_index()).to_string();
// first(...) needs no clone because it only operates on the view;
// to_string() is where the copy is actually made
- Multi-pattern operations:
utf::string Line { "Stumbling everywhere" };
// Searches all occurrences of every pattern in the parameter pack
auto all_matches = Line.chars().matches("every", "everywhere", "around");
// The deduced type of all_matches for the substring-matching version is std::vector<view>
for (auto& vi : all_matches)
{
std::cout << vi << std::endl; // prints "every", "everywhere"
}
// Removes the longest found substrings
std::cout << Line.remove("every", "everywhere"); // prints "Stumbling ", not "Stumbling where"
- Access to the:
  - Single character:
    - front(), back(): constant / O(1)
    - N-th (get(N)): linear / O(N)
    - Back character with removal (pop()): constant / O(1)
  - Substring's view (by chars(...), first(...), last(...)): linear / O(N)
  - Entire string's view (chars()): constant / O(1)
- Insertion: linear / O(N); requires extra memory reallocation
- Search (find*(...), contains*(...), count*(...)) / erasure (erase(...), remove*(...)): linear / O(N)
- Length calculation: linear / O(N), as it requires iteration over every character in the string
Note that a replacement (replace*(...)) is more complicated. It behaves like an insertion if the new substring is longer (by its size()) than the substring being replaced. Otherwise, the operation does not require extra memory and behaves like an erasure. Both cases have linear / O(N) time complexity.
See the LICENSE file for license rights and limitations (MIT).