Description
Null termination is a problem; arguably C strings should always have had a length field.
Problem: if the null terminator is missing, code that scans for it reads past the end of the buffer, which is a buffer overflow security issue.
But we don't want to break compatibility with all existing C code.
All C APIs that accept char * should also accept a length parameter. Like this:
void foo(char *some_string, int some_string_len);
Optional: let the user supply -1 and do:
if (some_string_len == -1)
    some_string_len = strlen(some_string);
Or avoid the extra logic and branch and just force the user to call strlen themselves if they don't already have the length available.
Benefit: string processing functions that need to know the string length get it in O(1) instead of O(N). Examples: getting a file extension, getting the file name from a path, searching a string backwards. String equality checking can first check the length.
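To make the O(1) point concrete, here is a minimal sketch of two of the examples above. The helper names file_extension and str_eq are made up for illustration, and the extension rule (text after the last dot in the last path component, with '/' as the only separator) is an assumption rather than a definitive one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the extension (without the dot)
 * inside path and stores its length in *ext_len. Because the length is
 * known, the scan starts at the end instead of walking the whole string. */
static const char *file_extension(const char *path, int path_len, int *ext_len) {
    for (int i = path_len - 1; i >= 0; i--) {
        if (path[i] == '/')
            break;                      /* hit the directory separator: no extension */
        if (path[i] == '.') {
            *ext_len = path_len - i - 1;
            return path + i + 1;
        }
    }
    *ext_len = 0;
    return path + path_len;             /* empty extension at the end of the string */
}

/* Hypothetical helper: equality check that rejects on a length mismatch
 * before touching any bytes. */
static bool str_eq(const char *a, int a_len, const char *b, int b_len) {
    return a_len == b_len && memcmp(a, b, (size_t)a_len) == 0;
}
```

Note that file_extension returns a pointer plus a length into the original buffer, so the result can be passed straight to str_eq without copying.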
Benefit: UTF-8 encoded strings. Code point 0 (U+0000) is a valid Unicode character, and it might make sense to allow it in a string. In UTF-8 it is encoded as a single zero byte, so with null-terminated strings you can't store it, but if you have the length then you can.
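As a rough illustration, a length-based UTF-8 routine treats a zero byte like any other one-byte character. The function name utf8_count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a UTF-8 buffer of known length
 * by counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes valid UTF-8; a zero byte counts as the code point U+0000. */
static int utf8_count_code_points(const char *s, int len) {
    int count = 0;
    for (int i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the four bytes 'a', 0x00, 0xC3, 0xA9 this counts three code points, whereas anything based on strlen would stop after the first byte.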
Benefit: strings don't even need null terminators. I'd recommend still having them so you can pass strings to other C functions, but you could for example perform string operations on a substring without allocating a copy or temporarily overwriting a character with a null terminator.
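A small sketch of the substring case: parsing a number out of the middle of a larger string, in place. The name parse_uint is made up for this example, and overflow checking is left out to keep it short.

```c
#include <stdbool.h>

/* Hypothetical helper: parses an unsigned decimal number from exactly len
 * bytes of s. It never reads s[len], so s does not need a null terminator.
 * Overflow is not checked in this sketch. */
static bool parse_uint(const char *s, int len, unsigned *out) {
    if (len <= 0)
        return false;
    unsigned value = 0;
    for (int i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return false;               /* non-digit inside the slice */
        value = value * 10 + (unsigned)(s[i] - '0');
    }
    *out = value;
    return true;
}
```

With a date string like "2024-05-17", the month can be read with parse_uint(date + 5, 2, &month), with no copy and no write to the original buffer.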
Arguably one should always have a field for the length of a string and this API encourages and rewards that practice, while still remaining compatible with null-terminated strings.
When you return a char * you should also return the length. Example:
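char *alloc_sprintf(int *len, const char *format, ...) {
    va_list ap, ap2;
    va_start(ap, format);
    va_copy(ap2, ap);

    // First pass: measure the formatted length without writing anything.
    int len1 = vsnprintf(nullptr, 0, format, ap);
    va_end(ap);
    assert(len1 >= 0);

    size_t required_size = len1 + 1; // room for the null terminator
    char *mem = allocate<char>(required_size);
    if (!mem) {
        va_end(ap2); // clean up the copied va_list on the failure path too
        return nullptr;
    }

    // Second pass: format into the buffer.
    int len2 = vsnprintf(mem, required_size, format, ap2);
    va_end(ap2);
    assert(len2 == len1);

    if (len)
        *len = len1;
    return mem;
}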
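The function above uses nullptr and an allocate<char> helper, so it is C++ as written. For plain C callers, a variant could look roughly like the sketch below. The name alloc_sprintf_c and the use of malloc in place of allocate<char> are assumptions for illustration, and it returns NULL on a formatting error instead of asserting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical plain C variant; malloc stands in for allocate<char>.
 * Returns NULL on formatting or allocation failure. The caller frees the
 * result with free(). */
static char *alloc_sprintf_c(int *len, const char *format, ...) {
    va_list ap, ap2;
    va_start(ap, format);
    va_copy(ap2, ap);

    int needed = vsnprintf(NULL, 0, format, ap);    /* measure only */
    va_end(ap);

    char *mem = NULL;
    if (needed >= 0)
        mem = malloc((size_t)needed + 1);           /* +1 for the terminator */
    if (mem)
        vsnprintf(mem, (size_t)needed + 1, format, ap2);
    va_end(ap2);

    if (mem && len)
        *len = needed;                              /* length excludes the terminator */
    return mem;
}
```

A caller that does char *s = alloc_sprintf_c(&len, "%s-%d", "item", 42); gets both the buffer and its length back, and can hand s and len to the next length-taking function without calling strlen.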