Strings & binary data

Librcd uses the data type fstr_t for representing binary data. An fstr_t consists of two members: a length (size_t len) and a pointer to the data (uint8_t* str). The memory is not owned by the structure, and in fact fstrings can point into arbitrary memory, e.g. the middle of another fstring. To represent owned memory, an fstr_mem_t* is used.
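The non-owning nature of an fstring can be illustrated with a small, self-contained sketch. The struct below is a local stand-in that only mirrors the two members described above (the real definition lives in librcd's fstring.h), and view_into and view_aliases_parent are hypothetical helpers written for this illustration, not librcd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the two members described in the text.
 * Assumption: this is NOT the librcd header, just an illustration. */
typedef struct {
    size_t len;
    uint8_t* str;
} fstr_t;

/* Hypothetical helper: view `len` bytes starting at `offset` inside `s`.
 * No bytes are copied; the result aliases the memory of `s`. */
static fstr_t view_into(fstr_t s, size_t offset, size_t len) {
    assert(offset <= s.len && len <= s.len - offset);
    return (fstr_t){ .len = len, .str = s.str + offset };
}

/* Returns 1 if a write through the view is visible through the parent. */
static int view_aliases_parent(void) {
    uint8_t buf[] = { 'h', 'e', 'l', 'l', 'o' };
    fstr_t whole = { .len = sizeof(buf), .str = buf };
    fstr_t mid = view_into(whole, 1, 3); /* the bytes "ell" */
    mid.str[0] = 'a';                    /* also changes buf[1] */
    return whole.str[1] == 'a' && mid.len == 3 && mid.str == buf + 1;
}
```

Because neither value owns the buffer, both views become invalid as soon as buf goes out of scope; that is the situation in which an owned fstr_mem_t* would be used instead.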

Many convenience functions and macros are defined for fstr_ts, for instance:

fstr_slice, for creating a substring (slice) of another fstring, pointing into the same memory
fstr_cpy, fstr_cpy_over, for copying data between fstrings
conc, for concatenating fstrings (returning an fstr_mem_t* - see also concs and sconc)
fss, for converting from fstr_mem_t* to fstr_t
FSTR_PACK, for packing an arbitrary C type into an fstr_t by taking its address and size
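To make the idea of packing and copying concrete, here is a rough sketch under stated assumptions. SKETCH_PACK and sketch_copy are hypothetical analogues written for this page; they are not the librcd definitions of FSTR_PACK or fstr_cpy_over, whose exact signatures and semantics are documented in fstring.h. The struct is again a local stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the two-member structure (assumption, not the librcd header). */
typedef struct {
    size_t len;
    uint8_t* str;
} fstr_t;

/* Hypothetical analogue of a pack macro: view any lvalue as its raw bytes
 * by taking its address and size. */
#define SKETCH_PACK(x) ((fstr_t){ .len = sizeof(x), .str = (uint8_t*) &(x) })

/* Hypothetical copy helper: copies as many bytes as fit into dst
 * and returns the number of bytes copied. */
static size_t sketch_copy(fstr_t dst, fstr_t src) {
    size_t n = src.len < dst.len ? src.len : dst.len;
    memmove(dst.str, src.str, n);
    return n;
}

/* Round-trips a 32-bit value through its byte representation. */
static uint32_t roundtrip_u32(uint32_t value) {
    uint32_t out = 0;
    size_t copied = sketch_copy(SKETCH_PACK(out), SKETCH_PACK(value));
    assert(copied == sizeof(uint32_t));
    return out;
}
```

Packing a value this way exposes its in-memory representation, so the resulting bytes depend on the platform's endianness and struct padding.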

A more complete list, including detailed comments, can be found in fstring.h.

Text strings are normally represented by their UTF-8 encoded forms. There are a few helper functions for dealing with UTF-8-encoded text (such as extracting Unicode code points), but usually they are not needed: applications tend to be content-agnostic except for small sets of parse-affecting control characters, and those almost always have single-byte encodings within UTF-8¹.
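As an illustration of why decoding is rarely necessary, the following sketch searches UTF-8 text for a single-byte (ASCII) delimiter using only a plain byte scan. It uses standard C only; find_ascii and demo_find_comma are hypothetical names for this example and are not part of librcd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte offset of the first occurrence of the ASCII byte `c`
 * (c < 0x80) in the UTF-8 buffer, or -1 if it does not occur.
 * No decoding is needed: every byte of a multi-byte sequence is >= 0x80,
 * so it can never be mistaken for an ASCII byte. */
static ptrdiff_t find_ascii(const uint8_t* data, size_t len, uint8_t c) {
    const uint8_t* hit = memchr(data, c, len);
    return hit ? (ptrdiff_t) (hit - data) : -1;
}

/* Finds the comma in the UTF-8 encoding of "nä,ö". */
static ptrdiff_t demo_find_comma(void) {
    /* 'n', then C3 A4 for 'ä', then ',', then C3 B6 for 'ö' */
    const uint8_t text[] = { 'n', 0xC3, 0xA4, ',', 0xC3, 0xB6 };
    return find_ascii(text, sizeof(text), ','); /* byte offset 3 */
}
```

The returned value is a byte offset, not a code point index, which is exactly what is needed for slicing the underlying buffer.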

Librcd's preprocessor (rcd-pp) will automatically convert any string literals to their fstr_t equivalents - for instance, "abc" will be converted into ((fstr_t){.str = "abc", .len = 3}). This conversion is only performed after the magic word #pragma librcd occurs in the preprocessed source code, which must happen only within source (*.c) files. This avoids inflicting global state upon unrelated header files. To define fstring constants within headers, the fstr macro can be used to convert C string literals into fstr_ts.
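The shape of the converted literal can be tried out in plain C. The sketch below uses a local stand-in struct and a hypothetical SKETCH_FSTR macro that computes the length with sizeof; it is not a claim about how rcd-pp or the fstr macro is implemented. The cast to uint8_t* is added here only so the example compiles cleanly without librcd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the two-member structure (assumption, not the librcd header). */
typedef struct {
    size_t len;
    uint8_t* str;
} fstr_t;

/* Hypothetical macro: wrap a C string literal without its terminating NUL.
 * sizeof on a literal counts the terminator, hence the "- 1". */
#define SKETCH_FSTR(lit) ((fstr_t){ .str = (uint8_t*) (lit), .len = sizeof(lit) - 1 })

/* Returns 1 if the macro yields the same value as the hand-written literal form. */
static int literal_matches(void) {
    fstr_t a = SKETCH_FSTR("abc");
    fstr_t b = (fstr_t){ .str = (uint8_t*) "abc", .len = 3 };
    return a.len == b.len && memcmp(a.str, b.str, a.len) == 0;
}

/* Length of a literal containing an embedded zero byte: 'a', 0, 'b'. */
static size_t embedded_nul_len(void) {
    return SKETCH_FSTR("a\0b").len;
}
```

Since the length is carried explicitly, a literal with an embedded zero byte keeps its full length instead of being cut short the way strlen would cut it.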

¹ Note that due to the design of UTF-8, a byte cannot be both a starting byte and a continuation byte. Thus, when searching for single-byte characters there is no need to parse any preceding multi-byte characters. The same property is coincidentally also what makes concatenation of content-controlled strings with trusted ones safe, even for content containing broken UTF-8.
