1Hyena/nt4c

NT4C Readme

NT4C stands for "NestedText for C" and that is exactly what this project is about.

What is NestedText

In short, NestedText is a file format for holding structured data.

The following resources can explain more if you are unfamiliar with it:

What is NT4C

NT4C is a NestedText parser implementation written in accordance with the C23 standard of the C programming language. It includes the following features:

  • Compliance: NT4C aims to comply with the latest version of the NestedText specification. However, it is currently only compliant with the Minimal NestedText specification.

  • Performance: NT4C performs no heap memory allocations. It also avoids unnecessary memory copying by referencing the input text directly from the resulting graph.

  • Compactness: The NT4C parser is implemented in a single header file with no dependencies other than the standard C library.

  • Embedding: The NT4C parser is easily reusable in other projects with a simple API that includes a few key functions, primarily nt_parse().

  • Tree model: NT4C parses the entire document and constructs a graph (DOM) where each node directly references a segment from the input string.

  • Portability: NT4C builds and runs on Linux. Porting it to most other platforms should be relatively simple as long as the platform provides the C standard library.

  • Encoding: NT4C expects UTF-8 encoding of the input text and does not attempt to detect Unicode encoding errors.

  • Permissive license: NT4C is available under the MIT license.
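
The Performance and Tree model points above both rest on the same idea: a node stores a pointer and a length into the input text instead of a copy. The following is a minimal sketch of that idea only, not NT4C's implementation. DEMO_SEGMENT and demo_split_lines() are hypothetical names invented for this illustration, and the sketch splits plain lines rather than parsing NestedText.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a node; this is NOT the real NT_NODE layout. */
typedef struct {
    const char *data; /* points into the caller's input buffer */
    size_t size;      /* number of bytes in the segment */
} DEMO_SEGMENT;

/*
 * Splits text into lines without copying or allocating anything.
 * At most `capacity` segments are written to `out`, but counting goes on
 * past that point, so the return value is the total number of lines found.
 */
static size_t demo_split_lines(
    const char *text, size_t text_size, DEMO_SEGMENT *out, size_t capacity
) {
    size_t count = 0;
    size_t start = 0;

    while (start < text_size) {
        const char *nl = memchr(text + start, '\n', text_size - start);
        size_t end = nl ? (size_t) (nl - text) : text_size;

        if (count < capacity) {
            out[count].data = text + start; /* reference, not a copy */
            out[count].size = end - start;
        }
        ++count;
        start = end + 1; /* skip the newline character */
    }
    return count;
}
```

Because the segments only reference the input, the input buffer has to outlive every segment that points into it. The same constraint presumably applies to any graph whose nodes reference the parsed text.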

Using NT4C

Parsing NestedText

To parse a NestedText document, include the nt4c.h header file directly in your codebase. Because the parser is implemented in a single C header file, there is nothing else to build or integrate.

The main function to use is nt_parse(), which takes a text in NestedText syntax, the size of that text, and a pointer to the NT_PARSER structure for customizing the deserialization process.

The NT_PARSER structure stores parsing configuration and the parsing process state. By default, it can handle up to NT_PARSER_NCOUNT nodes in its internal memory. However, you can use the nt_parser_set_memory function to work with a custom array of NT_NODE structures.

When you call nt_parse(), the parser populates the deserialization graph of the document with nodes. It continues processing even if the output buffer reaches its capacity.

After a successful parsing operation, nt_parse() returns the number of nodes in the input text. This information can help you determine the memory required for storing the deserialization graph. If parsing fails, the function returns a negative value.

The deserialization graph is considered fully stored when the value returned by nt_parse() is non-negative and does not exceed the output buffer's capacity.
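
The three outcomes described above (failure, a graph that did not fit, and a fully stored graph) can be told apart from the return value and the buffer capacity alone. The helper below is a small sketch of that check. DEMO_PARSE_STATUS and demo_check_result() are hypothetical names that are not part of nt4c.h, and the function only inspects an integer that is assumed to come from nt_parse().

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; not defined by nt4c.h. */
typedef enum {
    DEMO_PARSE_FAILED,    /* the return value was negative */
    DEMO_PARSE_TRUNCATED, /* the document has more nodes than the buffer holds */
    DEMO_PARSE_COMPLETE   /* the whole graph fits into the buffer */
} DEMO_PARSE_STATUS;

/*
 * `result` is the value returned by a parse call and `capacity` is the
 * number of nodes the output buffer can hold.
 */
static DEMO_PARSE_STATUS demo_check_result(int result, size_t capacity) {
    if (result < 0) {
        return DEMO_PARSE_FAILED;
    }
    if ((size_t) result > capacity) {
        return DEMO_PARSE_TRUNCATED;
    }
    return DEMO_PARSE_COMPLETE;
}
```

In the truncated case the return value is still useful: it tells the caller how many nodes to reserve before parsing the same text again.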

API

nt4c/nt4c.h

Lines 76 to 81 in acc5fe1

static int nt_parse(const char *text, size_t text_size, NT_PARSER *);
static void nt_parser_set_memory(NT_PARSER *, NT_NODE *, size_t ncount);
static void nt_parser_set_recursion(NT_PARSER *, size_t depth);
static void nt_parser_set_blacklist(NT_PARSER *, NT_TYPE blacklist);
static void nt_parser_set_whitelist(NT_PARSER *, NT_TYPE whitelist);
static const char * nt_code(NT_TYPE);

nt4c/nt4c.h

Lines 42 to 73 in acc5fe1

typedef enum : uint32_t {
    NT_NONE = 0,
    ////////////////////////////////////////////////////////////////////////////
    NT_TOP_DCT     = 1 <<  0, // root node contains a dictionary
    NT_TOP_LST     = 1 <<  1, // root node contains a list
    NT_TOP_MLS     = 1 <<  2, // root node contains a multiline string
    NT_TOP_NIL     = 1 <<  3, // root node does not hold any meaningful data
    NT_KEY_ROL     = 1 <<  4, // name of the key for a rest-of-line string
    NT_KEY_MLS     = 1 <<  5, // name of the key for a multiline string
    NT_KEY_LST     = 1 <<  6, // name of the key for the following list
    NT_KEY_DCT     = 1 <<  7, // name of the key for the following dictionary
    NT_KEY_NIL     = 1 <<  8, // name of the key for missing value
    NT_SET_ROL     = 1 <<  9, // node references a rest-of-line assignment
    NT_SET_MLS     = 1 << 10, // node references a multiline assignment
    NT_SET_LST     = 1 << 11, // node references a list assignment
    NT_SET_DCT     = 1 << 12, // node references a dictionary assignment
    NT_SET_NIL     = 1 << 13, // node references a nil assignment
    NT_TAG_MLS     = 1 << 14, // node references the tag of a multiline string
    NT_TAG_COM     = 1 << 15, // node references the tag of a comment line
    NT_TAG_LST_ROL = 1 << 16, // tag of the enlisted rest-of-line string
    NT_TAG_LST_MLS = 1 << 17, // tag of the enlisted multiline string
    NT_TAG_LST_LST = 1 << 18, // tag of the enlisted sublist
    NT_TAG_LST_DCT = 1 << 19, // tag of the enlisted dictionary
    NT_TAG_LST_NIL = 1 << 20, // tag of the enlisted nil value
    NT_STR_ROL     = 1 << 21, // node references a rest-of-line string
    NT_STR_MLN     = 1 << 22, // node references a multiline string
    NT_STR_COM     = 1 << 23, // node references a comment string
    NT_NEWLINE     = 1 << 24, // node references the new line data
    NT_SPACE       = 1 << 25, // node references the (indentation) spaces
    NT_INVALID     = 1 << 26, // node references a segment of invalid input
    NT_DEEP        = 1 << 27  // node that exceeds the maximum nesting depth
} NT_TYPE;
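
Each NT_TYPE value occupies its own bit, so several types can be combined into one mask with bitwise OR, as ex_pretty does with NT_SPACE|NT_NEWLINE. The sketch below shows only the bit arithmetic. The DEMO_ macros are local copies of three values from the listing above so that the block compiles without nt4c.h, and the helper functions are hypothetical. How the parser itself interprets a blacklist or whitelist is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of three NT_TYPE values, for illustration only. */
#define DEMO_STR_ROL ((uint32_t) 1 << 21) /* rest-of-line string */
#define DEMO_NEWLINE ((uint32_t) 1 << 24) /* new line data */
#define DEMO_SPACE   ((uint32_t) 1 << 25) /* (indentation) spaces */

/* Builds a mask that selects the two layout-only node types. */
static uint32_t demo_layout_mask(void) {
    return DEMO_SPACE | DEMO_NEWLINE;
}

/* Returns true if `type` has at least one bit in common with `mask`. */
static bool demo_mask_contains(uint32_t mask, uint32_t type) {
    return (mask & type) != 0;
}
```

A mask built this way has the same shape as the NT_TYPE argument accepted by nt_parser_set_blacklist() and nt_parser_set_whitelist().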

To change the size of the integrated memory buffer of the NT_PARSER structure, define the NT_PARSER_NCOUNT macro before including the nt4c.h header. The integrated memory makes the API more convenient to use when the input document is always known to be small (see ex_hello and ex_pretty).

nt4c/nt4c.h

Lines 35 to 37 in acc5fe1

#ifndef NT_PARSER_NCOUNT
#define NT_PARSER_NCOUNT 8
#endif
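
The guard above only applies the default when the macro is still undefined, which is why the override has to appear before the include. The sketch below demonstrates the ordering. To stay self-contained it repeats the guard in place of the real #include "nt4c.h" line, and demo_integrated_capacity() is a hypothetical helper; the value 64 is an arbitrary example.

```c
#include <stddef.h>

/* Override the default; this line has to come before the include. */
#define NT_PARSER_NCOUNT 64

/* Stand-in for #include "nt4c.h": the same guard as quoted above. */
#ifndef NT_PARSER_NCOUNT
#define NT_PARSER_NCOUNT 8
#endif

/* Reports the node capacity that was selected at compile time. */
static size_t demo_integrated_capacity(void) {
    return NT_PARSER_NCOUNT;
}
```

If the define were placed after the include instead, the default of 8 would already be in effect and the compiler would typically warn about a macro redefinition.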

Examples

ex_hello

The ex_hello example demonstrates how to use the NT4C parser to parse the text "hello world" and display it on the screen. It relies on the integrated memory of the NT_PARSER structure, so no node array has to be provided.

int main(int, char **) {
    NT_PARSER parser = {};
    if (nt_parse("hello world", 0, &parser) <= 0) {
        return EXIT_FAILURE;
    }
    printf("%.*s\n", (int) parser.doc.begin->size, parser.doc.begin->data);
    return EXIT_SUCCESS;
}

screenshot

ex_echo

This example demonstrates how to use the NT4C parser to parse a NestedText document and display it on the screen. The input document is parsed twice. The first pass is made without a parser structure and only counts the nodes in the document. The second pass stores the Document Object Model (DOM) in a variable-length array sized from that count.

int main(int, char **) {
    // First pass: count the nodes without storing them.
    int node_count = nt_parse(input_data, sizeof(input_data), nullptr);
    if (node_count <= 0) {
        fprintf(stderr, "%s\n", "parse error");
        return EXIT_FAILURE;
    }
    // Second pass: store the graph in an array of exactly that size.
    NT_NODE nodes[node_count];
    NT_PARSER parser = {};
    nt_parser_set_memory(&parser, nodes, sizeof(nodes)/sizeof(nodes[0]));
    if (nt_parse(input_data, sizeof(input_data), &parser) > (int) node_count) {
        fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);
        return EXIT_FAILURE;
    }
    // Each node references a segment of the input text.
    for (NT_NODE *it = parser.doc.begin; it < parser.doc.end; ++it) {
        printf("%.*s", (int) it->size, it->data);
    }
    return EXIT_SUCCESS;
}

screenshot

ex_pretty

This example shows how to use the NT4C parser to pretty-print a NestedText document. It reformats the input text and adds syntax highlighting.

int main(int, char **) {
    NT_PARSER parser = { .settings = { .blacklist = NT_SPACE|NT_NEWLINE } };
    if (nt_parse((char *) input_data, 0, &parser) > (int) parser.mem.capacity) {
        fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);
        return EXIT_FAILURE;
    }
    return pretty_print(parser.doc.root, 0);
}

Here is a NestedText document before and after pretty-printing, as shown in the screenshot below:

nt4c/examples/ugly.nt

Lines 1 to 29 in 934ab23

this :
is : an ugly 𝓪𝓼𝓼
# NestedText
#document
#and we are going
to parse it :
- for the purpose
# of
making :
it :
appear :
- more
- easily
- readable
by :
> indenting
> the document
> properly
and :
we:
also :
added :
-color
# ... to
highlight :
- syntax
#
# errors

screenshot

ex_tree

This example shows how to use the NT4C parser to print the structure of a NestedText document on the screen.

int main(int, char **) {
    constexpr size_t node_count = 200;
    NT_NODE nodes[node_count];
    NT_PARSER parser = {};
    nt_parser_set_memory(&parser, nodes, node_count);
    int result = nt_parse((char *) input_data, sizeof(input_data), &parser);
    if (result > (int) node_count) {
        fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);
        return EXIT_FAILURE;
    }
    print_tree(parser.doc.root, 0);
    return EXIT_SUCCESS;
}

Here is a screenshot showing the structure of the parsed NestedText document:

screenshot

License

NT4C has been authored by Erich Erstu and is released under the MIT license.