NT4C Readme

Home: https://github.com/1Hyena/nt4c
Issue tracker: https://github.com/1Hyena/nt4c/issues

NT4C stands for "NestedText for C" and that is exactly what this project is about.

What is NestedText

In short, NestedText is a file format for holding structured data.

The following resources can explain more if you are unfamiliar with it:

What is NT4C

NT4C is a NestedText parser implementation written in C that conforms to the C23 standard. It has the following features:

Compliance: NT4C aims to comply with the latest version of the NestedText specification. However, it currently complies only with the Minimal NestedText specification.
Performance: NT4C is fast because it performs no heap memory allocations. It also avoids unnecessary memory copying by directly referencing the input text in the resulting graph.
Compactness: The NT4C parser is implemented in a single header file with no dependencies other than the standard C library.
Embedding: The NT4C parser is easily reusable in other projects with a simple API that includes a few key functions, primarily nt_parse().
Tree model: NT4C parses the entire document and constructs a graph (DOM) where each node directly references a segment from the input string.
Portability: NT4C builds and runs on Linux. Porting it to most other platforms should be relatively simple, as long as the platform provides the C standard library.
Encoding: NT4C expects UTF-8 encoding of the input text and does not attempt to detect Unicode encoding errors.
Permissive license: NT4C is available under the MIT license.

Using NT4C

Parsing NestedText

To parse a NestedText document, you can include the nt4c.h header file directly in your codebase. The parser is implemented in a single C header file for easy integration.

The main function to use is nt_parse(), which takes the input text in NestedText syntax, its size, and a pointer to an NT_PARSER structure that configures the deserialization process.

The NT_PARSER structure stores the parsing configuration and the state of the parsing process. By default, it can hold up to NT_PARSER_NCOUNT nodes in its internal memory. To work with a custom array of NT_NODE structures instead, use the nt_parser_set_memory() function.

When you call nt_parse(), the parser populates the deserialization graph of the document with nodes. It continues processing even after the output buffer reaches its capacity, which is why the return value can exceed that capacity.

After a successful parsing operation, nt_parse() returns the number of nodes in the input text. This information can help you determine the memory required for storing the deserialization graph. If parsing fails, the function returns a negative value.

The deserialization graph is considered fully stored when the value returned by nt_parse() is non-negative and does not exceed the output buffer's capacity.

API

nt4c/nt4c.h

Lines 76 to 81 in acc5fe1

    static int          nt_parse(const char *text, size_t text_size, NT_PARSER *);
    static void         nt_parser_set_memory(NT_PARSER *, NT_NODE *, size_t ncount);
    static void         nt_parser_set_recursion(NT_PARSER *, size_t depth);
    static void         nt_parser_set_blacklist(NT_PARSER *, NT_TYPE blacklist);
    static void         nt_parser_set_whitelist(NT_PARSER *, NT_TYPE whitelist);
    static const char * nt_code(NT_TYPE);

nt4c/nt4c.h

Lines 42 to 73 in acc5fe1

    typedef enum : uint32_t {
        NT_NONE = 0,
        ////////////////////////////////////////////////////////////////////////////
        NT_TOP_DCT      = 1 <<  0,  // root node contains a dictionary
        NT_TOP_LST      = 1 <<  1,  // root node contains a list
        NT_TOP_MLS      = 1 <<  2,  // root node contains a multiline string
        NT_TOP_NIL      = 1 <<  3,  // root node does not hold any meaningful data
        NT_KEY_ROL      = 1 <<  4,  // name of the key for a rest-of-line string
        NT_KEY_MLS      = 1 <<  5,  // name of the key for a multiline string
        NT_KEY_LST      = 1 <<  6,  // name of the key for the following list
        NT_KEY_DCT      = 1 <<  7,  // name of the key for the following dictionary
        NT_KEY_NIL      = 1 <<  8,  // name of the key for missing value
        NT_SET_ROL      = 1 <<  9,  // node references a rest-of-line assignment
        NT_SET_MLS      = 1 << 10,  // node references a multiline assignment
        NT_SET_LST      = 1 << 11,  // node references a list assignment
        NT_SET_DCT      = 1 << 12,  // node references a dictionary assignment
        NT_SET_NIL      = 1 << 13,  // node references a nil assignment
        NT_TAG_MLS      = 1 << 14,  // node references the tag of a multiline string
        NT_TAG_COM      = 1 << 15,  // node references the tag of a comment line
        NT_TAG_LST_ROL  = 1 << 16,  // tag of the enlisted rest-of-line string
        NT_TAG_LST_MLS  = 1 << 17,  // tag of the enlisted multiline string
        NT_TAG_LST_LST  = 1 << 18,  // tag of the enlisted sublist
        NT_TAG_LST_DCT  = 1 << 19,  // tag of the enlisted dictionary
        NT_TAG_LST_NIL  = 1 << 20,  // tag of the enlisted nil value
        NT_STR_ROL      = 1 << 21,  // node references a rest-of-line string
        NT_STR_MLN      = 1 << 22,  // node references a multiline string
        NT_STR_COM      = 1 << 23,  // node references a comment string
        NT_NEWLINE      = 1 << 24,  // node references the new line data
        NT_SPACE        = 1 << 25,  // node references the (indentation) spaces
        NT_INVALID      = 1 << 26,  // node references a segment of invalid input
        NT_DEEP         = 1 << 27   // node that exceeds the maximum nesting depth
    } NT_TYPE;

To change the size of the memory buffer integrated into the NT_PARSER structure, define the NT_PARSER_NCOUNT macro before including the nt4c.h header. The integrated memory exists for convenience in cases where the input document is always known to be small (see ex_hello and ex_pretty).

nt4c/nt4c.h

Lines 35 to 37 in acc5fe1

    #ifndef NT_PARSER_NCOUNT
    #define NT_PARSER_NCOUNT 8
    #endif

Examples

ex_hello

The ex_hello example demonstrates how to use the NT4C parser to parse the text "hello world" and display it on the screen.

nt4c/examples/src/ex_hello.c

Lines 5 to 15 in 1f85958

    int main(int, char **) {
        NT_PARSER parser = {};

        if (nt_parse("hello world", 0, &parser) <= 0) {
            return EXIT_FAILURE;
        }

        printf("%.*s\n", (int) parser.doc.begin->size, parser.doc.begin->data);

        return EXIT_SUCCESS;
    }

ex_echo

This example demonstrates how to use the NT4C parser to parse a NestedText document and display it on the screen. The input document is parsed twice. The first pass only counts the nodes in the document. The second pass stores the Document Object Model (DOM) in a variable-length array sized to that count.

nt4c/examples/src/ex_echo.c

Lines 10 to 34 in 1f85958

    int main(int, char **) {
        int node_count = nt_parse(input_data, sizeof(input_data), nullptr);

        if (node_count <= 0) {
            fprintf(stderr, "%s\n", "parse error");
            return EXIT_FAILURE;
        }

        NT_NODE nodes[node_count];
        NT_PARSER parser = {};

        nt_parser_set_memory(&parser, nodes, sizeof(nodes)/sizeof(nodes[0]));

        if (nt_parse(input_data, sizeof(input_data), &parser) > (int) node_count) {
            fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);

            return EXIT_FAILURE;
        }

        for (NT_NODE *it = parser.doc.begin; it < parser.doc.end; ++it) {
            printf("%.*s", (int) it->size, it->data);
        }

        return EXIT_SUCCESS;
    }

ex_pretty

This example shows how to use the NT4C parser to pretty-print a NestedText document. It reformats the input text and adds syntax highlighting.

nt4c/examples/src/ex_pretty.c

Lines 65 to 74 in 1f85958

    int main(int, char **) {
        NT_PARSER parser = { .settings = { .blacklist = NT_SPACE|NT_NEWLINE } };

        if (nt_parse((char *) input_data, 0, &parser) > (int) parser.mem.capacity) {
            fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);
            return EXIT_FAILURE;
        }

        return pretty_print(parser.doc.root, 0);
    }

Here is a NestedText document before and after pretty-printing, as shown in the screenshot below:

nt4c/examples/ugly.nt

Lines 1 to 29 in 934ab23

           this                                                     :
            is                 : an ugly 𝓪𝓼𝓼
                                                          # NestedText
                                           #document
           #and we are going
           to parse it                                          :
            - for the purpose
                               # of
           making                                        :
                           it                 :
                     appear                                      :
                                   - more
                                  - easily
                          - readable
                  by                             :
                                                          > indenting
                   > the document
                                 > properly
           and           :
                                                                  we:
                                   also              :
                                                   added            :
                                                       -color
                                                           # ... to

           highlight                        :
                                        - syntax
                                                           #
                                          # errors

ex_tree

This example shows how to use the NT4C parser to print the structure of a NestedText document on the screen.

nt4c/examples/src/ex_tree.c

Lines 79 to 97 in 1f85958

    int main(int, char **) {
        constexpr size_t node_count = 200;
        NT_NODE nodes[node_count];
        NT_PARSER parser = {};

        nt_parser_set_memory(&parser, nodes, node_count);

        int result = nt_parse((char *) input_data, sizeof(input_data), &parser);

        if (result > (int) node_count) {
            fprintf(stderr, "not enough memory for %lu nodes\n", parser.doc.length);

            return EXIT_FAILURE;
        }

        print_tree(parser.doc.root, 0);

        return EXIT_SUCCESS;
    }

Here is a screenshot showing the structure of the parsed NestedText document:

License

NT4C has been authored by Erich Erstu and is released under the MIT license.
