Tree-sitter is an incremental parsing library in C and C++, intended to be used via bindings to higher-level languages. It allows documents to be efficiently re-parsed after localized edits, making it suitable for use in performance-intensive text-editing programs.
Tree-sitter uses a sentential-form incremental LR parsing algorithm, as described in the paper Efficient and Flexible Incremental Parsing by Tim Wagner. It handles ambiguity at compile-time via precedence annotations, and at run-time via the GLR algorithm. This allows it to generate a fast parser for any context-free grammar.
script/configure.sh  # Generate a Makefile using gyp
make                 # Build static libraries for the compiler and runtimeTree-sitter's interface for creating grammars is a C++ library, libcompiler.
This allows grammars and rules to be defined, manipulated and
extended as simple values in high-level languages like javascript,
and then converted into tree-sitter's native representation and compiled to C
parsers. These parsers can then be used from any language that has a binding to
tree-sitter's runtime library, libruntime.
Here's a simple example that uses libcompiler directly:
// arithmetic_grammar.cc
#include <assert.h>
#include <stdio.h>
#include "tree_sitter/compiler.h"
using namespace tree_sitter;
int main() {
  auto arithmetic_grammar = Grammar({
    // The first rule listed in a grammar becomes the 'start rule'.
    { "expression", choice({
      sym("sum"),
      sym("product"),
      sym("number"),
      sym("variable"),
      // Error recovery is controlled by wrapping rule subtrees with `err`.
      seq({
        str("("),
        err(sym("expression")),
        str(")") }) }) },
    // Tokens like '+' and '*' are described directly within the grammar's rules,
    // as opposed to in a seperate lexer description.
    { "sum", prec_left(1, seq({
      sym("expression"),
      str("+"),
      sym("expression") })) },
    // Ambiguities can be resolved at compile time by assigning precedence
    // values to rule subtrees.
    { "product", prec_left(2, seq({
      sym("expression"),
      str("*"),
      sym("expression") })) },
    // Tokens can be specified using ECMAScript regexps.
    { "number", pattern("\\d+") },
    { "variable", pattern("[a-zA-Z]+\\w*") },
    { "comment", pattern("//.*") },
  }).ubiquitous_tokens({
    // Things that can appear anywhere in the language are expressed as
    // 'ubiquitous tokens'.
    sym("comment"),
    pattern("\\s+")
  });
  // Generate C code for parsing this language.
  auto output = compile(arithmetic_grammar, "arithmetic");
  std::string c_code = output.first;
  const GrammarError *error = output.second;
  assert(!error);
  puts(c_code.c_str());
  return 0;
}To create a parser for this language, compile and run this grammar like this:
clang++ -stdlib=libc++ -std=c++11                             \
  -I tree-sitter/include -L tree-sitter/out/Debug -l compiler \
  arithmetic_grammar.cc -o arithmetic_grammar
./arithmetic_grammar > arithmetic_parser.cThe tree_sitter/runtime C library exposes a DOM-style interface for inspecting
documents.
Functions like ts_node_child(node, index) and ts_node_next_sibling(node)
expose every node in the concrete syntax tree. This is useful for operations
like syntax-highlighting, that operate on a token-by-token basis. You can also
traverse the tree in a more abstract way by using functions like
ts_node_named_child(node, index) and ts_node_next_named_sibling(node). These
functions don't expose nodes that were specified in the grammar as anonymous
tokens, like ( and +. This is useful when analyzing the meaning of a document.
#include <stdio.h>
#include "tree_sitter/runtime.h"
// Declare the language constructor that was generated from your grammar.
TSLanguage *ts_language_arithmetic();
int main() {
  TSDocument *document = ts_document_make();
  ts_document_set_language(document, ts_language_arithmetic());
  // Usually, you would use the more general `ts_document_set_input`, which
  // takes a struct with function pointers for seeking to positions in the text,
  // and reading chunks of text. This allows you to efficiently parse text
  // stored in your own data structure.
  ts_document_set_input_string(document, "a + b * 5");
  ts_document_parse(document);
  TSNode root_node = ts_document_root_node(document);
  printf(
    "Root name: %s, start: %lu, end: %lu\n",
    ts_node_name(root_node, document),
    ts_node_start_char(root_node),
    ts_node_end_char(root_node)
  );
  TSNode product_node = ts_node_named_child(ts_node_child(root_node, 0), 1);
  printf(
    "Child name: %s, start: %lu, end: %lu\n",
    ts_node_name(product_node, document),
    ts_node_start_char(product_node),
    ts_node_end_char(product_node)
  );
  ts_document_free(document);
  return 0;
}To demo this parser's capabilities, compile this program like this:
clang                                                        \
  -I tree-sitter/include -L tree-sitter/out/Debug -l runtime \
  arithmetic_parser.c test_parser.c -o test_parser
./test_parser