This is a Proof of Concept for storing a large volume of audit facts in Riak by bundling them together into large objects, and then searching for facts using a map function run across hints files that are generated by a pre-commit hook.
The scenario to be covered in the Proof of Concept is as follows:
- A large number of overall facts, of the order of 100 billion;
- Facts grouped into objects, with many more than 1,000 but fewer than 100,000 facts per object;
- Most individual facts appear in fewer than 1% of objects, so there is value in finding facts without trawling through every object;
- All object keys start with a natural filter that can be used to split up the Map job, so that many small map jobs are orchestrated outside of Riak (rather than relying on Riak to manage a single large, long-running job), and each individual map job runs against keys stored contiguously (see the sketch after this list).
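
As a minimal sketch of that orchestration, assuming the riak-erlang-client (riakc), a hypothetical `<<"hints">>` bucket, and the `map_checkhints/3` function described below: each prefix becomes its own small, bounded map job over a contiguous `$key` range. The module names here are illustrative, not part of the PoC.

```erlang
%% Sketch: orchestrate one small map job per key prefix, outside Riak.
%% Bucket and module names are hypothetical assumptions.
-module(poc_orchestrate).
-export([run/3]).

run(Pid, Prefixes, Facts) ->
    lists:flatmap(
      fun(Prefix) ->
              %% A 2i range query on the special $key index selects only
              %% the (contiguously stored) keys sharing this prefix.
              Inputs = {index, <<"hints">>, <<"$key">>,
                        Prefix, <<Prefix/binary, 16#ff>>},
              Query = [{map, {modfun, poc_hints, map_checkhints},
                        Facts, true}],
              {ok, [{0, Results}]} =
                  riakc_pb_socket:mapred(Pid, Inputs, Query),
              Results
      end,
      Prefixes).
```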
When an object is loaded, a project-specific extract_data_forprocess/2 function is called, which should extract the facts to be indexed in the hints file from the original object (as well as any new 2i fields/terms to be added to both the original object and the hints object).
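
The extractor is project-specific, so the following is only a sketch of its shape. The `{Facts, IndexSpecs}` return value, the module name, and the assumption that the object value is a term_to_binary'd list of `{FactId, Detail}` tuples are all illustrative:

```erlang
%% Minimal sketch of a project-specific extractor. The return shape and
%% the value encoding are assumptions for illustration only.
-module(poc_extract).
-export([extract_data_forprocess/2]).

extract_data_forprocess(_Key, ObjValue) ->
    %% Assume the value is term_to_binary([{FactId, Detail}, ...]).
    FactTuples = binary_to_term(ObjValue),
    Facts = [FactId || {FactId, _Detail} <- FactTuples],
    %% A hypothetical extra 2i term to be added to both the original
    %% object and the hints object.
    IndexSpecs = [{<<"factcount_int">>, length(Facts)}],
    {Facts, IndexSpecs}.
```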
The hints file is a binary representation of the facts, using a bloom filter with a low false-positive rate but only a single hash. This makes the bloom large, but it is compressed for storage using rice encoding. This provides a good ratio between disk footprint and false-positive rate, whilst also providing a natural checksum when processing, to protect against any random bit-flipping.
To improve processing time, the bloom is first partitioned, so that only part of the bloom needs to be checked for any given fact.
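
A minimal sketch of such a partitioned, single-hash, rice-encoded bloom follows. All parameters (16 partitions, a 2^16 hash range per partition, a 10-bit rice remainder) and the module name are illustrative assumptions, not the PoC's actual settings:

```erlang
-module(poc_bloom).
-export([create/1, check/2]).

-define(PARTITIONS, 16).
-define(SLOT_SIZE, 65536).   % single-hash range within each partition
-define(RICE_BITS, 10).      % remainder width for rice encoding

%% Build the bloom: group the single hashes by partition, sort them,
%% and rice-encode the gaps between successive hash values.
create(Facts) ->
    Grouped =
        lists:foldl(
          fun(Fact, Acc) ->
                  {P, H} = hash(Fact),
                  maps:update_with(P, fun(Hs) -> [H | Hs] end, [H], Acc)
          end,
          #{}, Facts),
    maps:map(fun(_P, Hs) -> encode_gaps(lists:usort(Hs), 0) end, Grouped).

%% Check a single fact: only the one relevant partition is decoded.
check(Fact, Bloom) ->
    {P, H} = hash(Fact),
    case maps:find(P, Bloom) of
        error -> false;
        {ok, Bits} -> scan(Bits, 0, H)
    end.

hash(Fact) ->
    HashVal = erlang:phash2(Fact, ?PARTITIONS * ?SLOT_SIZE),
    {HashVal div ?SLOT_SIZE, HashVal rem ?SLOT_SIZE}.

encode_gaps([], _Prev) ->
    <<>>;
encode_gaps([H | Rest], Prev) ->
    Gap = H - Prev,
    Q = Gap bsr ?RICE_BITS,                    % unary-coded quotient
    R = Gap band ((1 bsl ?RICE_BITS) - 1),     % fixed-width remainder
    <<(unary(Q))/bitstring, 0:1, R:?RICE_BITS,
      (encode_gaps(Rest, H))/bitstring>>.

unary(0) -> <<>>;
unary(Q) -> <<1:1, (unary(Q - 1))/bitstring>>.

scan(<<>>, _Acc, _Target) ->
    false;
scan(Bits, Acc, Target) ->
    {Q, <<R:?RICE_BITS, Rest/bitstring>>} = take_unary(Bits, 0),
    Value = Acc + (Q bsl ?RICE_BITS) + R,
    if Value =:= Target -> true;
       Value > Target -> false;   % hashes are sorted, so stop early
       true -> scan(Rest, Value, Target)
    end.

take_unary(<<1:1, Rest/bitstring>>, Q) -> take_unary(Rest, Q + 1);
take_unary(<<0:1, Rest/bitstring>>, Q) -> {Q, Rest}.
```

Because the hashes within a partition are sorted before their gaps are rice-encoded, a lookup can stop as soon as a decoded value passes the target, and only one partition is ever decoded per fact.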
The map job (using the function map_checkhints/3) can then be run across a subset (or all) of the hints files, looking for the presence of one or more facts, and outputting {Fact, Key} pairs indicating in which keys each fact can be found.
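
As a sketch of the map phase, following the riak_kv Erlang map function contract of (Object, KeyData, Arg) -> [Result], and assuming the hints value is the term_to_binary'd bloom from the sketch above (and has no siblings):

```erlang
%% Sketch of the map function; the serialisation of the bloom is an
%% assumption carried over from the poc_bloom sketch above.
-module(poc_hints).
-export([map_checkhints/3]).

map_checkhints({error, notfound}, _KeyData, _Facts) ->
    %% Not-found results can reach map functions; emit nothing.
    [];
map_checkhints(Object, _KeyData, Facts) ->
    Key = riak_object:key(Object),
    Bloom = binary_to_term(riak_object:get_value(Object)),
    %% Emit {Fact, Key} for every queried fact the bloom may contain.
    [{Fact, Key} || Fact <- Facts, poc_bloom:check(Fact, Bloom)].
```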
- Note the standard Basho guidance to use Map sparingly.
- It is important to use AAE (active anti-entropy) to mitigate the risk of running queries at r=1.
- There are lots of TODOs.
- There is currently no proof at volume.