Skip to content

martinsumner/riak_hints

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hints

This is a Proof of Concept for storing a large volume of audit facts in Riak by bulking them together in large objects; and then searching for facts using a map function across hints files which are generated by a pre-commit hook.

Overview

The scenario to be covered in the Proof of concept is as follows:

  • A large number of overall facts - order(100 billion);
  • Facts grouped into objects with >> 1000 but < 100000 facts per object;
  • Most individual facts appear in <1% of objects, so value to be gained in finding facts without trawling through all objects;
  • All objects will have keys starting with a natural filter to use to split out the Map job (so that we orchestrate small map jobs outside of riak, rather than relying on riak to manage a large long-running job), so that individual map jobs are run against keys stored contiguously.

On loading of the object a project-specific extract_data_forprocess/2 function is called which should extract the Facts to be indexed in the hints files from the original object (as well as any new 2i fields/terms to be added to both the original and the hints object).

The hints file is a binary representation of the facts using a bloom filter with a low false positive rate, but just a single hash. This makes the bloom large, but it is compressed for storage using rice encoding. This provides a good ratio between disk footprint and false positive rate, whilst also providing a natural checksum when processing to protect against any random bit-flipping.

To improve processing time, the bloom is first partitioned - so only a part of the bloom needs to be checked.

The map job (using function map_checkhints/3) can then be run across a subset (or all) of the hints files looking for the presence of one or more facts - outputting {Fact, Key} indicating in which keys each fact can be found.

Caveats

  • Note standard Basho guidance to use Map sparingly.
  • Important to use AAE to mitigate risk of r=1 query.
  • Lots of TODO.
  • Currently no proof at volume.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages