Add rayon crate for parallel processing #91
Great! I originally wanted to parallelize this part:

```rust
nt::parse_bufread(r).for_each_triple(|q| {
    // execute this in parallel using into_par_iter()
    [...]
})
```

Unfortunately that didn't work, because Sophia doesn't yield an actual iterator at this step.
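One workaround, not part of this PR and only a minimal sketch, would be to buffer the parsed triples into owned values first and only then hand them to Rayon. The function and types below are placeholders (Sophia's actual triple types are not used); the point is just the collect-then-`into_par_iter()` pattern:

```rust
use rayon::prelude::*;

// Hypothetical sketch: collect owned triples into a Vec first (e.g. inside
// for_each_triple), then process the buffer in parallel with Rayon.
fn encode_in_parallel(triples: Vec<[String; 3]>) {
    triples
        .into_par_iter() // Rayon turns the Vec into a parallel iterator
        .for_each(|[s, p, o]| {
            // per-triple work would go here, e.g. dictionary encoding
            let _ = (s.len(), p.len(), o.len());
        });
}
```

The obvious trade-off is the extra memory for the buffer, which is probably why streaming parsing and parallel processing don't combine easily here.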
I apparently need to get better at criterion; for now I have some metrics from manually capturing the timing of the different parts of FourDictSect::read_nt using logs.
I used a larger file from https://download.bio2rdf.org/files/release/4/taxonomy/taxonomy-nodes.nq.gz which I had [...]. Just watching my [...]: the encoding actually performs better, but building the dictionary takes longer when run in parallel. Maybe trying to run all 4 [...]
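For reference, the manual timing mentioned above is just wrapping each phase in an `Instant` measurement and logging the elapsed time; a minimal sketch, with the phases themselves only as commented placeholders rather than the actual read_nt internals:

```rust
use std::time::Instant;

// Sketch of manual phase timing via logs; the phases are placeholders.
fn timed_read_nt() {
    let start = Instant::now();
    // ... collect and deduplicate terms ...
    log::info!("term collection took {:?}", start.elapsed());

    let start = Instant::now();
    // ... build the four dictionary sections ...
    log::info!("dictionary building took {:?}", start.elapsed());
}
```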
I made an attempt at separating the 3 core components of FourDictSect::read_nt, so that the functions can be tested individually via criterion.
The benchmarks are using this NT file. I tried using the [...]:

```
$ riot --count -q persondata_en.ttl
14:26:29 WARN riot :: [line: 4, col: 94] Lexical form '1860-2-21' not valid for datatype XSD date
14:26:29 WARN riot :: [line: 8, col: 94] Lexical form '1927-11-3' not valid for datatype XSD date
14:26:29 WARN riot :: [line: 13, col: 91] Lexical form '1884-11-5' not valid for datatype XSD date
...
```

They take a long time to run, so a different sample NT file may have to be used.
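Separating the phases means each one can get its own criterion benchmark. Purely as illustration, a minimal bench skeleton could look like the following; the input setup is hypothetical and the measured closure only contains a placeholder where the real phase (e.g. dictionary building) would be called:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Hypothetical benchmark skeleton for one separated phase.
fn bench_dict_building(c: &mut Criterion) {
    // Prepare the input once, outside the measured closure.
    let terms: Vec<String> = (0..10_000).map(|i| format!("http://example.org/{i}")).collect();

    c.bench_function("dictionary_read_nt/dict_building", |b| {
        b.iter(|| {
            // placeholder for the real call, e.g. building the dictionary from `terms`
            black_box(terms.len())
        })
    });
}

criterion_group!(benches, bench_dict_building);
criterion_main!(benches);
```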
Great work! Which CPU did you run those measurements on? Does it have enough cores to run all the threads on physical cores?
If the speedup is minor even for very large files, the question is also whether it is worth the developer time and the increase in compilation time and binary size to add the rayon dependency, especially if this only helps with converting NT to HDT, which is probably not the main focus of the library.
Can you try it again with the CLI branch? On my low-end dual-core laptop (still serial processing) it takes a while but does not print errors:

```
hdt$ cargo build --release && cp target/release/hdt rdf
hdt$ rdf convert tests/resources/persondata_en.hdt tests/resources/persondata_en.nt
Successfully converted "tests/resources/persondata_en.hdt" (85.6 MiB) to "tests/resources/persondata_en.nt" (1.1 GiB) in 11.11s
hdt$ rdf convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt
Successfully converted "tests/resources/persondata_en.nt" (1.1 GiB) to "/tmp/persondata_en.hdt" (81.9 MiB) in 60.35s
```
Testing it on my Intel i9 10900k with 10 cores / 20 threads (serial original one, cli branch):

```
hdt$ rdf convert tests/resources/persondata_en.hdt tests/resources/persondata_en.nt
Successfully converted "tests/resources/persondata_en.hdt" (85.6 MiB) to "tests/resources/persondata_en.nt" (1.1 GiB) in 8.44s
hdt$ rdf convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt
Successfully converted "tests/resources/persondata_en.nt" (1.1 GiB) to "/tmp/persondata_en.hdt" (81.9 MiB) in 54.08s
```

Interesting that it is so close in timing to the low-end laptop CPU.
```
Successfully converted "tests/resources/persondata_en.nt" (1.1 GiB) to "/tmp/persondata_en.hdt" (81.9 MiB) in 53.77s
```

I tried it with scoped threads as well, to skip the rayon dependency:

```rust
let [shared, predicates, subjects, objects] = thread::scope(|s| {
    [&shared_terms, &predicate_terms_ref, &unique_subject_terms, &unique_object_terms]
        .map(|terms| s.spawn(|| DictSectPFC::compress(terms, block_size)))
        .map(|t| t.join().unwrap())
});
println!("compression finished");
let dict = FourSectDict { shared, predicates, subjects, objects };
```
I tried again with the updated CLI branch, same errors from [...].
I am unfortunately running on WSL, so my numbers are going to be on the high side no matter what. Tomorrow I should be able to try running it on an HPC node and get some more juicy numbers.
I was surprised it didn't lead to better performance, to be honest. If the numbers are negligible, I'm OK with skipping it for now.
I would assume that the kind of CPU would not change the relative numbers much, only the absolute numbers.
I think it's still great to have a benchmark for that, and we could try it with Rust threads first, as then we don't have the downside of an additional dependency.
(force-pushed from 2a111d2 to 41cf010)
Wow, the Intel i9 12900k is quite fast (still serial):

```
hdt$ rdf convert tests/resources/persondata_en_100k.hdt tests/resources/persondata_en_100k.nt
Successfully converted "tests/resources/persondata_en_100k.hdt" (1.6 MiB) to "tests/resources/persondata_en_100k.nt" (11.2 MiB) in 0.08s
hdt$ rdf convert tests/resources/persondata_en_1M.hdt tests/resources/persondata_en_1M.nt
Successfully converted "tests/resources/persondata_en_1M.hdt" (11.1 MiB) to "tests/resources/persondata_en_1M.nt" (111.1 MiB) in 0.59s
hdt$ rdf convert tests/resources/persondata_en.hdt tests/resources/persondata_en.nt
Successfully converted "tests/resources/persondata_en.hdt" (85.6 MiB) to "tests/resources/persondata_en.nt" (1.1 GiB) in 5.99s
hdt$ rdf convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt
Successfully converted "tests/resources/persondata_en.nt" (1.1 GiB) to "/tmp/persondata_en.hdt" (81.9 MiB) in 37.60s
```
**build_dict_from_terms (12900k, persondata_en_1M.nt)**

Using 1M because one run of just one part already takes 40 seconds.

**single thread**

```rust
let (shared, subjects, predicates, objects) = (
    DictSectPFC::compress(&shared_terms, block_size),
    DictSectPFC::compress(&unique_subject_terms, block_size),
    DictSectPFC::compress(&predicate_terms_ref, block_size),
    DictSectPFC::compress(&unique_object_terms, block_size),
);
```

```
hdt$ time cargo bench --bench criterion -- dictionary_read_nt
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/criterion.rs (target/release/deps/criterion-ffa779d4837a26db)
dictionary_read_nt/dict_building
                        time:   [278.28 ms 297.21 ms 317.14 ms]
                        change: [−6.6888% −0.0491% +7.0741%] (p = 0.99 > 0.05)
                        No change in performance detected.
cargo bench --bench criterion -- dictionary_read_nt  40.00s user 0.49s system 101% cpu 40.059 total
```

**rayon**

```rust
let ((shared, predicates), (subjects, objects)) = rayon::join(
    || {
        rayon::join(
            || DictSectPFC::compress(&shared_terms, block_size),
            || DictSectPFC::compress(&predicate_terms_ref, block_size),
        )
    },
    || {
        rayon::join(
            || DictSectPFC::compress(&unique_subject_terms, block_size),
            || DictSectPFC::compress(&unique_object_terms, block_size),
        )
    },
);
```

```
hdt$ time cargo bench --bench criterion -- dictionary_read_nt
   Compiling hdt v0.4.0 (/home/konrad/projekte/rust/hdt)
    Finished `bench` profile [optimized] target(s) in 2.37s
     Running benches/criterion.rs (target/release/deps/criterion-ffa779d4837a26db)
dictionary_read_nt/dict_building
                        time:   [300.92 ms 304.38 ms 308.23 ms]
                        change: [−4.1078% +2.4132% +9.4324%] (p = 0.50 > 0.05)
                        No change in performance detected.
cargo bench --bench criterion -- dictionary_read_nt  49.85s user 0.81s system 121% cpu 41.809 total
```

I don't know why, but I only saw one active thread in htop.

**std::thread** (add-rayon branch)

```rust
use std::thread;

let [shared, predicates, subjects, objects] = thread::scope(|s| {
    [&shared_terms, &predicate_terms_ref, &unique_subject_terms, &unique_object_terms]
        .map(|terms| s.spawn(|| DictSectPFC::compress(terms, block_size)))
        .map(|t| t.join().unwrap())
});
```

```
hdt$ time cargo bench --bench criterion -- dictionary_read_nt
   Compiling hdt v0.4.0 (/home/konrad/projekte/rust/hdt)
    Finished `bench` profile [optimized] target(s) in 2.34s
     Running benches/criterion.rs (target/release/deps/criterion-ffa779d4837a26db)
dictionary_read_nt/dict_building
                        time:   [241.45 ms 245.12 ms 248.71 ms]
                        change: [−18.237% −17.093% −15.955%] (p = 0.00 < 0.05)
                        Performance has improved.
cargo bench --bench criterion -- dictionary_read_nt  49.56s user 0.74s system 122% cpu 41.142 total

hdt$ time cargo bench --bench criterion -- dictionary_read_nt
    Finished `bench` profile [optimized] target(s) in 0.11s
     Running benches/criterion.rs (target/release/deps/criterion-ffa779d4837a26db)
dictionary_read_nt/dict_building
                        time:   [296.98 ms 299.35 ms 301.95 ms]
                        change: [+20.124% +22.126% +24.255%] (p = 0.00 < 0.05)
                        Performance has regressed.
cargo bench --bench criterion -- dictionary_read_nt  38.53s user 0.52s system 102% cpu 38.169 total
```

**Conclusions**

We should test with the complete file as well, because there is a lot of variation between runs, maybe because of the CPU getting hot or reaching its boost limits?
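Regarding only one active thread showing up in htop: a quick sanity check (not part of this PR, just a hedged sketch) is to print the size of Rayon's global thread pool; if it reports more than one worker, the apparent serialization comes from somewhere else, e.g. one section dominating the total runtime:

```rust
// Sanity-check sketch: how many worker threads does Rayon's global pool have,
// and does rayon::join actually get to use them?
fn main() {
    println!("rayon worker threads: {}", rayon::current_num_threads());

    // rayon::join may run both closures in parallel on that pool.
    let (a, b) = rayon::join(
        || (0..1_000_000u64).sum::<u64>(),
        || (0..1_000_000u64).map(|x| x ^ 0xdead_beef).sum::<u64>(),
    );
    println!("{a} {b}");
}
```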
**build_dict_from_terms (12900k, persondata_en.nt)**

**single thread** (add-rayon branch)

```
hdt$ time cargo bench --bench criterion -- dictionary_read_nt
   Compiling hdt v0.4.0 (/home/konrad/projekte/rust/hdt)
    Finished `bench` profile [optimized] target(s) in 2.27s
     Running benches/criterion.rs (target/release/deps/criterion-ffa779d4837a26db)
Benchmarking dictionary_read_nt/dict_building: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 44.5s.
dictionary_read_nt/dict_building
                        time:   [4.3421 s 4.3909 s 4.4389 s]
                        change: [+1346.3% +1366.8% +1386.3%] (p = 0.00 < 0.05)
                        Performance has regressed.
cargo bench --bench criterion -- dictionary_read_nt  272.16s user 3.61s system 102% cpu 4:27.85 total
```

**rayon**

[...]

**std::thread**

[...]
**serial set operations**

It seems to me as if most of the time is spent in this spot here:

```rust
let shared_terms: BTreeSet<&str> =
    subject_terms.intersection(object_terms).map(std::ops::Deref::deref).collect();
let unique_subject_terms: BTreeSet<&str> =
    subject_terms.difference(object_terms).map(std::ops::Deref::deref).collect();
let unique_object_terms: BTreeSet<&str> =
    object_terms.difference(subject_terms).map(std::ops::Deref::deref).collect();
```

```
dictionary_read_nt/dict_building
                        time:   [4.2262 s 4.2380 s 4.2554 s]
                        change: [+6.3368% +6.6206% +7.0208%] (p = 0.01 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
cargo bench --bench criterion -- dictionary_read_nt  265.32s user 3.46s system 103% cpu 4:20.89 total
```

**parallel set operations**

```rust
use std::thread;

let [shared_terms, unique_subject_terms, unique_object_terms]: [BTreeSet<&str>; 3] = thread::scope(|s| {
    [
        s.spawn(|| subject_terms.intersection(object_terms).map(std::ops::Deref::deref).collect()),
        s.spawn(|| subject_terms.difference(object_terms).map(std::ops::Deref::deref).collect()),
        s.spawn(|| object_terms.difference(subject_terms).map(std::ops::Deref::deref).collect()),
    ]
    .map(|t| t.join().unwrap())
});
```

```
dictionary_read_nt/dict_building
                        time:   [3.7513 s 3.7591 s 3.7686 s]
                        change: [−11.706% −11.299% −10.952%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
cargo bench --bench criterion -- dictionary_read_nt  262.04s user 3.57s system 104% cpu 4:14.50 total
```

**Heaptrack**

Serial peak RSS: 35.4 MB. Parallel peak RSS: 35.5 MB.

**Conclusions**

Performing the set operations in parallel has a positive impact on performance of around 11%, thus I'm going to merge that.
(force-pushed from d6f0fd2 to ab7566f)
**Visualize performance of serial conversion to see bottlenecks**

Note: "rdf" is my alias for the CLI binary (add-rayon branch).

```
hdt$ perf record --call-graph=dwarf rdf convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt
Successfully converted "tests/resources/persondata_en.nt" (1.1 GiB) to "/tmp/persondata_en.hdt" (81.9 MiB) in 38.26s
[ perf record: Woken up 4917 times to write data ]
Warning:
Processed 184626 events and lost 1 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 1230.130 MB perf.data (152806 samples) ]
hdt$ perf script > /tmp/convert.pdf
Warning: [...]
```
I think I have to merge in the CLI branch, otherwise it's too cumbersome to test those changes.
(force-pushed from ab7566f to aef4e0c)
**Memory usage for conversion persondata_en_1M.nt > persondata_en_1M.hdt**

With Rayon: 523 MB
The BTreeSet also seems to be problematic because of the many relatively slow insertions; using a HashSet speeds it up a bit, to 22.9s.
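For illustration, the idea is roughly the following (a minimal sketch, not the actual read_nt code; it assumes the terms only need to be sorted once, after deduplication):

```rust
use std::collections::{BTreeSet, HashSet};

// Sketch: BTreeSet keeps the terms sorted but pays O(log n) per insertion,
// while HashSet inserts in O(1) on average and can be sorted once at the end.
fn collect_sorted_terms(terms: impl Iterator<Item = String>) -> Vec<String> {
    let set: HashSet<String> = terms.collect(); // cheap deduplicating insertions
    let mut sorted: Vec<String> = set.into_iter().collect();
    sorted.sort_unstable(); // a single sort instead of keeping order on every insert
    sorted
}

// Equivalent BTreeSet version for comparison: ordered during insertion.
fn collect_sorted_terms_btree(terms: impl Iterator<Item = String>) -> Vec<String> {
    let set: BTreeSet<String> = terms.collect();
    set.into_iter().collect() // already sorted
}
```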
This also takes a few seconds; I wonder if it's faster to use a HashSet instead. Result: no, that takes 25.5s.
(force-pushed from dd00edc to a7e9e21)
(force-pushed from 0d00c15 to 355c307)
(force-pushed from 5c7fc00 to c72f09f)
I finally managed to rebase this branch onto the current state of main.
By the way, could the hashing performance be optimized by using some other data structure? |
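Not something this PR settled on, but one common option is to keep the HashSet and only swap the hasher for a faster non-cryptographic one; a hedged sketch, assuming the `ahash` crate were added as a dependency:

```rust
use std::collections::HashSet;

// Sketch: same HashSet API, but with ahash's faster hasher instead of the default SipHash.
// Assumes `ahash` is in Cargo.toml; not part of this PR.
type FastSet = HashSet<String, ahash::RandomState>;

fn dedup_terms(terms: impl Iterator<Item = String>) -> FastSet {
    let mut set = FastSet::default(); // ahash::RandomState implements Default
    for t in terms {
        set.insert(t);
    }
    set
}
```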
Agreed. The separation is nice from a benchmark standpoint, but the complexity of the return types doesn't make sense in the long term.
Well, originally I did a time comparison between the sophia and oxrdf parsers and they were close in performance, with sophia being just a little faster, so I was willing to drop it. BUT I never tested which library handled parallelization better. I had wondered if oxttl might handle the parallelization better as we started diving into this investigation, but I backburnered it since I thought switching libraries was off the table :D I was about to try a parallel BufReader implementation.
I would not be surprised. Quote escaping was something I had quite a few problems/hacks to incorporate when I was originally using the C++ version for conversion, oxrdf for query evaluation, and also the underlying hdt string representation/conversion.
Actually, @KonradHoeffner, how was the original [...]?
And wow, the latest more than halves the conversion time:

latest as of 1e331d3:

```
$ for i in {1..5}; do target/release/hdt convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt; done
[...]
```

numbers reported earlier for 0d00c15:

```
$ for i in {1..5}; do target/release/hdt convert tests/resources/persondata_en.nt /tmp/persondata_en.hdt; done
[...]
```
I copied that over from the Sophia RDF benchmark by @pchampin, but unfortunately I don't know why that file was chosen in particular or if there are known issues for it, only that it was downloaded from http://downloads.dbpedia.org/2016-10/core-i18n/en/persondata_en.ttl.bz2 and then converted to N-Triples. |
(force-pushed from 84a57b7 to 83f451f)
(force-pushed from 83f451f to a06c8ab)
(force-pushed from 608f481 to 0fd01c0)
* parallel triple encoding, parallel dict build
* create separate function in FourDictSect::read_nt
* make Clippy happy
* parallelize set operations
* save a lot of time using channels
* more parallelization
* avoid passing Strings during read_nt, use HashSet for predicates to avoid duplicates
* adapt benchmark to new read_nt helper functions
* get rid of the mpsc channel again
* use bitsets instead of hashsets for string indices
* drastically speed up NT -> HDT conversion using Rayon, oxttl and ThreadedRodeo
* remove now unnecessary sophia feature flags behind read_nt functions
* remove now unnecessary sophia feature flags behind read_nt functions
* start refactoring read_nt code into its own file
* more refactoring
* fix hdt.rs path import
* remove cli and nt from default features
* feature gate benchmark
* make clippy happy
* refactor index pool
* upgrade version to 0.5.0

---------

Co-authored-by: Konrad Höffner <[email protected]>
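The ThreadedRodeo mentioned above is the concurrent string interner from the lasso crate; purely as illustration (a hedged sketch, not the PR's actual code), its basic usage looks roughly like this:

```rust
use lasso::ThreadedRodeo;

// Sketch: intern term strings concurrently and work with small integer keys afterwards.
fn main() {
    let rodeo = ThreadedRodeo::default();

    // get_or_intern is safe to call from multiple threads (e.g. inside Rayon closures).
    let key_a = rodeo.get_or_intern("http://example.org/subject");
    let key_b = rodeo.get_or_intern("http://example.org/subject");
    assert_eq!(key_a, key_b); // identical strings map to the same key

    // keys can be resolved back to &str when writing the dictionary sections
    println!("{}", rodeo.resolve(&key_a));
}
```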





@KonradHoeffner I'm not sure if you and I were on a similar thought process with regard to rayon, because I found your issue here, but this PR has some performance optimizations for the following: [...]