Open
Description
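[...] found an unhandled bug while testing with dblp. I first suspected as_bytes(), but the panic message below points at slicing a string at a byte index that falls inside a multi-byte UTF-8 character (the string is valid UTF-8, just not ASCII)?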
./rdf2hdt convert -i dblp.ttl -o dblp-rust.hdt -vvv
[DEBUG hdt::rdf2hdt::rdf_reader] converting dblp.ttl to nt format
[DEBUG hdt::rdf2hdt::rdf_reader] RDF to NTriple convert time: 939.855296477s
[DEBUG hdt::rdf2hdt::dictionary] Four Section Dictions sort time: 1837.413708669s
[DEBUG hdt::rdf2hdt::dictionary] Encoding triples time: 1299.179455131s
[DEBUG hdt::rdf2hdt::dictionary] Dictionary build time: 3136.655445115s
[DEBUG hdt::rdf2hdt::bitmap_triples] BitmapTriples build time: 81.938644095s
[DEBUG hdt::rdf2hdt::builder] HDT build time: 4187.132880571s
thread 'main' panicked at src/rdf2hdt/dictionary.rs:237:52:
byte index 8 is not a char boundary; it is inside 'μ' (bytes 7..9) of `"$\\mu$μBench: An Open-Source Factory of Benchmark Microservice Applications."`
Looking at the converted NT triples:
$ grep "An Open-Source Factory of Benchmark Microservice Applications." dblp.nt
<https://dblp.org/rec/journals/tpds/DettiFP23> <http://www.w3.org/2000/01/rdf-schema#label> "Andrea Detti et al.: $\\mu$μBench: An Open-Source Factory of Benchmark Microservice Applications. (2023)".
<https://dblp.org/rec/journals/tpds/DettiFP23> <https://dblp.org/rdf/schema#title> "$\\mu$μBench: An Open-Source Factory of Benchmark Microservice Applications.".
And the source TTL:
$ grep "An Open-Source Factory of Benchmark Microservice Applications." dblp.ttl
rdfs:label "Andrea Detti et al.: $\\mu$\u03BCBench: An Open-Source Factory of Benchmark Microservice Applications. (2023)" ;
dblp:title "$\\mu$\u03BCBench: An Open-Source Factory of Benchmark Microservice Applications." ;
Originally posted by @GregHanson in #56 (comment)
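For anyone trying to reproduce this without the full dblp dump, here is a minimal sketch of the failure mode. It assumes the code at dictionary.rs:237 slices a term by a raw byte offset; I have not confirmed what that line actually does. The helpers `floor_boundary` and `safe_prefix` are hypothetical names for illustration, not part of the hdt crate.

```rust
/// Hypothetical helper: largest index <= `idx` that is a char boundary of `s`.
fn floor_boundary(s: &str, idx: usize) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    let mut i = idx;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical helper: prefix of at most `max_bytes` bytes that never splits a char.
fn safe_prefix(s: &str, max_bytes: usize) -> &str {
    &s[..floor_boundary(s, max_bytes)]
}

fn main() {
    // Same leading bytes as the literal in the panic message:
    // quote, '$', two backslashes, 'm', 'u', '$', then 'μ' (U+03BC, 2 bytes).
    let term = "\"$\\\\mu$\u{3BC}Bench: An Open-Source Factory\"";

    // as_bytes() itself is fine: it only exposes the UTF-8 bytes.
    assert_eq!(term.as_bytes()[7], 0xCE);
    assert_eq!(term.as_bytes()[8], 0xBC);

    // Byte 8 is in the middle of 'μ', so `&term[..8]` would panic.
    assert!(!term.is_char_boundary(8));

    // `get` returns None instead of panicking.
    assert!(term.get(..8).is_none());

    // Rounding down to the previous boundary gives a valid prefix.
    println!("{}", safe_prefix(term, 8));
}
```

Whether rounding down to a boundary or comparing raw bytes via as_bytes() is the right fix depends on what the slice is used for at that line, so treat this as a reproduction aid rather than a patch.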
Labels: bug