Kmerind is a library in the Parallel Bioinformatics Library for Short Sequences project (ParBLiSS).
Kmerind provides k-mer indexing capability for biological sequence data.
Please take a look at our Wiki.
ParBLiSS is a C++ library for distributed and multi-core bioinformatics algorithms. It requires C++11 features and MPI (OpenMP is not required). The library is implemented as a set of templated classes. As such, most of the code is in header form and is incorporated into user code via #include.
Kmerind provides basic parallel sequence file access and k-mer index construction and query. Currently, it supports indices for frequency, position, and quality of k-mers from short reads and whole genomes, using FASTQ and FASTA formats.
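To make the idea of a frequency index concrete, here is a minimal single-process sketch that counts every k-mer in one sequence. It does not use Kmerind's API: the function name `count_kmers` and the choice of `std::unordered_map` are assumptions made purely for illustration, and Kmerind itself distributes this work across MPI processes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper, NOT part of Kmerind: counts every length-k substring
// (k-mer) of a single sequence on one process.
std::unordered_map<std::string, std::size_t>
count_kmers(const std::string& seq, std::size_t k) {
    assert(k > 0 && "k must be positive");
    std::unordered_map<std::string, std::size_t> counts;
    if (seq.size() < k) return counts;  // sequence too short: no k-mers
    // Slide a window of width k across the sequence.
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];
    }
    return counts;
}
```

A position index could be sketched the same way by storing the offset `i` for each k-mer instead of incrementing a counter.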
Required:
- C++11 supporting compiler, one of the following
-- g++ (version 4.8.1+ due to "decltype" and other C++11 features)
-- icpc (version 16+ due to constexpr functions and initializers)
-- clang (version 3.5+, since cmake generated make files have problems with prior versions, or 3.7+ if OpenMP is used)
- cmake (version 2.8+)
- an MPI implementation, one of the following
-- openmpi (version 1.7+ due to use of MPI_IN_PLACE)
-- mpich2 (version 1.5+)
-- mvapich (tested with version 2.1.7)
-- intel mpi library (poorly tested)
See http://en.cppreference.com/w/cpp/compiler_support
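A quick way to check a compiler against the language requirements above is to compile a small file that uses the features named in the list. The following is only a sketch: the names `square`, `add`, and `cxx11_features_work` are made up for this example, and passing it does not guarantee that all of Kmerind will build.

```cpp
#include <cassert>

// Refuse to compile unless the compiler is in C++11 mode or later.
static_assert(__cplusplus >= 201103L, "a C++11 compiler is required");

// constexpr function (the feature cited for the icpc requirement).
constexpr int square(int x) { return x * x; }

// decltype in a trailing return type (the feature cited for the g++ requirement).
template <typename A, typename B>
auto add(A a, B b) -> decltype(a + b) { return a + b; }

// Evaluated at compile time; fails the build if constexpr is not honored.
static_assert(square(4) == 16, "constexpr evaluation failed");

// Hypothetical runtime check combining both features.
inline bool cxx11_features_work() {
    return square(3) == 9 && add(1, 2.5) == 3.5;
}
```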
Optional libraries are:
boost_log, boost_system, boost_thread, boost_program_options
These are only needed if you intend to turn on the Boost log engine.
Optional tools include:
- ccmake (for graphical cmake configuration)
- perl, and perl packages Term::ANSIColor, Getopt::ArgvFile, Getopt::Long, Regexp::Common (for g++ error message formatting)
git clone https://github.com/ParBLiSS/kmerind.git
cd kmerind
git submodule init
git submodule update

mkdir kmerind-build
cd kmerind-build
cmake ../kmerind

Alternatively, instead of cmake ../kmerind, you can use

ccmake ../kmerind

The following are important parameters:
- CMAKE_BUILD_TYPE: defaults to Release.
- ENABLE_TESTING: On allows BUILD_TEST_APPLICATIONS to show, which enables building the test applications.
- BUILD_EXAMPLE_APPLICATION: On allows applications in the examples directory to be built.
- LOG_ENGINE: chooses which log engine to use.
- LOGGER_VERBOSITY: chooses the type of messages to print.
- ENABLE_SANITIZER: turns on g++'s address or thread sanitizer. Use SANITIZER_STYLE to configure. This is for debugging.
- ENABLE_STLFILT: turns on g++ error message post processing to make the messages human readable. Control verbosity via STLFIL_VERBOSITY.
It is highly recommended that ccmake be used until you've become familiar with the available CMake options and how to specify them on the command line.
make

Important for developers using Intel compilers: please see the "Intel Compiler Specific Issues" section at the end of the document.
ctest -T Test

or

make test

make doc

Please see Wiki.
CMake typically uses an out-of-source build. To generate Eclipse-compatible .project and .cproject files, supply
-G"Eclipse CDT4 - Unix Makefiles"
to cmake.
It is recommended that the Eclipse plugins ptp, egit, and cmake ed also be installed.
With Intel C Compiler (icc) version 15, the following compilation error is observed:
Internal error: assertion failed at: "shared/cfe/edgcpfe/il.c", line 18295

While there is very little information to be found on the internet related to this error, we have theorized that this is a compiler bug related to auto type deduction in templated function instantiation. It appears that ICC is unable to auto deduce the data type and size of a statically sized array of the form

datatype x[len]

which as a function parameter is specified as

datatype (&x)[len]

with datatype and len being template parameters for the function.
This error appears only for bitgroup_ops.hpp. Attempts to replicate the error in separate test code were not successful. The workaround is to fully specify the template parameters for the function so as to avoid automatic type deduction in this case.
It is not clear if other function parameter forms also cause this error.
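
The parameter form and the workaround described above can be illustrated with a small sketch. The function `array_length` and its callers are hypothetical and are not taken from bitgroup_ops.hpp. Since the error could not be reproduced outside that file, this code is not expected to trigger it; it only shows the difference between a deduced call and a fully specified one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function, NOT from bitgroup_ops.hpp: the parameter is a
// reference to a statically sized array, with datatype and len as template
// parameters that the compiler would normally deduce.
template <typename datatype, std::size_t len>
std::size_t array_length(datatype (&x)[len]) {
    (void)x;  // only the deduced size is of interest here
    return len;
}

inline std::size_t deduced_call() {
    int values[4] = {1, 2, 3, 4};
    // Relies on automatic deduction of datatype = int and len = 4.
    return array_length(values);
}

inline std::size_t explicit_call() {
    int values[4] = {1, 2, 3, 4};
    // Workaround style: template parameters fully specified, no deduction.
    return array_length<int, 4>(values);
}
```

Both calls instantiate the same function; the explicit form simply removes the deduction step that the compiler is suspected of mishandling.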