
Working with third party pipelines

Frédéric Mahé edited this page Jun 2, 2014 · 3 revisions

Several pipelines have been developed to process amplicon data, among them Mothur, QIIME and UPARSE. This page aims to show how to use swarm clusters with these pipelines.

Produce swarm results compatible with Mothur

The following code block runs swarm and its post-processing step for d values from 1 to 20, and converts swarm's output into Mothur's list format (one line per d value, written to test.list).

FASTA="amplicons.fasta"
SWARM=$(readlink -f ./swarm)
CLUSTERS_1=$(mktemp)
CLUSTERS_2=$(mktemp)

# First line of the list: each unique amplicon is its own OTU
OTUs=$(grep -c "^>" "${FASTA}")
(echo -ne "unique\t${OTUs}\t"
    grep "^>" "${FASTA}" | tr -d ">" | tr "\n" "\t"
    echo) > test.list

# Test 20 d values
for ((d=1 ; d<=20 ; d++)) ; do
    # First step: cluster the amplicons with swarm
    "${SWARM}" -d "${d}" "${FASTA}" > "${CLUSTERS_1}"
    # Second step: post-process the clusters with swarm_breaker.py
    python ../scripts/swarm_breaker.py -b "${SWARM}" -f "${FASTA}" \
        -s "${CLUSTERS_1}" -d "${d}" > "${CLUSTERS_2}" 2> /dev/null
    # Convert to Mothur format
    OTUs=$(wc -l < "${CLUSTERS_2}")
    (echo -ne "d${d}\t${OTUs}\t"
        tr "\n" "\t" < "${CLUSTERS_2}" | tr " " ","
        echo) >> test.list
done
rm "${CLUSTERS_1}" "${CLUSTERS_2}"
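
To make the conversion step easier to inspect in isolation, here is a minimal sketch run on toy data. It assumes swarm's output has one cluster per line, with amplicon names separated by spaces, as the loop above does. The function name `swarm_to_mothur_line`, the file name `toy.list` and the amplicon names are hypothetical and only serve the illustration; they are not part of swarm or Mothur.

```shell
# Hypothetical helper: turn a swarm cluster file into one line of a
# Mothur list file.
# Usage: swarm_to_mothur_line LABEL CLUSTER_FILE
swarm_to_mothur_line() {
    local label clusters otus
    label="$1"
    clusters="$2"
    # One cluster per line, so the line count is the number of OTUs
    # (the arithmetic expansion strips any padding added by wc)
    otus=$(( $(wc -l < "${clusters}") ))
    # Label and OTU count, each followed by a tab
    printf "%s\t%s\t" "${label}" "${otus}"
    # Members of an OTU are joined by commas, OTUs are separated by tabs
    tr " " "," < "${clusters}" | tr "\n" "\t"
    # End the list line
    echo
}

# Toy clusters (made-up amplicon names): three OTUs, six amplicons
toy=$(mktemp)
printf 'a_10 b_2 c_1\nd_5\ne_3 f_1\n' > "${toy}"
swarm_to_mothur_line "d1" "${toy}" > toy.list
rm "${toy}"

# Show the label and the number of OTUs (first two columns)
cut -f 1,2 toy.list
```

The same `cut -f 1,2` call can be applied to test.list to check how the number of OTUs changes with d before handing the file to Mothur.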
