Merge assembled contigs? #484

@donkirkby

Sometimes, the IVA assembler can't assemble one large contig, but it assembles several smaller contigs that either have small gaps between them or have some overlap.

For example, 73060A-HCV_S46 from 15 Jul 2016 has a whole-genome contig for HCV-2b, and several HCV-1a contigs with overlaps of hundreds of bases.

When I looked at the IVA source code, I found that it requires 99% identity in the overlap region before it will merge two contigs. The 73060A sample had overlaps with between 96% and 99% identity. When I lowered the requirement to 95%, it combined the HCV-1a contigs nicely, but it still reported a separate contig for HCV-2b.
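To make the threshold effect concrete, here is a minimal sketch of an identity check on an end-to-start overlap. It is not IVA's code: the function names `overlap_identity` and `merge_if_similar` are hypothetical, the overlap length is assumed to be already known (in practice an aligner would find it), and the overlap is assumed to be gap-free.

```python
from typing import Optional


def overlap_identity(left: str, right: str, overlap_len: int) -> float:
    """Fraction of matching bases between the last overlap_len bases of
    left and the first overlap_len bases of right (no gaps allowed)."""
    if overlap_len <= 0 or overlap_len > min(len(left), len(right)):
        raise ValueError("overlap_len must be between 1 and the shorter contig length")
    tail = left[-overlap_len:]
    head = right[:overlap_len]
    matches = sum(a == b for a, b in zip(tail, head))
    return matches / overlap_len


def merge_if_similar(left: str,
                     right: str,
                     overlap_len: int,
                     min_identity: float = 0.99) -> Optional[str]:
    """Return the merged contig, or None if the overlap is too dissimilar.

    Hypothetical helper: the overlap is taken from the left contig, which
    is an arbitrary choice for this sketch.
    """
    if overlap_identity(left, right, overlap_len) < min_identity:
        return None
    return left + right[overlap_len:]


# Toy data: a 100-base overlap with 3 mismatches, so 97% identity.
shared = "ACGT" * 25
mutated = "C" + shared[1:50] + "T" + shared[51:99] + "A"
left_contig = "TTTT" + shared
right_contig = mutated + "GGGG"

strict = merge_if_similar(left_contig, right_contig, 100, min_identity=0.99)
relaxed = merge_if_similar(left_contig, right_contig, 100, min_identity=0.95)
```

With the toy data, `strict` is `None` because 97% falls below the 99% cutoff, while `relaxed` holds the merged sequence. That mirrors what happened with the HCV-1a contigs once the requirement was lowered to 95%.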

There are a few other samples that had trouble combining contigs, but I haven't looked at them in detail yet. HIV3428P100IN200-C19-HIV-S51 from the 20 Sep 2019 run is the closest, but it looks like one contig has primer at the end. Samples HIV0887-P2D21-HIV_S3 and HIV0887-P2C12-HIV_S32 from the 30 Aug 2019 run look even better, but have very little overlap.

For now, Chanson and I decided to leave it as is, and MiCall will combine the nuc and amino counts for overlapping contigs in later steps. In a future version, we should look to see how often we have trouble merging contigs, and how many of them have this overlap identity problem.
