Placeholder for datasets, todo items for the ISTC hackathon and demo.
Access the TX-E1 cluster via ssh [email protected].
Make sure your user account is in the istcdata group.
Main folder is /home/gridsan/groups/istcdata/datasets/ocean_metagenome
We need to find a link between the metadata and the genomics data that has been processed in the overlapped_trimmed_data folder.
Simons metagenomics samples
Directories and sample processing steps:
raw_data: directory containing the raw Illumina NextSeq data from BioMicro. Each subfolder (150122Chi, etc) represents the BioMicro project name.renamed_raw_data: directory with symbolic links to the original raw file. Links are renamed in the format S0010_1_sequence.fastq etc, where "S0010" would be sample #10 from Maddie's master extraction spreadsheet. Done to simplify naming and make bulk preprocessing simple. Renaming is based on thesimons_mg_samplekey.txtfile, using the scriptrenamefiles.shoverlapped_trimmed_data: Output of the "metagenome_preprocessing.sh" script. Steps are: a: runcutadapt_1.8.1to remove any adapter sequences in the raw sequence data b: runclc_overlap_readsto overlap any paired-end data (min 75 bp output) c: runclc_quality_trimon nonoverlapping reads to remove low-quality reads/sections of reads (min 75 bp output)