TAD_randomisation_strategy

This repository contains the scripts required to generate random TADs as outlined in:

Making sense of the linear genome, gene function and TADs
Helen S Long, Simon Greenaway, George Powell, Ann-Marie Mallon, Cecilia M Lindgren, Michelle M Simon
doi: https://doi.org/10.1101/2020.09.28.316786

Scripts:
Randomgenome.R:
A script to generate random genomes for mm10.

This R script takes a file of mm10 protein-coding gene coordinates ("mm10_proteincodinggenes_biomart_sorted.bed") with the following columns (no header line):

<chr> <start> <end> <strand> <gene ID>

This file was downloaded from Ensembl BioMart (release 96) and sorted using bedtools.
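The following minimal Python sketch shows how to read this unlabelled file and order it the way default `bedtools sort` does (by chromosome name, then start). The `Gene` type and function names are illustrative and do not come from the repository. The gene IDs and coordinates in the example are made up.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    gene_id: str


def read_gene_bed(handle: TextIO) -> List[Gene]:
    """Parse the unlabelled 5-column gene BED: <chr> <start> <end> <strand> <gene ID>."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, strand, gene_id = row[:5]
        genes.append(Gene(chrom, int(start), int(end), strand, gene_id))
    return genes


def sort_like_bedtools(genes: List[Gene]) -> List[Gene]:
    """Order by chromosome name, then start coordinate."""
    return sorted(genes, key=lambda g: (g.chrom, g.start))


# Toy input (made-up IDs and coordinates)
example = io.StringIO(
    "chr2\t500\t900\t-\tENSMUSG03\n"
    "chr1\t300\t800\t+\tENSMUSG02\n"
    "chr1\t100\t200\t+\tENSMUSG01\n"
)
for gene in sort_like_bedtools(read_gene_bed(example)):
    print(gene.chrom, gene.start, gene.gene_id)
```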

The script will generate 100 "random genomes" by randomising the gene ID column within each chromosome.
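The shuffling step can be sketched in Python as below. This sketch assumes that "randomising the gene ID column" means permuting the gene IDs among the fixed gene coordinates of the same chromosome. It is not a translation of Randomgenome.R, and `random_genome` and `random_genomes` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, int, int, str, str]  # (chr, start, end, strand, gene_id)


def random_genome(rows: List[Row], rng: random.Random) -> List[Row]:
    """Keep every coordinate fixed, but permute gene IDs among rows of the same chromosome."""
    ids_by_chrom = defaultdict(list)
    for chrom, _, _, _, gene_id in rows:
        ids_by_chrom[chrom].append(gene_id)
    for ids in ids_by_chrom.values():
        rng.shuffle(ids)  # in-place permutation, one chromosome at a time
    # Hand the shuffled IDs back out, row by row, within each chromosome
    iters = {chrom: iter(ids) for chrom, ids in ids_by_chrom.items()}
    return [(c, s, e, strand, next(iters[c])) for c, s, e, strand, _ in rows]


def random_genomes(rows: List[Row], n: int = 100, seed: int = 1) -> List[List[Row]]:
    """Generate n random genomes; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [random_genome(rows, rng) for _ in range(n)]
```

You could write each returned genome out as a five-column BED file and pass it as `-b` to the bedtools intersect command that follows.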

The resulting random genomes can then be overlapped with TADs using bedtools, e.g.:

bedtools intersect -a <TADs.bed> -b <randomgenome.bed> -wao -F 1 > <output>
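In this command, `-wao` reports every TAD together with each overlapping gene and the number of overlapping base pairs. TADs with no overlap are still reported, with an overlap of 0. `-F 1` requires 100% of each gene (the `-b` feature) to lie within the TAD. The pure-Python sketch below mimics only the `-F 1` containment rule on half-open BED intervals. `genes_fully_in_tads` is a hypothetical helper, and because it is quadratic it is meant for illustration, not genome-scale use.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chr, start, end, name), BED half-open coordinates


def genes_fully_in_tads(tads: List[Interval], genes: List[Interval]) -> Dict[str, List[str]]:
    """For each TAD, list the genes whose entire length lies inside it."""
    result = {}
    for t_chr, t_start, t_end, t_name in tads:
        inside = []
        for g_chr, g_start, g_end, g_name in genes:
            overlap = min(t_end, g_end) - max(t_start, g_start)
            # -F 1: the overlap must cover 100% of the gene (the -b feature)
            if g_chr == t_chr and g_end > g_start and overlap >= g_end - g_start:
                inside.append(g_name)
        result[t_name] = inside  # empty list ~ a -wao row with no overlapping gene
    return result
```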

randomTADsmethod_Arrowhead.py:
A script to generate random TADs with properties similar to Arrowhead TADs as outlined in the paper. It can be run for either human or mouse.

This Python script takes the file <datasetname>_<TADcaller><binsize>kb.txt with the following labelled columns:

<TAD ID> <No. genes in TAD> <Median length of genes in TAD> <chr> <start> <end>
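This section does not reproduce the paper's randomisation procedure. For comparison only, a generic length-preserving baseline takes TAD lengths from the <start> and <end> columns and places them at random positions on the chromosome. The sketch below uses rejection sampling to do this. `place_random_tads` is a hypothetical name, and keeping the random TADs non-overlapping is a simplifying assumption of this sketch. It should not be read as what randomTADsmethod_Arrowhead.py does.

```python
import random
from typing import List, Tuple


def place_random_tads(lengths: List[int], chrom_size: int, rng: random.Random,
                      max_tries: int = 10_000) -> List[Tuple[int, int]]:
    """Place intervals with the given lengths at random, non-overlapping positions."""
    placed: List[Tuple[int, int]] = []
    for length in sorted(lengths, reverse=True):  # longest first: hardest to fit
        if length > chrom_size:
            raise ValueError(f"TAD length {length} exceeds chromosome size {chrom_size}")
        for _ in range(max_tries):
            start = rng.randint(0, chrom_size - length)  # randint bounds are inclusive
            end = start + length
            # accept only if the candidate overlaps none of the TADs placed so far
            if all(end <= s or start >= e for s, e in placed):
                placed.append((start, end))
                break
        else:
            raise RuntimeError("could not place all TADs; try a larger max_tries")
    return sorted(placed)
```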

It also requires hg19.chrom.sizes or mm10.chrom.sizes, as downloaded from UCSC, and "mm10_proteincodinggenes_biomart_sorted.bed" formatted as above (or the hg19 equivalent).
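The UCSC .chrom.sizes files are plain text with two tab-separated columns: chromosome name and length in bp. Here is a minimal sketch of loading one into a dictionary. The helper name is hypothetical, and the sizes in the example are toy values.

```python
import io
from typing import Dict, TextIO


def read_chrom_sizes(handle: TextIO) -> Dict[str, int]:
    """Parse a UCSC .chrom.sizes file (chromosome name, tab, length in bp)."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        chrom, size = line.split("\t")[:2]
        sizes[chrom] = int(size)
    return sizes


toy = io.StringIO("chr1\t1000000\nchrX\t500000\n")  # toy sizes, not real mm10/hg19 values
sizes = read_chrom_sizes(toy)
```

The UCSC files also list unplaced and alternate contigs, whose names contain `_`. You may want to keep only the chromosomes you actually analyse.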

The script takes five positional arguments, in this order:

Dataset name
TAD caller
Species
Chromosome
Bin size (kb)
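The scripts' exact argument handling is not described here. Below is a hypothetical argparse equivalent that shows how the five positional arguments map onto the input file name. The name pattern is inferred from the example file "BonevESC_Arrowhead10kb.txt", and `parse_args` and `tad_table_name` are illustrative names.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the five positional arguments."""
    parser = argparse.ArgumentParser(description="Generate random TADs for one chromosome")
    parser.add_argument("dataset")                   # e.g. BonevESC
    parser.add_argument("caller")                    # e.g. Arrowhead or TopDom
    parser.add_argument("species")                   # e.g. Mouse
    parser.add_argument("chromosome", type=int)      # numeric chromosome index
    parser.add_argument("binsize_kb", type=int)      # bin size in kb, e.g. 10
    return parser.parse_args(argv)


def tad_table_name(args: argparse.Namespace) -> str:
    """Build <dataset>_<caller><binsize>kb.txt, e.g. BonevESC_Arrowhead10kb.txt."""
    return f"{args.dataset}_{args.caller}{args.binsize_kb}kb.txt"
```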

The script is designed to be run as an array job, parallelised per chromosome. The chromosome must be given as a number: 1=chr1, 2=chr2, etc.; for human, 23=chrX and 24=chrY; for mouse, 20=chrX and 21=chrY.
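The numbering scheme amounts to the mapping below. `chrom_name` is a hypothetical helper, and it assumes 22 human and 19 mouse autosomes, as implied by the X/Y numbers above.

```python
def chrom_name(index: int, species: str) -> str:
    """Map the numeric chromosome argument to a chromosome name."""
    n_autosomes = {"human": 22, "mouse": 19}[species.lower()]
    if 1 <= index <= n_autosomes:
        return f"chr{index}"
    if index == n_autosomes + 1:
        return "chrX"
    if index == n_autosomes + 2:
        return "chrY"
    raise ValueError(f"chromosome index {index} is out of range for {species}")
```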

For example, to run for Bonev ESC at 10 kb on chr1, from a folder containing the file "BonevESC_Arrowhead10kb.txt", run:

python randomTADsmethod_Arrowhead.py BonevESC Arrowhead Mouse 1 10
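Outside a scheduler, you can generate the same per-chromosome commands in Python. This is a sketch: `build_commands` is a hypothetical name, and the species spellings are assumptions. "Mouse" matches the example above, while "Human" is a guess.

```python
import sys
from typing import List

# Autosomes + X + Y, following the chromosome numbering described above
N_CHROMS = {"Human": 24, "Mouse": 21}


def build_commands(script: str, dataset: str, caller: str,
                   species: str, binsize_kb: int) -> List[List[str]]:
    """One command per chromosome index, i.e. one per array-job task."""
    return [
        [sys.executable, script, dataset, caller, species, str(i), str(binsize_kb)]
        for i in range(1, N_CHROMS[species] + 1)
    ]


commands = build_commands("randomTADsmethod_Arrowhead.py", "BonevESC", "Arrowhead", "Mouse", 10)
```

Each command can be run locally with `subprocess.run(cmd, check=True)`. In an array job, the task index (for example `SLURM_ARRAY_TASK_ID` on SLURM or `SGE_TASK_ID` on SGE) can select `commands[task_id - 1]`.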

Requires Python 3, bedtools, and the Python modules pybedtools, pandas, numpy and random (part of the standard library).
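The following sketch checks these requirements from Python. The function name is hypothetical. It only tests that the modules can be imported and that the bedtools executable is on PATH. It does not check versions.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which of the listed requirements are available in this environment."""
    status = {name: importlib.util.find_spec(name) is not None
              for name in ("pybedtools", "pandas", "numpy", "random")}
    status["python3"] = sys.version_info.major == 3
    status["bedtools"] = shutil.which("bedtools") is not None  # CLI must be on PATH
    return status
```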

randomTADsmethod_TopDom.py:
A script to generate random TADs with properties similar to TopDom TADs as outlined in the paper. It can be run for either human or mouse.

This Python script takes the file <datasetname>_<TADcaller><binsize>kb.txt with the following labelled columns:

<TAD ID> <No. genes in TAD> <Median length of genes in TAD> <chr> <start> <end>
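Suppose a TAD set tiles a chromosome without overlaps. (This is an assumption made only for this sketch. It is not a statement about the TopDom output or the paper's method.) A simple alternative baseline is then to shuffle the order of the TAD lengths and lay them end to end. `shuffle_tiling` is a hypothetical name. Any gaps between the input TADs are closed up in the output.

```python
import random
from typing import List, Tuple


def shuffle_tiling(tads: List[Tuple[int, int]], rng: random.Random) -> List[Tuple[int, int]]:
    """Re-lay TAD lengths end to end in random order, starting at the first TAD's start."""
    if not tads:
        return []
    ordered = sorted(tads)
    lengths = [end - start for start, end in ordered]
    rng.shuffle(lengths)  # random order of the observed lengths
    pos = ordered[0][0]
    shuffled = []
    for length in lengths:
        shuffled.append((pos, pos + length))
        pos += length  # next TAD starts where this one ends
    return shuffled
```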

It also requires hg19.chrom.sizes or mm10.chrom.sizes, as downloaded from UCSC, and "mm10_proteincodinggenes_biomart_sorted.bed" formatted as above (or the hg19 equivalent).

The script takes five positional arguments, in this order:

Dataset name
TAD caller
Species
Chromosome
Bin size (kb)

The script is designed to be run as an array job, parallelised per chromosome. The chromosome must be given as a number: 1=chr1, 2=chr2, etc.; for human, 23=chrX and 24=chrY; for mouse, 20=chrX and 21=chrY.

For example, to run for Bonev ESC at 10 kb on chr1, from a folder containing the file "BonevESC_TopDom10kb.txt", run:

python randomTADsmethod_TopDom.py BonevESC TopDom Mouse 1 10

Requires Python 3, bedtools, and the Python modules pybedtools, pandas, numpy and random (part of the standard library).
