REACTA: Regional Heritability Advanced Complex Trait Analysis

Developed in a collaborative project between EPCC and The Roslin Institute from 2011 to 2014, Regional Heritability Advanced Complex Trait Analysis (REACTA) is a modified version of GCTA with improved computational performance, support for Graphics Processing Units (GPUs), and additional features. REACTA was formerly known as ACTA. The purpose of REACTA is to quantify the contribution of genetic variation to phenotypic variation for complex traits.

Note that REACTA is no longer under active development or support. The source code remains freely available. REACTA has been superseded by DISSECT:

O. Canela-Xandri, A. Law, A. Gray, J. A. Woolliams, A. Tenesa. DISSECT: A new tool for analyzing extremely large genomic datasets. (2015) bioRxiv doi: http://dx.doi.org/10.1101/020453

Version

The last version to be released was 0.9.7.

Downloads

The GPU version will only work if a CUDA-enabled NVIDIA GPU is present. The source code is available under the GNU General Public Licence, and will appear here in due course (please contact us in the meantime).

Documentation

REACTA broadly supports the same options as GCTA, as described on the GCTA home page.

Unsupported options: REACTA is based on GCTA 0.93.9. This means that the GCTA options --reml-bivar, --reml-bivar-nocove, --reml-bivar-lrt-rg, --reml-bivar-prevalence and --grm-bin are not supported in REACTA.

Additional options: the additional options supported by REACTA are described in the sections below.

GRM calculation on-the-fly

REACTA allows the --make-grm and --reml options to be used together. When both are specified, the GRM is calculated "on-the-fly" before the REML analysis and is not saved to disk. For example,

reacta --out dataset1 \
--pheno dataset1.phen \
--reml \
--bfile dataset1 \
--make-grm \
--autosome

will perform a REML analysis without reading a GRM file; instead, it calculates the GRM on-the-fly from the genotype file specified with the --bfile option. The --pheno option is required to specify the phenotype data file. The --autosome option specifies that all autosomal SNPs should be included, and the --out option specifies the name to be used for output files, as usual.
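For readers unfamiliar with the GRM itself, the following is a minimal pure-Python sketch of the standard GCTA-style estimator A_jk = (1/N) Σ_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)), where x_ij is the allele count (0, 1 or 2) of SNP i in individual j, p_i is the sample allele frequency and N is the number of SNPs used. It is an illustration, not REACTA's implementation: it applies the same formula on the diagonal (GCTA uses an adjusted estimator there), skips monomorphic SNPs, assumes no missing genotypes, and the function name compute_grm is hypothetical.

```python
from typing import List


def compute_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Sketch of a GRM. genotypes[i][j] is the allele count (0, 1 or 2)
    of SNP i in individual j; no missing values are assumed."""
    if not genotypes:
        raise ValueError("no SNPs supplied")
    n_ind = len(genotypes[0])
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    n_used = 0
    for snp in genotypes:
        p = sum(snp) / (2.0 * n_ind)      # allele frequency from the sample
        var = 2.0 * p * (1.0 - p)         # expected genotype variance
        if var == 0.0:
            continue                      # monomorphic SNP: skip (no information)
        # Standardised genotypes for this SNP (one column of a "W"-like matrix)
        w = [(x - 2.0 * p) / var ** 0.5 for x in snp]
        for j in range(n_ind):
            for k in range(n_ind):
                grm[j][k] += w[j] * w[k]
        n_used += 1
    if n_used == 0:
        raise ValueError("all SNPs are monomorphic")
    # Average the per-SNP contributions over the SNPs actually used
    return [[value / n_used for value in row] for row in grm]
```

Because each SNP contributes an outer product of its standardised column, the cost grows with (number of SNPs) × (number of individuals)², which is why computing the GRM on-the-fly is worth optimising.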

Regional Analysis

REACTA supports the following option:

--multi-region <regionSize> <overlap>

This performs regional analysis. It splits the analysis into regions of <regionSize> SNPs each, with consecutive regions overlapping by <overlap> SNPs. For example, specifying --multi-region 300 50 splits the analysis into regions of 300 SNPs, with an overlap of 50 SNPs between neighbouring regions. For this functionality, the --make-grm and --reml options must be specified together so that the GRMs can be calculated on-the-fly (see above). An example full command is:

reacta --out dataset1 \
--pheno dataset1.phen \
--reml \
--bfile dataset1 \
--make-grm \
--autosome \
--multi-region 300 50

The code loops over all regions and, for each region, outputs a file containing the REML results (dataset1_<region>.hsq), a file containing the list of SNPs within the region (dataset1_SNP_<region>.out), and a file containing the standard output from the code (dataset1_region_<region>.out). Here <region> is a unique label for the region, containing the numbers of the first and last SNPs in that region.
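The region layout can be illustrated with a short Python sketch. It assumes that SNPs are numbered 1 to n_snps, that consecutive regions start <regionSize> - <overlap> SNPs apart, and that the final region is truncated at the last SNP. How REACTA actually treats the final region (or chromosome boundaries) is not documented here, and split_regions is a hypothetical name.

```python
from typing import List, Tuple


def split_regions(n_snps: int, region_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (first, last) 1-based, inclusive SNP indices for each region.
    Assumption: the last region is truncated at SNP n_snps."""
    if region_size <= 0 or not 0 <= overlap < region_size:
        raise ValueError("need region_size > 0 and 0 <= overlap < region_size")
    if n_snps <= 0:
        return []
    step = region_size - overlap  # distance between the starts of consecutive regions
    regions = []
    start = 1
    while True:
        end = min(start + region_size - 1, n_snps)
        regions.append((start, end))
        if end == n_snps:
            break
        start += step
    return regions
```

With 1000 SNPs and --multi-region 300 50, this sketch gives the regions 1-300, 251-550, 501-800 and 751-1000, so each SNP near a boundary is covered by two regions.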

REACTA also supports the following option:

--multi-region-with-polygenic <regionSize> <overlap> <polygenic_GRM_root_filename>

This performs multi-region analysis as above, but for each region the "local" GRM corresponding to that region is analysed simultaneously with a fixed "polygenic" GRM, which is read from disk (in a similar fashion to the --mgrm option in GCTA). Note that the local contributions are removed from the polygenic GRM for each region, so that the two GRMs are complementary. For example, to use this feature, the above command could be adapted as follows:

reacta --out dataset1 \
--pheno dataset1.phen \
--reml \
--bfile dataset1 \
--make-grm \
--autosome \
--multi-region-with-polygenic 300 50 dataset1autosome

where the polygenic GRM in this case is read from pre-existing dataset1autosome.grm.gz and dataset1autosome.grm.id files.
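One way to see how the local contribution can be removed: if every GRM entry is an average of per-SNP terms over the SNPs it uses (as in the GCTA estimator, assuming no missing genotypes), then the GRM built from the N_all - N_local SNPs outside the region is A_rest = (N_all * A_all - N_local * A_local) / (N_all - N_local). The Python sketch below applies that identity element-wise. It is an assumption about the arithmetic, not a description of REACTA's code, and complement_grm is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def complement_grm(grm_all: Matrix, n_all: int, grm_local: Matrix, n_local: int) -> Matrix:
    """Remove a region's contribution from a whole-genome GRM.
    Assumes both GRMs are plain averages of per-SNP terms (no missing data),
    built from n_all and n_local SNPs respectively."""
    n_rest = n_all - n_local
    if n_rest <= 0:
        raise ValueError("the region must not contain all SNPs")
    return [
        [(n_all * a - n_local * b) / n_rest for a, b in zip(row_all, row_local)]
        for row_all, row_local in zip(grm_all, grm_local)
    ]
```

Under this assumption the local GRM and the returned GRM are built from disjoint SNP sets, which is what makes the two complementary.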

MPI multi-CPU or multi-GPU support: regional analysis can become time-consuming when there are many regions, so we have also developed an MPI parallel version which can distribute the regions across multiple GPUs or CPUs on compute clusters. Please contact us for more information.
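As a purely hypothetical illustration of distributing independent regions (the MPI version itself is not documented here), a static round-robin scheme gives rank r every region whose index i satisfies i mod n_ranks = r. The sketch below computes such an assignment in plain Python without MPI; assign_regions is a hypothetical name, and REACTA's actual scheduling may differ.

```python
from typing import List


def assign_regions(n_regions: int, n_ranks: int) -> List[List[int]]:
    """Hypothetical round-robin assignment: region i goes to rank i % n_ranks.
    Returns, for each rank, the 0-based indices of the regions it processes."""
    if n_ranks <= 0:
        raise ValueError("need at least one rank")
    return [list(range(rank, n_regions, n_ranks)) for rank in range(n_ranks)]
```

Because each region's REML analysis is independent, any such static split works; round-robin simply keeps the number of regions per rank within one of each other.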

Memory management for GRM creation

GRM estimation can be very demanding on memory for large numbers of SNPs. The "W" data structure dominates memory usage when the number of SNPs becomes large: its full size in GB is (number of individuals) × (number of SNPs) × (4 bytes per entry) / (1024 × 1024 × 1024 bytes per GB). REACTA splits the calculation into stages, allowing you to specify a maximum value for the memory used by the "W" data structure:

--w-max-gb <value>

If your analysis crashes due to lack of memory, try specifying a <value> and lowering it until the code runs. Note that <value> is not the total maximum memory usage, since other data structures also use memory. Hence, it may have to be significantly lower than the total memory available.

For example,

reacta --bfile dataset \
--make-grm \
--out dataset1 \
--autosome \
--w-max-gb 0.5 

will set this value to 0.5GB.
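The size formula above, and the effect of --w-max-gb, can be checked with a few lines of Python. The staging model used here (W processed in blocks of as many whole SNPs as fit within the limit) is an assumption for illustration, not a description of how REACTA divides the work, and the function names are hypothetical.

```python
BYTES_PER_ENTRY = 4          # 4 bytes per W entry, as stated above
BYTES_PER_GB = 1024 ** 3     # 1024 x 1024 x 1024 bytes per GB


def w_size_gb(n_individuals: int, n_snps: int) -> float:
    """Full size of W in GB: individuals x SNPs x 4 bytes / 1024^3."""
    return n_individuals * n_snps * BYTES_PER_ENTRY / BYTES_PER_GB


def snps_per_stage(n_individuals: int, w_max_gb: float) -> int:
    """Hypothetical: number of whole SNP columns of W that fit in w_max_gb."""
    bytes_per_snp = n_individuals * BYTES_PER_ENTRY
    max_bytes = int(w_max_gb * BYTES_PER_GB)
    return max(1, max_bytes // bytes_per_snp)


def n_stages(n_individuals: int, n_snps: int, w_max_gb: float) -> int:
    """Hypothetical: stages needed to cover all SNPs (ceiling division)."""
    per_stage = snps_per_stage(n_individuals, w_max_gb)
    return -(-n_snps // per_stage)
```

For example, 10,000 individuals and 500,000 SNPs give a full W of about 18.6 GB; under this model, --w-max-gb 0.5 allows 13,421 SNPs per stage, i.e. 38 stages.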

More information

Full details

L. Cebamanos, A. Gray, I. Stewart and A. Tenesa, Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architectures, Bioinformatics 2014; doi: 10.1093/bioinformatics/btt754

A. Gray, I. Stewart and A. Tenesa, Advanced Complex Trait Analysis, Bioinformatics 2012; doi: 10.1093/bioinformatics/bts571

Contact

EPCC contact: Alan Gray

Roslin Institute contact: Albert Tenesa

 
