REACTA: Regional Heritability Advanced Complex Trait Analysis
Developed under a collaborative project between EPCC and The Roslin Institute between 2011 and 2014, Regional Heritability Advanced Complex Trait Analysis (REACTA) is a modified version of GCTA with improved computational performance, support for Graphics Processing Units (GPUs), and additional features. REACTA was formerly known as ACTA. The purpose of REACTA is to quantify the contribution of genetic variation to phenotypic variation for complex traits.
Note that REACTA is no longer under active development or support. The source code remains freely available. REACTA has been superseded by DISSECT:
O. Canela-Xandri, A. Law, A. Gray, J. A. Woolliams, A. Tenesa. DISSECT: A new tool for analyzing extremely large genomic datasets. (2015) bioRxiv doi: http://dx.doi.org/10.1101/020453
The last version to be released was 0.9.7.
The GPU version will only work if a CUDA-enabled NVIDIA GPU is present. The source code is available under the GNU General Public Licence, and will appear here in due course (please contact us in the meantime).
REACTA broadly supports the same options as GCTA, as described on the GCTA home page.
Unsupported options: the current version of REACTA is based on GCTA 0.93.9. This means that the GCTA options --reml-bivar, --reml-bivar-nocove, --reml-bivar-lrt-rg, --reml-bivar-prevalence and --grm-bin are not yet supported in REACTA, but will appear in future releases.
Additional options: the additional options supported by REACTA are described in the sections below.
GRM calculation on-the-fly
REACTA allows use of the --make-grm and --reml options together. When these are both specified, the GRM will be calculated "on-the-fly" before the REML analysis, and will not be saved to disk. For example,
reacta --out dataset1 \
       --pheno dataset1.phen \
       --reml \
       --bfile dataset1 \
       --make-grm \
       --autosome
will perform a REML analysis without reading a GRM file; instead, the GRM is calculated on-the-fly from the genotype file specified with the --bfile option. The --pheno option is required to specify the phenotype data file. The --autosome option specifies that all autosomal SNPs should be included in this case and, as usual, the --out option specifies the name to be used for output files.
REACTA supports the following option:
--multi-region <regionSize> <overlap>
This performs regional analysis. It splits the analysis into regions, each of size <regionSize> SNPs, which may overlap by <overlap> SNPs. For example, specifying --multi-region 300 50 will split the analysis into regions of 300 SNPs, with an overlap of 50 SNPs between regions. For this functionality, the --make-grm and --reml options must be specified together so that GRMs can be calculated on-the-fly (see above). An example full command is
reacta --out dataset1 \
       --pheno dataset1.phen \
       --reml \
       --bfile dataset1 \
       --make-grm \
       --autosome \
       --multi-region 300 50
The code will loop over all regions and will output, for each region, a file containing the REML results, dataset1_<region>.hsq; a file containing the list of SNPs within the region, dataset1_SNP_<region>.out; and a file containing the standard output from the code, dataset1_region_<region>.out. Here <region> is a unique label for the region, containing numbers corresponding to the first and last SNPs in that region.
REACTA also supports the following option:
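To get a feel for how many regions a given <regionSize> and <overlap> produce, the splitting can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not REACTA's own code: it assumes consecutive regions start every (<regionSize> - <overlap>) SNPs, that SNPs are numbered from 1, and that the final region is truncated at the last SNP. The helper name list_regions is hypothetical.

```shell
#!/bin/sh
# list_regions TOTAL_SNPS REGION_SIZE OVERLAP
# Hypothetical helper: prints "first last" SNP indices for each region,
# ASSUMING a new region starts every (REGION_SIZE - OVERLAP) SNPs and the
# last region is cut off at TOTAL_SNPS. REACTA may handle edges differently.
list_regions() {
    total=$1; size=$2; overlap=$3
    if [ "$overlap" -ge "$size" ]; then
        echo "overlap must be smaller than the region size" >&2
        return 1
    fi
    step=$((size - overlap))
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        # Truncate the final region at the last SNP.
        if [ "$end" -gt "$total" ]; then end=$total; fi
        echo "$start $end"
        # Stop once a region reaches the last SNP.
        if [ "$end" -eq "$total" ]; then break; fi
        start=$((start + step))
    done
}

# Illustrative numbers only: 1000 SNPs, regions of 300 with an overlap of 50.
list_regions 1000 300 50
```

Under these assumptions the example yields four regions (1 to 300, 251 to 550, 501 to 800 and 751 to 1000), so one would expect four sets of output files. The exact form of the <region> label is not reproduced here.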
--multi-region-with-polygenic <regionSize> <overlap> <polygenic_GRM_root_filename>
This performs multi-region analysis as above, but for each region the "local" GRM corresponding to that region is analysed simultaneously with a fixed "polygenic" GRM which is read from disk (in a similar fashion to the --mgrm option in GCTA). Note that the local contributions are removed from the polygenic GRM for each region, such that the two GRMs are complementary. For example, to run this feature, the above could be adapted as follows
reacta --out dataset1 \
       --pheno dataset1.phen \
       --reml \
       --bfile dataset1 \
       --make-grm \
       --autosome \
       --multi-region-with-polygenic 300 50 dataset1autosome
where the polygenic GRM in this case is read from pre-existing dataset1autosome.grm.gz and dataset1autosome.grm.id files.
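Since the polygenic GRM is read from two files sharing a root filename, it can be worth checking that both are present before starting a long regional run. The following is a minimal sketch of such a check; the helper name check_polygenic_grm is hypothetical and is not part of REACTA.

```shell
#!/bin/sh
# check_polygenic_grm ROOT
# Hypothetical helper: succeeds only if ROOT.grm.gz and ROOT.grm.id both
# exist, as expected for the root filename given to
# --multi-region-with-polygenic.
check_polygenic_grm() {
    root=$1
    missing=0
    for suffix in grm.gz grm.id; do
        if [ ! -f "$root.$suffix" ]; then
            echo "missing polygenic GRM file: $root.$suffix" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the root filename from the example above.
if check_polygenic_grm dataset1autosome; then
    echo "polygenic GRM files found"
else
    echo "polygenic GRM files not found; create the GRM first"
fi
```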
MPI multi CPU or GPU support: Regional analysis can become time-consuming for many regions, so we have also developed an MPI parallel version which can distribute the regions across multiple GPUs or CPUs on compute clusters. Please contact us for more information.
Memory management for GRM creation
GRM estimation can be very demanding on memory for large numbers of SNPs. The "W" data structure dominates memory usage when the number of SNPs becomes large: its full size in GB is (number of individuals) x (number of SNPs) x (4 bytes per entry) / (1024 x 1024 x 1024 bytes per GB). REACTA splits the calculation into stages, allowing you to specify a maximum value, in GB, for the memory used by the "W" data structure with the --w-max-gb <value> option.
If you find that your analysis crashes due to lack of memory, try specifying a <value>, and lower it until the code runs. Note that <value> does not correspond to the total memory usage, since there are other data structures. Hence, it may have to be significantly lower than the total memory available. For example,
reacta --bfile dataset \
       --make-grm \
       --out dataset1 \
       --autosome \
       --w-max-gb 0.5
will set this value to 0.5GB.
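When choosing a <value>, it can help to first estimate the full size of the "W" data structure from the formula above. The following is a small sketch using awk for the floating-point arithmetic; the helper name w_size_gb and the dataset dimensions are illustrative assumptions, not part of REACTA.

```shell
#!/bin/sh
# w_size_gb INDIVIDUALS SNPS
# Hypothetical helper: full size of the "W" data structure in GB, computed as
# individuals x SNPs x 4 bytes per entry / (1024 x 1024 x 1024 bytes per GB).
w_size_gb() {
    awk -v n="$1" -v m="$2" \
        'BEGIN { printf "%.2f\n", n * m * 4 / (1024 * 1024 * 1024) }'
}

# Illustrative numbers only: 10,000 individuals and 500,000 SNPs.
w_size_gb 10000 500000    # prints 18.63
```

In this illustrative case the full structure would need about 18.63 GB, so on a machine with less memory a smaller --w-max-gb value would be required, bearing in mind that other data structures also consume memory.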
L. Cebamanos, A. Gray, I. Stewart and A. Tenesa, Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architectures, Bioinformatics 2014; doi: 10.1093/bioinformatics/btt754
A. Gray, I. Stewart and A. Tenesa, Advanced Complex Trait Analysis, Bioinformatics 2012; doi: 10.1093/bioinformatics/bts571