The goal of this workshop is to not only present the theory and properties of MTAG, but also demonstrate the ease of its application.
Specifically,
ldsc
) tool(has already been done for you ..)
numpy
, scipy
, pandas
, argparse
, bitarray
, joblib
.python
(or a similar command) in the command line.README
found in the Github repository.MTAG
from Github¶IGSS_2017_mtag_workshop
in this demonstration)git clone
mtag
thereMTAG
¶mtag
foldergit pull
Trait number | Trait | Filename |
---|---|---|
1 | Subjective well-being (SWB) | 1_OA2016_hm3samp_SWB.txt |
2 | Neuroticism (NEUR) | 1_OA2016_hm3samp_NEUR.txt |
Okbay et al. "Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses." Nature Genetics 48, 624–633 (2016).
data
folderWe will only go over the minimum detail needed to follow along in the workshop. There are many good resources for working with Unix available online.
cd [path]
: change directory to [path]
(can be relative or absolute)ls [path]
: list the files and folder in [path]
head -n [N] [file]
: view the first [N]
lines of [file]
. tail
lets you view the last [N]
lines.python [path] [-options]
: run [path]
as a Python script with some options specified. This is how we run MTAG..bash
or .sh
scriptbash [file.bash]
executes these commands-h
)¶python [PATH_OF_MTAG_CODE] -h
mtag
folder:# 1.1_access_help.bash
!python ../mtag/mtag.py -h
usage: mtag.py [-h] [--sumstats [{File1},{File2}...]] [--gencov_path FILE_PATH] [--residcov_path FILE_PATH] [--out DIR/PREFIX] [--make_full_path] [--snp_name SNP_NAME] [--z_name Z_NAME] [--n_name N_NAME] [--eaf_name EAF_NAME] [--chr_name CHR_NAME] [--bpos_name BPOS_NAME] [--a1_name A1_NAME] [--a2_name A2_NAME] [--include SNPLIST1,SNPLIST2,..] [--exclude SNPLIST1,SNPLIST2,..] [--only_chr CHR_A,CHR_B,..] [--homogNs_frac FRAC] [--homogNs_dist D] [--maf_min MAF_MIN] [--n_min N_MIN] [--n_max N_MAX] [--info_min INFO_MIN] [--drop_ambig_snps] [--no_allele_flipping] [--analytic_omega] [--no_overlap] [--perfect_gencov] [--equal_h2] [--fdr] [--grid_file GRID_FILE] [--fit_ss] [--intervals INTERVALS] [--cores CORES] [--p_sig P_SIG] [--n_approx] [--ld_ref_panel FOLDER_PATH] [--time_limit TIME_LIMIT] [--std_betas] [--tol TOL] [--numerical_omega] [--verbose] [--chunksize CHUNKSIZE] [--stream_stdout] **mtag: Multitrait Analysis of GWAS** This program is the implementation of MTAG method described by Turley et. al. Requires the input of a comma- separated list of GWAS summary statistics with identical columns. It is recommended to pass the column names manually to the program using the options below. The implementation of MTAG makes use of the LD Score Regression (ldsc) for cleaning the data and estimating residual variance-covariance matrix, so the input must also be compatible ./munge_sumstats.py command in the ldsc distribution included with mtag. The default estimation method for the genetic covariance matrix Omega is GMM (as described in the paper). Note below: any list of passed to the options below must be comma-separated without whitespace. optional arguments: -h, --help show this help message and exit Input Files: Input files to be used by MTAG. The --sumstats option is required, while using the other two options take priority of their corresponding estimation routines, if used. --sumstats [{File1},{File2}...] Specify the list of summary statistics files to perform multitrait analysis. Multiple files paths must be separated by ",". Please read the documentation to find the up-to-date set of acceptable file formats. A general guideline is that any files you pass into MTAG should also be parsable by ldsc and you should take the additional step of specifying the names of the main columns below to avoid reading errors. --gencov_path FILE_PATH If specified, will read in the genetic covariance matrix saved in the file path below and skip the estimation routine. The rows and columns of the matrix must correspond to the order of the GWAS input files specified. FIles can either be in whitespace-delimited .txt or .npy format. Use with caution as the genetic covariance matrix specified will be weakly nonoptimal. --residcov_path FILE_PATH If specified, will read in the residual covariance matrix saved in the file path below and skip the estimation routine. The rows and columns of the matrix must correspond to the order of the GWAS input files specified. FIles can either be in .txt or .npy format. Use with caution as the genetic covariance matrix specified will be weakly nonoptimal. File must either be in whitespace-delimited .txt or .npy Output formatting: Set the output directory and common name of prefix files. --out DIR/PREFIX Specify the directory and name prefix to output MTAG results. All mtag results will be prefixed with the corresponding tag. Default is ./mtag_results --make_full_path option to make output path specified in -out if it does not exist. Column names of input files: These options manually pass the names of the relevant summary statistics columns used by MTAG. It is recommended to pass these names because only narrow searches for these columns are performed in the default cases. Moreover, it is necessary that these input files be readable by ldsc's munge_sumstats command. --snp_name SNP_NAME Name of the single column that provides the unique identifier for SNPs in the GWAS summary statistics across all GWAS results. Default is "snpid". This the index that will be used to merge the GWAS summary statistics. Any SNP lists passed to ---include or --exclude should also contain the same name. --z_name Z_NAME The common name of the column of Z scores across all input files. Default is to search for columns beginning with the lowercase letter z. --n_name N_NAME the common name of the column of sample sizes in the GWAS summary statistics files. Default is to search for columns beginning with the lowercase letter n. --eaf_name EAF_NAME The common name of the column of minor allele frequencies (MAF) in the GWAS input files. The default is "freq". --chr_name CHR_NAME Name of the column containing the chromosome of each SNP in the GWAS input. Default is "chr". --bpos_name BPOS_NAME Name of the column containing the base pair of each SNP in the GWAS input. Default is "bpos". --a1_name A1_NAME Name of the column containing the effect allele of each SNP in the GWAS input. Default is "a1". --a2_name A2_NAME Name of the column containing the non-effect allele of each SNP in the GWAS input. Default is "a2". Filter Options: The input summary statistics files can be filtered using the options below. Note that there is some default filtering according to sample size and allele frequency, following the recommendations we make in the corresponding paper. All of these column-based options allow a list of values to be passed of the same length as the number of traits --include SNPLIST1,SNPLIST2,.. Restricts MTAG analysis to the union of snps in the list of snplists provided. The header line must match the SNP index that will be used to merge the GWAS input files. --exclude SNPLIST1,SNPLIST2,.., --excludeSNPs SNPLIST1,SNPLIST2,.. Similar to the --include option, except that the union of SNPs found in the specified files will be excluded from MTAG. Both -exclude and -include may be simultaneously specified, but -exclude will take precedent (i.e., SNPs found in both the -include and -exclude SNP lists will be excluded). --only_chr CHR_A,CHR_B,.. Restrict MTAG to SNPs on one of the listed, comma- separated chromosome. Can be specified simultaneously with --include and --exclude, but will take precedent over both. Not generally recommended. Multiple chromosome numbers should be separated by commas without whitespace. If this option is specified, the GWAS summary statistics must also list the chromosome of each SNPs in a column named \`chr\`. --homogNs_frac FRAC Restricts to SNPs within FRAC of the mode of sample sizes for the SNPs as given by (N-Mode)/Mode < FRAC. This filter is not applied by default. --homogNs_dist D Restricts to SNPs within DIST (in sample size) of the mode of sample sizes for the SNPs. This filter is not applied by default. --maf_min MAF_MIN set the threshold below SNPs with low minor allele frequencies will be dropped. Default is 0.01. Set to 0 to skip MAF filtering. --n_min N_MIN set the minimum threshold for SNP sample size in input data. Default is 2/3*(90th percentile). Any SNP that does not pass this threshold for all of the GWAS input statistics will not be included in MTAG. --n_max N_MAX set the maximum threshold for SNP sample size in input data. Not used by default. Any SNP that does not pass this threshold for any of the GWAS input statistics will not be included in MTAG. --info_min INFO_MIN Minimim info score for filtering SNPs for MTAG. --drop_ambig_snps Drop strand ambiguous SNPs when performing MTAG (they are already not used to estimate Omega or Sigma. --no_allele_flipping Prevents flipping the effect sizes of summary statistics when the effect and non-effect alleles are reversed (reletive the first summary statistics file. Special Cases: These options deal with notable special cases of MTAG that yield improvements in runtime. However, they should be used with caution as they will yield non-optimal results if the assumptions implicit in each option are violated. --analytic_omega Option to turn off the numerical estimation of the genetic VCV matrix in the presence of constant sample size within each GWAS, for which a closed-form solution exists. The default is to typically use the closed form solution as the starting point for the numerical solution to the maximum-likelihood genetic VCV, Use with caution! If any input GWAS does not have constant sample size, then the analytic solution employed here will not be a maximizer of the likelihood function. --no_overlap Imposes the assumption that there is no sample overlap between the input GWAS summary statistics. MTAG is performed with the off-diagonal terms on the residual covariance matrix set to 0. --perfect_gencov Imposes the assumption that all traits used are perfectly genetically correlated with each other. The off-diagonal terms of the genetic covariance matrix are set to the square root of the product of the heritabilities --equal_h2 Imposes the assumption that all traits passed to MTAG have equal heritability. The diagonal terms of the genetic covariance matrix are set equal to each other. Can only be used in conjunction with --perfect_gencov Max FDR calculation: These options are used for the calculation of an upper bound on the false disovery under the model described in Supplementary Note 1.1.4 of Turley et al. (2017). Note that there is one of three ways to define the space of grid points over which the upper bound is searched. --fdr Perform max FDR calculations --grid_file GRID_FILE Pre-set list of grid points. Users can define a list of grid points over which the search is conducted. The list of grid points should be passed in text file as a white-space delimited matrix of dimnesions, G x S, where G is the number of grid points and S = 2^T is the number of possible causal states for SNPs. States are ordered according to a tree-like recursive structure from right to left. For example, for 3 traits, with the triple TFT denoting the state for which SNPs are causal for State 1, not causal for state 2, and causal for state 3, then the column ordering of probabilities should be: FFF FFT FTF FTT TFF TFT TTF TTT There should be no headers, or row names in the file. Any rows for which (i) the probabilities do not sum to 1, the prior of a SNP being is causal is 0 for any of the traits, and (iii) the resulting genetic correlation matrix is non positive definite will excluded in the search. --fit_ss This estimates the prior probability that a SNP is null for each trait and then proceeds to restrict the grid search to the set of probability vectors that sum to the prior null for each trait. This is useful for restrict the search space of larger-dimensional traits. --intervals INTERVALS Number of intervals that you would like to partition the [0,1] interval. For example example, with two traits and --intervals set 10, then maxFDR will calculated over the set of feasible points in {0., 0.1, 0.2,..,0.9,1.0}^2. --cores CORES Number of threads/cores use to compute the FDR grid points for each trait. --p_sig P_SIG P-value threshold used for statistical signifiance. Default is p=5.0e-8 (genome-wide significance). --n_approx Speed up FDR calculation by replacing the sample size of a SNP for each trait by the mean across SNPs (for each trait). Recommended. Winner's curse adjustment: Options related to the winner's curse adjustment of estimates of effect sizes from MTAG that could be used when replicating analyses. Miscellaneous: --ld_ref_panel FOLDER_PATH Specify folder of the ld reference panel (split by chromosome) that will be used in the estimation of the error VCV (sigma). This option is passed to --ref-ld- chr and --w-ld-chr when running LD score regression. The default is to use the reference panel of LD scores computed from 1000 Genomes European subjects (eur_w_ld_chr) that is included with the distribution of MTAG --time_limit TIME_LIMIT Set time limit (hours) on the numerical estimation of the variance covariance matrix for MTAG, after which the optimization routine will complete its current iteration and perform MTAG using the last iteration of the genetic VCV. --std_betas Results files will have standardized effect sizes, i.e., the weights 1/sqrt(2*MAF*(1-MAF)) are not applied when outputting MTAG results, where MAF is the minor allele frequency. --tol TOL Set the relative (x) tolerance when numerically estimating the genetic variance-covariance matrix. Not recommended to change unless you are facing strong runtime constraints for a large number of traits. --numerical_omega Option to use the MLE estimator of the genetic VCV matrix, implemented through a numerical routine. --verbose When used, will include output from running ldsc scripts as well additional information (such as optimization routine information. --chunksize CHUNKSIZE Chunksize for reading in data. --stream_stdout Will streat mtag processing on console in addition to writing to log file.
--sumstats [File1],[File2],..
ldsc
(included in the mtag
folder)--out [output_folder][prefix]
[output_folder]
(e.g., ../output/
or ./
)[prefix]
(e.g., igss_mtag_tutorial_
) is prepended to the files produced by MTAG mtag
does not "guess" the column names (will hopefully do that soon).The workshop files provided contain the "default" column names for MTAG
bash 1.2_view_sumstats.bash
¶# 1.2 view sample sumstats files (for format)
!head ../data/1_OA2016_hm3samp_NEUR.txt # 1.2
snpid chr bpos a1 a2 freq z pval n rs2736372 8 11106041 T C 0.4179 -7.71614161262 1.209e-14 111111.111111 rs2060465 8 11162609 T C 0.6194 7.69444599845 1.422e-14 62500.0 rs10096421 8 10831868 T G 0.4646 -7.561098219 3.989e-14 111111.111111 rs2409722 8 11039816 T G 0.4627 -7.38261601808 1.553e-13 111111.111111 rs11991118 8 10939273 T G 0.5056 7.32202915636 2.443e-13 111111.111111 rs2736371 8 11105529 A G 0.3806 -7.32009158327 2.478e-13 62500.0 rs2736313 8 11086942 T C 0.4646 -7.24228035161 4.411e-13 111111.111111 rs876954 8 8310923 A G 0.4813 -7.15791677597 8.191e-13 62500.0 rs1533059 8 8684953 A G 0.4478 -7.07412856424 1.504e-12 62500.0
All of the columns shown above are necessary for MTAG to run
!tail -5 ../data/1_OA2016_hm3samp_NEUR.txt
rs165119 18 75317995 T C 0.1101 0.0 1.0 27777.7777778 rs1916337 3 165252023 T C 0.2444 0.0 1.0 62500.0 rs4878187 9 38227021 T C 0.8713 0.0 1.0 40000.0 rs8080313 17 35952983 A G 0.8396 0.0 1.0 40000.0 rs851997 6 151985550 A G 0.3787 0.0 1.0 62500.0
!sed -n '103995,104000p' ../data/1_OA2016_hm3samp_NEUR.txt
rs322382 5 172198952 T C 0.3041 -0.648595291061 0.5166 62500.0 rs3795732 1 156564640 A G 0.7108 -0.648595291061 0.5166 62500.0 rs4111166 9 26524416 T C 0.4739 0.648595291061 0.5166 111111.111111 rs4491677 2 128789204 A G 0.2276 -0.648595291061 0.5166 62500.0 rs622272 17 14400878 T G 0.5 -0.648595291061 0.5166 111111.111111 rs842938 2 201404694 T C 0.459 0.648595291061 0.5166 111111.111111
Using mtag
with the default options implements the following steps:
include
and/or exclude
).ldsc
, longest step). Call the mtag.py
script via python
Note: we use the --stream_stdout
so that messages from MTAG are printed to the console in addition to the log file.
bash 1.3_mtag_default.bash
¶# 1.3_mtag_default.bash
!python ../mtag/mtag.py \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/1_basics/1.3_mtag_default_NS \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.3 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --stream-stdout \ --sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \ --out ../output/1_basics/1.3_mtag_default_NS Beginning MTAG analysis... Read in Trait 1 summary statistics (185458 SNPs) from ../data/1_OA2016_hm3samp_NEUR.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 185458 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 185458 SNPs remain. Removed 0 SNPs with duplicated rs numbers (185458 SNPs remain). Removed 162680 SNPs with N < 74074.074074 (22778 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.467 Lambda GC = 1.366 Max chi^2 = 59.523 20 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Thu Oct 19 02:03:44 2017 Total time elapsed: 0.7s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 22778 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (167911 SNPs) from ../data/1_OA2016_hm3samp_SWB.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 167911 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 167911 SNPs remain. Removed 0 SNPs with duplicated rs numbers (167911 SNPs remain). Removed 102537 SNPs with N < 74074.074074 (65374 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.208 Lambda GC = 1.188 Max chi^2 = 29.353 1 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Thu Oct 19 02:03:46 2017 Total time elapsed: 1.21s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 65374 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 20321 SNPs remaining merging with previous traits. ... Merge of GWAS summary statistics complete. Number of SNPs: 20321 Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 1.043 -0.129] [-0.129 0.966]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...hm3samp_NEUR.txt 111111 111111 20321 1.407 1.488 133309 2 ..._hm3samp_SWB.txt 111111 111111 20321 1.269 1.408 168490 Estimated Omega: [[ 3.817e-06 -2.199e-06] [ -2.199e-06 2.336e-06]] Estimated Sigma: [[ 1.043 -0.129] [-0.129 0.966]] MTAG results saved to file. MTAG complete. Time elapsed: 27.0710499287s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multitrait Analysis of GWAS
<> Version: 1.0.1
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Calling ./mtag.py \
--stream-stdout \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/part_2/igss_mtag_default_NS
Beginning MTAG analysis...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
freq: Allele frequency
z: Directional summary statistic as specified by --signed-sumstats.
...
...
...
...
... Merge of GWAS summary statistics complete. Number of SNPs: 20321
Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
Estimating sigma..
Checking for positive definiteness ..
Sigma hat:
[[ 1.043 -0.129]
[-0.129 0.966]]
Beginning estimation of Omega ...
Using GMM estimator of Omega ..
Checking for positive definiteness ..
Completed estimation of Omega ...
Beginning MTAG calculations...
... Completed MTAG calculations.
Writing Phenotype 1 to file ...
Writing Phenotype 2 to file ...
...
Summary of MTAG results:
------------------------
Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N
1 ...hm3samp_NEUR.txt 111111 111111 20321 1.407 1.488 133309
2 ..._hm3samp_SWB.txt 111111 111111 20321 1.269 1.408 168490
Estimated Omega:
[[ 3.817e-06 -2.199e-06]
[ -2.199e-06 2.336e-06]]
Estimated Sigma:
[[ 1.043 -0.129]
[-0.129 0.966]]
[[ 1. , -0.7363645],
[-0.7363645, 1. ]]
bash 1.3b_list_files.bash
¶# 1.3b_list_files.bash
!ls ../output/1_basics/1.3*
../output/1_basics/1.3_mtag_default_NS.log ../output/1_basics/1.3_mtag_default_NS_omega_hat.txt ../output/1_basics/1.3_mtag_default_NS_sigma_hat.txt ../output/1_basics/1.3_mtag_default_NS_trait_1.txt ../output/1_basics/1.3_mtag_default_NS_trait_2.txt
T + 3 files (T number of input sumstats), all whitespace-delimited
*.log
: Log file (shown above)*_omega_hat.txt
: Estimated genetic covariance matrix*_sigma_hat.txt
: Estimated residual covariance matrix*_trait_[t].txt
: MTAG results for trait $t=1,\ldots, T$bash 1.3c_view_mtag_output.bash
¶!head "../output/1_basics/1.3_mtag_default_NS_trait_1.txt"
snpid chr bpos a1 a2 z n freq mtag_beta mtag_se mtag_z mtag_pval rs2736372 8 11106041 T C -7.71614161262 111111.111111 0.4179 -0.0312843727759 0.00401008053375 -7.80143254297 6.12083232605e-15 rs10096421 8 10831868 T G -7.561098219 111111.111111 0.4646 -0.0292358562051 0.0039656035404 -7.37235982046 1.67633824925e-13 rs2409722 8 11039816 T G -7.38261601808 111111.111111 0.4627 -0.0314168522186 0.00396670505601 -7.92013819404 2.37246749e-15 rs11991118 8 10939273 T G 7.32202915636 111111.111111 0.5056 0.0295263257704 0.00395590010421 7.46387041953 8.40172307848e-14 rs2736313 8 11086942 T C -7.24228035161 111111.111111 0.4646 -0.0296988957677 0.0039656035404 -7.48912377779 6.93349296434e-14 rs7016385 8 10779472 T C 7.014366598 111111.111111 0.5373 0.0259293691662 0.00396670505601 6.53675249359 6.28689598667e-11 rs11250130 8 11214455 A G 6.609661126 111111.111111 0.4869 0.026363608706 0.00395701034061 6.66250690212 2.69195672093e-11 rs7001819 8 11650475 T C -6.40845674838 111111.111111 0.5728 -0.0237731953794 0.00399825927266 -5.94588638658 2.74964820715e-09 rs4240671 8 10767748 A G 6.35561088027 111111.111111 0.4981 0.0233347188014 0.00395568054249 5.89904026646 3.65622009607e-09
snpid | a1 | a2 | z | mtag_beta | mtag_se | mtag_z | mtag_pval |
---|---|---|---|---|---|---|---|
rs2736372 | T | C | -7.71614161262 | -0.0312843727759 | 0.00401008053375 | -7.80143254297 | 6.12083232605e-15 |
rs10096421 | T | G | -7.561098219 | -0.0292358562051 | 0.0039656035404 | -7.37235982046 | 1.67633824925e-13 |
rs2409722 | T | G | -7.38261601808 | -0.0314168522186 | 0.00396670505601 | -7.92013819404 | 2.37246749e-15 |
rs11991118 | T | G | 7.32202915636 | 0.0295263257704 | 0.00395590010421 | 7.46387041953 | 8.40172307848e-14 |
rs2736313 | T | C | -7.24228035161 | -0.0296988957677 | 0.0039656035404 | -7.48912377779 | 6.93349296434e-14 |
snpid | a1 | a2 | z | mtag_beta | mtag_se | mtag_z | mtag_pval |
---|---|---|---|---|---|---|---|
rs2736372 | T | C | -7.71614161262 | -0.0312843727759 | 0.00401008053375 | -7.80143254297 | 6.12083232605e-15 |
rs10096421 | T | G | -7.561098219 | -0.0292358562051 | 0.0039656035404 | -7.37235982046 | 1.67633824925e-13 |
rs2409722 | T | G | -7.38261601808 | -0.0314168522186 | 0.00396670505601 | -7.92013819404 | 2.37246749e-15 |
rs11991118 | T | G | 7.32202915636 | 0.0295263257704 | 0.00395590010421 | 7.46387041953 | 8.40172307848e-14 |
rs2736313 | T | C | -7.24228035161 | -0.0296988957677 | 0.0039656035404 | -7.48912377779 | 6.93349296434e-14 |
mtag_
is produced by MTAGmtag_beta
and mtag_se
are unstandardized by genotype--std_betas
options to produce standardized beta/seColumn | option | Default |
---|---|---|
SNP ID | --snp_name |
snpid |
Chromosome | --chr_name |
chr |
Base-pair position | --bpos_name |
bpos |
Z-score | --z_name |
z * |
sample size | --n_name |
n |
expected allele frequency | --eaf_name |
freq |
Allele 1 (effect allele) | --a1_name |
a1 |
Allele 2 (other allele) | --a2_name |
a2 |
--gencov_path
to specify $\Omega$--residcov_path
to specify $\Sigma_{LD}$bash 1.4_mtag_matrices.bash
¶!python ../mtag/mtag.py \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/1_basics/1.4_mtag_withMat_NS \
--gencov_path ../output/1_basics/1.3_mtag_default_NS_omega_hat.txt \
--residcov_path ../output/1_basics/1.3_mtag_default_NS_sigma_hat.txt \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.2 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --gencov-path ../output/1_basics/1.3_mtag_default_NS_omega_hat.txt \ --stream-stdout \ --residcov-path ../output/1_basics/1.3_mtag_default_NS_sigma_hat.txt \ --sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \ --out ../output/1_basics/1.4_mtag_withMat_NS Beginning MTAG analysis... Read in Trait 1 summary statistics (185458 SNPs) from ../data/1_OA2016_hm3samp_NEUR.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 185458 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 185458 SNPs remain. Removed 0 SNPs with duplicated rs numbers (185458 SNPs remain). Removed 162680 SNPs with N < 74074.074074 (22778 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.467 Lambda GC = 1.366 Max chi^2 = 59.523 20 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 16:09:14 2017 Total time elapsed: 0.61s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 22778 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (167911 SNPs) from ../data/1_OA2016_hm3samp_SWB.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 167911 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 167911 SNPs remain. Removed 0 SNPs with duplicated rs numbers (167911 SNPs remain). Removed 102537 SNPs with N < 74074.074074 (65374 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.208 Lambda GC = 1.188 Max chi^2 = 29.353 1 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 16:09:15 2017 Total time elapsed: 0.74s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 65374 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 20321 SNPs remaining merging with previous traits. ... Merge of GWAS summary statistics complete. Number of SNPs: 20321 Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Checking for positive definiteness .. Sigma hat: [[ 1.043 -0.129] [-0.129 0.966]] Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...hm3samp_NEUR.txt 111111 111111 20321 1.407 1.488 133309 2 ..._hm3samp_SWB.txt 111111 111111 20321 1.269 1.408 168490 Estimated Omega: [[ 3.817e-06 -2.199e-06] [ -2.199e-06 2.336e-06]] Estimated Sigma: [[ 1.043 -0.129] [-0.129 0.966]] MTAG results saved to file. MTAG complete. Time elapsed: 3.93074512482s
EducAtt_ea3.txt
EducAtt_ukb.txt
!python ../mtag/mtag.py \
--sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \
--out ../output/2_special_opts/2.0_mtag_EA \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.2 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --stream-stdout \ --sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \ --out ../output/2_special_opts/2.0_mtag_EA Beginning MTAG analysis... Read in Trait 1 summary statistics (8146840 SNPs) from ../data/EducAtt_ea2.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 8146840 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8146840 SNPs remain. Removed 0 SNPs with duplicated rs numbers (8146840 SNPs remain). Removed 4434138 SNPs with N < 74074.074074 (3712702 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.85 Lambda GC = 1.627 Max chi^2 = 106.943 5297 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:12:58 2017 Total time elapsed: 34.62s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 3712702 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (1191004 SNPs) from ../data/EducAtt_ukb.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 1191004 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 12960 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 1178044 SNPs remain. Removed 0 SNPs with duplicated rs numbers (1178044 SNPs remain). Removed 0 SNPs with N < 35242.0 (1178044 SNPs remain). Median value of SIGNED_SUMSTATS was -0.000292861940766, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.205 Lambda GC = 1.192 Max chi^2 = 39.518 23 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:13:23 2017 Total time elapsed: 5.41s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 1178044 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 790847 SNPs remaining merging with previous traits. Dropped 1 SNPs due to inconsistent allele pairs from phenotype 2. 790846 SNPs remain. Flipped the signs of of 384828 SNPs to make them consistent with the effect allele orderings of the first trait. ... Merge of GWAS summary statistics complete. Number of SNPs: 790846 Using 790846 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 0.92 0.385] [ 0.385 1.028]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.022 252546 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 1.841 217709 Estimated Omega: [[ 6.492e-06 4.815e-06] [ 4.815e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] MTAG results saved to file. MTAG complete. Time elapsed: 2.0m:58.9194750786s
!tail -26 ../output/2_special_opts/2.0_mtag_EA.log
2017/10/18/06:14:31 PM Beginning estimation of Omega ... 2017/10/18/06:14:31 PM Using GMM estimator of Omega .. 2017/10/18/06:14:31 PM Checking for positive definiteness .. 2017/10/18/06:14:31 PM Completed estimation of Omega ... 2017/10/18/06:14:31 PM Beginning MTAG calculations... 2017/10/18/06:14:32 PM ... Completed MTAG calculations. 2017/10/18/06:14:32 PM Writing Phenotype 1 to file ... 2017/10/18/06:14:49 PM Writing Phenotype 2 to file ... 2017/10/18/06:15:07 PM Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.022 252546 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 1.841 217709 Estimated Omega: [[ 6.492e-06 4.815e-06] [ 4.815e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] 2017/10/18/06:15:07 PM 2017/10/18/06:15:07 PM MTAG results saved to file. 2017/10/18/06:15:07 PM MTAG complete. Time elapsed: 2.0m:58.9194750786s
--no_overlap
: MTAG in the absence of sample overlap¶ldsc
, will speed up estimation.¶bash 2.1_mtag_no_overlap.bash
¶!python ../mtag/mtag.py \
--sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \
--out ../output/2_special_opts/2.1_mtag_no_overlap_EA \
--no_overlap \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.2 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --stream-stdout \ --no-overlap \ --sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \ --out ../output/2_special_opts/2.1_mtag_no_overlap_EA Beginning MTAG analysis... Read in Trait 1 summary statistics (8146840 SNPs) from ../data/EducAtt_ea2.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 8146840 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8146840 SNPs remain. Removed 0 SNPs with duplicated rs numbers (8146840 SNPs remain). Removed 4434138 SNPs with N < 74074.074074 (3712702 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.85 Lambda GC = 1.627 Max chi^2 = 106.943 5297 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:06:09 2017 Total time elapsed: 38.55s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 3712702 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (1191004 SNPs) from ../data/EducAtt_ukb.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 1191004 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 12960 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 1178044 SNPs remain. Removed 0 SNPs with duplicated rs numbers (1178044 SNPs remain). Removed 0 SNPs with N < 35242.0 (1178044 SNPs remain). Median value of SIGNED_SUMSTATS was -0.000292861940766, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.205 Lambda GC = 1.192 Max chi^2 = 39.518 23 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:06:37 2017 Total time elapsed: 5.74s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 1178044 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 790847 SNPs remaining merging with previous traits. Dropped 1 SNPs due to inconsistent allele pairs from phenotype 2. 790846 SNPs remain. Flipped the signs of of 384828 SNPs to make them consistent with the effect allele orderings of the first trait. ... Merge of GWAS summary statistics complete. Number of SNPs: 790846 Using 790846 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 0.92 0. ] [ 0. 1.028]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. matrix is not positive definite, performing adjustment.. Completed in 0 iterations Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.501 371136 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 2.468 380236 Estimated Omega: [[ 6.492e-06 5.026e-06] [ 5.026e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0. ] [ 0. 1.028]] MTAG results saved to file. MTAG complete. Time elapsed: 3.0m:1.71074914932s
!tail -26 ../output/2_special_opts/2.1_mtag_no_overLap_EA.log
2017/10/18/06:07:40 PM Checking for positive definiteness .. 2017/10/18/06:07:40 PM matrix is not positive definite, performing adjustment.. 2017/10/18/06:07:40 PM Completed in 0 iterations 2017/10/18/06:07:40 PM Completed estimation of Omega ... 2017/10/18/06:07:40 PM Beginning MTAG calculations... 2017/10/18/06:07:41 PM ... Completed MTAG calculations. 2017/10/18/06:07:41 PM Writing Phenotype 1 to file ... 2017/10/18/06:07:58 PM Writing Phenotype 2 to file ... 2017/10/18/06:08:15 PM Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.501 371136 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 2.468 380236 Estimated Omega: [[ 6.492e-06 5.026e-06] [ 5.026e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0. ] [ 0. 1.028]] 2017/10/18/06:08:15 PM 2017/10/18/06:08:15 PM MTAG results saved to file. 2017/10/18/06:08:15 PM MTAG complete. Time elapsed: 3.0m:1.71074914932s
--perfect_gencov
: Different measures of the same trait¶bash 2.2_perfect_gencov.bash
¶# 2.2_perfect_gencov.bash
!python ../mtag/mtag.py \
--sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \
--out ../output/2_special_opts/2.2_mtag_perf_gencov_EA \
--perfect_gencov \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.3 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --stream-stdout \ --perfect-gencov \ --sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \ --out ../output/2_special_opts/2.2_mtag_perf_gencov_EA Beginning MTAG analysis... Read in Trait 1 summary statistics (8146840 SNPs) from ../data/EducAtt_ea2.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 8146840 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8146840 SNPs remain. Removed 0 SNPs with duplicated rs numbers (8146840 SNPs remain). Removed 4434138 SNPs with N < 74074.074074 (3712702 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.85 Lambda GC = 1.627 Max chi^2 = 106.943 5297 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:18:25 2017 Total time elapsed: 39.93s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 3712702 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (1191004 SNPs) from ../data/EducAtt_ukb.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 1191004 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 12960 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 1178044 SNPs remain. Removed 0 SNPs with duplicated rs numbers (1178044 SNPs remain). Removed 0 SNPs with N < 35242.0 (1178044 SNPs remain). Median value of SIGNED_SUMSTATS was -0.000292861940766, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.205 Lambda GC = 1.192 Max chi^2 = 39.518 23 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:18:52 2017 Total time elapsed: 5.76s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 1178044 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 790847 SNPs remaining merging with previous traits. Dropped 1 SNPs due to inconsistent allele pairs from phenotype 2. 790846 SNPs remain. Flipped the signs of of 384828 SNPs to make them consistent with the effect allele orderings of the first trait. ... Merge of GWAS summary statistics complete. Number of SNPs: 790846 Using 790846 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 0.92 0.385] [ 0.385 1.028]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.022 252528 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 2.022 264572 Estimated Omega: [[ 6.492e-06 5.077e-06] [ 5.077e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] MTAG results saved to file. MTAG complete. Time elapsed: 3.0m:0.128737211227s
!tail -26 ../output/2_special_opts/2.2_mtag_perf_gencov_EA.log
2017/10/18/06:19:54 PM Beginning estimation of Omega ... 2017/10/18/06:19:54 PM Using GMM estimator of Omega .. 2017/10/18/06:19:54 PM Checking for positive definiteness .. 2017/10/18/06:19:54 PM Completed estimation of Omega ... 2017/10/18/06:19:54 PM Beginning MTAG calculations... 2017/10/18/06:19:56 PM ... Completed MTAG calculations. 2017/10/18/06:19:56 PM Writing Phenotype 1 to file ... 2017/10/18/06:20:12 PM Writing Phenotype 2 to file ... 2017/10/18/06:20:28 PM Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 2.022 252528 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 2.022 264572 Estimated Omega: [[ 6.492e-06 5.077e-06] [ 5.077e-06 3.970e-06]] Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] 2017/10/18/06:20:28 PM 2017/10/18/06:20:28 PM MTAG results saved to file. 2017/10/18/06:20:28 PM MTAG complete. Time elapsed: 3.0m:0.128737211227s
Analysis | $\Omega$ | Genetic correlation |
---|---|---|
Regular | [6.492e-06 4.815e-06] [4.815e-06 3.970e-06] |
[1.000 0.949] [0.949 1.000] |
No overlap | [6.492e-06 5.026e-06] [5.026e-06 3.970e-06] |
[1.000 0.990] [0.990 1.000] |
Perfect genetic correlation | [6.492e-06 5.077e-06] [5.077e-06 3.970e-06] |
[1.000 1.000] [1.000 1.000] |
--equal_h2
: Performing meta-analysis with mtag¶bash 2.3_mtag_equal_h2.bash
¶!python ../mtag/mtag.py \
--sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \
--out ../output/2_special_opts/2.3_mtag_perf_gencov_EA \
--perfect_gencov \
--equal_h2 \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.3 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --equal-h2 \ --stream-stdout \ --perfect-gencov \ --sumstats ../data/EducAtt_ea2.txt,../data/EducAtt_ukb.txt \ --out ../output/2_special_opts/2.3_mtag_perf_gencov_EA Beginning MTAG analysis... Read in Trait 1 summary statistics (8146840 SNPs) from ../data/EducAtt_ea2.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 8146840 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8146840 SNPs remain. Removed 0 SNPs with duplicated rs numbers (8146840 SNPs remain). Removed 4434138 SNPs with N < 74074.074074 (3712702 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.85 Lambda GC = 1.627 Max chi^2 = 106.943 5297 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:39:18 2017 Total time elapsed: 35.26s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 3712702 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (1191004 SNPs) from ../data/EducAtt_ukb.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 1191004 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 12960 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 1178044 SNPs remain. Removed 0 SNPs with duplicated rs numbers (1178044 SNPs remain). Removed 0 SNPs with N < 35242.0 (1178044 SNPs remain). Median value of SIGNED_SUMSTATS was -0.000292861940766, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.205 Lambda GC = 1.192 Max chi^2 = 39.518 23 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:39:47 2017 Total time elapsed: 6.26s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 1178044 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 790847 SNPs remaining merging with previous traits. Dropped 1 SNPs due to inconsistent allele pairs from phenotype 2. 790846 SNPs remain. Flipped the signs of of 384828 SNPs to make them consistent with the effect allele orderings of the first trait. ... Merge of GWAS summary statistics complete. Number of SNPs: 790846 Using 790846 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 0.92 0.385] [ 0.385 1.028]] Beginning estimation of Omega ... --perfect_gencov and --equal_h2 option used Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 1.998 246761 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 1.998 258530 Omega hat not computed because --equal_h2 was used. Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] MTAG results saved to file. MTAG complete. Time elapsed: 3.0m:0.847772121429s
!tail -26 ../output/2_special_opts/2.3_mtag_perf_gencov_EA.log
2017/10/18/06:40:53 PM Checking for positive definiteness .. 2017/10/18/06:40:53 PM Sigma hat: [[ 0.92 0.385] [ 0.385 1.028]] 2017/10/18/06:40:53 PM Beginning estimation of Omega ... 2017/10/18/06:40:53 PM --perfect_gencov and --equal_h2 option used 2017/10/18/06:40:53 PM Completed estimation of Omega ... 2017/10/18/06:40:53 PM Beginning MTAG calculations... 2017/10/18/06:40:54 PM ... Completed MTAG calculations. 2017/10/18/06:40:54 PM Writing Phenotype 1 to file ... 2017/10/18/06:41:10 PM Writing Phenotype 2 to file ... 2017/10/18/06:41:27 PM Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 .../EducAtt_ea2.txt 250000 134825 790846 2.011 1.998 246761 2 .../EducAtt_ukb.txt 52863 52863 790846 1.204 1.998 258530 Omega hat not computed because --equal_h2 was used. Estimated Sigma: [[ 0.92 0.385] [ 0.385 1.028]] 2017/10/18/06:41:27 PM 2017/10/18/06:41:27 PM MTAG results saved to file. 2017/10/18/06:41:27 PM MTAG complete. Time elapsed: 3.0m:0.847772121429s
--fdr
: Calculates an approximate upper bound on the FDR based on the framework presented. --grid_file
)--intervals
)--cores
)--n_approx
)--intervals 10
and a single core.3.1_max_FDR_basics.bash
¶# 3.1_max_FDR_basics.bash
!python ../mtag/mtag.py \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS \
--stream_stdout \
--fdr \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.3 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --stream-stdout \ --sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \ --fdr \ --out ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS Beginning MTAG analysis... Read in Trait 1 summary statistics (185458 SNPs) from ../data/1_OA2016_hm3samp_NEUR.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 185458 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 185458 SNPs remain. Removed 0 SNPs with duplicated rs numbers (185458 SNPs remain). Removed 162680 SNPs with N < 74074.074074 (22778 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.467 Lambda GC = 1.366 Max chi^2 = 59.523 20 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:44:01 2017 Total time elapsed: 0.63s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 22778 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (167911 SNPs) from ../data/1_OA2016_hm3samp_SWB.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 167911 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 167911 SNPs remain. Removed 0 SNPs with duplicated rs numbers (167911 SNPs remain). Removed 102537 SNPs with N < 74074.074074 (65374 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.208 Lambda GC = 1.188 Max chi^2 = 29.353 1 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 18:44:03 2017 Total time elapsed: 0.68s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 65374 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 20321 SNPs remaining merging with previous traits. ... Merge of GWAS summary statistics complete. Number of SNPs: 20321 Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 1.043 -0.129] [-0.129 0.966]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...hm3samp_NEUR.txt 111111 111111 20321 1.407 1.488 133309 2 ..._hm3samp_SWB.txt 111111 111111 20321 1.269 1.408 168490 Estimated Omega: [[ 3.817e-06 -2.199e-06] [ -2.199e-06 2.336e-06]] Estimated Sigma: [[ 1.043 -0.129] [-0.129 0.966]] MTAG results saved to file. Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time... T=2 Number of gridpoints to search: 65 Performing grid search using 1 cores. Grid search: 10.0 percent finished for . Time: 0.004 min Grid search: 20.0 percent finished for . Time: 0.005 min Grid search: 30.0 percent finished for . Time: 0.007 min Grid search: 40.0 percent finished for . Time: 0.008 min Grid search: 50.0 percent finished for . Time: 0.009 min Grid search: 60.0 percent finished for . Time: 0.010 min Grid search: 70.0 percent finished for . Time: 0.011 min Grid search: 80.0 percent finished for . Time: 0.011 min Grid search: 90.0 percent finished for . Time: 0.012 min Grid search: 100.0 percent finished for . Time: 0.013 min Saved calculations of fdr over grid points in ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS_fdr_mat.txt <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> grid point indices for max FDR for each trait: [12 3] Maximum FDR Max FDR of Trait 1: 0.00109655427181 at probs = [ 0. 0.3 0. 0.7] Max FDR of Trait 2: 0.00436959625046 at probs = [ 0. 0. 0.3 0.7] <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Completed FDR calculations. MTAG complete. Time elapsed: 25.1162800789s
!tail -26 ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS.log
2017/10/18/06:44:25 PM 2017/10/18/06:44:25 PM MTAG results saved to file. 2017/10/18/06:44:25 PM Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time... 2017/10/18/06:44:25 PM T=2 2017/10/18/06:44:25 PM Number of gridpoints to search: 65 2017/10/18/06:44:25 PM Performing grid search using 1 cores. 2017/10/18/06:44:25 PM Grid search: 10.0 percent finished for . Time: 0.004 min 2017/10/18/06:44:25 PM Grid search: 20.0 percent finished for . Time: 0.005 min 2017/10/18/06:44:25 PM Grid search: 30.0 percent finished for . Time: 0.007 min 2017/10/18/06:44:25 PM Grid search: 40.0 percent finished for . Time: 0.008 min 2017/10/18/06:44:25 PM Grid search: 50.0 percent finished for . Time: 0.009 min 2017/10/18/06:44:25 PM Grid search: 60.0 percent finished for . Time: 0.010 min 2017/10/18/06:44:25 PM Grid search: 70.0 percent finished for . Time: 0.011 min 2017/10/18/06:44:25 PM Grid search: 80.0 percent finished for . Time: 0.011 min 2017/10/18/06:44:25 PM Grid search: 90.0 percent finished for . Time: 0.012 min 2017/10/18/06:44:25 PM Grid search: 100.0 percent finished for . Time: 0.013 min 2017/10/18/06:44:25 PM Saved calculations of fdr over grid points in ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS_fdr_mat.txt 2017/10/18/06:44:25 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2017/10/18/06:44:25 PM grid point indices for max FDR for each trait: [12 3] 2017/10/18/06:44:25 PM Maximum FDR 2017/10/18/06:44:25 PM Max FDR of Trait 1: 0.00109655427181 at probs = [ 0. 0.3 0. 0.7] 2017/10/18/06:44:25 PM Max FDR of Trait 2: 0.00436959625046 at probs = [ 0. 0. 0.3 0.7] 2017/10/18/06:44:25 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2017/10/18/06:44:25 PM Completed FDR calculations. 2017/10/18/06:44:25 PM MTAG complete. Time elapsed: 25.1162800789s
--n_approx
¶ --n_approx Speed up FDR calculation by replacing the sample size
of a SNP for each trait by the mean across SNPs (for
each trait). Recommended.
bash 3.2_maxFDR_approx.bash
¶!python ../mtag/mtag.py \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS \
--stream_stdout \
--fdr \
--intervals 100 \
--n_approx \
--stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multitrait Analysis of GWAS <> Version: 1.0.3 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: maghzian@nber.org <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Calling ./mtag.py \ --intervals 100 \ --stream-stdout \ --n-approx \ --sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \ --fdr \ --out ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS Beginning MTAG analysis... Read in Trait 1 summary statistics (185458 SNPs) from ../data/1_OA2016_hm3samp_NEUR.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 185458 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 185458 SNPs remain. Removed 0 SNPs with duplicated rs numbers (185458 SNPs remain). Removed 162680 SNPs with N < 74074.074074 (22778 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.467 Lambda GC = 1.366 Max chi^2 = 59.523 20 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 19:17:06 2017 Total time elapsed: 0.77s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 22778 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Read in Trait 2 summary statistics (167911 SNPs) from ../data/1_OA2016_hm3samp_SWB.txt ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: Allele 1, interpreted as ref allele for signed sumstat. pval: p-Value a2: Allele 2, interpreted as non-ref allele for signed sumstat. freq: Allele frequency z: Directional summary statistic as specified by --signed-sumstats. Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. . done Read 167911 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 167911 SNPs remain. Removed 0 SNPs with duplicated rs numbers (167911 SNPs remain). Removed 102537 SNPs with N < 74074.074074 (65374 SNPs remain). Median value of SIGNED_SUMSTATS was 0.0, which seems sensible. Dropping snps with null values Metadata: Mean chi^2 = 1.208 Lambda GC = 1.188 Max chi^2 = 29.353 1 Genome-wide significant SNPs (some may have been removed by filtering). Conversion finished at Wed Oct 18 19:17:08 2017 Total time elapsed: 0.98s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 65374 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Trait 2 summary statistics: 20321 SNPs remaining merging with previous traits. ... Merge of GWAS summary statistics complete. Number of SNPs: 20321 Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. Checking for positive definiteness .. Sigma hat: [[ 1.043 -0.129] [-0.129 0.966]] Beginning estimation of Omega ... Using GMM estimator of Omega .. Checking for positive definiteness .. Completed estimation of Omega ... Beginning MTAG calculations... ... Completed MTAG calculations. Writing Phenotype 1 to file ... Writing Phenotype 2 to file ... Summary of MTAG results: ------------------------ Trait N (max) N (mean) # SNPs used GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...hm3samp_NEUR.txt 111111 111111 20321 1.407 1.488 133309 2 ..._hm3samp_SWB.txt 111111 111111 20321 1.269 1.408 168490 Estimated Omega: [[ 3.817e-06 -2.199e-06] [ -2.199e-06 2.336e-06]] Estimated Sigma: [[ 1.043 -0.129] [-0.129 0.966]] MTAG results saved to file. Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time... T=2 Number of gridpoints to search: 33580 Performing grid search using 1 cores. Grid search: 10.0 percent finished for . Time: 0.214 min Grid search: 20.0 percent finished for . Time: 0.442 min Grid search: 30.0 percent finished for . Time: 0.674 min Grid search: 40.0 percent finished for . Time: 0.905 min Grid search: 50.0 percent finished for . Time: 1.134 min Grid search: 60.0 percent finished for . Time: 1.357 min Grid search: 70.0 percent finished for . Time: 1.581 min Grid search: 80.0 percent finished for . Time: 1.794 min Grid search: 90.0 percent finished for . Time: 2.048 min Grid search: 100.0 percent finished for . Time: 2.276 min Saved calculations of fdr over grid points in ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS_fdr_mat.txt <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> grid point indices for max FDR for each trait: [ 829 33576] Maximum FDR Max FDR of Trait 1: 0.00110246986219 at probs = [ 0. 0.28 0. 0.72] Max FDR of Trait 2: 0.0271483484174 at probs = [ 0.97 0. 0.01 0.02] <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Completed FDR calculations. MTAG complete. Time elapsed: 3.0m:6.61157798767s
!tail -26 ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS.log
2017/10/18/07:17:30 PM 2017/10/18/07:17:30 PM MTAG results saved to file. 2017/10/18/07:17:30 PM Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time... 2017/10/18/07:17:30 PM T=2 2017/10/18/07:17:54 PM Number of gridpoints to search: 33580 2017/10/18/07:17:54 PM Performing grid search using 1 cores. 2017/10/18/07:18:07 PM Grid search: 10.0 percent finished for . Time: 0.214 min 2017/10/18/07:18:21 PM Grid search: 20.0 percent finished for . Time: 0.442 min 2017/10/18/07:18:35 PM Grid search: 30.0 percent finished for . Time: 0.674 min 2017/10/18/07:18:49 PM Grid search: 40.0 percent finished for . Time: 0.905 min 2017/10/18/07:19:02 PM Grid search: 50.0 percent finished for . Time: 1.134 min 2017/10/18/07:19:16 PM Grid search: 60.0 percent finished for . Time: 1.357 min 2017/10/18/07:19:29 PM Grid search: 70.0 percent finished for . Time: 1.581 min 2017/10/18/07:19:42 PM Grid search: 80.0 percent finished for . Time: 1.794 min 2017/10/18/07:19:57 PM Grid search: 90.0 percent finished for . Time: 2.048 min 2017/10/18/07:20:11 PM Grid search: 100.0 percent finished for . Time: 2.276 min 2017/10/18/07:20:12 PM Saved calculations of fdr over grid points in ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS_fdr_mat.txt 2017/10/18/07:20:12 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2017/10/18/07:20:12 PM grid point indices for max FDR for each trait: [ 829 33576] 2017/10/18/07:20:12 PM Maximum FDR 2017/10/18/07:20:12 PM Max FDR of Trait 1: 0.00110246986219 at probs = [ 0. 0.28 0. 0.72] 2017/10/18/07:20:12 PM Max FDR of Trait 2: 0.0271483484174 at probs = [ 0.97 0. 0.01 0.02] 2017/10/18/07:20:12 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2017/10/18/07:20:12 PM Completed FDR calculations. 2017/10/18/07:20:12 PM MTAG complete. Time elapsed: 3.0m:6.61157798767s
2017/09/28/05:21:04 PM Calculating FDR by setting number of intervals to 10, and True use of approximation.
2017/09/28/05:21:04 PM T=2
2017/09/28/05:21:04 PM Number of gridpoints to search: 92
2017/09/28/05:21:04 PM Performing grid search using 20 cores.
2017/09/28/05:21:05 PM Grid search: 10.0 percent finished for . Time: 0.006 min
2017/09/28/05:21:05 PM Grid search: 20.0 percent finished for . Time: 0.011 min
2017/09/28/05:21:06 PM Grid search: 30.0 percent finished for . Time: 0.016 min
2017/09/28/05:21:06 PM Grid search: 40.0 percent finished for . Time: 0.022 min
2017/09/28/05:21:06 PM Grid search: 50.0 percent finished for . Time: 0.027 min
2017/09/28/05:21:06 PM Grid search: 60.0 percent finished for . Time: 0.032 min
2017/09/28/05:21:07 PM Grid search: 70.0 percent finished for . Time: 0.037 min
2017/09/28/05:21:07 PM Grid search: 80.0 percent finished for . Time: 0.043 min
2017/09/28/05:21:07 PM Grid search: 90.0 percent finished for . Time: 0.048 min
2017/09/28/05:21:08 PM Grid search: 100.0 percent finished for . Time: 0.054 min
2017/09/28/05:21:08 PM Saved calculations of fdr over grid points in /var/genetics/ukb/huili/mtag/output/main/fdr_calculation/interval_10_True_meanapprox_fdr_mat.txt
2017/09/28/05:21:08 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2017/09/28/05:21:08 PM grid point indices for max FDR for each trait: [90 3]
2017/09/28/05:21:08 PM Maximum FDR
2017/09/28/05:21:08 PM Max FDR of Trait 1: 0.104708870765 at probs = [ 0.8 0.1 0. 0.1]
2017/09/28/05:21:08 PM Max FDR of Trait 2: 0.000149109160478 at probs = [ 0. 0. 0.3 0.7]
2017/09/28/05:21:08 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2017/09/28/05:21:08 PM Completed FDR calculations.
2017/09/28/05:21:08 PM Calculating FDR by setting number of intervals to 10, and False use of approximation.
2017/09/28/05:21:08 PM T=2
2017/09/28/05:21:08 PM Number of gridpoints to search: 92
2017/09/28/05:21:08 PM Performing grid search using 20 cores.
2017/09/28/05:22:30 PM Grid search: 10.0 percent finished for . Time: 1.371 min
2017/09/28/05:23:52 PM Grid search: 20.0 percent finished for . Time: 2.743 min
2017/09/28/05:25:15 PM Grid search: 30.0 percent finished for . Time: 4.112 min
2017/09/28/05:26:37 PM Grid search: 40.0 percent finished for . Time: 5.480 min
2017/09/28/05:27:59 PM Grid search: 50.0 percent finished for . Time: 6.858 min
2017/09/28/05:29:22 PM Grid search: 60.0 percent finished for . Time: 8.230 min
2017/09/28/05:30:44 PM Grid search: 70.0 percent finished for . Time: 9.599 min
2017/09/28/05:32:07 PM Grid search: 80.0 percent finished for . Time: 10.978 min
2017/09/28/05:33:29 PM Grid search: 90.0 percent finished for . Time: 12.350 min
2017/09/28/05:34:52 PM Grid search: 100.0 percent finished for . Time: 13.728 min
2017/09/28/05:34:52 PM Saved calculations of fdr over grid points in /var/genetics/ukb/huili/mtag/output/main/fdr_calculation/interval_10_False_meanapprox_fdr_mat.txt
2017/09/28/05:34:52 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2017/09/28/05:34:52 PM grid point indices for max FDR for each trait: [90 3]
2017/09/28/05:34:52 PM Maximum FDR
2017/09/28/05:34:52 PM Max FDR of Trait 1: 0.104764850463 at probs = [ 0.8 0.1 0. 0.1]
2017/09/28/05:34:52 PM Max FDR of Trait 2: 0.000149016665847 at probs = [ 0. 0. 0.3 0.7]
2017/09/28/05:34:52 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2017/09/28/05:34:52 PM Completed FDR calculations.
Ways to get help:
MTAG allows analysis of GWAS of multiple correlated traits using summary statistics
Identifies more loci and improves polygenic prediction for depressive symptoms, neuroticism, and subjective well-being
Code is publicly available at: https://github.com/omeed-maghzian/mtag