This vignette shows some examples of using GxEScanR to perform genome-wide association study (GWAS) and genome-wide by environment interaction study (GWEIS) scans using all the options available to the user.

First 5 Subjects
sid	y	e
I1	0	0
I2	0	0
I3	0	0
I4	0	0
I5	0	0

Examples

Linear Regression GWAS

The simplest scan to do is a linear regression GWAS. The following model is first when doing a linear regression GWAS.

Model Fit
model	outcome	predictors	subjects
D\|E,G	Phenotype	All covariates, gene	All

In the example data set, the phenotype, y, is coded 0,1. When GxEScanR sees the phenotype codes this way it assumes the outcome is binary and uses logistic regression. To perform a linear regression GWAS the binary option needs to be set to FALSE. The following shows how to do a linear GWAS along with the results.

The routine outputs the number of subjects used in the analysis and returns a data frame with the results.

lingwas1 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 binary = FALSE)
#> [1] "200 subjects have complete data"

Linear Regression GWAS
snp	betag	lrtg
1:10001	0.1605708	9.0532054
1:10002	-0.0220380	0.1219035
1:10003	-0.0732687	1.3812032
1:10004	0.0326160	0.2355224
1:10005	0.0627953	1.2072107

The output can be redirected to output file that produces a plain test version of the results in a tab delimited file that can be read into R using the read.table routine. In this case, the gwas routine returns a value of 0.

outfile <- tempfile()
lingwas2 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 outfile = outfile,
                 binary = FALSE)
#> [1] "200 subjects have complete data"
lingwas2
#> [1] 0
lingwas2 <- read.table(outfile, header = TRUE, sep ='\t')

Linear Regression GWAS
SNPID	betag	lrtg
1:10001	0.1605708	9.0532054
1:10002	-0.0220380	0.1219035
1:10003	-0.0732687	1.3812032
1:10004	0.0326160	0.2355224
1:10005	0.0627953	1.2072107

Linear Regression GWEIS

The gweis routine takes the same parameters as the gwas function but performs additional tests. The models fit for a linear regression GWAS are.

Models Fit
model	outcome	predictors	subjects
D\|E,G	Phenotype	All covariates, gene	All
D\|E,G,GxE	Phenotype	All covariates, gene, gene x last covariate	All

Note: When doing a GWEIS the interaction covariate is in the last column of the subject data frame.

In this test the minmaf option was used. When minmaf is specified the minor allele for a SNP must exceed minmaf to be test. Notice that only 5 SNPs are in the output data frame. This is because one of the SNPs has a minor allele frequency below 0.2.

lingweis1 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   minmaf = 0.2,
                   binary = FALSE)
#> [1] "200 subjects have complete data"

Linear Regression GWEIS
snp	betadg	lrtdg	betagxe	lrtgxe	lrt2df
1:10001	0.1605708	9.0532054	0.0777308	0.5402899	9.593495
1:10002	-0.0220380	0.1219035	0.1484303	1.3479068	1.469810
1:10004	0.0326160	0.2355224	0.1968781	2.0769149	2.312437
1:10005	0.0627953	1.2072107	-0.0442079	0.1401332	1.347344

If the user is interested in see what happened to SNPs that weren’t included in the data frame, the skipfile option can be used. The skipfile value is the name of a file to write the skipped SNPs to. The skipfile option can be used along with the outfile option. The skip file is in the same format as the output file. Below is an example using the skipfile option.

skipfile = tempfile()
lingweis2 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   skipfile = skipfile,
                   minmaf = 0.2,
                   binary = FALSE)
#> [1] "200 subjects have complete data"

Linear Regression GWEIS
snp	betadg	lrtdg	betagxe	lrtgxe	lrt2df
1:10001	0.1605708	9.0532054	0.0777308	0.5402899	9.593495
1:10002	-0.0220380	0.1219035	0.1484303	1.3479068	1.469810
1:10004	0.0326160	0.2355224	0.1968781	2.0769149	2.312437
1:10005	0.0627953	1.2072107	-0.0442079	0.1401332	1.347344

knitr::kable(skipsnps, caption = "Skipped SNPs")

Skipped SNPs
SNPID	reasondg	reasongxe
1:10003	2	2

The following table lists the reasons SNPs were skipped given the skipped value.

Skipped Reasons
code	reason
1	Excluded by user
2	Minor allele frequency below threshold
3	X matrix singular
4	Singular matrix updating beta top
5	Singular matrix updating beta bottom
6	Maximum iterations exceeded

Logistic Regression GWAS

In this example, the phenotype is coded (0,1). The gwas and gweis routines check for this an will run logistic regressions if the outcome is coded (0,1) unless binary is set to FALSE. If the use wants to make sure the outcome is coded (0,1), the user may set binary to TRUE. In this case, if the outcome is not coded (0,1) an error is produced.

The following model is fit when doing a logistic regression GWAS.

Model Fit
model	outcome	predictors	subjects
D\|E,G	Phenotype	All covariates, gene	All

loggwas1 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 blksize = 2,
                 binary = TRUE)
#> [1] "200 subjects have complete data"

Logistic Regression GWAS
snp	betag	lrtg
1:10001	0.7129	9.0230
1:10002	-0.0943	0.1219
1:10003	-0.3144	1.3777
1:10004	0.1397	0.2356
1:10005	0.2680	1.2003

In this example, the option blksize is used. When an analysis is run several SNPs are read in at one time. This saves disk time. The following are the default values for given the number of subjects. These values were chosen to keep the program running using less than 4GB of RAM. The user is allowed to specify a value up to twice the default value. Little performance gain is seen going with larger values. If the user enters 0 for blksize, the default value is used.

Default blksize
subjects	blksize
less than 10,000	5000
10,000 to 24,999	2000
25,000 to 49,999	1000
50,000 to 99,999	500
100,000 to 249,999	200
250,000 to 499,999	100
500,000 or greater	50

Logistic Regression GWEIS

A logistic regression GWEIS fits an additional 4 models that produce 7 more tests. 3 of these models use the the interaction covariate as the outcome. The following show all the models fit in a logistic regression GWEIS.

The following model is fit when doing a logistic regression GWAS.

Models Fit
model	outcome	predictors	subjects
D\|E,G	Phenotype	All covariates, gene	All
D\|E,G,GxE	Phenotype	All covariates, gene, gene x last covariate	All
E\|G	Last covariate	All other covariates, gene	All
E\|G,D=1, case only	Last covariate	All other covariates, gene	Cases
E\|G,D=0 control only	Last covariate	All other covariates, gene	Controls

Note: When doing a GWEIS the interaction covariate is in the last column of the data frame.

Logistic Regression GWEIS with Binary Covariate

In the example subject data, the covariate is coded (0,1). In this case, the gweis routine will use logistic regression to fit the last 3 models.

loggweis1 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   snps = 1:2,
                   binary = TRUE)
#> [1] "200 subjects have complete data"

Logistic Regression GWEIS
snp	betadg	lrtdg	betagxe	lrtgxe	lrt2df	betaeg	lrteg	lrt3df	betacase	lrtcase	betactrl	lrtctrl
1:10001	0.7129	9.0230	0.4931	0.9520	9.9750	0.4115	3.3606	13.3356	0.4290	2.0147	-0.0477	0.0147
1:10002	-0.0943	0.1219	0.6508	1.3569	1.4789	0.0798	0.0910	1.5699	0.3882	1.1018	-0.2562	0.3735

In this example the snps options was used. The snps option can either be a vector of indices indicating what SNPs to include or a list of SNPs by SNP ID. A vector of indices was used in this example.

Logistic Regression GWEIS with a Continuous Covariate

In the example subject data, the covariate is coded (0,1) which the gweis routine sees a binary covariate to make the routine do a linear regression 1 can be added to the interaction covariate. This will change the coding to (1,2) which the routine sees as a continuous covariate.

covdata2 <- covdata
covdata2$e <- covdata2$e + 1
loggweis2 <- gweis(data = covdata2,
                   bdinfo = bdinfo,
                   snps = c("1:10001", "1:10002"))
#> [1] "200 subjects have complete data"

Logistic Regression GWEIS
snp	betadg	lrtdg	betagxe	lrtgxe	lrt2df	betaeg	lrteg	lrt3df	betacase	lrtcase	betactrl	lrtctrl
1:10001	0.7129	9.0230	0.4931	0.9520	9.9750	0.1001	3.4031	13.3781	0.1044	2.0128	-0.0098	0.0147
1:10002	-0.0943	0.1219	0.6508	1.3569	1.4789	0.0194	0.0912	1.5700	0.0947	1.0967	-0.0516	0.3688

In this example the snps options was used with the SNP IDs.

Using GxEScanR

Introduction

Example Files