This function adds an index column to the input data frame and filters chromosomes by SNP p-value. If at least one SNP on a chromosome has a p-value below the specified threshold, all SNPs on that chromosome are retained, including those that do not pass the threshold themselves. Otherwise, all SNPs on that chromosome are removed.

Usage

filter_chr_basedonSNP_p_qtltools(
  df,
  chr_col = "CHR",
  snp_col = "Variant_ID",
  gene_col = "Gene",
  p_val_col = "nominal_P_value",
  threshold = 0.00157
)

Arguments

df

A data frame containing SNP data.

chr_col

Character string specifying the name of the chromosome column. Default is "CHR".

snp_col

Character string specifying the name of the SNP identifier column. Default is "Variant_ID".

gene_col

Character string specifying the name of the gene column. Default is "Gene".

p_val_col

Character string specifying the name of the p-value column. Default is "nominal_P_value".

threshold

Numeric value specifying the p-value threshold. Default is 1.57e-3 (the threshold for the HEIDI test).

Value

A filtered data frame with an added index column.

Examples

library(dplyr)
eqtl_data <- data.frame(
Gene = c("G1", "G1", "G2", "G2", "G3"),
CHR = c("1", "1", "2", "2", "3"),
Variant_ID = c("rs1", "rs2", "rs3", "rs4", "rs5"),
nominal_P_value = c(1e-9, 0.05, 0.2, 1e-7, 0.3)
)
eqtl_data
#>   Gene CHR Variant_ID nominal_P_value
#> 1   G1   1        rs1           1e-09
#> 2   G1   1        rs2           5e-02
#> 3   G2   2        rs3           2e-01
#> 4   G2   2        rs4           1e-07
#> 5   G3   3        rs5           3e-01
df_filtered <- filter_chr_basedonSNP_p_qtltools(
  df = eqtl_data,
  chr_col = "CHR",
  snp_col = "Variant_ID",
  p_val_col = "nominal_P_value",
  threshold = 5e-8
)
df_filtered
#> # A tibble: 2 × 4
#>   Gene  CHR   Variant_ID nominal_P_value
#>   <chr> <chr> <chr>                <dbl>
#> 1 G1    1     rs1            0.000000001
#> 2 G1    1     rs2            0.05
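
The chromosome-level rule described above can also be sketched in base R. This is only an illustration of the logic, not the package's actual implementation: the helper name `keep_chr_with_hit` and the `toy` data frame are hypothetical, the strict `<` comparison is an assumption, and the index column is not reproduced.

```r
# Hypothetical helper, not part of the package: keep every row on any
# chromosome that has at least one p-value below `threshold`.
keep_chr_with_hit <- function(df, chr_col = "CHR",
                              p_val_col = "nominal_P_value",
                              threshold = 0.00157) {
  # Rows whose p-value is below the threshold (which() drops NA p-values)
  hit_rows <- which(df[[p_val_col]] < threshold)
  # Chromosomes containing at least one such row
  hit_chrs <- unique(df[[chr_col]][hit_rows])
  # Retain all rows on those chromosomes, significant or not
  df[df[[chr_col]] %in% hit_chrs, , drop = FALSE]
}

toy <- data.frame(
  CHR = c("1", "1", "2", "2", "3"),
  Variant_ID = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  nominal_P_value = c(1e-9, 0.05, 0.2, 1e-7, 0.3)
)

kept <- keep_chr_with_hit(toy, threshold = 5e-8)
kept$Variant_ID
#> [1] "rs1" "rs2"
```

Chromosome 2 is dropped because its smallest p-value (1e-7) is still above 5e-8, while rs2 is kept despite its p-value of 0.05 because rs1 on the same chromosome passes the threshold.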