Filter Chromosomes Based on SNP P-value Threshold for .qtltoolsnomi files
filter_chr_basedonSNP_p_qtltools.Rd
This function adds an index column to the input data frame and filters chromosomes according to whether any SNP on the chromosome has a p-value below the specified threshold. If a chromosome has at least one SNP that meets the threshold, all SNPs on that chromosome are retained, including those that do not meet the threshold themselves. Otherwise, all SNPs on that chromosome are removed.
Usage
filter_chr_basedonSNP_p_qtltools(
df,
chr_col = "CHR",
snp_col = "Variant_ID",
gene_col = "Gene",
p_val_col = "nominal_P_value",
threshold = 0.00157
)
Arguments
- df
A data frame containing SNP data.
- chr_col
Character string specifying the name of the chromosome column. Default is "CHR".
- snp_col
Character string specifying the name of the SNP identifier column. Default is "Variant_ID".
- gene_col
Character string specifying the name of the gene column. Default is "Gene".
- p_val_col
Character string specifying the name of the p-value column. Default is "nominal_P_value".
- threshold
Numeric value specifying the p-value threshold. Default is 0.00157 (1.57e-3, the threshold for the HEIDI test).
Examples
library(dplyr)
eqtl_data <- data.frame(
Gene = c("G1", "G1", "G2", "G2", "G3"),
CHR = c("1", "1", "2", "2", "3"),
Variant_ID = c("rs1", "rs2", "rs3", "rs4", "rs5"),
nominal_P_value = c(1e-9, 0.05, 0.2, 1e-7, 0.3)
)
eqtl_data
#> Gene CHR Variant_ID nominal_P_value
#> 1 G1 1 rs1 1e-09
#> 2 G1 1 rs2 5e-02
#> 3 G2 2 rs3 2e-01
#> 4 G2 2 rs4 1e-07
#> 5 G3 3 rs5 3e-01
df_filtered <- filter_chr_basedonSNP_p_qtltools(
df = eqtl_data,
chr_col = "CHR",
snp_col = "Variant_ID",
p_val_col = "nominal_P_value",
threshold = 5e-8
)
df_filtered
#> # A tibble: 2 × 4
#> Gene CHR Variant_ID nominal_P_value
#> <chr> <chr> <chr> <dbl>
#> 1 G1 1 rs1 0.000000001
#> 2 G1 1 rs2 0.05
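The chromosome-level rule described above can be approximated with a grouped dplyr filter. The following is a minimal sketch, not the package's actual source: the helper name keep_chr_with_hit is hypothetical, the strict "<" comparison is an assumption (the real function may use "<="), and the index column mentioned in the description is not reproduced.

```r
library(dplyr)

# Hypothetical helper for illustration only; not part of the package.
keep_chr_with_hit <- function(df,
                              chr_col = "CHR",
                              p_val_col = "nominal_P_value",
                              threshold = 0.00157) {
  df %>%
    # Group rows by the chromosome column named in chr_col
    group_by(across(all_of(chr_col))) %>%
    # Keep every row of a chromosome if at least one SNP on it has a
    # p-value below the threshold (assumption: strict inequality)
    filter(any(.data[[p_val_col]] < threshold)) %>%
    ungroup()
}

# Same toy data as in the example above
eqtl_data <- data.frame(
  Gene = c("G1", "G1", "G2", "G2", "G3"),
  CHR = c("1", "1", "2", "2", "3"),
  Variant_ID = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  nominal_P_value = c(1e-9, 0.05, 0.2, 1e-7, 0.3)
)

sketch <- keep_chr_with_hit(eqtl_data, threshold = 5e-8)
sketch
```

With threshold = 5e-8, only chromosome 1 has a qualifying SNP (rs1), so both of its rows are kept, including rs2 with p = 0.05. Chromosome 2 is dropped because its smallest p-value (1e-7) does not fall below the threshold.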