
Specify carrier status for given HLA allele(s) in UK Biobank
Extract per-individual carrier status (0/1/2 copies) for user-specified HLA allele queries from the output of ukb_hla_typing().
Arguments
- res
Output of ukb_hla_typing(), i.e. list(calls=..., genotype=...).
- specify_alleles
Character vector of allele queries. Multiple formats are supported:
  - Family-level (2-digit) queries, which match any xx:yy within the family:
    - "B27" or "HLA-B27": matches any HLA-B*27:xx.
    - "B*27" or "B*27:": matches any HLA-B*27:xx.
    - "DQB1*06": matches any HLA-DQB1*06:xx.
  - Exact (4-digit) allele queries, which match a specific xx:yy allele:
    - "A*02:01" or "HLA-A*02:01": matches exactly HLA-A*02:01.
    - "B*27:05" or "HLA-B*27:05": matches exactly HLA-B*27:05.
  - UK Biobank header-style queries (column names in the Field 22182 header):
    - "DRB1_1501", "B_2705", "C_401": converted internally to HLA-<locus>*xx:yy (e.g. "B_2705" -> "HLA-B*27:05"). Three-digit codes are left-padded to four digits (e.g. "C_401" -> "HLA-C*04:01").
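All three query formats above reduce to a locus plus either an exact allele name or a family prefix. The sketch below illustrates one way to do that normalisation. `normalize_hla_query()` and `matches_query()` are hypothetical helpers written for illustration, not the package's internal parser, and they assume allele names are written as HLA-<locus>*xx:yy. The compact family form without "*" (e.g. "B27") is handled here only for letter-only loci, since a string such as "DRB115" would be ambiguous.

```r
# Hypothetical helpers for illustration only; not the package's internal parser.
# Assumes allele names look like "HLA-<locus>*xx:yy".

normalize_hla_query <- function(query) {
  q <- sub("^HLA-", "", query)  # the "HLA-" prefix is optional

  # UK Biobank header style, e.g. "B_2705", "C_401", "DRB1_1501"
  if (grepl("^[A-Z0-9]+_[0-9]{3,4}$", q)) {
    locus <- sub("_.*$", "", q)
    code <- sprintf("%04d", as.integer(sub("^.*_", "", q)))  # "401" -> "0401"
    target <- paste0("HLA-", locus, "*", substr(code, 1, 2), ":", substr(code, 3, 4))
    return(list(locus = locus, type = "exact", target = target))
  }

  # Star notation: "B*27", "B*27:", "DQB1*06" (family) or "A*02:01" (exact)
  m <- regmatches(q, regexec("^([A-Z0-9]+)\\*([0-9]{2})(:([0-9]{2})?)?$", q))[[1]]
  if (length(m) > 0) {
    if (!is.na(m[5]) && nzchar(m[5])) {
      return(list(locus = m[2], type = "exact",
                  target = paste0("HLA-", m[2], "*", m[3], ":", m[5])))
    }
    return(list(locus = m[2], type = "family",
                target = paste0("HLA-", m[2], "*", m[3], ":")))
  }

  # Compact family form for letter-only loci, e.g. "B27"
  m <- regmatches(q, regexec("^([A-Z]+)([0-9]{2})$", q))[[1]]
  if (length(m) > 0) {
    return(list(locus = m[2], type = "family",
                target = paste0("HLA-", m[2], "*", m[3], ":")))
  }

  stop("Unrecognised HLA query: ", query)
}

# TRUE where an allele name satisfies a normalised query:
# exact queries need string equality, family queries a shared "HLA-<locus>*xx:" prefix
matches_query <- function(alleles, nq) {
  if (nq$type == "exact") alleles == nq$target else startsWith(alleles, nq$target)
}
```

For example, normalize_hla_query("C_401")$target gives "HLA-C*04:01", and matches_query(c("HLA-B*27:05", "HLA-B*07:02"), normalize_hla_query("B27")) gives c(TRUE, FALSE).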
Value
A long-format table with one row per eid × query, containing:
- eid: Individual identifier (for individuals with HLA data).
- query: The original query string as provided in specify_alleles.
- locus: Parsed locus (e.g., "B", "A", "DRB1").
- copies: Integer 0/1/2 giving the total number of matched allele copies.
- carrier: Logical; TRUE if copies > 0.
- matched: Matched allele_pretty values (semicolon-separated); NA if non-carrier.
- max_q: Maximum per-allele posterior among matched calls; NA if non-carrier.
The examples show how to convert this table to a wide format for downstream analysis.
Details
This function searches within res$calls, which ukb_hla_typing() has already filtered by the posterior threshold, and returns, for each query, the number of matched allele copies carried by each individual.
The copy number (copies) is the sum of allele_copies over the matched calls in res$calls, so carrier status reflects only those retained (high-confidence) allele calls.
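The copy-number logic above amounts to keeping the matched rows of res$calls and summarising them per individual. A minimal base-R sketch follows. It assumes res$calls has columns eid, allele_pretty, allele_copies and a per-allele posterior column, called q here only as a placeholder; the helper summarise_query() and the toy data are hypothetical, and the package's actual implementation may differ.

```r
# Hypothetical sketch of the per-individual summary; not the package's code.
# Assumed columns of `calls`: eid, allele_pretty, allele_copies, and a
# per-allele posterior named `q` (placeholder name).
summarise_query <- function(calls, eids, hit) {
  # `hit` is a logical vector over rows of `calls`: TRUE if the allele matches
  m <- calls[hit, , drop = FALSE]
  per_eid <- lapply(eids, function(id) m[m$eid == id, , drop = FALSE])

  # copies = sum of allele_copies over matched calls (0 if none)
  copies <- vapply(per_eid, function(d) as.integer(sum(d$allele_copies)), integer(1))
  # matched alleles joined with ";", NA for non-carriers
  matched <- vapply(per_eid, function(d) {
    if (nrow(d) == 0) NA_character_ else paste(d$allele_pretty, collapse = ";")
  }, character(1))
  # highest posterior among matched calls, NA for non-carriers
  max_q <- vapply(per_eid, function(d) {
    if (nrow(d) == 0) NA_real_ else max(d$q)
  }, numeric(1))

  data.frame(eid = eids, copies = copies, carrier = copies > 0,
             matched = matched, max_q = max_q, stringsAsFactors = FALSE)
}

# Toy calls (assumed already threshold-filtered), for illustration only
calls <- data.frame(
  eid = c(1, 1, 2, 3),
  allele_pretty = c("HLA-B*27:05", "HLA-B*27:02", "HLA-B*07:02", "HLA-B*27:05"),
  allele_copies = c(1L, 1L, 2L, 2L),
  q = c(0.95, 0.91, 0.99, 0.88),
  stringsAsFactors = FALSE
)

# Family-level query "B27": match any HLA-B*27:xx allele
hit <- startsWith(calls$allele_pretty, "HLA-B*27:")
out <- summarise_query(calls, eids = c(1, 2, 3), hit)
out
```

Individual 2 carries only HLA-B*07:02, so copies is 0 and matched and max_q are NA, following the non-carrier convention described under Value.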
Examples
if (FALSE) { # \dontrun{
# `res` is the output of ukb_hla_typing()
library(dplyr)
library(tidyr)

# 1) Single query: family-level (2-digit)
res_s1 <- ukb_hla_specify(res, "B27")

# 2) Multiple queries: mixed formats
res_s2 <- ukb_hla_specify(res, c("B27", "A*02:01", "DRB1_1501"))

# 3) Queries with the optional "HLA-" prefix, and family-level queries with "*"
res_s3 <- ukb_hla_specify(res, c("HLA-B27", "DQB1*06"))

# Downstream analyses usually only need copies and carrier status per query.
# Build a wide table with columns:
#   eid, <query1>, <query1>_carrier, <query2>, <query2>_carrier, ...
carrier_wide <- res_s2 %>%
  transmute(eid, query,
            copies = as.integer(copies),
            carrier01 = as.integer(carrier)) %>%
  pivot_longer(c(copies, carrier01), names_to = "stat", values_to = "value") %>%
  mutate(key = if_else(stat == "copies", query, paste0(query, "_carrier"))) %>%
  select(eid, key, value) %>%
  pivot_wider(names_from = key, values_from = value,
              values_fill = list(value = 0L))
} # }