createMutabilityMatrix - Builds a mutability model
createMutabilityMatrix builds a 5-mer nucleotide mutability model by counting
the number of mutations occurring in the center position for all 5-mer motifs.
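The counting idea can be illustrated with a toy base-R sketch. This is not the shazam implementation: `count_center_mutations()` is a hypothetical helper, it works on ungapped toy strings, it ignores the substitution model and the replacement/silent distinction, and its handling of the `multipleMutation` setting is only one plausible reading of the argument description on this page.

```r
# Toy illustration only; NOT the shazam implementation.
# count_center_mutations() is a hypothetical helper.
count_center_mutations <- function(germline, observed,
                                   multipleMutation = c("independent", "ignore")) {
  multipleMutation <- match.arg(multipleMutation)
  g <- strsplit(germline, "")[[1]]
  o <- strsplit(observed, "")[[1]]
  stopifnot(length(g) == length(o), length(g) >= 5)

  mutated <- g != o
  counts <- integer(0)

  # Visit every position that can be the center of a 5-mer
  for (i in 3:(length(g) - 2)) {
    if (!mutated[i]) next
    window <- (i - 2):(i + 2)
    # Under "ignore", skip 5-mers that carry more than one mutation
    if (multipleMutation == "ignore" && sum(mutated[window]) > 1) next
    # Name the count by the germline 5-mer around the mutated center
    motif <- paste(g[window], collapse = "")
    counts[motif] <- if (is.na(counts[motif])) 1L else counts[motif] + 1L
  }
  counts
}

# Two adjacent mutations (positions 4 and 5) share their 5-mer windows
germline <- "ACGTACGTAC"
observed <- "ACGACCGTAC"
ind <- count_center_mutations(germline, observed, "independent")
ign <- count_center_mutations(germline, observed, "ignore")
```

In this toy case `ind` holds one count for each of the two germline 5-mers centered on a mutated position, while `ign` is empty because both windows contain two mutations.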
createMutabilityMatrix(db, substitutionModel, model = c("RS", "S"),
    sequenceColumn = "SEQUENCE_IMGT", germlineColumn = "GERMLINE_IMGT_D_MASK",
    vCallColumn = "V_CALL", multipleMutation = c("independent", "ignore"),
    minNumSeqMutations = 500, numSeqMutationsOnly = FALSE, returnSource = FALSE)
- db: data.frame containing sequence data.
- substitutionModel: matrix of 5-mer substitution rates built by createSubstitutionMatrix.
- model: type of model to create. The default model, "RS", creates a model by counting both replacement and silent mutations. The "S" specification builds a model by counting only silent mutations.
- sequenceColumn: name of the column containing IMGT-gapped sample sequences.
- germlineColumn: name of the column containing IMGT-gapped germline sequences.
- vCallColumn: name of the column containing the V-segment allele call.
- multipleMutation: string specifying how to handle multiple mutations occurring within the same 5-mer. If "independent", then multiple mutations within the same 5-mer are counted independently. If "ignore", then 5-mers with multiple mutations are excluded from the total mutation tally.
- minNumSeqMutations: minimum number of mutations in sequences containing each 5-mer required to compute the mutability rates. If the number is smaller than this threshold, the mutability for the 5-mer will be inferred. Default is 500. Not required if numSeqMutationsOnly=TRUE.
- numSeqMutationsOnly: when TRUE, return only a vector counting the number of observed mutations in sequences containing each 5-mer. This option can be used for parameter tuning for minNumSeqMutations during preliminary analysis using minNumSeqMutationsTune. Default is FALSE.
- returnSource: return the sources of 5-mer mutabilities (measured vs. inferred). Default is FALSE.
When numSeqMutationsOnly is FALSE, a named numeric vector of 1024 normalized mutability rates for each 5-mer motif, with names defining the 5-mer.
When numSeqMutationsOnly is TRUE, a named numeric vector of length 1024 counting the number of observed mutations in sequences containing each 5-mer.
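As a rough sanity check on the shape of the returned vector, the 1024 names can be enumerated in base R. The sketch below assumes the names are exactly the 5-mers over A, C, G and T, which is what the count 1024 = 4^5 suggests; `is_fivemer_vector()` is a hypothetical helper, and `toy_rates` is a stand-in for a real result.

```r
nucleotides <- c("A", "C", "G", "T")

# Every combination of five nucleotides: 4^5 = 1024 motifs
fivemers <- sort(do.call(paste0, expand.grid(rep(list(nucleotides), 5),
                                             stringsAsFactors = FALSE)))

# Hypothetical helper: TRUE if x is a named numeric vector
# with one entry per 5-mer over A, C, G, T
is_fivemer_vector <- function(x) {
  is.numeric(x) && length(x) == 1024 && !is.null(names(x)) &&
    setequal(names(x), fivemers)
}

# Stand-in for a real result: uniform toy values, one per motif
toy_rates <- setNames(rep(1 / 1024, 1024), fivemers)
ok <- is_fivemer_vector(toy_rates)
```

The same check could be applied to an actual model or count vector before passing it to downstream code.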
Caution: The targeting model functions do NOT support ambiguous
characters in their inputs. You MUST make sure that your input and germline
sequences do NOT contain ambiguous characters (especially if they are
clonal consensuses returned from
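One way to screen sequences before building a model is a simple pattern check. The sketch below is an assumption-laden example: `has_ambiguous()` is a hypothetical helper, and it treats only the IUPAC codes R, Y, S, W, K, M, B, D, H and V as ambiguous. Whether N should also be flagged is not stated here, so N and the gap characters "." and "-" are left alone.

```r
# Hypothetical helper, not part of any package.
# Flags sequences containing IUPAC ambiguity codes other than N.
has_ambiguous <- function(seqs) {
  grepl("[RYSWKMBDHV]", toupper(seqs))
}

seqs <- c("ACGT..ACGT",   # gaps only: not flagged
          "ACGRTACGT",    # contains R: flagged
          "NNNACGT-AC")   # N and gap: not flagged under this assumption
flags <- has_ambiguous(seqs)
```

Applied to a data.frame, the same helper could be run on both the sample and germline sequence columns, and flagged rows reviewed or removed before modeling.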
- Yaari G, et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol. 2013 4(November):358.
# Subset example data to one isotype and sample as a demo
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, ISOTYPE == "IgA" & SAMPLE == "-1h")

# Create model using only silent mutations
sub_model <- createSubstitutionMatrix(db, model="S")
mut_model <- createMutabilityMatrix(db, sub_model, model="S",
                                    minNumSeqMutations=200,
                                    numSeqMutationsOnly=FALSE)
Warning: Insufficient number of mutations to infer some 5-mers. Filled with 0.
# Count the number of mutations in sequences containing each 5-mer
mut_count <- createMutabilityMatrix(db, sub_model, model="S",
                                    numSeqMutationsOnly=TRUE)
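A count vector like mut_count can be compared against candidate thresholds to see how many 5-mers would have their mutability computed directly rather than inferred. The sketch below uses simulated counts in place of real output and plain base R rather than minNumSeqMutationsTune; `toy_count` and the threshold values are made up for illustration.

```r
# Simulated stand-in for a 1024-element count vector (NOT real data)
nucleotides <- c("A", "C", "G", "T")
fivemers <- do.call(paste0, expand.grid(rep(list(nucleotides), 5),
                                        stringsAsFactors = FALSE))
set.seed(42)
toy_count <- setNames(sample(0:1000, 1024, replace = TRUE), fivemers)

# Candidate values for minNumSeqMutations (arbitrary choices)
thresholds <- c(100, 200, 500)

# 5-mers whose count reaches the threshold would be measured directly;
# per the argument description, those below it would be inferred
n_measured <- vapply(thresholds,
                     function(thr) sum(toy_count >= thr),
                     integer(1))
names(n_measured) <- thresholds
n_inferred <- length(toy_count) - n_measured
```

Raising the threshold can only decrease the number of directly measured 5-mers, so the trade-off is between per-motif reliability and the share of motifs that must be inferred.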