consensusSequence - Construct a consensus sequence
Description¶
Construct a consensus sequence
Usage¶
consensusSequence(
    sequences,
    db = NULL,
    method = c("mostCommon", "thresholdedFreq", "catchAll", "mostMutated", "leastMutated"),
    minFreq = NULL,
    muFreqColumn = NULL,
    lenLimit = NULL,
    includeAmbiguous = FALSE,
    breakTiesStochastic = FALSE,
    breakTiesByColumns = NULL
)
Arguments¶
- sequences
- character vector of sequences.
- db
- data.frame containing sequence data for a single clone. Applicable to and required for the "mostMutated" and "leastMutated" methods. Default is NULL.
- method
- method to calculate consensus sequence. One of "thresholdedFreq", "mostCommon", "catchAll", "mostMutated", or "leastMutated". See “Methods” under collapseClones for details.
- minFreq
- frequency threshold for calculating input consensus sequence. Applicable to and required for the "thresholdedFreq" method. A canonical choice is 0.6. Default is NULL.
- muFreqColumn
- character name of the column in db containing mutation frequency. Applicable to and required for the "mostMutated" and "leastMutated" methods. Default is NULL.
- lenLimit
- limit on consensus length. If NULL, no length limit is set.
- includeAmbiguous
- whether to use ambiguous characters to represent positions at which there are multiple characters with frequencies that are at least minFreq or that are maximal (i.e. ties). Applicable to and required for the "thresholdedFreq" and "mostCommon" methods. Default is FALSE. See “Choosing ambiguous characters” under collapseClones for rules on choosing ambiguous characters. Note: this argument refers to the use of ambiguous nucleotides in the output consensus sequence. Ambiguous nucleotides in the input sequences are allowed for the "catchAll", "mostMutated" and "leastMutated" methods.
- breakTiesStochastic
- In case of ties, whether to randomly pick a sequence from sequences that fulfill the criteria as consensus. Applicable to and required for all methods except for "catchAll". Default is FALSE. See “Methods” under collapseClones for details.
- breakTiesByColumns
- A list of the form list(c(col_1, col_2, ...), c(fun_1, fun_2, ...)), where col_i is a character name of a column in db, and fun_i is a function to be applied on that column. Currently, only max and min are supported. Note that the two c()’s in list() are essential (i.e. if there is only 1 column, the list should be of the form list(c(col_1), c(fun_1))). Applicable to and optional for the "mostMutated" and "leastMutated" methods. If supplied, fun_i’s are applied on col_i’s to help break ties. Default is NULL. See “Methods” under collapseClones for details.
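The list(c(...), c(...)) shape described for breakTiesByColumns can be easier to see on toy data. The base R sketch below is illustrative only: toy_db, its column names, and the break_ties helper are hypothetical and are not part of shazam. It also assumes that the rules are applied in the order given, which should be checked against “Methods” under collapseClones.

```r
# Toy stand-in for `db`: three sequences tied on mutation frequency.
# All column names are hypothetical and chosen only for illustration.
toy_db <- data.frame(
    sequence_id = c("s1", "s2", "s3"),
    mu_freq = c(0.10, 0.10, 0.10),
    duplicate_count = c(4, 9, 9),
    consensus_count = c(20, 35, 12),
    stringsAsFactors = FALSE
)

# Shape described for the argument: column names in one c(),
# functions in a second c() (which yields a list of functions in R).
tie_rules <- list(c("duplicate_count", "consensus_count"), c(max, min))

# Hypothetical helper (an assumption, not shazam's code): apply each rule
# in turn, keeping only the rows that reach the requested extreme.
break_ties <- function(db, rules) {
    cols <- rules[[1]]
    funs <- rules[[2]]
    keep <- seq_len(nrow(db))
    for (i in seq_along(cols)) {
        values <- db[[cols[i]]][keep]
        keep <- keep[values == funs[[i]](values)]
        if (length(keep) == 1) break
    }
    keep
}

winner <- break_ties(toy_db, tie_rules)
toy_db$sequence_id[winner]
```

Here max on duplicate_count narrows the tie to s2 and s3, and min on consensus_count then selects s3. With a single column, the same shape would be list(c("duplicate_count"), c(max)).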
Value¶
A list containing cons, which is a character string that is the consensus sequence for sequences; and muFreq, which is the maximal/minimal mutation frequency of the consensus sequence for the "mostMutated" and "leastMutated" methods, or NULL for all other methods.
Details¶
See collapseClones for detailed documentation on methods and additional parameters.
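As a rough intuition for the frequency-based methods, the base R sketch below builds a consensus position by position from aligned, equal-length toy sequences. It is a simplified illustration and not shazam's implementation: toy_consensus and its min_freq argument are hypothetical, ties simply go to the first character in alphabetical order, and the fallback to "N" when no character reaches the threshold is an assumption. It also ignores ambiguous characters, gaps, and lenLimit.

```r
# Hypothetical helper, not part of shazam: position-wise consensus
# for aligned sequences of equal length.
toy_consensus <- function(sequences, min_freq = NULL) {
    # One row per sequence, one column per alignment position
    chars <- do.call(rbind, strsplit(sequences, ""))
    cons <- apply(chars, 2, function(column) {
        freqs <- table(column) / length(column)
        top <- names(freqs)[which.max(freqs)]
        # With a threshold, fall back to "N" when no character reaches it
        if (!is.null(min_freq) && max(freqs) < min_freq) "N" else top
    })
    paste(cons, collapse = "")
}

toy_seqs <- c("ACGT", "ACGA", "ACTA", "AGGC", "TCGG")

toy_consensus(toy_seqs)                  # most-common style: "ACGA"
toy_consensus(toy_seqs, min_freq = 0.6)  # thresholded style: "ACGN"
```

In the last position the most frequent character, A, appears in only 2 of 5 sequences, so it is kept by the most-common style but replaced by "N" once a 0.6 threshold is applied.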
Examples¶
# Subset example data
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, c_call %in% c("IGHA", "IGHG") & sample_id == "+7d")
clone <- subset(db, clone_id == "3192")
# First compute mutation frequency for most/leastMutated methods
clone <- observedMutations(clone, frequency=TRUE, combine=TRUE)
# Manually create a tie
clone <- rbind(clone, clone[which.max(clone$mu_freq), ])
# ThresholdedFreq method.
# Resolve ties deterministically without using ambiguous characters
cons1 <- consensusSequence(clone$sequence_alignment,
                           method="thresholdedFreq", minFreq=0.3,
                           includeAmbiguous=FALSE,
                           breakTiesStochastic=FALSE)
cons1$cons
[1] "GAGGTGCAGCTGGTGGTCTCTGGGGGA...GGCTTGGTACAGCCAGGGCGGTCCCTAAGACTCTCCTGTACAGTTTCTGGATTCACCTTT............GGTGATTATGCTATGACGTGGATCCGCCAGGCTCCTGGGAAGGGGCTGGAGTGGGTCGGTTTCATTAGAAGCAAAACTTTTGGTGGGACAGCAGATTACGCCGCGTTTGTGAGA...GGCAGATTCACCATCTCAAGAGATGATTCCAAAAACATCGCCTATCTGCAATTGAACAGCCTGAAAACCGAGGACACAGGCGTCTATTACTGTGGTAGAGATCTCGCCGTAACTGACACAATAGGTGGTACTAACTGGTTCGACCCCTGGGGCCAGGGGACCCCGGTCACCGTCTCCTCAG"