calcBaseline - Calculate the BASELINe PDFs
calcBaseline calculates the BASELINe posterior probability density
functions (PDFs) for sequences in the given Change-O data.frame.
calcBaseline(db, sequenceColumn = "CLONAL_SEQUENCE", germlineColumn = "CLONAL_GERMLINE", testStatistic = c("local", "focused", "imbalanced"), regionDefinition = NULL, targetingModel = HH_S5F, mutationDefinition = NULL, calcStats = FALSE, nproc = 1)
- db: data.frame containing sequence data and annotations.
- sequenceColumn: character name of the column in db containing input sequences.
- germlineColumn: character name of the column in db containing germline sequences.
- testStatistic: character indicating the statistical framework used to test for selection. One of c("local", "focused", "imbalanced").
- regionDefinition: RegionDefinition object defining the regions and boundaries of the Ig sequences.
- targetingModel: TargetingModel object. Default is HH_S5F.
- mutationDefinition: MutationDefinition object defining replacement and silent mutation criteria. If NULL, replacement and silent mutations are determined by exact amino acid identity. Note that if the input data.frame already contains observed and expected mutation frequency columns, mutations will not be recalculated and this argument will be ignored.
- calcStats: logical indicating whether or not to calculate the summary statistics data.frame stored in the stats slot of a Baseline object.
- nproc: number of cores to distribute the operation over. If nproc = 0, the cluster has already been set and will not be reset.
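With mutationDefinition = NULL, a mutation is classified as replacement or silent purely by whether the germline and observed codons encode the same amino acid. The sketch below illustrates that rule for a single codon using the standard genetic code built in base R. The helper classify_mutation is hypothetical and is not shazam's implementation; it rejects gaps and ambiguous bases rather than handling them, and it treats a change to a stop codon ("*") as a replacement.

```r
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# TCAG order so they line up with the one-letter amino acid string below.
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code <- setNames(amino_acids, codons)

# Hypothetical helper (not part of shazam): classify one germline -> observed
# codon change as replacement ("R") or silent ("S") by exact amino acid
# identity. Gaps, Ns and other ambiguous characters are rejected.
classify_mutation <- function(germline, observed) {
  germline <- toupper(germline)
  observed <- toupper(observed)
  if (!all(c(germline, observed) %in% codons)) {
    stop("both arguments must be unambiguous DNA codons")
  }
  if (germline == observed) return(NA_character_)  # no mutation at all
  if (genetic_code[[germline]] == genetic_code[[observed]]) "S" else "R"
}

classify_mutation("TTT", "TTC")  # Phe -> Phe: "S"
classify_mutation("TTT", "TTA")  # Phe -> Leu: "R"
```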
Returns a Baseline object containing the modified db and the BASELINe
posterior probability density functions (PDFs) for each of the sequences.
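Each PDF is a density over a selection-strength axis. To illustrate how such a density can be reduced to summary values, here is a minimal base-R sketch that computes the mean and a central 95% interval from a PDF sampled on an evenly spaced grid. The function name summarize_pdf, the grid, and the choice of statistics are assumptions for illustration only; this is not how calcStats is implemented.

```r
# Hypothetical helper (not part of shazam): summarize a PDF sampled on an
# evenly spaced grid `x` by its mean and a central `level` interval.
summarize_pdf <- function(x, density, level = 0.95) {
  dx <- x[2] - x[1]                   # assumes an evenly spaced grid
  p <- density / sum(density * dx)    # normalize so the PDF integrates to 1
  cdf <- cumsum(p * dx)               # discrete cumulative distribution
  alpha <- (1 - level) / 2
  c(mean     = sum(x * p * dx),
    ci_lower = x[which(cdf >= alpha)[1]],
    ci_upper = x[which(cdf >= 1 - alpha)[1]])
}

# Example with a standard normal density standing in for a BASELINe PDF.
grid <- seq(-5, 5, length.out = 1001)
summarize_pdf(grid, dnorm(grid))
```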
Calculates the BASELINe posterior probability density function (PDF) for
sequences in the provided db.
Note: Individual sequences within clonal groups are not, strictly speaking,
independent events, and it is generally appropriate to analyze selection
pressure on only one effective sequence per clonal group. For this reason,
it is strongly recommended that the input db contain one effective sequence
per clone. Effective clonal sequences can be obtained by calling the
collapseClones function.
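Before calling calcBaseline, a quick sanity check can confirm that db really holds one sequence per clone. The helper below is hypothetical and assumes the clone identifier lives in a column named CLONE (the Change-O convention at the time); pass a different cloneColumn if your data uses another name.

```r
# Hypothetical helper (not part of shazam): TRUE if every clone identifier
# appears exactly once, i.e. db holds one effective sequence per clone.
has_one_sequence_per_clone <- function(db, cloneColumn = "CLONE") {
  if (!cloneColumn %in% names(db)) {
    stop("column '", cloneColumn, "' not found in db")
  }
  anyDuplicated(db[[cloneColumn]]) == 0
}

db_raw <- data.frame(CLONE = c(1, 1, 2), SEQUENCE = c("ACG", "ACT", "GGA"))
db_collapsed <- data.frame(CLONE = c(1, 2), SEQUENCE = c("ACG", "GGA"))
has_one_sequence_per_clone(db_raw)        # FALSE: clone 1 appears twice
has_one_sequence_per_clone(db_collapsed)  # TRUE
```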
If the input db does not contain the required columns to calculate the PDFs
(namely MU_COUNT & MU_EXPECTED), then the function will:
- Calculate the numbers of observed mutations.
- Calculate the expected frequencies of mutations and modify the provided
db. The modified db will be included as part of the returned Baseline object.
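The decision of whether mutations must be (re)calculated can be sketched as a column-name check. This helper is hypothetical and assumes the observed and expected columns are recognizable by the MU_COUNT and MU_EXPECTED name prefixes; the exact column names calcBaseline requires may differ.

```r
# Hypothetical helper (not part of shazam): TRUE if observed (MU_COUNT*) or
# expected (MU_EXPECTED*) mutation columns are missing, meaning they would
# have to be calculated before the PDFs can be computed.
needs_mutation_calc <- function(db) {
  has_observed <- any(startsWith(names(db), "MU_COUNT"))
  has_expected <- any(startsWith(names(db), "MU_EXPECTED"))
  !(has_observed && has_expected)
}

needs_mutation_calc(data.frame(SEQUENCE = "ACGT"))  # TRUE
```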
The testStatistic indicates the statistical framework used to test for selection:
- local = CDR_R / (CDR_R + CDR_S).
- focused = CDR_R / (CDR_R + CDR_S + FWR_S).
- imbalanced = (CDR_R + CDR_S) / (CDR_R + CDR_S + FWR_S + FWR_R).
Note that the regionDefinition must contain only two regions. If more than
two regions are defined, the local test statistic will be used.
For further information on the framework of these tests, see Uduman et al. (2011).
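The three ratios, together with the fallback to local when more than two regions are defined, can be written out directly from replacement (R) and silent (S) counts in the CDR and FWR. This is a minimal sketch with a hypothetical helper, selection_ratio; it only evaluates the ratios as defined in this page, while the BASELINe PDFs also depend on the expected mutation frequencies, which the sketch does not model.

```r
# Hypothetical helper (not part of shazam): evaluate a test-statistic ratio
# from a named numeric vector of counts with CDR_R, CDR_S, FWR_R and FWR_S.
selection_ratio <- function(counts,
                            testStatistic = c("local", "focused", "imbalanced"),
                            nRegions = 2) {
  testStatistic <- match.arg(testStatistic)
  # As documented, only the local statistic is used beyond two regions.
  if (nRegions > 2) testStatistic <- "local"
  with(as.list(counts), switch(testStatistic,
    local      = CDR_R / (CDR_R + CDR_S),
    focused    = CDR_R / (CDR_R + CDR_S + FWR_S),
    imbalanced = (CDR_R + CDR_S) / (CDR_R + CDR_S + FWR_S + FWR_R)
  ))
}

counts <- c(CDR_R = 6, CDR_S = 2, FWR_R = 4, FWR_S = 8)
selection_ratio(counts, "focused")                # 6 / 16 = 0.375
selection_ratio(counts, "focused", nRegions = 3)  # falls back to local: 0.75
```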
- Hershberg U, et al. Improved methods for detecting selection by mutation analysis of Ig V region sequences. Int Immunol. 2008 20(5):683-94.
- Uduman M, et al. Detecting selection in immunoglobulin sequences. Nucleic Acids Res. 2011 39(Web Server issue):W499-504.
- Yaari G, et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol. 2013 4(November):358.
# Load and subset example data
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, ISOTYPE == "IgG" & SAMPLE == "+7d")

# Collapse clones
db <- collapseClones(db, sequenceColumn="SEQUENCE_IMGT",
                     germlineColumn="GERMLINE_IMGT_D_MASK",
                     method="thresholdedFreq", minimumFrequency=0.6,
                     includeAmbiguous=FALSE, breakTiesStochastic=FALSE)

# Calculate BASELINe
baseline <- calcBaseline(db,
                         sequenceColumn="CLONAL_SEQUENCE",
                         germlineColumn="CLONAL_GERMLINE",
                         testStatistic="focused",
                         regionDefinition=IMGT_V,
                         targetingModel=HH_S5F,
                         nproc=1)
Calculating the expected frequencies of mutations...
Calculating BASELINe probability density functions...