expectedMutations - Calculate expected mutation frequencies

Description

expectedMutations calculates the expected mutation frequencies for each sequence in the input data.frame.

Usage

expectedMutations(
db,
sequenceColumn = "sequence_alignment",
germlineColumn = "germline_alignment",
targetingModel = HH_S5F,
regionDefinition = NULL,
mutationDefinition = NULL,
nproc = 1,
cloneColumn = "clone_id",
juncLengthColumn = "junction_length"
)

Arguments

db
data.frame containing sequence data.
sequenceColumn
character name of the column containing input sequences.
germlineColumn
character name of the column containing the germline or reference sequence.
targetingModel
TargetingModel object. Default is HH_S5F.
regionDefinition
RegionDefinition object defining the regions and boundaries of the Ig sequences. To use regions definitions, sequences in sequenceColumn and germlineColumn must be aligned, following the IMGT schema.
mutationDefinition
MutationDefinition object defining replacement and silent mutation criteria. If NULL then replacement and silent are determined by exact amino acid identity.
nproc
numeric number of cores to distribute the operation over. If the cluster has already been set the call function with nproc = 0 to not reset or reinitialize. Default is nproc = 1.
cloneColumn
clone id column name in db
juncLengthColumn
junction length column name in db

Value

A modified db data.frame with expected mutation frequencies for each region defined in regionDefinition.

The columns names are dynamically created based on the regions in
regionDefinition. For example, when using the IMGT_V definition, which defines positions for CDR and FWR, the following columns are added:

  • mu_expected_cdr_r: number of replacement mutations in CDR1 and CDR2 of the V-segment.
  • mu_expected_cdr_s: number of silent mutations in CDR1 and CDR2 of the V-segment.
  • mu_expected_fwr_r: number of replacement mutations in FWR1, FWR2 and FWR3 of the V-segment.
  • mu_expected_fwr_s: number of silent mutations in FWR1, FWR2 and FWR3 of the V-segment.

Details

Only the part of the sequences defined in regionDefinition are analyzed. For example, when using the IMGT_V definition, mutations in positions beyond 312 will be ignored.

Examples

# Subset example data
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, c_call %in% c("IGHA", "IGHG") & sample_id == "+7d")
set.seed(112)
db <- dplyr::slice_sample(db, n=100)
# Calculate expected mutations over V region
db_exp <- expectedMutations(db,
sequenceColumn="sequence_alignment",
germlineColumn="germline_alignment_d_mask",
regionDefinition=IMGT_V,
nproc=1)

# Calculate hydropathy expected mutations over V region
db_exp <- expectedMutations(db,
sequenceColumn="sequence_alignment",
germlineColumn="germline_alignment_d_mask",
regionDefinition=IMGT_V,
mutationDefinition=HYDROPATHY_MUTATIONS,
nproc=1)

See also

calcExpectedMutations is called by this function to calculate the expected mutation frequencies. See observedMutations for getting observed mutation counts. See IMGT_SCHEMES for a set of predefined RegionDefinition objects.