**createSubstitutionMatrix** - *Builds a substitution model*

## Description

`createSubstitutionMatrix` builds a 5-mer nucleotide substitution model by counting the number of substitution mutations occurring in the center position for all 5-mer motifs.
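
The counting idea can be sketched in base R on a single toy sequence pair. This is only an illustration under simplifying assumptions: the sequences are made up, ungapped, and of equal length, and the rows of the tally are taken to be the observed center base. It is not the function's actual implementation.

```r
# Toy germline/observed pair (hypothetical data, equal length, no gaps)
germline <- "ACGTACGTAC"
observed <- "ACGAACTTAC"

g <- strsplit(germline, "")[[1]]
o <- strsplit(observed, "")[[1]]

# Positions that have two flanking bases on each side
centers <- 3:(length(g) - 2)

# Keep only centers where the observed base differs from the germline base
mutated <- centers[g[centers] != o[centers]]

# Germline 5-mer around each mutated center, and the base it mutated to
motif <- vapply(mutated,
                function(i) paste(g[(i - 2):(i + 2)], collapse = ""),
                character(1))
to <- factor(o[mutated], levels = c("A", "C", "G", "T"))

# Rows: observed center base (assumption); columns: germline 5-mer motif
counts <- table(to, motif)
print(counts)
```

Here the two mutations fall in the motifs `CGTAC` (T to A) and `ACGTA` (G to T), so the tally has one count in each of those columns.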

## Usage

```
createSubstitutionMatrix(db, model = c("RS", "S"),
  sequenceColumn = "SEQUENCE_IMGT", germlineColumn = "GERMLINE_IMGT_D_MASK",
  vCallColumn = "V_CALL", multipleMutation = c("independent", "ignore"),
  returnModel = c("5mer", "1mer", "1mer_raw"), minNumMutations = 50,
  numMutationsOnly = FALSE)
```

## Arguments

- db
- data.frame containing sequence data.
- model
- type of model to create. The default model, “RS”, creates a model by counting both replacement and silent mutations. The “S” specification builds a model by counting only silent mutations.
- sequenceColumn
- name of the column containing IMGT-gapped sample sequences.
- germlineColumn
- name of the column containing IMGT-gapped germline sequences.
- vCallColumn
- name of the column containing the V-segment allele call.
- multipleMutation
- string specifying how to handle multiple mutations occurring within the same 5-mer. If `"independent"`, multiple mutations within the same 5-mer are counted independently. If `"ignore"`, 5-mers with multiple mutations are excluded from the total mutation tally.
- returnModel
- string specifying what type of model to return; one of `c("5mer", "1mer", "1mer_raw")`. If `"5mer"` (the default), a 5-mer nucleotide context model is returned. If `"1mer"` or `"1mer_raw"`, a single nucleotide substitution matrix (no context) is returned, where `"1mer_raw"` is the unnormalized version of the `"1mer"` model. Note that neither 1-mer model may be used as input to createMutabilityMatrix.
- minNumMutations
- minimum number of mutations required to compute the 5-mer substitution rates. If the number of mutations for a 5-mer is below this threshold, its substitution rates will be estimated from neighboring 5-mers. Default is 50. Not required if `numMutationsOnly=TRUE`.
- numMutationsOnly
- when `TRUE`, return counting information on the number of mutations for each 5-mer, instead of building a substitution matrix. This option can be used for parameter tuning for `minNumMutations` during preliminary analysis. Default is `FALSE`. Only applies when `returnModel` is set to `"5mer"`. The `data.frame` returned when this argument is `TRUE` can serve as the input for minNumMutationsTune.
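
The two `multipleMutation` policies can be compared on a toy sequence pair. The helper `countedCenters` below is hypothetical and written only for this illustration; it assumes ungapped sequences of equal length and is a sketch of the rule described above, not the package's code.

```r
# Hypothetical helper (not part of any package): returns the center positions
# whose mutation would be tallied under each policy.
countedCenters <- function(germline, observed,
                           multipleMutation = c("independent", "ignore")) {
  multipleMutation <- match.arg(multipleMutation)
  g <- strsplit(germline, "")[[1]]
  o <- strsplit(observed, "")[[1]]
  stopifnot(length(g) == length(o), length(g) >= 5)

  isMut <- g != o
  # Mutated positions that have two flanking bases on each side
  centers <- 3:(length(g) - 2)
  centers <- centers[isMut[centers]]

  # Number of mutated positions inside each 5-mer window
  nInWindow <- vapply(centers,
                      function(i) sum(isMut[(i - 2):(i + 2)]),
                      integer(1))

  if (multipleMutation == "ignore") {
    # Drop 5-mers that contain more than one mutation
    centers <- centers[nInWindow == 1]
  }
  centers
}

# Made-up data: mutations at positions 3, 9 and 10
germline <- "ACGTACGTACGT"
observed <- "ACATACGTCGGT"

countedCenters(germline, observed, "independent")  # 3 9 10
countedCenters(germline, observed, "ignore")       # 3
```

Positions 9 and 10 share their 5-mer windows, so both are dropped under `"ignore"`, while `"independent"` tallies each of them.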

## Value

For `returnModel = "5mer"`:

When `numMutationsOnly` is `FALSE`, a 4x1024 matrix of column-normalized substitution rates for each 5-mer motif, with row names defining the center nucleotide, one of `c("A", "C", "G", "T")`, and column names defining the 5-mer nucleotide sequence.
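
To make the shape of this matrix concrete, the sketch below builds a 4x1024 matrix with the same kind of row and column names and normalizes each column to sum to 1. The counts are random placeholders, and the column order produced by `expand.grid` is an assumption that may differ from the order in the returned model.

```r
nucs <- c("A", "C", "G", "T")

# All 4^5 = 1024 5-mer motifs (order here is an assumption)
grid <- expand.grid(rep(list(nucs), 5), stringsAsFactors = FALSE)
fivemers <- do.call(paste0, grid)

# Placeholder counts, random and strictly positive, for illustration only
set.seed(1)
counts <- matrix(rpois(4 * 1024, lambda = 5) + 1, nrow = 4,
                 dimnames = list(nucs, fivemers))

# Column normalization: divide every column by its own total
rates <- sweep(counts, 2, colSums(counts), "/")

dim(rates)             # 4 1024
range(colSums(rates))  # every column sums to 1 (up to rounding)
```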

When `numMutationsOnly` is `TRUE`, a 1024x4 data frame with each row providing counting information on the number of mutations for a 5-mer. The columns are:

- `fivemer.total`: the total number of mutations when counted as a 5-mer.
- `fivemer.every`: whether there is mutation to every other base when counted as a 5-mer.
- `inner3.total`: the total number of mutations when counted as an inner 3-mer.
- `inner3.every`: whether there is mutation to every other base when counted as an inner 3-mer.
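
One way to use such a table when choosing `minNumMutations` is to check how many 5-mers would clear each candidate threshold on their own counts. The sketch below uses a made-up four-row stand-in with the column names listed above; the values, row names, and candidate thresholds are all hypothetical, and this is not a description of what minNumMutationsTune does internally.

```r
# Toy stand-in for the returned table; motif names and values are made up
subCount <- data.frame(
  fivemer.total = c(120, 35, 8, 60),
  fivemer.every = c(TRUE, TRUE, FALSE, TRUE),
  inner3.total  = c(400, 150, 90, 210),
  inner3.every  = c(TRUE, TRUE, TRUE, TRUE),
  row.names = c("AAAAA", "AAAAC", "AAAAG", "AAAAT")
)

# Candidate values for minNumMutations (hypothetical)
thresholds <- c(10, 50, 100)

# For each threshold, count the 5-mers with at least that many mutations
nDirect <- vapply(thresholds,
                  function(th) sum(subCount$fivemer.total >= th),
                  integer(1))
names(nDirect) <- thresholds
nDirect
```

With these toy values, three 5-mers clear a threshold of 10, two clear 50, and one clears 100; the remaining 5-mers are the ones whose rates would have to be estimated from neighboring 5-mers.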

For `returnModel = "1mer"` or `"1mer_raw"`: a 4x4 normalized or unnormalized 1-mer substitution matrix, respectively.

## References

- Yaari G, et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol. 2013;4(November):358.

## Examples

```
# Load the package providing createSubstitutionMatrix (assumed to be shazam)
library(shazam)

# Subset example data to one isotype and sample as a demo
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, ISOTYPE == "IgA" & SAMPLE == "-1h")

# Count the number of mutations per 5-mer
subCount <- createSubstitutionMatrix(db, model="S", multipleMutation="independent",
                                     returnModel="5mer", numMutationsOnly=TRUE)

# Create model using only silent mutations
sub <- createSubstitutionMatrix(db, model="S", multipleMutation="independent",
                                returnModel="5mer", numMutationsOnly=FALSE,
                                minNumMutations=20)
```

## See also

extendSubstitutionMatrix, createMutabilityMatrix, createTargetingMatrix, createTargetingModel, minNumMutationsTune.