Version 1.1.1: May 23, 2022
General:
- Removed dependency: kedd. The CRAN kedd package (by Arsalane Chouaib Guidoum) has been scheduled for archival on 2022-05-25. We have adapted the functions used by shazam and removed the dependency.
New feature:
- Added the function `convertNumbering` to convert between numbering systems (IMGT, Kabat).
Mutation Profiling:
- `shmulateTree` has new argument `nproc` to specify the number of cores. Default values `mutThresh` and `windowSize` have been set to `mutThresh=6` and `windowSize=10`.
- Added the option `plotFiltered=NULL` to `slideWindowTunePlot`.
- Fixed a bug in `listObservedMutations` not returning a list when `db` had one sequence with one mutation.
- Fixed bars shifted in `plotMutability`.
Version 1.1.0: July 8, 2021
General:
- Updated dependencies to alakazam >= 1.1.0 and ggplot2 >= 3.3.4.
Selection Analysis:
- `observedMutations`, `expectedMutations`, and `calcBaseline` can analyze mutations in all regions (CDR1, CDR2, CDR3, FWR1, FWR2, FWR3 and FWR4) by specifying `regionDefinition=IMGT_VDJ` or `regionDefinition=IMGT_VDJ_BY_REGIONS`.
- Added the function `setRegionBoundaries` to build sequence-specific `RegionDefinition` objects extending to CDR3 and FWR4.
- Added the function `makeGraphDf` to facilitate mutational analysis on lineage trees.
Distance Profiling:
- Fixed a bug in `distToNearest` where TRB and TRD sequences were ignored in distance calculation.
- Fixed a bug in `distToNearest` causing a fatal error when `cross` was set.
- Fixed a bug in `nearestDist` causing a fatal error when using `model="aa"` and `crossGroups`.
Targeting Models:
- Fixed an incompatibility with newer versions of ggplot2 in `plotMutability`.
Version 1.0.2: August 10, 2020
Mutation Profiling:
- Fixed a bug in `observedMutations` and `calcObservedMutations` causing mutation counting to fail when there are gap (`-`) characters in the germline sequence.
Targeting Models:
- Fixed a bug in `createTargetingModel` causing empty counts in the `numMutS` and `numMutR` slots.
Version 1.0.1: July 18, 2020
Distance Profiling:
- Added support for TCR genes to `distToNearest`.
- Renamed the `groupUsingOnlyIGH` argument of `distToNearest` to `onlyHeavy`.
Version 1.0.0: May 9, 2020
Backwards Incompatible Changes:
- Changed default expected data format from the Change-O data format to the AIRR Rearrangement standard. For example: where functions used the column name `V_CALL` (Change-O) as the default to identify the field that stored the V gene calls, they now use `v_call` (AIRR). That means scripts that relied on default values (previously, `v_call="V_CALL"`) will now fail if calls to the functions are not updated to reflect the correct value for the data. If data are in the Change-O format, the current default value `v_call="v_call"` will fail to identify the column with the V gene calls, as the column `v_call` doesn’t exist. In this case, `v_call="V_CALL"` needs to be specified in the function call.
- `ExampleDb` converted to the AIRR Rearrangement standard and examples updated accordingly.
- For consistency with the style of the new data format default, other field names have been updated to use the same capitalization. This change affects:
  - Region definitions. For example, the `labels` slot of `IMGT_V` has changed from `CDR_R`, `CDR_S`, `FWR_R` and `FWR_S` to `cdr_r`, `cdr_s`, `fwr_r` and `fwr_s`, respectively.
  - Mutations in `CODON_TABLE` and the different `MUTATION_SCHEMES` change from `R`, `S` and `Stop` to `r`, `s` and `stop`, respectively.
  - Mutation profiling function output columns. For example, from `MU_COUNT_SEQ` to `mu_count_seq`.
  - `calcBaseline` and related function output columns and S4 object slots. For example, from `PVALUE`, `REGION` and `BASELINE_CI_PVALUE` to `pvalue`, `region` and `baseline_ci_pvalue`, respectively.
  - Model names used by `createSubstitutionMatrix`, `createMutabilityMatrix` and `createTargetingModel` changed from `model=c("S","RS")` to `model=c("s","rs")`.
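To illustrate the column-name change described above, here is a minimal base R sketch of how a script might work out which column name to pass explicitly instead of relying on a default. The helper `pickColumn` and the toy data frames are hypothetical and are not part of shazam; only the column names `v_call` and `V_CALL` come from the notes above.

```r
# Hypothetical helper (not a shazam function): return the first candidate
# column name that is present in `data`.
pickColumn <- function(data, candidates) {
    found <- candidates[candidates %in% names(data)]
    if (length(found) == 0) {
        stop("None of the candidate columns found: ",
             paste(candidates, collapse = ", "))
    }
    found[1]
}

# Toy data frames standing in for AIRR-style and Change-O-style input
airr_db <- data.frame(v_call = "IGHV1-2*02", stringsAsFactors = FALSE)
changeo_db <- data.frame(V_CALL = "IGHV1-2*02", stringsAsFactors = FALSE)

# Prefer the AIRR name and fall back to the Change-O name
v_col_airr <- pickColumn(airr_db, c("v_call", "V_CALL"))
v_col_changeo <- pickColumn(changeo_db, c("v_call", "V_CALL"))
```

The detected name can then be supplied to whichever column argument a given function exposes.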
General:
- License changed to AGPL-3.
Targeting Models:
- `createMutabilityMatrix`, `extendMutabilityMatrix`, `createTargetingMatrix`, and `createTargetingModel` now also return the numbers of silent and replacement mutations used for estimating the 5-mer mutabilities. These numbers are recorded in the `numMutS` and `numMutR` slots in the newly defined `MutabilityModel`, `MutabilityModelWithSource`, and `TargetingMatrix` classes.
Mutation Profiling:
- `shmulateSeq` now also supports specifying the frequency of mutations to be introduced. (Previously, only the number of mutations was supported.)
Version 0.2.3: February 5, 2020
General:
- Removed SDMTools dependency.
Version 0.2.2: December 15, 2019
General:
- Fixed an incompatibility with R 4.0 matrix changes.
Version 0.2.1: July 19, 2019
Distance Calculation:
- Fixed a bug in `distToNearest` that could potentially cause sequences from different partitions to be used for distance calculation.
Version 0.2.0: July 18, 2019
General:
- Upgraded to alakazam >= 0.3.0 and dplyr >= 0.8.1.
Distance Calculation:
- Fixed a bug in `plotDensityThreshold` for negative densities.
- Fixed a bug in `distToNearest` for performing subsampling while calculating cross-group nearest neighbor distances.
- For partitioning sequences, `distToNearest` now supports, via a new argument `VJthenLen`, either a 2-stage partitioning (first by V gene and J gene, then by junction length), or a 1-stage partitioning (simultaneously by V gene, J gene, and junction length). For 1-stage partitioning, `distToNearest` supports export of the partitioning information as a new column via `keepVJLgroup`.
- `distToNearest` now supports single-cell input data with the addition of new arguments `cellIdColumn`, `locusColumn`, and `groupUsingOnlyIGH`.
Mutation Profiling:
- `shmulateTree` has new arguments, `start` and `end`, to specify the region in the sequence where mutations can be introduced.
Selection Analysis:
- Added the function `consensusSequence` which can be used to build a consensus sequence using a variety of methods.
Version 0.1.11: January 27, 2019
General:
- Fixed a bug in the prototype declarations for the `TargetingModel` and `RegionDefinition` S4 classes.
Version 0.1.10: September 19, 2018
General:
- Added `subsample` argument to `distToNearest` function.
- Removed some internal utility functions in favor of importing them from `alakazam`. Specifically, `progressBar`, `getBaseTheme` and `checkColumns`.
- Removed `clearConsole`, `getnproc`, and `getPlatform` functions.
Distance Calculation:
- Changed default `findThreshold` method to `density`.
- Significantly reduced run time of the `density` method by retuning the bandwidth detection process. The `density` method should now also yield more consistent thresholds, on average.
- The `subsample` argument to `findThreshold` now applies to both the `density` and `gmm` methods. Subsampling of distance is not performed by default.
- Fixed a bug in `plotDensityThreshold` and `plotGmmThreshold` wherein the `breaks` argument was ignored when specifying `xmax` and/or `xmin`.
Selection Analysis:
- Fixed a plotting bug in `plotBaselineDensity` arising when the `groupColumn` and `idColumn` arguments were set to the same column.
- Added the `sizeElement` argument to `plotBaselineDensity` to control line size.
- Renamed the `field_name` argument to `field` in `editBaseline`.
Version 0.1.9: March 30, 2018
Selection Analysis:
- Fixed a bug in `plotBaselineDensity` which caused an empty plot to be generated if there was only a single value in the `idColumn`.
- Fixed a bug in `calcBaseline` which caused a crash in `summarizeBaseline` and `groupBaseline` when input `baseline` is based on only 1 sequence (i.e. when `nrow(baseline@db)` is 1).
- Set default `plot` call on a `Baseline` object to `plotBaselineDensity`.
- Removed `getBaselineStats` function.
- Added a `summary` method for `Baseline` objects that calls `summarizeBaseline` and returns a data.frame.
Mutation Profiling:
- Fixed a bug in `shmulateSeq` which caused a crash when the input sequence contains gaps (`.`).
- Renamed the argument `mutations` in `shmulateSeq` to `numMutations`.
- Improved help documentation for `shmulateSeq` and `shmulateTree`.
- Added vignette for simulating mutated sequences.
- `calcExpectedMutations` will now treat non-ACTG characters as Ns rather than produce an error.
- Added two new `RegionDefinition` objects for the full V segment as single region (`IMGT_V_BY_SEGMENTS`) and the V segment with each codon as a separate region (`IMGT_V_BY_CODONS`).
Targeting Models:
- Added the `calculateMutability` function which computes the aggregate mutability for sequences.
- Fixed a bug that caused `createSubstitutionMatrix` to fail for data containing only a single V family.
- Changed the default model to silent mutations only (`model="S"`) in `createSubstitutionMatrix`, `createMutabilityMatrix` and `createTargetingModel`.
- Set default `plot` call on a `TargetingModel` object to `plotMutability`.
Version 0.1.8: June 30, 2017
General:
- Corrected several functions so that they accept both tibbles and data.frames.
Distance Calculation:
- Added new fitting procedures to the `"gmm"` method of `findThreshold()` that allow users to choose a mixture of two univariate density distribution functions among four available combinations: `"norm-norm"`, `"norm-gamma"`, `"gamma-norm"`, or `"gamma-gamma"`.
- Added the ability to choose the threshold selection criteria in the `"gmm"` method of `findThreshold()` from the best average sensitivity and specificity, the curve intersection or user defined sensitivity or specificity.
- Renamed the `cutEdge` argument of `findThreshold()` to `edge`.
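As a rough illustration of the "curve intersection" criterion mentioned above, the base R sketch below finds the point where two weighted component densities of a gamma plus normal mixture cross. It is not the code used by `findThreshold()`. The mixture weights, the distribution parameters, the assumption that the gamma component is the lower mode, and the helper name `densityGap` are all made up for the example.

```r
# Assumed mixture: a gamma component for the lower mode and a normal
# component for the upper mode. All parameter values are invented.
w1 <- 0.3; shape1 <- 4; rate1 <- 80    # gamma component (mean 0.05)
w2 <- 0.7; mean2 <- 0.35; sd2 <- 0.08  # normal component

# Difference between the two weighted component densities
densityGap <- function(x) {
    w1 * dgamma(x, shape = shape1, rate = rate1) -
        w2 * dnorm(x, mean = mean2, sd = sd2)
}

# The curves cross where the gap is zero; search between the component means
threshold <- uniroot(densityGap, lower = 0.05, upper = 0.35, tol = 1e-9)$root
```

In this toy setup the crossing point falls between the two component means, which is where a threshold separating the two modes would be expected to lie.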
Mutation Profiling:
- Redesigned `collapseClones()`, adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed `calcClonalConsensus()` from exported functions.
- Added support for including ambiguous IUPAC characters in input for `observedMutations()` and `calcObservedMutations()`.
- Fixed a minor bug in calculating the denominator for mutation frequency in `calcObservedMutations()` for sequences with non-triplet overhang at the tail.
- Renamed column names of observed mutations (previously `OBSERVED`) and expected mutations (previously `EXPECTED`) returned by `observedMutations()` and `expectedMutations()` to `MU_COUNT` and `MU_EXPECTED` respectively.
Selection Analysis:
- `calcBaseline()` no longer calls `collapseClones()` automatically if a `CLONE` column is present. As indicated by the documentation for `calcBaseline()`, users are advised to obtain effective clonal sequences (for example, by calling `collapseClones()`) before running `calcBaseline()`.
- Updated vignette to reflect changes in `calcBaseline()`.
Version 0.1.7: May 14, 2017
Mutation Profiling:
- Fixed a bug in `collapseClones()` that prevented it from running when `nproc` is greater than 1.
Version 0.1.6: May 12, 2017
General:
- Internal changes for compatibility with dplyr v0.6.0.
- Removed data.table dependency.
Mutation Profiling:
- Fixed a bug in `collapseClones()` that resulted in erroneous `CLONAL_SEQUENCE` and `CLONAL_GERMLINE` being returned.
- Added a vignette describing basic mutational analysis.
- Removed console notification that `observedMutations` was running.
Version 0.1.5: March 23, 2017
General:
- License changed to Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Selection Analysis:
- Fixed a bug in p-value calculation in `summarizeBaseline()`. The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as per normal. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
- Added `editBaseline()` to exported functions, and a corresponding section in the vignette.
- Fixed a bug in counting the total number of observed mutations when performing a local test for codon-by-codon selection analysis in `calcBaseline()`.
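The signed p-value convention described above can be unpacked as in the following base R sketch. The helper `interpretSignedPvalue`, the example values and the `alpha = 0.05` cutoff are hypothetical and are not part of shazam or taken from its output.

```r
# Illustrative values only; not output from summarizeBaseline()
signed_pvalues <- c(0.03, -0.004, 0.5, -0.2)

# Hypothetical helper: split a signed p-value into its magnitude and the
# direction of selection encoded by its sign. A value of exactly zero is
# labelled "negative" here, which is an arbitrary choice for the sketch.
interpretSignedPvalue <- function(p, alpha = 0.05) {
    data.frame(
        pvalue = abs(p),
        direction = ifelse(p > 0, "positive", "negative"),
        significant = abs(p) < alpha,
        stringsAsFactors = FALSE
    )
}

result <- interpretSignedPvalue(signed_pvalues)
```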
Targeting Models:
- Added `numMutationsOnly` argument to `createSubstitutionMatrix()`, enabling parameter tuning for `minNumMutations`.
- Added functions `minNumMutationsTune()` and `minNumSeqMutationsTune()` to tune for parameters `minNumMutations` and `minNumSeqMutations` in functions `createSubstitutionMatrix()` and `createMutabilityMatrix()` respectively. Also added function `plotTune()` which helps visualize parameter tuning using the abovementioned two new functions.
- Added human kappa and lambda light chain, silent, 5-mer, functional targeting model (`HKL_S5F`).
- Renamed `HS5FModel` as `HH_S5F`, `MRS5NFModel` as `MK_RS5NF`, and `U5NModel` as `U5N`.
- Added human heavy chain, silent, 1-mer, functional substitution model (`HH_S1F`), human kappa and lambda light chain, silent, 1-mer, functional substitution model (`HKL_S1F`), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (`MK_RS1NF`).
- Added `makeDegenerate5merSub` and `makeDegenerate5merMut` which make degenerate 5-mer substitution and mutability models respectively based on the 1-mer models. Also added `makeAverage1merSub` and `makeAverage1merMut` which make 1-mer substitution and mutability models respectively by averaging over the 5-mer models.
Mutation Profiling:
- Added `returnRaw` argument to `calcObservedMutations()`, which if true returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence “raw”).
- Added new functions `slideWindowSeq()` and `slideWindowDb()` which implement a sliding window approach towards filtering a single sequence or sequences in a data.frame which contain(s) equal to or more than a given number of mutations in a given number of consecutive nucleotides.
- Added new function `slideWindowTune()` which allows for parameter tuning for using `slideWindowSeq()` and `slideWindowDb()`.
- Added new function `slideWindowTunePlot()` which visualizes parameter tuning by `slideWindowTune()`.
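The sliding window idea described above can be sketched in base R as follows. This is an illustration of the general technique only, not shazam's implementation. The helper `hasDenseWindow` and the toy sequences are hypothetical, and details such as how gaps or ambiguous characters are treated are deliberately left out.

```r
# Hypothetical helper: TRUE if any window of `windowSize` consecutive
# positions holds at least `mutThresh` mutations.
hasDenseWindow <- function(isMutated, mutThresh, windowSize) {
    n <- length(isMutated)
    if (n < windowSize) {
        return(FALSE)
    }
    # Cumulative sums give every window's mutation count in one pass
    cs <- c(0, cumsum(as.integer(isMutated)))
    starts <- seq_len(n - windowSize + 1)
    windowCounts <- cs[starts + windowSize] - cs[starts]
    any(windowCounts >= mutThresh)
}

# Toy example: compare an observed sequence with its germline by position
germline <- strsplit("ACGTACGTACGTACGTACGT", "")[[1]]
observed <- strsplit("ACGTTGCAACGTACGTACGA", "")[[1]]
isMutated <- germline != observed

# Four mutations cluster within five consecutive positions
flag_strict <- hasDenseWindow(isMutated, mutThresh = 4, windowSize = 5)
# No window of ten positions contains five mutations
flag_loose <- hasDenseWindow(isMutated, mutThresh = 5, windowSize = 10)
```

A sequence flagged this way would be a candidate for filtering; scanning several combinations of threshold and window size is the kind of exploration that parameter tuning supports.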
Distance Calculation:
- Fixed a bug in `distToNearest` wherein `normalize="length"` for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
- Fixed a bug in `distToNearest` wherein `symmetry="min"` was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
- Added `findThreshold` function to infer clonal distance threshold from nearest neighbor distances returned by `distToNearest`.
- Renamed the `length` option for the `normalize` argument of `distToNearest` to `len` so it matches Change-O.
- Deprecated the `HS1FDistance` and `M1NDistance` distance models, which have been renamed to `hs1f_compat` and `m1n_compat` in the `model` argument of `distToNearest`. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by `hh_s1f` and `mk_rs1nf`, which are supported by Change-O v0.3.4.
- Renamed the `hs5f` model in `distToNearest` to `hh_s5f`.
- Added support for `MK_RS5NF` models to `distToNearest`.
- Updated `calcTargetingDistance()` to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as `HH_S1F`.
- Added a Gaussian mixture model (GMM) approach for threshold determination to `findThreshold`. The previous smoothed density method is available via the `method="density"` argument and the new GMM method is available via `method="gmm"`.
- Added the functions `plotGmmThreshold` and `plotDensityThreshold` to plot the threshold detection results from `findThreshold` for the `"gmm"` and `"density"` methods, respectively.
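The difference between the two `symmetry="min"` behaviors described above can be seen in a small base R sketch. It reflects one reading of the note, namely that the per-position minima are summed; the numbers and variable names are made up and nothing here is taken from shazam's code.

```r
# Illustrative per-position distances at the mutated positions between
# sequences A and B under an asymmetric model (invented numbers).
d_ab <- c(0.8, 1.2, 0.5)  # cost of A -> B at each mutated position
d_ba <- c(1.0, 0.9, 0.7)  # cost of B -> A at each mutated position

# Minimum of the two total distances (the behavior described as the bug)
min_of_totals <- min(sum(d_ab), sum(d_ba))

# Sum of the per-position minima (the behavior described as the fix)
sum_of_minima <- sum(pmin(d_ab, d_ba))
```

Taking the minimum position by position can never give a larger value than taking the minimum of the totals, so the two approaches differ whenever neither direction is cheaper at every position.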
Region Definition:
- Deleted `IMGT_V_NO_CDR3` and `IMGT_V_BY_REGIONS_NO_CDR3`. Updated `IMGT_V` and `IMGT_V_BY_REGIONS` so that neither includes CDR3 now.
Version 0.1.4: August 5, 2016
Selection Analysis:
- Fixed a bug in calcBaseline wherein the germline column was incorrectly hardcoded, leading to erroneous mutation counts for some clonal consensus sequences.
Targeting Models:
- Added `numSeqMutationsOnly` argument to `createMutabilityMatrix()`, enabling parameter tuning for `minNumSeqMutations`.
Version 0.1.3: July 31, 2016
General:
- Added ape and igraph dependency.
- Removed the `InfluenzaDb` data object, in favor of the updated `ExampleDb` provided in alakazam 0.2.4.
- Added conversion of sequence to uppercase for several functions to support data that was not generated via Change-O.
Distance Calculation:
- Added the `cross` argument to `distToNearest()` which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
- Added `mst` flag to `distToNearest()`, which will return all distances to neighboring nodes in a minimum spanning tree.
- Updated single nucleotide distance models to use the new C++ distance methods in alakazam 0.2.4 for better performance.
- Fixed a bug leading to failed distance calculations for the `aa` model of `distToNearest()`.
- Fixed a bug wherein gap characters were being translated into Ns (Asn) rather than Xs within the `aa` model of `distToNearest()`.
Mutation Profiling:
- Added the `MutationDefinition` `VOLUME_MUTATIONS`.
- Added the functions `shmulateSeq()` and `shmulateTree()` to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
- Renamed `collapseByClone`, `calcDbExpectedMutations` and `calcDbObservedMutations` to `collapseClones`, `expectedMutations`, and `observedMutations`, respectively.
Selection Analysis:
- Fixed a bug wherein passing a `Baseline` object through `groupBaseline()` multiple times resulted in incorrect normalization.
- Added `title` options to `plotBaselineSummary()` and `plotBaselineDensity()`.
- Added more control over colors and group ordering to `plotBaselineSummary()` and `plotBaselineDensity()`.
- Added the `testBaseline()` function to test the significance of differences between two selection distributions.
- Improved selection analysis vignette.
Version 0.1.2: February 20, 2016
General:
- Renamed package from shm to shazam.
- Internal changes to conform to CRAN policies.
- Compressed and moved example database to the data object `InfluenzaDb`.
- Fixed several bugs where functions would not work properly when passed a `dplyr::tbl_df` object instead of a `data.frame`.
- Changed R dependency to R >= 3.1.2.
- Added stringi dependency.
Distance Calculation:
- Fixed a bug wherein `distToNearest()` did not return the nearest neighbor with a non-zero distance.
Targeting Models:
- Performance improvements to `createSubstitutionMatrix()`, `createMutabilityMatrix()`, and `plotMutability()`.
- Modified color scheme in `plotMutability()`.
- Fixed errors in the targeting models vignette.
Mutation Profiling:
- Added the `MutationDefinition` objects `MUTATIONS_CHARGE`, `MUTATIONS_HYDROPATHY`, `MUTATIONS_POLARITY` providing alternate approaches to defining replacement and silent annotations to mutations when calling `calcDBObservedMutations()` and `calcDBExpectedMutations()`.
- Fixed a few bugs where column names, region definitions or mutation models were not being recognized properly when non-default values were used.
- Made the behavior of `regionDefinition=NULL` consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
- `calcDBObservedMutations()` returns R and S mutations also when `regionDefinition=NULL`. Older versions reported the sum of R and S mutations. The function will add the columns `OBSERVED_SEQ_R` and `OBSERVED_SEQ_S` when `frequency=FALSE`, and `MU_FREQ_SEQ_R` and `MU_FREQ_SEQ_S` when `frequency=TRUE`.
Version 0.1.1: December 18, 2015
General:
- Swapped dependency on doSNOW for doParallel.
- Swapped dependency on plyr for dplyr.
- Swapped dependency on reshape2 for tidyr.
- Documentation clean up.
Distance Calculation:
- Changed underlying method of calcTargetingDistance to be negative log10 of the probability that is then centered at one by dividing by the mean distance.
- Added `symmetry` parameter to distToNearest to change behavior of how asymmetric distances (A->B != B->A) are combined to get distance between A and B.
- Updated error handling in distToNearest to issue warning when unrecognized character is in the sequence and return an NA.
- Fixed bug in ‘aa’ model in distToNearest that was calculating distance incorrectly when normalizing by length.
- Changed behavior to return nearest nonzero distance neighbor.
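The distance transformation and the combination of asymmetric distances described above can be sketched in base R. The probabilities are invented, the helper `combineDistances` is hypothetical, and the option names `"avg"` and `"min"` are assumptions made for this illustration rather than a statement of the package's interface.

```r
# Made-up targeting probabilities for four hypothetical mutation events
p <- c(0.02, 0.10, 0.005, 0.30)

# Negative log10 turns low-probability events into large distances
raw_dist <- -log10(p)

# Dividing by the mean centers the distances at one
centered_dist <- raw_dist / mean(raw_dist)

# Hypothetical helper combining asymmetric distances A->B and B->A
combineDistances <- function(ab, ba, symmetry = c("avg", "min")) {
    symmetry <- match.arg(symmetry)
    if (symmetry == "avg") (ab + ba) / 2 else pmin(ab, ba)
}

avg_dist <- combineDistances(1, 3)
min_dist <- combineDistances(1, 3, symmetry = "min")
```

After centering, a distance above one marks an event that is less likely than average under the model, and a distance below one marks a more likely event.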
Mutation Profiling:
- Renamed calcDBClonalConsensus to collapseByClone. Also renamed argument collapseByClone to expandedDb.
- Fixed a (major) bug in calcExpectedMutations. Previously, the targeting calculation was incorrect and resulted in incorrect expected mutation frequencies. Note that this also resulted in incorrect BASELINe Selection (Sigma) values.
- Changed denominator in calcObservedMutations to be based on informative (unambiguous) positions only.
- Added nonTerminalOnly parameter to calcDBClonalConsensus indicating whether to consider mutations at leaves or not (defaults to false).
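A mutation frequency whose denominator counts only informative positions, as described above, could look like the following base R sketch. The helper `mutationFrequency` is hypothetical and is not shazam's code. It assumes one possible reading of "informative": a position counts only if both sequences carry an unambiguous A, C, G or T there.

```r
# Hypothetical helper: mutation frequency over positions where both the
# observed and germline sequences carry an unambiguous base.
mutationFrequency <- function(observed, germline) {
    obs <- strsplit(toupper(observed), "")[[1]]
    germ <- strsplit(toupper(germline), "")[[1]]
    stopifnot(length(obs) == length(germ))
    bases <- c("A", "C", "G", "T")
    informative <- obs %in% bases & germ %in% bases
    if (!any(informative)) {
        return(NA_real_)
    }
    sum(obs[informative] != germ[informative]) / sum(informative)
}

# Ns and gap characters are excluded from both numerator and denominator
freq <- mutationFrequency("ACGTNNAC..GT", "ACCTACAC..GA")
```

In the toy input, eight of the twelve positions are informative and two of those differ, so ambiguous and gapped positions neither inflate nor dilute the frequency.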
Selection Analysis:
- Updated groupBaseline. Now when regrouping a Baseline object (i.e. grouping previously grouped PDFs) weighted convolution is performed.
- Added “imbalance” test statistic to the Baseline selection calculation.
- Extended the Baseline Object to include binomK, binomN and binomP. Similar to numbOfSeqs, each of these is a matrix. They contain binomial inputs for each sequence and region.
Targeting Models:
- Added `minNumMutations` parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
- Added `minNumSeqMutations` parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
- Added `returnModel` parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
- Added `returnSource` parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
- In createSubstitutionMatrix and createMutabilityMatrix, fixed a bug when multipleMutation is set to “ignore”.
- Changed inference procedure for the 5-mer substitution model.
- Added inference procedure for 5-mers without enough observed mutations in the mutability model.
- Fixed a bug in background 5-mer count for the RS model.
- Fixed a bug in IMGT gap handling in createMutabilityMatrix.
- Fixed a bug that occurs when sequences are in lowercase.
Version 0.1.0: June 18, 2015
Initial public release.
General:
- Restructured the S4 class documentation.
- Fixed bug wherein example `Influenza.tab` file did not load on Mac OS X.
- Added citations for `citation("shazam")` command.
- Added dependency on data.table >= 1.9.4 to fix bug that occurred with earlier versions of data.table.
Distance Calculation:
- Added a human 1-mer substitution matrix, `HS1FDistance`, based on the Yaari et al, 2013 data.
- Set the `hs1f` as the default distance model for `distToNearest()`.
- Added conversion of sequences to uppercase in `distToNearest()`.
- Fixed a bug wherein unrecognized (including lowercase) characters would lead to silently returning a distance of 0 to the nearest neighbor. Unrecognized characters will now raise an error.
Mutation Profiling:
- Fixed bug in `calcDBClonalConsensus()` so that the function now works correctly when called with the argument `collapseByClone=FALSE`.
- Added the `frequency` argument to `calcObservedMutations()` and `calcDBObservedMutations()`, which enables return of mutation frequencies rather than the default of mutation counts.
Targeting Models:
- Removed `M3NModel` and all options for using said model.
- Fixed bug in `createSubstitutionMatrix()` and `createMutabilityMatrix()` where IMGT gaps were not being handled.
Version 0.1.0.beta-2015-05-30: May 30, 2015
General:
- Added more error checking.
Targeting Models:
- Updated the targeting model workflow to include a clonal consensus step.
Version 0.1.0.beta-2015-05-11: May 11, 2015
Targeting Models:
- Added the `U5NModel`, which is a uniform 5-mer model.
- Improvements to `plotMutability()` output.
Version 0.1.0.beta-2015-05-05: May 05, 2015
Prerelease for review.