Update README.md
dawnandrew100 authored Jul 16, 2024
1 parent 1917b1a commit bd2769b
Showing 1 changed file with 25 additions and 21 deletions.
46 changes: 25 additions & 21 deletions README.md
@@ -1,5 +1,5 @@
# Limestone
This project contains several sequence alignment algorithms that can also produce scoring matrices for Needleman-Wunsch, Smith-Waterman, Wagner-Fischer, and Waterman-Smith-Beyer algorithms.
This project contains several sequence alignment algorithms and can also produce scoring matrices for the Needleman-Wunsch, Smith-Waterman, Wagner-Fischer, Waterman-Smith-Beyer, Lowrance-Wagner, Longest Common Subsequence, and Shortest Common Supersequence algorithms.

***Please ensure that numpy is installed so that this project can work correctly***

@@ -42,45 +42,45 @@ This project contains several sequence alignment algorithms that can also produc

**Hamming Distance**
```python
from limestone.editdistance import hammingDist
from limestone import hamming

qs = "AFTG"
ss = "ACTG"

print(hammingDist.distance(qs, ss))
print(hamming.distance(qs, ss))
# 1
print(hammingDist.similarity(qs, ss))
# 3
print(hammingDist.binary_distance_array(qs, ss))
# [1,0,1,1]
print(hammingDist.binary_similarity_array(qs, ss))
print(hamming.similarity(qs, ss))
# 3
print(hamming.binary_distance_array(qs, ss))
# [0,1,0,0]
print(hammingDist.normalized_distance(qs, ss))
print(hamming.binary_similarity_array(qs, ss))
# [1,0,1,1]
print(hamming.normalized_distance(qs, ss))
# 0.25
print(hammingDist.normalized_similarity(qs, ss))
print(hamming.normalized_similarity(qs, ss))
# 0.75
```
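
As a cross-check on the values in the Hamming example above, here is a minimal plain-Python sketch of how the six numbers relate to one another. It does not use limestone, and the helper name `binary_distance_array` is only borrowed from the example for readability; the library's own implementation may differ.

```python
def binary_distance_array(query: str, subject: str) -> list[int]:
    """Return 1 where the characters differ and 0 where they match.

    Sketch only: assumes both sequences have the same length.
    """
    if len(query) != len(subject):
        raise ValueError("this sketch only handles sequences of equal length")
    return [int(q != s) for q, s in zip(query, subject)]


qs = "AFTG"
ss = "ACTG"

dist_array = binary_distance_array(qs, ss)
sim_array = [1 - d for d in dist_array]  # similarity array is the complement

distance = sum(dist_array)                      # number of mismatched positions
similarity = sum(sim_array)                     # number of matched positions
normalized_distance = distance / len(qs)        # fraction of mismatches
normalized_similarity = similarity / len(qs)    # fraction of matches

print(dist_array)             # [0, 1, 0, 0]
print(sim_array)              # [1, 0, 1, 1]
print(distance, similarity)   # 1 3
print(normalized_distance)    # 0.25
print(normalized_similarity)  # 0.75
```

Everything else follows from the binary distance array: the similarity array is its complement, and the scalar scores are sums or averages over it.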

**Needleman-Wunsch**
```python
from limestone.editdistance import needlemanWunsch
from limestone import needleman_wunsch

print(needlemanWunsch.distance("ACTG","FHYU"))
print(needleman_wunsch.distance("ACTG","FHYU"))
# 4
print(needlemanWunsch.distance("ACTG","ACTG"))
print(needleman_wunsch.distance("ACTG","ACTG"))
# 0
print(needlemanWunsch.similarity("ACTG","FHYU"))
print(needleman_wunsch.similarity("ACTG","FHYU"))
# 0
print(needlemanWunsch.similarity("ACTG","ACTG"))
print(needleman_wunsch.similarity("ACTG","ACTG"))
# 4
print(needlemanWunsch.normalized_distance("ACTG","AATG"))
print(needleman_wunsch.normalized_distance("ACTG","AATG"))
#0.25
print(needlemanWunsch.normalized_similarity("ACTG","AATG"))
print(needleman_wunsch.normalized_similarity("ACTG","AATG"))
#0.75
print(needlemanWunsch.align("BA","ABA"))
print(needleman_wunsch.align("BA","ABA"))
#-BA
#ABA
print(needlemanWunsch.matrix("AFTG","ACTG"))
print(needleman_wunsch.matrix("AFTG","ACTG"))
[[0. 2. 4. 6. 8.]
[2. 0. 2. 4. 6.]
[4. 2. 1. 3. 5.]
@@ -90,13 +90,17 @@ print(needlemanWunsch.matrix("AFTG","ACTG"))

# Work In Progress

-- To be continued
Jaro and Jaro-Winkler algorithms.
Importing and parsing FASTA, FASTQ, and PDB files.

# Caveats

Due to the recursive nature of the Hirschberg algorithm, if a distance score or matrix is needed it is best to use the Needleman-Wunsch algorithm instead.
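
To illustrate why Needleman-Wunsch is the natural choice when a score or matrix is needed, here is a minimal sketch that fills the full dynamic-programming matrix in plain Python. The function name `nw_distance_matrix` and the costs (match 0, mismatch 1, gap 2) are assumptions inferred from the partial matrix shown in the Needleman-Wunsch example; they are not confirmed defaults of this project.

```python
def nw_distance_matrix(query, subject, match=0, mismatch=1, gap=2):
    """Fill a Needleman-Wunsch distance matrix (lower is more similar).

    Hypothetical helper; the costs are assumed, not taken from limestone.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    matrix = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(rows):
        matrix[i][0] = i * gap
    for j in range(cols):
        matrix[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            cost = match if query[i - 1] == subject[j - 1] else mismatch
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + cost,  # match or substitution
                matrix[i - 1][j] + gap,       # gap in subject
                matrix[i][j - 1] + gap,       # gap in query
            )
    return matrix


m = nw_distance_matrix("AFTG", "ACTG")
for row in m[:3]:
    print(row)
# [0, 2, 4, 6, 8]
# [2, 0, 2, 4, 6]
# [4, 2, 1, 3, 5]
print(m[-1][-1])  # bottom-right cell is the overall distance: 1
```

Under these assumed costs the first three rows agree with the partial output shown in the README. The whole matrix is kept in memory, so the score and every intermediate cell are available directly, whereas Hirschberg's divide-and-conquer approach is designed to avoid storing the full matrix.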

Note that due to the fact that the Hamming distance does not allow for substitutions, insertions, or deletions, the "aligned sequence" that is returned is just the original sequences in a formatted string.
Note that because the Hamming distance does not allow for insertions or deletions, the "aligned sequence" that is returned is just the original sequences in a formatted string.
This is because actually aligning the two sequences using this algorithm would just produce two lines of the query sequence.
It should also be noted that the Hamming distance is intended to only be used with sequences of the same length.
To compensate for strings of differing lengths, my algorithm adds 1 extra point to the distance for every additional letter in the longer sequence since this can be seen as "swapping" the empty space for a letter or vice versa. However, any distance obtained this way **will not reflect an accurate Hamming distance**.
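
The length compensation described above can be sketched as follows. The function name `padded_hamming_distance` is hypothetical and this is not the project's actual code; it only illustrates the rule of adding one point per extra letter in the longer sequence.

```python
def padded_hamming_distance(query: str, subject: str) -> int:
    """Count mismatches over the shared length, then add 1 per extra letter.

    Hypothetical helper. For unequal lengths the result is NOT a true
    Hamming distance, which is only defined for equal-length sequences.
    """
    # zip stops at the shorter sequence, so only overlapping positions are compared.
    mismatches = sum(q != s for q, s in zip(query, subject))
    extra_letters = abs(len(query) - len(subject))
    return mismatches + extra_letters


print(padded_hamming_distance("AFTG", "ACTG"))    # 1 (equal length, one mismatch)
print(padded_hamming_distance("AFTG", "ACTGGT"))  # 3 (one mismatch + two extra letters)
```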

My Waterman-Smith-Beyer implementation does not always agree with that of [Freiburg University](http://rna.informatik.uni-freiburg.de/Teaching/index.jsp?toolName=Waterman-Smith-Beyer), the site I've been using for alignment validation.
It is possible that the issue lies in their implementation rather than mine, but I wanted to mention this here and provide the link to my [Bioinformatics Stack Exchange](https://bioinformatics.stackexchange.com/questions/22683/waterman-smith-beyer-implementation-in-python) question for the sake of posterity.