Discussion:
[Tutor] Need advise on protein sequence alignment code in python 2.7 giving wrong output
Белякова Анастасия
2015-08-11 08:53:38 UTC
Permalink
I am trying to develop code to perform global sequence alignment in python
2.7 for proteins with gap penalty of -5 and blossom64 scoring matrix. Code
I have now give wrong output.

When I give 'PLEASANTLY' and 'MEANLY' as input, I get 'PLEASANTLY' and
'M-EAN--L-Y' as output instead of 'PLEASANTLY' and '-MEA--N-LY'. Blossom62
scoring matrix is given as dictionary {'A': {'A': 4, 'C': 0, etc. -5 is
insertion/deletion penalty.

Here is a code:

def scoring(v,w,blossom62):
S = [[0]*(len(w)+1) for _ in xrange(len(v)+1)]
for i in range(0,len(w)+1):
S[0][i]=i*-5
for j in range (0,len(v)+1):
S[j][0]=j*-5
for i in xrange(len(v)):
for j in xrange(len(w)):
S[i+1][j+1] =
max(S[i+1][j]-5,S[i][j+1]-5,S[i][j]+blossom62[v[i]][w[j]] )
backtrack=[[0]*(len(w)) for _ in xrange(len(v))]
for i in xrange(len(v)):
for j in xrange(len(w)):
if max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j]:
backtrack[i][j]= 0
elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i][j-1]:
backtrack[i][j]= 1
elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j-1]:
backtrack[i][j]= 2
def insert_indel(word,pos):
return word[:pos]+'-'+word[pos:]
v_aligned, w_aligned = v, w
i,j =len(v)-1,len(w)-1
while i*j!=0:
if backtrack[i][j]==0:
i-=1
w_aligned=insert_indel(w_aligned,j)
elif backtrack[i][j]==1:
j-=1
v_aligned=insert_indel(v_aligned,i)
elif backtrack[i][j]==2:
j-=1
i-=1
print v_aligned
print w_aligned

What is wrong and how to fix it?
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-08-11 11:42:15 UTC
Permalink
Post by Белякова Анастасия
I am trying to develop code to perform global sequence alignment in python
2.7 for proteins with gap penalty of -5 and blossom64 scoring matrix. Code
I have now give wrong output.
This list is for folks learning the Python language and
standard library. Given that you must assume that most
of us have no idea what you are talking about. We are
not bio-scientists.

So you will need to give uis a bit more explanation about what's
going on. What is blossom64 scoring? How does it work?
Whats a gap penalty? and so on.

There may also be more appropriate fora where you will find
people who know your area? Perhaps in the SciPy community?

Given all of that I can only make a few comments below.
Post by Белякова Анастасия
When I give 'PLEASANTLY' and 'MEANLY' as input, I get 'PLEASANTLY' and
'M-EAN--L-Y' as output instead of 'PLEASANTLY' and '-MEA--N-LY'. Blossom62
scoring matrix is given as dictionary {'A': {'A': 4, 'C': 0, etc. -5 is
insertion/deletion penalty.
Again that means little to me.
Post by Белякова Анастасия
S = [[0]*(len(w)+1) for _ in xrange(len(v)+1)]
S[0][i]=i*-5
I think this could be clearer as

for i in range(len(S[0])):
S[0][i] = i * -5
Post by Белякова Анастасия
S[j][0]=j*-5
for i,row in enumerate(S):
row[0] = i * -5
Post by Белякова Анастасия
S[i+1][j+1] =
max(S[i+1][j]-5,S[i][j+1]-5,S[i][j]+blossom62[v[i]][w[j]] )
I'll have to assume that's correct but I'd probably
separate the max() terms to make them easier to see:

max(S[i+1][j]-5,
S[i][j+1]-5,
S[i][j]+blossom62[v[i]][w[j]] )
Post by Белякова Анастасия
backtrack=[[0]*(len(w)) for _ in xrange(len(v))]
backtrack[i][j]= 0
backtrack[i][j]= 1
backtrack[i][j]= 2
Again I'd separate the max terms - even if only by adding a space.

if max(S[i][j-1], S[i-1][j], S[i-1][j-1]) == S[i-1][j]:

Readability helps a lot in debugging.
In fact I'd be tempted to define a function max_index()
to return the max index of the input tuple. It would
improve readability and you can test it independently.

Something like:

def max_index(*t):
return tuple(t).index(max(t))

Then your loops become

for i in xrange(len(v)):
for j in xrange(len(w)):
backtrack[i][j]= max_index(...)

which expresses the logic more clearly and avoids
the potential error implicit in duplicating the
terms for each test.
Post by Белякова Анастасия
return word[:pos]+'-'+word[pos:]
Defining a function in the middle of your program logic is just
confusing. If it must be an inner function move it to the top.
Better still move it outside to module scope. As it is, it just
interrupts the flow of your code.
Post by Белякова Анастасия
v_aligned, w_aligned = v, w
i,j =len(v)-1,len(w)-1
i-=1
w_aligned=insert_indel(w_aligned,j)
j-=1
v_aligned=insert_indel(v_aligned,i)
j-=1
i-=1
Again I need to assume that's correct.
Post by Белякова Анастасия
print v_aligned
print w_aligned
I doubt any of those comments solve your problem but the
increased clarity may make debugging easier.

You could also try splitting the last while loop out into a
separate function. You can then test it in isolation, which
may help identify where the error lies.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https:
Mark Lawrence
2015-08-11 18:49:32 UTC
Permalink
Post by Белякова Анастасия
I am trying to develop code to perform global sequence alignment in python
2.7 for proteins with gap penalty of -5 and blossom64 scoring matrix. Code
I have now give wrong output.
Will BioPython fit your needs?

http://biopython.org/wiki/Main_Page
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription opt
Laura Creighton
2015-08-11 18:54:05 UTC
Permalink
You posted incomplete code -- at any rate I cannot get it to work.

However, I think the problem is here:

for i in xrange(len(v)):
for j in xrange(len(w)):
if max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j]:

When j is 0, j-1 refers to the end of the list, which makes no sense.

I think you want:

for i in xrange(1, len(v)+1):
for j in xrange(1, len(w)+1):
if max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j]:

But I could be wrong about this, since I could never get the code to work.

The other question is, should you be writing your own code to do this
at all?

In trying to figure out what it was you want to do, I downloaded
https://pypi.python.org/pypi/nwalign
nwalign.global_align("PLEASANTLY", "MEANLY", gap_open=-5, gap_extend=-5, ma
trix='./BLOSUM62.txt')
('PLEASANTLY', '-MEA--N-LY')

Which seems to be the answer you are looking for.

The original code for this is at
https://bitbucket.org/brentp/biostuff/src/282b504ac9020fe1449e23f800b20b5bd7d12061/nwalign/pairwise.py?at=default .

though it writes to Cython for reasons of speed.

The bitbucket code says:
for i in range(1, max_i + 1):
ci = seqi[i - 1]
for j in range(1, max_j + 1):
cj = seqj[j - 1]

Which is why I think that what I wrote will fix your code, but again, I
never got it to work so all of this is untested.

Laura


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Laura Creighton
2015-08-11 19:14:18 UTC
Permalink
nwalign.global_align("PLEASANTLY", "MEANLY", gap_open=-5, gap_extend=-5, matrix='./BLOSUM62.txt')
('PLEASANTLY', '-MEA--N-LY')
I forgot to mention that I got my BLOSUM62.txt from
http://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt

in case that matters.

Laura
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Mark Lawrence
2015-08-11 21:09:58 UTC
Permalink
Post by Laura Creighton
You posted incomplete code -- at any rate I cannot get it to work.
Having taken a closer look how can it work when range() and xrange() are
both used?
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-08-11 21:16:10 UTC
Permalink
Post by Mark Lawrence
Post by Laura Creighton
You posted incomplete code -- at any rate I cannot get it to work.
Having taken a closer look how can it work when range() and xrange() are
both used?
OK, I'll bite. Why is that a problem?
It's not conventional but I can't think of a reason why not.
What am I missing?
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Mark Lawrence
2015-08-11 21:26:40 UTC
Permalink
Post by Alan Gauld
Post by Mark Lawrence
Post by Laura Creighton
You posted incomplete code -- at any rate I cannot get it to work.
Having taken a closer look how can it work when range() and xrange() are
both used?
OK, I'll bite. Why is that a problem?
It's not conventional but I can't think of a reason why not.
What am I missing?
Whoops, I'm so used to thinking Python 3 I'd completely forgotten that
both exist in Python2.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Loading...