Programmer Virtue in Molecular Biology
Larry Wall taught us the three great virtues: laziness, hubris, and ego. I have long cultivated this in my life, and continue to do so now that I am a biologist.
My latest act of virtue (or perhaps virtù) involves selecting restriction enzymes for some cloning. Starting with a plasmid and a sequence I want to replicate in it, I need enzymes A, B, C, D, E, and X with the following properties:
- A and B have compatible ends, but when those two ends are ligated, none of the enzymes can recut them.
- None of the others have compatible ends.
- A must appear one time in the plasmid without the insert.
- B and C must not appear in the plasmid.
- D, E, and X must appear one time in the plasmid.
Don’t worry about where these rules come from for now. I’ll explain another time. Instead, let us focus on my tool for laziness: Prolog. In Prolog, you write down a set of logical propositions and rules, and then query them. This sounds fairly trivial, but it’s in fact really, really useful.
I typed in NEB’s compatible cohesive ends table for their restriction enzymes, which consisted of many lines that looked like this:
compatible(spei, [avrii,nhei,styi,xbai], [bfai]).
This is an example of a fact in prolog. It consists of a label for the fact (compatible), and a series of arguments. The first argument is the enzyme, the second is the list of compatible enzymes (lists are enclosed in [], and the entries are separated by commas), and the third is the list of enzymes that recleave such a ligation. I can create as many facts as I like with the same label and different arguments. In this case, it’s one for every line of that NEB table. Then I can define rules which let Prolog derive more facts from the simple facts I have given it:
is_compatible(X,X) :- true.
is_compatible(X,Y) :- compatible(X,F,_), member(Y,F).
The first line says that any sites cleaved by the same enzyme are compatible. The second says two enzymes are compatible is the second is a member of the list of compatible enzymes of the first.
is_incompatible(X,Y) :- \+ is_compatible(X,Y).
And to be incompatible, they must not be compatible (\+ is ‘not’ in Prolog). Here’s another predicate which is fairly straightforwards:
is_recleaved_by(A,A,A) :- true.
is_recleaved_by(X,Y,Z) :- compatible(X,G,F),member(Y,G),member(Z,F).
Then I put in the list of the restriction enzymes I have on hand: ncutter(aatii). and so forth. I also declare two lists for sites in the plasmid: restriction sites which appear once in the plasmid, and sites that do not appear in the plasmid (among those enzymes I have on hand):
m361_1([aatii, ecori, hindiii, hpai, kpni, mlui, ndei, nhei, noti, pvuii, spei, sphi, xhoi, xmni]).
m361_0([asci, ecorv, ncoi, paci, sbfi, scai, sfoi, snabi]).
And some convenience predicates so I don’t have to dig into the lists all the time:
singleM361(A) :- m361_1(B), member(A,B).
zeroM361(A) :- m361_0(B), member(A,B).
As an exercise, try to figure out what these do:
unrecleavable_pair(A,B) :-
is_compatible(A,B),
\+ is_recleaved_by(A,B,A),
\+ is_recleaved_by(A,B,B).
incompatible_with(_,[]) :- true.
incompatible_with(X,[Y|Ys]) :-
is_incompatible(X,Y),
incompatible_with(X,Ys).
incompatible_set([_]) :- true.
incompatible_set([X,Y|Xs]) :-
incompatible_with(X,[Y|Xs]),
incompatible_set([Y|Xs]).
Then I encode most of my rules into one query:
enzyme_set(A,B,C,X) :-
ncutter(A), ncutter(B),
unrecleavable_pair(A,B),
ncutter(C), ncutter(X),
incompatible_set([A,C,X]),
incompatible_set([B,C,X]).
Now I ask for sets of A, B, C, and X which work:
?- enzyme_set(A,B,C,X), singleM361(A), singleM361(X), zeroM361(B), zeroM361(C).
Prolog gives me every possible set. I look at my plasmid map and pick one that seems reasonable (A=MluI, B=AscI, C=EcoRV, X=AatII). Then I cheat a bit, look for a pair of enzymes in the polylinker that only appear once in the plasmid, and which are incompatible with everything else (which I check in Prolog as well).
It’s not as automated as it could be, nor is it elegant, but compare this to searching through enzymes descriptions by hand! Nor should you be intimidated by the language. I began this with a bare minimum of Prolog, which I picked up in about two hours by reading the first three chapters of Clocksin and Mellish’s Programming in Prolog.