Programmer Virtue in Molecular Biology
Larry Wall taught us the three great virtues: laziness, hubris, and ego. I have long cultivated this in my life, and continue to do so now that I am a biologist.
My latest act of virtue (or perhaps virtù) involves selecting restriction enzymes for some cloning. Starting with a plasmid and a sequence I want to replicate in it, I need enzymes A, B, C, D, E, and X with the following properties:
- A and B have compatible ends, but when those two ends are ligated, none of the enzymes can recut them.
- None of the others have compatible ends.
- A must appear one time in the plasmid without the insert.
- B and C must not appear in the plasmid.
- D, E, and X must appear one time in the plasmid.
Don’t worry about where these rules come from for now. I’ll explain another time. Instead, let us focus on my tool for laziness: Prolog. In Prolog, you write down a set of logical propositions and rules, and then query them. This sounds fairly trivial, but it’s in fact really, really useful.
I typed in NEB’s compatible cohesive ends table for their restriction enzymes, which consisted of many lines that looked like this:
compatible(spei, [avrii,nhei,styi,xbai], [bfai]).
This is an example of a fact in prolog. It consists of a label for the fact (compatible), and a series of arguments. The first argument is the enzyme, the second is the list of compatible enzymes (lists are enclosed in [], and the entries are separated by commas), and the third is the list of enzymes that recleave such a ligation. I can create as many facts as I like with the same label and different arguments. In this case, it’s one for every line of that NEB table. Then I can define rules which let Prolog derive more facts from the simple facts I have given it:
is_compatible(X,X) :- true.
is_compatible(X,Y) :- compatible(X,F,_), member(Y,F).
The first line says that any sites cleaved by the same enzyme are compatible. The second says two enzymes are compatible is the second is a member of the list of compatible enzymes of the first.
is_incompatible(X,Y) :- \+ is_compatible(X,Y).
And to be incompatible, they must not be compatible (\+ is ‘not’ in Prolog). Here’s another predicate which is fairly straightforwards:
is_recleaved_by(A,A,A) :- true.
is_recleaved_by(X,Y,Z) :- compatible(X,G,F),member(Y,G),member(Z,F).
Then I put in the list of the restriction enzymes I have on hand: ncutter(aatii). and so forth. I also declare two lists for sites in the plasmid: restriction sites which appear once in the plasmid, and sites that do not appear in the plasmid (among those enzymes I have on hand):
m361_1([aatii, ecori, hindiii, hpai, kpni, mlui, ndei, nhei, noti, pvuii, spei, sphi, xhoi, xmni]).
m361_0([asci, ecorv, ncoi, paci, sbfi, scai, sfoi, snabi]).
And some convenience predicates so I don’t have to dig into the lists all the time:
singleM361(A) :- m361_1(B), member(A,B).
zeroM361(A) :- m361_0(B), member(A,B).
As an exercise, try to figure out what these do:
unrecleavable_pair(A,B) :-
is_compatible(A,B),
\+ is_recleaved_by(A,B,A),
\+ is_recleaved_by(A,B,B).
incompatible_with(_,[]) :- true.
incompatible_with(X,[Y|Ys]) :-
is_incompatible(X,Y),
incompatible_with(X,Ys).
incompatible_set([_]) :- true.
incompatible_set([X,Y|Xs]) :-
incompatible_with(X,[Y|Xs]),
incompatible_set([Y|Xs]).
Then I encode most of my rules into one query:
enzyme_set(A,B,C,X) :-
ncutter(A), ncutter(B),
unrecleavable_pair(A,B),
ncutter(C), ncutter(X),
incompatible_set([A,C,X]),
incompatible_set([B,C,X]).
Now I ask for sets of A, B, C, and X which work:
?- enzyme_set(A,B,C,X), singleM361(A), singleM361(X), zeroM361(B), zeroM361(C).
Prolog gives me every possible set. I look at my plasmid map and pick one that seems reasonable (A=MluI, B=AscI, C=EcoRV, X=AatII). Then I cheat a bit, look for a pair of enzymes in the polylinker that only appear once in the plasmid, and which are incompatible with everything else (which I check in Prolog as well).
It’s not as automated as it could be, nor is it elegant, but compare this to searching through enzymes descriptions by hand! Nor should you be intimidated by the language. I began this with a bare minimum of Prolog, which I picked up in about two hours by reading the first three chapters of Clocksin and Mellish’s Programming in Prolog.
if the values obtained by successively repeating the operation on the body, beginning at and extending over decreasing intervals of time previous to
. In general this is a property. Consider a model for tuberculosis in a hospital. Tuberculosis is carried by small droplets which are expelled into the air when an actively tuberculous patient coughs. We can construct a device which gathers droplets which come into contact with some area of its surface, and then checks for the presence of the bacillus in the droplets (for instance, by chemically extracting the DNA and amplifying genes specific to TB). Then we install this device in the hospital’s air handling units and record for some period of time. We can calibrate the device by sending a known quantity of bacilli containing droplets at it in the laboratory. In this case
, the probability that an individual with specified properties will contract the disease in question when exposed to an infectious dose.
from the rate at which they are infected.
is given by a Poisson distribution with parameter
. We can’t tell how many of these people received multiple doses, so the only reliable number we get is those who are still healthy. The probability of zero infectious doses actually causing disease in time
. So if we have
healthy individuals out of
total, then
.
. This method is due to