Posts RSS Comments RSS 63 Posts and 41 Comments till now

Archive for January, 2007

Crawling amoebas

I just finished implementing the Nelder-Mead algorithm for finding the minima of functions. This is a fairly robust algorithm that operates in n dimensions. It requires no information on derivatives, and can efficiently crawl through valleys and the eyes of needles. How? Simulate an idealized amoeba!

Say you have a function f : \mathbb{R}^n \rightarrow \mathbb{R} which you want to minimize. Construct an n-dimensional simplex (the simplex is the higher dimensional analog of the triangle or tetrahedron). Now let the simplex, consisting of points S = \{ x_i, i = 1, \dots, n+1 \} crawl around \mathbb{R}^n.

For each iteration, let x_w be the worst of the \{ x_i \} (in the sense that it has the largest value of f), x_s be the second worst, and x_b be the best. Also, let S' = S – \{ x_w \}.

We will also need the vector which describes reflecting one point through the hyperplane formed by the rest of the points of the simplex. This is given by r = \frac{1}{n-1} \sum_{x_i \in S'} (x_i – x_w), the vector to the center of mass of that hyperplane.

We make the simplex crawl according to the following algorithm:

  1. If f(x_w + 2r) < f(x_b), try going twice as far beyond the simplex: x_w + 3r, then replace x_w with whichever gives of a lower value of f: x_w + 2r or x_w + 3r.
  2. Otherwise, if f(x_w + 2r) < f(x_s), then we just replace x_w with x_w + 2r (the reflection isn’t promising enough to try the extension).
  3. At this point life is looking bad. Try just pulling x_w in towards the rest of the simplex: if f(x_w + \frac{1}{2}r) < f(x_w), replace x_w with x_w + \frac{1}{2}r.
  4. If even that isn’t an improvement, we try to go to a smaller length scale (it may be that we’re hovering over a narrow valley, or even across several valleys, and are being confused by this). To do this, we contract the simplex around x_b. Take S' – \{ x_b \}, add to it whichever is better of x_w and x_w + 2r (remember that we only tested up to the point where x_w + 2r was better than x_s). Then we start again with the new simplex S” = \{ x_b + \frac{1}{2} ( x_i – x_b ) \} where the \{ x_i \} are drawn from (S' – \{ x_b \} ) \cup the better of x_w + 2r and x_w.

Wikipedia has a very nice animated gif of a simplex crawling according to these rules.

I have one serious problem with the simplex algorithm: I have no idea to prove when or if it will work. There are modifications of it to try to improve its reliability, particularly in high dimensions, but I know of no theoretical examination of these properties.

Koch’s Postulates I

How do we know that Mycobacterium tuberculosis actually causes tuberculosis? More generally, how do you prove that a given organism is the cause of a given disease? Robert Koch was the first to demonstrate such a connection, and set forth the logical foundations required in 1882 in his lecture Über die Ätiologie der Tuberkulose. I highly recommend reading the original (available here in English; I don’t have a link handy to the original German).

Modern expositions put forward the postulates in a didactic manner. Wikipedia’s is a fair sample:

  1. The organism must be found in all animals suffering from the disease, but not in healthy animals.
  2. The organism must be isolated from a diseased animal and grown in pure culture.
  3. The cultured organism should cause disease when introduced into a healthy animal.
  4. The organism must be reisolated from the experimentally infected animal.

Koch begins, “It was first necessary to determine if characteristic elements occurred in the diseased parts of the body, which do not belong to the constituents of the body, and which have not arisen from body constituents…and if they show any of the characteristics of independent organisms, such as motility, growth, reproduction, and fructification.” The example he gives is from anthrax: “If the blood of an animal dying of anthrax is examined, one finds in it a large number of regular, rod-shaped, colorless, immotile structures.

Koch does not state the second part of the postulate as it appears above. Tuberculosis can lie dormant in the lungs of a host for decades without causing illness. It was Koch who isolated the causative agent, Mycobacterium tuberculosis. He knew he could not insist upon such a condition.

There are other causes which can make such a requirement fail. A host may be infectious, but not ill. Typhoid Mary is the classic example of this. A given species may also not be universally disease causing. Vibrio cholerae, the bacterium which causes cholera, is generally a benign organism that lives in the water or quietly in aquatic animals. There are certain genes which it can acquire, however, which turn it into a scourge. Thus you are likely to find Vibrio cholera in many individuals, but in a form which is totally incapable of causing disease.

The next three correspond to an experimental technique. We have found some creature in the tissues of infected individuals, but how do we demonstrate that it is the causative agent? Koch’s answer is to separate “the parasite from the diseased organism, and from all of the products of the disease which could be subscribed to a disease-inducing influence, and then introducing the isolated parasite into healthy organisms and induce the disease anew with all its characteristic symptoms and properties.

Koch’s anthrax example is particularly clear. Anthrax will grow quite happily outside of a host if it is provided with nutrients, but the blood cells of the host do not, and all the chemical makeup of the blood besides cells don’t reproduce. If we take some blood from an infected animal, and spread it over a nutrient rich medium — a slice of potato, say, boiled to make it sterile — then the anthrax bacteria will grow, and the other components will not. If some of the resulting bacteria are spread on a new potato, and this is repeated again and again, then eventually the blood of the animal becomes negligible.

What if there are other bacteria which come along as well? The easiest way to rule this out is to examine what is growing on the potato under a microscope. If all you see are the “rod shaped…structures,” then you have some confidence that there are no other bacteria. Further, if you spread the bacteria very thinly on the new potato, you can get their density low enough where you get colonies that arise from only a few cells. If you have a second bacterial species, then you should get some colonies with only one and some with only the other as well as the mixed ones. If all your colonies still produce the same disease in another animal, then you can have some confidence that there is no other bacterium.

But what about a virus? With modern tools, you might separate the bacteria from any viruses by centrifuging them. The bacteria will settle at the bottom of the tube long before any viruses will. Then you can proceed with both separately. There may even be cases where you have to have both bacterium and virus in order to get the disease.

There are bacteria that won’t grow outside of a host. Mycobacterium leprae, a relative of tuberculosis, and the causative organism of leprosy is the classic example. It is cultured in the lab in the footpads of armadillos, as it has lost many of the genes it would need to grow outside of a host. How do you show this connection? Gerhard Hansen, the Norwegian physician who first found the bacterium, had trouble convincing others of his findings for this very reason.

Viruses cause similar problems. A virus cannot reproduce without a host. Generally viruses are maintained in the lab by letting them prey on cultures of cancer cells, but this makes the isolation much more difficult. However, in the century since Koch, our knowledge of the makeup of mammalian cells has grown enormously. Today a biochemist can with some confidence remove everything from such a culture of viruses and cancer cells except the virus.

There is one great weakness to this logical structure: you must have an animal model for the disease, a cheap one which lets you infect many individuals. Anthrax will happily infect most mammals. Tuberculosis is perfectly capable of killing mice, guinea pigs, and rabbits as well as humans. But HIV will infect only humans and, with difficulty, chimpanzees. How do you demonstrate that the virus is the cause then?

Systems Biology at NYAS

I attended a New Vistas lecture at the New York Academy of Sciences this evening.  David Botstein of Princeton’s Sigler Institute hosted Mike Elowitz, formerly of Rockefeller, now at Caltech, and Saaed Tavazole, also at the Sigler Institute.

Elowitz studies stochastic variation in clonal populations.  It was a fad for a while, but it was important to break the mindset that identical genes mean identical cells which ruled the minds of most microbiologists.  Circuitously, it led to my project.  He just rehashed the fact that there was cell to cell variation.

Tavazole is looking for gene expression signals that might correlate with ecological niches in bacteria.  However, he’s looking in E. coli K12 MG1655, the standard lab strain that has been passaged in a test tube for forty years.  He found a correlation between the regulation of genes induced by increasing oxygen and decreasing temperature (corresponding to leaving the mammalian gastrointestinal tract for the soil), and then tried to break the connection between the two by cycling between high oxygen, high temperature and low oxygen, low temperature conditions in a chemostat.  Quickly strains evolved which outcompeted the ancestors in this situation, but it is predictably an artifact: the growth rate of the evolved strains goes up very quickly when they are first switched to high oxygen, high temperature, but it quickly decays back to that of the ancestral strain.  The strain has sped up its adaptation, and possibly done something strange like undergoing high speed, reductive cell division when it is shifted between conditions.  He showed that the correlations from the ancestral strain between oxygen and temperature induced genes are weakened or broken, but I’m not sure how significant that is.  If the bacteria are undergoing some wild, uncontrolled cell division, there is no reason to expect their gene expression profiles to be the same.

Systems biology as far as I can tell is a rehashing of the old cybernetics program on genes, wedded with some bits and pieces of statistical genetics.  The endgame appears to be a directed graph, possibly with multiple kinds of arrows, each corresponding to a term in a (possibly stochastic) differential equation.  I have two particular concerns with this program.  First, you can map systems of differential equations \dot{x}_i = f(x_j), where f is a polynomial of low degree, probably two, unambiguously onto such a directed graph (I don’t offer this as a rigorous theorem, merely an observation), but you cannot map partial differential equations, deterministic or stochastic, without an infinite number of edges.  Second, I feel like the program does not allow you to formulate questions that aren’t fairly trivial.  The directed graph models were inherited from chemistry, where they describe reactions in well mixed solutions.  You can talk about sodium ions binding to and coming off from chloride ions, but it won’t tell you about the crystalline forms that form as you saturate the solution.

Controlling ImageJ from Scheme

I just spent ten days with the Bioimaging Group at EPFL, Lausanne. They’re great folks, and I at long last am on the way to having a segmenter for our microscope images. There’s just one catch: they do everything as plugins for ImageJ.

I never really learned to write Java. It’s basically C with classes and memory management bolted on top, so it’s not exactly hard to learn, but I don’t appreciate managing memory anymore. I’m not willing to track syntax errors through hundreds of lines of code. And most damningly, I can’t program in languages where functions are first-class objects anymore. I just can’t seem to do it. At every turn, I say, “Ah, I’ll just pass this function as an argument to…oh wait.”

Thankfully, I was able to escape. Kawa is a Scheme REPL and compiler for the Java virtual machine that let’s you call native Java objects seamlessly. You can define new classes which can be instantiated from Java. It has multidimensional arrays that you can reshape at runtime, and which are compiled to unboxed, one dimensional Java arrays.

I spent a couple hours while I was there trying to write ImageJ plugins in Kawa. I failed. Horribly. I still don’t know why. However, I realized I was going at the problem backwards. Instead I opened up Kawa, and typed (<ij.ImageJ>). Up pops ImageJ! Another couple of hours learning the details of Kawa and coding gave me full control of the program from native Scheme, and access to the image pixels in place as a native two dimensional array. Further, the view of the array in Scheme is dynamically typed, so code you write to manipulate images automatically works on all four pixel types (Byte, Short, Color, and Float).

Then I started coding in earnest. I ported Todd Eigenschink’s matrix05.scm linear algebra library to Kawa using the aforesaid native arrays. Then I started coding my own cubic B-spline library. This is reinventing the wheel, but I would have had to in Java as well, and matrix05.scm is far superior to either of the Java linear algebra libraries.

When I get it cleaned up and documented a bit more, I’ll probably release the interface so everyone else can stop beating their heads against Java without sacrificing the sizeable codebase already in ImageJ.