Archive for the 'biology' Category

Protein names in titles

I formulated a principle at a conference I just attended: ignore talks and posters with protein names in their title or the first sentence of their introduction.

It’s an issue of relevance. Each major discipline has its own concept of relevance. Roughly, a problem gains relevance as it constrains swathes of a field. Alain Aspect’s experimental tests of Bell’s inequalities were eminently relevant to physics because of the amount of speculation in quantum mechanics which they ended. Integrable models in statistical mechanics are less physically relevant because they only point out directions of interest in the limited corners where they were formulated.

Biology has its own relevance, largely dictated by evolution. Results are less relevant the more narrowly they apply to specific genera, species, or strains. Results on specific organisms are relevant only insofar as they can be combined with results in other organisms to produce a larger picture.

Chemistry has its own relevance, largely based on the fact that a carbon is always a carbon, a hydrogen is always a hydrogen.

Much of molecular biology, cell biology, and immunology is almost chemical in its aesthetic. Immunoglobulins share so much structure throughout vertebrates that studying them with an eye only to chemical relevance, as if an IgB were always an IgB, still has biological relevance. DNA, that ubiquitous molecule, invites a chemical approach. Type III secretion systems vary enough among species to doom the purely chemical aesthetic.

As a rule of thumb, any molecule familiar to a random biologist from a random field is fair game for a chemical approach. DNA is justified. IgB is justified. A particular Rac GTPase or kinase involved in synthesizing a particular phosphoinositol is not.

If it is not, you must provide the biological context. What organisms, and what variation among them? What phenomenon caused this protein to rear its head? If the protein’s name has already appeared in the title, then relevance has been squeezed out.

BIO2010 (Part 3)

In my previous post on the BIO2010 report I gave some principles to guide curriculum design. Remember, the goal is a mature and sophisticated mind, not a knowledge of any particular subfield. I’m not qualified to select the content for such a curriculum, so please suggest changes. We’ll begin with a survey of the advanced undergraduate courses.

Neurobiology: I give neurobiology its own place because certain aspects of it are very mature and therefore a prime field for training minds. By the same criterion, immunology makes no appearance in this list, as the field is too contorted, at least as represented in the textbooks, to usefully train anyone.

The first milestone of the course is a physical understanding of the Hodgkin-Huxley model, and how it came to be. This will probably take half a semester.

After Hodgkin-Huxley, I am out of my depth. The two obvious directions are the essential features of the synapse, and the basics of neural architecture, probably using vision and hearing as the models. This brief discussion (warning: PDF) has some interesting references, but I have not had time to follow them up.

Molecular biology has two aspects. The first is structural understanding of the three major biopolymers (DNA, RNA, and protein); the second is designing protocols to manipulate them. Students should not only understand the structural differences between GC rich vs. AT rich helices (at the level of something like Calladine’s Understanding DNA), but also why each step in a miniprep is there, and how to calculate what it should be in the absence of a protocol. This means they need quantitative descriptions of sedimentation and separation by centrifugation, chromatography, and amplification by various kinds of PCR. Then move on to models of transcription and translation, and mechanics of molecular motors. This course will probably take about a year.

Evolution and ecology begins with population genetics (Gillespie’s Population Genetics seems about right), then covers epidemiology (some selection from Diekmann and Heesterbeek’s Mathematical Epidemiology of Infectious Diseases), species interactions and the evolution of behavior (a couple of bits and pieces of Gintis’s Game Theory Evolving), and bioenergetics: the flows of energy-carrying molecules both in organisms, which gives a chance to discuss metabolism, and in ecosystems. These selected topics constitute a semester.

The next semester is genetic mapping and screening (design of screens and selections, RFLP and other techniques) and phylogeny (constructing the universal phylogenetic tree, with visits to representative areas such as evolution of horses, gene transfer in bacteria, and parallel evolution of fluorescent proteins in corals).

Biomechanics is the successor to anatomy, merged with mechanical engineering and developmental biology. It begins with a couple weeks of statics such as Galileo’s argument on the scaling of bones, pressures on bacterial membranes and cell walls, DNA pressure and packing in viral capsids.

Then it moves to dynamics: swimming of microscopic and macroscopic creatures, motility of cells by actin, circulation in the body, and basic models of cell migration and patterning in development.

What about supporting courses?

Single and multivariable calculus are necessary in all the courses, as are ordinary differential equations, with an emphasis on phase space analysis and qualitative features, but definitely including the Fourier transform. Partial differential equations are used only in biomechanics and evolution and ecology, and in neither place in really intricate ways, so they can be handled in those classes. Probability and stochastic processes appear in all of the courses except biomechanics, and possibly there as well depending on the exact selection of topics. They are also necessary for statistics.

Chemistry of ions in solutions is necessary for neurobiology, chemical thermodynamics for evolution and ecology and molecular biology. Organic chemistry is necessary to make sense of the pieces in molecular biology. A general chemistry course with a really physical bent is probably enough as a starting point.

Physics of point particles and rigid bodies, thermodynamics, and basic electromagnetism are all that need to be covered outside of the courses. I expect most science students to take introductory physics, chemistry, and biology as part of choosing their field of study, so these courses cannot be altered too much.

All of the classes can provide opportunities for students to dig into real data, so a separate data analysis and statistics course makes sense. I think a selection from Tukey’s Exploratory Data Analysis and Data Analysis and Regression and Kiefer’s Introduction to Statistical Inference would form a good basis for such a course. All science students should have such a course, so there is no reason to specialize it for biology.

The data analysis course and the evolution and ecology course both involve an understanding of programming and computational cost of algorithms. A computer science course on basic numerical analysis and search, sorting, and alignment algorithms would be a useful companion to both. Students should write all their own code for the assignments in this class, and use almost no libraries and certainly no algorithmic black boxes. About the only way to actually do this in a semester is to teach the course in Scheme or a similar language.

What goes in the introductory course? My inclination is the first two lectures of material from each major section of the four courses, rearranged into a more unified year-long presentation.

What laboratory classes should students take? I would propose a separate introductory laboratory class of experiments that can be done in one or two sessions, separate from the introductory biology course, but covering what of that material is checkable in the lab in reasonable time.

Molecular biology is an obvious candidate for a lab: learning how to prepare DNA, digest and religate it, run western blots, and do genetic engineering in E. coli.

Almost everyone uses a microscope, so a one semester laboratory on optical microscopy covering material something like Shinya Inoué’s Video Microscopy would be a good time investment. Then comes an intermediate lab where students carry out three or four longer experiments in small groups throughout the semester.

The BIO2010 report recommends a research seminar, which I think is a good idea. The Rockefeller University requires such a course of its graduate students. The class reads two of the classic papers in biology each week and meets with two faculty members over lunch to go over them.

In the end we have a schedule that looks like this:

First year:
Introductory biology (all year)
Introductory chemistry (all year)
Introductory physics (all year)
Single and multivariable calculus (all year)

Second year:
Introductory biology laboratory (all year)
Differential equations (fall) / Probability and stochastic processes (spring)
Neurobiology (fall) / Biomechanics (spring)
Numerical analysis (fall) / Data analysis and statistics (spring)

Third year:
Evolution and Ecology (all year)
Molecular biology (all year)
Microscopy lab (fall) / Intermediate lab (spring)
Research seminar (all year)

Fourth year:
Molecular biology laboratory (fall)

This leaves plenty of space for electives and general requirements. Now everyone have at it and rip this to shreds.

BIO2010 (Part 2)

In an earlier post I gave my quibbles with the language of the BIO2010 report. I promised to lay out an alternative curriculum proposal, but first I’m going to set forth my underlying philosophy. First a few principles:

Repeat a canon several times.
Choose a core set of techniques and ideas which will properly shape the students’ minds. The shape of their mind when they come out is far more important than any particular collection of facts.

My sister once noted in surprise, the night before a calculus exam, "I can’t study for this class! I can only practice." There is no body of things that a science student should know, only a body of things he should be able to do. Skills and mindsets take much longer to teach than facts, so it is important to be stingy with what gets space in the curriculum. For example, given familiarity with the simple SIR epidemiology model and with partial differential equations from mechanics, there is no conceptual difficulty with adding spatial effects to the SIR model. Unless it is a step to some other compelling mental skill, ditch it.

If math is your language, don’t work in translation.
You would not expect students of Italian literature to have everything taught in English translation except for one course where they played with Italian translation. Just so, if mathematics isn’t the language in which you teach your classes, adding a random course will not fix the situation. Having students take carefully designed courses in math, computer science, and physics is not the right approach. They should only take those courses which impart tools they will use in every biology class from there on out. If no biology class in your department uses complex analysis, then don’t require the class of your students. If almost every class needs stochastic processes, then abstract that out and require a course in it. If no biology class in your department uses math, then if you find this state of affairs unacceptable, it falls upon you to correct it, not to delegate the task to a mathematician who spends his days worrying about category theory.

Prerequisite courses are for the universal, not the potentially useful.
Physicists have separate calculus courses not because it’s a good idea for students to know calculus, but because the tools are used in every physics course the students take, so it is more efficient to deal with the tools once, uniformly, and assume them from there on out. Topics like Laguerre polynomials which show up in only some of the following physics classes are simply taught alongside the physics, not abstracted out.

Does every course require the students to know what a calcium ion is, what it does in solution, and how it combines with things? Then general chemistry seems like a compelling prerequisite.

A course is the minimal mental pattern of a practitioner.
Students taking a course shouldn’t have to memorize facts. They are there to pick up a core of tools and structures which shapes their mind for research in a field. I propose a rule of thumb: if a practitioner uses something weekly, it should be in the introductory course; monthly, in the advanced undergraduate course; annually, in the graduate course.

Finally, the report wants physics and chemistry departments to modify their introductory courses for the needs of the biology department, or to offer special introductory courses. No. Many students are undecided as to their major in their first year, and take the introductory courses in all the sciences. I think each of these courses should be taught as if to a class full of majors in the department teaching the course. For instance, why should physics departments throw out relativity? And if an ostensible biology student seeing relativity decides that they like that more, shouldn’t they become a physics major?

I’ll post the curriculum proposal next.

(Update: Talking to my adviser, his comment was that the single most important thing that could be done would be to cut the administrative burden (not including teaching) of principal investigators from half to two thirds of their time to something much, much smaller. I think perhaps five to ten percent should be the upper limit.)

BIO2010 (Part 1)

The NIH and HHMI organized a much trumpeted report entitled BIO2010: Transforming Undergraduate Education for Future Research Biologists. It came out in 2003 to much acclaim. I reached the crux of it (chapter 2, "A New Biology Curriculum") last night. I don’t like it. I have some basic philosophical objections to their approach, and a lot of quibbles with the details as they stand. In this post I’ll go through the quibbles, and in later ones I’ll attack the underlying philosophy, and propose a curriculum in my turn.

The chapter is organized into a list of core concepts to be taught from biology, physics, chemistry, engineering, and math and computer science, then a set of suggested curricula. I’ve broken my gripes down by section:

Biology: Some of the concepts listed are important ("Biological systems obey the laws of chemistry and physics." and "Lipids assemble with proteins to form membranes, which surround cells to separate them from their environment. Membranes also form distinct compartments within eukaryotic cells.") and I dearly hope they are being taught already. Others are sloppy, tautological, or espouse absurdities such as "holistic" science.

Chemistry: I know almost no chemistry and am not willing to comment on this section except for the fact that "computational methods and modeling" is a bullet point right alongside Lewis structures.

Physics: The physics recommendations largely consist of a standard intro physics course for hard science majors with a few topics moved here or there. This isn’t a bad thing. I think that the best approach would be to take a 1960s edition of Halliday and Resnick and go through that in a year (for those who don’t know it, the early editions of Halliday and Resnick were clear, comprehensive, and had a doable number of extremely well selected problems).

The report also wants the students to learn by interacting with simulations. I can tell you from being on the receiving end of such an attempt and from anecdotal evidence in the community that such approaches are dismal failures. Let the course be analytical and experimental. Leave the computer out of it.

Engineering: Aside from the incredibly broad heading, this has about the only interesting, concrete recommendation in the report. The introduction was written by the biologists who are convinced that systems engineering is all that they might really want. Then someone wrote a curriculum outline for the first third of a really solid neurobiology course. Whoever wrote that, bravo!

Mathematics and Computer Science: Some of the topics confuse me (computability theory? why?). The gist of this section is to give the biology students special math classes where the computer can do all the work for them. Hamming said, "The computer is an extension of the body, not the mind." (I found this quote in the preface to McNeil’s Interactive Data Analysis) I agree that biologists should be able to program, and should really understand what BLAST does. This would be better taught as a one or two semester course on numerical analysis and relevant search methods, plus data structures such as ropes to handle long sequences. And it should be taught in Scheme (or this for the ambitious), not in "higher-level languages such as Matlab, Perl, or C" as the report recommends. Everyone who knows programming languages just stopped taking the computing recommendations of the report seriously with that list.

Apparently students should be taught data analysis. I completely agree. Astonishingly, John Tukey wrote some great books for this back in the ’70’s because he was teaching people data analysis. And it’s not just biologists who need this. Every scientist would benefit. Call up your local statistics department and get them to institute an "interdisciplinary" course based on John Tukey’s two classics Exploratory Data Analysis and Data Analysis and Regression, or whatever modernized version they want to teach.

The recommended math comes down to the following classes in the math department: single and multivariable calculus, linear algebra, differential equations (more like Arnol’d’s book than what the engineers are taught), and probability and stochastic processes (this would be a great place to use Nelson’s Radically Elementary Probability Theory). There, three years of math, a semester of computer science, and a semester of data analysis. That’s not so bad, is it?

The report also mentions that medical school admissions requirements govern a lot of what biology departments cover. My approach: ignore them. Physics departments do. If the biologists ignore the MCATs completely, then the MCATs will change to follow the biology.

The recommended solutions for dealing with not having the expertise to teach a course are frightening: "…taught…by a collaborating team of faculty from multiple departments" or "A mathematician or computer scientist might also be invited to give a guest lecture or two." Two lectures isn’t long enough to have any real effect on students’ mental processes, and team teaching makes the course scattered and disorderly. I speak from experience. Biology courses are often taught this way, and it’s absolutely useless.

There is an obsession throughout with having a course on modeling and simulation. I believe the rationale is that all the ugly mathematics stuff can be put into this, taught by one of those egg-headed math people, and the biology professors can go on doing exactly as they have been.

At one point it says, "Opportunities to learn mathematical skills in a rich content context will enhance conceptual understanding and procedural fluency." No it won’t. Math professors should (and usually do not) ask students, "Why is this theorem interesting?" If a student has not reached a level of mental abstraction sufficient to answer such a question in the context of pure mathematics, the mathematics education has failed. Later it even recommends remedial math courses before calculus! What are you doing in college if you need that?

There is no ladder

There’s been a little cluster of blog entries on the erroneous idea that evolution is an increase of complexity from bacterium to man. I work in a pathogen lab, which means we’re all acutely aware that we’re not at the top of the food chain. But I’m going to go one step further and really annoy a lot of people.

There is no biological complexity.

The only definition of complexity I’m aware of which isn’t unobservable handwaving is the concept of algorithmic complexity in computer science, which refers to how much computational power is required to compute something, from computable in constant time, to NP-complete (give up and go home). But there’s a catch: the computational power is that required by a universal Turing machine or equivalent. And you can often approximate the solution to NP-complete problems with statistical approaches much, much more cheaply (there’s a nice sort of introduction to this at the Quantum Pontiff). I have no idea how to define the complexity class of being able to live off of iron ore in a cave, but I don’t think it matters, as we’re dealing with evolution, not a Turing machine. It’s a genetic algorithm, by its very nature. That puts it squarely in the statistical mechanics side of things. (I would appreciate it if an actual complexity theorist would slap me down if I’ve gotten totally confused here.)

The other definitions people try to use are usually from “software complexity,” which is an attempt to measure how much work programmers have done without having to know anything about the programs they wrote. To anecdotally understand how screwed up this can get, go read the Evolution of a Programmer (or for the categorically minded, the Evolution of a Haskell Programmer).

If we stop worrying about complexity, however, we can get some really nice results about how things scale. Stride length, leg length, and running for instance. Giraffes can run. Their legs are just short enough. Elephants can’t. They physically can’t maintain a gait which has all the legs off the ground. Their legs are too long, but they can shuffle so fast that it doesn’t much matter. The blue whale’s circulation is a convective cooling system: when they die they cook because their own body heat can’t diffuse fast enough into the surrounding water.

And when the scaling laws break down, then you’re on to something really interesting. A certain phage geneticist is following one of these leads: DNA viruses usually encode their own DNA polymerase because they want to operate at a higher error rate than their host…but small DNA viruses which don’t have space to encode a DNA polymerase still, as a population, mutate much faster than their host. Here’s a case where simple arguments from genome size fail miserably.

There is a Zen concept that scientists really need to adopt: mu. The answer to a question may not be yes or no. It may be mu: there is no answer. Has complexity increased over the course of evolution from bacterium to man? Mu! Go check your assumptions.

Math for biologists

Here’s a post which poses some questions about mathematics training for biologists. I’m close enough to being a biologist that I figured I’d try to answer:

  1. Are you a biologist, if so what kind? Sort of. My PhD will be in biology, on the response of mycobacteria to antibiotics. My undergraduate degrees were in physics and math.
  2. What math did you take in college? Calculus through multivariable, ordinary differential equations, a two semester course on math methods for physics students covering special functions, complex analysis through contour integration, and some partial differential equations, real analysis through the last course they shove the doctoral students through, the probability course that comes just after, a bit of functional analysis, and a two semester graduate sequence in algebra.
  3. What math do you or have you used? My image analysis requires some partial differential equations, a lot of calculus, and a bit of differential geometry. I do some theoretical biology on the side which particularly takes the probability and random processes, and partial differential equations as the limiting cases of the former. My bench work itself doesn’t take more than high school algebra, but designing and analyzing my experiments requires statistics, which I’ve been teaching myself. Before this, when I was training to be a mathematical physicist, I obviously used everything I knew and it wasn’t enough.
  4. What math do you wish you’d studied? I wish I had actually endured the second year of graduate algebra, and a graduate course on combinatorics. A couple semesters of statistics would have saved me a lot of time now, but I can fill in the gaps. I really wish I had taken serious courses on differential geometry and topology.
  5. How do you use math in your job (or research)? See above.

I think a case could be made that biologists should take all the math required to take a full scale mathematical statistics course including multivariate and nonparametric methods, and a heavy dose of experimental design. Let’s see: real analysis through measure theory and Lebesgue integration and Stokes’s theorem on manifolds, a hefty course on probability from the axiomatic basis to the beginnings of random processes, and then blast through statistics because a t-test would only require about ten minutes and a homework problem, and so would a U-test. This is four years (two of analysis, one of probability, one of statistics), but it’s only one course a term.

Admittedly, biology departments would suddenly have as many undergraduates as physics departments.

ImageJ plugins

I’ve started building a new plugin system for ImageJ which will be much easier to work with.

Just to review, the current plugin system uses two interfaces (ij.plugin.PlugIn and ij.plugin.filter.PlugInFilter) to implement plugins. The second is meant for implementing image processing algorithms. It defines two functions:

int setup(String cmd, ImagePlus imp)
All the setup work for the plugin is supposed to be done here. All parameters are passed through cmd, which is a holdover from ImageJ’s reprehensible macro language. The actual image to process is passed through imp.
void run(ImageProcessor ip)
After ImageJ has called the setup function, it calls run for each element of the image stack. ImageProcessors hold the actual pixel data for each image in the stack, which is wrapped up in the ImagePlus.

Note several obvious problems:

  • Plugins receive all their parameters from a string, so you have to write a mini-parser for each plugin. Everyone writes different parsers.
  • The ij.plugin.filter.PlugInFilter interface insists that setup returns an int (telling the plugin’s capabilities, which is reasonable), and run returns void. How do you get the new image back?
  • If you expect to run the plugin from a menu instead of the macro language or directly from Java, you have to write an interface to get parameters from the user. Then you have to make it not appear when you’re called from the other two contexts.

A fix should automatically support all three contexts (Java, the macro language, and menus), without any additional programmer effort. This is impossible, but we can get fairly close, as I’ll describe next.

Intrinsic and Extrinsic Noise in Gene Expression

The question of noise in gene expression has become increasingly important recently, both as an interesting question in its own right, and as a necessary piece of information for constructing random processes to describe other aspects of biology.

There is a problem with measuring noise, however: how much of it is actually noise associated with the components of a pathway operating in a thermal bath, and how much is cell to cell variation, the local density of necessary enzymes, and other things that are extraneous to the pathway?

Enter two papers: Intrinsic and extrinsic contributions to stochasticity in gene expression by Swain, Elowitz, and Siggia and Stochastic gene expression in a single cell by Elowitz, Levine, Siggia, and Swain. The first is theoretical, the second a test of the theory in E. coli.

Take an ensemble of cells whose expression X of some gene we can monitor (say, with a fluorescent protein on the same promoter). We define the noise in X as

\eta^2(t)=\frac{\mbox{var}(X)}{E[X]}

We use the variance instead of the standard deviation because in most cases we are talking about small numbers of things, and this form gives us a good reference point: a Poisson variable has variance equal to its mean, so if X is Poisson, \eta^2 = 1.

We want to decompose this into something measuring the stochastic details of our process of interest and something measuring the variation due to using this or that particular cell. We do this with a well known technique from the theory of linear models: Let \mathcal{I} and \mathcal{E} be the intrinsic and extrinsic variables respectively which together entirely determine the state of X. Then (bringing the mean over to the left hand side so we don’t have to carry it through)

\eta^2(t)\cdot E[X] = \mbox{var}(X) = E[X^2] - E[X]^2
  = E[E[X^2|\mathcal{E}]] - E[E[X|\mathcal{E}]^2] + E[E[X|\mathcal{E}]^2] - E[X]^2
  = E[\mbox{var}(X|\mathcal{E})] + E[E[X|\mathcal{E}]^2] - E[E[X|\mathcal{E}]]^2
  = E[\mbox{var}(X|\mathcal{E})] + E[E[X|\mathcal{E}]^2 - E[E[X|\mathcal{E}]]^2]
  = E[\mbox{var}(X|\mathcal{E})] + E[\mbox{var}(E[X|\mathcal{E}])]
  = E[\mbox{var}(X|\mathcal{E})] + \mbox{var}(E[X|\mathcal{E}])

The first term we can take as a measure of the intrinsic noise: \mbox{var}(X|\mathcal{E}) is the variance due to the intrinsic variables, and its expectation is just the average over the ensemble of different cells (and their corresponding different extrinsic conditions).

The second term is a measure of the extrinsic noise. E[X|\mathcal{E}] is the average expression for a given set of extrinsic conditions (i.e., in one cell), and we take its variance across many cells, and this is purely the variation caused by changes in the extrinsic conditions.
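
The decomposition can be checked numerically. What follows is only a minimal sketch under a toy model of my own choosing, not anything taken from the two papers: each cell's extrinsic state is assumed to be a single Poisson rate drawn uniformly from a hypothetical range, and we pretend we can redraw expression many times from the same frozen cell. The names poisson and simulate and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import fmean, pvariance


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(n_cells=2000, n_reps=50, seed=0):
    """Return (total, intrinsic, extrinsic) sample variances for the toy model."""
    rng = random.Random(seed)
    cell_means, cell_vars, all_x = [], [], []
    for _ in range(n_cells):
        # Extrinsic state of this cell: a hypothetical expression rate.
        lam = rng.uniform(2.0, 10.0)
        # Intrinsic noise: repeated draws with the extrinsic state held fixed.
        xs = [poisson(lam, rng) for _ in range(n_reps)]
        cell_means.append(fmean(xs))      # estimate of E[X | extrinsic state]
        cell_vars.append(pvariance(xs))   # estimate of var(X | extrinsic state)
        all_x.extend(xs)
    total = pvariance(all_x)              # var(X)
    intrinsic = fmean(cell_vars)          # E[var(X | extrinsic state)]
    extrinsic = pvariance(cell_means)     # var(E[X | extrinsic state])
    return total, intrinsic, extrinsic


if __name__ == "__main__":
    total, intrinsic, extrinsic = simulate()
    print(total, intrinsic + extrinsic)
```

With the same number of draws per cell, the identity holds for the sample moments themselves, so the two printed numbers should agree up to rounding. In a real experiment we cannot redraw from a frozen cell, so the per-cell means and variances used here are not directly available.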

The term we added and subtracted in our computation E[E[X|\mathcal{E}]^2] we can experimentally measure: put two different genes which we can assay on the same promoter in a given cell. They will have the same extrinsic conditions, and different intrinsic variables (since they will be in different parts of the heat bath). Let X_1 and X_2 be the different expressions for the two. Then

E[E[X|\mathcal{E}]^2] = \int dP(\mathcal{E}) (\int x dP(X=x|\mathcal{E}))^2
  = \int dP(\mathcal{E}) \int x_1 dP(X_1 = x_1|\mathcal{E}) \int x_2 dP(X_2 = x_2 |\mathcal{E})
  = E[E[X_1|\mathcal{E}] E[X_2|\mathcal{E}] ]

It is important to remember what is actually measurable here. We can measure mean levels of expression in individual cells. Thus \eta^2 is measurable, and we are trying to work backwards to the two noise terms. Consider the observable E[ E[X_1 - X_2 | \mathcal{E}]^2 ].

E[ E[X_1 - X_2|\mathcal{E}]^2 ] = E[ E[X_1|\mathcal{E}]^2] + E[E[X_2 | \mathcal{E}]^2] - 2E[E[X_1|\mathcal{E}]E[X_2|\mathcal{E}]]
  = E[E[X_1|\mathcal{E}]^2] + E[E[X_2|\mathcal{E}]^2] - 2E[E[X|\mathcal{E}]^2]

The first two terms are both direct observables as well, so we can find the last term: E[E[X|\mathcal{E}]^2] = \frac{1}{2}\left( E[E[X_1|\mathcal{E}]^2] + E[E[X_2|\mathcal{E}]^2] - E[E[X_1 - X_2|\mathcal{E}]^2] \right). Note that the other term in our extrinsic noise is an observable as well: in its original form, it is E[X]^2. We have the whole extrinsic noise now, and the total noise, and their difference is the intrinsic noise. Voilà!
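
Here is a rough numerical sketch of the two-reporter idea, again under a toy model that is my own assumption rather than the papers' method. Each simulated cell has a hypothetical extrinsic rate, and the two reporters are drawn independently given that rate. Under that conditional independence, E[E[X_1|\mathcal{E}] E[X_2|\mathcal{E}]] equals E[X_1 X_2], so the extrinsic term can be estimated from one reading of each reporter per cell. The function names and parameter values are illustrative.

```python
import math
import random
from statistics import fmean


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def two_reporter_noise(n_cells=20000, seed=1):
    """Estimate total, extrinsic and intrinsic variance from paired reporters."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n_cells):
        lam = rng.uniform(2.0, 10.0)   # shared extrinsic state (hypothetical)
        x1.append(poisson(lam, rng))   # reporter 1 in this cell
        x2.append(poisson(lam, rng))   # reporter 2 in the same cell
    m1, m2 = fmean(x1), fmean(x2)
    mean = 0.5 * (m1 + m2)
    # E[X^2], pooling both reporters since they share one distribution.
    second_moment = 0.5 * (fmean([a * a for a in x1]) + fmean([b * b for b in x2]))
    total = second_moment - mean ** 2
    # E[X_1 X_2] stands in for E[E[X | extrinsic state]^2].
    cross = fmean([a * b for a, b in zip(x1, x2)])
    extrinsic = cross - m1 * m2
    intrinsic = total - extrinsic
    # Cross-check: half the mean squared difference between the reporters.
    half_sq_diff = 0.5 * fmean([(a - b) ** 2 for a, b in zip(x1, x2)])
    return {"total": total, "extrinsic": extrinsic,
            "intrinsic": intrinsic, "half_sq_diff": half_sq_diff}


if __name__ == "__main__":
    print(two_reporter_noise())
```

In this toy model the intrinsic estimate should land near the mean rate and the extrinsic estimate near the variance of the rate across cells. The half_sq_diff value is a second route to the intrinsic term that uses only the difference between the two reporters.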

Koch’s postulates II

(This is a continuation of Koch’s Postulates I)

Three serious issues remain with the Koch program for establishing that a given organism is the etiological agent of a disease: the accuracy of animal models, how to proceed in the absence of animal models, and how to define “etiologic agent” in more complex cases.

HIV has two animal models: mice, in which the virus does not cause AIDS, and chimpanzees, which can only be infected by the less virulent of the two major strains, and which are simply impractical for large studies. It can infect cells in tissue culture, but many times these cells are disrupted by the culturing process, and even if they are not, the larger physiology is lost.

Actually, HIV has fulfilled Koch’s postulates: laboratory workers who have accidentally been infected with various virus cultures develop precisely the same disease as that which follows normal transmission (see here). But this is not data you can intentionally collect. That was firmly established by the Nuremberg Code and the aftermath of the Tuskegee experiment.

Animal models go further afield. Some groups are studying broad spectrum pathogens in C. elegans, a tiny worm. Others are studying the same pathogens in systems closer to humans. What if the mouse model disagrees with the C. elegans model? There is no general solution to how to select what is relevant from a given animal model. For instance, tuberculosis in mice causes macrophages to release strong bursts of nitric oxide. Some investigators find that human macrophages infected in a culture dish don’t release these same bursts. Is this an important difference or an artifact of culturing?

But what if there is truly no animal model available? Generally we fall back on a phantom “fifth postulate”: treat the disease in patients as if it were caused by the proposed organism. If the treatment works, then perhaps you have found the right organism.

This is medically sound, but scientifically inconclusive. Consider a bacterium which has become symbiotic with a virus. It carries the virus’s genome in its own, and when it infects a host, expresses that genome to create virus particles which it secretes into the host. These virus particles infect and destroy cells of the immune system which would control the bacterium.

When you study the disease, you see the immune system being destroyed, and search for what is doing it. You find the virus, and devise antivirals against it. When you treat the patient, the treatment controls the disease. The fifth postulate would say that the virus was the etiological agent of the disease.

The fifth postulate can be used to disprove connections, however. The bacterium and virus symbiosis I described above bears a vague resemblance to one disproved idea about HIV, but we can show that HIV is not like this: an AIDS patient will receive large quantities of antibiotics to fight off other diseases. Tuberculosis alone, one of the major killers, requires administration of four powerful antibiotics at once. In short, HIV and any possible symbiont are exposed to every class of antibiotics known. Any bacterial symbiont should have been handily slaughtered by at least a subset of these.

But why can’t this show that HIV is the cause? What if there were another virus, affected by the same antiretrovirals, that we for some reason just weren’t noticing? There is no way to control for this short of isolating HIV and showing that it, and it alone, is sufficient to cause disease: in short, Koch’s program.

These same difficulties show up more strongly in what are termed microbiota shift diseases, diseases resulting from a change in the composition of the body’s native microbes. The standard example is bacterial vaginosis. There is an incredibly strong correlation between a certain shift in the bacterial population of the human vagina (from mostly Gram-positive species to mostly Gram-negative species) and both minor symptoms, such as vaginal discharge and a bad odor, and premature births. If you treat with antibiotics that target Gram-negative bacteria, you can restore the original balance and remove the symptoms. But how do you even go about defining “etiological agent” in a complex of many species?

Koch’s Postulates I

How do we know that Mycobacterium tuberculosis actually causes tuberculosis? More generally, how do you prove that a given organism is the cause of a given disease? Robert Koch was the first to demonstrate such a connection, and set forth the logical foundations required in 1882 in his lecture Über die Ätiologie der Tuberkulose. I highly recommend reading the original (available here in English; I don’t have a link handy to the original German).

Modern expositions put forward the postulates in a didactic manner. Wikipedia’s is a fair sample:

  1. The organism must be found in all animals suffering from the disease, but not in healthy animals.
  2. The organism must be isolated from a diseased animal and grown in pure culture.
  3. The cultured organism should cause disease when introduced into a healthy animal.
  4. The organism must be reisolated from the experimentally infected animal.

Koch begins, “It was first necessary to determine if characteristic elements occurred in the diseased parts of the body, which do not belong to the constituents of the body, and which have not arisen from body constituents…and if they show any of the characteristics of independent organisms, such as motility, growth, reproduction, and fructification.” The example he gives is from anthrax: “If the blood of an animal dying of anthrax is examined, one finds in it a large number of regular, rod-shaped, colorless, immotile structures.”

Koch does not state the second part of the first postulate (“but not in healthy animals”) as it appears above. It was Koch who isolated the causative agent of tuberculosis, Mycobacterium tuberculosis, and tuberculosis can lie dormant in the lungs of a host for decades without causing illness. He knew he could not insist upon such a condition.

There are other reasons such a requirement can fail. A host may be infectious, but not ill. Typhoid Mary is the classic example of this. A given species may also not be universally disease-causing. Vibrio cholerae, the bacterium which causes cholera, is generally a benign organism that lives in the water or quietly in aquatic animals. There are certain genes which it can acquire, however, which turn it into a scourge. Thus you are likely to find Vibrio cholerae in many individuals, but in a form which is totally incapable of causing disease.

The next three postulates correspond to an experimental technique. We have found some creature in the tissues of infected individuals, but how do we demonstrate that it is the causative agent? Koch’s answer is to separate “the parasite from the diseased organism, and from all of the products of the disease which could be ascribed to a disease-inducing influence, and then introducing the isolated parasite into healthy organisms and induce the disease anew with all its characteristic symptoms and properties.”

Koch’s anthrax example is particularly clear. Anthrax bacteria will grow quite happily outside of a host if provided with nutrients, but the host’s blood cells will not, and the non-cellular components of the blood do not reproduce at all. If we take some blood from an infected animal and spread it over a nutrient-rich medium (a slice of potato, say, boiled to make it sterile), then the anthrax bacteria will grow and the other components will not. If some of the resulting bacteria are spread on a new potato, and this is repeated again and again, then eventually the amount of the animal’s blood carried along becomes negligible.
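
To put rough numbers on “negligible,” here is a small sketch in Python. It assumes, purely for illustration, that each transfer to a new potato carries over a fixed fraction of the material on the old one, and that nothing from the blood regrows; the function names and the 1% figure are hypothetical, not taken from Koch.

```python
import math


def carryover_fraction(transfer_fraction, passages):
    """Fraction of the original blood components left after repeated transfers.

    Assumes each passage carries over the same fixed fraction of material
    and that the blood components never reproduce.
    """
    return transfer_fraction ** passages


def passages_needed(transfer_fraction, threshold):
    """Smallest number of passages that brings carry-over to or below threshold."""
    return math.ceil(math.log(threshold) / math.log(transfer_fraction))


if __name__ == "__main__":
    # Hypothetical numbers: 1% of the material is moved at each passage.
    print(carryover_fraction(0.01, 5))
    print(passages_needed(0.01, 1e-9))
```

The bacteria regrow to full density on every potato, while the blood is diluted geometrically, so a handful of passages is enough to leave essentially nothing of the original animal behind.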

What if there are other bacteria which come along as well? The easiest way to rule this out is to examine what is growing on the potato under a microscope. If all you see are the “rod shaped…structures,” then you have some confidence that there are no other bacteria. Further, if you spread the bacteria very thinly on the new potato, you can get their density low enough that each colony arises from only a few cells. If you have a second bacterial species, then you should get some colonies with only one species and some with only the other, as well as mixed ones. If all your colonies still produce the same disease in another animal, then you can have some confidence that there is no other bacterium.
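
The thin-spreading argument can be made quantitative with a toy calculation, sketched here in Python. It assumes each colony is founded by a fixed number of cells and that each founder independently belongs to the second species with a fixed probability; both the model and the function name colony_composition are illustrative assumptions.

```python
def colony_composition(founders, fraction_b):
    """Probabilities that a colony is pure species A, pure species B, or mixed.

    Assumes `founders` cells start each colony and each one is independently
    species B with probability `fraction_b`.
    """
    pure_a = (1.0 - fraction_b) ** founders
    pure_b = fraction_b ** founders
    mixed = 1.0 - pure_a - pure_b
    return pure_a, pure_b, mixed


if __name__ == "__main__":
    for founders in (1, 2, 5, 20):
        print(founders, colony_composition(founders, 0.5))
```

With two founders and an even mix, a quarter of the colonies are pure in each species and half are mixed. As the number of founders grows, pure colonies become vanishingly rare, which is why the spreading has to be thin.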

But what about a virus? With modern tools, you might separate the bacteria from any viruses by centrifuging them. The bacteria will settle at the bottom of the tube long before any viruses will. Then you can proceed with both separately. There may even be cases where you have to have both bacterium and virus in order to get the disease.

There are bacteria that won’t grow outside of a host. Mycobacterium leprae, a relative of Mycobacterium tuberculosis and the causative organism of leprosy, is the classic example. It is cultured in the lab in the footpads of mice or in armadillos, as it has lost many of the genes it would need to grow outside of a host. How do you show this connection? Gerhard Hansen, the Norwegian physician who first found the bacterium, had trouble convincing others of his findings for this very reason.

Viruses cause similar problems. A virus cannot reproduce without a host. Generally viruses are maintained in the lab by letting them prey on cultures of cancer cells, but this makes the isolation much more difficult. However, in the century since Koch, our knowledge of the makeup of mammalian cells has grown enormously. Today a biochemist can with some confidence remove everything from such a culture of viruses and cancer cells except the virus.

There is one great weakness to this logical structure: you must have an animal model for the disease, a cheap one which lets you infect many individuals. Anthrax will happily infect most mammals. Tuberculosis is perfectly capable of killing mice, guinea pigs, and rabbits as well as humans. But HIV will infect only humans and, with difficulty, chimpanzees. How do you demonstrate that the virus is the cause then?
