Archive for February 2007

Intrinsic and Extrinsic Noise in Gene Expression

The question of noise in gene expression has become increasingly important recently, both as an interesting question in its own right and as a necessary piece of information for constructing random processes to describe other aspects of biology.

There is a problem with measuring noise, however: how much of it is actually noise associated with the components of a pathway operating in a thermal bath, and how much is cell-to-cell variation, the local density of a necessary enzyme, and other things that are extraneous to the pathway?

Enter two papers: Intrinsic and extrinsic contributions to stochasticity in gene expression by Swain, Elowitz, and Siggia and Stochastic gene expression in a single cell by Elowitz, Levine, Siggia, and Swain. The first is theoretical, the second a test of the theory in E. coli.

Take an ensemble of cells whose expression X of some gene we can monitor (say, with a fluorescent protein on the same promoter). We define the noise in X as

\eta^2(t)=\frac{\mbox{var}(X)}{E[X]}

We use the variance instead of the standard deviation because in most cases we are talking about small numbers of things, and this form gives us a good reference point: if X is Poisson, \eta^2 = 1.
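As a quick sanity check on that reference point, here is a minimal sketch that computes \eta^2 exactly from a probability table. The helper names `poisson_pmf` and `noise` are ours, not from either paper, and the Poisson support is truncated at 100, which is an assumption that only holds for modest means.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def noise(pmf):
    """eta^2 = var(X) / E[X] for a distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return var / mean

# Truncate the Poisson support at 100; the neglected tail is negligible for these means.
for lam in (0.5, 5.0, 20.0):
    pmf = {k: poisson_pmf(k, lam) for k in range(101)}
    print(lam, round(noise(pmf), 6))

# A hypothetical all-or-nothing population with the same mean as the lam = 5 case
# is far noisier than Poisson.
print(noise({0: 0.5, 10: 0.5}))
```

Each Poisson case comes out at 1 regardless of the mean, while the two-state population has \eta^2 = 5. That is why 1 makes a convenient baseline.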

We want to decompose this into something measuring the stochastic details of our process of interest and something measuring the variation due to using this or that particular cell. We do this with a well-known technique from the theory of linear models: Let \mathcal{I} and \mathcal{E} be the intrinsic and extrinsic variables respectively, which together entirely determine the state of X. Then (bringing the mean over to the left hand side so we don’t have to carry it through)

\eta^2(t)\cdot E[X] = \mbox{var}(X) = E[X^2] - E[X]^2
  = E[E[X^2|\mathcal{E}]] - E[E[X|\mathcal{E}]^2] + E[E[X|\mathcal{E}]^2] - E[X]^2
  = E[\mbox{var}(X|\mathcal{E})] + E[E[X|\mathcal{E}]^2] - E[E[X|\mathcal{E}]]^2
  = E[\mbox{var}(X|\mathcal{E})] + E[E[X|\mathcal{E}]^2 - E[E[X|\mathcal{E}]]^2]
  = E[\mbox{var}(X|\mathcal{E})] + E[\mbox{var}(E[X|\mathcal{E}])]
  = E[\mbox{var}(X|\mathcal{E})] + \mbox{var}(E[X|\mathcal{E}])

The first term we can take as a measure of the intrinsic noise: \mbox{var}(X|\mathcal{E}) is the variance due to the intrinsic variables, and its expectation is just the average over the ensemble of different cells (and their corresponding different extrinsic conditions).

The second term is a measure of the extrinsic noise. E[X|\mathcal{E}] is the average expression for a given set of extrinsic conditions (i.e., in one cell), and we take its variance across many cells, and this is purely the variation caused by changes in the extrinsic conditions.
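The decomposition is easy to verify by hand on a small example. The sketch below uses a made-up toy model (two equally likely extrinsic states, each with its own two-point distribution of X; none of these numbers come from the papers) and exact rational arithmetic, so the identity holds with no rounding.

```python
from fractions import Fraction as F

# Hypothetical toy model: the extrinsic state is a cell "type", and each
# type has its own distribution of expression levels X.
p_ext = {"low": F(1, 2), "high": F(1, 2)}
p_x_given_ext = {
    "low": {0: F(1, 2), 2: F(1, 2)},
    "high": {4: F(1, 4), 8: F(3, 4)},
}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

# Marginal distribution of X, obtained by summing over extrinsic states.
marginal = {}
for e, pe in p_ext.items():
    for x, px in p_x_given_ext[e].items():
        marginal[x] = marginal.get(x, 0) + pe * px

total = var(marginal)

# E[var(X|E)]: the conditional variance averaged over extrinsic states.
intrinsic = sum(pe * var(p_x_given_ext[e]) for e, pe in p_ext.items())

# var(E[X|E]): the variance of the conditional mean across extrinsic states.
cond_means = {e: mean(p_x_given_ext[e]) for e in p_ext}
grand_mean = sum(p_ext[e] * cond_means[e] for e in p_ext)
extrinsic = sum(p_ext[e] * (cond_means[e] - grand_mean) ** 2 for e in p_ext)

assert total == intrinsic + extrinsic
print(total, intrinsic, extrinsic)  # prints: 11 2 9
```

Here most of the variance (9 of 11) is extrinsic: the two cell types have very different means, while the spread within each type is small.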

The term we added and subtracted in our computation, E[E[X|\mathcal{E}]^2], can be measured experimentally: put two different genes which we can assay on the same promoter in a given cell. They will have the same extrinsic conditions and different intrinsic variables (since they will be in different parts of the heat bath). Let X_1 and X_2 be the expression levels of the two. Assuming the two are identically distributed and independent of each other once the extrinsic conditions are fixed, we have

E[E[X|\mathcal{E}]^2] = \int dP(\mathcal{E}) (\int x dP(X=x|\mathcal{E}))^2
  = \int dP(\mathcal{E}) \int x_1 dP(X_1 = x_1|\mathcal{E}) \int x_2 dP(X_2 = x_2 |\mathcal{E})
  = E[E[X_1|\mathcal{E}] E[X_2|\mathcal{E}] ]

It is important to remember what is actually measurable here. We can measure mean levels of expression in individual cells. Thus \eta^2 is measurable, and we are trying to work backwards to the two noise terms. Consider the observable E[ E[X_1 - X_2 | \mathcal{E}]^2 ].

E[ E[X_1 - X_2|\mathcal{E}]^2 ] = E[ E[X_1|\mathcal{E}]^2] + E[E[X_2 | \mathcal{E}]^2] - 2E[E[X_1|\mathcal{E}]E[X_2|\mathcal{E}]]
  = E[E[X_1|\mathcal{E}]^2] + E[E[X_2|\mathcal{E}]^2] - 2E[E[X|\mathcal{E}]^2]

The first two terms are both direct observables as well, so we can find the last term: E[E[X|\mathcal{E}]^2] = \frac{1}{2}\left( E[E[X_1|\mathcal{E}]^2] + E[E[X_2|\mathcal{E}]^2] - E[E[X_1 - X_2|\mathcal{E}]^2] \right). Note that the other term in our extrinsic noise is an observable as well: in its original form, it is E[X]^2. We have the whole extrinsic noise now, and the total noise, and their difference is the intrinsic noise. Voilà!
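To see the two-reporter idea at work on simulated data, here is a rough sketch. Everything about the model is an assumption made for illustration: the extrinsic state is a single gamma-distributed expression rate per cell, and the two reporters are Poisson counts drawn independently given that rate. Instead of per-cell conditional means, the estimator uses one snapshot (x_1, x_2) per cell and relies on E[X_1 X_2] = E[E[X_1|\mathcal{E}] E[X_2|\mathcal{E}]], which holds when the reporters are independent given \mathcal{E}. The function names are ours.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) count (Knuth's method; adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_cells(n_cells, seed=0):
    """Toy model: each cell draws an extrinsic rate, then two reporter
    counts are drawn independently given that rate."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        rate = rng.gammavariate(4.0, 2.5)  # extrinsic state: mean 10, variance 25
        cells.append((poisson(rng, rate), poisson(rng, rate)))
    return cells

def decompose(cells):
    """Return (total, intrinsic, extrinsic) variance estimates from paired counts."""
    n = len(cells)
    m1 = sum(a for a, _ in cells) / n
    m2 = sum(b for _, b in cells) / n
    # Second moment pooled over both reporters, minus the product of the means.
    total = sum(a * a + b * b for a, b in cells) / (2 * n) - m1 * m2
    # Covariance between reporters: shared (extrinsic) fluctuations only.
    extrinsic = sum(a * b for a, b in cells) / n - m1 * m2
    # Half the mean squared difference: what the reporters do not share.
    intrinsic = sum((a - b) ** 2 for a, b in cells) / (2 * n)
    return total, intrinsic, extrinsic

total, intrinsic, extrinsic = decompose(simulate_cells(20000))
print(total, intrinsic, extrinsic)
```

In this toy model the true values are E[\mbox{var}(X|\mathcal{E})] = 10 and \mbox{var}(E[X|\mathcal{E}]) = 25, so the estimates should land near those, and by construction the three estimates satisfy total = intrinsic + extrinsic exactly. Divide each by the mean to get back to \eta^2 as defined above.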

Koch’s postulates II

(This is a continuation of Koch’s Postulates I)

Three serious issues remain with the Koch program for establishing that a given organism is the etiological agent of a disease: the accuracy of animal models, how to proceed in the absence of animal models, and how to define “etiologic agent” in more complex cases.

HIV has two animal models: mice, in which the virus does not cause AIDS, and chimpanzees, which can only be infected by the less virulent of the two major strains, and which are simply impractical for large studies. It can infect cells in tissue culture, but these cells are often disrupted by the culturing process, and even when they are not, the larger physiology is lost.

Actually, HIV has fulfilled Koch’s postulates: laboratory workers who have accidentally been infected with various virus cultures develop precisely the same disease as that seen after normal transmission (see here). But this is not data you can intentionally collect. That was firmly established by the Nuremberg Code and the aftermath of the Tuskegee experiment.

Animal models go further afield. Some groups are studying broad spectrum pathogens in C. elegans, a tiny worm. Others are studying the same pathogens in systems closer to humans. What if the mouse model disagrees with the C. elegans model? There is no general solution to how to select what is relevant from a given animal model. For instance, tuberculosis in mice causes macrophages to release strong bursts of nitric oxide. Some investigators find that human macrophages infected in a culture dish don’t release these same bursts. Is this an important difference or an artifact of culturing?

But what if there is truly no animal model available? Generally we fall back on a phantom “fifth postulate”: treat the disease in patients as if it were caused by the proposed organism. If the treatment works, then perhaps you have found the right organism.

This is medically sound, but scientifically inconclusive. Consider a bacterium which has become symbiotic with a virus. It carries the virus’s genome in its own, and when it infects a host, expresses that genome to create virus particles which it secretes into the host. These virus particles infect and destroy cells of the immune system which would control the bacterium.

When you study the disease, you see the immune system being destroyed, and search for what is doing it. You find the virus, and devise antivirals against it. When you treat the patient, it controls the disease. The fifth postulate would say that the virus was the etiological agent of the disease, even though the bacterium is what brought the virus in and what the weakened immune system fails to control.

The fifth postulate can be used to disprove connections, however. The bacterium and virus symbiosis I describe above bears a vague resemblance to one disproved idea about HIV, but we can show that HIV is not like this: an AIDS patient will receive large quantities of antibiotics to fight off other diseases. Tuberculosis alone, one of the major killers, requires administration of four powerful antibiotics at once. In short, HIV and any possible symbiont are exposed to every class of antibiotics known. Any bacterial symbiont should have been handily slaughtered by at least a subset of these.

But why can’t this show that HIV is the cause? What if there were another virus affected by the same antiretrovirals that we for some reason just weren’t noticing? There is no way to control for this short of isolating HIV and showing that it, and it alone, is sufficient to cause disease: in short, Koch’s program.

These same difficulties show up more strongly in what are termed microbiota shift diseases, diseases resulting from a change in the composition of the body’s native microbes. The standard example is bacterial vaginosis. There is an incredibly strong correlation between a certain shift in the bacterial population of the human vagina (from mostly Gram-positive species to mostly Gram-negative species) and both premature births and possible minor symptoms like vaginal discharge and a bad odor. If you treat with antibiotics that target Gram-negative bacteria, you can restore the original balance and remove the symptoms. But how do you even go about defining “etiological agent” in a complex of many species?

Physics “proofs”

Fortunately, I had just finished a good dinner before I read this. Otherwise I would have gone through the roof:

His response chilled me: “maybe it’s time mathematics started accepting string theory proofs as valid.”

This is from the mouth of a mathematics grad student who did his undergraduate degree in physics. I might make various kind arguments. I could make some statement about forging ahead and filling in the gaps if the way proved useful.

I won’t.

Instead, I will deny the student the title of mathematician, for he is ignorant of his heritage. Mathematics underwent a great formalization at the end of the 19th century and the beginning of the 20th. The great framework erected during the 18th and early 19th centuries on “physics proofs” (for that is the origin of that standard of proof) kept falling to pieces, and the mathematics community undertook to fix the structural deficiencies.

The problems turned out to be huge, and yielded whole fields of mathematics. Cantor’s set paradoxes led down a path that gave us Lebesgue integration (which lets us define integrals on any measure space), Hilbert’s program, and Gödel’s theorems.

Fractal geometry sprang from the same abyss and provides the operating principle in place of analytic functions when systems become scale invariant — which happens surprisingly often.

Physics proofs consist of empirical techniques, a set of intuitive shortcuts that seem to work in a number of common cases. The history of mathematics provides us with one overwhelming empirical principle governing physics proofs: don’t layer them more than one or two deep or you’ll see oceans as infinitesimal drops, and you may well drown.