CR, 02/03

ChipCheck - A program calculating total hybridization equilibria for small oligonucleotide microarrays

- An Introduction -

1. DNA Oligonucleotide Microarrays
DNA microarrays or 'DNA chips' are arrays of oligonucleotides immobilized on surfaces.  Figure 1 shows schematically what a DNA chip is and how it functions.  Spots bearing oligonucleotide strands of defined sequence (one sequence per spot) are arrayed on a flat surface.  Upon incubation with a sample to be analyzed, target DNA strands in solution bind to complementary strands on the surface via Watson-Crick base pairing.  Usually, the target DNA is labeled with a fluorophore.  After the incubation, excess unbound target DNA is washed away, and the chip is scanned for fluorescence at each spot.

Figure 1. A DNA chip displayed at three levels of magnification

Currently, the most important application for DNA chips is gene expression analysis or gene expression profiling.  For this, total mRNA is isolated from a tissue or cell culture of interest, reverse transcribed into cDNA, and PCR amplified with concurrent fluorophore labeling, yielding the sample to be analyzed.

2. ChipCheck Calculations
A hybridization experiment with a DNA chip is a rather complex exercise in molecular recognition.  Every strand of immobilized DNA has to bind selectively to its target DNA in a sea of competitors that are also DNA and differ only somewhat in sequence.  Necessarily, the binding (hybridization) is not 100% sequence selective.  Strands that are not fully Watson-Crick complementary, but contain complementary stretches, can still form duplexes, leading to false positive signals.  Conversely, some duplexes may not form despite full Watson-Crick complementarity, because the G/C content of the sequence is too low to give a stable duplex under the conditions chosen, leading to false negative signals.
Figure 2 shows some of the terms used in the context of this work in graphical form, including the formation of mismatched duplexes leading to false positive signals.  The terms "probe", for the strands immobilized on the surface, and "target", for the strands that are to be detected, are important and are therefore highlighted here explicitly.

Figure 2. Schematic display of hybridization taking place on a DNA chip
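The notions of full Watson-Crick complementarity, mismatches, and G/C content can be made concrete with a few lines of Python.  This is only an illustrative sketch: the function names are hypothetical and not part of ChipCheck, sequences are assumed to be written 5' to 3' using only the letters A, C, G and T, and mismatches are counted for equal-length strands aligned end to end, without bulges or overhangs.

```python
# Illustrative helpers; names are hypothetical and not taken from ChipCheck.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the fully Watson-Crick complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mismatch_count(probe: str, target: str) -> int:
    """Count mismatched positions between a probe and an equal-length target.

    Assumes a simple end-to-end antiparallel alignment (no bulges, no overhangs).
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    perfect_target = reverse_complement(probe)
    return sum(a != b for a, b in zip(perfect_target, target.upper()))
```

A target with zero mismatches is the fully complementary one; a target with one or two mismatches may still bind and contribute to a false positive signal, and a probe with a low G/C fraction may give a weak signal even with its perfect target.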

When the fluorescence signals from several strands that are only partially complementary to an immobilized probe add up, a significant signal can result.  These low-level false positive signals, which are invariably found in DNA chip experiments with complex samples, are usually treated as "noise", making it impossible to detect small quantities of the true analyte.  Sometimes, however, the expression of genes encoding proteins with low copy numbers in the cell is critical, and the loss of this signal can be a very serious setback for chip experiments.

ChipCheck calculates how much binding one can expect for a given sample on a DNA chip, taking into account the affinity of every target sequence for every probe sequence.  For this, the binding constant for each pair of immobilized probe strand and target strand has to be known.  With these binding constants, one can then calculate where the target DNA introduced with the sample ends up.  Figure 3 shows a cartoon that is meant to highlight the computational task: calculating what fraction of each target sequence binds to which probe.

Figure 3. Cartoon of binding equilibria

ChipCheck arrives at its results iteratively: it starts with an arbitrary distribution of target strands, calculates a score for it, then changes the distribution, recalculates the score, and so on, until it has found the best solution that satisfies all binding constants.  In fact, the process is a little more complex, as outlined in the accompanying manuscript, but the general idea is about right.
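To give a feeling for what an iterative solution of coupled binding equilibria looks like, here is a minimal Python sketch.  It is not the ChipCheck algorithm (which works with a score, as described in the manuscript); it is a plain fixed-point iteration on the mass-balance equations, swapped in for illustration.  It assumes that every probe/target pair forms a simple 1:1 duplex with association constant K[i][j] (in 1/M), and that the probes on each spot can be described by an effective total concentration in the hybridization volume.  The function name and its parameters are hypothetical.

```python
def solve_equilibrium(K, probe_totals, target_totals, tol=1e-12, max_iter=100_000):
    """Return duplex concentrations c[i][j] for probe i and target j at equilibrium.

    K[i][j]          association constant of probe i with target j (1/M)
    probe_totals[i]  effective total concentration of probe i (M), an assumption
    target_totals[j] total concentration of target j (M)
    """
    probe_totals = list(probe_totals)
    target_totals = list(target_totals)
    n_probes, n_targets = len(probe_totals), len(target_totals)

    # Start from the distribution in which nothing is bound yet.
    free_p = list(probe_totals)
    free_t = list(target_totals)

    for _ in range(max_iter):
        # Mass balance for probe i: total = free * (1 + sum_j K_ij * free_target_j)
        new_p = [
            probe_totals[i]
            / (1.0 + sum(K[i][j] * free_t[j] for j in range(n_targets)))
            for i in range(n_probes)
        ]
        # Mass balance for target j, using the freshly updated free probes.
        new_t = [
            target_totals[j]
            / (1.0 + sum(K[i][j] * new_p[i] for i in range(n_probes)))
            for j in range(n_targets)
        ]
        # Largest change of any free concentration, relative to its total.
        change = max(
            (
                abs(new - old) / total
                for new, old, total in zip(
                    new_p + new_t, free_p + free_t, probe_totals + target_totals
                )
                if total > 0
            ),
            default=0.0,
        )
        free_p, free_t = new_p, new_t
        if change < tol:
            break

    # Law of mass action gives the duplex concentration for every pair.
    return [
        [K[i][j] * free_p[i] * free_t[j] for j in range(n_targets)]
        for i in range(n_probes)
    ]
```

For example, `solve_equilibrium([[1e8], [1e5]], [1e-9, 1e-9], [1e-9])` describes one target competing for a well-matched and a poorly matched probe; the second row of the result is the cross-hybridization that would show up as a false positive signal.  Very large binding constants make this simple iteration converge slowly, which is one reason a real program may prefer a more elaborate scheme.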

3. What needs to be entered to run ChipCheck calculations
Thanks to the brilliant work of a number of other research groups, it is now possible to calculate the binding constant for the formation of a duplex between two oligonucleotide strands with good accuracy, even if the two strands are not fully Watson-Crick complementary.  To the best of our knowledge, the research group of Prof. SantaLucia, building on earlier work by Profs. Breslauer and Turner and many others, has compiled the most detailed database of parameters that allow the calculation of these binding constants.  To run a simulation of a hybridization experiment involving a DNA chip with ChipCheck, one needs the following information:

  1. the Temperature at which the hybridization is to be simulated
  2. the Volume of the hybridization solution
  3. the Sequences of all Target Strands and their Concentration
  4. the Sequences of the Probe Strands and the Number of probe molecules on the surface of the respective spot
  5. the Standard Free Energy of Binding (ΔG°) for all target/probe combinations, from which the binding constants can be calculated.

While items 1-4 will be readily accessible to anyone who wants to run a simulation, the data for item 5 have to be obtained from suitable programs, such as HyTher.  Melting is an alternative program that can be used free of charge via the web, but we have not tested it extensively.  To obtain the ΔG° values, please follow the links provided on the Data Entry Page.  Currently, obtaining the data for the full matrix of probe/target interactions is time consuming.  Semi-automatic modes of extracting the data have been successful for us, but we strongly recommend that you contact the owners of the respective pages if you plan to make extensive use of their services.  We plan to equip future versions of ChipCheck with a module that calculates the binding constants.
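The step from a ΔG° value to a binding constant uses the standard thermodynamic relation K = exp(-ΔG°/RT).  The short Python sketch below shows this conversion, together with one possible way of putting the number of probe molecules on a spot on the same scale as a solution concentration.  The function names are hypothetical and not part of ChipCheck.  The sketch assumes that ΔG° is given in kcal/mol and is already valid for the simulation temperature, and the "effective concentration" of surface-bound probes is a simplifying assumption for illustration only.

```python
import math

GAS_CONSTANT_KCAL = 1.98720425864083e-3  # R in kcal/(mol*K)
AVOGADRO = 6.02214076e23                 # molecules per mole


def binding_constant(delta_g_kcal_per_mol: float, temperature_celsius: float) -> float:
    """Return the association constant K (1/M) from K = exp(-dG / (R*T)).

    Assumes delta_g_kcal_per_mol refers to the given temperature.
    """
    temperature_kelvin = temperature_celsius + 273.15
    return math.exp(-delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_kelvin))


def effective_probe_concentration(n_molecules: float, volume_liters: float) -> float:
    """Return probe molecules per spot as a molar concentration in the given volume.

    Treating immobilized probes like dissolved species is an assumption.
    """
    return n_molecules / (AVOGADRO * volume_liters)
```

As a rough orientation, a ΔG° of -10 kcal/mol at 37 °C corresponds to a binding constant of about 1.1 x 10^7 per molar, and every additional -1.4 kcal/mol or so raises K by roughly a factor of ten.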

4. Limitations
Needless to say, the current simulations include many approximations and will never simulate a true experiment with 100% fidelity.  The calculated binding constants are only approximations based on parameters obtained from other sequences.  The true binding constants will also depend on the density of the probes on the surface of the chip (electrostatic repulsion), the integrity of the probes, the mode of immobilizing the strands, etc.  Further, and perhaps most importantly, we have not run a simulation on a high-density chip with thousands of probes, so we do not know how the algorithm will perform under these conditions, nor whether generating hundreds of thousands of binding constants via the web is feasible within a reasonable amount of time.  One should also remember that target-to-target hybridization and folding are currently ignored, and that unspecific adsorption on glass surfaces and other sources of noise in a real-life experiment cannot be simulated.  Still, we believe that the relative extent of duplex formation on the spots of a chip should be predictable with reasonable accuracy if non-hybridization-dependent errors are insignificant.

5. Performance checks and more: What you may gain
We believe that ChipCheck and other programs that calculate total hybridization equilibria will provide important information.  If one knows what 'noise' and what 'true signal' one can expect, one may:

-  design more accurate DNA chips where false positives/negatives are reduced, and
-  achieve higher sensitivity with large arrays.

If you want to analyze the results of a real-life hybridization experiment in which you do not know the true concentrations of all target strands, you cannot do this with the current form of ChipCheck.  However, entering the target concentrations obtained from your experiment, together with the other information outlined above under "3.", will give you a first impression of cross-hybridization.  Its calculated extent is, of course, not fully accurate, since the concentrations obtained are themselves tainted by false positive/negative contributions.

Future versions of ChipCheck may contain a full analysis module, where intensities of the fluorescence signals may be entered and concentrations obtained as results.  The submitted manuscript on ChipCheck outlines the mathematical feasibility of such calculations.
Even in its current form, ChipCheck gives you a good glimpse of how your chip will respond to a given sample, and will hopefully allow you to avoid unnecessary experiments and alert you to the level of fidelity you can expect for your hybridizations.
