Bioinformatics - Biostatistics & Computational Biology Greece

Affymetrix Quality control
Written by Triantafillos Paparountas   

QUALITY CONTROL

Generating high-quality microarray data requires rigorous quality control at every step of the process: the experimental design of the study, the generation of samples, the extraction of RNA, the labeling of the probe, and the microarray hybridization. To minimize experimental variability, the same dedicated research assistants should perform all microarray studies, the same Affymetrix setup should be used for hybridization, washing, and scanning, and microarrays from the same production lot should be used in comparative studies. This tutorial uses R and the affyQCReport package to demonstrate the steps of quality control for Affymetrix chips.

 

RNA Quality Control

RNA is isolated using TRIzol® according to the manufacturer's protocol and purified by phenol/chloroform extraction or RNeasy® columns. The choice of protocol depends on the quantity of RNA available from the extraction, but the same protocol is used throughout an entire experiment.
RNA purity and yield are determined by optical density (OD) measurements at wavelengths of 260 and 280 nm. The OD 260/280 ratio should be close to 2.0; otherwise, the RNA is re-purified.
Further evaluation of the RNA quality is done using the Agilent Bioanalyzer and Lab-on-a-Chip (Figure 1). Electropherograms are created that detect degradation (Figure 2) and measure the ribosomal 5S, 18S, and 28S bands (Figure 3). Ideally, the ratio of 28S/18S bands should be close to 2, but we do accept samples that show clear 18S and 28S peaks. RNA samples with a visible degree of degradation are not further processed.

Array Hybridization Quality Control
A general visual inspection of the entire chip, after scanning, is performed. There should be no white speckling, holes, smudges, areas of saturation or uneven hybridization on the chip.
Internal and external spiked-in controls should maintain a 1:2 ratio between the 5’ and 3’ probe sets. The internal controls are GAPDH and beta-actin. The external controls are BioB, BioC, and CreX. The signal of the external controls should also increase in that order, with the last being the highest.
The measure of background noise (RawQ) should remain consistent across the experiment, meaning within ± 3 points of the median.
The scaling factor (SF) should remain consistent across the experiment. The scaling factor for each given experiment should be within a 2-3 fold range.
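The two consistency rules above can be sketched as simple checks. This is a minimal illustration in plain Python, not part of affyQCReport or the Affymetrix software; the function names, the chip names, and the example values are hypothetical, and the scale-factor rule is read here as "largest divided by smallest is at most 3".

```python
from statistics import median

def rawq_outliers(rawq, tolerance=3.0):
    """Return the arrays whose RawQ lies more than `tolerance` points from the median."""
    centre = median(rawq.values())
    return sorted(name for name, value in rawq.items()
                  if abs(value - centre) > tolerance)

def scale_factor_fold_range(scale_factors):
    """Return the ratio of the largest to the smallest scale factor in the experiment."""
    values = list(scale_factors.values())
    return max(values) / min(values)

# Hypothetical values for four chips of one experiment.
rawq = {"chipA": 2.1, "chipB": 2.4, "chipC": 2.2, "chipD": 6.0}
scale_factors = {"chipA": 0.9, "chipB": 1.1, "chipC": 1.4, "chipD": 3.6}

flagged = rawq_outliers(rawq)                  # chips with inconsistent background noise
fold = scale_factor_fold_range(scale_factors)  # spread of the scale factors
sf_ok = fold <= 3.0                            # assumed reading of the 2-3 fold rule
```

A chip that fails either check is a candidate for closer inspection rather than automatic rejection.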

Statistical Quality Control
Histogram and box plot analyses of the *.cel intensities are performed. Histograms are a good visualization tool for identifying saturation, which appears as an additional peak at the highest log intensity in the plot. With a PMT setting of 100%, probe intensities reached saturation more frequently (see Figure 4 for arrays displaying saturation). Saturated probes are excluded from further analysis. Box plots are another good visualization tool for examining the overall intensities of all probes across the array. The box spans the 25th to the 75th percentile of the intensity distribution, and the median, or 50th percentile, is drawn inside the box. The whiskers (lines extending from the box) describe the spread of the data. Arrays must be similar in range; otherwise they are discarded (see Figure 5 for an acceptable box plot).
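The quantities behind a box plot and a saturation check can be computed directly. The sketch below is plain Python and only illustrates the arithmetic; the function names are hypothetical, the quartile method ("inclusive") is one of several conventions, and the saturation ceiling of 65535 is an assumption for a 16-bit scanner, not a value taken from this tutorial.

```python
import math
from statistics import quantiles

def box_summary(intensities):
    """Five-number summary of log2 intensities, as drawn in a box plot."""
    logs = sorted(math.log2(v) for v in intensities)
    q1, med, q3 = quantiles(logs, n=4, method="inclusive")
    return {"min": logs[0], "q1": q1, "median": med, "q3": q3, "max": logs[-1]}

def saturated_fraction(intensities, ceiling):
    """Fraction of probes at or above the assumed saturation ceiling."""
    return sum(1 for v in intensities if v >= ceiling) / len(intensities)

# Hypothetical probe intensities for one array.
summary = box_summary([64, 128, 256, 512, 1024])
fraction = saturated_fraction([100, 65535, 65535, 200], ceiling=65535)
```

Comparing the `q1`, `median`, and `q3` values across arrays gives a numeric counterpart to judging by eye whether the boxes are similar in range.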

The Association of Biomolecular Resource Facilities (ABRF) conducted a multicenter study in 2002 to identify factors that contribute to variability in oligonucleotide microarray results. This retrospective study used data from 835 MG-U74A and HG-U95A Affymetrix arrays that had previously been generated in the microarray core facilities of the members of the Microarray Research Group (MARG) (Knudtson et al., Factors contributing to variability in DNA microarray results: the ABRF Microarray Research Group 2002 study. J. Biomol. Tech. 2002; 108).

The results of this study indicate that:
• Lab-to-lab variation was the greatest source of error in this Affymetrix study. This suggests that .CHP data generated by different institutions may not be easily compared without further normalization in comparative analyses.
• The observed variance in the signals for the exogenous control spikes suggests that the controls may not be an adequate tool for normalizing data for comparative analysis. It had previously been assumed that these values should be independent of sample and array type.
• Biological reproducibility should be demonstrated by repeating each experiment a minimum of 3 or 4 times with different extracts of the same type.
• Systematic, reproducible errors can be minimized by applying various algorithms, which improve the average reproducibility from ~77% to ~93%.
However, the caveat is that one should not try to rescue a poor hybridization result with mathematical manipulations!

 

Figure 1 QC report page 1. The first page simply lists the names of the arrays and assigns each an index number to be used in later plots. The names are taken from the data set with the sampleNames method of the affy package. These sample names and indexes are also listed on several other plots. An example is shown in Fig. 1. The plot is generated with the following command.
R> titlePage(Dilution)

 

Figure 2,3 QC report page 2. The second page consists of two plots. The first is a box plot of all PM intensities and the second shows kernel density estimates of these intensities. Both methods are defined in the affy package. These plots are useful for assessing the overall signal quality of the arrays. Any array with a low average intensity or a markedly different density shape is suspect. An example is shown in Fig. 3. The plot is generated with the following command.
R> signalDist(Dilution)
 

 

Figure 4 QC report page 3. The third page is the QC plot from the simpleaffy package. This plot shows the 3':5' ratio for spiked-in and control genes specific to the array type. Additionally, the percentage of present gene calls and the background levels are given. An example is shown in Fig. 4. The plot is described in detail in the document QC and Affymetrix data, included in the simpleaffy documentation. The following is an excerpt from that document describing the plot.
The figure is plotted from the bottom up, with the first chip at the base of the diagram and the last chip in the QCStats object at the top. If the standard steps for generating a QCStats object are followed, this corresponds to the order of your samples in the AffyBatch object. Dotted horizontal lines separate the plot into rows, one for each chip. Dotted vertical lines provide a scale from -3 to 3. Each row shows the % present, average background, scale factors and GAPDH / β-actin ratios for an individual chip.
 



• GAPDH 3':5' values are plotted as circles. According to Affymetrix they should be about 1. GAPDH values that are considered potential outliers (ratio > 1.25) are coloured red; otherwise they are blue.

• β-actin 3':5' ratios are plotted as triangles. Because this is a longer gene, the recommendation is for the 3':5' ratios to be below 3; values below 3 are coloured blue, those above, red.

• The blue stripe in the image represents the range where scale factors are within 3-fold of the mean for all chips. Scale factors are plotted as a line from the centre line of the image. A line to the left corresponds to a down-scaling, a line to the right to an up-scaling. If any scale factors fall outside this '3-fold region', they are all coloured red; otherwise they are blue.

• % present and average background are listed to the left of the figure.

The plot is generated with the following command.
R> plot(qc(Dilution))
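The colouring rules in the list above amount to a few threshold comparisons. The following is a minimal Python sketch of those rules as stated in the text, not the simpleaffy implementation: the function names are hypothetical, the ratios are assumed to be on the same scale as the quoted thresholds, the mean is taken as a plain arithmetic mean, and a value exactly on a threshold is treated as acceptable because the text does not say otherwise.

```python
GAPDH_THRESHOLD = 1.25   # ratio above this is a potential outlier (from the text)
ACTIN_THRESHOLD = 3.0    # beta-actin ratio should stay below this (from the text)

def ratio_colour(ratio, threshold):
    """Return 'red' for a ratio above the threshold, otherwise 'blue'."""
    return "red" if ratio > threshold else "blue"

def scale_factor_colour(scale_factors, fold=3.0):
    """Return 'blue' if every scale factor is within `fold` of the mean, else 'red'.

    As in the description of the plot, one failing chip turns all of them red.
    """
    mean_sf = sum(scale_factors) / len(scale_factors)
    within = all(mean_sf / fold <= sf <= mean_sf * fold for sf in scale_factors)
    return "blue" if within else "red"

# Hypothetical values for illustration.
gapdh_colour = ratio_colour(1.4, GAPDH_THRESHOLD)
actin_colour = ratio_colour(2.2, ACTIN_THRESHOLD)
sf_colour = scale_factor_colour([0.2, 1.0, 5.0])
```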



 

Figure 5,6 QC report page 4. The next two pages are generated by analyzing the positive and negative control elements on the outer edges of the Affymetrix arrays. For each array the intensities of all border elements are collected. Elements with an intensity greater than 1.2 times the mean for that group are assumed to be positive controls. Elements with a signal less than 0.8 times the mean are assumed to be negative controls. This method of separating positive and negative controls is used so that the exact arrangement of these elements does not need to be known. Elements falling between these cut-offs are not used in further calculations. The fourth page consists of box plots of the positive and negative elements. The means and standard deviations of the intensities should be comparable across arrays. Large variations in the positive controls can indicate non-uniform hybridization or gridding problems. Variations in the negative controls indicate background fluctuations. The plot (shown in Fig. 5) is generated with the following command.
R> borderQC1(Dilution)
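The separation of border elements into positive and negative controls can be illustrated with a short Python sketch. It follows the 1.2 and 0.8 cut-offs given above but is not the affyQCReport code; the function name and the intensities are hypothetical.

```python
from statistics import mean

def split_border_elements(intensities, high=1.2, low=0.8):
    """Split border intensities into assumed positive and negative controls.

    Elements above `high` times the mean count as positive controls and
    elements below `low` times the mean as negative controls; anything in
    between is dropped from further calculations.
    """
    centre = mean(intensities)
    positive = [x for x in intensities if x > high * centre]
    negative = [x for x in intensities if x < low * centre]
    return positive, negative

# Hypothetical border intensities for one array (mean is about 525.7).
border = [100, 110, 900, 950, 500, 120, 1000]
positive, negative = split_border_elements(border)
```

Here the element with intensity 500 falls between the two cut-offs and is ignored, which is the behaviour the text describes.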

 

Figure 7,8 QC report page 5. As a further test, the elements are separated according to the edge of the array on which they are located. The mean values for the left, right, top, and bottom elements are calculated for the positive and negative controls. Once the elements are separated into positive and negative controls, and further divided by the four locations, the center of intensity (COI) for the controls is calculated. If the hybridization is uniform across the array, the COI for the positive elements is located at the physical center of the array. Any spatial variation in the hybridization, such as that caused by a bubble being present during hybridization, moves the COI away from the center. Another cause of an off-center COI is a slight misalignment of the grid used to determine the cell intensities. The COI is plotted on a relative scale where the point (0,0) is the center and 1 and -1 represent the edges of the array. Some variation in the COI is expected, but an array with visible intensity variations stands out in these plots as an outlier. Any array whose COI has a coordinate with a magnitude greater than 0.5 is flagged by labeling the data point with the array index. A similar plot is made for the negative controls. This plot is a measure of the uniformity of the background across the array. Again, any array whose COI has a coordinate with a magnitude greater than 0.5 is flagged. An example is shown in Fig. 7,8. The plot is generated with the following command.
R> borderQC2(Dilution)
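The text does not give the formula for the center of intensity, so the Python sketch below uses one plausible formulation as an assumption: the normalized difference between the mean intensities of opposite edges. It has the properties described above (uniform edges give (0,0), and each coordinate lies between -1 and 1), but it should not be read as the exact calculation in affyQCReport. The function names and values are hypothetical.

```python
def centre_of_intensity(left, right, top, bottom):
    """Assumed COI: normalized difference of opposite edge means, each in [-1, 1]."""
    x = (right - left) / (right + left)
    y = (top - bottom) / (top + bottom)
    return x, y

def is_flagged(coi, limit=0.5):
    """Flag an array when either COI coordinate exceeds the limit in magnitude."""
    return any(abs(coordinate) > limit for coordinate in coi)

# Hypothetical edge means for the positive controls of two arrays.
uniform = centre_of_intensity(left=100.0, right=100.0, top=100.0, bottom=100.0)
skewed = centre_of_intensity(left=50.0, right=250.0, top=100.0, bottom=100.0)
```

In the second array the right edge is much brighter than the left, so the x coordinate moves toward the right edge and the array would be labeled in the plot.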
 

 

QC report page 6. The sixth page is a heat map of the array-array Spearman rank correlation coefficients. The arrays are ordered using the phenotypic data (if available) in order to place arrays with similar samples adjacent to each other. Self-self correlations are on the diagonal and by definition have a correlation coefficient of 1.0. Data from similar tissues or treatments tend to have higher coefficients. This plot is useful for detecting outliers, failed hybridizations, or mistracked samples. See Fig. 6 for an example. Caution must be used in deciding whether an array should be discarded, because differences in the expression patterns might be due to interesting biology rather than a processing error. The plot is generated with the following command.
R> correlationPlot(Dilution)
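The numbers behind such a heat map can be sketched in a few lines. The Python below computes the Spearman coefficient as the Pearson correlation of average ranks and builds an array-by-array matrix; it is a standalone illustration with hypothetical names and data, not the correlationPlot implementation, and it does not handle an array whose intensities are all identical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(arrays):
    """Return a nested dict of pairwise Spearman coefficients between arrays."""
    names = list(arrays)
    return {a: {b: spearman(arrays[a], arrays[b]) for b in names} for a in names}

# Hypothetical intensities for three arrays over the same five probes.
arrays = {
    "liver_1": [10, 200, 35, 80, 500],
    "liver_2": [12, 180, 40, 95, 450],
    "failed": [500, 10, 200, 35, 80],
}
matrix = correlation_matrix(arrays)
```

The two liver arrays rank the probes identically and correlate perfectly, while the third array stands out with a much lower coefficient against both.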


This tutorial is based on the affyQCReport manual by Craig Parman and Conrad Halling. affyQCReport is part of the R Bioconductor project.




