Figure: The steps required in a microarray experiment
Uses and types
Many types of arrays exist, and the broadest distinction is whether they are spatially arranged on a surface or on coded beads:
• The traditional solid-phase array is a collection of orderly microscopic “spots”, called features, each with thousands of identical and specific probes attached to a solid surface, such as a glass, plastic or silicon biochip (commonly known as a genome chip, DNA chip or gene array).
One technique used to produce oligonucleotide arrays is photolithographic synthesis (Affymetrix) on a silica substrate, where light and light-sensitive masking agents are used to “build” a sequence one nucleotide at a time across the entire array.
 Other methods permit analysis of data consisting of a low number of biological or technical replicates; for example, the Local Pooled Error (LPE) test pools standard
deviations of genes with similar expression levels in an effort to compensate for insufficient replication.
• Hypothesis-driven statistical analysis: Statistically significant changes in gene expression are commonly identified using the t-test, ANOVA, Bayesian methods, or Mann–Whitney test methods tailored to microarray data sets, which take into account multiple comparisons or cluster analysis.
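Accounting for multiple comparisons can be sketched in a few lines. The text does not name a specific correction, so the Benjamini–Hochberg false discovery rate procedure is used here only as one common choice. The p-values and the 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from a t-test run on each gene.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
# Genes whose adjusted p-value passes an assumed 5% false discovery rate.
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```

In this toy input the third and fourth genes have raw p-values below 0.05 but do not pass after adjustment. With thousands of genes on an array, the gap between raw and adjusted calls is usually much larger.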
Although absolute levels of gene expression may be determined in the two-color array in rare instances, comparing the relative differences in expression among different spots within a sample and between samples is the preferred method of data analysis for the two-color system.
Microarrays and bioinformatics
The advent of inexpensive microarray experiments created several specific bioinformatics challenges: the multiple levels of replication in experimental design (Experimental design); the number of platforms and independent groups and data format (Standardization); the statistical treatment of the data (Data analysis); mapping each probe to the mRNA transcript that it measures (Annotation); and the sheer volume of data and the ability to share it (Data warehousing).
In single-channel microarrays or one-color microarrays, the arrays provide intensity data for each probe or probe set indicating a relative level of hybridization with the
Suppose several samples need to be compared: the number of experiments required using two-channel arrays then quickly becomes unfeasible, unless one sample is used as a reference.
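The growth in array count can be made concrete with a short calculation. This sketch assumes that each two-channel array compares exactly two samples, that a direct design hybridizes every pair once, and that a reference design pairs each remaining sample with one chosen reference sample. It ignores dye swaps and replicates; the function names are illustrative.

```python
def arrays_all_pairs(n_samples):
    """Arrays needed to hybridize every pair of samples once: n(n-1)/2."""
    return n_samples * (n_samples - 1) // 2


def arrays_with_reference(n_samples):
    """Arrays needed when one of the samples serves as a common reference."""
    return n_samples - 1


for n in (4, 10, 50):
    print(n, arrays_all_pairs(n), arrays_with_reference(n))
```

Under these assumptions the direct design grows quadratically with the number of samples, while the reference design grows linearly.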
One strength of the single-dye system lies in the fact that an aberrant sample cannot affect the raw data derived from other samples, because each array chip is exposed to
only one sample (as opposed to a two-color system in which a single low-quality sample may drastically impinge on overall data precision even if the other sample was of high quality).
These can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA (also called anti-sense RNA) sample (called target) under high-stringency
 • Dimensional reduction: Analysts often reduce the number of dimensions (genes) prior to data analysis.
Various grass-roots open-source projects are trying to ease the exchange and analysis of data produced with non-proprietary chips. For example, the “Minimum Information About a Microarray Experiment” (MIAME) checklist helps define the level of detail that should exist and is being adopted by many journals as a requirement for the submission of papers incorporating microarray results.
These methods assess statistical power based on the variation present in the data and the number of experimental replicates, and can help minimize type I and type II errors in the analyses.
• Network-based methods: Statistical methods that take the underlying structure of gene networks into account, representing either associative or causative interactions or
dependencies among gene products.
Examples of unsupervised analysis methods include self-organizing maps, neural gas, k-means cluster analysis, hierarchical cluster analysis, genomic signal processing-based clustering and model-based cluster analysis.
see MA plot), and log-transformation of ratios, global or local normalization of intensity ratios, and segmentation into different copy number regions using step detection
 This type of approach is not hypothesis-driven, but rather is based on iterative pattern recognition or statistical learning methods to find an “optimal” number of clusters
in the data.
Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome.
Sequences may be longer (60-mer probes such as the Agilent design) or shorter (25-mer probes produced by Affymetrix) depending on the desired purpose; longer probes are more specific to individual target genes, whereas shorter probes may be spotted in higher density across the array and are cheaper to manufacture.
It is also used for the identification of structural variations and the measurement of gene expression.
The resulting “grid” of probes represents the nucleic acid profiles of the prepared probes and is ready to receive complementary cDNA or cRNA “targets” derived from experimental
or clinical samples.
Input data for class prediction are usually based on filtered lists of genes which are predictive of class, determined using classical hypothesis tests (next section), Gini
diversity index, or information gain (entropy).
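As a rough illustration of how the Gini diversity index and information gain can rank genes for class prediction, the sketch below scores a single expression threshold per gene. The sample labels, expression values and thresholds are hypothetical, and a real filter would search over thresholds rather than fix one.

```python
import math
from collections import Counter


def gini(labels):
    """Gini diversity index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_scores(expression, labels, threshold):
    """Impurity decrease when samples are split at an expression threshold."""
    low = [y for x, y in zip(expression, labels) if x <= threshold]
    high = [y for x, y in zip(expression, labels) if x > threshold]
    n = len(labels)

    def weighted(impurity):
        # Size-weighted impurity of the non-empty parts of the split.
        return sum(len(part) / n * impurity(part) for part in (low, high) if part)

    return {
        "gini_decrease": gini(labels) - weighted(gini),
        "information_gain": entropy(labels) - weighted(entropy),
    }


# Hypothetical data: six arrays, two classes, two genes.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
gene_a = [8.1, 7.9, 8.4, 3.2, 2.9, 3.5]  # separates the classes cleanly
gene_b = [5.0, 3.1, 4.9, 5.1, 3.0, 4.8]  # carries no class information
score_a = split_scores(gene_a, labels, threshold=5.5)
score_b = split_scores(gene_b, labels, threshold=4.0)
print(score_a, score_b)
```

A gene like `gene_a` would be kept in a filtered list because splitting on it removes all class impurity, whereas `gene_b` would be dropped.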
 This is an example of a DNA microarray experiment which includes details for a particular case to better explain DNA microarray experiments, while listing modifications
for RNA or other alternative experiments.
Another benefit is that data are more easily compared to arrays from different experiments as long as batch effects have been accounted for.
Data analysis
Figure: National Center for Toxicological Research scientist reviews microarray data
Microarray data sets are commonly very large, and analytical precision is influenced by a number of variables.
Oligonucleotide arrays are produced by printing short oligonucleotide sequences designed to represent a single gene or family of gene splice-variants by synthesizing this
sequence directly onto the array surface instead of depositing intact sequences.
 • Class discovery analysis: This analytic approach, sometimes called unsupervised classification or knowledge discovery, tries to identify whether microarrays (objects,
patients, mice, etc.)
During knowledge discovery analysis, various unsupervised classification techniques can be employed with DNA microarray data to identify novel clusters (classes) of arrays.
In addition, mRNAs may experience amplification bias that is sequence- or molecule-specific.
 The input data used in class discovery analyses are commonly based on lists of genes having high informativeness (low noise) based on low values of the coefficient of
variation or high values of Shannon entropy, etc.
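A filter on the coefficient of variation could look like the following minimal sketch. It assumes positive, untransformed intensities (so the mean is never zero) and uses a hypothetical cutoff of 0.1; the gene names and values are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def filter_by_cv(expression_by_gene, max_cv):
    """Keep genes whose coefficient of variation is at most max_cv."""
    return [
        gene
        for gene, values in expression_by_gene.items()
        if coefficient_of_variation(values) <= max_cv
    ]


# Hypothetical intensities for two genes measured on four arrays.
expression_by_gene = {
    "gene_x": [100.0, 102.0, 98.0, 101.0],  # low noise
    "gene_y": [50.0, 150.0, 20.0, 180.0],   # high noise
}
kept = filter_by_cv(expression_by_gene, max_cv=0.1)
print(kept)
```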
Although oligonucleotide probes are often used in “spotted” microarrays, the term “oligonucleotide array” most often refers to a specific technique of manufacturing.
Two-channel vs. one-channel detection
Figure: Diagram of typical dual-colour microarray experiment
Two-color microarrays or two-channel microarrays are typically hybridized
with cDNA prepared from two samples to be compared (e.g.
 Unlike microarrays, which need a reference genome and transcriptome to be available before the microarray itself can be designed, RNA-Seq can also be used for new
model organisms whose genome has not been sequenced yet.
Identifying naturally existing groups of objects (microarrays or genes) which cluster together can enable the discovery of new groups that otherwise were not previously known
A number of open-source data warehousing solutions, such as InterMine and BioMart, have been created for the specific purpose of integrating diverse biological datasets, and
also support analysis.
The determination of the most likely or optimal number of clusters obtained from an unsupervised analysis is called cluster validity.
Some publications indicate that in-house spotted microarrays may not provide the same level of sensitivity as commercial oligonucleotide arrays, possibly owing to the small batch sizes and reduced printing efficiencies compared with industrial manufacture of oligo arrays.
two RNA samples obtained from each experimental unit) may help to quantitate precision.
A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface.
More recently, Maskless Array Synthesis from NimbleGen Systems has combined flexibility with large numbers of probes.
Fabrication
Microarrays can be manufactured in different ways, depending on the number of probes under examination, costs, customization requirements, and the type of scientific question being asked.
Algorithms that affect statistical analysis include:
• Image analysis: gridding, spot recognition of the scanned image (segmentation algorithm), removal or marking of poor-quality and low-intensity features (called flagging).
Each applicable probe is selectively “unmasked” prior to bathing the array in a solution of a single nucleotide; a masking reaction then takes place, and the next set of probes is unmasked in preparation for a different nucleotide exposure.
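The unmask-and-expose cycle can be mimicked with a toy simulation. This is only a sketch of the bookkeeping, not of the chemistry: it assumes nucleotides are offered in a fixed repeating order (`ACGT`, an arbitrary choice), treats each probe as a plain string, and ignores synthesis direction and coupling efficiency.

```python
def synthesize(probes, cycle_order="ACGT"):
    """Simulate masked, one-nucleotide-at-a-time growth of every probe."""
    for probe in probes:
        if any(base not in cycle_order for base in probe):
            raise ValueError(f"probe {probe!r} has a base outside {cycle_order!r}")
    grown = ["" for _ in probes]
    steps = []  # one (nucleotide, unmasked feature indices) pair per exposure
    position = 0
    while any(len(g) < len(p) for g, p in zip(grown, probes)):
        base = cycle_order[position % len(cycle_order)]
        position += 1
        # Unmask only the features whose next required base is this nucleotide.
        unmasked = [
            i
            for i, (g, p) in enumerate(zip(grown, probes))
            if len(g) < len(p) and p[len(g)] == base
        ]
        for i in unmasked:
            grown[i] += base
        steps.append((base, unmasked))
    return grown, steps


# Hypothetical three-feature array with 3-mer probes.
probes = ["ACG", "CAT", "GGA"]
grown, steps = synthesize(probes)
for base, unmasked in steps:
    print(base, unmasked)
```

Each exposure extends every unmasked feature in parallel, which is why the number of exposures depends on probe length and sequence rather than on the number of features.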
The labelled fragments bind to an ordered array of complementary oligonucleotides, and measurement of fluorescent intensity across the array indicates the abundance of a predetermined
set of sequences.
These sequences are typically chosen specifically to report on genes of interest within the organism’s genome.
Third, spots of each cDNA clone or oligonucleotide are present as replicates (at least duplicates) on the microarray slide, to provide a measure of technical precision in
This provides a relatively low-cost microarray that may be customized for each study, and avoids the costs of purchasing often more expensive commercial arrays that may represent
vast numbers of genes that are not of interest to the investigator.
For some of these methods the user also has to define a distance measure between pairs of objects.
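The choice of distance measure matters because different measures can disagree about which profiles are similar. The sketch below compares two common options, Euclidean distance and a correlation-based distance (one minus the Pearson correlation), on hypothetical expression profiles. The text does not prescribe either measure.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """One minus the Pearson correlation: 0 for same shape, 2 for opposite."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical profiles across four arrays.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_1, larger magnitude
gene_3 = [4.0, 3.0, 2.0, 1.0]      # opposite trend, similar magnitude

print(euclidean_distance(gene_1, gene_2), pearson_distance(gene_1, gene_2))
print(euclidean_distance(gene_1, gene_3), pearson_distance(gene_1, gene_3))
```

Here Euclidean distance places `gene_1` closer to `gene_3`, while the correlation-based distance places it closer to `gene_2`, so a clustering method would group the genes differently depending on which measure is supplied.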
• Class prediction analysis: This approach, called supervised classification, establishes the basis for developing a predictive model into which future unknown test objects
can be input in order to predict the most likely class membership of the test objects.
Normalization methods may be suited to specific platforms and, in the case of commercial platforms, the analysis may be proprietary.
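As a generic illustration, and not any platform's proprietary method, a simple global normalization of two-color data can be sketched as follows. It assumes positive, background-corrected red and green intensities, computes the log ratio M and the average log intensity A used in an MA plot, and centers M on its median. The intensity values are hypothetical.

```python
import math
import statistics


def ma_values(red, green):
    """Return per-spot log ratios (M) and average log intensities (A)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_median_normalize(m):
    """Shift all log ratios so that their median becomes zero."""
    center = statistics.median(m)
    return [value - center for value in m]


# Hypothetical intensities for five spots; red is systematically about 2x brighter.
red = [200.0, 400.0, 800.0, 1600.0, 3200.0]
green = [100.0, 200.0, 400.0, 200.0, 3200.0]
m, a = ma_values(red, green)
m_norm = global_median_normalize(m)
print(m_norm)
```

Global centering rests on the assumption that most spots are not differentially expressed. Local methods instead adjust M as a function of A, which this sketch does not attempt.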
 Weighted gene co-expression network analysis is widely used for identifying co-expression modules and intramodular hub genes.