NCBI GEO: archive for high-throughput functional genomic data
- Barrett, Tanya
- Troup, Dennis B.
- Wilhite, Stephen E.
- Ledoux, Pierre
- Rudnev, Dmitry
- Evangelista, Carlos
- Kim, Irene F.
- Soboleva, Alexandra
- Tomashevsky, Maxim
- Marshall, Kimberly A.
- Phillippy, Katherine H.
- Sherman, Patti M.
- Muertter, Rolf N.
- Edgar, Ron
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies such as microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as ‘Minimum Information About a Microarray Experiment’ (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.