The informatics core at the YCMD specializes in all aspects of data manipulation, from data capture to hit selection. The services below can be incorporated into a full screening campaign or mixed and matched as needed.
Raw Data Extraction:
Low content readout:
Plates are read on the plate reader and raw numerical readouts (one per well) are provided in spreadsheet format.
High content readout:
Plates are read on the confocal microscope; images are captured for each well, and raw numerical readouts are provided in spreadsheet format.
- Well level data:
The table will contain one row per well. The number of readouts per well will depend on the image analysis.
- Cell level data:
The table will contain one row per cell in the captured image(s). The number of readouts per cell will depend on the image analysis.
Images are available in two formats:
1) Grey scale, with one image per channel
2) Color overlays combining all the channels of each captured image into a single image
Screen Robustness Assessment:
Robustness statistics are generated at the plate, run, replicate and/or screen level. These statistics determine the quality and reliability of the results. Examples of available statistics are:
- Correlations between replicate sets are checked to ensure the reliability of the data.
- Average, median, standard deviation, median absolute deviation, and coefficient of variation of each screen readout per group on the plates (negative and positive controls, samples, etc.)
- Z’ factor: measures assay quality from the separation between the positive and negative controls
- Z factor: the analogous measure computed on the samples
- SSMD (strictly standardized mean difference): measures effect size for the controls and/or samples
Data are provided in spreadsheet format with one row per plate/run/replicate/screen and as many statistics as requested.
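To illustrate one of the statistics above, the Z’ factor can be computed from a plate’s control readouts roughly as follows (a minimal sketch; the function name and example values are hypothetical):

```python
import statistics

def z_prime(pos, neg):
    # Z' factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|.
    # Values above ~0.5 are conventionally taken to indicate a robust assay.
    mu_p, mu_n = statistics.mean(pos), statistics.mean(neg)
    sd_p, sd_n = statistics.stdev(pos), statistics.stdev(neg)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)

# Hypothetical positive- and negative-control readouts from one plate
pos = [95, 98, 102, 101, 97, 99]
neg = [10, 12, 9, 11, 13, 10]
print(round(z_prime(pos, neg), 2))  # → 0.86
```

The same per-group means and standard deviations feed the other robustness statistics listed above.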
Data Normalization:
Data normalization services are available to transform raw plate readouts for comparison across screen runs and replicates. Numerous data normalization methods can be applied based on the assay characteristics.
The normalization methods currently available are:
1) With controls directly related to the assay readout: percent inhibition of control, percent of control.
2) With or without controls: percent of sample, Z Score, Robust Z Score, SSMD, Robust SSMD, B score.
Other customized normalization methods can also be applied if desired. Data are provided in spreadsheet format with one row per well and as many normalizations as requested.
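To make the two families concrete, here is a minimal sketch (function names are hypothetical) of one method from each: percent of control, which requires a control group, and the robust Z score, which does not:

```python
import statistics

def percent_of_control(sample, neg_ctrl_mean):
    # Percent of control: a sample readout expressed as a percentage
    # of the negative-control mean on the same plate.
    return 100.0 * sample / neg_ctrl_mean

def robust_z(values):
    # Robust Z score: center on the plate median and scale by
    # 1.4826 * MAD (the consistency constant under normality),
    # so outlier wells do not distort the normalization.
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (1.4826 * mad) for v in values]

print(round(percent_of_control(45.0, 90.0), 1))  # → 50.0
```

Robust variants (robust Z score, robust SSMD, B score) are generally preferred when plates contain strong hits, since medians and MADs resist the outliers those hits create.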
Hit Selection:
Genes producing the desired phenotype can be selected out of the sample pool via various hit selection methods, such as rank-based selection or threshold-based selection. When screens are run in replicates, additional options are available: hits can be selected based on their average score across the replicates or on the number of replicates in which they qualify as hits. Depending on the nature of the screen, inhibitor and/or potentiator genes are selected out of the sample pool.
Data are provided in spreadsheet format with one row per hit gene with the corresponding raw and normalized data.
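As a sketch of threshold-based selection combined with a replicate count (function and gene names are hypothetical, and the threshold is illustrative):

```python
def select_hits(scores_by_gene, threshold=-2.0, min_replicates=2):
    # A gene is called a hit if at least `min_replicates` of its
    # replicate scores fall at or below the inhibition threshold
    # (e.g. a robust Z score of -2).
    hits = []
    for gene, scores in scores_by_gene.items():
        if sum(s <= threshold for s in scores) >= min_replicates:
            hits.append(gene)
    return hits

# Hypothetical normalized scores for three genes across three replicates
data = {"GENE_A": [-2.5, -3.1, -2.2],
        "GENE_B": [-2.4, -0.1, 0.3],
        "GENE_C": [0.5, 0.2, -0.4]}
print(select_hits(data))  # → ['GENE_A']
```

For potentiator screens the comparison would simply flip to scores at or above a positive threshold.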
Gene Annotation:
Genes of interest are labeled with one or more of the following annotations:
- Statistically significant canonical pathways
- Functional networks
- GO biological processes, molecular functions and/or cellular components
Gene annotations are used to inform the hit selection process in order to produce a final hit list of genes relevant to the process under investigation.
Data are provided in a report containing all requested annotations for each hit gene.
HTCB Informatics also provides expert-level cheminformatics support, exploiting the synergies between compound profiling results (chemical activators and inhibitors) and gene hits from RNAi campaigns. By combining data sets, we can overlay novel screening data with, for example, known protein/tissue interaction data, speeding the discovery of novel biomarkers.
The Center has deployed a highly scalable bioinformatics hardware infrastructure housed within a modern industrial-scale data center. The data center is supported by layered security (physical, hardware, software), redundant power supplies, and advanced environmental controls. The Center has deployed databases and data processing applications that, for the purposes of high content imaging and analysis, primarily utilize a range of high-performance IBM servers. Auxiliary computational resources are available for high-throughput image analysis via a Center-administered Linux cluster as well as the Yale High Performance Computing Center, the latter a massive cluster of over one thousand CPUs. Computer disk space, always a concern with high content imaging, is available in ample quantities to satisfy present and future needs; the currently deployed architecture is quickly scalable to 80 TB on existing disk controllers and equipment, and can easily grow with minor enhancements.
Informatics services rates vary according to numerous parameters, such as the number of plates read, the number of images generated per well, the number of channels per image, the number of readouts, and the number of transformations applied to the data. Please contact Michael Kinch for pricing information.