Computational Resources at the IGSP
Integrated environment, from laptop to high-performance compute cluster
IGSP's information systems are designed to minimize the time spent staging data for analysis and to maximize the efficiency of the analysis itself. In practical terms, this means that data access is continuous throughout the Institute: datasets visible on a networked laptop are also available to thousands of CPU cores on the Duke Shared Cluster Resource (DSCR), a high-performance computational resource shared by researchers across the university.
IGSP's DNA Microarray Core, Proteomics Core, and Sequencing Core facilities use the centralized storage, so that IGSP scientists can easily acquire large datasets and set them up for analysis. The Microarray and Proteomics Cores use the "Express" data repository for data distribution and analysis for major projects. "Express" has been developed by IGSP programmers to ease data production activities in the cores and provide a true repository for data of abiding scientific interest. The system has been used to automate data storage and analysis for several major studies.
Generous and secure storage
In terms of raw storage capacity, Duke's IGSP has the third-largest data storage system in the Duke University and Health System enterprise. Data is backed up to disc and mirrored to separate locations for disaster recovery. Currently, individuals are granted 25 gigabytes of backed-up storage space, and labs have access to 150 gigabytes of backed-up storage space. Labs and projects that require more can purchase additional storage, which is added to the existing storage controllers and systems. The storage is designed to ease data sharing among the Institute's researchers.
IGSP has four separate installations of NetApp FAS 3070/3020 series filers in three Duke locations. Disc shelves attached to the filers use NetApp's Fibre Channel architecture for high-performance storage and SATA disc for higher-capacity, moderate-performance storage.
Processing power fit for a wide range of projects
Computational power is tailored to fit researchers' requirements, and the infrastructure handles large and small projects. The infrastructure is designed to be flexible, with open access to IGSP researchers on computational servers outfitted with a broad range of bioinformatics and application development tools. Software not currently on IGSP machines can be installed on request.
As of July 2009, five 8-core Intel machines, each fitted with 32 gigabytes of RAM, are available to researchers with regular and basic demand for computation. Additional computational resources are available by arrangement for more computationally intensive projects, such as high-throughput gene expression microarray or sequence analysis. Access to these dedicated devices is restricted to specific research groups. Special provisions have been made on both the storage and the computational infrastructure for protected sensitive electronic information, such as datasets that fall under HIPAA regulation.
The IGSP computation and core infrastructure uses Dell 1850 and 1950 series 1U machines, Dell 1955 and M600 blade/enclosure systems and a Dell R900 device for proteomics analysis.
High-performance computation runs on the Duke Shared Cluster Resource, a compute cluster of over 3,000 CPU cores. This cluster is directly connected to IGSP's storage infrastructure via dedicated 10 GigE fibre, allowing for easy staging of large datasets. Commonly used software is already installed on the cluster, and systems administrators will install additional software on request. IGSP is a major contributor to the DSCR and is in the process of bringing online additional computational servers funded by the NIH (grant number 1S10RR025590-01) and the North Carolina Biotechnology Center (grant number 2009-IDG-1002).
The IGSP computational infrastructure is Linux-based, since Linux is a widely adopted and very reliable platform for computational biologists. Use of open source software is encouraged, though projects also use proprietary software when it fits their research needs.
Immediately available bioinformatics software and development infrastructure
Commonly used software for sequence analysis, gene expression analysis, and proteomics is available to all researchers. The IT infrastructure is particularly well suited for application development by IGSP researchers, and a significant number of computational and software-development projects are underway, ranging from software for specialized analysis to enterprise-wide data management and data analysis systems.
Talented core IT staff
IGSP's seven-member IT staff includes individuals trained and certified in Oracle and MySQL databases, systems administration, information security, and bioinformatics. Half of the staff hold advanced degrees. Programming and database staff have extensive training and experience in biology labs. Staff have education and work experience at a broad range of organizations, including the European Bioinformatics Institute (EBI), Duke, Northwestern, Western Kentucky University, and the Rochester Institute of Technology.



