Indiana University

Project: Data Capacitor

Primary UITS contact: Stephen Simms

Last update: August 12, 2008

Description: This project, creating a Data Capacitor and a Metadata/Web Services server, addresses two clear and widespread challenges: the need to store and manipulate large amounts of data for short periods of time (hours to several days) and the need for reliable and unambiguous publication, discovery, and utilization of data via the web. For more, see the Data Capacitor web site.

The Data Capacitor is both a system and a project, funded by the National Science Foundation with significant matching funds from Indiana University.

Progress and research possibilities in many disciplines have been fundamentally changed by the abundance of data now so rapidly produced by advanced digital instruments. Scientists now face the challenge of extracting the information and meaning these data contain. IU has established a significant cyberinfrastructure composed of high performance computing systems, archival storage systems, and advanced visualization systems spanning its two main campuses in Indianapolis and Bloomington and connected to national and international networks. Through this project, IU is enhancing that infrastructure in ways intended to produce qualitative changes in the research capabilities and discovery opportunities of a broad array of scientists who work with large data sets.

As a project, the Data Capacitor creates two things: a large-capacity, short-term data store with very fast I/O, and a Metadata/Web Services server. Research and development efforts at IU will create the tools required for the Data Capacitor to be used to its fullest. The Data Capacitor is expected to become a development platform and testbed for new cyberinfrastructure, as well as a proof of concept for large-capacity, short-term storage devices. The Metadata/Web Services server, in turn, enables IU to establish a leadership position in standards-based data dissemination in many fields.

At SC07, the Data Capacitor was used to demonstrate fast data transfers across great distances using the Lustre file system. An Indiana University-led team used the Data Capacitor's wide-area capability to win that year's international Bandwidth Challenge competition.

As a computing system, the vital statistics of the Data Capacitor are as follows:

  • Primary use: Short-term storage
  • Operating system/file system: RHEL 4, Lustre 1.4.10
  • Peak theoretical processing capability: 1.54 teraflops
  • Achieved maximum Linpack performance: Not applicable
  • Total system RAM: 128GB
  • Total disk storage: 1PB
  • Total archival storage: Not applicable
  • Processor:
    • Dual-core 3.0GHz Xeon processor
    • Four floating point operations per clock cycle per core
  • Types of nodes:
    • Object-based Storage Target:
      • Dual-core 3.0GHz Xeon processor
      • Four floating point operations per clock cycle per core
    • Metadata Server:
      • Dual-core 3.0GHz Xeon processor
      • Four floating point operations per clock cycle per core
  • Number of nodes:
    • 28 Object-based Storage Targets
    • 4 Metadata Servers
  • Internal network: One gigabit private
  • Connections to external network: 284 gigabit to E1200
  • Date of acquisition: Delivered May 2006; accepted October 2006
  • Accessible to: Local IU users
  • Cooling: Enclosed, self-contained water-cooled racks (from Rittal)
  • Further information: Peak 14.5GBps I/O rate
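
The peak theoretical figure in the list above can be roughly cross-checked from the per-core numbers. The sketch below is only a back-of-the-envelope calculation: the node counts, clock rate, and operations per cycle come from the list, but the number of processors per node is not stated there, so `SOCKETS_PER_NODE = 2` is an assumption chosen because it is consistent with the quoted 1.54 teraflops.

```python
# Back-of-the-envelope check of the peak theoretical processing figure.
NODES = 28 + 4            # Object-based Storage Targets + Metadata Servers (from the list)
SOCKETS_PER_NODE = 2      # ASSUMPTION: not stated in the source
CORES_PER_SOCKET = 2      # dual-core Xeon (from the list)
CLOCK_HZ = 3.0e9          # 3.0GHz (from the list)
FLOPS_PER_CYCLE = 4       # per core (from the list)


def peak_teraflops(nodes, sockets, cores, clock_hz, flops_per_cycle):
    """Return peak floating point rate in teraflops (10**12 operations/s)."""
    return nodes * sockets * cores * clock_hz * flops_per_cycle / 1e12


peak = peak_teraflops(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET,
                      CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{peak:.3f} teraflops")  # 1.536, which rounds to the quoted 1.54
```

With one processor per node the same formula would give half that value, so the two-socket assumption is what reconciles the per-core figures with the stated peak.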

Outcome and benefits: Progress and research possibilities in many disciplines have been fundamentally changed by the abundance of data now so rapidly produced by advanced digital instruments. A critical challenge facing scientists is to draw out from these data the information and meaning that they contain. The Data Capacitor provides researchers with a 535TB file system to temporarily store and manipulate large data sets. Because the file system can be mounted in multiple places, it is possible for the Data Capacitor to play a role in every step of the data life cycle, from acquisition or creation, through computation and visualization, to archive storage. Because of its size, the Data Capacitor will help even out mismatches between the rate of data production and the rate of data analysis, much the way a capacitor evens the flow of electrons in a circuit. Because of its aggregate 14.5GBps write rate, the Data Capacitor can keep up with even the most tenacious data firehose.
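
The capacitor analogy can be made concrete with a small calculation. The sketch below assumes that "GBps" means gigabytes per second and that capacities use decimal units; the hourly production figures and the analysis rate are hypothetical numbers invented for illustration, not measurements from any IU instrument.

```python
CAPACITY_TB = 535         # file system size (from the text)
WRITE_RATE_GB_S = 14.5    # aggregate write rate; GB/s interpretation is an assumption


def hours_to_fill(capacity_tb, rate_gb_per_s):
    """Hours needed to fill the store when writing at the full aggregate rate."""
    seconds = capacity_tb * 1000 / rate_gb_per_s   # 1 TB = 1000 GB (decimal units)
    return seconds / 3600


def peak_backlog(produced_tb_per_hour, analysis_tb_per_hour):
    """Largest amount of unanalyzed data (TB) the store must hold at any time."""
    backlog = 0.0
    peak = 0.0
    for produced in produced_tb_per_hour:
        # Data arrives in bursts; analysis drains it at a steady rate.
        backlog = max(0.0, backlog + produced - analysis_tb_per_hour)
        peak = max(peak, backlog)
    return peak


fill_hours = hours_to_fill(CAPACITY_TB, WRITE_RATE_GB_S)

# Hypothetical workload: a two-hour burst followed by four quiet hours.
burst = [40, 40, 0, 0, 0, 0]
needed_tb = peak_backlog(burst, analysis_tb_per_hour=15)

print(f"Time to fill at full rate: {fill_hours:.1f} hours")
print(f"Peak backlog for the sample burst: {needed_tb:.0f} TB")
```

In this made-up workload the store absorbs the burst and the steady analysis rate drains it before the next one arrives, which is the smoothing role the paragraph above describes.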

Client impact: In addition to the Co-PIs and SIs listed below, this project benefits faculty members at IU involved in the following projects:

  • Center for Genomics and Bioinformatics

    The Center for Genomics and Bioinformatics is a multidisciplinary research center serving the IUB campus. The CGB carries out independent research in genomics and bioinformatics, collaborates with and/or assists projects developed by IUB faculty, and promotes interdepartmental and interdisciplinary interactions to enhance genomics and bioinformatics at IUB.

  • Computational Biology and Bioinformatics

    Located in the Center for Computational Biology and Bioinformatics at IUPUI, Sean Mooney's laboratory is focused on research and training in bioinformatics and computational biology. Specifically, the lab's research interests aim to characterize and predict the effects of genetic variation.

  • Computational Chemistry

    James P. Reilly's laboratory focuses on research in efficient biomolecular ion production, proteomics, photochemistry of peptide ions, protein structure and cellular fingerprinting, and novel time-of-flight instrumentation.

  • Computational Fluid Dynamics Laboratory

    The Computational Fluid Dynamics Laboratory was established in 1986 within the Department of Mechanical Engineering to conduct research and develop software in the areas of computational fluid dynamics and heat transfer. Current research projects include the finite element and finite volume solution of three-dimensional flow problems; high speed compressible flow calculations for internal and external flows; unsteady flow computations; moving body flows with unstructured meshes; parallel computing; load balancing for parallel computing on parallel processors and network of workstations; and high-performance grid computing.

  • Internet Traffic Analysis

    This project studies the infrastructure scalability and vulnerabilities of expanding communication networks by analyzing the statistical behavioral patterns observable in Internet traffic data. Such analysis may lead to robust design, planning, and management tools, as well as methods for mitigating or immunizing against attacks through early detection of anomalous patterns correlated with malicious behavior. The networks considered span a very broad range of scale, from individual interactions (e.g., social engineering, phishing, covert communication) to application-specific flows (e.g., spam, email, and web-based DDoS) to global-scale Internet traffic networks (e.g., Internet2 peer networks and worms).

  • Linked Environments for Atmospheric Discovery

    Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point, supported by high-performance computing systems. With LEAD, meteorologists, researchers, educators, and students are no longer passive bystanders or limited to static data or pre-generated images, but rather they are active participants who can acquire and process their own data.

  • Platform for Computational Comparative Genomics on the Web

    PLATCOM is an integrated system for the comparative analysis of multiple genomes. It is designed in a modular way, so that multiple tools and databases can be integrated freely and the whole system can grow easily. The PLATCOM system is built on internal databases, which consist of GenBank, Swiss-Prot, COG, KEGG, and the Pairwise Comparison Database (PCDB). PCDB is a database derived from GenBank, built by performing pairwise comparisons of protein-to-protein and whole genome-to-whole genome sequences with FASTA and BLASTZ, respectively. Currently it contains 48,205 entries of unduplicated protein-to-protein and whole genome-to-whole genome pairwise comparison matches. PCDB is designed to incorporate newer genomes automatically, so that PLATCOM evolves as new genomes become available. On top of these databases, a suite of genome analysis applications is provided.

  • Polar Grid

    Polar Grid is an NSF MRI-funded partnership between Indiana University and Elizabeth City State University to acquire and deploy the computing infrastructure needed to investigate urgent problems in glacial melting.

  • Proteomics at IU

    The Proteomics Core Facility at the IU School of Medicine opened in the fall of 2001 in the Department of Biochemistry and Molecular Biology. It is one of the core facilities supported by the Indiana Genomics Initiative (INGEN). The Proteomics Core Facility became the academic component of the Indiana Centers for Applied Protein Sciences (INCAPS) in May 2004 and was renamed the Protein Analysis and Research Center. It is a service and collaborative research resource that balances applied proteomics research with the development of new and improved methods for protein identification, characterization, and quantification. The Center encourages collaborations that apply the tools of proteomics to cutting-edge biomedical research. For more, see the article Honing the Proteome in Research & Creative Activity.

  • WIYN Observatory

    The WIYN Telescope, a 3.5-meter instrument employing many technological breakthroughs, is the newest and second largest telescope on Kitt Peak. The WIYN Observatory (pronounced "win") is owned and operated by the WIYN Consortium, which consists of the University of Wisconsin, IU, Yale University, and the National Optical Astronomy Observatories (NOAO). Most of the capital costs of the observatory, which amounted to $14 million, were provided by these universities, while NOAO, which operates the other telescopes of the Kitt Peak National Observatory, provides most of the operating services. This partnership between public and private universities and NOAO is the first of its kind. The universities benefit from access to a well-run observatory at an excellent site, and the larger astronomical community served by NOAO benefits from the addition of this large, state-of-the-art telescope to Kitt Peak's array of telescopes.

  • X-ray Crystallography

    The Indiana University Molecular Structure Center (IUMSC) is a service and research facility in the Department of Chemistry at IUB. The laboratory has a full complement of single crystal and powder diffraction equipment used to characterize crystalline materials using the techniques of X-ray crystallography. Researchers in the laboratory can determine the three-dimensional structure of nearly any material that can be crystallized. A crystallographic study produces a set of atomic coordinates that locate the atoms of a molecule in the "unit cell" of the crystal. This information can then be used to generate images of the molecule and to determine distances and angles in the molecule. In addition, the data allow one to examine the packing of the molecules in the crystal, which can often lead to an understanding of the properties of the material. The IUMSC Server allows rapid access to the data generated in the IUMSC. Nearly all of the materials studied have been synthesized or isolated by researchers from other laboratories, usually within the IU system, but often from laboratories throughout the world.

Project sponsor: Craig Stewart, Associate Dean for Research Technologies

Project team:

  • Stephen Simms
  • Joshua Walgenbach
  • Justin Miller
  • Nathan Heald

Additional information

  • PI: Craig Stewart
  • Co-PIs: Randall Bramley, Catherine A. Pilachowski, Beth Plale, Stephen Simms
  • SIs: P. Cherbas, S. Chien, D. Clemmer, M. Davy, A. Dzierba, G.C. Fox, K. Kallback-Rose, M. Gupta, D. Hart, K. Honeycutt, J. Huang, J. Huffman, S. Kim, A. Lumsdaine, F. Menczer, S. Mooney, M. Palakal, J. Paolillo, P. Radivojac, J. Reilly, H. Tang, E. Wernert, B. Wheeler, D. Durisen, H. Cohn, R. Payli
  • Funding agency and grant number: NSF CNS0521433
  • Grant dates: October 1, 2005-October 1, 2008
  • Funding to UITS: $1,720,000
  • Total funding to IU related to this project: $1,720,000