XLDB Workshop Goes International

The SciDB team. From left, back row: Roman Simakov, Hideaki Kimura, Kian-Tat Lim, Emad Soroush, Daniel Wang; front row: Pavel Velikhov, Jennie Rogers. (Photo courtesy of Oleg Bartunov.)

The third annual Extremely Large Databases Workshop took place in Lyon, France, in late August 2009, the first year that the workshop has taken place away from its roots at SLAC. SLAC database engineer Jacek Becla and his team members Kian-Tat Lim and Daniel Wang organized the event, co-locating it with the 35th Very Large Database research conference. This year's XLDB Workshop focused on reaching out to non-U.S. communities and scientific communities, such as geoscience, radio astronomy and biology, that have been underrepresented at past workshops.

Two years ago, Becla established the invitation-only Extremely Large Databases Workshop series to bring together scientific and commercial users of extremely large databases, two groups that Becla said previously had very little contact on this issue. He also invited members of academia and database manufacturers to discuss advances in database technology and, most importantly, let them hear about problems that users are having or features they'd like to see in future database engines.

Previous XLDB workshops led to the creation of a project to build a new open-source database engine, called SciDB, geared specifically toward complex scientific analytics at extremely large scales. When SciDB is released, Becla said, it will revolutionize the way scientific analyses are done. The SciDB project has already attracted more than 20 database professors and engineers world-wide who are collaboratively designing and building the software. These include database giants Michael Stonebraker and David DeWitt, who pioneered database research and helped create technologies such as those used in today's automatic teller machines. The group demonstrated an early prototype of SciDB to several hundred people at the Very Large Database conference and again at the XLDB workshop.

"The system we are building is very different [than commercially manufactured databases]," Becla said. "We finally understand well what science's needs are, and we are building an engine that will fully address these needs, taking advantage of numerous commonalities between how different science domains want to ultimately analyze their data sets."

Today's extremely large systems are measured in petabytes. One petabyte, or one million billion bytes, is non-trivial to manage. As one blogger explained, if one letter of text represented one byte, a petabyte-long line of letters in a typical font size would stretch from the Earth to the Sun ten times. As unfathomable as that number is, science experiments such as BaBar and internet companies such as Google or eBay are already producing petabytes of data, and the software must keep up. Becla's team, which built the BaBar database and is now responsible for the design of the Large Synoptic Survey Telescope database, estimates that the LSST will generate more than one hundred petabytes of data.

More and more scientific disciplines are acquiring such massive amounts of data and need software to store, share and process it, but commercial database manufacturers don't produce systems that can handle these extremely large datasets at reasonable cost and performance levels. As a result, XLDBs are typically built in-house using custom software, making them more expensive to develop and maintain. Becla began the workshop to provide a forum for sharing solutions to these problems.

The XLDB workshops include very few presentations. Instead, attendees come prepared to discuss specific issues in the field in an open forum. The results of this kind of discussion have been tremendously positive, Becla said, and, as the open-source SciDB database system shows, highly productive.

—Calla Cofield
SLAC Today, September 16, 2009

