
XLDB2 Conference Brings Together Industry, Theory

Databases that hold petabytes (a million billion bytes) of information are the focus of the second annual Extremely Large Databases Workshop taking place today at SLAC. Scientific experiments such as BaBar previously boasted the largest databases on Earth, but now they stand neck-and-neck with industry groups including Google and eBay. Science and industry representatives will meet, along with database theorists and vendors, to discuss commonalities in the race to manage the world's ever-growing deluge of data.

Although science has previously been ahead of industry in database size, industry has caught up, moving ahead quickly with the advantage of extensive financial resources. Conference organizer Jacek Becla said there is a lack of communication between science and industry when it comes to building XLDBs. Becla built the database for the BaBar experiment and is now building one for the Large Synoptic Survey Telescope. He hopes the conference will open up discussion between the two groups and help them find commonalities. "It's a two-way street," said Kian-Tat Lim, one of the co-organizers. "Both sides have something they can learn from the other."

Last year's meeting brought together science and industry representatives for the first time to talk about XLDBs. "It helped everybody to realize all of us are facing very similar problems," said Becla.

One common challenge for XLDBs is managing large numbers of machines. A single machine would take years to analyze a petabyte of data, so thousands are needed. Adding machines, however, multiplies the chances of a machine failure, and analysis needs to continue even when some of the underlying computers fail. These basic roadblocks make XLDBs very different from databases holding a few orders of magnitude less information.
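The arithmetic behind that trade-off can be sketched in a few lines of Python. This is a rough illustration only: the per-machine analysis rate (10 MB/s) and the daily failure rate (0.1 percent per machine) are assumed figures chosen for the example, not numbers from the workshop, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes: a million billion

def scan_days(total_bytes, bytes_per_second, machines=1):
    """Days needed to process total_bytes, assuming the work splits evenly."""
    return total_bytes / (bytes_per_second * machines) / SECONDS_PER_DAY

def prob_any_failure(machines, daily_failure_rate, days):
    """Chance that at least one machine fails during the job.

    Assumes failures are independent and equally likely on every machine.
    """
    one_machine_survives = (1 - daily_failure_rate) ** days
    return 1 - one_machine_survives ** machines

# Assumed figures, for illustration only.
RATE = 10 * 10**6        # 10 MB/s of effective analysis per machine
DAILY_FAILURE = 0.001    # 0.1 percent chance a given machine fails per day

single_days = scan_days(PETABYTE, RATE)
cluster_days = scan_days(PETABYTE, RATE, machines=1000)
cluster_risk = prob_any_failure(1000, DAILY_FAILURE, cluster_days)

print(f"1 machine:     {single_days / 365:.1f} years")
print(f"1000 machines: {cluster_days:.1f} days")
print(f"Chance of at least one failure during the job: {cluster_risk:.0%}")
```

Under these assumptions a single machine needs a few years, while a thousand machines finish in about a day but are more likely than not to lose at least one machine along the way, which is why the analysis software has to tolerate failures.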

Plans are circulating in the XLDB community to create an open-source extremely large database management system. This system would provide basic software code for building an extremely large database. It would be optimized for complex analytical tasks common to science and industry. "Right now, anyone building an extremely large database essentially starts from scratch," Becla said. "An open-source database would build a common infrastructure that could be used by everyone else."

A handful of vendors including IBM and Oracle will attend the conference. Becla said these vendors are more likely to eventually produce commercial XLDBs if they can see that there is common and widespread demand. In addition, academic database theorists will contribute to the discussion, and Becla hopes this will strengthen connections between database theory and real-world applications.

SLAC offers uniquely neutral ground for hosting the conference. Industry representatives might hesitate to share ideas at a conference hosted by one of their competitors, but a government laboratory does not profit from the results of the meeting, which makes private companies more willing to participate.

"This year we've had to turn away a lot of people," said Becla. The conference is built around open discussions, and for that reason it will host no more than 70 attendees. The organizers tried to include representatives from all data-intensive areas of industry and science. Anyone who cannot attend the conference can read about the proceedings in the report Becla and his group will produce. Last year's report is available online.

—Calla Cofield
SLAC Today, September 29, 2008