From ketchup storage to data storage: Microsoft's Pittsburgh team launches large-scale data storage system
In a North Shore building that once warehoused Heinz condiments, Microsoft engineers are building infrastructure that stores a precious commodity: data.
The product today isn’t necessarily physical, like a ketchup bottle, but for big companies, having large-scale, high-performing data storage is “their livelihood,” according to Ron Bianchini, distinguished engineer at Microsoft.
Microsoft’s office, in a five-story brick building on River Avenue on the North Shore, is home to Azure Managed Lustre, a cloud-based, parallel file system for high-performance computing and artificial intelligence workloads.
Lustre itself is an open-source file system developed about 20 years ago at Carnegie Mellon University. Among the early adopters of the massive system were Department of Energy national laboratories, including Los Alamos with its Cielo supercomputer.
The North Shore project integrates the Lustre file system into Microsoft’s offerings, with the aim of making it easier for companies to use.
This means clients like Petronas, the Malaysian petroleum giant, can crunch the sonar information that maps out the sea floor to determine the best places to drill.
It’s also part of the technology at Meteomatics, a Switzerland-based weather intelligence company and another Lustre client. Meteomatics collects information from drones, sensors and other gathering points to provide commercial weather forecasting and power output forecasting for wind, solar and hydro companies.
Microsoft is not the only player in the field. Amazon Web Services developed its own FSx for Lustre.
Azure Managed Lustre was released for general availability in July. The company expects to see growth, especially as more companies use generative AI and the need for massive storage expands.
The leader behind Microsoft’s project is Bianchini, a serial entrepreneur who has founded several companies, including Avere Systems, a Pittsburgh-based startup founded in 2008 that produced computer data storage and data management infrastructure. Microsoft acquired it in 2018.
After integrating Avere into Microsoft, the company tapped the Pittsburgh team to integrate the open-source Lustre technology, Bianchini said.
Getting the system to general availability status is “extremely exciting,” said Brian Barbisch, principal software engineering manager.
“We created something from scratch and assembled a team together,” Barbisch said. “In the beginning, when you’re a small team, your job titles are erased. You’re starting from scratch, and you’re just doing the next hard thing that comes up, the next thing that needs to be done.”
Bianchini said the work of universities like Carnegie Mellon, the University of Pittsburgh and Penn State in building expertise in data storage has helped make the region a hub for the technology. That effort goes back about 40 years, when the National Science Foundation recognized the work being done locally in the field.
“In the ’80s, the NSF granted a data storage center of excellence at CMU. It was a clear recognition that this region has strong intellectual property expertise in data storage,” said Bianchini, who is a CMU graduate, former professor and current board member.
Among the early clients is Duke Clinical Research Institute.
“I think the work that we’re enabling in areas like cancer research, energy exploration — it’s being able to enable that in small companies or large companies that maybe not would have not been able to do this before,” said Brian Lepore, principal program manager. “It’s global and we’re doing it from Pittsburgh.”
Henry Baltazar, a research director with S&P Global Market Intelligence, noted that the demand for high-performance, cloud-based applications has been growing in recent years, especially as AI and machine learning have become more widely adopted.
“I think that’s why Microsoft is going into a technology such as Lustre — to be able to address some of those new markets that they weren’t really looking at as much in the past,” Baltazar said.
The original Lustre file system “is very well known, but it can be very difficult to manage. With new use cases coming out, especially as we think about things like AI and machine learning, we need those types of performance capabilities now,” Baltazar said. “It’s not just for the rocket scientists or defense contractors. It’s a sign that the need for that technology is moving down the stack.”