But for now, this is big, even for Big Blue:
The Register has unearthed a research paper that shows IBM working on a computing system capable “of hosting the entire internet as an application.” The mega system relies on a re-tooled version of IBM’s Blue Gene supercomputers, so loved by the high-performance computing crowd. IBM’s researchers propose tweaking the Blue Gene systems to run today’s most popular web software stack: Linux, Apache, MySQL and Ruby on Rails.
The complete paper [link to PDF file] contains this interesting point:
At present, almost all of the companies operating at web-scale are using clusters of commodity computers, an approach that we postulate is akin to building a power plant from a collection of portable generators. That is, commodity computers were never designed to be efficient at scale, so while each server seems like a low-price part in isolation, the cluster in aggregate is expensive to purchase, power and cool in addition to being failure-prone. Despite the inexpensive network interface cards in commodity computers, the cost to network them does not scale linearly with the number of computers. The switching infrastructure required to support large clusters of computers is not a commodity component, and the cost of high-end switches does not scale linearly with the number of ports. Because of the power and cooling properties of commodity computers many datacenter operators must leave significant floor space unused to fit within the datacenter power budget, which then requires the significant investment of building additional datacenters.
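The paper’s claim that switching cost doesn’t scale linearly with port count is easy to see with a toy model. The sketch below is purely illustrative — the prices, switch sizes, and two-tier topology are invented assumptions, not figures from the paper — but it shows the shape of the argument: once a cluster outgrows a single switch, edge switches need a core layer to interconnect them, and that core grows faster than the node count.

```python
# Hypothetical cost model illustrating the quoted claim that cluster
# networking cost grows faster than linearly with node count.
# All prices and topology assumptions are invented for illustration.

import math

def cluster_network_cost(nodes, ports_per_switch=48,
                         edge_switch_price=5_000,
                         core_port_price=500):
    """Rough two-tier (edge + core) switching cost for `nodes` servers."""
    edge_switches = math.ceil(nodes / ports_per_switch)
    # Every edge switch needs uplinks into a core layer so it can reach
    # every other edge switch, so core port demand grows roughly with
    # the square of the number of edge switches.
    core_ports = edge_switches * (edge_switches - 1)
    return edge_switches * edge_switch_price + core_ports * core_port_price

for n in (100, 1_000, 10_000):
    # Per-node network cost rises as the cluster grows.
    print(n, cluster_network_cost(n) / n)
```

Under these made-up numbers, per-node network cost climbs by an order of magnitude between a 100-node and a 10,000-node cluster — the “portable generators” effect the authors describe.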
It’s always more fun to have it all in one big box until something breaks:
We are extending the existing infrastructure to allow nodes to actively react to hardware failures. Node failures are in many cases non-fatal for the application and recovery is possible. However, failures which traditionally do not affect other nodes need to be handled due to the high level of integration. For example, when a node fails which acts as a forwarding node at the physical layer, a network segment may become unreachable. While we can easily deallocate the faulty node from the pool, we must ensure that all necessary nodes still provide networking functionality. Here, the reliability of a single-chip solution is very advantageous. Node failures are often due to failing memory modules. However, each processor chip has 8MB of integrated eDRAM. If more than one RAM chip fails we can usually bring the node back into a state where it still acts as a router, even though normal workloads cannot be run.
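The recovery idea in that passage can be sketched in a few lines: a node whose memory fails is pulled from the worker pool but, because routing fits in its on-chip eDRAM, it keeps forwarding packets so the segment behind it stays reachable. This is a minimal toy model, not IBM’s code — the node roles and the chain topology are assumptions for illustration.

```python
# Minimal sketch of the quoted recovery behavior: a memory-failed node
# is demoted from "worker" to "router only", preserving reachability.
# Not IBM's implementation; roles and topology are illustrative.

from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.links = []
        self.can_compute = True   # eligible for normal workloads
        self.can_forward = True   # still routes at the physical layer

def reachable(start, goal):
    """BFS over links, traversing only nodes that can still forward."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node is goal:
            return True
        for nxt in node.links:
            if nxt not in seen and nxt.can_forward:
                seen.add(nxt)
                queue.append(nxt)
    return False

# A chain a - b - c: b sits on the only path to c.
a, b, c = Node("a"), Node("b"), Node("c")
a.links, b.links, c.links = [b], [a, c], [b]

# b's RAM fails: deallocate it from the worker pool, but keep it
# forwarding, since routing still fits in its integrated eDRAM.
b.can_compute = False
print(reachable(a, c))  # c stays reachable through the demoted node
```

If the demoted node lost its forwarding role as well (`b.can_forward = False`), the segment behind it would go dark — which is exactly the case the researchers say they must avoid.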
Okay, maybe this will work.