My colleagues David Floyer and John Furrier are attending the Flash Memory Summit this week. A glance at the Web site and the exhibitor list for the event reminds me of the good old disk drive days, back when there were many dozens of active manufacturers worldwide. The market was exciting, with tons of VC money pouring in and plenty of innovation to move the market forward. It was hyper-competitive.
At the time, much of the activity was at the device level, but as EMC showed, the real money was made by developing system-level function, specifically by combining hardware and software to deliver new types of business value to enterprise customers.
I see a similar dynamic occurring in the fast-moving flash market these days. The list of companies attending this week’s event is long, with many names that are not household words by any stretch of the imagination. One that stood out to me was Fusion-io. Not because it’s on the list, but because it is a bronze sponsor (the lowest level) and has a little asterisk next to its name signifying the company is sponsoring but not exhibiting.
STEC, on the other hand, is a platinum sponsor, has a nice big booth by the entrance and has a banner ad on the event site. So why is this of interest? Because I suspect Fusion-io isn’t really that interested in the Flash Memory Summit and is there just to see what developments are happening in the ecosystem. For STEC and the other big sponsors, however, this is their “Super Bowl” of sorts.
While I’m stretching this analogy, Fusion-io and STEC are kind of like EMC and Seagate. For example, EMC used to attend the big drive company events (e.g. Comdex) to catch the scuttlebutt and see what was happening. Meanwhile Seagate would rent out most of a downtown hotel and throw lavish parties for its customers at the event. They were/are both storage companies but their businesses were quite different. They don’t compete for the same customers and have completely different business models.
I find that many observers look at flash and think that because all flash is faster than super-slow spinning disk, any flash will speed up application performance, and hence all companies selling flash must be the same. In other words, people are looking at the $2B+ valuation of Fusion-io and thinking that everyone doing flash is going to grab a piece of that pie.
Not So Fast, Sparky
More specifically, the world has finally caught on that PCIe is a better way to connect to the host than using slow disk controller technologies across a network. There are other connection methods emerging (e.g. InfiniBand), but judging from all the PCIe announcements lately, it’s pretty clear the world is buying into Fusion-io’s connectivity approach. Twenty-four months ago there was some debate about this, but now it’s pretty much accepted. What we’re seeing in the market is many observers looking at all the PCIe-based flash solutions and concluding they’re just like Fusion-io.
Here’s why they’re wrong. Fusion-io’s advantage has little to do with PCIe and everything to do with software, specifically its Virtual Storage Layer (VSL) technology. This is why, in addition to its hyper-growth and killer customers, the company in my mind can justify a $2B valuation. Wikibon’s David Floyer explains VSL in gory detail in this excellent post. This action item from Floyer says it all:
Application heads and ISVs need to take a long, hard look at Flash on Server architectures and Fusion-io’s VSL implementation. As the cost of flash comes down, the functionality and ease-of-use improvements that can be made to applications will be game changing.
Game changing is an over-used phrase in marketing, but in this case I believe it’s justified. What other PCIe flash suppliers are doing for the most part is taking SSDs and off-the-shelf controllers and plugging them into a PCIe bus. There’s nothing wrong with that. Compared to spinning disk it can deliver an order of magnitude performance improvement. But as Floyer’s research note describes, while this approach can deliver high IOPS, similar to Fusion-io, it still uses traditional disk protocols to communicate with the on-board RAID controller. This means these solutions must endure the latency penalties associated with the handshaking of, for example, SCSI.
These PCIe flash solutions use processor offloads, which is old-school thinking from the Pentium days, before the age of mega multi-cores. Said another way, these systems can’t exploit the potential of today’s multi-core processors. Fusion-io’s approach, on the other hand, uses host resources to manage the system. This means the management of the VSL space is cognizant of all the processors at the host level and as such can be optimized for maximum efficiency, thereby eliminating potential bottlenecks. Sure, this method uses more CPU cycles (bad in the old days of mainframe disk, when processor cycles were expensive and scarce), but CPU capacity in multi-core systems is plentiful. And yes, you could use smaller processors, but then you wouldn’t get the face-melting application performance that Facebook, Apple and others are experiencing with Fusion-io.
It’s all About Latency
Latency is what drives application performance. And as Facebook has discovered, the user experience is enhanced dramatically when you can deliver two to three orders of magnitude performance improvements. IOPS is a fun metric, but latency is where the rubber hits the road. In standard PCIe architectures, the SAS and SATA protocols and processor offloads introduce application latencies. Think about the solid state disks that have been around since the late 1970s. Why did they never take off? Not only because they were volatile and expensive, but also because they put memory on the “other side of the channel,” relegating it to niche use cases.
The Fusion-io approach that Floyer describes positions flash as a much higher capacity (and cheaper) extension to memory, right next to the processor, with latency that is similar to (if somewhat higher than) that of DRAM. So it’s like the old AS/400 single level store, except the pool is flash, not crappy, slow, spinning disk.
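To make the IOPS-versus-latency point concrete, here is a rough back-of-the-envelope sketch in Python. It leans on Little’s Law (throughput equals outstanding requests divided by latency), which is my framing, not something from Floyer’s note. Every number below (latencies, queue depths, the count of dependent reads) is a made-up assumption for illustration, not a measurement of Fusion-io or any other product.

```python
# Back-of-the-envelope sketch: two hypothetical devices with identical IOPS
# but very different latency. All figures are illustrative assumptions.

def iops(queue_depth, latency_us):
    """Little's Law: sustained IOPS = outstanding requests / per-request latency."""
    return queue_depth * 1_000_000 / latency_us

def serial_time_ms(dependent_ios, latency_us):
    """Time for a chain of I/Os where each one must finish before the next starts."""
    return dependent_ios * latency_us / 1000

if __name__ == "__main__":
    # Hypothetical device A: low latency, shallow queue.
    # Hypothetical device B: 8x the latency, hidden behind a deep queue.
    devices = {"A": (4, 50), "B": (32, 400)}  # (queue depth, latency in microseconds)
    for name, (depth, lat_us) in devices.items():
        print(name, "IOPS:", iops(depth, lat_us),
              "| 1000 dependent reads (ms):", serial_time_ms(1000, lat_us))
```

Under these assumptions both devices post the same IOPS number on a spec sheet, yet an application that has to chase 1,000 dependent reads waits eight times longer on device B. A deep queue can hide latency from a benchmark, but not from a user waiting on a page.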
The Flash Landscape
Here’s my simplistic interpretation of how the brains in the Wikibon community see the flash space.
There are four levels here.
4. Solid State Disk – it’s mimicking disk drive form factors and function using flash. Nice idea. It reminds me of when IBM introduced its RAMAC disk array (the second time around). It had an old, outdated controller architecture that was running out of gas so it purchased disk “bricks” with cache from Xitel and put them in a subsystem to extend the life. Think of this approach as putting lipstick on a spinning disk pig.
3. Flash as Primary Storage – Now this layer to me is interesting. The idea here is to make an all-flash device (e.g. SolidFire), target it primarily at the block storage now sold by IBM, EMC and HP, as well as some file apps, and deliver consistent quality of service. This is very compelling to cloud service providers, who can enable new applications and charge customers for QoS. Very disruptive to the block-based, high-end storage guys. Think of this as Elastic Block Storage for the enterprise.
2. Flash as Cache – This approach helps specific applications and use cases. It’s good for virtualized systems and helps legacy apps run faster (e.g. Oracle databases, Exchange, etc.). Examples include LSI CacheCade, Oracle and IO Turbine (which Fusion-io just purchased). But this is a narrow solution, not designed for tomorrow’s applications. Nice but not game changing.
1. Flash as Memory Extension – This highest layer provides direct memory access and eliminates the latencies not only of spinning disk but also of traditional disk protocols. Unlike flash as cache, there are no cache misses. People often say “so what, if I get a 95% cache hit rate, what’s the difference?” Ask Facebook. Ask Apple. Fusion-io writes are atomic, meaning you only have to do writes once, unlike traditional disk protocols, which must endure a litany of signaling and write verification overheads.
FYI – #1 is either PCIe or in theory on the motherboard. #2 is mostly PCIe and #’s 3 and 4 are pretty much standard disk protocols.
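On the “95% cache hit rate, so what?” question under #1, a quick worked example helps. This is a minimal sketch of the standard weighted-average latency arithmetic, and the inputs are assumptions I picked for illustration (50 microseconds for a flash hit, 5,000 microseconds for a miss that goes to spinning disk, 100 reads to build one page). None of these are figures from Fusion-io, Facebook or Apple.

```python
# Sketch of why a 95% hit rate still hurts. All inputs are illustrative assumptions.

def effective_latency_us(hit_pct, hit_us, miss_us):
    """Average latency per read: hits and misses weighted by how often they occur."""
    return (hit_pct * hit_us + (100 - hit_pct) * miss_us) / 100

def prob_all_hits(hit_pct, reads):
    """Chance that every one of `reads` independent reads is a cache hit."""
    return (hit_pct / 100) ** reads

if __name__ == "__main__":
    HIT_US, MISS_US = 50, 5000  # assumed flash hit vs. spinning-disk miss
    avg = effective_latency_us(95, HIT_US, MISS_US)
    print("average latency at 95% hits (us):", avg)
    print("slowdown vs. no misses at all:", avg / HIT_US)
    print("chance a 100-read page sees zero misses:", prob_all_hits(95, 100))
```

With these assumed numbers, the 5% of reads that miss account for most of the average latency, so the cached system runs several times slower than one with no misses. And if a page needs 100 independent reads, the odds that none of them misses are well under 1%, so nearly every page pays the disk penalty at least once.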
So What’s the Big Deal?
Today, only Fusion-io has #1. The company stands alone. It has a huge lead on the competition. Its advantage is in fusing hardware and software to address large data-intensive problems. Fusion-io is trying to set the new standard platform for how applications will be developed. Does this mean it will succeed in attracting ISVs and developers? No, but if I had to bet I’d say it’s 2-1 odds that the company succeeds beyond anyone’s expectations and there’s a 4-1 chance that Fusion-io becomes an absolutely ridiculous mega home run.
Who else is in the multi-billion dollar race? Not the guys in #’s 2, 3 and 4 above. That’s a nice market, but not universally game changing across the application spectrum. It’s Intel, Samsung, maybe Oracle, maybe IBM…perhaps Google. And none of these has really given any indication that it’s close to Fusion-io. Whatever Google invents it will keep to itself, although it could spawn some new open source movement, who knows. Intel is interesting, but if we’re still talking about the future Intel threat to Fusion-io in 18-24 months, I’d say it’s too late for Intel to dominate, unless it pulls out the monopoly tactics playbook (but the DoJ is watching closely). The other companies I mentioned are contenders, but they’re really not even on the track yet, and most will end up as pretenders in this race.
As applications become more and more data intensive, there’s no reason that, over time, any active data should be on spinning disk. This is good for all the flash players, but especially for the guys placing flash as close to the processor as possible and enabling a new class of applications to reach their potential. This is where the big prize will be taken. Viewed another way…what is the future standard by which all SQL, NoSQL and high-end applications will run? For my money, I’ll back the horse that started up, executed well, just went public, has the secret sauce and the focused mojo.