SSDs choked by crummy disk interfaces: NVMe and SCSI Express Explained

December 13, 2011

This is the complete repost of Chris Mellor’s terrific article from last week:

Gotta be PCIe and not SAS or SATA

By Chris Mellor

Posted in Storage, 7th December 2011 15:43 GMT

A flash device that can put out 100,000 IOPS shouldn’t be crippled by a disk interface geared to dealing with the 200 or so IOPS delivered by individual slow hard disk drives.

Disk drives suffer from the wait before the read head is positioned over the target track; 11msecs for a random read and 13msecs for a random write on Seagate’s 750GB Momentus. Solid state drives (SSDs) do not suffer from the lag, and PCIe flash cards from vendors such as Fusion-io have shown how fast NAND storage can be when directly connected to servers, meaning 350,000 and more IOPS from its ioDrive 2 products.
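To put those latency figures in IOPS terms, here is a rough back-of-envelope sketch. It assumes the drive services one request at a time (queue depth of 1), so IOPS is simply the inverse of the per-request latency; the function name is ours, not from the article.

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOPS for a device that serves one request at a time (assumed model)."""
    return 1000.0 / latency_ms


read_iops = iops_from_latency(11.0)    # random read latency quoted in the article
write_iops = iops_from_latency(13.0)   # random write latency quoted in the article
flash_iops = 100_000                   # flash device figure quoted in the article

print(f"disk random read:  {read_iops:.0f} IOPS")
print(f"disk random write: {write_iops:.0f} IOPS")
print(f"flash vs disk reads: {flash_iops / read_iops:.0f}x")
```

Under that simple model the Momentus lands below 100 IOPS, and the "200 or so" figure quoted earlier would correspond to a service time of about 5msecs. Either way the gap to a 100,000 IOPS flash device is around three orders of magnitude.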

Generation 3 PCIe delivers 1GB/sec per lane, with a 4-lane (x4) gen 3 PCIe interface shipping 4GB/sec.
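The 1GB/sec per lane figure is a rounded number. As a sketch of where it comes from, assuming the published PCIe gen 3 signalling rate of 8 gigatransfers per second per lane and 128b/130b line encoding (neither is stated in the article), and ignoring packet and protocol overhead:

```python
def pcie_gen3_gbytes_per_sec(lanes: int) -> float:
    """Approximate one-direction PCIe gen 3 bandwidth in GB/sec."""
    transfers_per_sec = 8e9            # 8 GT/s per lane (assumed from the PCIe 3.0 spec)
    encoding_efficiency = 128 / 130    # 128b/130b line encoding
    bits_per_sec = transfers_per_sec * encoding_efficiency
    return lanes * bits_per_sec / 8 / 1e9


for lanes in (1, 4, 8):
    print(f"x{lanes}: {pcie_gen3_gbytes_per_sec(lanes):.2f} GB/sec")
```

This gives a little under 1GB/sec per lane and a little under 4GB/sec for x4, which is what the article rounds to.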

You cannot hook an SSD directly to such a PCIe bus with any standard interface.

You can hook up virtually any disk drive to an external USB interface or an internal SAS or ATA one and the host computer’s O/S will have standard drivers that can deal with it. Ditto for an SSD using these interfaces, but the SSD is sluggardly. To operate at full speed and so deliver data fast and help keep a multi-core CPU busy, it needs an interface to a server’s PCIe bus that is direct and not mediated through a disk drive gateway.

If you could hook an SSD directly to the PCIe bus you could dispense with an intervening HBA that requires power, and slows down the SSD through a few microseconds of added latency and a design based on hard disk drive connectivity.

There are two efforts to produce standards for this interface: the NVMe and the SCSI Express initiatives.


NVMe, standing for Non-Volatile Memory Express, is a standards-based initiative by some 80 companies to develop a common interface. An NVMHCI (Non-Volatile Memory Host Controller Interface) work group is directed by a multi-member Promoter Group of companies – formed in June 2011 – which includes Cisco, Dell, EMC, IDT, Intel, NetApp, and Oracle. Permanent seats in this group are held by these seven vendors, with six other seats held by elected representatives from amongst the other work group member companies.

It appears that HP is not an NVMe member, and most if not all NVMe supporters are not SCSI Express supporters.

The work group released a v1.0 specification in March this year, and details can be obtained at the NVM Express website.

A white paper on that site says:

The standard includes the register programming interface, command set, and feature set definition. This enables standard drivers to be written for each OS and enables interoperability between implementations that shortens OEM qualification cycles. …The interface provides an optimised command issue and completion path. It includes support for parallel operation by supporting up to 64K command queues within an I/O Queue. Additionally, support has been added for many Enterprise capabilities like end-to-end data protection (compatible with T10 DIF and DIX standards), enhanced error reporting, and virtualisation.

The standard has recommendations for client and enterprise systems, which is useful as it means it will embrace the spectrum from notebook to enterprise server. The specification can support up to 64,000 I/O queues with up to 64,000 commands per queue. It is designed with multi-core CPUs in mind, and each processor core can implement its own queue. There will also be a means of supporting legacy interfaces, meaning SAS and SATA, somehow.
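The per-core queue idea can be pictured with a toy model. This is only an illustrative sketch, not the NVMe register interface: the class names are hypothetical, the "64K" limits are read here as 65,536 (an assumption), and the device is simulated by a method that just moves commands from a submission queue to a completion queue.

```python
from collections import deque

MAX_IO_QUEUES = 64 * 1024            # "64K" queues, read as 65,536 (assumption)
MAX_COMMANDS_PER_QUEUE = 64 * 1024   # "64K" commands per queue (assumption)


class QueuePair:
    """Hypothetical submission/completion queue pair owned by one CPU core."""

    def __init__(self, depth: int) -> None:
        if not 1 <= depth <= MAX_COMMANDS_PER_QUEUE:
            raise ValueError("queue depth out of range")
        self.depth = depth
        self.submission = deque()
        self.completion = deque()

    def submit(self, command: str) -> bool:
        """Queue a command; return False when this core's queue is full."""
        if len(self.submission) >= self.depth:
            return False
        self.submission.append(command)
        return True

    def process(self) -> None:
        """Stand-in for the device: complete every submitted command in order."""
        while self.submission:
            self.completion.append(self.submission.popleft())


class Controller:
    """Hypothetical controller that gives each core its own queue pair."""

    def __init__(self, cores: int, depth: int) -> None:
        if not 1 <= cores <= MAX_IO_QUEUES:
            raise ValueError("too many queues requested")
        self.queues = [QueuePair(depth) for _ in range(cores)]

    def submit(self, core: int, command: str) -> bool:
        # Each core touches only its own queue, so cores do not contend.
        return self.queues[core].submit(command)


ctrl = Controller(cores=4, depth=2)
results = [ctrl.submit(core=0, command=f"read-{i}") for i in range(3)]
other_core_ok = ctrl.submit(core=1, command="write-0")
ctrl.queues[0].process()

print(results)                         # third submit on core 0 is rejected
print(other_core_ok)                   # core 1 is unaffected by core 0 being full
print(list(ctrl.queues[0].completion))
```

The point of the design is visible even in the toy version: a full queue on one core does not block submissions from another, which is what lets the interface keep many cores busy at once.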

A blog on the NVMe website discusses how the ideal is to have an SSD with a flash controller chip, a system-on-chip (SoC) that includes the NVMe functionality.

What looks likely to happen is that, with comparatively broad support across the industry, SoC suppliers will deliver NVMe SoCs, O/S suppliers will deliver drivers for NVMe-compliant SSD devices, and then server, desktop and notebook suppliers will deliver systems with NVMe-connected flash storage, possibly in 2013.

What could go wrong with this rosy outlook?

Plenty; this is IT. There is, of course, a competing standards initiative called SCSI Express.

SCSI Express

SCSI Express uses the SCSI protocol to have SCSI targets and initiators talk to each other across a PCIe connection; very roughly it’s NVMe with added SCSI. HP is a visible supporter of it, with a SCSI Express booth at its HP Discover event in Vienna, and support at the event from Fusion-io.

Fusion said its “preview demonstration showcases ioMemory connected with a 2U HP ProLiant DL380 G7 server via SCSI Express … [It] uses the same ioMemory and VSL technology as the recently announced Fusion ioDrive2 products, demonstrating the possibility of extending Fusion’s Virtual Storage Layer (VSL) software capabilities to a new form factor to enable accelerated application performance and enterprise-class reliability.”

The SCSI Express standard “includes a SCSI Command set optimised for solid-state technologies … [and] delivers enterprise attributes and reliability with a Universal Drive Connector that offers utmost flexibility and device interoperability, including SAS, SATA and SCSI Express. The Universal Drive Connector also preserves legacy investments and enables support for emerging storage memory devices.”

An SNIA document states:

Currently ongoing in the T10 committee is the development of SCSI over PCIe (SOP), an effort to standardise the SCSI protocol across a PCIe physical interface. SOP will support two queuing interfaces – NVMe and PQI (PCIe Queuing Interface).

PQI is said to be fast and lightweight. There are proprietary SCSI-over-PCIe products available from PMC, LSI, Marvell and HP but SCSI Express is said to be, like PQI, open.

The support of the NVMe queuing interface suggests that SCSI Express and NVMe might be able to come together, which would be a good thing and prevent the industry working on separate SSD PCIe-interfacing SoCs and operating system drivers.

There is no SCSI Express website but HP Discover in Vienna last month revealed a fair amount about SCSI Express, which is described in a Nigel Poulton blog.

He says that a 2.5-inch SSD will slot into a 2.5-inch bay on the front of a server, for example, and that “[t]he [solid state] drive will mate with a specially designed, but industry standard, interface that will talk a specially designed, but again industry standard, protocol (the protocol enhances the SCSI command set for SSD) with standard drivers that will ship with future versions of major Operating Systems like Windows, Linux and ESXi”.

HP SCSI Express card from HP Discover at Vienna

Fusion-io 2.5-inch, SCSI Express-supporting SSDs plugged into the top two ports in the card pictured above. Poulton says these ports are SFF 8639 ones. The other six ports appear to be SAS ports.

A podcast on HP social media guy Calvin Zito’s blog has two HP staffers at Vienna talking about SCSI Express.

SCSI Express productisation

SCSI Express productisation, according to HP, should occur around the end of 2012. We are encouraged (listen to podcast above) to think of HP servers with flash DAS formed from SCSI Express-connected SSDs, but also storage arrays, such as HP’s P4000, being built from ProLiant servers with SCSI Express-connected SSDs inside them.

This seems odd as the P4000 is an iSCSI shared SAN array, and why would you want to get data at PCIe speeds from the SSDs inside to its X86 controller/server, and then ship them across a slow iSCSI link to other servers running the apps that need the data?

It only makes sense to me if the P4000 is running the apps needing the data as well, if the P4000 and app-running servers are collapsed or converged into a single (servers + P4000) system. Imagine HP’s P10000 (3PAR) and X9000 (Ibrix) arrays doing the same thing: its Converged Infrastructure ideas seem quite exciting in terms of getting apps to run faster. Of course this imagining could be just us blowing smoke up our own ass.

El Reg’s takeaway from all this is that NVMe is almost a certainty because of the weight and breadth of its backing across the industry. We think it highly likely that HP will productise SCSI Express, with support from Fusion-io and that, unless there is a SCSI Express/NVMe convergence effort, we’re quite likely to face a brief period of interface wars before one or the other becomes dominant.

Concerning SCSI Express and NVMe differences, EMC engineer Amnon Izhar said: “On the physical layer both will be the same. NVMe and [SCSI Express] will be different transport/driver implementations,” implying that convergence could well happen, given sufficient will.

Our gut feeling is that PCIe interface convergence is unlikely, as HP is quite capable of going its own way; witness the FATA disks of recent years and also its individual and admirably obdurate flag-waving over Itanium. ®

Heading to Las Vegas? The 2 Interop Sessions you MUST NOT MISS

April 12, 2010
Tuesday, April 27:  4:00 PM–5:00 PM
Application Acceleration from a Data Storage Perspective

Data storage is just one element in the application delivery ecosystem, but it turns out that it is often the critical performance bottleneck. A number of new technologies have come out in recent years to optimize storage performance. This session reviews several of these innovations and explains how some performance-optimization technologies can be easily integrated into the existing storage infrastructure. Topics include spindle aggregation, various storage acceleration devices, application-level QoS techniques, and caching or hot-spotting with solid state disk. Specific attention will be given to accelerating network file systems, MS Exchange, and SQL databases.

Speaker: Jacob Farmer, CTO, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies.  He has served on the advisory boards of many of the most successful storage technology startups, and is well respected in the analyst community.  Jacob is a graduate of Yale University.

Wednesday, April 28:  2:00 PM–3:00 PM
Overview and Current Topics in Solid State Storage

This session provides introductory material and discussion of solid state storage. A comprehensive overview of the technology, from components to devices to systems is provided, along with an overview of several current topics surrounding the integration, deployment, use and application of solid state storage. The material is intended for those who are not familiar with solid state storage in the enterprise and wish to develop a working understanding of the technology and its usage.

Learning Objectives:

  • Understand solid state storage technology in its various forms
  • Understand common characteristics and behaviors of solid state storage
  • Understand the benefits of using solid state storage in enterprise applications

Speaker: Robert Peglar, Vice President Technology, Xiotech Corporation

Rob Peglar is Vice President Technology and Senior Fellow for Xiotech Corporation and holds a seat on the board of SNIA. Mr. Peglar holds the B.S. degree in Computer Science from Washington University, St. Louis, Missouri, and performed graduate work at Washington University’s Sever Institute of Engineering.

Consider Element-based Storage to Support Application-centric Strategies

March 29, 2010

What is Element-based storage?

Element-based storage is a new concept in data storage that packages caching controllers, self-healing packs of disk drives, intelligent power/cooling, and non-volatile protection into a single unit to create a building-block foundation for scaling storage capacity and performance. By encapsulating key technology elements into a functional ‘storage blade’, storage capability – both performance and capacity – can scale linearly with application needs. This building-block approach removes the complexity of frame-based SAN management and works in concert with application-specific function that resides in host servers (OSes, hypervisors and applications themselves).

How are Storage Elements Managed?

Storage elements are managed by interfacing with applications running on host servers (on top of either OSes or hypervisors) and working in conjunction with application function, via either direct application control or Web Services/REST communication. For example, when running a virtual desktop environment with VMware or Citrix, a highly available database environment with Oracle’s ASM, or database-level replication and recovery with Microsoft SQL Server 2008, the host OSes, hypervisors, and applications control their own storage through embedded volume management and data movement. The application can directly communicate with the storage element via REST, which is the open standard technique called out in the SNIA Cloud Data Management Interface (CDMI) specification. CDMI forms the basis for cloud storage provisioning and cloud data movement/access going forward.
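To give a feel for what REST-style provisioning looks like, here is a minimal sketch of a CDMI-style "create container" request. It only builds a description of the request and sends nothing. The container name and metadata are hypothetical, the specification version string is an assumption, and nothing here is claimed to be the interface a Xiotech storage element actually exposes.

```python
import json


def build_cdmi_container_request(container: str, metadata: dict) -> dict:
    """Describe (without sending) a CDMI-style request that creates a container."""
    return {
        "method": "PUT",
        "path": f"/{container}/",
        "headers": {
            "Accept": "application/cdmi-container",
            "Content-Type": "application/cdmi-container",
            # Version string is an assumption; use whatever the target supports.
            "X-CDMI-Specification-Version": "1.0.2",
        },
        "body": json.dumps({"metadata": metadata}),
    }


# Hypothetical container name and metadata, for illustration only.
request = build_cdmi_container_request("exchange-logs", {"owner": "mail-team"})

print(request["method"], request["path"])
for name, value in request["headers"].items():
    print(f"{name}: {value}")
print(request["body"])
```

The appeal of this style is that the application or hypervisor needs nothing more exotic than an HTTP client to ask for storage, rather than an array-specific management tool.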

The main benefits of the element-based approach are:

  • Significantly better performance – more transactions per unit time, faster database updates, more simultaneous virtual servers or desktops per physical server.
  • Significantly improved reliability – self-healing, intelligent elements.
  • Simplified infrastructure – use storage blades like DAS.
  • Lower costs – significantly reduced opex, especially maintenance and service.
  • Reduced business risk – avoiding storage vendor lock-in by using heterogeneous application/hypervisor/OS functions instead of array-specific functions.

Action Item: Organizations are looking to simplify infrastructure, and an application-centric strategy is one approach that has merit. Practitioners should consider introducing storage elements as a means to support application-oriented storage strategies and re-architecting infrastructure for the next decade.

Rob Peglar is VP of Technology at Xiotech and a Xiotech Senior Fellow. A 32-year industry veteran and published author, he leads the shaping of strategic vision, emerging technologies, defining future offering portfolios including business and technology requirements, product planning and industry/customer liaison. He is the Treasurer of the SNIA, serves as Chair of the SNIA Tutorials, as a Board member of the Green Storage Initiative and the Solid State Storage Initiative, and as Secretary/Treasurer of the Blade Systems Alliance. He has extensive experience in storage virtualization, the architecture of large heterogeneous SANs, replication and archiving strategy, disaster avoidance and compliance, information risk management, distributed cluster storage architectures and is a sought-after speaker and panelist at leading storage and networking-related seminars and conferences worldwide. He was one of 30 senior executives worldwide selected for the Network Products Guide 2008 MVP Award.

Prior to joining Xiotech in August 2000, Mr. Peglar held key technology specialist and engineering management positions over a ten-year period at StorageTek and at their networking subsidiary, Network Systems Corporation. Prior to StorageTek, he held engineering development and product management positions at Control Data Corporation and its supercomputer division, ETA Systems.

Mr. Peglar holds the B.S. degree in Computer Science from Washington University, St. Louis, Missouri, and performed graduate work at Washington University’s Sever Institute of Engineering. His research background includes I/O performance analysis, queuing theory, parallel systems architecture and OS design, storage networking protocols, clustering algorithms and virtual systems optimization.

repost from WIKIBON: 

Xiotech Launches Fellows Program, Names Inaugural Class

February 19, 2010


Yesterday we launched our Xiotech Fellows program and named our inaugural class of Fellows. As President and CEO Alan Atkinson said, “We at Xiotech are blessed to have an incredibly deep roster of storage ‘rock stars’ who have a collective body of accomplishments and patents that rivals anyone else in the industry.”

Our Fellows are:

•Richard Lary, Corporate Fellow – With 40 years of industry experience and 32 patents, Lary’s legendary influence spans from the creation of VAX and the Digital Storage Architecture at DEC to having been a vital consultant to many corporations.

•Rob Peglar, Senior Fellow – With 32 years of experience, Peglar is one of the industry’s most seasoned and recognized experts on a host of storage topics and the current treasurer of SNIA.

•Ken Bates, Fellow – With 35 years of experience, Bates is a performance guru and was a guiding force in the development of the gold standard in performance testing – the SPC benchmark.

•Todd Burkey, Fellow – With 33 years of experience, Burkey has helped to drive key Xiotech innovations for its industry leading reliability around predictive monitoring, telemetry automation and failure management.

•Clark Lubbers, Fellow – With 35 years and 56 patents pending or granted, Lubbers was an early and influential force in virtualization, performance and RAID architectures.

•Bill Pagano, Fellow – With 35 years of experience, Pagano was the lead hardware designer of the company’s Intelligent Storage Element (ISE™) technology and holds five patents for his hardware work.
For more information, please visit:

Performance Anxiety: Notes from Rob Peglar

January 17, 2010

Performance (Still) Matters

Posted by Rob Peglar

At the start of each year, many companies – Xiotech included – have national meetings with their public-facing employees.  These meetings are colloquially referred to as ‘kickoff’ events, which implies something new, a fresh start, the beginning of another game.  While there is certainly a lot of truth in that, kickoff is also a good time to revisit topics which may have faded from memory but are in truth even more relevant today.  One such topic is storage performance.

The last decade – football game, if you like – was dominated by one ‘team’, that being capacity.  Growth of capacity, management of that growing capacity, backup of that capacity, replication and protection of that capacity, all things capacity.  Enterprises couldn’t get enough storage, quickly enough, to meet their data growth needs.  They were constantly running out of storage.  So, they grew and grew and grew, and spent and spent and spent.  Towards the end of the decade, predictably enough, some enterprises turned their focus towards data reduction after many years of data growth.

The other team – performance – took many hits over the years as capacity dominated.  Performance made a few yards every now and then, but mostly, had to punt, as capacity won the day-to-day battle.  However, in this decade, performance has the ball and is driving, as capacity is reduced.  I personally believe performance – once the dominant criterion for storage, when datasets were small – is making a big comeback.

Performance learned many lessons over the last decade.  Today, performance is not merely about IOPS, throughput or response time – the traditional three aspects.  It’s about ratios: the three aspects of performance per unit of input.  Now, what’s “input”, you say?  To an enterprise, in particular IT, inputs are simple:  money, time, space, power, cooling, and humans (which require money), bringing us full circle.

So, it’s not about IOPS, it’s about IOPS/$, IOPS/watt, (IOPS * TB)/watt, and other ratios.  Measuring storage performance using ratios turns out to be the most useful technique for enterprises to evaluate their storage; after all, CFOs measure business performance using ratios.  I believe CIOs should measure IT performance in general and storage performance in particular using ratios.  On the compute side, we often refer to servers not by how many virtual machines they can merely hold, but how many they can run efficiently and meet a given SLA.  There is a direct business translation between that and what a cloud compute provider would measure, for example.  The same should be true for storage.

Storage devices (including arrays) perform only three essential functions; they store data, move data, and protect data.  Inherent in all these essential functions is efficiency.  We now know how to store efficiently, as seen by the increasing use of data reduction techniques.  Protecting data efficiently is also a well-understood realm, both in terms of data at rest (such as the new RAGS method) and data in flight (using a variety of new and efficient encryption methods).  But what of moving data?  The ball is now in the red zone – can we move the ball (data) into the end zone, or will we be distracted by the cute cheerleader on the sideline (features that are licensed by the TB) who looks nice but doesn’t play?

Moving data to applications is the entire game now.  The more efficiently data is moved, the more efficiently enterprises can run their workloads.  Yes, Virginia, there is a Santa Claus, and his name is performance.  But not the old performance, i.e. ‘my array gets more IOPS than your array’ – but the new performance, measured in ratios.  It’s quite straightforward.  Place the attributes where more is better, such as IOPS, GB/sec (yes, I said GB, not MB; MB is old-school), petabytes (yes, I said petabytes; the 2010 decade is the decade of the petabyte in the enterprise) and length of warranty (e.g. at least 5 years) in the numerator, and place attributes where less is better, such as $, rack U, floor tiles, watts, BTUs, and human costs in the denominator.  For example, IOPS* TB / $.  This particular metric measures cost efficiency over a given surface with a given workload.
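Here is a small sketch of how the ratio approach plays out. The two arrays and all of their numbers are made up purely for illustration; only the metrics themselves (IOPS/$, IOPS/watt, IOPS * TB / $) come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class StorageSystem:
    """Hypothetical storage system; every figure below is invented."""
    name: str
    iops: float
    capacity_tb: float
    price_usd: float
    watts: float

    def iops_per_dollar(self) -> float:
        return self.iops / self.price_usd

    def iops_per_watt(self) -> float:
        return self.iops / self.watts

    def iops_tb_per_dollar(self) -> float:
        # "More is better" attributes on top, "less is better" underneath.
        return self.iops * self.capacity_tb / self.price_usd


systems = [
    StorageSystem("A", iops=50_000, capacity_tb=100, price_usd=200_000, watts=2_000),
    StorageSystem("B", iops=80_000, capacity_tb=40, price_usd=250_000, watts=3_000),
]

for s in systems:
    print(
        f"{s.name}: IOPS/$={s.iops_per_dollar():.2f}  "
        f"IOPS/watt={s.iops_per_watt():.1f}  "
        f"IOPS*TB/$={s.iops_tb_per_dollar():.1f}"
    )

best = max(systems, key=lambda s: s.iops_tb_per_dollar())
print(f"Best on IOPS*TB/$: {best.name}")
```

In this made-up comparison, array B wins the old "my array gets more IOPS" contest and edges ahead on IOPS/$ and IOPS/watt, yet array A comes out well ahead once capacity is folded into the ratio. Which ratio matters depends on the workload, which is rather the point.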

Yes, it’s a new year, and we have a new kickoff.  Performance has the ball, and is driving.  Pay attention to it – because like most things in IT, it’s an old idea, an old notion, an old concept made new again.  Performance still matters, because time is still money, and efficiency is the way to save time.  After all, there’s only 24 hours in a day, and that is the inexorable limit we all battle.

Peglar Bio

Rob Peglar is Treasurer of SNIA and VP Technology of Xiotech.  He is a 32-year storage industry veteran, published author and sought-after speaker and panelist at leading storage and networking-related seminars and conferences worldwide. At Xiotech he helps shape strategic vision and emerging technologies, defines future offering portfolios, plays a key role in product planning and serves as an industry/customer liaison. He has served as chair of the SNIA Tutorials, on the boards of the Green Storage Initiative and the Solid State Storage Initiative and as secretary/treasurer of the Blade Systems Alliance. He has won numerous awards for his efforts and expertise, including being one of 30 senior executives worldwide selected for the Network Products Guide 2008 MVP Award.