
March 31, 2008

Memory in the Data Center - part II

This post continues Memory in the Data Center - Part I.

We all want higher throughput and lower latency data centers.

That translates to adding more memory. The question is how best to do that for maximum application performance at the lowest overall infrastructure cost.

Flash or DRAM drives? One approach is to take the new memory-based solutions and make them look like disks. That is, flash or DRAM emulates the characteristics of a disk drive, using some flavor of SCSI, FC, or SATA disk-level operations to communicate with the rest of the system.

Examples of this implementation include both flash and DRAM based solid state drives, solid state arrays, PCI-based memory repositories, and static memory appliances.

This approach can benefit very specific data sets that are not likely to grow because the memory is being presented as a fixed LUN to the system, and resizing is difficult. It is tricky to resize a group of SSDs in an array when there are no more drive slots left, or add another PCI card to a full bus.

This memory-as-disk approach also suits only small data sets, because adopting it forces you to forgo all of the benefits of low-cost, high-capacity storage. Granted, you can manually move data around, but that is generally regarded as troublesome, time-consuming maintenance, and the time could be better spent on new initiatives.

One further complication with the memory-as-disk approach is the difficulty of extracting the active data set from the entire file or volume. This can be nearly impossible to assess manually, and it therefore results in over-provisioning memory resources to accommodate the entire data file or LUN rather than the smaller percentage of active or “hot” data.
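One way to see the gap between a full LUN and its hot data is to measure it from an access trace. The sketch below is only an illustration: it assumes a trace is available as a list of block identifiers, and the function name `hot_set_size` and the 90% coverage threshold are hypothetical choices, not part of any tool mentioned here.

```python
from collections import Counter


def hot_set_size(trace, coverage=0.9):
    """Return the smallest number of distinct blocks that together
    account for at least `coverage` of all accesses in `trace`."""
    counts = Counter(trace)
    target = coverage * len(trace)
    covered = 0
    # Walk blocks from most to least frequently accessed.
    for size, (_block, hits) in enumerate(counts.most_common(), start=1):
        covered += hits
        if covered >= target:
            return size
    return len(counts)  # Only reached for an empty trace.


# Made-up trace: two blocks dominate, five more are touched once each.
example_trace = [0] * 80 + [1] * 15 + [2, 3, 4, 5, 6]
distinct_blocks = len(set(example_trace))
hot_blocks = hot_set_size(example_trace)
```

In this made-up trace, `hot_blocks` is far smaller than `distinct_blocks`, which is the situation the paragraph above describes: sizing memory for every block provisions for data that is rarely read.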

Finally, the memory-as-disk approach assumes that the memory has disk-level retention characteristics. For DRAM, that requires a robust and highly available battery backup system, along with all of the storage management responsibilities of persistence. For flash, some concerns remain about wear levels and reliability. These will improve and possibly be resolved over time, but they may still cause near-term concern for enterprise IT departments.

An interesting alternative to deploying memory as a disk is to use memory as a cache. The concept itself is not new: cache has been deployed at nearly every level of data center systems, from L1 and L2 CPU-based caches, to motherboard-attached memory, to storage subsystem caches, and even cache at the drive level.

CPU-level and drive-level caching should and will continue for a long time to come. What I’d like to focus on now are the differences between client-side caching (at the application server), storage-side caching (at the subsystem), and caching in between, in the network.

The end-node server and subsystem solutions have worked well in the past, particularly in a world where single servers connected to single storage systems, because in that case having memory on one side of the connection or the other was guaranteed to provide assistance. But in an increasingly networked world where many application servers are connecting to multiple storage systems, including the deployment of clustered file systems, having the memory and caching resources in the network makes a lot more sense.

No doubt this line of thinking will stir some debate, and I agree that there will always be room for caching at the end-node server and storage systems. But the reality is that once you are in a multi-device world, the most efficient and effective use of a memory-based resource is to apply it to the maximum number of servers and storage systems. This is analogous to the migration from direct-attached to network storage many years ago.

With a memory-based caching resource in the network, any application server requesting data from any storage system can benefit from having frequently accessed data cached in high-speed memory rather than read from slower mechanical disk. This ensures that when hot spots or hot systems shift, the benefit of caching applies across the board.

The alternative is to put the maximum amount of cache memory in each server and storage system, which falsely assumes that every system is under exactly the same utilization all of the time. That is simply not an option for most customers.
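The contrast between one shared cache and equally sized per-system caches can be sketched with a toy simulation. Everything in it is an assumption made for illustration: the LRU eviction policy, the cache sizes, and a deliberately extreme workload in which one storage system at a time is hot and its working set is scanned repeatedly. It does not model any real appliance, and real workloads will show a smaller gap.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that only counts hits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # Mark as most recently used.
            self.hits += 1
            return
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # Evict least recently used.


def shifting_hotspot(systems, working_set, passes):
    """Yield (system, block) requests; one system is hot at a time."""
    for system in range(systems):
        for _ in range(passes):
            for block in range(working_set):
                yield system, block


def compare(systems=4, per_system_cache=10, working_set=30, passes=5):
    """Return (private_hit_rate, shared_hit_rate) for equal total memory."""
    private = [LRUCache(per_system_cache) for _ in range(systems)]
    shared = LRUCache(per_system_cache * systems)
    total = 0
    for system, block in shifting_hotspot(systems, working_set, passes):
        total += 1
        private[system].access((system, block))
        shared.access((system, block))
    private_hits = sum(cache.hits for cache in private)
    return private_hits / total, shared.hits / total


private_rate, shared_rate = compare()
```

With these illustrative parameters, the hot working set (30 blocks) fits in the pooled 40-block cache but not in any single 10-block cache, so the shared cache follows the hot spot from system to system while each private cache thrashes. The same total memory is used in both cases.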

Another major reason to cache in the network is scalability. Caching at an end-node device means that each increment in cache requires another end-node device. Need more cache in your subsystem? Buy another subsystem. If the cache is in the network, that subsystem might last a lot longer, and in some cases customers might actually reduce the overall number of storage systems required to deliver a set performance level.

Over and over, we’ve seen valued system-level technologies become accessible within the network. Servers have long been there, storage more recently, and caching with high-speed, high-performance memory is another step in the network direction.

Next up: Part III - Installation and Operation of Memory in the Data Center

March 26, 2008

Gear6 News: $10 Million Financing Secured

Gear6 Secures $10 Million Financing Round From Horizon Ventures, U.S. Venture Partners and InterWest Partners

Centralized Storage Caching Innovator Lands Second Major Financing Round to Bolster Company Growth and Expansion

Mountain View, Calif. – March 26, 2008 – Gear6, accelerating I/O for real time application performance, today announced it has secured a $10 million financing round led by Horizon Ventures. Existing investors U.S. Venture Partners and InterWest Partners also participated in the round. The funds will be used to bolster the company’s market expansion being fueled by increasing demand for its CACHEfx line of scalable caching appliances. Centralized storage caching is rapidly emerging as the most simple and cost-effective way to increase data center application performance while dramatically reducing total storage costs. [read the full news release here]

March 24, 2008

Memory in the Data Center - part I

I have been talking with several industry folks recently about the push for more memory in the data center. A friend asked the following question, and it led to this series of posts.

Question: The battle lines on the storage pyramid seem to be in flux. Flash on disks. Flash drives in arrays. Ever larger server memory capacity. D2D backup replacing tape. Disk is encroaching on tape. Flash is encroaching on disk. And DRAM is encroaching on both in some cases. Can you shed some light on this?

Sure. Let's break this down into four sections.

1) Media
2) Architecture and Implementation
3) Installation and Operation
4) Overall Data Center Impact

Let's start with the media and get that out of the way first. Ultimately this discussion is less about media and more about the architecture, specifically how do you architect a solution that matches the way you use your data.

There are several developments happening across different media types and price points that are particularly exciting.

Flash is one of the hottest topics because it is a truly groundbreaking shift in the price/performance spectrum. This is not to say that flash will rule the world. Rather, we are going to see a variety of ways to make use of this media.

As consumers, we have already benefited from flash in our iPods, camera memory cards, USB keys and more. The next wave for flash is in the enterprise.

As an industry, we're racing to figure out how to extract the most value from different types of flash media based on flash-level specs. This analysis is important, but we need to measure media types at an application level as well.

Application metrics. Metrics such as overall system latency, total application operations per second, and application performance per dollar of infrastructure will become more important than what one flash based drive can do.

Staying on the media discussion, the prices of DRAM continue to drop significantly, with CPU and motherboard suppliers creating denser memory configurations on industry-standard platforms. For many industry applications the cost of DRAM, particularly when measured on a cost-per-IOP basis, is extremely favorable.
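The cost-per-IOP measure is simple arithmetic, and a short sketch shows why it can rank media differently than purchase price does. The function name and the two device entries below are placeholders with made-up figures chosen only to make the arithmetic easy to follow. They are not measurements or prices for any real product.

```python
def cost_per_iop(price_dollars, iops):
    """Dollars paid per I/O operation per second that a device delivers."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return price_dollars / iops


# Placeholder devices: (price in dollars, sustained I/O operations per second).
devices = {
    "device_a": (1000.0, 200),    # Cheaper to buy, few operations per second.
    "device_b": (4000.0, 40000),  # Costs more, far more operations per second.
}

ranking = sorted(devices, key=lambda name: cost_per_iop(*devices[name]))
```

Here `ranking` puts `device_b` first even though it has the higher purchase price, because each operation per second it delivers costs less. Measuring cost per gigabyte instead could reverse the order, which is why the choice of metric matters.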

Then there are spinning mechanical disks. The big change in the disk drive market is the amazing increase in capacity, unfortunately accompanied by declining performance. The availability of new memory-based solutions, however, completely reshapes our ability to make use of this valuable drive-based media.

SATA and SAS drives can provide a near infinite amount of capacity for common applications. Now users can enhance the "cheap and deep" capacity with performance-based memory solutions. In and of themselves, the low-cost, high-capacity disks were only good for archiving types of applications. But with the addition of memory-based storage enhancements, the picture changes completely, and "cheap and deep" now has instant performance.

We will forever be presented with new and interesting media choices. Just a couple of months ago, Intel and ST Microelectronics announced major achievements with phase-change memory, a technology that some claim may eventually replace flash.

More recently, at the FAST 08 Conference, IBM presented a tutorial on Storage Class Memory and on finding the right applications for the right technologies within a triangle of Speed, Write Endurance, and Cost per bit. In addition to expectations for improved flash, other candidates to optimize this triangle include Ferroelectric RAM (FeRAM), Magnetic RAM (MRAM), Resistive RAM (RRAM), Solid Electrolyte, and Phase-change RAM (PC-RAM).

Rather than assuming this media discussion will ever be over, we need to focus on the overall architecture of memory-based solutions and how one makes effective use of currently available media. More on that next.

March 19, 2008

Q&A on Virtualization for Everyone Blog

We had a chance to connect with Tarry Singh who maintains the blog Virtualization for Everyone. After our call, he sent us a list of interview questions.

Check out the full post here.

March 17, 2008

Introducing the G100

Gear6 Expands Market Reach with New, Entry-Level Centralized Storage Caching Appliance

New G100 Slashes Storage Costs, Leverages Existing Systems and Eliminates Performance-Threatening I/O Bottlenecks

Mountain View, Calif. – March 17, 2008 – Gear6, accelerating I/O for real time application performance, today announced the launch of the CACHEfx G100 scalable caching appliance. This product expands the reach of centralized storage caching to a broader customer base in need of rebalancing their existing storage configurations to lower total system costs, increase I/O performance, reduce rack space, and cut expensive energy consumption. The G100 delivers the same feature set and capabilities as the full CACHEfx product line in a more compact and lower cost configuration. The new appliance is designed for customers requiring moderate bandwidth and cache capacity looking to cut costs and simplify configurations for performance oriented storage. CACHEfx appliances scale easily as modular building blocks, allowing customers to cost-effectively start small and “scale as you grow” as performance requirements increase. [click here for the full news release]

March 10, 2008

How Caching Helps Cut Backup Costs

I received a great question about our recent video Top 6 Ways to Cut Storage Costs, specifically how a scalable caching appliance can reduce backup costs.

Basically, backup architectures are often (but not always) tied to the number of storage controllers and the amount of capacity deployed.

When storage is configured for performance, administrators often need to deploy a greater number of disk spindles and more storage controllers than they might need if they were configuring for capacity. When designing backup solutions for this "performance-optimized" storage, the licensing costs often climb in relation to the number of controllers and amount of disk over-provisioning.

When a scalable caching appliance is deployed -- reducing the storage footprint by eliminating unnecessary controllers and spindles -- the licensing and equipment costs for the backup infrastructure go down.

So it comes down to designing effective, balanced data center architectures from the start. When over-provisioning is rampant -- often the case with traditional "add more disk" approaches to performance -- the cascading costs of management, maintenance, and backup can result in excess costs as well. Designing in scalable caching appliances keeps storage footprints to a minimum and in turn can help shrink backup costs.

March 05, 2008

Cloud Computing Roundtable - Take 1

The news feeds have been full of cloud computing activity over the last few months, so we decided to kick off a roundtable discussion and see if we can make a regular thing of it.

Click here for Episode 1 of the Cloud Computing Roundtable.

For this initial episode, we have the following participants:

Hosts:
Gary Orenstein, Vice President of Marketing, Gear6
Steve Norall, CTO of TechValidate and former industry analyst

Industry Guests:
Edgard Capdevielle, Vice President of Product Management, Nirvanix
Jim Herbold, Enterprise General Manager, Box.net

Comments and suggestions for future episodes welcome, including ideas for additional industry guests.

Enjoy.

March 04, 2008

Comments from B&S message board

AVM recently posted a few earnest questions on the Byte and Switch message board. In general, I find those anonymous boards to be bastions of hot-headedness, but AVM appeared sincere. So I posted a reply to hopefully answer the questions about benchmarks and caching in the network. Scroll down if you want to see AVM's questions first.

---------------------------------------
AVM,

Glad to see the interest in Gear6 and CACHEfx. I’ll do my best to answer your questions and then perhaps invite topics requiring more detail to shift to our company blog, www.thoughtput.com.

Regarding benchmarks, there are differences. The benchmarks you mentioned are excellent for persistent storage systems. But the workloads that we see at our customers are different from the profiles of the SPC and SFS benchmarks. Our target markets frequently have to handle heavy reads, small I/O operations, and intense metadata. So our benchmark used a workload generated with the SIO load generator from NetApp configured for 100% random read of the data. The load ran on 60 dual-CPU, dual-core clients, 30 threads each, accessing a CACHEfx appliance with 768 GB of RAM-based cache capacity. The clients connected to a Fujitsu XG200C switch via eight 10 GbE links and the appliance connected to the switch with six 10 GbE links. The configuration also included a common NAS system.

In general, we prefer to focus our activities around specific customer workloads as they provide the best measures of value. Our NEMo I/O profiling tool is one way for us to get a good understanding of cache-ability for specific workloads. It is a simple script that requires no installation and can be a helpful indicator of caching success.

Regarding your point about where the cache should reside: we have seen, and will continue to see, caching opportunities at all levels of the infrastructure. That said, some data centers will require significant caching investments. Caching on the controller or gateway only makes sense if every single storage controller or NAS gateway is going to be perfectly balanced with every other controller or gateway all the time.

On the other hand, if you believe in shifting hotspots and hot files across multi-controller environments then there are clear economic, performance, and management benefits to placing that caching resource in the network. If you have caching deployed in the network, it can scale independently from the end-systems. Now upgrading performance does not necessarily mean buying a new or additional subsystem, but rather being smart about adding cache that helps all existing subsystems.

This is an area we will continue to discuss. Keep an eye on Thoughtput.com for more details. Thanks for your questions.

Gary Orenstein, Vice President of Marketing, Gear6

----------------------------------------------

                                                                 
Author: AVM
Number: 1
Subject: value proposition
Date: 03/02/08 12:42 PM

Hi, I have several questions regarding Gear6 and their CacheFX device. If you've read the post before, my apologies -- I somehow managed to post it to the wrong board.

First is regarding the benchmark they published with "Over 872,000 I/O Operations Per Second"

http://www.gear6.com/news-releases/111307

There is no detailed description of the workload, network configuration, media used, and Gear6 is not to be found in such benchmark lists as SPC-1 or SFS97_R1. NetApp posted a result of 1032461, but is Gear6's result comparable to that?

http://www.spec.org/sfs97r1/results/res2006q2/

Second is whether CacheFX makes any practical sense to buy (or even imitate) at all. I think anyone who wants a truly scalable storage installs a NAS gateway anyway, and whatever amount of RAM is needed to cache the data, can be installed on the gateway.

With so many vendors offering NAS gateways, all capable of caching (see e.g. http://www.computerweekly.com/Articles/2007/08/09/220800/nas-gateway-specifications.htm ), why would anyone want a device which does caching and nothing else?

March 03, 2008

Top 10 Signs of I/O Bottlenecks

How do you identify I/O bottlenecks? We put together a short video that outlines a few of the primary symptoms. Of course, it is not an exact science, and often involves a combination of intuition and measurement, but this list will get you off to a good start.