In the IT industry today it is nearly impossible not to hear the word cloud dozens of times a day, but many storage administrators treat cloud as a four-letter word. The basic tenet for a storage administrator is to ensure an organization’s data is safe and secure. If the storage administrator makes a mistake, bad, bad things happen: companies fold, black holes collapse, and suns explode. NetApp is trying to change the minds of those storage administrators, and for good reason. IT organizations are always looking to do more work with less money, and cloud storage can’t be ignored as a viable way to do that.
At Storage Field Day 9, NetApp talked a fair amount about how they are embracing cloud storage as key to the industry’s future. No storage vendor can afford to ignore cloud storage anymore, and NetApp sees it as a key part of its future. Part of that future is expanding the hybrid array with sub-LUN tiering to include cloud storage as a replacement for NL-SAS. This makes perfect sense, since the performance characteristics of a slow SAS drive are not too dissimilar to those of cloud storage. In addition to this tiering, NetApp also has a goal of making snapshots truly cross-technology.
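The tiering idea above can be sketched as a simple heat-based placement rule. This is an illustrative toy, not NetApp's actual algorithm; the tier names and access-count thresholds are my own assumptions.

```python
# Hypothetical sub-LUN tiering sketch: each block lands on a tier based
# on its access heat. Thresholds and tier names are illustrative only.

def choose_tier(reads_per_day: int) -> str:
    """Map a block's access frequency to a storage tier."""
    if reads_per_day >= 100:
        return "flash"      # hot: low-latency SSD
    if reads_per_day >= 5:
        return "sas"        # warm: performance disk
    return "cloud"          # cold: cheap object storage standing in for NL-SAS

blocks = {"blk-001": 500, "blk-002": 12, "blk-003": 0}
placement = {blk: choose_tier(heat) for blk, heat in blocks.items()}
print(placement)  # {'blk-001': 'flash', 'blk-002': 'sas', 'blk-003': 'cloud'}
```

The point of the sketch is that the cold tier is just another target: swapping NL-SAS for an object store only changes the last return value, which is why the economics work.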
All in all, the direction NetApp is taking makes their future bright!
This week I’ve been spending some time at Pure Accelerate, where I’ve been able to talk to the engineering and executive teams behind the new FlashBlade system.
In an attempt to embrace its startup cultural roots, Pure Storage developed FlashBlade as a startup inside the company. What that means is they hired new engineering staff to build a unique and separate product from the ground up. To keep the development secret, the new team members were not connected to other traditional Pure employees on LinkedIn. While the development was largely separate, some of the FlashArray development team did help where it made sense. That collaboration resulted in a fork of the FlashArray management interface, which is used by FlashBlade.
The result of this startup-within-a-company is a new and unique product. The first thing to understand about FlashBlade is what it is not. It is not a replacement for low-latency, general-purpose workloads. The architecture of the system prevents FlashBlade from reaching the sub-millisecond latency of a traditional all-flash array. The current iteration of the product only supports file and object access. In fact, it currently only supports NFS v3 for file access, since it is a prevalent and easy protocol. SMB support is being worked on, but that adds a new layer of complexity where many products have stumbled. As a 1.0 product, FlashBlade is missing some basic features like snapshots. The mentality was to start shipping product rather than trying to make the product perfect.
FlashBlade is designed to be low-cost, flexible, and scalable. One way to keep costs low was to ensure that the actual NAND was the largest component cost of the array. Pure has built a custom hardware blade consisting of a CPU with direct-connected NAND flash (8TB or 52TB) for storage and NVRAM for power protection of writes. By eliminating the SSD, the design can be kept simple and adaptable. Connecting the blades together is a built-in low-latency 40Gb Ethernet switch, which carries blade-to-blade traffic as well as client connectivity. Currently, FlashBlade scales to 1.6PB in a 4U chassis; however, this should grow with the addition of an external network switch. The NAND gateway is a custom FPGA designed to abstract away the particulars of the NAND. This means the NAND chips could be changed to a different vendor with a software change rather than an ASIC hardware redesign.
The software behind FlashBlade, known as Elasticity, runs in a distributed fashion on the Intel CPUs on each blade. This software implements a common object store and runs data services like encryption and erasure coding. It is also where client access protocols are implemented.
While the architecture of FlashBlade is certainly well thought out, the currently shipping product lacks many of the features of a NetApp Filer or EMC Isilon. Many people are speculating that it is designed to take on the Filers and Isilons of the world, but the product is still very new. I am looking forward to seeing where Pure takes FlashBlade and how customers in the field put the system to use.
It wasn’t that long ago that Violin Memory pioneered the high-performance flash market. They created the market for speed but lacked a rich data services offering. It wasn’t long before many other all-flash storage arrays came to market which may not have been as fast but were good enough, with a rich feature set. The lack of data services and the competition ultimately caused a lot of pain for Violin Memory. It was so bad that many people thought the company was doomed, but after our SFD8 presentation I’m not convinced.
The meeting had started before the cameras were actually rolling, which I think was really unfortunate for Violin. We were doing two things: eating breakfast and talking financials with the CEO. As we listened to him talk, one thing became apparent: Violin’s financials say they have a chance. CEO Kevin DeNuccio said they had a quarterly burn rate of $12mm, with a break-even point 4 to 8 quarters out. They also have $120mm in cash from the latest funding round. What this means is we need to keep a close eye on them over the next few quarters to see how the burn rate changes. If they can manage to move down to a $7mm to $10mm burn rate, I can see them pulling back from the brink.
Violin Memory has done a lot of work to change their technology to move forward as a platform. Building rich data services is no easy task, but they seem to have done a great job. The underlying hardware has evolved, and they now have several platforms based on need. Watch the full SFD8 presentations for a deep dive into the technology, but suffice it to say it looks great. With the technology being a major investment, and the financial picture looking promising, Violin Memory may just make it.
If you’ve worked in IT for any amount of time, you’ve likely heard the term “secondary storage,” which you’ve known as a backup tier. You’ve also heard of “tier 2” storage for test and development workloads that don’t need the data services of production. These two terms have had very different requirements. Backup target storage is generally cheap, deep, and optimized for sequential writes. Test/dev storage, on the other hand, needs real performance since it hosts actual workloads. Cohesity thinks this needs to change. They contend that secondary storage should be anything that is not primary storage.
Redefining a term and carving out a new market segment is no small task, but Cohesity shows some pretty interesting use cases:
- Data protection for VMware environments – once a hypervisor snapshot is created, the data is sent to the Cohesity array, where things like deduplication and replication can be applied. This gives you unlimited snapshots without the performance impact of VMware snapshots.
- User data access using NFS and (soon) SMB.
- Space-efficient copies for test and dev can be created on demand, with integration with tools like Puppet and Chef in the works.
- Data analytics allowing elastic search of all supported file types stored on Cohesity.
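The deduplication step in the data-protection flow above can be sketched as content-addressed chunk storage: identical chunks are stored once and referenced by their hash. Chunk boundaries and the store layout here are simplified assumptions, not Cohesity's actual on-disk format.

```python
import hashlib

# Content-addressed deduplication sketch: each chunk is keyed by its
# SHA-256 digest, so duplicate chunks consume no extra physical space.

def dedupe(chunks, store):
    """Store each chunk once; return the logical manifest of digests."""
    manifest = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # write only if not already present
        manifest.append(digest)           # logical view keeps every reference
    return manifest

store = {}
manifest = dedupe([b"vmdk-block-A", b"vmdk-block-B", b"vmdk-block-A"], store)
print(len(manifest), len(store))  # 3 2
```

Three logical chunks, two physical ones: repeated snapshots of the same VM mostly produce digests the store already holds, which is why frequent snapshots stay cheap.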
Just thinking about all the use cases makes my head spin, but it does all seem really cool. If you have all of these services in-house today, you’re likely paying a fair amount of money for each one. They may also be highly disconnected from the storage array – and we all know data locality matters. This also creates the problem of data sprawl, with so many copies of the data, and copies cost money. Cohesity calls this the data iceberg: only the primary data is above water, while the vast majority is hidden below.
It all seems highly inefficient, which is why Cohesity was founded.
Cohesity was founded to be an infinitely scalable pool of data, but the system has only been tested up to 32 nodes. Given the metadata requirements per node, I can’t imagine they won’t run into a hard limit with the current implementation. All in all, I like what they are trying to do, but they have a hard path in front of them. Since they are trying to do so much, they’ll face an interesting challenge in defining the problem in their marketing materials. Cohesity is a version 1 product, so there are some big gaps in what they do and what they do well, but hopefully that changes as they mature.
Over the last decade, storage arrays have been evolving to become faster and to protect data better. They have become smarter about where and how data is stored and how to predict when data will be accessed. However, one thing has not, by and large, changed: storage arrays haven’t learned much about what data is actually stored on them. In the last year, this has begun, in part, to change. A few companies now have products which provide some type of data awareness, but they aren’t created equal. Content awareness is nothing new; however, it has mostly existed for legal and compliance purposes, separate from the storage array.
Today I attended Storage Field Day 8, had a chance to meet with Andy Warfield of Coho Data, and saw a different take on this. A few months ago I saw Coho Data start talking about running containers directly on their storage array. I dismissed this as mostly edge cases and largely not needed, but today I thought differently. What does this have to do with content awareness? Hang with me for just a minute and I think it will all become clear. Andy showed us a simple demo of a container running on the array with some simple code. All the software did was convert an image from color to greyscale. On a Windows VM with access to Coho Data storage, a full-color image was dropped into a folder. Instantly the software running in the container kicked in and converted the image to greyscale. The Windows VM now had direct access to both images.
So this use case is pretty lame, but the possibilities around it are awesome. Having this type of direct access to the data can enable things like virus scanning, data loss prevention, and other workflows. A classic enterprise example is an application that writes data to a network share. Every 5 or 10 minutes, another application scans that location looking for the file and, when it finds it, takes action. This whole workflow is filled with artificial delays.
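The contrast between the polling workflow and the demo can be sketched as an event-driven handler that fires on every write. The callback mechanism here is a plain Python function standing in for Coho Data's container hook (an assumption on my part); the greyscale math is the standard luminance formula from the demo.

```python
# Event-driven write hook sketch: handlers run the instant data lands,
# instead of a second application polling every 5-10 minutes.

def to_greyscale(pixels):
    """Convert (r, g, b) pixels to luminance values (ITU-R BT.601 weights)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

handlers = []

def on_write(handler):
    """Register a callback to run whenever a file is written."""
    handlers.append(handler)

def write_file(store, name, pixels):
    store[name] = pixels
    for handler in handlers:
        handler(store, name, pixels)   # fires immediately, no polling delay

store = {}
on_write(lambda s, n, px: s.update({n + ".grey": to_greyscale(px)}))
write_file(store, "photo", [(255, 0, 0), (0, 255, 0)])
print(sorted(store))  # ['photo', 'photo.grey']
```

Dropping the file and getting the converted copy are one synchronous event here, which is exactly the artificial delay the array-side container removes.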
This ability to view content is poised to change how businesses think about data storage. Direct access to data through APIs and protocols can eliminate workflows like this. It also opens up external software to do all sorts of cool and interesting things like data protection and other custom workflows. I’m not sure that running generalized software on a storage array is overly useful, but for some use cases it is a huge benefit. I’m excited to see how Coho Data develops a partner network that can make use of this hook.
At VMworld this year I attended the Tech Field Day Extra event as a delegate. One of the more interesting presentations came from DataGravity. One of the reasons I found them so interesting is that they are attempting to add a new dimension to the storage array market. Since the storage array market was created, people have cared about three things: data resiliency, speed, and capacity. These three metrics should not come as a surprise, since the intent was to leverage a larger pool of capacity for reduced cost and improved performance. Fast forward to now, and these three things have not fundamentally changed much. We still care about protecting our data, making it perform well, and how much data we can store. DataGravity is trying to change this by building new data analytics capabilities into their storage array.
If you’ve ever worked in a large enterprise, you have most likely dealt with legal and security teams. They always ask the storage administrators, “Can you tell me what data is stored on this server?” and “Can you tell me who accessed this file?” To which Mr. Storage Administrator has always replied with a firm “No.” After all, the storage administrator is just a steward of the data, not the actual owner. He or she has no idea what the end user writes to the space. Legal and security teams use things like data loss prevention and data discovery tools to answer their questions. This process, however, is a bit of a pain for everyone involved. Wouldn’t it be nice if your storage array knew what it was storing? That is what data-aware storage is all about.
DataGravity can search hundreds of file types to do some interesting things. For example, it can tell you who is talking about a particular keyword or search for Social Security numbers. The software can also tell you who is working on documents together, to give you an idea of how your organization is collaborating. One interesting feature is the ability for each user to do nearly instantaneous restores of their own data.
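The Social Security number search can be sketched with a simple pattern scan. DataGravity's actual parsers understand hundreds of file formats; this hedged toy only handles plain text, and the file names are made up for illustration.

```python
import re

# Content-aware scan sketch: flag files whose text contains a string
# shaped like a US Social Security Number (NNN-NN-NNNN).

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def files_with_ssns(files):
    """Return names of files containing an SSN-shaped string."""
    return [name for name, text in files.items() if SSN_PATTERN.search(text)]

files = {
    "notes.txt": "Lunch meeting at noon.",
    "hr.txt": "Employee SSN: 123-45-6789",
}
print(files_with_ssns(files))  # ['hr.txt']
```

The value of doing this on the array is that the scan runs where the data lives, instead of hauling every file across the network to a separate discovery tool.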
If you’d like to know more about data analytics, in general, check out this comprehensive post by Enrico Signoretti.
Earlier this year I attended Storage Field Day 7, and a hot topic was, of course, hyperconvergence. The simplicity of all-in-one convergence is helping lead a transformation in the datacenter market – no wonder we had three sessions on the topic. The session from Maxta was interesting because their offering helps overcome a preconception many in the enterprise space have about hyperconvergence: the issue of linear scaling, whereby when you need storage you must add compute and vice versa. By eliminating the need to add compute power and storage simultaneously, Maxta provides huge benefits compared to others.
A Maxta solution consists of a minimum of three nodes. All virtual machines are protected RAID-1 style by keeping a second copy of the VM on another host; the third node serves as a witness to ensure data integrity. Each node consists of compute, storage, and a hypervisor. The storage is designed as a hybrid configuration, with flash accelerating a capacity tier of traditional spinning media. Flash is also used for a journal and metadata. An all-flash configuration is supported as well, should the need arise. Each host has a VM appliance installed on it which consumes 4 vCPUs and 8GB of memory.
A cluster can grow on the fly in three different ways. The first is the usual scale-out approach, where a new node is added to the cluster with no interruption of service and data is rebalanced across the new node; the maximum number of nodes is driven by vSphere limitations. In addition, Maxta offers two scale-up options: new drives can be added to a host, or smaller drives can be replaced with larger ones. Either option is also done without service interruption.
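The RAID-1-style protection described above can be sketched as a placement rule: each VM's data goes to two distinct nodes, with a third acting as witness. Picking the least-loaded nodes first is my own assumption, not Maxta's published algorithm.

```python
# Replica placement sketch: two full copies on distinct hosts plus a
# separate witness node, mirroring the three-node minimum described above.

def place_vm(vm, nodes, load):
    """Put primary and replica on the two least-loaded distinct nodes."""
    ordered = sorted(nodes, key=lambda n: load[n])
    primary, replica = ordered[0], ordered[1]
    witness = next(n for n in ordered if n not in (primary, replica))
    load[primary] += 1
    load[replica] += 1
    return {"vm": vm, "primary": primary, "replica": replica, "witness": witness}

nodes = ["node-a", "node-b", "node-c"]
load = {n: 0 for n in nodes}
placement = place_vm("vm-01", nodes, load)
print(placement["primary"], placement["replica"], placement["witness"])
```

This also shows why scale-out rebalancing matters: adding a fourth node immediately gives the placement logic a less-loaded target for the next VM.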
Maxta has a fairly robust data services offering, which includes thin provisioning, deduplication, compression, zero-copy cloning, and snapshots. I am not a fan of VMware snapshots, as they incur a large performance penalty on the virtual machine. Maxta snapshots are more like traditional SAN snapshots and have very little performance overhead. Another great feature provided by Maxta is fault domains for stretched clustering.
Maxta is sold either as an appliance or as software only, depending on your requirements. Software licensing is based on raw backend capacity rather than usable capacity. Non-storage nodes do not incur additional costs.
All in all, Maxta offers a very interesting solution in the hyperconverged marketplace. They have spent a lot of time trying to set themselves apart from the other players, and I think it’s going to pay off.
Disclaimer: I attended Storage Field Day 7 as a delegate. My travel, accommodations, and most meals were paid for. There is no requirement to blog or tweet about my experiences, and I am not compensated in any way.
I recently attended Storage Field Day 7 and spent some time talking about the concept of data virtualization. Data virtualization seeks to add a layer of abstraction between the storage type and the client. Similar to what server virtualization did for compute resources, data virtualization seeks to free data from the underlying physical resources. Primary Data seeks to make data virtualization the cornerstone of software-defined storage.
In November of last year, Primary Data came out of stealth to address the problem of data mobility using data virtualization. Today data is locked up in storage arrays, public cloud providers, and local server storage. Each of these types of data repositories has a different data services offering, ranging from rich to extremely limited. The metadata and data are locked away in the silo of the repository. A few solutions exist in the market today for data virtualization, but they rely on the data capabilities of the encapsulated storage target. Primary Data offers a true data virtualization platform which brings rich data services into the virtualization layer regardless of the underlying target’s capabilities.
The key to Primary Data’s solution lies in the separation of the metadata from the data. This means that Primary Data becomes a control channel which brokers a connection to the data. One of the primary benefits this method brings is that access to the data is completely protocol agnostic. It doesn’t matter if the backend is direct-attached storage, Fibre Channel, network storage, or even an object layer. Data targets can be flash storage or magnetic media; it doesn’t matter to Primary Data.
Most modern storage arrays fit into either the high-performance all-flash category or the hybrid category, which offers native data tiering. Data is placed on an array, locally on the server, or in a cloud provider based on a few factors; generally the most prevalent of these are cost and performance requirements. People say things like “the fastest storage you have” or “the cheapest, slowest storage” but often have no real idea of the performance requirements of the dataset. By storing the metadata and brokering the data connection, Primary Data allows data to be moved to the correct target, ensuring a balance between cost and performance – all without sacrificing rich data services. Suddenly an older storage array without robust data services can leverage Primary Data.
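The control-channel brokering described above can be sketched as a catalog that maps each file to whichever backend target satisfies its policy. The target names, their IOPS figures, and the single "min_iops" policy field are all assumptions for illustration, not Primary Data's real data model.

```python
# Metadata/data separation sketch: the catalog is the control plane,
# mapping logical paths to backend targets; the data itself never
# passes through the broker.

TARGETS = {"flash-array": 100_000, "nl-sas-array": 500, "cloud": 50}  # IOPS

class Catalog:
    def __init__(self):
        self.meta = {}  # logical path -> target name

    def place(self, path, min_iops):
        """Broker the file onto the slowest (cheapest) target meeting policy."""
        fits = [t for t, iops in TARGETS.items() if iops >= min_iops]
        self.meta[path] = min(fits, key=lambda t: TARGETS[t])
        return self.meta[path]

cat = Catalog()
print(cat.place("/db/redo.log", min_iops=50_000))   # flash-array
print(cat.place("/archive/2010.tar", min_iops=10))  # cloud
```

Because only the catalog entry changes when data moves, a migration between targets is invisible to clients, which is the whole point of keeping metadata separate.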
Metadata in Primary Data offers several advantages over the normal coupled metadata-and-data approach. Primary Data houses metadata centrally and globally deduplicated, which helps reduce the storage footprint in environments which have copies upon copies of the same data. Because metadata is centrally stored, searches and data accesses are accelerated. This big metadata store used by Primary Data has tremendous value for storage performance and cost considerations.
More and more enterprises are striving to implement software-defined storage and running into a new set of challenges in doing so. The idea is to change how data is stored, placed, and accessed, and Primary Data offers a unique way to do just that. Its policy-based automation aims to become a key to successfully moving storage environments to the next level. Primary Data is definitely looking to make a splash in the storage arena and to help IT shops transform how they think about data today.
Disclaimer: I attended Storage Field Day 7 as a delegate. My travel, accommodations, and most meals were paid for. There is no requirement to blog or tweet about my experiences, and I am not compensated in any way.
This week I’ve been busy working on my post-Storage Field Day 7 content. I’ve got a few blog posts in the pipeline for some of the presenters. I think I offer a unique perspective, given that I’m currently in the enterprise space, in the day-to-day trenches.
In the meantime, all of the recorded content from SFD7 is available on YouTube. I encourage everyone to check it all out and draw your own conclusions about the presenters. Share what you learn with the community!
I’m sitting in a car on the way to San Francisco airport to catch a plane back home to Cincinnati. For the past several days I’ve had the pleasure of being a part of Storage Field Day 7. SFD is an independent event bringing community influencers together with vendors to discuss technology solutions and provide open and honest feedback. It’s a great event to connect with other community members, discuss challenges and solutions, and create content for the community’s benefit – all of which I support.
Since this was my first time going, I was surprised at how different it is from watching the feeds and video, especially for someone who doesn’t do a lot of public presentations. I didn’t anticipate how difficult it would be to find time to ask a question. Growing up in my house it was a huge “no-no” to interrupt someone while they were speaking, but the limited amount of time at SFD means not interrupting is all but impossible. Every time I wanted to interrupt, the image of my mom ready to smack me popped into my head! The next challenge is just how long the days are. When watching the live stream you can take a break between presenters. That isn’t possible as a delegate. As soon as a session ends, we rush to pack, hop into transportation, and get ready to do it all over again. It makes for a very long and rushed day.
Speaking of long days, it’s important to understand the mental fatigue felt by the time the event has concluded. During the presentations you’ll see us delegates on our laptops. You may think we aren’t paying attention, but it’s the exact opposite: we are researching, fact-checking, tweeting, and thinking. There’s a huge amount of content during the presentations, and not all of it from the vendor, either. After the day’s events wrap up, we come together as a group for some social time. Being the storage geeks that we are, we end up talking about, you guessed it, storage! So when we should be relaxing, we are adding to our mental fatigue. Even with all this mental fatigue, I wouldn’t change it for a moment. The bonding, learning, and networking prove invaluable.
I can’t express how much I’ve enjoyed my SFD7 experience. I wouldn’t trade the long days, hard work, and mental fatigue for anything. I just want to take a moment to say a huge thank you to everyone who works very hard to make this event happen. What you do for our community cannot be measured. I’m honored to have been a part of it. I’d also like to thank all of the presenters and delegates for making the time to connect with me. I know we are all very busy, but taking the time to connect face to face is amazing. It is sad to think I won’t be able to throw things at Dave, blame random things on Ray, make Howard want to hit me, or annoy Steven by ordering his drink. I’m glad to be going home, but I’ll miss the face time with all the new friends I’ve made.
Disclaimer: I attended Storage Field Day 7 as a delegate. My travel, accommodations, and most meals were paid for. There is no requirement to blog or tweet about my experiences, and I am not compensated in any way.