As a longtime enterprise infrastructure specialist, I’ve spent countless hours trying to optimize the performance of environments. Early in my career, I spent some time on a team that worked very closely with the monitoring team, where I learned how hard it was to correlate the volumes of data collected. We were collecting so much data about our environment that it was almost overwhelming: things like the temperature of the CPU, how many storage IOs were pending, and how much memory was in use. We had all this awesome data, and what did we do with it? We set up monitoring to make sure numbers didn’t cross a certain threshold. When one did cross that threshold, we sent an alert. All this data at our fingertips, and all we used it for was alerting. I knew something was off, but I was green and didn’t understand that we were missing the bigger picture.
That was a long time ago. I’m on a very different career path now, and I’ve learned that data matters: we can use technical data to make business decisions. If you’ve never had to write a business justification for spending IT dollars, this may seem foreign to you, so let me explain. Let’s say I have a server running a business-critical application responsible for batch processing and sending invoices. Without this application, the bills never get sent to the customers, and the money stops coming in. When the money stops flowing, the wheels of the business stop rolling, and it’s a huge problem. The physical server this application runs on is 6 years old, but it meets the business need. No one ever complains about performance, so it’s largely out of people’s minds. To the IT operations folks, this is an aging server that needs to be replaced, but that doesn’t translate into business value.
Is the IT operations person wrong? The answer is, of course, that it depends. To understand this, we really need to understand the performance characteristics of the server in question. Let’s look at the completely fictional story of an IT operations guy; let’s call him Alex.
With all the information being collected, Alex can see that the nightly batch process consumes 100% of the server’s CPU. Alex can also see that a dozen times a day the server is using all its memory. What does this all mean to Alex? That sometimes the server is asked to do more work than it can handle. That is all he knows. What he can’t see is what the application is doing during those spikes, so he makes notes of the peak times and decides to talk to the application support team. Once Alex talks to Jen from the application team, he is able to connect the dots: the spikes happen when a sales guy is entering a big order. Alex knows he needs to talk to a sales guy to understand what is actually going on, so he calls Todd, who tells him that every time he uses the system to enter a sale he has to wait for 45 minutes. Todd says it has always been like this, so he has never complained to anyone. Alex now has a clear picture of what’s going on and enough information to see that he can give the sales guys back hours of their day to make more sales. Even just giving Todd back 45 minutes a day is worth the cost of a new server.
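To put rough numbers on Alex’s argument, here is a minimal sketch of the arithmetic. Only the 45-minute wait comes from the story; the hourly cost, the number of working days, and the server price are hypothetical figures I made up for illustration, so plug in your own.

```python
# Back-of-the-envelope business justification for replacing the server.
# Only the 45-minute wait comes from the story above; every other
# figure is a hypothetical assumption for illustration.

WAIT_MINUTES_PER_DAY = 45      # from Todd's description
LOADED_HOURLY_COST = 60.0      # assumed fully loaded cost of a salesperson (USD/hour)
WORKING_DAYS_PER_YEAR = 230    # assumed
SERVER_COST = 8000.0           # assumed price of a replacement server (USD)


def annual_cost_of_waiting(minutes_per_day, hourly_cost, working_days):
    """Yearly cost of the time one person spends waiting on the system."""
    return minutes_per_day / 60 * hourly_cost * working_days


def payback_days(server_cost, minutes_per_day, hourly_cost):
    """Working days of recovered time needed to cover the server cost."""
    daily_cost = minutes_per_day / 60 * hourly_cost
    return server_cost / daily_cost


annual = annual_cost_of_waiting(
    WAIT_MINUTES_PER_DAY, LOADED_HOURLY_COST, WORKING_DAYS_PER_YEAR
)
days = payback_days(SERVER_COST, WAIT_MINUTES_PER_DAY, LOADED_HOURLY_COST)
print(f"Waiting costs about ${annual:,.0f} per salesperson per year")
print(f"The server pays for itself in about {days:.0f} working days")
```

The point of the sketch is the shape of the argument, not the figures: the technical metric (time spent waiting) gets converted into money, which is the language a business justification has to speak.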
I know the story above is a simple example, but it holds true any time IT is looking to spend money. IT can’t keep being a segmented cost center for the business, and until you have insight into how changes in infrastructure impact the business, that will never change. I recently had a chance to meet with the team at CloudPhysics at Tech Field Day 11, and I have to say I was impressed. Right now the software collects and reports on all manner of metrics inside VMware as an initial scope. It collects loads of data and uploads it to their centralized data lake. The data is compared and correlated in some phenomenal ways. It will take VM data, host data, datastore data, and even knowledge base articles to gain insights into the performance of the system. On top of that, CloudPhysics is building a platform which allows you to see how a workload would change if the underlying infrastructure changed. I was very impressed by the team, but don’t take my word for it; check it out for yourself.
For the last few years many IT organizations have been moving some workloads to the cloud, but in my experience most enterprises still have a “no cloud” policy. That policy is, of course, not quite enforced, and they have some cloud services being consumed through shadow IT or Software-as-a-Service. Most analysts today predict that in the next few years the “no cloud” policy will be nearly extinct. The problem many IT operations folks have with this is that you can’t outsource responsibility. Now, what exactly does that mean?
My favorite example of this has to do with what cloud providers think backup is versus what an enterprise thinks a backup is. A cloud service provider thinks backup is a way to protect itself from failures it causes, but an IT operations person sees backup as a way to protect the organization against any conceivable failure. The last thing they want is some VP calling and yelling at them that the files in their deleted items actually got deleted. Issues like that are one of the reasons enterprises have been skittish about moving data to the cloud. What has traditionally been lacking is a way to move workloads from on-premises to any cloud provider. To really protect your business applications when moving to the cloud, what you need is the ability to move workloads around from any location to any location: a multi-provider and multi-hypervisor vMotion.
Zerto is a software provider which started as a way to protect virtual machines inside of VMware. It started as a VM-level journaled replication technology that allowed you to replicate a VM from Site A to Site B. This had the interesting side benefit of allowing VMs to move between different CPU types without affinity. It wasn’t long before Zerto came up with a way to allow a virtual machine to move from ESX to Hyper-V. Next came the ability to replicate an on-premises workload to AWS and allow it to run there. This was a huge leap forward but left a few things to be desired. First was the loss of choice in only supporting AWS, but we all understand that is driven by demand. More important was the lack of failback from AWS after a recovery event. Don’t get me wrong, this was some really awesome stuff, but it wasn’t really what the industry needed to protect the business.
At Tech Field Day 11 we had a chance to get some insight into the Zerto product roadmap. Not satisfied to simply be a replication software provider, Zerto is working to position itself for the future. They really see themselves as the glue that will allow an IT organization to move data seamlessly from any location to any other location. Imagine being able to move your on-premises workload to AWS and then to Azure, all done easily and with little to no pain.
In my twenty years of enterprise infrastructure experience, I’ve noticed a few things that are universal to every organization. One of the most universally time-consuming things about working in IT is disaster recovery testing. We all know that business continuity is extremely important, but that doesn’t make testing and executing recovery plans any less expensive. It takes compute power to take full and incremental copies of the data and, of course, storage to house the backups. Organizations also spend weeks and weeks of people’s time planning, documenting, executing, and remediating disaster recovery plans. Until it is needed, business resiliency often seems like a waste of money and time – but that all changes when you need it. When it is finally needed, everyone remembers what a great investment data protection is, but what about the rest of the time? Can’t data resilience be more than a one-trick pony?
The simple answer is “yes”: it is possible to put all the data copies created for data resilience to use. This concept is starting to make some seriously awesome improvements at the organizations embracing it. One of the early pioneers, Actifio, coined the term “copy data virtualization”, but others are calling it other things. The goal is to take the backup data and put it to use for more than just backups. This makes perfect sense to me since the data is sitting around doing nothing most of the time. Actifio lets IT do some really awesome stuff, but at its heart it lets users provision copies of data to be used for something. What exactly does that mean in the real world?
Let’s say we have an Oracle database administrator who is planning to make some major database changes. He doesn’t want to run this on the current development environment because that could impact the development teams. Traditionally the DBA is going to ask for a new environment to be provisioned, for the data to be copied, and for all the associated work, like getting firewall rules in place. That is a huge work effort for everyone involved. This is where Actifio can shine. Its software takes the copy of production, instantly makes a space-efficient clone, and presents it to an existing system. The DBA can then import the data and test his script.
We can take this to the next level because Actifio has an extensive REST API, which can be leveraged by software such as Salt or Jenkins. This allows a developer to take a copy of data for their development efforts. It becomes trivial for development to have a robust workflow which can clone data and servers for testing. They have a great demo of this using Ansible below for those who are interested.
This idea of making use of data we already have is really cool, but it can be hard to understand. At Tech Field Day 11 we had some time to talk to the Actifio crew, and I have to say they are doing some really impressive stuff. It’s so impressive, in fact, that other vendors are starting to use their marketing terms. Check out Actifio and the other vendors doing copy data management. I think something like this is key to the future of data protection.
Up until a few years ago security was an afterthought in most IT organizations. It wasn’t until data breaches started to become public that nearly every company found a renewed focus on security. Investment in cyber security over the next 5 years is, according to some IT analysts, going to exceed $1 trillion. One major challenge facing organizations as they wrap their hands around cyber security is the lack of information about changes in the environment. Change is a constant in IT environments, but for a long time we have had a severe lack of auditing and tracking of what has actually changed. To address that, IT shops have implemented centralized audit logging, but that has left some serious gaps.
One of the biggest gaps I see is that audit logging is more about information and a lot less about how that information is presented. Let’s take a look at the simple example of creating a user inside of Active Directory. That single user creation will actually trigger nearly half a dozen individual events, including events for the actual creation, enabling, password reset, and change of said account. This generates, literally, pages of data about the events. This can be a huge problem for creating accurate reports that reflect the change, especially if your target audience is a non-technical auditor. IT operations staff have to spend time not only creating the report but also explaining what it actually is. Even with a renewed focus on security, this inefficient use of resources hampers the ability to actually perform work. This problem is very common with traditional SIEM implementations, as it often seems presenting information is an afterthought.
At Tech Field Day 11 in Boston, we spoke with Netwrix about their solution to this problem. In my user creation example above, they turn the event sprawl into a nice report showing just the relevant information, like what user was created, by whom, and when. That is a stark contrast to a traditional centralized logging system, which gives you information overload. This is just one of many examples of how Netwrix allows for deep analytics of security and auditing data. In this new world of security, having access to what has changed allows us to not only be more secure but also leverage security information to troubleshoot our environment. Netwrix is the start of the next generation of SIEM technologies, which think about presenting the data and not just collecting it.
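To make the idea of collapsing event sprawl concrete, here is a minimal sketch that turns the handful of raw events from my user creation example into a single report line. This is not how Netwrix does it; the event records, field names, and action labels are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw audit events for one account creation. The field names
# and action labels are assumptions, not a real log format.
RAW_EVENTS = [
    {"time": "2016-06-22T09:15:01", "actor": "alex", "target": "jsmith", "action": "account_created"},
    {"time": "2016-06-22T09:15:01", "actor": "alex", "target": "jsmith", "action": "account_enabled"},
    {"time": "2016-06-22T09:15:02", "actor": "alex", "target": "jsmith", "action": "password_reset"},
    {"time": "2016-06-22T09:15:02", "actor": "alex", "target": "jsmith", "action": "account_changed"},
]


def summarize_user_creations(events):
    """Collapse the events around each account creation into one report line."""
    by_target = defaultdict(list)
    for event in events:
        by_target[event["target"]].append(event)

    report = []
    for target, group in by_target.items():
        created = [e for e in group if e["action"] == "account_created"]
        if not created:
            continue  # no creation event; a fuller report would cover other changes
        first = min(created, key=lambda e: datetime.fromisoformat(e["time"]))
        report.append(
            f"User {target} was created by {first['actor']} "
            f"at {first['time']} ({len(group)} raw events)"
        )
    return report


for line in summarize_user_creations(RAW_EVENTS):
    print(line)
```

Four raw events become one sentence that a non-technical auditor can read, which is the whole point: the information was always there, it just wasn’t presented.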
At too many organizations the security and IT operations teams have an adversarial relationship. The IT operations staff sees security as coming in waving their baselines and demanding immediate work. While the security work does have value, it isn’t part of the core work for IT operations, but that doesn’t make it any less important. In the current state of data breaches, it is impossible for organizations not to focus on security. However, that doesn’t mean IT operations and security can’t collaborate on tools which help both teams. I think this is a nice place for a change auditing tool to fit. When troubleshooting major incidents, one question always asked is “What changed?”. This is asked so often because we know incidents are very often caused by change. Wouldn’t it be nice if, during an incident troubleshooting session, you could easily answer what changes happened recently?
Moving forward, the partnership between IT operations and security has to get stronger. We need each other to be successful and to allow for a protected, thriving business. To foster a true partnership we need mutually beneficial tools like Netwrix to help pave the road. Until security tools help operations, we can expect the two teams to continue to have an adversarial relationship. Let’s hope that the newer generations of tools that are starting to appear can help us bridge the divide between the two sides of the house.
In the IT industry today it is nearly impossible not to hear the word cloud dozens of times a day, but many storage administrators treat cloud as a four-letter word. The basic tenet for a storage administrator is to ensure an organization’s data is safe and secure. If the storage administrator makes a mistake, bad, bad things happen. Companies fold, black holes collapse, and suns explode. NetApp is trying to change the minds of those storage administrators, and for good reason. IT organizations are always looking to do more work with less money, and cloud storage can’t be ignored as a viable way to do that.
At Storage Field Day 9, NetApp talked a fair amount about how they are embracing cloud storage as key to the industry’s future. No storage vendor can afford not to embrace cloud storage anymore, and NetApp sees it as key. Part of the future for NetApp is expanding the hybrid array with sub-LUN tiering to include cloud storage as a replacement for NL-SAS. This makes perfect sense since the performance characteristics of a slow SAS drive are not too dissimilar to those of cloud storage. In addition to this tiering, NetApp also has a goal of making snapshots truly cross-technology.
All in all, the direction NetApp is heading makes their future bright!
This week I’ve been spending some time at Pure Accelerate, where I’ve been able to talk to the engineering and executive teams behind the new FlashBlade system.
In an attempt to embrace its startup cultural roots, Pure Storage developed FlashBlade as a startup inside the company. What that means is they hired new engineering staff to build a unique and separate product from the ground up. The new team members, to keep the development secret, were not connected to other traditional Pure employees on LinkedIn. While the development was largely separate, some of the FlashArray development team did help where it made sense. That collaboration resulted in a fork of the FlashArray management interface, which is used by FlashBlade.
The result of this startup within a company is a new and unique product. The first thing to understand about FlashBlade is what it is not. It is not meant for low-latency, general-purpose workloads. The architecture of the system prevents FlashBlade from reaching the sub-millisecond latency of a traditional all-flash array. The current iteration of the product only supports file and object access. In fact, it currently only supports NFS v3 for file access, since it is a prevalent and easy protocol. SMB support is being worked on, but that adds a new layer of complexity where many products have stumbled. FlashBlade is, as a 1.0 product, missing some basic features like snapshots. The mentality was to start shipping product rather than trying to make the product perfect.
FlashBlade is designed to be low-cost, flexible, and scalable. One way to keep costs low was to ensure that the actual NAND was the largest component cost of the array. Pure has built a custom hardware blade consisting of a CPU with a direct connection to NAND flash (8TB or 52TB) for storage and NV-RAM for power protection of writes. By eliminating the SSD, the design can be kept simple and adaptable. Connecting the blades together is a built-in low-latency 40Gb Ethernet switch which provides blade-to-blade traffic as well as client connectivity. Currently, FlashBlade scales to 1.6PB in a 4U chassis. However, this should grow with the addition of an external network switch. The NAND gateway is a custom FPGA designed to allow abstraction of the particulars of the NAND. This means the NAND chips could change to a different vendor with a software change rather than an ASIC hardware redesign.
The software behind FlashBlade, known as Elasticity, runs in a distributed fashion on the Intel CPUs on each blade. This software implements a common object store and runs data services like encryption and erasure coding. This is also where client access protocols are implemented.
While the architecture of FlashBlade is certainly well thought out and designed, the currently shipping product is lacking many of the features of a NetApp Filer or EMC Isilon. Many people are speculating that this is designed to take on the Filers and Isilon, but the product is still very new. I am looking forward to seeing where Pure takes FlashBlade and how customers in the field put the system to use.
It wasn’t that long ago that Violin Memory pioneered the high-performance flash market. They created the market for speed but were lacking a rich data services offering. It wasn’t long before many other all-flash storage arrays came to market which may not have been as fast but were good enough and had a rich feature set. The lack of data services and the competition ultimately caused a lot of pain for Violin Memory. It was so bad that many people thought the company was doomed, but after our SFD8 presentation I’m not convinced.
The meeting had started before the cameras were actually rolling, which I think was really unfortunate for Violin. We were doing two things: eating breakfast and talking financials with the CEO. As we listened to him talk, one thing became apparent – Violin’s financials say they have a chance. CEO Kevin DeNuccio said they had a quarterly burn rate of $12mm, with the break-even point being 4 to 8 quarters out. They also have $120mm in cash from the latest funding round. What this means is we need to keep a close eye on them over the next few quarters to see how the burn rate changes. If they can manage to move down to a $7mm – $10mm burn rate, I can see how they may pull back from the brink.
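The runway math behind that is simple enough to sketch. The cash and burn figures are the ones quoted above; holding the burn rate constant and ignoring any change in revenue is my own simplifying assumption.

```python
def runway_quarters(cash_mm, quarterly_burn_mm):
    """Quarters of cash left at a constant burn rate (simplifying assumption)."""
    if quarterly_burn_mm <= 0:
        raise ValueError("burn rate must be positive")
    return cash_mm / quarterly_burn_mm


CASH_MM = 120  # cash on hand, in $mm, as quoted by the CEO

# Current burn rate, then the two ends of the hoped-for range.
for burn_mm in (12, 10, 7):
    quarters = runway_quarters(CASH_MM, burn_mm)
    print(f"${burn_mm}mm per quarter -> {quarters:.1f} quarters of runway")
```

At the current $12mm burn the cash covers 10 quarters, which is a little beyond the 4 to 8 quarters to break-even. At $10mm it stretches to 12 quarters, and at $7mm to roughly 17.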
Violin Memory has done a lot of work to change their technology to move forward as a platform. Building rich data services is no easy task, but they seem to have done a great job. The underlying hardware has evolved, and they now have several platforms based on need. Watch the full SFD8 presentations for a deep dive into the technology, but suffice it to say it looks great. With the technology being a major investment, and the financial picture looking promising, Violin Memory may just make it.
If you’ve worked in IT for any amount of time you’ve likely heard the term “secondary storage”, which you’ve known as a backup tier. You’ve also heard of “tier 2” storage for test and development workloads not needing the data services of production. These two terms have had very different requirements. Backup target storage is generally cheap, deep, and optimized for sequential writes. Test/dev storage, on the other hand, needs a different performance profile since it serves actual workloads. Cohesity thinks this needs to change. They contend that secondary storage should be anything that is not primary storage.
Redefining a term and carving out a new market segment is no small task, but Cohesity shows some pretty interesting use cases:
- Data Protection for VMware environments – Once a hypervisor snapshot is created, the data is sent to the Cohesity array, where things like deduplication and replication can be applied. This gives you unlimited snaps without the performance impacts of VMware.
- User data access using NFS and (soon) SMB.
- Space efficient copies of test and dev can be created on demand with integration with tools like Puppet and Chef in the works.
- Data Analytics allowing elastic search of all supported file types stored on Cohesity.
Just thinking about all the use cases makes my head spin, but it does all seem really cool. If you have all of these services in-house today, you’re likely paying a fair amount of money for each one. They may also be highly disconnected from the storage array – and we all know data locality matters. This also creates the problem of data sprawl, with so many copies of the data, and copies cost money. Cohesity calls this the data iceberg, where only the primary data is above water, but the vast majority is hidden below the water.
It all seems highly inefficient, which is why Cohesity was founded.
Cohesity was founded to be an infinitely scalable pool of data but has only been tested up to 32 nodes. Given the metadata requirements per node, I can’t imagine they won’t run into a hard limit with the current implementation. All in all, I like what they are trying to do, but they have a hard path in front of them. Since they are trying to do so much, they’ll face an interesting challenge in defining the problem in their marketing materials. Cohesity is a version 1 product so they have some big gaps in what they do and what they do well, but hopefully that changes as they mature.
Over the last decade, storage arrays have been evolving to become faster and protect data better. They have become smarter about where and how data is stored and how to predict when data will be accessed. However, one thing has not, by and large, changed. Storage arrays haven’t learned much about what data is actually stored on them. In the last year, this has begun to change, in part. A few companies now have products which provide some type of data awareness, but they aren’t created equal. Content awareness is nothing new. However, it has mostly existed for legal and compliance purposes, separate from the storage array.
Today I attended Storage Field Day 8 and had a chance to meet with Andy Warfield of Coho Data, and I saw a different take on this. A few months ago I saw Coho Data start talking about running containers directly on their storage array. I dismissed this as mostly edge cases and largely not needed, but today I thought differently. What does this have to do with content awareness? Hang with me for just a minute and I think it will all become clear. Andy showed us a simple demo of a container running on the array that had some simple code. All the software did was convert an image from color to greyscale. On a Windows VM with access to Coho Data storage, a full-color image was dropped into a folder. Instantly the software running in the container kicked in and converted the image to greyscale. The Windows VM now had direct access to both images.
So this use case is pretty lame, but the possibilities around this are awesome. Having this type of direct access to the data can enable things like virus scanning, data loss prevention, and other workflows. A classic example in an enterprise is an application that writes data to a network share. Every 5 or 10 minutes another application scans that location looking for the file, and when it is found it takes action. This whole workflow is filled with artificial delays.
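To get a feel for how much artificial delay that polling adds, here is a minimal sketch. It assumes the scans run at fixed multiples of the poll interval and that processing itself is instant; both are my simplifications, and the arrival times are made up.

```python
import math


def polling_delay(arrival_s, poll_interval_s):
    """Seconds a file sits on the share before the next scheduled scan sees it."""
    next_scan_s = math.ceil(arrival_s / poll_interval_s) * poll_interval_s
    return next_scan_s - arrival_s


def average_delay(arrivals_s, poll_interval_s):
    """Mean wait across a list of file arrival times (in seconds)."""
    delays = [polling_delay(t, poll_interval_s) for t in arrivals_s]
    return sum(delays) / len(delays)


FIVE_MINUTES = 300

# A file landing one second after a scan waits almost the whole interval.
print(polling_delay(301, FIVE_MINUTES))

# Hypothetical arrival times spread across one interval.
print(average_delay([10, 150, 290], FIVE_MINUTES))
```

With a 5-minute scan, a file can wait up to nearly 5 minutes before anything happens to it, and with arrivals spread evenly the average works out to about half the interval. Code that reacts the moment the write lands on the array, like the greyscale demo, removes that wait entirely.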
This type of ability to view the content is poised to change how businesses think about data storage. Having direct access to data using APIs and protocol access enables workflows like this. It also opens the array up to external software that can do all sorts of cool and interesting things, like data protection and other custom workflows. I’m not sure that running generalized software on a storage array is overly useful, but for some use cases it is a huge benefit. I’m excited to see how Coho Data develops a partner network that can make use of this hook.
At VMworld this year I attended the Tech Field Day Extra event as a delegate. One of the more interesting presentations came from DataGravity. One of the reasons I found them so interesting is they are attempting to add a new dimension to the storage array market. Since the storage array market was created, people have cared about three things: data resiliency, speed, and capacity. These three metrics should not come as a surprise since the intent was to leverage a larger pool of capacity for reduced cost and improved performance. Fast forward to now and these three things have not fundamentally changed much. We still care about protecting our data, making it perform well, and how much data we can store. DataGravity is trying to change this by building new data analytics capabilities into their storage array.
If you’ve ever worked in a large enterprise, you have most likely dealt with legal and security teams. They always ask the storage administrators “Can you tell me what data we stored on this server?” and “Can you tell me who accessed this file?”. To which Mr. Storage Administrator has always replied with a firm “No.”. After all, the storage administrator is just a steward of the data and not the actual owner. He or she has no idea what the end user writes to that space. Legal and security teams use things like data loss prevention and data discovery to answer their questions. Doing this, however, is a bit of a pain for everyone involved. Wouldn’t it be nice if your storage array knew what it was storing? That is what data-aware storage is all about.
DataGravity can search hundreds of file types to do some interesting things. For example, it can tell you who is talking about a particular keyword or search for Social Security Numbers. The software can also tell you who is working on documents together to give you an idea of how your organization is collaborating. One interesting feature is the ability for each user to do nearly instantaneous restores of their data.
If you’d like to know more about data analytics, in general, check out this comprehensive post by Enrico Signoretti.