NetApp, CDW, and how not to treat customers

The opinions expressed on this site are the author's personal views and do not reflect those of his employer or his clients in any way.

Back in May of 2011 my company purchased a NetApp solution from CDW that consisted of two production NetApps and a passive NetApp at a DR site hosted in one of CDW's data centers. The servers were virtualized in a VMware environment running on some pretty robust HP hosts.

Almost from the beginning, the problems began. CDW sent one of its “best” trainers to work on the implementation and setup and to handle the knowledge transfer. To say that the whole experience was an abomination would be accurate. The configuration he wanted us to implement based on his “best practices” (three spare drives and two parity drives per twelve-disk shelf) was wrong and left little room for growth, the tools he told us to use were out of date and no longer recommended by NetApp, he couldn’t figure out how to get drives to mirror, couldn’t create LUNs and map them, etc. In short, it was not a good start.

We worked with CDW to get some remediation, and they actually sent someone who got the system up and running for the most part. We still had many things left to do, but the ball was rolling. That’s when the other shoe dropped. We got a project manager who was extremely disorganized, did not follow through on items, and whose solution to any issue was essentially for us to call the vendor rather than pulling in one of the CDW techs (with whom we had contracts) to help us. When we did reach out to one of the techs for assistance (per the tech’s own instructions), she was more concerned with chastising us, the customer, than with actually getting us the help we needed…and paid for.

We were then, at our insistence, assigned a new project manager. She was very good and got us to where we needed to be. Almost eleven months after we started, she guided us through getting all the little pieces and parts working.

The process was not smooth; it was fraught with setbacks, poorly trained and unknowledgeable employees, threats, tantrums, and hard feelings. But we were done.

And then something happened….

Well, more specifically, a few things happened. The first was that we reached the point where we needed to start testing the DR solution. We quickly discovered that the best way to do this, and the one that also provided the best option for data recovery in a non-disaster scenario, was to use FlexClone. FlexClone allows an image of a mirror to be brought up without breaking the original mirror, which would otherwise require a re-sync or possibly a re-initialization of the data.

Because our DR site was a “passive” site, it was not licensed for this or many other features. CDW came up with a quick solution and worked with NetApp to get both of our controllers into active status if we purchased an additional disk shelf. A bit too much like arm-twisting for my liking, but I justified it by seeing that with this new space and the active configuration, we could now do some archiving with the NetApp. And most importantly, all sites would now, per CDW, “be the same.” This would come back to haunt us, as they were not truly “the same” but were missing a piece of software that would become key in this sorry little tale.

Around June of this year we noticed something at NetApp had changed. Prior to June, calls were answered in a fairly timely manner: usually the same day, perhaps the next, the call would be returned and the problem resolved. As June wore on, the time frame started slipping to two and three days. By July it was a solid three days. Now, in August, it is five to seven days to respond to a call with a solution.

I mention the time frame because it plays into the whole scenario with both NetApp and CDW. You see, CDW has positioned itself as a value-added reseller: if there is a problem, we can always go back to CDW and get their help in getting it resolved.

In the beginning of July we were still working on getting the DR testing solidified and came to a stunning realization: our DR site that was “the same” was not licensed for SnapDrive. SnapDrive is used to mount raw LUNs for SQL Server databases, Exchange databases, etc. Suddenly we were not sure if we could recover in a disaster. This was almost a year after we started the project, and we had no idea whether we could recover or not. I was on vacation, so my co-worker asked for pricing on SnapDrive. A week went by with no pricing from CDW, so he asked for it again and was asked by CDW for the serial numbers of the NetApp controllers. Another week went by, and I was back from vacation, but we still had no pricing. I asked for it again. I was asked for the serial numbers again.

A few days later, having still received no pricing, I called our CDW rep about it. She said she would get it, but to hold on because her other customers did not have SnapDrive at their DR sites and she was going to find out what they did. That was on July 27th.

On August 27th, after multiple promises by CDW that they were working on it, I had finally had enough of this nonsense and had to get my CFO involved to help pressure CDW to answer the relatively simple question: do we need SnapDrive at our DR site to recover in a disaster or not?

Almost two months had passed and we did not have an answer to this simple question.

<A brief pause on this story while I add other things that were going on to fill out the entire picture>

In the meantime, we had another issue where our NetApp Operations Manager would “lose” the DR site. It wasn’t that the DR site went down or anything; Operations Manager would just lose the connection and stop sending data to it, even though the site was online and running. The solution was to increase the timeout period in NetApp Operations Manager.

It took five days and six people (two from CDW and four from NetApp) to get that answer.

About the time we got that figured out, we discovered that Data ONTAP 8.1.1RC1 had a goofy little glitch that slowed SnapMirror to a crawl, making it impossible to get the data across the network to the DR site. This problem was discovered on August 23. Since then we have heard the all too common refrain from NetApp that “all their engineers are too busy helping someone else with bigger issues” three times on this one case. It is now just a few hours shy of seven days with no resolution to this issue, our DR site is nine days behind in mirroring, and our company is at risk.

In short, since June NetApp has been less than prompt in responding to issues that they deem as “non-critical.”

I like to think I am a reasonable guy, but waiting a week for a response to something so critical, or two months for an answer to a simple question is unacceptable. If they are that busy I can only see two possible scenarios: 1) they don’t have enough technicians to fulfill their support obligations or 2) they have some MAJOR software problems.

<Return to the main issue>

At this point, I felt it was necessary to pull my CFO into the problem, as our DR site was, to our knowledge, not working. I gave him a brief timeline of what had occurred, what the response had been thus far, and where we stood, emphasizing that all we were looking for was an answer to the question: do we need SnapDrive at the DR site or not?

He forwarded it on to the folks at CDW to find out what was going on. What he got back was essentially a ten-paragraph ad hominem attack on yours truly, talking about how they had bent over backwards doing all these wonderful things for our company and how they were just trying to save us money…all sent directly to my CFO without copying me on any of it. They even tried to contact him by phone, bypassing me, the IT manager. Stay classy, San Diego…

And that, folks, was the breaking point for me. Instead of responding, working to fix the issue, and answering the damn question, they felt it was a better idea to go after the person asking the question with personal attacks and half-truths.

That said, the gist of the email to the CFO centered on the idea that the reason we were in this predicament was that we (my co-worker and I) were not properly trained. Additionally, they said they would get with NetApp and arrange a time to get the answer to the question (which is what we were after in the first place, and which took three weeks to finally schedule).

So the conference call began with about ten people on it including at least one NetApp engineer. And so we asked one simple question: do we need SnapDrive or not at the DR site?

That was it. That was the only point of having all of these highly paid people on a conference call. It is why we had to pull our executive-level company officer away from his busy day to ask this question.

The answer from this highly trained and knowledgeable NetApp engineer: “I don’t think so. I’m 99% sure you don’t, but I’ll have to test it.”

Wow. Stunning. For two months we pursued this point, and what did we get? A “we’re not sure.”

We all sat there stunned. Surely they had discussed this before the call, hadn’t they? Apparently they had not, and our CFO forced them to commit to a time by which we would have an answer (which, incidentally, NetApp missed).

They finally got back to us the following afternoon and let us know that we did not need SnapDrive to mount a raw LUN at our DR site.

Just shy of two months later we finally had our answer.

Meanwhile, my CDW reps have now taken to not responding to my emails requesting help with the mirroring issue (not that they were a whole lot of help with the SnapDrive issue) and I have begun purchasing from another vendor.

In short, my recent experience with CDW and NetApp has really sucked. It has become a running gag in our IT department that the “App” in NetApp stands for “apology,” since mostly what they do lately is apologize for their lousy support.

I am really having buyer’s remorse.

(Update 10-15-2012)