Performance and Storage Consolidation for Databases with Flash
Databases: you’ve got them whether you build software or not, and they’re usually the prima donna component of a software system. They take up lots of space, they demand dedicated storage hardware with lots of throughput, and everyone wants their own enormous copy to play with. Connections Education’s software as a service (SaaS) Education Management System uses 240 cores of database servers in production alone, plus dozens more database servers to support a myriad of non-production environments. That combined load pummels a normal storage backend to a pulp.
In addition to already-high input/output (I/O) needs, Connections Education has unpredictable data growth, unpredictable new I/O requirements due to the introduction of new features by its software development team, a growing population of non-production database environments, and demand for production-level I/O for development, tuning, and load testing.
There’s no way we could support our increasing non-production database needs on spindles; as our sysadmin manager puts it, that many racks “would put us out of the building.” The organization tried to use a combination of database restores onto direct-attached storage and Storage Area Network (SAN) snapshots on spindles. Unfortunately, we were limited on spindle count in our dev/test environments because of Network Operations Center (NOC) space and budget, the storage I/O throughput wasn’t great, and the restores frequently failed and left environments broken. When the business asked for new environments, we used to have to say “no” or “only if you have budget for more disks.”
We were tripping over the limitations of traditional SAN storage. The massive numbers of spinning disks required to achieve I/O throughput cause major storage sprawl. Every copy of the database needs its own large swath of disks for performance. Database administrators were tied up with keeping non-production environments updated instead of helping software developers and tuning queries. Database administrators and storage admins were spending weeks planning and executing storage allocations and migrations, which provided zero direct business value.
To improve the situation, we turned to flash—not a tray of flash drives in our traditional SAN to supplement the spinning disks, but a new breed of storage: the purpose-built flash SAN. After a proof of concept with one manufacturer and interviews with a couple of others, we settled on EMC’s XtremIO product for the feature set, price point, and future integration with our other EMC SANs.
Compared to traditional spinning disks, NAND flash storage media is blazingly fast and compact. Since it has no moving parts, it uses less energy, produces less heat, and breaks far less often. A few years into our flash SAN usage, we have yet to see a disk failure; losing drives in our traditional SAN was a frequent occurrence. EMC’s XtremIO SAN has exciting features that a traditional SAN doesn’t, including inline deduplication, inline disk compression, inline data encryption, and writeable snapshots that are instantly available and meet the same I/O demands as the original.
With our traditional SAN, we had to carefully carve up the disks so that workloads wouldn’t disturb each other. With a purpose-built flash SAN, I would be completely comfortable running all twelve of our environments, production and non-production, off one pair of SANs that fit in a single rack. With all environments running at full blast, we wouldn’t even scratch the surface of the flash array’s capabilities. When we put the XtremIO into production, our database I/O latency went from averaging hundreds of milliseconds per operation to flat-lining at about 1.5 ms per operation. Outside of production, we went from six limping environments to ten stunningly fast environments, making our developers more productive and enabling our business users to better serve the students we support. We can even add new environments whenever someone dreams up a need for them: the databases don’t take up any additional space or present an I/O throughput challenge.
There is very little planning involved in flash storage. What used to consume immense employee time and effort for weeks is now streamlined and nearly effortless: we buy for space, not throughput, because throughput is no longer a concern, and there are no RAID groups, tiers, caches, disk pools, or thick provisioning to mull over. With the latest model of EMC’s XtremIO SAN, the disks are encrypted, which helps us meet regulatory requirements.
In addition to a dozen copies of our databases, we also maintain months of daily database snapshots. When someone accidentally deletes or mangles some data and it isn’t discovered for a few weeks, we can immediately bring the appropriate snapshot online instead of recalling tapes from Iron Mountain and scrambling to find somewhere to restore them. Snapshots, even when the data in them is changing daily, are so efficient that we have nearly 190 TB of data consuming 10 TB of actual space. We refresh 5 TB of data on demand, daily, and weekly to support our software developers and business users.
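To put those snapshot figures in perspective, here is a rough back-of-the-envelope sketch of the space efficiency they imply. It takes the rounded numbers quoted above (about 190 TB of logical data on about 10 TB of physical space) at face value; the function name and the "savings" formula are illustrative assumptions, not output from the array's own reporting tools.

```python
def data_reduction(logical_tb: float, physical_tb: float) -> tuple:
    """Return (reduction ratio, percent of logical capacity saved).

    Hypothetical helper for illustration only; the inputs are the
    rounded figures from the article, not measured values.
    """
    if logical_tb <= 0 or physical_tb <= 0:
        raise ValueError("capacities must be positive")
    ratio = logical_tb / physical_tb
    savings_pct = (1 - physical_tb / logical_tb) * 100
    return ratio, savings_pct


# Approximate figures quoted in the text: ~190 TB logical, ~10 TB physical.
ratio, savings = data_reduction(190, 10)
print(f"{ratio:.0f}:1 reduction, {savings:.1f}% of logical capacity saved")
# prints "19:1 reduction, 94.7% of logical capacity saved"
```

In other words, under these rounded figures each terabyte of physical flash is standing in for roughly 19 TB of database copies and snapshots.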
Connections Education had reached the capacity of traditional SANs, and flash technology has provided a major relief. For the first time in a decade, the organization did not need to budget for additional storage this year, and finally, no one is complaining that the database is slow.