1.8 Million IOPS.
That seems like a number reserved for high-end proprietary storage systems,
out of reach for all but the wealthiest of IT budgets.
Achieving that level of performance using commercially available
off-the-shelf components is, in a word, incredible!
It took the vision, experience, and cooperation of
industry leaders coming together to design a system
that not only achieved the desired results but far exceeded them.
This project began with Orange Silicon Valley,
a wholly owned innovation and
research subsidiary of the telecom giant Orange.
At Orange Silicon Valley we are always striving to bring
disruptive innovation that can address the IT needs
of Orange and our clients.
We are always interested in finding ways of doing
more with less, and maximizing asset utilization.
We believe that, based on open standards and open
architecture, it is possible to deliver
high-end, appliance-type performance
at a much lower TCO (total cost of ownership).
As the world races towards exascale,
we envision that "extreme compute" can become
more affordable for enterprise IT.
The challenge this time: design a high-performance,
linearly scalable, appliance-like system
that can handle an intensive, I/O-bound
online transaction processing workload.
The system would be considered Mission Critical++:
live, customer-facing,
with a zero-downtime SLA (service level agreement).
Easy enough.
There is only one additional request.
Do it using commercially available
off-the-shelf components AND
be able to demonstrate a significant
reduction in total cost of ownership.
The Orange team reached out to their
trusted technology advisors at Hyve Solutions to help them
identify the right technical direction for the project.
Hyve Solutions, a division of SYNNEX Corporation,
is a leader in purpose-built data center server
and storage solutions, designed to meet
their customers' specific workload requirements.
Orange Silicon Valley came to us with a
clear but challenging set of requirements.
Orange had thought through their technical and business requirements in great detail.
My engineering team went to work and got creative
so we could provide a higher level of collaboration
and flexibility to solve their tough technical problems and exceed their goals.
The teams worked together to develop a plan
for the type of system and performance
they were looking to achieve.
The next step was finding the right mix of off-the-shelf components
to get the proof-of-concept system built.
This wasn't an easy task.
There were many important decisions to make.
Picking the right partners for us was
as important as picking the right components.
We needed partners that were at the forefront of their respective technologies,
but that also had the resources to support us through both the evaluation and production phases.
The team chose a Sandy Bridge based platform
with six PCI Express 3.0 slots for its
exceptional balance of high data rates and clock speeds.
The system supports a quad-channel
memory architecture, as well as high-speed DRAM
at 1,600 megatransfers per second.
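For context, quad-channel DDR3 at that speed offers a theoretical 51.2 GB/s of memory bandwidth per socket: 1,600 MT/s × 8 bytes per transfer × 4 channels.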
For the RAID controller, the choice was made to go with LSI.
After a series of discussions with LSI,
we came to the conclusion that they were the logical choice
for our storage controller needs.
The range of the LSI product portfolio,
the field-tested reliability and maturity of their designs,
and their organizational depth
made them the odds-on favorite.
LSI strives to deliver industry-leading storage technologies
that accelerate applications and improve the end-user experience.
We are proud to collaborate with Hyve and Kingston
in order to help Orange Silicon Valley deliver against a very aggressive goal.
Using our years of technology leadership
and storage expertise to help the team reach
a milestone of over 1.8 million I/Os per second
is a powerful achievement.
We believe it is consistent with our efforts
to improve the end user's overall compute experience
and help Orange deliver more information to more users faster.
The last two components the team needed to source
were memory and SSDs (solid state drives).
We needed a large memory and solid-state drive footprint
for this design. And with Kingston's solid reputation
for reliability in the enterprise space,
they were an easy contender.
On the SSD side, their E100 showed promising performance,
and they offered great engineering support.
After initial testing their drives exceeded expectations,
and we moved forward with Kingston as one of our partners.
When the Orange/Hyve team approached us to participate
in this project we immediately recognized that this
was an opportunity to be a part of an incredible milestone
in our industry. Our job was twofold:
On the DRAM side, to make sure the memory configuration
was optimized for the best performance.
And on the flash side, to select the right class of SSD,
one that delivered not only exceptional performance
but also met the exacting endurance requirements
of the intended workload.
Now that all the partners were in place, the team began testing
the system against the goal set for the project.
With the system configuration locked down, each partner set up
the hardware in their lab environments to run the benchmarks simultaneously.
Once we had the hardware set up in our labs we began tuning
the environment for optimal performance.
We used the fio benchmark under CentOS 6.3 to benchmark the
24-drive subsystem.
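To make that concrete, a fio job along the following lines exercises this kind of configuration. The device names, queue depth, and job count here are illustrative placeholders, not the exact parameters from our runs; the three virtual drives it targets are described next.

    [global]
    # asynchronous, unbuffered 4K random reads: a common stand-in for OLTP I/O
    ioengine=libaio
    direct=1
    bs=4k
    rw=randread
    # queue depth and worker count chosen to keep the SSD queues full
    iodepth=32
    numjobs=4
    runtime=60
    time_based
    group_reporting

    # one job per RAID 0 virtual drive (/dev/sdb through /dev/sdd are placeholders)
    [vd0]
    filename=/dev/sdb
    [vd1]
    filename=/dev/sdc
    [vd2]
    filename=/dev/sdd

Running fio against a job file like this reports the aggregate IOPS across all three virtual drives; swapping in rw=randrw with an rwmixread percentage gets closer to a true OLTP read/write blend.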
We attached the 24 SSDs to three LSI MegaRAID 9265-8i RAID controllers,
with an eight-drive RAID 0 configuration on each controller.
This allowed us to take advantage of the aggregate performance
that can be achieved by distributing the workload
across the PCI Express channels.
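For reference, building one of those eight-drive RAID 0 virtual drives with LSI's MegaCli utility looks roughly like this; the enclosure and slot numbers are placeholders for the actual drive topology, and the exact syntax varies slightly between MegaCli versions:

    # one eight-drive RAID 0 virtual drive on controller 0
    # (enclosure 252, slots 0-7 stand in for the real drive addresses)
    MegaCli -CfgLdAdd -r0[252:0,252:1,252:2,252:3,252:4,252:5,252:6,252:7] WT NORA Direct -a0
    # repeat with -a1 and -a2 for the other two controllers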
To further improve performance we used LSI's FastPath performance option,
which unlocks additional IOPS by changing the characteristics
of the firmware to optimize for SSDs.
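FastPath is typically engaged on virtual drives configured for write-through caching, no read-ahead, and direct I/O, which is why the WT NORA Direct settings appear in the sketch above. With MegaCli, those properties can also be applied to existing virtual drives, roughly as follows (check the syntax against the documentation for your controller firmware):

    # write-through cache, no read-ahead, direct I/O on every virtual drive
    MegaCli -LDSetProp WT -LAll -aAll
    MegaCli -LDSetProp NORA -LAll -aAll
    MegaCli -LDSetProp Direct -LAll -aAll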
Until recently, one of the roadblocks to higher storage performance
with SSDs was that RAID controllers were engineered for
mechanical hard drives. SSDs are capable of such high performance
that their real potential was being held back.
We now have RAID controllers available to us that
are specifically designed for SSDs.
As this project demonstrates, we are now able to
scale SSD performance to levels that weren’t possible
as recently as a year ago.
The initial results were very promising.
We measured close to our goal of one million IOPS on our first test run,
and we tuned the system until it consistently delivered close to 1.8 million IOPS.
We started synthetic I/O benchmarks in early 2012 that emulated
real-world OLTP behavior, and we crossed the million-IOPS scalability barrier,
a moment reminiscent of the sonic boom associated with achieving Mach 1.
With a 24-drive bay fully populated with Kingston drives
in RAID 0, powered by three LSI cards, we exceeded 1.8 million IOPS.
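Back of the envelope, 1.8 million IOPS across three controllers works out to roughly 600,000 IOPS per controller and about 75,000 IOPS per drive across the 24 SSDs, the kind of even split you would expect from near-linear scaling across the PCI Express channels.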
We are very close to hitting a figurative Mach 2.
We are working on using this platform for OLTP use cases
that resemble our mission-critical, extremely I/O-demanding applications.
We might need to pack a few more teraflops into the box
to be able to fully utilize the near 2 million IOPS potential of the solution.
We’ll find out if this is the case as we keep making progress, so stay tuned!
In the end, not only was the project goal achieved,
but it exceeded all the partners’ expectations.
Our vision was to design a high-performance,
linearly scalable system using COTS components, with a significant reduction in TCO.
Our design targets mission-critical, live,
customer-facing systems with zero-downtime SLAs
and intensive, I/O-bound online transaction processing workloads.
With our colleagues at Orange Silicon Valley, we were able to
work with design partners towards the goal of
achieving extreme compute at commodity cost.
This is very exciting for us!
Now we have made this an open architecture that any IT organization
can build for their extreme I/O needs.
For our database consolidation platforms we expect
Carrier Grade performance and reliability.
Achieving that with a significantly lower TCO becomes a key game changer for IT.
With this proof of concept and its ability to deliver performance,
reliability and scale for high I/O bound environments,
and all at a reasonable cost, the only question that remains is:
How can you use the 1.8 Million IOPS to overcome your I/O challenges?
“It’s kind of fun to do the impossible” - Walt Disney