To improve its accuracy rates, industry leader Yap, Inc. refines its transcription algorithms by rigorously analyzing spoken language on a powerful computational farm. The firm needed robust, cost-effective solutions to store terabytes of voice files and stream them to its banks of high-end computers.

Read on to learn how JetStor RAID Arrays enable Yap to accomplish these critical tasks and give customers nationwide the industry’s most accurate speech transcription.

Yap Is Perfecting Speech-to-Text with JetStor® RAID Solutions

THE ORGANIZATION

One of the more remarkable and useful capabilities of digital communications is converting spoken language into text, and no one does it better than Yap, Inc. Founded in 2006, Yap introduced the world’s first fully automated cloud-based speech recognition platform. Its voice-driven capabilities let users interact more naturally with their handsets and give service providers greater revenue per user through increased voice and data usage. Enterprises like Microsoft, Sprint, and MetroPCS use Yap’s Speech Cloud™ for applications such as voicemail-to-text, mobile messaging, conference call transcription, and call mining. When Yap was selected as a finalist at Silicon Valley’s first-ever TechCrunch event, often called the “American Idol” of the technology industry, its speech transcription technologies were widely acclaimed as a way to curb texting while driving. The firm, based in Charlotte, North Carolina, also won the VMA Innovation Award, a distinction given annually by Europe’s leading wireless operators.

THE CHALLENGE

Developing software that can transcribe spoken language into text with reasonable accuracy is extremely challenging. Interpreting inflection, pitch, accents, dialects, and even the “ums” and “ahs” common in everyday speech demands programming that approaches the sophistication of artificial intelligence. Yap’s solutions currently deliver an industry-leading accuracy rate, which is why service providers and telecommunications companies are rapidly adopting them. This impressive performance, however, does not satisfy Yap’s team of scientists.

Ryan T., a research scientist at Yap, explained that a form of Moore’s Law is at work in the speech transcription field. “Just as the number of transistors on a microprocessor doubles every two years, the error rate for speech transcription drops by 10 percent every year,” he said. “The field is very competitive, and to flourish, Yap must stay ahead of the curve.”
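Read as a compounding relative decline (an interpretation; the quote could also mean percentage points), a 10 percent annual drop multiplies the error rate by 0.9 each year, so after n years it is e0 × 0.9^n, and it halves in about ln 0.5 / ln 0.9 ≈ 6.6 years. The short Python sketch that follows works through that arithmetic; the 20 percent starting error rate is a hypothetical value chosen for illustration, not a Yap figure.

```python
import math


def error_after(initial_error: float, years: int, annual_drop: float = 0.10) -> float:
    """Error rate after `years` of a constant *relative* drop per year (assumed reading)."""
    return initial_error * (1.0 - annual_drop) ** years


def years_to_halve(annual_drop: float = 0.10) -> float:
    """Years until the error rate halves under a constant relative annual drop."""
    return math.log(0.5) / math.log(1.0 - annual_drop)


if __name__ == "__main__":
    # Hypothetical starting error rate of 20%; not a figure reported by Yap.
    for year in range(6):
        print(year, round(error_after(0.20, year), 4))
    print("years to halve:", round(years_to_halve(), 2))
```

Because the decline is multiplicative, each year's improvement is smaller in absolute terms than the last, which is one reason steady research investment matters in this field.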

As a result, Yap conducts rigorous research to improve its speech transcription algorithms. At its data center, it continuously operates a powerful computational farm that refines its statistical models by analyzing terabytes of recorded speech. “The more data we process and study, the more accurate we can make our solutions,” said Ryan. “Data lies at the core of our business.” Storing this data while making it rapidly available for high-performance computing is a daunting task.

Yap surveyed the marketplace for solutions but rejected large storage arrays from manufacturers such as EMC. “We were using the General Parallel File System, which is a clustered file system developed by IBM, so we didn’t need all the bells and whistles that drove up the costs of monolithic arrays,” said Ryan. Yap also wanted modular systems that would let it quickly add storage capacity as its research demanded. “To support our work and ensure we remained competitive in the field, we needed primary storage that is fast, reliable, scalable, and cost-effective,” Ryan added.

BENEFITS IMMEDIATELY REALIZED

By turning to JetStor arrays from AC&NC, Yap gained a high-performance storage cluster to support its research. Each JetStor SAS 516F RAID Array holds up to sixteen 450 GB, 15K RPM SAS disks, for 7.2 TB of raw storage. Yap’s initial JetStor installation was provisioned with 30 TB of capacity and linked over 4Gbit Fibre Channel to its server farm, which consists of powerful dual-socket, quad-core servers. When even greater bandwidth is required, staff can upgrade the arrays to 8Gbit Fibre Channel links. “Our JetStors store and deliver data as quickly as our servers can handle the traffic,” said Ryan.
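The per-chassis figure is simple multiplication (16 × 450 GB = 7,200 GB), and RAID 5 usable space follows from reserving one disk’s worth of capacity per group for parity. The Python sketch that follows assumes all 16 disks form a single RAID 5 group with no hot spare and uses decimal gigabytes, as drive vendors do. The text does not say whether Yap’s 30 TB is raw or usable, so the sketch computes the chassis count both ways.

```python
def raw_capacity_gb(disks: int, disk_gb: int) -> int:
    """Raw capacity of one chassis: every disk counts toward the total."""
    return disks * disk_gb


def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """RAID 5 usable capacity: one disk's worth of space per group holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def chassis_needed(target_gb: int, per_chassis_gb: int) -> int:
    """Whole chassis required to reach a target capacity (ceiling division)."""
    return -(-target_gb // per_chassis_gb)


if __name__ == "__main__":
    raw = raw_capacity_gb(16, 450)      # 7,200 GB, i.e. the 7.2 TB in the text
    usable = raid5_usable_gb(16, 450)   # assumes one 16-disk RAID 5 group, no spare
    print("raw GB:", raw, "usable GB:", usable)
    # Assumption: 30 TB treated as 30,000 decimal GB, read as raw and as usable.
    print(chassis_needed(30_000, raw), chassis_needed(30_000, usable))
```

Under these assumptions one chassis yields 6,750 GB of usable space, and either reading of the 30 TB figure rounds up to five chassis.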

Yap determined that each JetStor SAS 516F, when configured for RAID 5, delivers enough I/O bandwidth to support three of its high-speed servers. “Unlike a large storage array, our JetStor cluster is modular,” Ryan explained. “This was quite important because as we continually expand our compute farm, we can quickly and easily boost storage by adding disks and chassis to the cluster.” The JetStor arrays also provide the reliability that Yap’s around-the-clock research environment demands. “Once we set up an array, which is simple to do, it requires hardly any maintenance,” said Ryan. “Our JetStors have proven to be rock-solid primary storage.”
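RAID 5 protects data by storing, for each stripe, a parity block equal to the XOR of that stripe’s data blocks, and by rotating the parity block across member disks. If any single disk fails, each of its blocks can be rebuilt by XOR-ing the surviving blocks in the same stripe. The Python sketch that follows illustrates the idea with hypothetical 8-byte blocks and one common rotation scheme (parity starts on the last disk and moves one disk left per stripe); it is not a description of the JetStor controller’s actual on-disk layout.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity operation)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        if len(blk) != len(out):
            raise ValueError("all blocks in a stripe must be the same length")
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def parity_disk_for_stripe(stripe: int, disks: int) -> int:
    """Disk index holding parity for a stripe; one common rotation, assumed here."""
    return (disks - 1 - stripe) % disks


if __name__ == "__main__":
    # Hypothetical data blocks for one stripe on a 4-disk set (3 data + 1 parity).
    data = [b"voice-00", b"voice-01", b"voice-02"]
    parity = xor_blocks(data)

    # Simulate losing the disk holding data[1]: XOR of survivors plus parity rebuilds it.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    print("rebuilt:", rebuilt)

    # Parity location for the first four stripes on a 4-disk set.
    print([parity_disk_for_stripe(s, 4) for s in range(4)])
```

Rotating parity spreads parity writes across all members instead of concentrating them on one disk, which is what distinguishes RAID 5 from RAID 4.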

Ryan also explained that AC&NC delivered these capabilities at a fraction of the expense of many competing solutions. “A large storage array from one manufacturer cost four times more than our JetStor systems,” Ryan added. “We pay only for the functionality we need and want. And whereas many providers seem to overprice their disks, AC&NC disks are very cost-competitive. JetStor RAID Arrays are affordable and deliver an unbeatable return on investment.”

HOW WE DID IT

Yap first built its research environment with SATA arrays from another vendor, but those systems were costly to own because their disks were expensive. Meanwhile, Yap’s research advanced quickly, and the firm needed greater performance from its primary storage. Yap then learned about AC&NC and found that the company’s faster SAS arrays and disks hit the sweet spot for performance, scalability, and economy.

Yap’s JetStor SAS 516F RAID Arrays connect to the server farm over 4Gbit Fibre Channel. The servers, in turn, attach to Yap’s computing cluster over Gigabit Ethernet links. The JetStor arrays hold all of the voice data, which the computing cluster accesses through the servers. “We’re phasing out the SATA drives,” noted Thomas. “The JetStors give us the fast storage we required to accelerate our research.” To present the JetStor arrays to the servers as a single storage pool, Yap deployed GPFS, a high-performance, shared-disk clustered file system. Clustering the storage devices lets administrators improve data throughput, simplify scaling, and avoid managing each JetStor array as an individual device.
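GPFS spreads each file’s data blocks across the disks in a file system, which is how pooling the arrays raises throughput: a large sequential read is served by several arrays at once rather than by one. The Python sketch that follows models this with plain round-robin block placement. The array names, block size, and file size are hypothetical, and GPFS’s real block allocator is more involved than this simplified model.

```python
from collections import defaultdict


def stripe_blocks(file_size: int, block_size: int, arrays: list[str]) -> dict[str, list[int]]:
    """Assign a file's block indices to arrays round-robin (simplified model, not GPFS's allocator)."""
    if block_size <= 0 or not arrays:
        raise ValueError("block_size must be positive and arrays must be non-empty")
    n_blocks = -(-file_size // block_size)  # ceiling division: a partial last block still counts
    layout: dict[str, list[int]] = defaultdict(list)
    for i in range(n_blocks):
        layout[arrays[i % len(arrays)]].append(i)
    return dict(layout)


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Hypothetical: a 10 MiB voice file, 1 MiB blocks, four pooled arrays.
    arrays = ["jetstor-a", "jetstor-b", "jetstor-c", "jetstor-d"]
    for name, blocks in stripe_blocks(10 * MiB, MiB, arrays).items():
        print(name, blocks)
```

In this idealized model a reader fetching all ten blocks draws on four arrays in parallel, so aggregate throughput can approach the sum of the arrays’ individual rates until the servers or network links become the bottleneck.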

“The JetStors work well with GPFS, which is used by many of the world’s most powerful supercomputers,” said Ryan. “The combination of the AC&NC solutions and GPFS is ideal for a computationally intensive environment like ours. With these technologies, we’re poised to get ever closer to perfectly accurate speech transcription.”