At the China Hadoop Summit 2015 in Beijing this past weekend, I talked with various big data players, including large consumers of big data such as China Unicom, Baidu.com, JD.com, and Ctrip.com; Hadoop platform solution providers Hortonworks, RedHadoop, BeagleData, and Transwarp; infrastructure software vendors like SequoiaDB; and agile BI software vendors like Yonghong Tech.

The summit was well attended (organizers planned for 1,000 attendees and twice that number showed up), and from the presentations and conversations it’s clear that big data ecosystems are making substantial progress. Here are some of my key takeaways:

  • Telcos are focusing on optimizing internal operations with big data. Take China Unicom, one of China’s three major telcos, for example. China Unicom has completed a comprehensive business scenario analysis of related data across each segment of its internal business operations, including business and operations support systems, Internet data centers, and networks (fixed, mobile, and broadband). It has built a Hadoop-based big data platform that processes trillions of mobile access records from its mobile network every day, providing practical guidelines and progress monitoring for the construction of base stations.
  • Big data delivers real business benefits to Internet companies. Visionary Chinese companies in Internet verticals, firms that were born digital, have achieved substantial business benefits by adopting big data platforms. Ctrip.com, one of the largest online travel agencies, used the MLlib component of Spark to improve its hotel sorting and recommendation engine through machine learning; this increased its order submission conversion rate by 69%. JD.com, one of the leading eCommerce players, leveraged Spark Streaming to enable near-real-time transaction monitoring for sellers on its platform. Baidu.com, with the largest online search market share in China, also used Spark with Tachyon to build its Baidu MapReduce analytics service for public cloud.
  • Local ISVs are growing up to compete against global peers. Global vendors are moving in: Hortonworks announced a partnership with leading IT service provider Pactera in June 2013, and in December 2014 Cloudera announced that it would start business operations in China. Local vendors are also growing quickly. SequoiaDB has become the fourth NoSQL database (after MongoDB, DataStax, and Couchbase) to be certified by Cloudera and has successfully supported TB-level log analysis for telco carriers. And BeagleData’s big data platform successfully enabled the transformation of traditional insurance companies’ technology architecture for precision marketing, lowering their costs by 78% and increasing performance by 127%.
  • Tachyon and Spark Streaming will drive Spark adoption. As a memory-centric distributed file system, Tachyon effectively helps Spark separate computing from storage and addresses the memory bottleneck of Spark’s JVM. Spark Streaming, which is based on microbatch processing, can not only replace Storm in some usage scenarios but also shares its framework with the rest of Spark, making it easier for users to learn and manage.
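
For readers unfamiliar with the term, microbatch processing means collecting incoming events into small, fixed-length time windows and running an ordinary batch job on each window. The following is a minimal plain-Python sketch of that idea, not Spark's actual API: the function names, the one-second interval, and the seller IDs are all hypothetical, chosen only to echo the transaction-monitoring scenario mentioned above.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, seller ID).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group time-ordered events into consecutive windows of `interval` seconds.

    Windows start at the first event's timestamp. A window with no events
    is still emitted, as an empty list.
    """
    batch: List[Event] = []
    window_end = None
    for ts, seller in events:
        if window_end is None:
            window_end = ts + interval
        # Close every window that ended before this event arrived.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += interval
        batch.append((ts, seller))
    if batch:
        yield batch


def count_by_key(batch: List[Event]) -> Counter:
    """An ordinary batch computation: number of transactions per seller."""
    return Counter(seller for _, seller in batch)


if __name__ == "__main__":
    sample = [(0.0, "seller_a"), (0.4, "seller_b"), (1.1, "seller_a"), (3.2, "seller_a")]
    for i, batch in enumerate(micro_batches(sample, interval=1.0)):
        print(i, dict(count_by_key(batch)))
```

The point of the sketch is that the per-window logic (`count_by_key` here) is the same kind of code one would write for a batch job, which is one way to read the claim that sharing a framework makes streaming easier to learn and manage.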

What’s your view of the Chinese market ecosystem and the future of Spark and Storm? Please let me know your thoughts.