Cloudera recently announced it is leading an initiative to create open source API compatibility testing frameworks for accelerating the adoption of Apache Sqoop and Apache Flume. The initiative will take the form of new Apache sub-projects for compatibility, which are free and open to anyone (vendors, users, or contributors) with an interest in seeing open, multi-vendor standards even more widely adopted to enable more next-generation analytic workloads.

“Today, developers who build applications that include Apache Sqoop and Apache Flume don’t have access to open source tools for validating compatibility of their implementations. This introduces risk, cost, and time into application development,” said Mike Olson, chief strategy officer of Cloudera. “Our intent for these testing frameworks is to streamline the process of building next-generation products and apps for ingesting and analyzing data, and to assure the market that any product that correctly passes these tests is compatible with the releases from the Apache Software Foundation.”

As the market share leader in enterprise Hadoop, and regarded by many industry experts as the benchmark for Hadoop adoption and commercial success, Cloudera is proud to contribute to advances in the open source community and to work with the Apache Software Foundation to seek even more elegant and impactful ways to foster the Hadoop ecosystem. The API compatibility testing framework initiative reflects Cloudera’s commitment to accelerating the adoption of open source standards upon which the big data platforms of tomorrow are being built today.

Data analysis is only half the battle; getting the data into a Hadoop cluster is the first step in any big data deployment. Apache Flume and Apache Sqoop are key in the movement of massive data.

Apache Sqoop has demonstrated value for efficiently transferring bulk data between Hadoop and structured data stores, such as relational databases. As companies look to tap into all of their data assets, for example to obtain a 360-degree customer view, they must do so quickly and efficiently in order to get real-time, meaningful insights to dramatically impact and influence future customer engagement.

Apache Flume has proven very effective for collecting, aggregating, and moving large volumes of streaming event data into Hadoop clusters. However, as volumes of machine-generated data and of data streamed in from IoT devices connecting at infrastructure endpoints grow, the ability to move that data rapidly becomes more critical and time sensitive. The ability to put in place open source standards for data collection, data flow, and aggregating data stores now makes future discoveries possible.

Supporting Organizations:

  • Accenture
  • Capgemini
  • Celer Technologies
  • Corvil
  • Couchbase
  • Dataguise
  • Dell Computer
  • E8 Security
  • Fortscale
  • Intel
  • NetFlow Logic
  • NetIQ
  • Oracle
  • Pentaho
  • Quaero
  • Rocana
  • Syncsort
  • SAS
  • SGI
  • StreamSets
  • Talend
  • TCS
  • Teradata
  • Tibco

Supporting Quotes:


“Accenture and Cloudera established a formal alliance in June 2014 to help clients drive real business results from big data. Cloudera’s new open source API compatibility testing framework enables our technologists to participate in Apache Foundation projects and learn more deeply about Apache Sqoop and Apache Flume. As a result, Accenture can further help clients to mobilize and manage data and generate insight that can enable them to defend and differentiate in their markets.”

— Narendra Mulani, Senior Managing Director, Accenture Analytics


“Most big data projects include a real-time decision component, and more and more open source technology such as Kafka, Sqoop, or Flume and of course APIs are key components. More than ever, the market demands a shorter time to market. Using an open source compatibility testing framework will dramatically improve the time to market and the reliability of these business critical functions.”

— Manuel Sevilla, CTO Global Insights & Data Practice


“As a provider of network data analytics for IT operational intelligence, Corvil is asked to integrate with many big data solutions. Standards such as Sqoop and Flume greatly reduce the required development burden and allow us to respond to market demands with greater agility. It is hugely advantageous to see such standards being more broadly updated, and to know that implementations are validated by Cloudera.”

— Donal O’Sullivan, VP Products


“Couchbase applauds Cloudera’s moves today to provide frameworks for certifying big data architectures on open technologies such as Kafka, Flume, and Sqoop. Enterprises across every industry are building next-generation big data platforms that incorporate open technologies from NoSQL and Hadoop vendors – having a free, open test framework will make it faster and easier for them to deploy technology needed to support mission critical applications.”

— Bob Wiederhold, CEO


“As an early-adopter of open-source technologies, Fortscale believes that Cloudera’s efforts to accelerate the adoption of Apache Kafka will enable faster growth of innovative solutions that can scale effectively, thereby generating better value for customers. This was demonstrated numerous times when Fortscale successfully delivered products built using Hadoop, Kafka, and Flume as key building blocks.”

— Guy Mordecai, Director of Product Management

Intel Corporation

“Enterprises like Intel are increasingly investing in big data analytics solutions that require the collection and processing of streaming as well as bulk data. Platforms based on open standards, open source software, and open APIs can significantly reduce the cost and complexity of application development and deployment. Intel has a long-standing commitment to standards-based innovation and supports this initiative to enable greater compatibility between key components of the Apache Hadoop platform.”

— Ron Kasabian, VP Data Center Group and General Manager, Big Data Solutions

NetFlow Logic

“Furthering the adoption of open standards such as Kafka, Flume, and Sqoop in the next-generation big data platform benefits the entire community and allows users to accelerate the development and adoption of next-generation applications.”

— Damian Miller, VP Customer Success


“Today, it is more critical than ever to provide organizations with capabilities to achieve greater context around the massive amounts of data they are collecting to extract real intelligence. The work that Cloudera and their partners are doing around open standards is a significant industry milestone that we are excited to support and leverage to help organizations make better security decisions.”

— David Corlette, Sr. Product Manager, NetIQ security portfolio of Micro Focus


“It’s great that the Apache community is investing in testable APIs. This will be a tremendous benefit to ISVs such as Rocana who are building on top of a broad range of projects.”

— Eric Sammer, CTO


“Customers continue to leverage Hadoop as their analysis and visualization platform of choice. Having the right data loaded into Hadoop, as well as transforming and cleansing that data in Hadoop, means that information is readily available for visualizations, advanced analysis, and enterprise decisions. SAS supports this Cloudera initiative, as it ensures that our offerings, including SAS® Data Loader for Hadoop, improve data movement and data quality into and throughout a Hadoop cluster.”

— Randy Guard, VP Product Management


“We expect the adoption of open standards like Apache Kafka, Sqoop, and Flume to set the foundation for tomorrow’s analytic, big data workloads. Giving developers free and open access to tools fosters innovation and collaboration across the data science community, but also helps data-intensive industries re-imagine the role of high performance computing across the enterprise.”

— Bob Braham, CMO


“StreamSets, like Cloudera, is completely dedicated to furthering the adoption of open, multi-vendor standards as important building blocks in tomorrow’s big data platform. Apache Sqoop and Apache Flume, both important components of the StreamSets offering, enable the ingestion of more data, more rapidly, into next-generation analytic workloads, so we support Cloudera’s Open Source Compatibility Framework initiative as a means of accelerating their uptake across the ecosystem.”

— Arvind Prabhakar, CTO


“This initiative aligns well with Syncsort’s focus to support Kafka for parallel data ingestion to Hadoop and real-time ETL processing, and accelerates our continued contributions to Apache projects such as Sqoop.”

— Tendu Yogurtcu, General Manager for Big Data


“In our client engagements, we see that the success of impactful big data systems critically depends on the efficiency of data ingestion. This ranges from traditional batch-oriented systems to the exciting new streaming applications necessitated by the Internet of Things era. We believe that open source standards such as Kafka, Flume, and Sqoop and the associated test frameworks are central to the rapid adoption of big data applications.”

— Dr. Satya Ramaswamy, Global Head of TCS Digital Enterprise


“Teradata supports Cloudera’s initiative to further the development of Apache Sqoop and Apache Flume software. These open source technologies can be a valuable asset to the open source community, as they become a standardized component of Hadoop deployments.”

— Chris Twogood, VP Products and Services


“TIBCO is committed to providing our customers with the benefits that open standards bring. Any effort to further the adoption of open standards such as Kafka, Flume, and Sqoop in the next-generation big data platform is a positive development for customers, the industry, and the ecosystem.”

— Karl Van den Bergh, VP Product and Cloud

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera’s open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 30,000 individuals worldwide. Over 1,450 partners and a seasoned professional services team help deliver faster time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.


Cloudera, Cloudera’s Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.