Big Data Analytics (BDA) Platforms and Tools
- 01. Apache Spark is a unified analytics engine for large-scale data processing
- 02. Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing
- 03. Trino - Fast distributed SQL query engine for big data analytics that helps you explore your data universe.
- 04. Apache Hadoop - A framework for the distributed processing of large data sets
- 05. Samza allows you to build stateful applications that process data in real-time from multiple sources including Apache Kafka.
- 06. Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows
- 08. HPCC Systems - A data lake platform that makes combining different types of data easier and faster
- 09. Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.
- 10. Apache Drill - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
- 11. Apache Druid is a real-time database to power modern analytics applications.
- 12. Apache Flink - Stateful Computations over Data Streams
- 13. The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.
- 14. Apache Hudi brings transactions, record-level updates/deletes and change streams to data lakes!
- 15. Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data
- 16. Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
- 17. Apache Kylin is an open source, distributed Analytical Data Warehouse for Big Data; it was designed to provide OLAP (Online Analytical Processing) capability in the big data era.
- 18. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
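
Several entries above contrast batch processing of bounded data sets (Hadoop) with processing of unbounded streams (Storm, Samza, Flink). The sketch below illustrates that distinction in plain Python. It does not use any of the listed tools, and the function names `batch_word_count` and `streaming_word_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch style: consume the whole bounded data set, then return one result."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: keep running state and emit an updated result per event."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        # Yield a snapshot so callers can react before the stream ends
        # (a real stream may never end).
        yield dict(counts)


if __name__ == "__main__":
    # Hypothetical sample events; a real system would read from a log or queue.
    events = ["spark flink", "flink kafka", "kafka kafka"]
    print(batch_word_count(events))
    for snapshot in streaming_word_count(events):
        print(snapshot)
```

The batch function can only answer once all input has been read, while the streaming generator produces a usable answer after every event. The final streaming snapshot equals the batch result when the input happens to be finite.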