Big Data Analytics (BDA) Platforms and Tools
-
01.
Apache Spark is a unified analytics engine for
large-scale data processing.
-
02.
Apache Storm is a free and open-source
distributed real-time computation system. It
makes it easy to reliably process unbounded streams
of data, doing for real-time processing what
Hadoop did for batch processing.
-
03.
Trino - A fast distributed SQL query engine for
big data analytics that helps you explore your
data universe.
-
04.
Apache Hadoop - A framework for the distributed
processing of large data sets.
-
05.
Apache Samza allows you to build stateful
applications that process data in real time from
multiple sources, including Apache Kafka.
-
06.
Apache Airflow is a platform created by the
community to programmatically author, schedule
and monitor workflows.
-
08.
HPCC Systems - A data lake platform that makes
combining different types of data easier and
faster.
-
09.
Delta Lake is an open-source storage framework
that enables building a Lakehouse architecture
with compute engines including Spark, PrestoDB,
Flink, Trino, and Hive, and with APIs for Scala,
Java, Rust, Ruby, and Python.
-
10.
Apache Drill - A schema-free SQL query engine for
Hadoop, NoSQL, and cloud storage.
-
11.
Apache Druid is a real-time database to power
modern analytics applications.
-
12.
Apache Flink - Stateful computations over data
streams.
-
13.
The Apache Hive data warehouse software
facilitates reading, writing, and managing large
datasets residing in distributed storage using
SQL.
-
14.
Apache Hudi brings transactions, record-level
updates/deletes, and change streams to data
lakes.
-
15.
Apache Iceberg is a high-performance format for
huge analytic tables. It brings the reliability
and simplicity of SQL tables to big data.
-
16.
Apache Kafka is an open-source distributed
event streaming platform for high-performance
data pipelines, streaming analytics, data
integration, and mission-critical
applications.
-
17.
Apache Kylin is an open-source, distributed
analytical data warehouse for big data, designed
to provide OLAP (Online Analytical Processing)
capability in the big data era.
-
18.
Presto is an open source distributed SQL query
engine for running interactive analytic queries
against data sources of all sizes ranging from
gigabytes to petabytes.
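Several entries above (Flink's "stateful computations over data streams", Samza's "stateful applications", Storm's "unbounded streams") rest on the same idea: an operator reads a never-ending source one event at a time and keeps state that persists between events. The following is a minimal plain-Python sketch of that idea under stated assumptions. It does not use the Flink, Samza, or Storm APIs, and `keyed_running_sum`, `clicks`, and the `(key, value)` event shape are hypothetical names chosen only for illustration.

```python
from collections import defaultdict
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, int]


def keyed_running_sum(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Consume events one at a time and emit (key, running_sum) after each.

    The per-key totals in `state` are the operator's state: they persist
    across events, so each output depends on every earlier event with that key.
    The input is never materialized, so it may be unbounded.
    """
    state: Dict[str, int] = defaultdict(int)
    for key, value in events:
        state[key] += value
        yield key, state[key]


def clicks() -> Iterator[Event]:
    """Hypothetical unbounded source: one click per user, cycling forever."""
    users = ["alice", "bob", "carol"]
    for i in count():
        yield users[i % len(users)], 1


if __name__ == "__main__":
    # The source never ends, so only the first 7 results are taken.
    for key, total in islice(keyed_running_sum(clicks()), 7):
        print(key, total)
```

Production engines such as Flink add what this sketch leaves out, for example partitioning keyed state across machines and checkpointing it so that it survives failures.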