row-based vs column-based DB or format
row based -> good for OLTP (transactions), e.g. Cassandra (a wide-column store, accessed by partition key)
column based -> good for OLAP (aggregations only need to read the columns they touch), e.g. Druid
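A rough sketch of the difference in plain Python. The records and field names here are made up for illustration and do not reflect how any particular database stores data. The same table is held row-wise and column-wise: a point lookup wants one whole record, while an aggregation over one field only needs one list in the column layout.

```python
# Toy data; the field names are hypothetical.
rows = [
    {"id": 1, "customer": "a", "amount": 10.0},
    {"id": 2, "customer": "b", "amount": 25.5},
    {"id": 3, "customer": "a", "amount": 4.5},
]

# Column layout: one list per field, all in the same row order.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def get_row(row_store, row_id):
    """OLTP-style access: fetch one whole record by id."""
    for r in row_store:
        if r["id"] == row_id:
            return r
    return None


def total_amount_row_store(row_store):
    """Aggregation over rows: every record is visited in full."""
    return sum(r["amount"] for r in row_store)


def total_amount_column_store(col_store):
    """Aggregation over columns: only the 'amount' list is read."""
    return sum(col_store["amount"])
```

Both totals agree; the point is only which data has to be touched to compute them.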
Parquet (column-based data format):
https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/
https://www.upsolver.com/blog/apache-parquet-why-use
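One reason column-based formats tend to be compact: values in a single column are similar to each other, so they encode well. Parquet supports dictionary and run-length encodings; the snippet below is only a toy illustration of those two ideas in plain Python, not Parquet's actual on-disk layout, and the function names are my own.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    table = []   # code -> original value
    index = {}   # original value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(table)
            table.append(v)
        codes.append(index[v])
    return table, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, count) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


def decode(table, runs):
    """Expand (code, count) pairs back into the original values."""
    return [table[code] for code, count in runs for _ in range(count)]


country = ["US", "US", "US", "DE", "DE", "US"]
table, codes = dictionary_encode(country)
runs = run_length_encode(codes)
```

Six strings become a two-entry table plus three (code, count) pairs, and `decode(table, runs)` gives back the original column.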
Hadoop: big data storage (HDFS),
what are the alternatives? S3 in the cloud?
https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/
https://stackoverflow.com/questions/31011078/data-retention-in-hadoop-hdfs
Pinot vs Cassandra vs Druid
https://imply.io/post/apache-cassandra-vs-apache-druid
If your queries ALWAYS constrain on a single column in the WHERE clause, for example on a field such as deviceID or customerID, and you are looking to quickly (sub-second response time) scoop up any and all data related to that ID field reliably, and you are doing nothing else, then Cassandra is your mythological creature of choice.
If your use case is such that you honestly have no idea what your WHERE clause will look like, but you know that multiple ID columns will probably need to be queried reliably in less than a few seconds, then Druid is your best bet. Queries matter, people! Know thy query, know thy database.
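The two query patterns in the quote can be sketched in plain Python. This is a toy model under my own assumptions (made-up event records, hypothetical function names), not how Cassandra or Druid are implemented: the first structure is organised around one key, the second keeps a per-column index so that filters on any combination of columns can be intersected.

```python
from collections import defaultdict

# Toy events; field names and values are hypothetical.
events = [
    {"device_id": "d1", "country": "US", "status": "ok"},
    {"device_id": "d2", "country": "DE", "status": "error"},
    {"device_id": "d1", "country": "US", "status": "error"},
    {"device_id": "d3", "country": "US", "status": "ok"},
]

# Pattern 1: everything is organised around a single key (device_id).
by_device = defaultdict(list)
for e in events:
    by_device[e["device_id"]].append(e)


def lookup_device(device_id):
    """Fast when the WHERE clause is always on device_id, useless otherwise."""
    return by_device.get(device_id, [])


# Pattern 2: one index per column, mapping value -> set of row positions.
column_index = defaultdict(lambda: defaultdict(set))
for pos, e in enumerate(events):
    for col, val in e.items():
        column_index[col][val].add(pos)


def filter_rows(**conditions):
    """AND together equality filters on any columns via set intersection."""
    matching = set(range(len(events)))
    for col, val in conditions.items():
        matching &= column_index[col].get(val, set())
    return [events[i] for i in sorted(matching)]
```

For example, `lookup_device("d1")` returns the two d1 events, while `filter_rows(country="US", status="ok")` answers a question the key-based structure could only handle by scanning everything.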
Druid with Hadoop
https://en.wikipedia.org/wiki/Apache_Pinot
https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7
Pinot vs Druid?
Presto
runs on Hadoop/HDFS? SQL-like?
https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes
List of column-oriented DBMSes
Apache Druid
MariaDB ColumnStore