
Row-based vs. column-based databases / file formats

row based –> good for OLTP (transactions: reading/writing whole records), e.g. Cassandra
column based –> good for OLAP (aggregations read only the columns they need, and columns compress well), e.g. Druid
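A toy sketch of the two layouts in plain Python (not any real database's storage engine; the table and its columns `order_id`, `customer_id`, `amount` are made up for illustration). The row store keeps each record together, so fetching one order is a single read. The column store keeps each field together, so `SUM(amount)` touches only one list.

```python
# Hypothetical example table: (order_id, customer_id, amount)
RECORDS = [
    (1, "c1", 10.0),
    (2, "c2", 25.5),
    (3, "c1", 7.25),
]

# Row-based layout: one tuple per record, all fields stored together.
row_store = list(RECORDS)

# Column-based layout: one list per field, values of a column stored together.
column_store = {
    "order_id": [r[0] for r in RECORDS],
    "customer_id": [r[1] for r in RECORDS],
    "amount": [r[2] for r in RECORDS],
}


def get_order_row(order_id):
    """OLTP-style point lookup: the whole record comes back in one piece."""
    for row in row_store:
        if row[0] == order_id:
            return row
    return None


def total_amount_row():
    """Aggregation over the row store has to walk every full record."""
    return sum(row[2] for row in row_store)


def total_amount_column():
    """Aggregation over the column store reads only the 'amount' column."""
    return sum(column_store["amount"])


def get_order_column(order_id):
    """Point lookup on the column store must stitch the record back together."""
    idx = column_store["order_id"].index(order_id)
    return tuple(column_store[name][idx] for name in ("order_id", "customer_id", "amount"))
```

Real columnar systems gain further from compression and vectorized scans over contiguous column data. The sketch only shows which data each access pattern has to touch.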

Parquet (column-based data format):
https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/

https://www.upsolver.com/blog/apache-parquet-why-use
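A simplified model of why a Parquet-style layout helps analytics. This is an assumption-laden toy, not the real Parquet encoding or any library API, and `write_row_groups`, `read_where_greater` and the sample columns are hypothetical. Rows are split into row groups, each row group stores one chunk per column with min/max statistics, and a reader touches only the columns it needs and skips row groups whose statistics rule out a match.

```python
# Simplified, hypothetical model of a Parquet-style layout (not the real format):
# rows are split into row groups, each row group holds one chunk per column,
# and each chunk records min/max statistics.


def write_row_groups(rows, columns, group_size):
    """Split rows into row groups of column chunks with min/max stats."""
    groups = []
    for start in range(0, len(rows), group_size):
        batch = rows[start:start + group_size]
        chunks = {}
        for i, name in enumerate(columns):
            values = [row[i] for row in batch]
            chunks[name] = {"values": values, "min": min(values), "max": max(values)}
        groups.append(chunks)
    return groups


def read_where_greater(groups, select, filter_col, threshold):
    """Return the `select` column for rows where filter_col > threshold.

    Row groups whose max(filter_col) <= threshold are skipped without
    reading their values (the idea behind predicate pushdown); only the
    two columns involved are read (column projection).
    Returns (matching values, number of skipped row groups).
    """
    result = []
    skipped = 0
    for chunks in groups:
        stats = chunks[filter_col]
        if stats["max"] <= threshold:
            skipped += 1
            continue
        for f, s in zip(stats["values"], chunks[select]["values"]):
            if f > threshold:
                result.append(s)
    return result, skipped


COLUMNS = ["ts", "user", "latency_ms"]
ROWS = [
    (1, "a", 5), (2, "b", 8), (3, "a", 6),      # group 0: latency 5..8
    (4, "c", 120), (5, "a", 9), (6, "b", 300),  # group 1: latency 9..300
]
```

With a group size of 3, the filter `latency_ms > 100` skips the first row group entirely because its max latency is 8, and only the second group's `latency_ms` and `user` chunks are read.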


Hadoop: big data storage (HDFS) and processing.

What are the alternatives? S3 (object storage) on the cloud?

https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/

https://stackoverflow.com/questions/31011078/data-retention-in-hadoop-hdfs
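One way to enforce data retention, assuming data is written into date-partitioned directories (the `/data/events/dt=YYYY-MM-DD` layout and the `expired_partitions` helper are hypothetical): list the partitions, compute which ones fall outside the retention window, then delete them (for example with `hdfs dfs -rm -r <path>`). The sketch only does the selection step, so it runs without a cluster.

```python
from datetime import date, timedelta


def expired_partitions(paths, today, retention_days):
    """Return partition paths whose dt=YYYY-MM-DD is older than the window.

    Assumes a hypothetical layout like /data/events/dt=2024-01-31.
    Paths without a parsable dt= component are left alone.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in paths:
        last = path.rstrip("/").rsplit("/", 1)[-1]
        if not last.startswith("dt="):
            continue  # not a date partition, never delete it
        try:
            day = date.fromisoformat(last[len("dt="):])
        except ValueError:
            continue  # malformed date, skip rather than guess
        if day < cutoff:
            expired.append(path)
    return expired
```

Keeping the "what to delete" logic separate from the actual delete command makes it easy to dry-run the policy before pointing it at real data.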


Pinot vs. Cassandra vs. Druid

https://imply.io/post/apache-cassandra-vs-apache-druid

If your queries ALWAYS constrain on a single column in the WHERE clause, for example on a field such as deviceID or customerID, and you are looking to quickly (sub-second response time) scoop up any and all data related to that ID field reliably, and you are doing nothing else, then Cassandra is your mythological creature of choice.
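A toy model of why a single-key WHERE clause suits Cassandra: rows are grouped into partitions by the partition key, and hashing the key tells you which node owns the partition. Assumptions: the 3-node cluster, the `device_id` key and the helper names are hypothetical, and `hash mod N` stands in for Cassandra's real token ring (its default partitioner uses Murmur3).

```python
import hashlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(partition_key):
    """Pick a node by hashing the partition key.

    Simplified: stable md5 hash mod N, not Cassandra's token ring.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


# nodes[node_id][partition_key] -> list of (ts, reading) rows for that key
nodes = [defaultdict(list) for _ in range(NUM_NODES)]


def insert(device_id, ts, reading):
    nodes[node_for(device_id)][device_id].append((ts, reading))


def rows_for_device(device_id):
    """WHERE device_id = ?: hash once, read one partition on one node."""
    return list(nodes[node_for(device_id)].get(device_id, []))


def rows_where_reading_above(threshold):
    """A filter on a non-key column has no partition to go to:
    every partition on every node must be scanned."""
    return [
        (key, ts, r)
        for node in nodes
        for key, rows in node.items()
        for ts, r in rows
        if r > threshold
    ]
```

In CQL, a filter on a non-key column like the second function is rejected unless you add `ALLOW FILTERING` or a secondary index, for exactly this reason.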

If your use case is such that you honestly have no idea what your WHERE clause will look like, but you know that multiple ID columns will probably need to be queried reliably in less than a few seconds, then Druid is your best bet. Queries matter, people! Know thy query, know thy database.
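A toy sketch of why ad-hoc filters on several ID columns suit Druid: Druid builds bitmap indexes on its string dimension columns, so `WHERE a = x AND b = y` becomes an AND of bitmaps, whichever columns appear. Assumptions: the rows and column names are hypothetical, and plain Python ints stand in for Druid's compressed (Roaring/Concise) bitmaps, which it builds per segment.

```python
from collections import defaultdict

# Hypothetical rows with several ID-like dimensions and one metric.
EVENTS = [
    {"device_id": "d1", "customer_id": "c1", "country": "US", "clicks": 3},
    {"device_id": "d2", "customer_id": "c1", "country": "CA", "clicks": 5},
    {"device_id": "d1", "customer_id": "c2", "country": "US", "clicks": 2},
    {"device_id": "d3", "customer_id": "c2", "country": "US", "clicks": 7},
]
DIMENSIONS = ("device_id", "customer_id", "country")


def build_bitmap_indexes(rows):
    """For every dimension value, build a bitmap (a Python int) whose
    bit i is set when row i has that value."""
    indexes = {dim: defaultdict(int) for dim in DIMENSIONS}
    for i, row in enumerate(rows):
        for dim in DIMENSIONS:
            indexes[dim][row[dim]] |= 1 << i
    return indexes


def sum_clicks_where(rows, indexes, **equals):
    """Evaluate WHERE dim1 = v1 AND dim2 = v2 ... on any mix of dimensions
    by AND-ing bitmaps, then aggregate only the matching rows."""
    mask = (1 << len(rows)) - 1  # start with all rows selected
    for dim, value in equals.items():
        mask &= indexes[dim].get(value, 0)  # unknown value -> empty bitmap
    return sum(row["clicks"] for i, row in enumerate(rows) if (mask >> i) & 1)
```

The same index serves `country="US"`, `customer_id="c2", country="US"` or any other combination, which is the "no idea what the WHERE clause will look like" case.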

Druid with Hadoop (Druid can use HDFS as deep storage and batch-ingest data from Hadoop)
https://en.wikipedia.org/wiki/Apache_Pinot
https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7


Pinot vs. Druid?


Pinot vs Druid:
                Druid                                Pinot
Architecture    Realtime + Offline, Realtime only    Realtime + Offline, Realtime only -> consistency is...

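Presto

Distributed SQL query engine; it runs SQL queries over data stored on Hadoop/HDFS (e.g. through its Hive connector) and S3, without storing the data itself.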
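A rough sketch of how a distributed SQL engine like Presto evaluates something like `SELECT region, SUM(amount) FROM sales GROUP BY region` (the `sales` table and its columns are hypothetical): the input is cut into splits (e.g. chunks of files on HDFS or S3), workers compute a partial aggregation per split, and a final stage merges the partial results. This is a single-process simplification, not Presto's code or API.

```python
from collections import Counter


def partial_aggregate(split):
    """Worker-side step: SUM(amount) GROUP BY region over one split."""
    acc = Counter()
    for region, amount in split:
        acc[region] += amount
    return acc


def final_aggregate(partials):
    """Final stage: merge the per-split partial sums into one result."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values per key
    return dict(total)


# Hypothetical input: each split is a chunk of (region, amount) rows.
SPLITS = [
    [("us", 10), ("eu", 5)],
    [("us", 3), ("apac", 8)],
    [("eu", 2)],
]
```

Because SUM can be merged from partial sums, each split only ships one small row per region to the final stage instead of all its raw rows.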

https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes

List of column-oriented DBMSes

Apache Druid

MariaDB ColumnStore