Free and open-source software Columnar DB
Database Name | Language Implemented in | Notes |
---|---|---|
Apache Druid | Java | started in 2011 for low-latency massive ingestion and queries |
Apache Kudu | C++ | released in 2016 to complete the Apache Hadoop ecosystem |
Apache Pinot | Java | open sourced in 2015 for real-time low-latency analytics |
Calpont InfiniDB | C++ | |
ClickHouse | C++ | released in 2016 to analyze data that is updated in real time |
CrateDB | Java | |
C-Store | ||
DuckDB | C++ | An embeddable, in-process, column-oriented SQL OLAP RDBMS |
Databend | Rust | An elastic and reliable Serverless Data Warehouse |
InfluxDB | Go | time series database |
Greenplum Database | C | |
PostgreSQL cstore fdw,[1] vops[2] | C | cstore_fdw uses ORC format |
MariaDB ColumnStore | C & C++ | formerly Calpont InfiniDB |
MapD | C++ | |
Metakit | C++ | |
MonetDB | C |
Columns store db:
are mainly relational, primarily used for data warehouse applications
store /access as columns
optimized for answering questions about a single attribute
MariaDB ColumnStore
it could be select, sum join, where , group by order by etc on columnar db:
select c.c_mktsegment cust_mkt_segment,
sum(o.o_totalprice) total_order_amount
from orders o join customer c on o.o_custkey = c.c_custkey
where c.c_nationkey = 24
and o.o_orderDATE >= ‘2016-10-01’
and o.o_orderDATE < ‘2017-01-01’
group by total_order_amount
order by total_order_amount;
Bigtable/HBase/Cassandra ( Column Family db):
Bigtable/HBase ( schema-less)
it is a column family db,( based on google’s bigtable struct)
not columnar-db
good for big data on hadoop, better more than 5 servers
and stored/accessed as row ( or column/row mixed )
optimized for answers questions about a row
CF1 CF2
col1, col2 col3
where CF1, CF2 are stored as column
and col1/2 are stored as row
hbase shell:
create ‘customers’, ‘details’,’relatives’,’accounts’
# put <table>,<row key>, <CF:Qualifier>, <Value> //insert or update the row
put ‘customers’, ‘101’,’details:name’, ‘Adam’ # can only write one qualifier/value at one time
put ‘customers’, ‘101’,’details:Email’, ‘Adam@email.com’
scan ‘customers’, {COLUMNS => ‘details’}
alter ‘customers’, {NAME =>’accounts’, VERSIONS =>5 } # enable 5 versions of accounts
get ‘customers’, ‘102’, { COLUMN=>’accounts:checking’, VERSION=>5} #return multiple versions
delete ‘customers’, ‘101’, ‘accounts:business’ # delete that column family’s qualifier/attributes
show_filters
scan ‘customers’, {COLUMNS=>’details:name’, FILTER => “ValueFilter(>,’binaryprefix:B’)”}
Cassandra ( need schema)
used to be schema-less, but as Cassandra matured, its developers started to realize that schema-less isn’t as great as they once thought it was. Cassandra is now schema-full, all the way
If you need to use “group by,order by,count,sum,ifnull,concat ,joins and some times nested querys” as you state then you probably don’t want to use Cassandra, since it doesn’t support most of those.
CQL supports COUNT, but only up to 10000. It supports ORDER BY, but only on clustering keys. The other things you mention are not supported at all.
References
https://dzone.com/articles/how-mariadb-columnstore-handles-big-data-workloads