Free and open-source software Columnar DB


Database Name	Language Implemented in	Notes
Apache Druid	Java	started in 2011 for low-latency massive ingestion and queries
Apache Kudu	C++	released in 2016 to complete the Apache Hadoop ecosystem
Apache Pinot	Java	open sourced in 2015 for real-time low-latency analytics
Calpont InfiniDB	C++
ClickHouse	C++	released in 2016 to analyze data that is updated in real time
CrateDB	Java
C-Store
DuckDB	C++	An embeddable, in-process, column-oriented SQL OLAP RDBMS
Databend	Rust	An elastic and reliable Serverless Data Warehouse
InfluxDB	Go	time series database
Greenplum Database	C
PostgreSQL cstore fdw,^[1] vops^[2]	C	cstore_fdw uses ORC format
MariaDB ColumnStore	C & C++	formerly Calpont InfiniDB
MapD	C++
Metakit	C++
MonetDB	C

Columns store db:

are mainly relational, primarily used for data warehouse applications

store /access as columns

optimized for answering questions about a single attribute

MariaDB ColumnStore

it could be select, sum join, where , group by order by etc on columnar db:

select c.c_mktsegment cust_mkt_segment,

sum(o.o_totalprice) total_order_amount

from orders o join customer c on o.o_custkey = c.c_custkey

where c.c_nationkey = 24

and o.o_orderDATE >= ‘2016-10-01’

and o.o_orderDATE < ‘2017-01-01’

group by total_order_amount

order by total_order_amount;

Bigtable/HBase/Cassandra ( Column Family db):

Bigtable/HBase ( schema-less)

it is a column family db,( based on google’s bigtable struct)

not columnar-db

good for big data on hadoop, better more than 5 servers

and stored/accessed as row ( or column/row mixed )

optimized for answers questions about a row

CF1 CF2

col1, col2 col3

where CF1, CF2 are stored as column

and col1/2 are stored as row

hbase shell:

create ‘customers’, ‘details’,’relatives’,’accounts’

# put <table>,<row key>, <CF:Qualifier>, <Value> //insert or update the row

put ‘customers’, ‘101’,’details:name’, ‘Adam’ # can only write one qualifier/value at one time

put ‘customers’, ‘101’,’details:Email’, ‘Adam@email.com’

scan ‘customers’, {COLUMNS => ‘details’}

alter ‘customers’, {NAME =>’accounts’, VERSIONS =>5 } # enable 5 versions of accounts

get ‘customers’, ‘102’, { COLUMN=>’accounts:checking’, VERSION=>5} #return multiple versions

delete ‘customers’, ‘101’, ‘accounts:business’ # delete that column family’s qualifier/attributes

show_filters

scan ‘customers’, {COLUMNS=>’details:name’, FILTER => “ValueFilter(>,’binaryprefix:B’)”}

Cassandra ( need schema)

used to be schema-less, but as Cassandra matured, its developers started to realize that schema-less isn’t as great as they once thought it was. Cassandra is now schema-full, all the way

If you need to use “group by,order by,count,sum,ifnull,concat ,joins and some times nested querys” as you state then you probably don’t want to use Cassandra, since it doesn’t support most of those.

CQL supports COUNT, but only up to 10000. It supports ORDER BY, but only on clustering keys. The other things you mention are not supported at all.

References

https://dzone.com/articles/how-mariadb-columnstore-handles-big-data-workloads

column-oriented DB

ByMin Wang

Free and open-source software Columnar DB

Columns store db:

MariaDB ColumnStore

Bigtable/HBase/Cassandra ( Column Family db):

Bigtable/HBase ( schema-less)

Cassandra ( need schema)

References

By Min Wang

Related Post

Amazon data services

kafka msg format, how to publish, read

Cassandra Query

You missed

troubleshooing missing ip in k8s ( metallb-system)

Q&A: Fine-Tuning and Guidance on diffusion models

coding judge system

what is std::forward and universal reference