Chinese Yellow Pages | Classifieds | Knowledge | Tax | IME

Free and open-source software Columnar DB

Database Name Language Implemented in Notes
Apache Druid Java started in 2011 for low-latency massive ingestion and queries
Apache Kudu C++ released in 2016 to complete the Apache Hadoop ecosystem
Apache Pinot Java open sourced in 2015 for real-time low-latency analytics
Calpont InfiniDB C++
ClickHouse C++ released in 2016 to analyze data that is updated in real time
CrateDB Java
DuckDB C++ An embeddable, in-process, column-oriented SQL OLAP RDBMS
Databend Rust An elastic and reliable Serverless Data Warehouse
InfluxDB Go time series database
Greenplum Database C
PostgreSQL cstore fdw,[1] vops[2] C cstore_fdw uses ORC format
MariaDB ColumnStore C & C++ formerly Calpont InfiniDB
MapD C++
Metakit C++
MonetDB C


Columns store db:

are mainly relational, primarily used for data warehouse applications

store /access as columns

optimized for answering questions about a single attribute

MariaDB ColumnStore

it could be select, sum join, where , group by order by etc  on columnar db:

select c.c_mktsegment cust_mkt_segment,

sum(o.o_totalprice) total_order_amount

from orders o join customer c on o.o_custkey = c.c_custkey

where c.c_nationkey = 24

and o.o_orderDATE >= ‘2016-10-01’

and o.o_orderDATE < ‘2017-01-01’

group by total_order_amount

order by total_order_amount;


Bigtable/HBase/Cassandra ( Column Family db):

Bigtable/HBase ( schema-less)

it is a column family db,( based on google’s bigtable struct)

not columnar-db

good for big data on hadoop, better more than 5 servers

and stored/accessed as row ( or column/row mixed )

optimized for answers questions about a row

CF1          CF2

col1, col2    col3

where CF1, CF2 are stored as column

and col1/2 are stored as row

hbase shell:

create ‘customers’, ‘details’,’relatives’,’accounts’

# put <table>,<row key>, <CF:Qualifier>, <Value>  //insert or update the row

put ‘customers’, ‘101’,’details:name’, ‘Adam’  # can only write one qualifier/value at one time

put ‘customers’, ‘101’,’details:Email’, ‘’

scan ‘customers’, {COLUMNS => ‘details’}

alter ‘customers’, {NAME =>’accounts’, VERSIONS =>5 } # enable 5 versions of accounts

get ‘customers’, ‘102’, { COLUMN=>’accounts:checking’, VERSION=>5} #return multiple versions

delete ‘customers’, ‘101’, ‘accounts:business’ # delete that column family’s qualifier/attributes


scan ‘customers’, {COLUMNS=>’details:name’, FILTER => “ValueFilter(>,’binaryprefix:B’)”}


Cassandra ( need schema)

used to be schema-less, but as Cassandra matured, its developers started to realize that schema-less isn’t as great as they once thought it was. Cassandra is now schema-full, all the way

If you need to use “group by,order by,count,sum,ifnull,concat ,joins and some times nested querys” as you state then you probably don’t want to use Cassandra, since it doesn’t support most of those.

CQL supports COUNT, but only up to 10000. It supports ORDER BY, but only on clustering keys. The other things you mention are not supported at all.