What is a columnstore index?
WARNING: The blog-post is based on pre-release software so things could change. For more details on CTP, please refer SQL Server Code-Named “Denali” CTP1 Release Notes
Upcoming SQL Product, introduces a new data warehouse query acceleration feature based on a new type of index called columnstore. Before we move any further exploring this new feature, I want to take time to explain the basics behind a columnstore index and how different is it from a traditional index (rowstore).
What is columnstore? And what is a rowstore?
To understand this, lets see a simple illustration below. Here I have a table with 4 columns (First name, Email, Phone, Street Address) . Below is a representation of how the index data will be stored and their associated pros and cons.
As opposed to a rowstore, a columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. So in above example, columns (First name, Email, Phone, Street Address) are stored in different groups of pages in a columnstore index.
So what’s BAD with rowstore design?
Say if we have to run a query like ‘select first_name, phone from emp’. In a rowstore design, DBMS will transfer the ENTIRE ROW from disk to memory buffer even though the query required just 2 attributes. In case of a large read intensive queries, we do so much of un-necessary disk I/O and thus wasting precious disk bandwidth.
And what’s good with columnstore design?
1. Better performance for SELECT’s – only the attributes needed to solve a query are fetched from disk, thereby saving on disk I/O.
2. Better compression ratio – it’s easier to compress the data due to the redundancy of data within a column
Really are they so good?
Wait, “There’s no free lunch”. Due to change in the index storage design, any tuple (row) writes are very expensive on a column store index. As such, in Denali, tables with columnstore indexes can’t be updated directly using INSERT, UPDATE, DELETE, and MERGE statements, or bulk load operations. Hence to perform a DML on table, we may need to disable/drop an index temporarily and then re-create post DML activity.
Hope this provides you with some initial understanding of a ROWSTORE vs COLUMNSTORE. This feature is expected to be available in next CTP build of Denali, so once we have the build I will be able to share a demo.
Thanks of reading!
Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.