Home > DBA Things > What is a columnstore index?

What is a columnstore index?

WARNING: The blog-post is based on pre-release software so things could change. For more details on CTP, please refer SQL Server Code-Named “Denali” CTP1 Release Notes

Upcoming SQL Product, introduces a new data warehouse query acceleration feature based on a new type of index called columnstore. Before we move any further exploring this new feature, I want to take time to explain the basics behind a columnstore index and how different is it from a traditional index (rowstore).

What is columnstore? And what is a rowstore?

To understand this, lets see a simple illustration below. Here I have a table with 4 columns (First name, Email, Phone, Street Address) . Below is a representation of how the index data will be stored and their associated pros and cons.

image

As opposed to a rowstore, a columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. So in above example, columns (First name, Email, Phone, Street Address) are stored in different groups of pages in a columnstore index.

So what’s BAD with rowstore design?

Say if we have to run a query like ‘select first_name, phone from emp’. In a rowstore design, DBMS will transfer the ENTIRE ROW from disk to memory buffer even though the query required just 2 attributes. In case of a large read intensive queries, we do so much of un-necessary disk I/O and thus wasting precious disk bandwidth.

And what’s good with columnstore design?

1. Better performance for SELECT’s – only the attributes needed to solve a query are fetched from disk, thereby saving on disk I/O.
2. Better compression ratio – it’s easier to compress the data due to the redundancy of data within a column

Really are they so good?

Wait, “There’s no free lunch”. Due to change in the index storage design, any tuple (row) writes are very expensive on a column store index. As such, in Denali, tables with columnstore indexes can’t be updated directly using INSERT, UPDATE, DELETE, and MERGE statements, or bulk load operations. Hence to perform a DML on table, we may need to disable/drop an index temporarily and then re-create post DML activity.

Hope this provides you with some initial understanding of a ROWSTORE vs COLUMNSTORE.  This feature is expected to be available in next CTP build of Denali, so once we have the build I will be able to share a demo.

Thanks of reading!

Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.

About these ads
  1. January 13, 2012 at 8:14 am

    Great Post varun, You explained above example in very simple manner as usual you do. Very usefull for me. Thanks.

    • January 13, 2012 at 10:56 am

      Hello Gaurav
      Glad to know that you liked the post…this actually came in a question from #vinod_sql (SQM MVP) during #SQLSAT116 session…

      Thanks, for reading
      Varun

    • January 14, 2012 at 6:09 am

      Hello Gaurav
      Glad to know that you liked the post…this actually came in a question from #vinod_sql (SQM MVP) during #SQLSAT116 session…

      Thanks, for reading
      Varun

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: