Category Archives: DBA Things

Learning beyond SQL…PostgreSQL – Indexes

It’s been some time since I’ve blogged, even though I’ve been reading a lot all this while. One of the reasons is that I couldn’t find enough compelling topics to write about and share. Microsoft has been moving fast with its SQL Server releases (2012, 2014, 2016… and we’re talking Linux beta right now) and I’ve always been catching up.

However, amid all this, something has changed. Due to the ever-growing buzz around Open Source, I haltingly started looking into PostgreSQL. Truth be told, I’m starting from ground zero (so nothing to lose) and will be writing on topics that might sound too simple for some of you; nevertheless, you may still find them helpful.

So starting with Indexes in PostgreSQL

PostgreSQL offers several index types:

  • B-tree
  • Hash
  • GiST and GIN

Each index type uses a different algorithm that is best suited to different types of queries. In this post, we’ll talk about B-tree indexes.

Why Index?

  • Speed up data retrieval
  • Indexes record the on-disk location of each indexed value, so matching rows can be fetched directly, reducing data retrieval time
  • Without indexes, PostgreSQL performs a sequential scan of the whole table when searching for data (this applies to SELECTs as well as DMLs)
  • A B-tree index is sorted in ascending order by default


#Create index syntax
CREATE INDEX name ON table USING btree (column);

#Check the existing indexes on a table (they can also be created implicitly by a PRIMARY KEY or UNIQUE constraint)
SELECT * FROM pg_indexes WHERE schemaname = 'public';
SELECT * FROM pg_stat_all_indexes WHERE schemaname NOT IN ('pg_catalog', 'pg_toast');

#Query a table without a filter condition and get the query plan using EXPLAIN
EXPLAIN SELECT * FROM film;

Seq Scan on film  (cost=0.00..127.00 rows=2000 width=384)


#Query a table with a filter condition and get the query plan using EXPLAIN
EXPLAIN SELECT title AS name, release_year AS year
FROM film
WHERE title IN ('Clones Pinocchio', 'Vanilla Day');

Index Scan using idx_title on film  (cost=0.28..21.63 rows=4 width=19)
  Index Cond: ((title)::text = ANY ('{"Clones Pinocchio","Vanilla Day"}'::text[]))


Here, once a WHERE condition is specified, the Postgres planner (aka optimizer) chooses the index instead of sequentially scanning the table: Postgres locates the targeted rows in the index and then fetches them from disk selectively.
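If you want to convince yourself that the index really beats a sequential scan, you can temporarily discourage sequential scans in a test session and compare the estimated costs. `enable_seqscan` is a standard PostgreSQL planner parameter; the table and query below reuse the film example from above:

```sql
-- For experimentation only: discourage sequential scans so EXPLAIN
-- shows what the alternative plan would cost (never set this in production)
SET enable_seqscan = off;
EXPLAIN SELECT title FROM film WHERE title = 'Vanilla Day';

-- Restore the planner default for this session
SET enable_seqscan = on;
```

With `enable_seqscan = off` the planner still uses the index here; running the same EXPLAIN on a column with no index instead shows a sequential scan with a heavily penalized cost, which makes the comparison visible.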

It’s a very wide topic, so I’ll write more about indexes in future posts. To sum it up, PostgreSQL provides a lot of flexibility with B-tree indexes, so they can be tuned to suit your requirements and keep your queries snappy.

Identify queries that consume a large amount of log space in SQL Server

One of the regular issues DBAs face is T-log growth: situations wherein one “bad” or poorly designed query can eat up the entire T-log space, bring the free space to zero, and then bring your application down. The cause and remedy of most of these issues is discussed in KB # 317375 (I’m a big fan of Microsoft KBs).

While the KB discusses the causes of, and approaches to dealing with, high T-log growth, it also hints at how we can ‘proactively’ find the queries that are consuming T-log space at any given moment using DMVs. Taking a cue from this, I have written the T-SQL code below:

Identify queries consuming large T-log space:

-- Description: T-SQL to find queries that consume a large amount of log space in SQL Server
-- Source: KB # 317375
-- Author: varun.dhawan
SELECT dtst.session_id                                                       AS [Session ID],
       CAST(DB_NAME(dtdt.database_id) AS VARCHAR(20))                        AS [Database],
       SUBSTRING(st.text, (der.statement_start_offset / 2) + 1,
                 ((CASE der.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE der.statement_end_offset
                   END - der.statement_start_offset) / 2) + 1)               AS [Query Text],
       COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.'
                + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) + N'.'
                + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '')          AS [Query Object],
       dtdt.database_transaction_log_bytes_used / 1024.0 / 1024.0            AS [MB used],
       dtdt.database_transaction_log_bytes_used_system / 1024.0 / 1024.0     AS [MB used system],
       dtdt.database_transaction_log_bytes_reserved / 1024.0 / 1024.0        AS [MB reserved],
       dtdt.database_transaction_log_bytes_reserved_system / 1024.0 / 1024.0 AS [MB reserved system],
       dtdt.database_transaction_log_record_count                            AS [Rec count]
FROM   sys.dm_tran_database_transactions dtdt
       JOIN sys.dm_tran_session_transactions dtst
         ON dtdt.transaction_id = dtst.transaction_id
       JOIN sys.dm_exec_requests der
         ON dtst.session_id = der.session_id
       CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS st;

Hope this helps you too!
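To see this in action on a scratch database, open an uncommitted transaction in one session and run the DMV query above from a second session; the table name below is hypothetical:

```sql
-- Session 1: start a transaction that accumulates log space and leave it open
BEGIN TRANSACTION;
DELETE FROM dbo.big_table;   -- hypothetical large table; do not commit yet

-- Session 2: run the DMV query above. Session 1 shows up with a growing
-- [MB used] figure until it commits or rolls back.
```

Because `sys.dm_tran_session_transactions` only reports active transactions, the row disappears as soon as Session 1 commits or rolls back, which is why this is a “run it at the moment the log is growing” technique.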

Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.


What is a columnstore index?

WARNING: This blog post is based on pre-release software, so things could change. For more details on the CTP, please refer to the SQL Server Code-Named “Denali” CTP1 Release Notes.

The upcoming SQL Server release introduces a new data warehouse query acceleration feature based on a new type of index called the columnstore. Before we move any further in exploring this new feature, I want to take some time to explain the basics behind a columnstore index and how it differs from a traditional index (rowstore).

What is columnstore? And what is a rowstore?

To understand this, let’s look at the simple illustration below. Here I have a table with 4 columns (First name, Email, Phone, Street Address). Below is a representation of how the index data will be stored, along with the associated pros and cons.


As opposed to a rowstore, a columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data has traditionally been stored. So in the above example, the columns (First name, Email, Phone, Street Address) are stored in different groups of pages in a columnstore index.

So what’s BAD with rowstore design?

Say we run a query like ‘SELECT first_name, phone FROM emp’. In a rowstore design, the DBMS transfers the ENTIRE ROW from disk to the memory buffer even though the query requires just 2 attributes. For large, read-intensive queries, that is a lot of unnecessary disk I/O, wasting precious disk bandwidth.

And what’s good with columnstore design?

1. Better performance for SELECTs – only the attributes needed to answer a query are fetched from disk, thereby saving on disk I/O.
2. Better compression ratio – the redundancy of data within a single column makes it easier to compress.

Are they really that good?

Wait, “there’s no free lunch”. Due to the change in index storage design, tuple (row) writes are very expensive on a columnstore index. As such, in Denali, tables with columnstore indexes can’t be updated directly using INSERT, UPDATE, DELETE, or MERGE statements, or bulk load operations. Hence, to perform DML on such a table, we may need to disable or drop the index temporarily and re-create it after the DML activity.
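As a rough sketch of that workflow (using the `emp` table from the illustration above; the exact CTP syntax may differ in the final release):

```sql
-- Create a columnstore index over the sample columns
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_emp
    ON emp (first_name, email, phone, street_address);

-- To load or modify data, take the index out of the way first...
ALTER INDEX ncci_emp ON emp DISABLE;

INSERT INTO emp (first_name, email, phone, street_address)
VALUES ('John', 'john@example.com', '555-0100', '1 Main St');

-- ...then rebuild it once the DML is done
ALTER INDEX ncci_emp ON emp REBUILD;
```

For batch-loading scenarios this disable/rebuild pattern is workable; for frequently updated tables, a columnstore index is simply the wrong tool in this release.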

Hope this gives you some initial understanding of ROWSTORE vs. COLUMNSTORE. This feature is expected to be available in the next CTP build of Denali, so once we have that build I will be able to share a demo.

Thanks for reading!

