Index Uniqueness Overhead


I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.



Background



I have previously read a discussion stating that a Unique index costs nothing extra to maintain, since an Insert operation implicitly seeks to where the new row fits in the B-tree anyway; in a non-unique index, if a duplicate key is found, a uniquifier is appended to the end of the key, but otherwise the row is inserted directly. In this sequence of events, a Unique index has no additional cost.



My coworker counters that uniqueness is enforced as a second operation after the seek to the new position in the B-tree, and is thus more costly to maintain than a non-unique index.



At one extreme, I have seen tables with an identity column (inherently unique) as the clustering key, but explicitly declared non-unique. At the other extreme is my obsession with uniqueness: I create every index as unique, and when there is no naturally unique key for an index, I append the PK of the table to the end of the index key to guarantee uniqueness.



I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.



Question



Does uniqueness have an additional cost on the back-end of an Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?



Example Table Definition



create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varchar(100) not null,
is_deleted bit not null default(0),
primary key nonclustered(id desc),
unique clustered(dt desc, id desc)
);

create index
[nonunique_nonclustered_example]
on #test_index
(is_deleted)
include
(val);

create unique index
[unique_nonclustered_example]
on #test_index
(is_deleted, dt desc, id desc)
include
(val);


Example



An example of why I would add the Unique key to the end of an index is in one of our fact tables. There is a Primary Key that is an Identity column. However, the Clustered Index is instead on the partitioning column, followed by three foreign key dimension columns with no uniqueness. Select performance on this table is abysmal, and I frequently get better seek times using the Primary Key with a key lookup rather than leveraging the Clustered Index. Other tables that follow a similar design, but have the Primary Key appended to the end of the clustered key, have considerably better performance.



-- date_int is equivalent to convert(int, convert(varchar, current_timestamp, 112))
if not exists(select * from sys.partition_functions where [name] = N'pf_date_int')
create partition function
pf_date_int (int)
as range right for values
(19000101, 20180101, 20180401, 20180701, 20181001, 20190101, 20190401, 20190701);
go

if not exists(select * from sys.partition_schemes where [name] = N'ps_date_int')
create partition scheme
ps_date_int
as partition
pf_date_int all
to
([PRIMARY]);
go

if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.bad_fact_table'))
create table dbo.bad_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
fk_id int not null,
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_bad_fact_table] clustered (date_int, group_id, group_entity_id, fk_id)
)
on ps_date_int(date_int);
go

if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.better_fact_table'))
create table dbo.better_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_better_fact_table] clustered(date_int, group_id, group_entity_id, id)
)
on ps_date_int(date_int);
go
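For illustration, a sketch of the kind of comparison I mean (the filter values are made up, and the PK index name is hypothetical, since the primary key constraint above is unnamed and SQL Server generates a name for it):

-- forcing each access path on dbo.bad_fact_table; names and values below are illustrative only
select id, dt, group_id, group_entity_id
from dbo.bad_fact_table with (index(ci_bad_fact_table)) -- seek on the non-unique clustered index
where date_int = 20190401
and group_id = 42;

select id, dt, group_id, group_entity_id
from dbo.bad_fact_table with (index(pk_bad_fact_table)) -- hypothetical PK index name: PK seek plus key lookups
where id in (1001, 1002, 1003)
and date_int = 20190401;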









      sql-server index-tuning database-internals unique-constraint






asked Jul 15 at 16:00 by Solonotix, edited Jul 15 at 18:45 by Paul White
          4 Answers


















          16















          I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.




The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc., not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.




          Does uniqueness have an additional cost on the back-end of an Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?




Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column, which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
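For instance, a minimal sketch (hypothetical table and column names) of declaring the business rule directly:

-- Hypothetical names; the point is that the constraint itself creates the unique index.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT UQ_Customer_Email UNIQUE NONCLUSTERED (Email);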



          Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?



In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of B-tree index keys, in that they always need to be unique.



          As David Browne mentioned in a comment:





Since every nonclustered index is stored as a unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would be in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.





          Take the following minimally complete and verifiable example:



USE tempdb;

DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
    id int IDENTITY(1,1) NOT NULL
        CONSTRAINT IndexTest_pk
        PRIMARY KEY CLUSTERED
    , rowDate datetime NOT NULL
);


I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second index's key definition:



          CREATE INDEX IndexTest_rowDate_ix01
          ON dbo.IndexTest(rowDate);

          CREATE UNIQUE INDEX IndexTest_rowDate_ix02
          ON dbo.IndexTest(rowDate, id);


Next, we'll insert several rows into the table:



          INSERT INTO dbo.IndexTest (rowDate)
          VALUES (DATEADD(SECOND, 0, GETDATE()))
          , (DATEADD(SECOND, 0, GETDATE()))
          , (DATEADD(SECOND, 0, GETDATE()))
          , (DATEADD(SECOND, 1, GETDATE()))
          , (DATEADD(SECOND, 2, GETDATE()));


          As you can see above, three rows contain the same value for the rowDate column, and two rows contain unique values.



          Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE command:



          DECLARE @dbid int = DB_ID();
          DECLARE @fileid int;
          DECLARE @pageid int;
          DECLARE @indexid int;

          SELECT @fileid = ddpa.allocated_page_file_id
          , @pageid = ddpa.allocated_page_page_id
          FROM sys.indexes i
          CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
          WHERE i.name = N'IndexTest_rowDate_ix01'
          AND ddpa.is_allocated = 1
          AND ddpa.is_iam_page = 0;

          PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
          DBCC TRACEON(3604);
          DBCC PAGE (@dbid, @fileid, @pageid, 1);
          DBCC TRACEON(3604);
          PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';

          SELECT @fileid = ddpa.allocated_page_file_id
          , @pageid = ddpa.allocated_page_page_id
          FROM sys.indexes i
          CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
          WHERE i.name = N'IndexTest_rowDate_ix02'
          AND ddpa.is_allocated = 1
          AND ddpa.is_iam_page = 0;

          PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
          DBCC TRACEON(3604);
          DBCC PAGE (@dbid, @fileid, @pageid, 1);
          DBCC TRACEON(3604);
          PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';


          I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.



(Screenshot of the Beyond Compare diff of the two DBCC PAGE dumps omitted.)



You might take the above to mean that including the primary key in every index, and defining it as unique, is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.



          There are several excellent resources in the Interwebz about this topic, including:



          • Where Clustered Indexes Dare

          • SQL Server and Binary Search

          • Performance Benefits of Unique Indexes

FYI, the mere presence of an identity column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON; statement will allow you to insert non-unique values into a column defined as identity.
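A quick sketch of that last point (a hypothetical table, deliberately created without any primary key or unique constraint):

DROP TABLE IF EXISTS dbo.IdentityOnly;
CREATE TABLE dbo.IdentityOnly
(
    id  int IDENTITY(1,1) NOT NULL,  -- identity, but no PK or unique constraint
    val varchar(10) NOT NULL
);

INSERT INTO dbo.IdentityOnly (val) VALUES ('a');          -- id = 1

SET IDENTITY_INSERT dbo.IdentityOnly ON;
INSERT INTO dbo.IdentityOnly (id, val) VALUES (1, 'b');   -- duplicate id = 1 is accepted
SET IDENTITY_INSERT dbo.IdentityOnly OFF;

SELECT id, val FROM dbo.IdentityOnly;                     -- two rows with id = 1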






































            5














            Just an add-on to Max's excellent answer.



When it comes to creating a non-unique clustered index, SQL Server creates something called a Uniquifier in the background anyway.



This Uniquifier could cause problems in the future if your platform has a lot of CRUD operations, since it is only 4 bytes big (a basic 32-bit integer). So, if your system has a lot of CRUD operations, it's possible you will use up all the available unique numbers, and all of a sudden you will receive an error and won't be able to insert any more data into your table (because it will no longer have any unique values to assign to your newly inserted rows).



            When this happens, you will receive this error:



            The maximum system-generated unique value for a duplicate group 
            was exceeded for index with partition ID (someID).

            Dropping and re-creating the index may resolve this;
            otherwise, use another clustering key.


            Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows.



            So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
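If you want to see the uniquifier for yourself, here's a rough sketch in the same spirit as the DBCC PAGE demo in the accepted answer (hypothetical table and index names):

CREATE TABLE dbo.UniquifierDemo (val int NOT NULL);
CREATE CLUSTERED INDEX cx_UniquifierDemo
    ON dbo.UniquifierDemo (val);      -- non-unique clustered index

INSERT INTO dbo.UniquifierDemo (val)
VALUES (1), (1), (2);                 -- two rows share the key value 1

-- Locate a data page via sys.dm_db_database_page_allocations (as in the accepted answer),
-- then dump it; rows with a duplicated key show a UNIQUIFIER field in the output.
DBCC TRACEON(3604);
-- DBCC PAGE (<dbid>, <fileid>, <pageid>, 3);   -- fill in the IDs you find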































• I had no idea that the hidden uniquifier could run out of key space, but I guess all things are limited in some way. Much like how Case and If structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies when the clustering key is non-unique. Is this a problem for a Nonclustered Index, or if the clustering key is Unique, is there no problem for Nonclustered indexes?

              – Solonotix
              Jul 15 at 19:13











• A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8 bytes to work with). Also, according to Microsoft's official documentation, there is a maximum of 900 bytes allowed for a clustered index key and 1700 bytes for a nonclustered one (since you can have more than one non-clustered index and only 1 clustered index per table). docs.microsoft.com/en-us/sql/sql-server/…

              – Chessbrain
              Jul 15 at 19:32







• @Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.

              – Max Vernon
              Jul 15 at 19:38


















            -2














I'm not going to weigh in on the question of whether an index should be unique or not, or whether there's more overhead in this approach or that. But a couple of things bothered me in your general design:



1. dt datetime not null default(current_timestamp). datetime is the older type here, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().

2. create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is actually accessed (I'm betting there's more than WHERE is_deleted = 0) and look at using a filtered index. I would even consider using 2 filtered indexes, one for where is_deleted = 0 and the other for where is_deleted = 1 (see the sketch below).
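A sketch of what I mean, assuming a permanent table with the same columns as the question's #test_index (table and index names are hypothetical):

create index [filtered_active_example]
    on dbo.test_index (dt desc)
    include (val)
    where is_deleted = 0;

create index [filtered_deleted_example]
    on dbo.test_index (dt desc)
    include (val)
    where is_deleted = 1;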

            Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.





























            • The most you will save using datetime2 instead of datetime is 1 byte, and that is if your precision is less than 3, which would mean losing precision on fractional seconds, which isn't always a viable solution. As for the example index provided, the design was kept simple to focus on my question. A Nonclustered index will have the clustering key appended to the end of the data row for key lookups internally. As such, the two indexes are physically the same, which was the point of my question.

              – Solonotix
              2 days ago


















            -4














It looks like you're simply using the PK to make an alternate, smaller index. Hence, performance on it is faster.



You see this at companies that have massive data tables (e.g., master data tables). Someone decides to put one massive clustered index on such a table, expecting it to fill the needs of various reporting groups.



But one group may need only a few parts of that index while another group needs other parts, so an index that just slaps in every column under the sun to "optimize performance" doesn't really help.



Meanwhile, breaking it down into multiple, smaller, targeted indices often solves the problem.



And that seems to be what you're doing. You have this massive clustered index with awful performance, and then you're using the PK to create another index with fewer columns that (no surprise) has better performance.



            So, just do an analysis and figure out if you can take the single clustered index and break it down into smaller, targeted indices that specific jobs need.
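Rough sketch of the idea (entirely hypothetical table, index, and column names): a narrow clustered index plus a few targeted nonclustered indexes, instead of one wide clustered index trying to serve every reporting group.

CREATE UNIQUE CLUSTERED INDEX cx_fact
    ON dbo.some_fact_table (date_int, id);

CREATE INDEX ix_fact_by_group
    ON dbo.some_fact_table (group_id, date_int)
    INCLUDE (group_entity_id);

CREATE INDEX ix_fact_by_entity
    ON dbo.some_fact_table (group_entity_id, date_int);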



You would then have to analyze performance from a "single index vs. multiple indexes" standpoint, because there's overhead in creating and updating indices. But you have to analyze this from an overall perspective.



E.g., it may be less resource-intensive to have one massive clustered index, and more resource-intensive to have several smaller targeted indices. But if you're then able to run targeted queries at the back-end much quicker, saving time (and money) there, it might be worth it.



So, you'd have to do end-to-end analysis: not just look at how it impacts your own world, but also how it impacts end-users.



I just feel like you're misusing the PK identifier. You may be using a database system that only allows one index (?), where you can sneak another in via the PK (because every relational database system these days seems to automatically index the PK). However, most modern RDBMSs allow multiple indexes; there should be no limit to the number of indexes you can make (as opposed to a limit of one PK).



So, by making a PK which just acts like an alternate index, you're using up your PK, which may be needed if the table's role is later expanded.



That's not to say your table doesn't need a PK; standard DB 101 says "every table should have a PK". But in a data-warehousing situation or the like, having a PK on a table may just be extra overhead that you don't need. Or it could be a godsend to ensure you're not double-adding duplicate entries. It's really a matter of what you're doing and why you're doing it.



But massive tables definitely benefit from having indexes. Assuming a single massive clustered index will be best is just an assumption; it may be the best, but I'd recommend testing, in a test environment, breaking the index apart into multiple smaller indices targeting specific use-case scenarios.






            share|improve this answer

























              Your Answer








              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "182"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f242908%2findex-uniqueness-overhead%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              4 Answers
              4






              active

              oldest

              votes








              4 Answers
              4






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              16















              I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.




              The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.




              Does uniqueness have an additional cost on the back-end of an Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?




              Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?



              Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?



              In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.



              As David Browne mentioned in a comment:





              Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.





              Take the following minimally complete and verifiable example:



              USE tempdb;

              DROP TABLE IF EXISTS dbo.IndexTest;
              CREATE TABLE dbo.IndexTest
              (
              id int NOT NULL
              CONSTRAINT IndexTest_pk
              PRIMARY KEY
              CLUSTERED
              IDENTITY(1,1)
              , rowDate datetime NOT NULL
              );


              I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:



              CREATE INDEX IndexTest_rowDate_ix01
              ON dbo.IndexTest(rowDate);

              CREATE UNIQUE INDEX IndexTest_rowDate_ix02
              ON dbo.IndexTest(rowDate, id);


              Next, we'll several rows to the table:



              INSERT INTO dbo.IndexTest (rowDate)
              VALUES (DATEADD(SECOND, 0, GETDATE()))
              , (DATEADD(SECOND, 0, GETDATE()))
              , (DATEADD(SECOND, 0, GETDATE()))
              , (DATEADD(SECOND, 1, GETDATE()))
              , (DATEADD(SECOND, 2, GETDATE()));


              As you can see above, three rows contain the same value for the rowDate column, and two rows contain unique values.



              Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE command:



              DECLARE @dbid int = DB_ID();
              DECLARE @fileid int;
              DECLARE @pageid int;
              DECLARE @indexid int;

              SELECT @fileid = ddpa.allocated_page_file_id
              , @pageid = ddpa.allocated_page_page_id
              FROM sys.indexes i
              CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
              WHERE i.name = N'IndexTest_rowDate_ix01'
              AND ddpa.is_allocated = 1
              AND ddpa.is_iam_page = 0;

              PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
              DBCC TRACEON(3604);
              DBCC PAGE (@dbid, @fileid, @pageid, 1);
              DBCC TRACEON(3604);
              PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';

              SELECT @fileid = ddpa.allocated_page_file_id
              , @pageid = ddpa.allocated_page_page_id
              FROM sys.indexes i
              CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
              WHERE i.name = N'IndexTest_rowDate_ix02'
              AND ddpa.is_allocated = 1
              AND ddpa.is_iam_page = 0;

              PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
              DBCC TRACEON(3604);
              DBCC PAGE (@dbid, @fileid, @pageid, 1);
              DBCC TRACEON(3604);
              PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';


              I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.



              enter image description here



              You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.



              There are several excellent resources in the Interwebz about this topic, including:



              • Where Clustered Indexes Dare

              • SQL Server and Binary Search

              • Performance Benefits of Unique Indexes

              FYI, the mere presence of an identity column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON; statement will allow you to insert to non-unique values into a column defined as identity.






              share|improve this answer





























                16















                I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.




                The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.




                Does uniqueness have an additional cost on the back-end of an Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?




                Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?



                Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?



                In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.



                As David Browne mentioned in a comment:





                Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.





                Take the following minimally complete and verifiable example:



                USE tempdb;

                DROP TABLE IF EXISTS dbo.IndexTest;
                CREATE TABLE dbo.IndexTest
                (
                id int NOT NULL
                CONSTRAINT IndexTest_pk
                PRIMARY KEY
                CLUSTERED
                IDENTITY(1,1)
                , rowDate datetime NOT NULL
                );


                I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:



                CREATE INDEX IndexTest_rowDate_ix01
                ON dbo.IndexTest(rowDate);

                CREATE UNIQUE INDEX IndexTest_rowDate_ix02
                ON dbo.IndexTest(rowDate, id);


                Next, we'll several rows to the table:



                INSERT INTO dbo.IndexTest (rowDate)
                VALUES (DATEADD(SECOND, 0, GETDATE()))
                , (DATEADD(SECOND, 0, GETDATE()))
                , (DATEADD(SECOND, 0, GETDATE()))
                , (DATEADD(SECOND, 1, GETDATE()))
                , (DATEADD(SECOND, 2, GETDATE()));


                As you can see above, three rows contain the same value for the rowDate column, and two rows contain unique values.



                Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE command:



                DECLARE @dbid int = DB_ID();
                DECLARE @fileid int;
                DECLARE @pageid int;
                DECLARE @indexid int;

                SELECT @fileid = ddpa.allocated_page_file_id
                , @pageid = ddpa.allocated_page_page_id
                FROM sys.indexes i
                CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
                WHERE i.name = N'IndexTest_rowDate_ix01'
                AND ddpa.is_allocated = 1
                AND ddpa.is_iam_page = 0;

                PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
                DBCC TRACEON(3604);
                DBCC PAGE (@dbid, @fileid, @pageid, 1);
                DBCC TRACEON(3604);
                PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';

                SELECT @fileid = ddpa.allocated_page_file_id
                , @pageid = ddpa.allocated_page_page_id
                FROM sys.indexes i
                CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
                WHERE i.name = N'IndexTest_rowDate_ix02'
                AND ddpa.is_allocated = 1
                AND ddpa.is_iam_page = 0;

                PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
                DBCC TRACEON(3604);
                DBCC PAGE (@dbid, @fileid, @pageid, 1);
                DBCC TRACEON(3604);
                PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';


                I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.



                enter image description here



                You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.



                There are several excellent resources in the Interwebz about this topic, including:



                • Where Clustered Indexes Dare

                • SQL Server and Binary Search

                • Performance Benefits of Unique Indexes

                FYI, the mere presence of an identity column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON; statement will allow you to insert to non-unique values into a column defined as identity.






                share|improve this answer



























                  16












                  16








                  16








                  I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.




                  The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.




                  Does uniqueness have an additional cost on the back-end of an Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?




                  Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?



                  Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?



                  In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.



                  As David Browne mentioned in a comment:





                  Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.





                  Take the following minimally complete and verifiable example:



                  USE tempdb;

                  DROP TABLE IF EXISTS dbo.IndexTest;
                  CREATE TABLE dbo.IndexTest
                  (
                  id int NOT NULL
                  CONSTRAINT IndexTest_pk
                  PRIMARY KEY
                  CLUSTERED
                  IDENTITY(1,1)
                  , rowDate datetime NOT NULL
                  );


                  I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:



                  CREATE INDEX IndexTest_rowDate_ix01
                  ON dbo.IndexTest(rowDate);

                  CREATE UNIQUE INDEX IndexTest_rowDate_ix02
                  ON dbo.IndexTest(rowDate, id);


                  Next, we'll several rows to the table:



                  INSERT INTO dbo.IndexTest (rowDate)
                  VALUES (DATEADD(SECOND, 0, GETDATE()))
                  , (DATEADD(SECOND, 0, GETDATE()))
                  , (DATEADD(SECOND, 0, GETDATE()))
                  , (DATEADD(SECOND, 1, GETDATE()))
                  , (DATEADD(SECOND, 2, GETDATE()));


                  As you can see above, three rows contain the same value for the rowDate column, and two rows contain unique values.



                  Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE command:



                  DECLARE @dbid int = DB_ID();
                  DECLARE @fileid int;
                  DECLARE @pageid int;
                  DECLARE @indexid int;

                  SELECT @fileid = ddpa.allocated_page_file_id
                  , @pageid = ddpa.allocated_page_page_id
                  FROM sys.indexes i
                  CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
                  WHERE i.name = N'IndexTest_rowDate_ix01'
                  AND ddpa.is_allocated = 1
                  AND ddpa.is_iam_page = 0;

                  PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
                  DBCC TRACEON(3604);
                  DBCC PAGE (@dbid, @fileid, @pageid, 1);
                  DBCC TRACEON(3604);
                  PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';

                  SELECT @fileid = ddpa.allocated_page_file_id
                  , @pageid = ddpa.allocated_page_page_id
                  FROM sys.indexes i
                  CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
                  WHERE i.name = N'IndexTest_rowDate_ix02'
                  AND ddpa.is_allocated = 1
                  AND ddpa.is_iam_page = 0;

                  PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
                  DBCC TRACEON(3604);
                  DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEOFF(3604);
                  PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';


                  I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.



[screenshot: side-by-side Beyond Compare diff of the two DBCC PAGE dumps]
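If you'd rather diff the dumps inside SQL Server than in an external tool, here's a rough sketch (not part of the original answer) that captures the output via the undocumented WITH TABLERESULTS option. It assumes it runs in the same batch as the lookup above (so @fileid and @pageid are in scope) and assumes the usual ParentObject/Object/Field/VALUE rowset shape, which may vary between versions:

DROP TABLE IF EXISTS #ix01, #ix02;
CREATE TABLE #ix01 (ParentObject nvarchar(255), [Object] nvarchar(255), Field nvarchar(255), [VALUE] nvarchar(4000));
CREATE TABLE #ix02 (ParentObject nvarchar(255), [Object] nvarchar(255), Field nvarchar(255), [VALUE] nvarchar(4000));

-- capture the dump as a rowset instead of message output
DECLARE @sql nvarchar(500) = N'DBCC PAGE(' + CAST(DB_ID() AS nvarchar(10)) + N', '
    + CAST(@fileid AS nvarchar(10)) + N', ' + CAST(@pageid AS nvarchar(10))
    + N', 1) WITH TABLERESULTS;';

INSERT INTO #ix01 EXEC (@sql);   -- run right after the @fileid/@pageid lookup for IndexTest_rowDate_ix01
-- ...then redo the lookup for IndexTest_rowDate_ix02, rebuild @sql, and INSERT INTO #ix02...

-- rows that differ between the two dumps (page IDs, LSNs, and timestamps will show up here)
SELECT * FROM #ix01 EXCEPT SELECT * FROM #ix02;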



You might take the above to mean that including the primary key in every index, and defining it as unique, is A Good Thing™ since that's what happens under the covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if the natural data in the index is in fact already unique.



                  There are several excellent resources in the Interwebz about this topic, including:



                  • Where Clustered Indexes Dare

                  • SQL Server and Binary Search

                  • Performance Benefits of Unique Indexes

FYI, the mere presence of an identity column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON; statement will allow you to insert non-unique values into a column defined as identity.
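A quick sketch of that last point (an illustration added here; dbo.IdentityDemo is a made-up table name):

DROP TABLE IF EXISTS dbo.IdentityDemo;
CREATE TABLE dbo.IdentityDemo
(
    id int IDENTITY(1,1) NOT NULL   -- identity, but no PK or unique constraint
    , val varchar(10) NOT NULL
);

INSERT INTO dbo.IdentityDemo (val) VALUES ('first');        -- id = 1, generated

SET IDENTITY_INSERT dbo.IdentityDemo ON;
INSERT INTO dbo.IdentityDemo (id, val) VALUES (1, 'dupe');  -- succeeds: nothing enforces uniqueness
SET IDENTITY_INSERT dbo.IdentityDemo OFF;

SELECT id, val FROM dbo.IdentityDemo;                       -- two rows, both with id = 1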






answered Jul 15 at 16:20 by Max Vernon (edited Jul 15 at 18:49 by Paul White)





































                      Just an add-on to Max's excellent answer.



When you create a non-unique clustered index, SQL Server creates something called a uniquifier in the background anyway.



This uniquifier can cause problems down the road if your platform has a lot of CRUD activity, because the uniquifier is only 4 bytes (a basic 32-bit integer). If your system churns through enough modifications, you can use up all of the available values for a duplicate key group, and at that point inserts into the table start failing, because SQL Server no longer has a unique value to assign to the newly inserted rows.



                      When this happens, you will receive this error:



                      The maximum system-generated unique value for a duplicate group 
                      was exceeded for index with partition ID (someID).

                      Dropping and re-creating the index may resolve this;
                      otherwise, use another clustering key.


                      Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows.



                      So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
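One way to actually see the uniquifier taking up space (a sketch added here, not from the original answer): duplicate rows under a non-unique clustered index carry a hidden 4-byte uniquifier plus some variable-length overhead, which shows up as a larger maximum record size at the leaf level:

DROP TABLE IF EXISTS dbo.UniquifierDemo;
CREATE TABLE dbo.UniquifierDemo
(
    k int NOT NULL
    , payload char(10) NOT NULL DEFAULT 'x'
);
CREATE CLUSTERED INDEX UniquifierDemo_ci ON dbo.UniquifierDemo (k);   -- deliberately non-unique

INSERT INTO dbo.UniquifierDemo (k) VALUES (1), (1), (1), (2);         -- three duplicates of k = 1

SELECT index_level, min_record_size_in_bytes, max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.UniquifierDemo'), 1, NULL, 'DETAILED');
-- the max record size exceeds the min, because the second and third k = 1 rows
-- store a uniquifier while the first row of each key value does not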






• I had no idea that the hidden uniquifier could run out of key space, but I guess all things are limited somewhere. Much like how CASE and IF structures are limited to 10 nesting levels, it makes sense that there is also a limit to resolving non-unique entries. By your statement, this sounds like it only applies when the clustering key is non-unique. Is this a problem for a nonclustered index, or is there no problem for nonclustered indexes if the clustering key is unique?

                        – Solonotix
                        Jul 15 at 19:13











• A unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT, you have 8 bytes to work with). Also, according to Microsoft's official documentation, there is a maximum of 900 bytes allowed for a clustered index key and 1,700 bytes for a nonclustered index key (since you can have more than one nonclustered index and only one clustered index per table). docs.microsoft.com/en-us/sql/sql-server/…

                        – Chessbrain
                        Jul 15 at 19:32












                        @Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.

                        – Max Vernon
                        Jul 15 at 19:38
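For reference, the variation described in that comment would look something like this (a sketch, not the commenter's exact code):

DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
    id int IDENTITY(1,1) NOT NULL
    , rowDate datetime NOT NULL
);
CREATE CLUSTERED INDEX IndexTest_ci ON dbo.IndexTest (id);   -- plain non-unique clustered index, no PK

-- re-run the two CREATE INDEX statements and the DBCC PAGE comparison from the answer;
-- the page dumps for the non-unique and the unique nonclustered index should still match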















answered Jul 15 at 19:03 by Chessbrain (edited Jul 16 at 14:40 by marcello miorelli)












I'm not going to weigh in on the question of whether an index should be unique or not, or whether there's more overhead in this approach or that. But a couple of things bothered me in your general design:



1. dt datetime not null default(current_timestamp). datetime is the older type here, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().

2. create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than WHERE is_deleted = 0) and look at using a filtered index. I would even consider using two filtered indexes, one for where is_deleted = 0 and the other for where is_deleted = 1 (see the sketch after this list).
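Roughly what those two suggestions look like together (a sketch added here; the column list is guessed from the snippets above, since the full table definition isn't shown):

CREATE TABLE #test_index
(
    id int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
    , val varchar(100) NOT NULL
    , is_deleted bit NOT NULL DEFAULT (0)
    , dt datetime2(3) NOT NULL DEFAULT (SYSDATETIME())   -- instead of datetime / current_timestamp
);

-- two filtered indexes instead of one index keyed on the low-selectivity flag;
-- pick the key column(s) to match how the queries actually filter and sort
CREATE INDEX test_index_active ON #test_index (dt) INCLUDE (val) WHERE is_deleted = 0;
CREATE INDEX test_index_deleted ON #test_index (dt) INCLUDE (val) WHERE is_deleted = 1;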

                      Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.






• The most you will save using datetime2 instead of datetime is 1 byte, unless your precision is less than 3, which would mean losing precision on fractional seconds; that isn't always a viable trade-off. As for the example index provided, the design was kept simple to focus on my question. A nonclustered index will have the clustering key appended to the end of the data row internally, for key lookups. As such, the two indexes are physically the same, which was the point of my question.

                        – Solonotix
                        2 days ago















answered Jul 15 at 22:58 by Toby












It looks like you're simply using the PK to make an alternate, smaller index. Hence, performance on it is faster.



You see this at companies that have massive data tables (e.g. master data tables). Someone decides to have one massive clustered index on the table, expecting it to fill the needs of various reporting groups.



But one group may need only a few parts of that index while another group needs other parts, so an index that just slaps in every column under the sun to "optimize performance" doesn't really help.



Meanwhile, breaking it down into multiple, smaller, targeted indexes often solves the problem.



And that seems to be what you're doing: you have a massive clustered index with awful performance, so you're using the PK to create another index with fewer columns that (no surprise) has better performance.



So do an analysis and figure out whether you can take the single clustered index and break it down into smaller, targeted indexes that specific jobs need.
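As a rough illustration of the targeted-index idea (a sketch with made-up dbo.Orders names, nothing from the question):

-- one narrow covering index per common query pattern, instead of a single wide index
CREATE INDEX Orders_customer_date ON dbo.Orders (CustomerId, OrderDate) INCLUDE (TotalDue);
CREATE INDEX Orders_open ON dbo.Orders (OrderDate) INCLUDE (CustomerId, TotalDue)
    WHERE Status = 'Open';   -- filtered to just the rows a specific report touches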



You would then have to analyze performance from a "single index vs. multiple indexes" standpoint, because there's overhead in creating and updating indexes. But you have to analyze this from an overall perspective.



E.g. it may be less resource-intensive to maintain one massive clustered index, and more resource-intensive to have several smaller targeted indexes. But if you're then able to run targeted queries on the back end much more quickly, saving time (and money) there, it might be worth it.



So you'd have to do an end-to-end analysis: not just look at how it impacts your own world, but also how it impacts end users.



I just feel like you're misusing the PK identifier. You may be using a database system that only allows one index (?), in which case you can sneak another one in by adding a PK (because every relational database system these days seems to automatically index the PK). However, most modern RDBMSs allow multiple indexes; there should be no limit to the number of indexes you can create (as opposed to a limit of one PK).



So, by making a PK that just acts like an alternate index, you're using up your PK, which may be needed if the table's role is later expanded.



That's not to say your table doesn't need a PK. Standard database practice says "every table should have a PK". But in a data-warehousing situation, having a PK on a table may just be extra overhead that you don't need. Or it could be a godsend to ensure you're not double-adding duplicate entries. It's really a matter of what you're doing and why you're doing it.



Massive tables definitely benefit from having indexes. But assuming a single massive clustered index will be best is just an assumption. It may be the best, but I'd recommend testing in a test environment what happens when you break the index apart into multiple smaller indexes targeting specific use cases.






answered Jul 16 at 18:20 by blahblah



