When to use negative identity

Categories: ArticlesTags: DatabaseDesign

The identity value in relational databases is a field that increases automatically. It is often used to create surrogate primary keys.

Surrogate keys

Surrogate keys are meaningless and are only used to uniquely identify the row, not the data itself. For example, assume the following table:

CREATE TABLE people (
	id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	first_name VARCHAR(200),
	second_name VARCHAR(200)
)

+----+------------+-------------+
| id | first_name | second_name |
+----+------------+-------------+
|  1 | Marcin     | Gminski     |
|  2 | Marcin     | Gminski     |
+----+------------+-------------+

As we can see, the identity values are 1, 2 and satisfy the uniqueness of the Primary Key. In such a case, the identity does not prevent duplicate data being inserted into the table. If we wanted to prevent duplicated persons, we would have to create a unique index on first_name and second_name columns but that’s a separate discussion.

So, if the identity fields are meaningless, why do they often start with 1?

INT IDENTITY(1,1)

Capacity of data types

In our example, the id field has been defined with the integer data type INT. An integer is 4 bytes long which means it can accommodate up to 4294967296 values and thus records. However, important to note that the range is from -2147483648 to 2147483647 and therefore the highest (max) number is 2147483647 and not 4294967296.

With this in mind, if we start with 1, the max number of records we can have is 2147483647, exactly half of the integer capacity.

In order to accommodate more records, we would often change the INT type to BIGINT. However, BIGINT is 8 bytes long which could drastically increase the size of the table. We could simply start the identity from the negative value which would give us twice the capacity without increasing the storage utilisation beyond the actual data.

INT IDENTITY(-2147483648,1)

Data Compression

The physical database design is an important factor when building a database. In SQL Server, ROW and PAGE compression will compress smaller values better:

Uses only the bytes that are needed. For example, if a value can be stored in 1 byte, storage will take only 1 byte.

This means that if we start identity with 1 we are going to have a better compression from day 1 on the smaller values. The compression efficiency will decrease as values increase. If we start the identity from the negative value, the compression will be inefficient at the beginning getting better towards the zero and then worsen again.

For example, if we know that our table will not exceed 1 billion rows we can safely start identity from 1 and benefit from more efficient data compression.

Application impact

This is something I have heard many times:

“We cannot use negative IDs because the application does not handle it”.

This suggests to me that you are misusing surrogate keys. Your application should not rely on surrogate keys, which are by definition meaningless. If your application would break with negative keys, it means they are not meaningless.

Conclusion

There are many aspects that dictate good database design. The first and foremost is to understand the data that will be stored in the database. Using the correct data types, compression and the correct identity field allocation should all be taken into account when designing the database and long term storage capacity.

This post was originally published on June 16, 2020.

The best career advice I ever received (T-SQL Tuesday #163)

How SQL configuration can save you money

How Policy Based Management can help you manage SQL Server

When to use negative identity

Surrogate keys

Capacity of data types

Data Compression

Application impact

Conclusion

Help us grow, please share this content!

Author

Let us help you. Get in touch.

Recent Posts

Help us grow, please share this content!

Contact