Have Code, Will Travel

December 10, 2009

Database Standards and Conventions

Filed under: SQL — Phil Sandler @ 4:38 pm
Tags:

I was asked recently to write up a basic set of database standards and methodologies we would be using for an upcoming project.  I tend to use the same standards on every project that I manage, and I am usually the final authority on determining our approach, so there usually isn’t a lot of discussion around this topic.  However, in this case the client wanted some insight into our best practices, so we needed to hash it out and provide it in writing.

As a general matter, I feel strongly that having a set of standards, any standards, is preferable to everyone just doing things in their own way.  I would prefer to have a terrible standard that everyone follows than to not have a standard at all–this goes for all aspects of the code base: the database, application code, javascript, CSS, etc.  Obviously, having a solid standard that is rooted in pragmatism is preferable to one that is thrown together with little or no thought, or is based on an outdated principle (e.g. scarcity of storage space or Hungarian notation). I believe that conforming to a solid standard is one of the keys to developing maintainable software.

Anyway, here are my thoughts on data modeling standards and conventions.  It’s basically just a brain dump at this point, and I am posting it (almost) verbatim from the initial, informal draft, with minor commentary in italics.

Modeling and Design

The logical and physical database modeling will be created up front, and will be built to reflect our best understanding of the data storage/retrieval requirements of the system.

Although the deployed database may be generated using [our ORM’s] schema export tool, the database will be designed and modeled up front, and independently of the application domain model. In other words, we will not design the database to conform with the domain structure, nor design the domain structure to conform with the database—we will develop each independently and use [our ORM tool] to bridge the gap.

This is something I firmly believe it: your data model is at least as important as your domain model, and should reflect the absolute best practices for the storing and querying of the system’s data.  The data model is not an afterthought or an implementation detail—it is one of the core components of your system.

I may do a full post on this specific point, as I have pretty strong opinions on it.  🙂

Naming Conventions

  • Every effort will be made to make each identifier in the database as descriptive and expressive as possible.
  • Unless noted elsewhere, identifiers will not use a type-specific prefix (e.g. tables will not begin with “tbl”, views will not begin with “vw”).
  • All table, column, and view names will be expressed in lower case letters.
  • All table, column, and view names will be singular (“customer” instead of “customers”).
  • Words that make up table, column, and view names will be separated by an underscore (e.g. “customer_status_id”).
  • Words that make up table, column, and view names will not be abbreviated, concatenated, or acronym-based unless necessary.
  • Many-to-many table names will include both related table names, when possible.
  • Where appropriate, column names will be prefixed with their table name (“customer_id” instead of “id”).

I have to admit I don’t have an air-tight reason for this last one.  I like PK/FK names to be the same on both sides, but this is not always possible anyway in cases of a self-referencing table, or multiple FKs to another table.

  • All identifiers will be limited to 30 characters. This rule supersedes the rules regarding expressiveness and abbreviations. [I only include this requirement if there is any chance the system will need to be database platform agnostic.]
  • Indexes will be prefixed with “IX_”.
  • Primary Keys will be prefixed with “PK_”.
  • Foreign Keys will be prefixed with “FK_”.
  • Unique Constraints will be prefixed with “UC_”.

General Design Principles

  • The physical representation of the data model will use physical PK/FK relationships, not logical/metadata relationships.

Seem like a ridiculous thing to have to say?  As a consultant, you will definitely see legacy systems with no foreign key constraints—there are still developers out there who believe that formally enforcing referential integrity is overkill.

  • Each logical entity in the database will be represented by a separate table. There will be no concept of a “general entity” table.

I am not a fan of systems that track relationships between logical entities as metadata.  To me this is a design smell that typifies developing a domain model (based on OOAD/OOP principles) and trying to retrofit it into a data model.  A data model and a class hierarchy are fundamentally different conceptually, and the concerns that each addresses should be considered completely without the context of the other. There are ways of doing inheritance in a data model, and they certainly can work, and there are circumstances where they may be necessary, but a “separate table per logical entity” is always my opening position when starting a new project.

  • Common sense will dictate the up-front creation of basic indexes (e.g. indexes on foreign keys). Performance testing and tuning will determine where other indexes need to be created.
  • Index creation and tuning may be implementation specific; each client implementation may need a different level of tuning based on their data.
  • Database normalization (to 3NF) will be adhered to all aspects of the transactional data model. Normalization principles will only be violated in situations where a performance problem can’t be solved by any other means.

(EDIT: I can’t believe it I did this, as it’s such a key topic for me, but I substituted “4NF” both in this post and in the comments when I really meant “3NF”.  I fixed it above.)

I originally wrote about three paragraphs about normalization, as I have very strong opinions on this topic.  It was mostly ranting about misconceptions about normalization, what it does and doesn’t mean, and (again) how OOAD/OOP principles tend to bleed into database conversations (just because you have a separate address class, doesn’t necessarily mean you need a separate address table!).

Of course, ranting belongs here, not in a client deliverable, so I pared it down to the text shown above.

  • Where possible, a single-column, database-generated surrogate key will be created for each table in lieu of using a natural or composite key.
  • Fine-grained security will be handled in the application layer, and where possible a single database login will be used for application access to the database.

So that’s my list/rant of standards, or at least what I came up with off the top if my head.  If you vehemently disagree with any of the above, feel free to say so in the comments.

Shout it

Advertisements

3 Comments »

  1. I don’t vehemently disagree with any of the standards but I do disagree with some:

    – Not using “tbl” prefix for tables I agree with, nor should “table” be used as a suffix as in customer_table.

    – Lowercase Identifiers – I can see where this would make things consistent but does it really matter if a field name is called productid or ProductID (I the the latter)

    – Singular entity names – I agree but sometimes forget such as creating a database called Tickets instead of Ticket.

    – Underscores – Don’t like ’em one but, but have used them. I can type CustomerID much faster than I can Customer_ID. I guess the Shift underline key sequence is one my fingers haven’t learned well enough yet.

    Indexes, default constrains, etc should use names that make sense not the oddball names SQL Server creates. This is much better in 2005 but most of the databases I work with were upgraded inplace from 2000.

    Standards are nice and I’m slowly tweaking our databases to have some standardization to them. It’s tough though when you have other database users that say “but I’ve been type sbatc_config_persistent for the last 10 years!”

    Comment by Zack Jones — December 11, 2009 @ 8:00 am | Reply

  2. Hey Phil – We should document this more formally internally, right along side the coding standards. Also, I can’t think of anything right now – but some of these standards are different when you’re using Oracle (i.e. everything is capitalized), or there are other things you can do if you’re potentially planning on migrating from SQL to Oracle or vice-versa.

    I agree with almost all of the points except maybe –

    -Underscores in SQL Server – I’ve found PascalCasing much easier to work with in SQL, but usually use underscores in Oracle though.

    -FKs should be named as FK_TABLE1_COLUMN1_TABLE2_COLUMN2 if possible

    -Where appropriate, column names will be prefixed with their table name (“customer_id” instead of “id”). – I usually don’t really like doing this, but it makes things easier if you’re doing more in-database work (reports, queries, sprocs, etc.). Less aliasing required.

    -Database normalization (to 4NF) – Sometimes this can make things way more difficult than they need to be if you take this to the Nth degree.

    Comment by John — December 11, 2009 @ 3:01 pm | Reply

  3. @John and @Zack: it may be thin, but underscores force the separation and easy reading of words within a name. In SQL Server (which is by far the databsae I use the most), names are not case sensitive, and casing is a much bigger factor if you use straight camel casing than if you use underscores. For example, “AssignmentTypeConfiguration” becoming “assignmenttypeconfiguration” vs. “assignment_type_configuration” becoming “Assignment_Type_Configuration”.

    The best case against underscores is a situation where you may need cross-database (Oracle) support at some point, and characters in a name are limited.

    At the end of the day, though, I’d rather have everyone do camel case then have everyone do it their own way.

    @John: I agree on the FK names–the more expressive the better. Re: normalization, I’m ok with things being difficult if the data in question is important. As I said, I’m a hardass about 4NF. But without knowing what you had in mind with your comment, I probably shouldn’t get too huffy. 🙂

    Comment by Phil Sandler — December 11, 2009 @ 8:43 pm | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: