I was asked recently to write up a basic set of database standards and methodologies we would be using for an upcoming project. I tend to use the same standards on every project that I manage, and I am usually the final authority on determining our approach, so there usually isn’t a lot of discussion around this topic. However, in this case the client wanted some insight into our best practices, so we needed to hash it out and provide it in writing.
Anyway, here are my thoughts on data modeling standards and conventions. It’s basically just a brain dump at this point, and I am posting it (almost) verbatim from the initial, informal draft, with minor commentary in italics.
Modeling and Design
The logical and physical database modeling will be created up front, and will be built to reflect our best understanding of the data storage/retrieval requirements of the system.
Although the deployed database may be generated using [our ORM’s] schema export tool, the database will be designed and modeled up front, and independently of the application domain model. In other words, we will not design the database to conform with the domain structure, nor design the domain structure to conform with the database—we will develop each independently and use [our ORM tool] to bridge the gap.
This is something I firmly believe it: your data model is at least as important as your domain model, and should reflect the absolute best practices for the storing and querying of the system’s data. The data model is not an afterthought or an implementation detail—it is one of the core components of your system.
I may do a full post on this specific point, as I have pretty strong opinions on it.
- Every effort will be made to make each identifier in the database as descriptive and expressive as possible.
- Unless noted elsewhere, identifiers will not use a type-specific prefix (e.g. tables will not begin with “tbl”, views will not begin with “vw”).
- All table, column, and view names will be expressed in lower case letters.
- All table, column, and view names will be singular (“customer” instead of “customers”).
- Words that make up table, column, and view names will be separated by an underscore (e.g. “customer_status_id”).
- Words that make up table, column, and view names will not be abbreviated, concatenated, or acronym-based unless necessary.
- Many-to-many table names will include both related table names, when possible.
- Where appropriate, column names will be prefixed with their table name (“customer_id” instead of “id”).
I have to admit I don’t have an air-tight reason for this last one. I like PK/FK names to be the same on both sides, but this is not always possible anyway in cases of a self-referencing table, or multiple FKs to another table.
- All identifiers will be limited to 30 characters. This rule supersedes the rules regarding expressiveness and abbreviations. [I only include this requirement if there is any chance the system will need to be database platform agnostic.]
- Indexes will be prefixed with “IX_”.
- Primary Keys will be prefixed with “PK_”.
- Foreign Keys will be prefixed with “FK_”.
- Unique Constraints will be prefixed with “UC_”.
General Design Principles
- The physical representation of the data model will use physical PK/FK relationships, not logical/metadata relationships.
Seem like a ridiculous thing to have to say? As a consultant, you will definitely see legacy systems with no foreign key constraints—there are still developers out there who believe that formally enforcing referential integrity is overkill.
- Each logical entity in the database will be represented by a separate table. There will be no concept of a “general entity” table.
I am not a fan of systems that track relationships between logical entities as metadata. To me this is a design smell that typifies developing a domain model (based on OOAD/OOP principles) and trying to retrofit it into a data model. A data model and a class hierarchy are fundamentally different conceptually, and the concerns that each addresses should be considered completely without the context of the other. There are ways of doing inheritance in a data model, and they certainly can work, and there are circumstances where they may be necessary, but a “separate table per logical entity” is always my opening position when starting a new project.
- Common sense will dictate the up-front creation of basic indexes (e.g. indexes on foreign keys). Performance testing and tuning will determine where other indexes need to be created.
- Index creation and tuning may be implementation specific; each client implementation may need a different level of tuning based on their data.
- Database normalization (to 3NF) will be adhered to all aspects of the transactional data model. Normalization principles will only be violated in situations where a performance problem can’t be solved by any other means.
(EDIT: I can’t believe it I did this, as it’s such a key topic for me, but I substituted “4NF” both in this post and in the comments when I really meant “3NF”. I fixed it above.)
I originally wrote about three paragraphs about normalization, as I have very strong opinions on this topic. It was mostly ranting about misconceptions about normalization, what it does and doesn’t mean, and (again) how OOAD/OOP principles tend to bleed into database conversations (just because you have a separate address class, doesn’t necessarily mean you need a separate address table!).
Of course, ranting belongs here, not in a client deliverable, so I pared it down to the text shown above.
- Where possible, a single-column, database-generated surrogate key will be created for each table in lieu of using a natural or composite key.
- Fine-grained security will be handled in the application layer, and where possible a single database login will be used for application access to the database.
So that’s my list/rant of standards, or at least what I came up with off the top if my head. If you vehemently disagree with any of the above, feel free to say so in the comments.