Spot the web
# Tuesday, February 12, 2008

There are at least five ways to return rows from one table that have no match in another table. Two of these work only in SQL Server 2005 and later. This is a post mostly for beginners, but hopefully everyone will get something out of it.

Here are the five ways:

NOT IN
NOT EXISTS
OUTER JOIN
OUTER APPLY (2005+)
EXCEPT (2005+)

Let's see how each of them works.
First create these two tables with the Celko approved naming convention.

 

CREATE TABLE #testnulls (ID INT)
INSERT INTO #testnulls VALUES (1)
INSERT INTO #testnulls VALUES (2)
INSERT INTO #testnulls VALUES (NULL)

CREATE TABLE #testjoin (ID INT)
INSERT INTO #testjoin VALUES (1)
INSERT INTO #testjoin VALUES (3)

NOT IN
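Run the following code:

SELECT * FROM #testjoin WHERE ID NOT IN (SELECT ID FROM #testnulls)

What happened? Nothing gets returned! The subquery returns a NULL, and NOT IN is shorthand for a chain of <> comparisons: ID NOT IN (1, 2, NULL) means ID <> 1 AND ID <> 2 AND ID <> NULL. A comparison with NULL evaluates to UNKNOWN rather than TRUE (under the default ANSI_NULLS ON setting), so the condition is never TRUE for any row and nothing qualifies.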

Now run this:

SELECT * FROM #testjoin
WHERE ID NOT IN (SELECT ID FROM #testnulls WHERE ID IS NOT NULL)

That worked because we eliminated the NULL values in the subquery.

This correlated version also works, because the n.ID = j.ID filter is never TRUE for the NULL row, so the NULL never reaches the NOT IN list:

SELECT * FROM #testjoin j
WHERE j.ID NOT IN (SELECT ID FROM #testnulls n WHERE n.ID = j.ID)

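To check the NULL behavior without a SQL Server instance, here is a minimal sketch that recreates the two tables in an in-memory SQLite database through Python's built-in sqlite3 module. The assumption is that SQLite can stand in for SQL Server here: it applies the same three-valued logic to NOT IN, but it has no #temp tables, so plain table names are used instead.

```python
import sqlite3

# SQLite stands in for SQL Server; plain table names replace the #temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testnulls (ID INT);
    INSERT INTO testnulls VALUES (1), (2), (NULL);
    CREATE TABLE testjoin (ID INT);
    INSERT INTO testjoin VALUES (1), (3);
""")

# 1. NOT IN against a subquery that returns a NULL: no rows at all.
plain = conn.execute(
    "SELECT ID FROM testjoin WHERE ID NOT IN (SELECT ID FROM testnulls)"
).fetchall()

# Why: 3 NOT IN (1, 2, NULL) means 3 <> 1 AND 3 <> 2 AND 3 <> NULL,
# i.e. TRUE AND TRUE AND UNKNOWN, which is UNKNOWN (Python sees None).
unknown = conn.execute("SELECT 3 NOT IN (1, 2, NULL)").fetchone()[0]

# 2. Filtering the NULLs out of the subquery fixes it.
filtered = conn.execute(
    "SELECT ID FROM testjoin "
    "WHERE ID NOT IN (SELECT ID FROM testnulls WHERE ID IS NOT NULL)"
).fetchall()

# 3. Correlated subquery: n.ID = j.ID is never TRUE for the NULL row.
correlated = conn.execute(
    "SELECT j.ID FROM testjoin j "
    "WHERE j.ID NOT IN (SELECT n.ID FROM testnulls n WHERE n.ID = j.ID)"
).fetchall()

print(plain, unknown, filtered, correlated)  # [] None [(3,)] [(3,)]
```

The single-expression query is the useful one to remember: it shows that the NOT IN result is UNKNOWN rather than FALSE, which is exactly why the row silently disappears from the WHERE clause.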
NOT EXISTS
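NOT EXISTS doesn't have the problem that NOT IN has, because it only asks whether a matching row exists. The comparison n.ID = j.ID is UNKNOWN for the NULL row, which simply means that row doesn't match. Run the following code:

SELECT * FROM #testjoin j
WHERE NOT EXISTS (SELECT 1
                  FROM #testnulls n
                  WHERE n.ID = j.ID)

Everything worked as expected.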


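A quick way to see the contrast is to run NOT EXISTS and the plain NOT IN side by side on the same data. This sketch again assumes SQLite (via Python's sqlite3 module) as a stand-in for SQL Server, with plain table names instead of #temp tables.

```python
import sqlite3

# Same two tables as in the post, in SQLite instead of SQL Server temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testnulls (ID INT);
    INSERT INTO testnulls VALUES (1), (2), (NULL);
    CREATE TABLE testjoin (ID INT);
    INSERT INTO testjoin VALUES (1), (3);
""")

# NOT EXISTS only asks "is there a row where n.ID = j.ID is TRUE?";
# the NULL row yields UNKNOWN, which just counts as "no match".
not_exists = conn.execute("""
    SELECT j.ID FROM testjoin j
    WHERE NOT EXISTS (SELECT 1 FROM testnulls n WHERE n.ID = j.ID)
""").fetchall()

# The uncorrelated NOT IN over the same data, for contrast.
not_in = conn.execute(
    "SELECT ID FROM testjoin WHERE ID NOT IN (SELECT ID FROM testnulls)"
).fetchall()

print(not_exists, not_in)  # [(3,)] []
```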
LEFT and RIGHT JOIN
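Plain vanilla LEFT and RIGHT JOINs. Rows from #testjoin with no match come back with NULL in the #testnulls columns, so WHERE n.ID IS NULL keeps exactly those rows:

SELECT j.* FROM #testjoin j
LEFT OUTER JOIN #testnulls n ON n.ID = j.ID
WHERE n.ID IS NULL

With a RIGHT JOIN you just switch the tables around:

SELECT j.* FROM #testnulls n
RIGHT OUTER JOIN #testjoin j ON n.ID = j.ID
WHERE n.ID IS NULL

And we can also do a FULL OUTER JOIN. Here the extra j.ID IS NOT NULL check is needed because the full join also returns the unmatched NULL row from #testnulls, which would otherwise pass the n.ID IS NULL test:

SELECT j.* FROM #testnulls n
FULL OUTER JOIN #testjoin j ON n.ID = j.ID
WHERE n.ID IS NULL
AND j.ID IS NOT NULL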


You might wonder why we have both LEFT and RIGHT JOINs. Here is why:
<AttemptToBeFunny>LEFT joins are for people who tend to vote for the Democrats, RIGHT joins are for people who tend to vote for the Republicans. FULL joins are for independents/undecided people.</AttemptToBeFunny>

You can be really silly and LEFT JOIN to a subquery (a derived table):

SELECT j.* FROM #testjoin j
LEFT OUTER JOIN (SELECT ID FROM #testnulls) n ON n.ID = j.ID
WHERE n.ID IS NULL

 

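The LEFT JOIN anti-join and its derived-table variant can be checked the same way. This is a sketch assuming SQLite (through Python's sqlite3 module) in place of SQL Server; the RIGHT and FULL OUTER JOIN forms are left out because SQLite only gained them in version 3.39.0, and older builds would reject them.

```python
import sqlite3

# SQLite stands in for SQL Server; plain table names replace the #temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testnulls (ID INT);
    INSERT INTO testnulls VALUES (1), (2), (NULL);
    CREATE TABLE testjoin (ID INT);
    INSERT INTO testjoin VALUES (1), (3);
""")

# Unmatched rows from testjoin come back with n.ID = NULL, so the
# WHERE clause keeps exactly the rows that have no partner.
left_join = conn.execute("""
    SELECT j.ID FROM testjoin j
    LEFT OUTER JOIN testnulls n ON n.ID = j.ID
    WHERE n.ID IS NULL
""").fetchall()

# The "silly" version: join to a derived table instead of the table itself.
derived = conn.execute("""
    SELECT j.ID FROM testjoin j
    LEFT OUTER JOIN (SELECT ID FROM testnulls) n ON n.ID = j.ID
    WHERE n.ID IS NULL
""").fetchall()

print(left_join, derived)  # [(3,)] [(3,)]
```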
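Now let's talk about SQL Server 2005 and up.

OUTER APPLY (SQL 2005 +)
OUTER APPLY is something that got added to SQL 2005. It evaluates the right-hand query for each row of the left table and, like a LEFT JOIN, keeps the left row (with NULLs) when that query returns nothing. Filtering on a.ID IS NULL therefore leaves only the rows with no match.

SELECT j.* FROM #testjoin j
OUTER APPLY
    (SELECT ID FROM #testnulls n
     WHERE n.ID = j.ID) a
WHERE a.ID IS NULL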

 

EXCEPT (SQL 2005 +)
EXCEPT is something that got added to SQL 2005. It returns the distinct rows from the top query that do not appear in the bottom query.

SELECT * FROM #testjoin
EXCEPT
SELECT * FROM #testnulls


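One difference between EXCEPT and the other four methods is that EXCEPT returns distinct rows. The sketch below, again using SQLite through Python's sqlite3 as a stand-in for SQL Server, adds a duplicate 3 to testjoin purely for illustration: NOT EXISTS returns it twice, while EXCEPT collapses it to one row.

```python
import sqlite3

# SQLite stands in for SQL Server; plain table names replace the #temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testnulls (ID INT);
    INSERT INTO testnulls VALUES (1), (2), (NULL);
    CREATE TABLE testjoin (ID INT);
    -- A second 3 is added only to show how duplicates are handled.
    INSERT INTO testjoin VALUES (1), (3), (3);
""")

# NOT EXISTS keeps every unmatched row, duplicates included.
not_exists = conn.execute("""
    SELECT j.ID FROM testjoin j
    WHERE NOT EXISTS (SELECT 1 FROM testnulls n WHERE n.ID = j.ID)
""").fetchall()

# EXCEPT returns each distinct unmatched row once.
except_rows = conn.execute(
    "SELECT ID FROM testjoin EXCEPT SELECT ID FROM testnulls"
).fetchall()

print(not_exists, except_rows)  # [(3,), (3,)] [(3,)]
```

If the duplicates matter to the caller, NOT EXISTS or the LEFT JOIN form is the safer choice; if they don't, EXCEPT is the most compact of the five.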
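I am also mentioning INTERSECT since some people might not have seen it before. INTERSECT returns the distinct rows that appear in both queries. That is similar to an inner join, except that duplicates are removed and two NULLs are treated as a match.

SELECT * FROM #testjoin
INTERSECT
SELECT * FROM #testnulls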

 

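INTERSECT is not quite a regular join when NULLs are involved. In this sketch (SQLite through Python's sqlite3 standing in for SQL Server) a NULL is added to testjoin purely for illustration, so both tables contain one. The inner join cannot match NULL to NULL, while INTERSECT, which compares rows for distinctness, treats the two NULLs as equal.

```python
import sqlite3

# SQLite stands in for SQL Server; plain table names replace the #temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testnulls (ID INT);
    INSERT INTO testnulls VALUES (1), (2), (NULL);
    CREATE TABLE testjoin (ID INT);
    -- A NULL is added here only to show how each form treats it.
    INSERT INTO testjoin VALUES (1), (3), (NULL);
""")

# The inner join never matches NULL to NULL, because NULL = NULL is UNKNOWN.
joined = conn.execute(
    "SELECT j.ID FROM testjoin j INNER JOIN testnulls n ON n.ID = j.ID"
).fetchall()

# INTERSECT treats the two NULLs as equal. A set is used because
# None and int cannot be sorted together in Python.
intersected = set(conn.execute(
    "SELECT ID FROM testjoin INTERSECT SELECT ID FROM testnulls"
).fetchall())

print(joined)                              # [(1,)]
print(intersected == {(1,), (None,)})      # True
```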
So there you have it. Most likely you already knew all of these techniques, but if you learned something from this post, that is a good thing too.

Tuesday, February 12, 2008 5:05:36 PM (Jerusalem Standard Time, UTC+02:00)
Programming | SQL
# Tuesday, February 05, 2008

Working as a team against a common database schema can be a real challenge. Some teams prefer to have their local code connect to a centralized database, but this approach can create many headaches. If I make a schema change to a shared database, but am not ready to check in my code, that can break the site for another developer. For a project like Subtext, it is just not feasible to have a central database.

Instead, I prefer to work on a local copy of the database and propagate changes via versioned change scripts. That way, when I check in my code, I can let others know which scripts to run on their local database when they get the latest source code. Of course, this can also be a big challenge as the number of scripts grows and developers are stuck keeping track of which scripts they have run and which they haven’t.

That is why I always recommend to my teams that we script schema and data changes in an idempotent manner whenever possible. That way, it is much easier to simply batch updates together in a single file (per release for example) and a developer simply runs that single script any time an update is made.

As an example, suppose we have a Customer table and we need to add a column for the customer’s favorite color. I would script it like so:

IF NOT EXISTS 
(
    SELECT * FROM [information_schema].[columns] 
    WHERE   table_name = 'Customer' 
    AND table_schema = 'dbo'
    AND column_name = 'FavoriteColorId'
)
BEGIN
    ALTER TABLE [dbo].[Customer]
    ADD FavoriteColorId int
END

This script basically checks for the existence of the FavoriteColorId column on the table Customer and if it doesn’t exist, it adds it. You can run this script a million times, and it will only make the schema change once.
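The same check-then-alter idea can be tried locally without SQL Server. This sketch assumes SQLite (through Python's sqlite3 module), which has no INFORMATION_SCHEMA views, so PRAGMA table_info stands in for INFORMATION_SCHEMA.COLUMNS; the starting Customer table layout is hypothetical. Running the change three times adds the column once, whereas a bare ALTER TABLE would fail on the second run with a duplicate-column error.

```python
import sqlite3

def add_favorite_color_column(conn: sqlite3.Connection) -> None:
    """Idempotent change: add Customer.FavoriteColorId only if it is missing."""
    # SQLite has no INFORMATION_SCHEMA; PRAGMA table_info returns one row
    # per column, and index 1 of each row is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(Customer)")}
    if "FavoriteColorId" not in existing:
        conn.execute("ALTER TABLE Customer ADD COLUMN FavoriteColorId INTEGER")

conn = sqlite3.connect(":memory:")
# Hypothetical starting schema for the example.
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

for _ in range(3):  # rerun the "script"; only the first run changes anything
    add_favorite_color_column(conn)

columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]
print(columns)  # ['Id', 'Name', 'FavoriteColorId']
```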

You’ll notice that I didn’t query the system tables, choosing instead to look up the information in the INFORMATION_SCHEMA view named COLUMNS. This is Microsoft's recommendation, since Microsoft reserves the right to change the system tables at any time. The INFORMATION_SCHEMA views are part of the SQL-92 standard, so they are not likely to change.

There are 20 schema views in all, listed below with their purpose (aggregated from SQL Server Books Online). Note that in all cases, only data accessible to the user executing the query against the INFORMATION_SCHEMA views is returned.

Name: Returns
CHECK_CONSTRAINTS: Every check constraint.
COLUMN_DOMAIN_USAGE: Every column that has a user-defined data type.
COLUMN_PRIVILEGES: Every column with a privilege granted to or by the current user in the current database.
COLUMNS: Every column in the current database.
CONSTRAINT_COLUMN_USAGE: Every column that has a constraint defined on it.
CONSTRAINT_TABLE_USAGE: Every table that has a constraint defined on it.
DOMAIN_CONSTRAINTS: Every user-defined data type with a rule bound to it.
DOMAINS: Every user-defined data type.
KEY_COLUMN_USAGE: Every column that is constrained as a key.
PARAMETERS: Every parameter for every user-defined function or stored procedure in the database. For functions, this also returns one row with return value information.
REFERENTIAL_CONSTRAINTS: Every foreign key constraint in the current database.
ROUTINE_COLUMNS: Every column returned by table-valued functions.
ROUTINES: Every stored procedure and function in the database.
SCHEMATA: Every database in the system (SQL Server 2000); every schema in the current database (SQL Server 2005 and later).
TABLE_CONSTRAINTS: Every table constraint.
TABLE_PRIVILEGES: Every table privilege granted to or by the current user.
TABLES: Every table (and view) in the current database.
VIEW_COLUMN_USAGE: Every column used in a view definition.
VIEW_TABLE_USAGE: Every table used in a view definition.
VIEWS: Every view.

When selecting rows from these views, the view name must be prefixed with INFORMATION_SCHEMA, as in SELECT * FROM INFORMATION_SCHEMA.TABLES.

Please note that the INFORMATION_SCHEMA views are based on the SQL-92 standard, so some of the terms used in these views are different from the terms used in Microsoft SQL Server. For example, in the example above, I set table_schema = 'dbo'. In SQL Server 2000 the term schema refers to the owner of the database object; starting with SQL Server 2005, schemas are separate from owners, and table_schema holds the object's schema name.

Here is another code example in which I add a constraint to the Customer table.

IF NOT EXISTS(
    SELECT * 
    FROM [information_schema].[referential_constraints] 
    WHERE constraint_name = 'FK_Customer_Color' 
      AND constraint_schema = 'dbo'
)
BEGIN
  ALTER TABLE dbo.Customer WITH NOCHECK 
  ADD CONSTRAINT
  FK_Customer_Color FOREIGN KEY
  (
    FavoriteColorId
  ) REFERENCES dbo.Color
  (
    Id
  )
END

I generally don’t go to all this trouble for stored procedures, user-defined functions, and views. In those cases I use Enterprise Manager to generate a full drop-and-create script. When a stored procedure is dropped and re-created, you don’t lose data as you would if you dropped and re-created a table that contained data.

With this approach in hand, I can run an update script with new schema changes, confident that any changes in the script that I have already applied will not be applied again. The same approach works for lookup data as well: simply check for the data’s existence before inserting it. It is a little more work up front, but it is worth the trouble, especially since schema changes happen less frequently than code or stored procedure changes.

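For lookup data, the existence check can live inside the INSERT itself. A minimal sketch, again using SQLite through Python's sqlite3 as a stand-in for SQL Server, with a hypothetical Color table and hypothetical values: each INSERT ... SELECT ... WHERE NOT EXISTS only adds a row whose Id is not there yet, so the seed script can be rerun safely. In T-SQL the same check is often written as IF NOT EXISTS (...) INSERT ..., in the style of the schema scripts above.

```python
import sqlite3

# Hypothetical seed script: each statement inserts its row only if missing.
SEED = """
INSERT INTO Color (Id, Name)
SELECT 1, 'Red' WHERE NOT EXISTS (SELECT 1 FROM Color WHERE Id = 1);
INSERT INTO Color (Id, Name)
SELECT 2, 'Blue' WHERE NOT EXISTS (SELECT 1 FROM Color WHERE Id = 2);
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Color (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

for _ in range(3):  # rerunning the seed script adds nothing after the first run
    conn.executescript(SEED)

rows = conn.execute("SELECT Id, Name FROM Color ORDER BY Id").fetchall()
print(rows)  # [(1, 'Red'), (2, 'Blue')]
```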
Tuesday, February 05, 2008 11:57:38 AM (Jerusalem Standard Time, UTC+02:00)
Programming | SQL

Some developers love working with relational databases, and other developers can't stand to touch them. Either way - if your application uses a database, you have to treat the database with some respect. The database is as much a part of an application as the code and the models inside the software.

Here are three rules I've learned to live by over the years of working with relational databases.

1. Never use a shared database server for development work.

The convenience of a shared database is tempting. All developers point their workstations to a single database server where they can test and make schema changes. The shared server functions as an authoritative source for the database schema, and schema changes appear immediately to all team members. The shared database also serves as a central repository for test data.

Like many conveniences in software development, a shared database is a tar pit waiting to fossilize a project. Developers overwrite each other's changes. The changes I make on the server break the code on your development machine. Remote development is slow and difficult.

Avoid shared databases at all costs; they ultimately waste time and help produce bugs.

2. Always Have a Single, Authoritative Source For Your Schema

Ideally, this single source will be your source control repository (see rule #3). Consider the following conversation:

Developer 1: It's time to push the app into testing. Do we copy the database from Jack's machine, or Jill's machine?

Developer 2: Ummmmmmmm, I don't remember which one is up to date.

Developer 1: We're screwed.

Everyone should know where the official schema resides, and have a frictionless experience in setting up a fresh database. I should be able to walk up to a computer, get the latest from source control, build, and run a simple tool to set up the database (in many scenarios, the build process can even set up a database if none exists, so the process is one step shorter).

How you put your database into source control depends on your situation and preferences. Any decent O/R mapping tool should be able to create a database given the mappings you've defined in a project. You can also script out the database as a set of one or more files full of SQL DDL commands. I generally prefer to keep database views and programmatic features (including functions, triggers, and stored procedures) as separate files - but more on this in a later post.

There are plenty of tools available. Leon Bambrick has a long (albeit year-old) list of tools and articles that can help, while Jeff Atwood gushes over the virtues of Visual Studio for Database Professionals.

3. Always Version Your Database

There are many ways to version databases, but the common goal is to propagate changes from development, to test, and ultimately to production in a controlled and consistent manner. A second goal is to have the ability to recreate a database at any point in time. This second goal is particularly important if you are shipping software to clients. If someone finds a bug in build 20070612.1 of your application, you must be able to recreate the application as it appeared in that build - database and all.
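The author's own versioning approach is left for a future post, so the following is not it. Purely as an illustration of the two goals above, here is a minimal sketch of one common pattern: numbered change scripts applied in order, with a hypothetical SchemaVersion table recording how far a database has been upgraded. Rerunning the upgrade is a no-op, and stopping at an older script number rebuilds the schema as it stood for an older build. It uses SQLite through Python's sqlite3; the table names and scripts are hypothetical, and in a real project the scripts would live as files in source control.

```python
import sqlite3
from typing import Optional

# Hypothetical numbered change scripts, in the order they must be applied.
MIGRATIONS = [
    (1, "CREATE TABLE Color (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)"),
    (2, "CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)"),
    (3, "ALTER TABLE Customer ADD COLUMN FavoriteColorId INTEGER"),
]

def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied script number (0 for an empty database)."""
    conn.execute("CREATE TABLE IF NOT EXISTS SchemaVersion (Version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(Version) FROM SchemaVersion").fetchone()
    return version or 0

def upgrade(conn: sqlite3.Connection, target: Optional[int] = None) -> int:
    """Apply every script newer than the recorded version, up to target."""
    version = current_version(conn)
    for number, sql in MIGRATIONS:
        if number <= version or (target is not None and number > target):
            continue  # already applied, or beyond the requested version
        conn.execute(sql)
        # Record the script number after it has run.
        conn.execute("INSERT INTO SchemaVersion (Version) VALUES (?)", (number,))
        conn.commit()
    return current_version(conn)

fresh = sqlite3.connect(":memory:")
print(upgrade(fresh))  # 3
print(upgrade(fresh))  # 3 (nothing left to apply)

old_build = sqlite3.connect(":memory:")
print(upgrade(old_build, target=2))  # 2 (schema as of script 2)
```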

In a future post, I'll describe an approach I've used for database versioning that has worked well for many years of commercial development.

In the meantime, if you are looking for more database rules, Adam Cogan and SSW maintain an excellent list.

Tuesday, February 05, 2008 11:35:24 AM (Jerusalem Standard Time, UTC+02:00)
Programming | SQL
Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
Guy Levin