Persistence of Vision: SELECT INTO is faster than INSERT SELECT

Thursday, August 23, 2012

SELECT INTO is faster than INSERT SELECT

I had heard that SELECT INTO is faster than INSERT SELECT, so I wanted to prove it.

Here are the results of testing in both FULL and SIMPLE recover models. Each test was run twice, once with INSERT-SELECT first and once with SELECT-INTO first.

The net result is that, at least for this test code on my hardware, using SELECT INTO is an order of magnitude faster in SIMPLE recover mode than INSERT SELECT. Using SELECT INTO is faster in FULL recover mode than INSERT SELECT, but the difference is less significant.

Test Results

Recovery Model: Simple
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT: 8,391,839
Elapsed Mcs - SELECT-INTO: 962,096

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT:7,205,720
Elapsed Mcs - SELECT-INTO: 918,092

Recovery Model: Full
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT:7,095,709
Elapsed Mcs - SELECT-INTO:1,289,129

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT: 6,274,628
Elapsed Mcs - SELECT-INTO:1,902,190

Test Script

USE MASTER;
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SelectInto') ALTER DATABASE SelectInto SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SelectInto') DROP DATABASE  SelectInto;
GO
CREATE DATABASE SelectInto;
GO
ALTER DATABASE SelectInto SET RECOVERY FULL;
--ALTER DATABASE SelectInto SET RECOVERY SIMPLE;
GO
USE SelectInto;
GO

-- Source table: t1
CREATE TABLE dbo.t1 (
   ColID       INT         NOT NULL,
   Col1        CHAR(30)    NOT NULL DEFAULT 'asdf',
   CONSTRAINT PK_t1 PRIMARY KEY CLUSTERED (ColID)
);

-- Populate t1
DECLARE @MaxRows    INT     = 1000000;
DECLARE @i          INT     = 0;
WHILE @i < @MaxRows
BEGIN
   SET @i += 1;
   INSERT INTO dbo.t1 (ColID) VALUES (@i);
END;

-- Destination heap: t2
CREATE TABLE dbo.t2 (
   ColID       INT         NOT NULL,
   Col1        CHAR(30)    NOT NULL DEFAULT 'asdf'
);

DECLARE @t1     time(7);
DECLARE @t2     time(7);

-- INSERT/SELECT
SET @t1 = SYSUTCDATETIME();

INSERT INTO dbo.t2 (ColID, Col1)
SELECT ColID, Col1 FROM dbo.t1;

SET @t2 = SYSUTCDATETIME();
SELECT DATEDIFF(mcs, @t1, @t2) AS ElapsedMcs;

-- SELECT INTO
SET @t1 = SYSUTCDATETIME();

SELECT ColID, Col1
INTO dbo.t3
FROM dbo.t1;

SET @t2 = SYSUTCDATETIME();
SELECT DATEDIFF(mcs, @t1, @t2) AS ElapsedMcs;

References

INTO Clause at MSDN

5 comments:

Anonymous6:38 AM
Does the same difference hold true for a SELECT * INTO with a WHERE clause or a SELECT {fieldlist} INTO statement? Is it possible that SQL interprets a SELECT * INTO as a simple table copy operation and uses an alternate method under the covers?
ReplyDelete
Replies
Rob Garrison12:55 PM
I changed the script so that it inserts only every fifth record and only populates Col1. With 100,000 records in the base table, the results still show SELECT INTO being significantly faster.

Recovery Model: Simple
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT: 402,040
Elapsed Mcs - SELECT-INTO: 37,003

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT: 197,020
Elapsed Mcs - SELECT-INTO: 73,007

Recovery Model: Full
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT: 247,025
Elapsed Mcs - SELECT-INTO: 36,003

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT: 183,018
Elapsed Mcs - SELECT-INTO: 91,009
ReplyDelete
Replies
Paul White6:46 AM
Hi Rob,

You don't say which version of SQL Server you tested on, but if it is a reasonably recent one (2008+) you might find you get very different performance results if you add a WITH (TABLOCK) hint to the INSERT:

INSERT INTO dbo.t2 WITH (TABLOCK) (ColID, Col1)

Paul
ReplyDelete
Replies
Rob Garrison8:32 AM
Paul, the original results were on SQL Server 2008 R2.

I reran the test with TABLOCK added. The results are quite interesting.

Records Inserted: 1,000,000
Order: INSERT-SELECT first

Recovery Model: Full
Elapsed Mcs - INSERT-SELECT: 1,840,184
Elapsed Mcs - SELECT-INTO: 1,371,137

Recovery Model: Simple
Elapsed Mcs - INSERT-SELECT: 904,091
Elapsed Mcs - SELECT-INTO: 872,087

This shows that the differences are quite small with TABLOCK.

Great comment. Thanks!

I will rerun the test using SQL 2012 and publish those results also.
ReplyDelete
Replies
Rob Garrison9:18 AM
Very interesting results with SQL 2012. In both of these tests, SELECT INTO was actually slower.

Records Inserted: 1,000,000
Order: INSERT-SELECT first

Recovery Model: Full
Elapsed Mcs - INSERT-SELECT: 2,714,272
Elapsed Mcs - SELECT-INTO: 3,032,303

Recovery Model: Simple
Elapsed Mcs - INSERT-SELECT: 1,383,138
Elapsed Mcs - SELECT-INTO: 1,196,119
ReplyDelete
Replies

Add comment