Thursday, August 23, 2012

SELECT INTO is faster than INSERT SELECT

I had heard that SELECT INTO is faster than INSERT SELECT, so I wanted to prove it.

Here are the results of testing in both FULL and SIMPLE recover models. Each test was run twice, once with INSERT-SELECT first and once with SELECT-INTO first.

The net result is that, at least for this test code on my hardware, using SELECT INTO is an order of magnitude faster in SIMPLE recover mode than INSERT SELECT. Using SELECT INTO is faster in FULL recover mode than INSERT SELECT, but the difference is less significant.

Test Results


Recovery Model: Simple
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT: 8,391,839
Elapsed Mcs - SELECT-INTO: 962,096

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT:7,205,720
Elapsed Mcs - SELECT-INTO: 918,092

Recovery Model: Full
Order: INSERT-SELECT first
Elapsed Mcs - INSERT-SELECT:7,095,709
Elapsed Mcs - SELECT-INTO:1,289,129

Order: SELECT-INTO first
Elapsed Mcs - INSERT-SELECT: 6,274,628
Elapsed Mcs - SELECT-INTO:1,902,190

Test Script


USE MASTER;
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SelectInto') ALTER DATABASE SelectInto SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SelectInto') DROP DATABASE  SelectInto;
GO
CREATE DATABASE SelectInto;
GO
ALTER DATABASE SelectInto SET RECOVERY FULL;
--ALTER DATABASE SelectInto SET RECOVERY SIMPLE;
GO
USE SelectInto;
GO

-- Source table: t1
CREATE TABLE dbo.t1 (
  
ColID       INT         NOT NULL,
  
Col1        CHAR(30)    NOT NULL DEFAULT 'asdf',
  
CONSTRAINT PK_t1 PRIMARY KEY CLUSTERED (ColID)
);

-- Populate t1
DECLARE @MaxRows    INT     = 1000000;
DECLARE @i          INT     = 0;
WHILE @i < @MaxRows
BEGIN
   SET
@i += 1;
  
INSERT INTO dbo.t1 (ColID) VALUES (@i);
END;

-- Destination heap: t2
CREATE TABLE dbo.t2 (
  
ColID       INT         NOT NULL,
  
Col1        CHAR(30)    NOT NULL DEFAULT 'asdf'
);

DECLARE @t1     time(7);
DECLARE @t2     time(7);

-- INSERT/SELECT
SET @t1 = SYSUTCDATETIME();

INSERT INTO dbo.t2 (ColID, Col1)
SELECT ColID, Col1 FROM dbo.t1;

SET @t2 = SYSUTCDATETIME();
SELECT DATEDIFF(mcs, @t1, @t2) AS ElapsedMcs;

-- SELECT INTO
SET @t1 = SYSUTCDATETIME();

SELECT ColID, Col1
INTO dbo.t3
FROM dbo.t1;

SET @t2 = SYSUTCDATETIME();
SELECT DATEDIFF(mcs, @t1, @t2) AS ElapsedMcs;

References


5 comments:

  1. Anonymous6:38 AM

    Does the same difference hold true for a SELECT * INTO with a WHERE clause or a SELECT {fieldlist} INTO statement? Is it possible that SQL interprets a SELECT * INTO as a simple table copy operation and uses an alternate method under the covers?

    ReplyDelete
  2. I changed the script so that it inserts only every fifth record and only populates Col1. With 100,000 records in the base table, the results still show SELECT INTO being significantly faster.

    Recovery Model: Simple
    Order: INSERT-SELECT first
    Elapsed Mcs - INSERT-SELECT: 402,040
    Elapsed Mcs - SELECT-INTO: 37,003


    Order: SELECT-INTO first
    Elapsed Mcs - INSERT-SELECT: 197,020
    Elapsed Mcs - SELECT-INTO: 73,007

    Recovery Model: Full
    Order: INSERT-SELECT first
    Elapsed Mcs - INSERT-SELECT: 247,025
    Elapsed Mcs - SELECT-INTO: 36,003

    Order: SELECT-INTO first
    Elapsed Mcs - INSERT-SELECT: 183,018
    Elapsed Mcs - SELECT-INTO: 91,009

    ReplyDelete
  3. Hi Rob,

    You don't say which version of SQL Server you tested on, but if it is a reasonably recent one (2008+) you might find you get very different performance results if you add a WITH (TABLOCK) hint to the INSERT:

    INSERT INTO dbo.t2 WITH (TABLOCK) (ColID, Col1)

    Paul

    ReplyDelete
  4. Paul, the original results were on SQL Server 2008 R2.

    I reran the test with TABLOCK added. The results are quite interesting.

    Records Inserted: 1,000,000
    Order: INSERT-SELECT first

    Recovery Model: Full
    Elapsed Mcs - INSERT-SELECT: 1,840,184
    Elapsed Mcs - SELECT-INTO: 1,371,137

    Recovery Model: Simple
    Elapsed Mcs - INSERT-SELECT: 904,091
    Elapsed Mcs - SELECT-INTO: 872,087

    This shows that the differences are quite small with TABLOCK.

    Great comment. Thanks!

    I will rerun the test using SQL 2012 and publish those results also.

    ReplyDelete
  5. Very interesting results with SQL 2012. In both of these tests, SELECT INTO was actually slower.

    Records Inserted: 1,000,000
    Order: INSERT-SELECT first

    Recovery Model: Full
    Elapsed Mcs - INSERT-SELECT: 2,714,272
    Elapsed Mcs - SELECT-INTO: 3,032,303

    Recovery Model: Simple
    Elapsed Mcs - INSERT-SELECT: 1,383,138
    Elapsed Mcs - SELECT-INTO: 1,196,119

    ReplyDelete