Search this blog

Monday, November 23, 2009

Select Distinct Record thru SSIS without using SQL Query

In SQL Server Integration services (SSIS), we can retrieve the distinct records from table without using SQL Query.

SSIS provided a Data Flow component, which is known as "Sort", helps to implement this operation.

The main purpose of this sort components are
1. Sorting Record (Order By operation in SQL) - Ascending and descending
2. Removing duplicate record (Distinct Operation in SQL)


Here I explained "distinct" operation in SSIS by simple example.


1. Place "Data Flow Task" in "Control Flow" Designer Page of SSIS
2. Double Click on this Control, it will take you to Data Flow Tab Page in Designer
3. Place some source control, configure with Data source and table on which you want to filter distinct data (unique). here I used OLE DB Source Editor, and configured with Sales.SalesOrderHeader Table from "Adventure Works2008" Database

4. I planned to retrieve distinct customer ID from this table. I placed Sort control and make an input from source control. Configure "Sort" as shown below.

5. check "Remove rows with duplicate sort rows", click Ok
6. Use some destination component and configure it. Here I used Data Reader Destination which will load on memory
7. Now if you execute this package, it will return unique data with ascending order.
 

 
Total Records: 31,465 rows
Distinct Records: 19,119 rows

4 comments:

  1. Thank you, it was really useful for me!

    ReplyDelete
    Replies
    1. I have a table with 200 columns and 2 billion records .In this case is it wise to use sort componet or select distinct cl1,....cl200 from table statement which one is performance wise better?

      Delete