In SQL Server Integration services (SSIS), we can retrieve the distinct records from table without using SQL Query.
SSIS provided a Data Flow component, which is known as "Sort", helps to implement this operation.
The main purpose of this sort components are
1. Sorting Record (Order By operation in SQL) - Ascending and descending
2. Removing duplicate record (Distinct Operation in SQL)
Here I explained "distinct" operation in SSIS by simple example.
1. Place "Data Flow Task" in "Control Flow" Designer Page of SSIS
2. Double Click on this Control, it will take you to Data Flow Tab Page in Designer
3. Place some source control, configure with Data source and table on which you want to filter distinct data (unique). here I used OLE DB Source Editor, and configured with Sales.SalesOrderHeader Table from "Adventure Works2008" Database
4. I planned to retrieve distinct customer ID from this table. I placed Sort control and make an input from source control. Configure "Sort" as shown below.
5. check "Remove rows with duplicate sort rows", click Ok
6. Use some destination component and configure it. Here I used Data Reader Destination which will load on memory
7. Now if you execute this package, it will return unique data with ascending order.
Total Records: 31,465 rows
Distinct Records: 19,119 rows
Thank you, it was really useful for me!
ReplyDeleteI have a table with 200 columns and 2 billion records .In this case is it wise to use sort componet or select distinct cl1,....cl200 from table statement which one is performance wise better?
Deletefoi muito bom!
ReplyDeleteexplained nicely.thanks
ReplyDelete