Introduction
Whenever we work with data, sometimes there are some possibilities of getting duplicate data, not only because of the duplicate entries but also because of the T-SQL statement. We may need to remove it as well. In this instance, most of us follow to implement the DISTINCT keyword in our T-SQL Statement. Applying DISTINCT is fine, which will remove the duplicates. But it will not improve the query performance. The query will consume more time for execution and assume what if we have more data?
This article will discuss the other possibilities for removing duplicate entries without using the DISTINCT keyword.
Removing Duplicates Using Windows Function
Using ROW_NUMBER(), we can remove the duplicates in the result set. The ROW_NUMBER() is a window function that assigns a sequential integer number to each row in the query’s result set.
ROW_NUMBER() Syntax
ROW_NUMBER() OVER (
[PARTITION BY expr1, expr2,...]
ORDER BY expr1 [ASC | DESC], expr2,...
)
We have to wrap the T-SQL statement as CTE, which helps the query to simplify
Syntax
;WITH CTE
AS
(
SELECT
Column1,
Column2,
Column3
ROW_NUMBER() OVER(PARTITION BY Column1 ORDER BY Column1 ) AS RowCount
FROM TableName
)
SELECT * FROM CTE WHERE RowCount= 1
Removing Duplicates Using GROUP BY
Another option is Removing the duplicates using GROUP BY Statement. The GROUP BY statement groups rows that have the same values into summary rows
Syntax
SELECT
Column1,
Column2,
Column3
FROM TableName
GROUP BY Column1, Column2, Column3;
Conclusion
In this article, we discussed removing the duplicate data without using DISTINCT, which will also improve the query performance. I hope you all found this useful. We will discuss more tips and tricks, and other query optimization techniques in an upcoming article. Please share your feedback in the comment section
Leave a Reply