Find duplicate records in oracle database
Identifying duplicate records is one of the most frequently asked questions in the ETL Testing interview.
In this article, we will learn 3 different methods to accomplish this task in the Oracle database. I will also explain all query elements for a clear understanding. If you want to do the same thing in the Microsoft SQLserver database, then check out this article.
Note: I am using Oracle XE 18c – HR data model for all examples.
Identify duplicate records in the table.
Method-1: (Using Group By)

1 2 3 4 5 | SELECT EMPLOYEE_ID, COUNT (*) FROM EMP GROUP BY EMPLOYEE_ID HAVING COUNT (*)>1 ; |
How the query is working:
- Select count of all EMPLOYEE_ID from table emp
- Group by clause will group the rows by values of EMPLOYEE_ID
- Having clause filter out the records and keep only duplicate rows (i.e. count > 1)
Result shows that there are 3 records for EMPLOYEE_ID=184 and 2 records for EMPLOYEE_ID=188.
If you want to use more multiple columns to find duplicate, just change the query as:
1 2 3 4 5 | SELECT EMPLOYEE_ID, column 2… , column-n, COUNT (*) FROM EMP GROUP BY EMPLOYEE_ID, column 2… , column-n HAVING COUNT (*)>1 ; |
Let’s see the complete record for better analysis.

1 2 3 4 5 6 7 8 9 10 | SELECT * FROM EMP WHERE EMPLOYEE_ID IN (SELECT EMPLOYEE_ID FROM ( SELECT EMPLOYEE_ID, COUNT (*) FROM EMP GROUP BY EMPLOYEE_ID HAVING COUNT (*)>1 )) ORDER BY EMPLOYEE_ID; |
How query is working:
- Select only the EMPLOYEE_ID by passing the original query as a sub-query.
- Select all columns from table EMP by passing EMPLOYEE_ID in WHERE clause as filter criteria.
We can clearly see in the screenshot that there are duplicate records.
Method-2: (Using Window function)

1 2 3 4 5 6 | SELECT E.* FROM ( SELECT EMPLOYEE_ID, COUNT(*) OVER (PARTITION BY EMPLOYEE_ID) AS CNT FROM EMP) E WHERE E.CNT > 1; |
How query is working:
- Inner query is using a window function of count aggregate function to calculate the number of rows per EMPLOYEE_ID.
- Outer query is filtering out records with count less than 1 i.e. duplicate records.
You can also modify this query to get all duplicate records and count of duplicate values in a single query.

1 2 3 4 5 6 | SELECT E.* FROM ( SELECT EP.*, COUNT(*) OVER (PARTITION BY EMPLOYEE_ID) AS CNT FROM EMP EP) E WHERE E.CNT > 1; |
Method 3: (Using ROW_NUMBER() Function)

1 2 3 4 5 6 | SELECT E.* FROM ( SELECT EMPLOYEE_ID, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY EMPLOYEE_ID) AS RNUM FROM EMP) E WHERE E.RNUM > 1; |
ROW_NUMBER() is an analytic function that will assign a unique number to each record in the partition or the result dataset. One important point is that it is mandatory to use ORDER_BY with this function to arrange the partition/result dataset in ascending or descending order.
How query is working:
- Inner query is assigning a unique number to the dataset partitioned on EMPLOYEE_ID.
- In the outer query, we are selecting records with a value greater than 1 because they are duplicate records.
This query can also be tuned to get duplicate dataset:

1 2 3 4 5 6 | SELECT E.* FROM ( SELECT EP.*, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY EMPLOYEE_ID) AS RNUM FROM EMP EP) E WHERE E.RNUM > 1; |