Task
You are given a daily search log from TELUS Digital Search and need to clean duplicate search events. Write a PostgreSQL query that identifies duplicate rows for the same user_id, search_query, and calendar day, keeps the earliest event for each duplicate set, and removes the rest from the table.
Treat rows with the same user, same exact search text, and same date portion of searched_at as duplicates. Ignore rows where search_query is NULL or an empty string. Your result should show which rows would be deleted before the delete is executed.
Schema
| column | type | description |
|---|
| log_id | INT | Primary key for the search log row |
| user_id | INT | User who performed the search |
| search_query | VARCHAR(255) | Raw search text entered by the user |
| searched_at | TIMESTAMP | Time the search was performed |
| surface | VARCHAR(100) | TELUS Digital surface where the search happened |
Sample data
| log_id | user_id | search_query | searched_at | surface |
|---|
| 103 | 1 | iphone 15 | 2024-05-01 08:05:00 | TELUS Digital Search |
| 101 | 1 | iphone 15 | 2024-05-01 08:00:00 | TELUS Digital Search |
| 102 | 1 | iphone 15 | 2024-05-01 08:01:00 | TELUS Digital Search |
| 106 | 2 | roaming plans | 2024-05-01 09:05:00 | My TELUS |
| 104 | 2 | roaming plans | 2024-05-01 09:00:00 | My TELUS |
Expected output
| deleted_log_id | user_id | search_query | search_date |
|---|
| 102 | 1 | iphone 15 | 2024-05-01 |
| 103 | 1 | iphone 15 | 2024-05-01 |
| 106 | 2 | roaming plans | 2024-05-01 |