Don’t Make DeDuplicating Dastardly Difficult – A devious demystification of the dreaded duplicate

Jun 11, 2024

1 min read

0

72

0

Hey Lucas,

Yes boss?

Can you tell me which tables have duplicates in the Sales schema in the Adventures Works database by lunchtime?

But boss, it’s quarter to 12!

Lucky for you, you read crypticsql.com.

That sounds difficult right?

That’s a lot of aggregating to do!

We can do it in one query:

WITH BASE AS (

SELECT

CONCAT(

CONCAT('SELECT ',STRING_AGG(COLUMN_NAME, ', '))

, ', ROW_NUMBER () OVER (PARTITION BY ') as stmt_x, TABLE_NAME

FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = 'Sales'AND LEFT(TABLE_NAME,1) != 'v'

GROUP BY TABLE_NAME

)

,

ORDINAL AS (

SELECT COLUMN_NAME, TABLE_NAME

FROM INFORMATION_SCHEMA.COLUMNS

WHERE 1=1 AND

TABLE_SCHEMA = 'Sales'AND LEFT(TABLE_NAME,1) != 'v'

AND ORDINAL_POSITION = 1

)

,

TWO AS (

SELECT COLUMN_NAME, TABLE_NAME

FROM INFORMATION_SCHEMA.COLUMNS

WHERE 1=1 AND

TABLE_SCHEMA = 'Sales'AND LEFT(TABLE_NAME,1) != 'v'

AND ORDINAL_POSITION = 2

)

SELECT

CONCAT(CONCAT('SELECT * FROM (',

CONCAT (stmt_x, o.column_name,' ORDER BY ', t.COLUMN_NAME,') AS RN FROM Sales.', B.TABLE_NAME)),') as A WHERE RN > 1')

FROM BASE as B INNER JOIN ORDINAL AS O

ON UPPER(B.TABLE_NAME) = UPPER(O.TABLE_NAME)

INNER JOIN TWO as T ON T.TABLE_NAME = B.TABLE_NAME

A bit of revision here:

Recursive Common Table expression used here:

We can see that the BASE set is aggregating the columns,

The ORDINAL set is returning the first column (the partition – in this case the primary column),

THE TWO set is selecting the order column condition

They are then joined together the table name, which is the aggregating column in all three sets.

Running this we can output the query:

It has generated the queries for us.

Some tables have duplicates. Some don't.

Mission accomplished.

Time for a kebab.

Don't forget to pay the lady.

Jun 11, 2024

1 min read

0

72

0

Don’t Make DeDuplicating Dastardly Difficult – A devious demystification of the dreaded duplicate

Related Posts

Finding the Pythagorean Triplet: Project Euler Problem 9

Finding the Largest Product in a Series: Project Euler Problem 8

Finding the 10,001st Prime Number - Project Euler Problem 7