Santhosh Aditya

A SQL scenario where you can avoid JOIN

November 30th,2023

I want to talk about this interesting small SQL Optimisation I have read in PlanetScale newsletter.

Let's say there are 2 tables. Users and posts and you want to get all the users who have written popular(Posts. viewCount > 90000) posts. Both of them are linked together by user_id.

You would normally write it like this

SELECT distinct user_id FROM Users

LEFT JOIN Posts

ON Users.user_id = Posts.user_id WHERE Posts.viewCount > 9000

Here, we are joining the two tables which will produce duplicate entries if a user has written multiple popular posts, filtering them and finally we are deduplicating them. That's a bit too much Instead, we can use SubQueries to improve performance and reduce Cognitive Overhead(IMO)

SELECT user_id FROM Users WHERE user_id IN ( SELECT user_id FROM Posts WHERE viewCount > 9000)

Here, we do not need to JOIN and de-duplicate. If we do not need the Posts Table data, it is kind of pointless to do a join and discard all of it.

TLDR: If you need to filter a table based on a related Table and you do not need any of the data in a related table, just do a subquery instead of a join