Logic of EXISTS
EXISTSis a frequently used SQL function.
EXISTS will return the boolean value whether the condition is satisfied or not.
For example, suppose we have the following tables:
| ID | NAME |
|---|---|
| 1 | A1 |
| 2 | A2 |
| 3 | A3 |
| ID | UID | NAME |
|---|---|---|
| 1 | 1 | B1 |
| 2 | 2 | B2 |
| 3 | 2 | B3 |
SELECT ID, NAME FROM A EXISTS (SELECT * FROM B WHERE A.ID=B.UID)
the output is:
| ID | NAME |
|---|---|
| 1 | A1 |
| 2 | A2 |
As we can see that the function is excuted by rows. It only return those rows satisfied the condition.
EXISTS will not save the result of the query result of the condition because it doesnot matter. It only return the boolean result according to whther the query is empty or not. The logic of EXISTS looks like the following codes:
result = []
for i in range(len(A)):
if exists(A[i].ID):
result.append(A[i])
return result
From the above codes we will find that the EXISTS is preferred when table B is larger than table A.
Compare with IN
We still use the previous table. But we use IN instead.
SELECT ID, NAME FROM A IN (SELECT UID FROM B)
It is equivelant to the following codes:
result = []
for i in range(len(A)):
for j in range(len(B)):
if A[i].ID == B[i].UID:
result.append(A[i])
break
return result
The function IN will only be executed once, but it will traversal all elements in A and B, which may takes $O(mn)$.
A small example
Suppose we have the following table:
| Amount | Product | Price | Date |
|---|---|---|---|
| 1 | Rice Ball | 1.99 | 2019-03-22 |
| 1 | Soy Drink | 3.69 | 2019-03-23 |
| 1 | Pork Sung | 13.99 | 2019-03-24 |
| 2 | Hot Chili | 4.99 | 2019-03-21 |
| 2 | Mushroom | 0.59 | 2019-03-22 |
We want to find the data with newest date for each amount.
SELECT Amount, Product, Price, Date
FROM table t
WHERE (NOT EXISTS
(SELECT Amount, Product, Price, Date FROM table
WHERE Amount = t.Amount AND Date > t.Date))
| Amount | Product | Price | Date |
|---|---|---|---|
| 1 | Pork Sung | 13.99 | 2019-03-24 |
| 2 | Mushroom | 0.59 | 2019-03-22 |
If we use DISTINCT, we will still get the raw table, because the DISTINCT will apply on all selected columns.
SELECT DISTINCT Amount, Product, Price, Date FROM table
| Amount | Product | Price | Date |
|---|---|---|---|
| 1 | Rice Ball | 1.99 | 2019-03-22 |
| 1 | Soy Drink | 3.69 | 2019-03-23 |
| 1 | Pork Sung | 13.99 | 2019-03-24 |
| 2 | Hot Chili | 4.99 | 2019-03-21 |
| 2 | Mushroom | 0.59 | 2019-03-22 |