Theoretical evaluation of SQL query cost


I need to evaluate a SQL query in theory by analyzing the results and the number of lines looked upon. Any link to information online is welcomed as I couldn't find help about this issue anywhere.

What I learned in class

Suppose I have 2 tables,

  • A contains 100 tuples
  • B contains 500 tuples.

Notation : |A| is the number of tuples after the query {A} is the number of tuples examined to produce the query

I've been shown that

R = |A JOIN B| = 500 (we take the biggest of the two)
{A JOIN B} = 100 * 500 = 50 000 (we need to check every tuple)

R' = |R WHERE NO=1| = 5 (we suppose each no has 5 occurences)
{R'} = 500 (we had to loop through the 500)

R'' = |R'[name]| ~5
{R''} = 5

My question

  • A contains 50 tuples
  • B contains 100 tuples.

R = |A JOIN B|

My teacher says that {A JOIN B} cost 150, 100 to go through B and 50 to go through A. But why isn't this 50*100 like in the previous example?

He further says that a restriction applied after the joint costs 5000, the total number of lines being of 1500 in the joint table. Wouldn't the number of lines be 50*100?

Show source
| oracle   | performance   | mysql   | sql   | query-optimization   2017-04-16 23:04 1 Answers

Answers ( 1 )

  1. 2017-04-16 23:04

    The number of rows in the result is 50*100 = 5,000 if the result is a Cartesian product.

    That is, if there are no conditions on the join, then every row of A is joined to every row of B, and you get a result that includes every combination of rows.

    But if the join has some restriction, then you would typically get a much smaller set of rows. Your teacher has supposed an example where the number of combinations between the two tables that satisfy the condition gives a result of 1,500 rows.

    The 5,000 row result would be the greatest possible result. There could be other join conditions that give different results, even down to zero rows if no combination of rows from A and B satisfy the condition.

◀ Go back