I am trying to understand how disk reads are handled when performing query operations on B-trees and have TWO questions.
I have the following B-tree structure:
[10, 20, 30]
/ |
[1, 5, 7] [11, 15, 18] [21, 25, 28, 35]
Suppose I have a query: SELECT * FROM table WHERE id = 15.
From my understanding, the process involves:
- Start with the Root Node:
- The root node
[10, 20, 30]
is fetched from the disk to memory in a single read operation. - The processor compares the key 15 with the keys in the root node and determines that it should follow the second child pointer, leading to the node
[11, 15, 18]
.
- Fetch the Next Relevant Node:
- A second read operation fetches the node
[11, 15, 18]
from the disk to memory. - The processor then searches within this node to find the key
15
.
So, the process involves:
- First Disk Read: Fetch the root node
[10, 20, 30]
. - Processor Comparison: Determine which child node to access.
- Second Disk Read: Fetch the relevant child node
[11, 15, 18]
. - Processor Comparison: Locate the key 15 within the node
[11, 15, 18]
.
My FISRT question is: Does this description accurately represent how disk reads and processor comparisons are handled during a query on a B-tree? Is there anything I am missing or misunderstanding about this process?
Secondly, I have read the following quote in another Stack Overflow post Advantage of B+ trees over BSTs:
“A BST touches fewer memory locations on lookups than B trees, but the cost of those accesses is high because each access likely costs a cache miss.”
When converting the above B-tree to a BST, it would look something like this:
10
/
5 20
/ /
1 7 15 30
/
25 35
28
For a query SELECT * FROM table WHERE id = 15 in this BST, three nodes are accessed (10, 20, 15). My SECOND questions are:
- What does “Fewer Memory Locations” mean in this context?
- Why is the cost of accesses considered high due to cache misses?
- With a B-tree, two nodes are accessed (root and child node), but the quote implies that BST touches fewer memory locations. How can three node accesses in a BST be considered fewer memory locations than two node accesses in a B-tree?
Thank you for your help! (Any detailed explanations or references to documentation would be greatly appreciated!)