I have a question about Delta vacuum with file inventory. According to this article, the standard Delta vacuum operates in three stages. The first stage performs a recursive listing of all the files under the Delta Lake table, excluding certain hidden files and folders. In the second stage, the set of actively referenced files from the Delta log is joined with the file list collected in the first stage. Any listed file without a matching entry in the actively referenced set is earmarked for deletion. Finally, in the third stage, the files identified for deletion are deleted either in parallel or on the Spark driver, depending on whether parallel deletion is enabled.
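To check that I have the three stages right, here is a rough sketch of them in plain Python on a local directory. It is not Delta's actual code: the function names are made up for illustration, the hidden-file rule (skip names starting with `_` or `.`) is a simplifying assumption, the active set is passed in by hand instead of being read from the Delta log, and the retention period and parallel deletion are left out.

```python
import os
import tempfile


def list_table_files(table_root):
    """Stage 1: recursively list files, skipping hidden files and folders.

    Assumption: "hidden" means the name starts with "_" or ".".
    """
    found = set()
    for dirpath, dirnames, filenames in os.walk(table_root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(("_", "."))]
        for name in filenames:
            if not name.startswith(("_", ".")):
                rel = os.path.relpath(os.path.join(dirpath, name), table_root)
                found.add(rel.replace(os.sep, "/"))
    return found


def files_to_delete(listed, active):
    """Stage 2: keep only listed files that have no match in the active set."""
    return listed - active


def delete_files(table_root, candidates):
    """Stage 3: delete the candidates (one by one here, not in parallel)."""
    for rel in sorted(candidates):
        os.remove(os.path.join(table_root, rel))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "_delta_log"))
    os.makedirs(os.path.join(root, "date=2024-01-01"))
    # Hypothetical file names, created empty just for the walkthrough.
    for rel in ("_delta_log/00000.json",
                "date=2024-01-01/a.parquet",
                "date=2024-01-01/b.parquet"):
        open(os.path.join(root, rel), "w").close()

    active = {"date=2024-01-01/a.parquet"}  # stand-in for the Delta log's active files
    doomed = files_to_delete(list_table_files(root), active)
    delete_files(root, doomed)
    print(sorted(doomed))
```

In this sketch the set of files to delete only ever comes from the directory listing minus the active set; the remove entries in the log are never consulted, which is exactly the part I am asking about.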
My question is: why do we need a recursive listing of all the files under the Delta Lake table in the first stage? Why can't we get the list of files to remove from the Delta log itself, since it already has entries for the files that were removed?
It's just a general question to understand how Delta vacuum with file inventory works.