SQLite is known for storing an entire database in a single file, whereas many other databases use multiple files. Likewise, many applications condense their files into one file that they read at offsets, instead of opening many separate files through the operating system. An example is video games that pack their assets into one big archive. For read-only data (i.e., files that are packed once and only read from then on), I can see the advantage of a condensed single-file archive: there are fewer system calls to the OS and no per-file lookup and open. You keep a table of indices and just index into the same file stream, or into the mapped pointer if you’ve memory-mapped the file.
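To make the read-only case concrete, here is a minimal sketch of what I mean by a table of indices over one packed file. The archive name `assets.pak` and the helpers `pack` and `read_entry` are made up for illustration, and the index is kept in memory rather than stored in the archive:

```python
import mmap
import os
import tempfile

def pack(archive_path, files):
    """Write the payloads back to back; return {name: (offset, length)}."""
    index = {}
    with open(archive_path, "wb") as out:
        for name, data in files.items():
            index[name] = (out.tell(), len(data))
            out.write(data)
    return index

def read_entry(mapped, index, name):
    """Slice one entry out of the mapped archive; no extra open() per entry."""
    offset, length = index[name]
    return mapped[offset:offset + length]

# Hypothetical archive holding two small "files".
archive_path = os.path.join(tempfile.mkdtemp(), "assets.pak")
index = pack(archive_path, {"a.txt": b"hello", "b.txt": b"world!"})

with open(archive_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        a = read_entry(mapped, index, "a.txt")
        b = read_entry(mapped, index, "b.txt")
```

After the single `open` and `mmap`, every read is just a slice at a known offset.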
However, for pretty much any other case, storing files as separate files on the operating system seems superior to me, because with a single-file archive / virtual file system:
- When you remove a file/document from the archive, you end up with a gap (fragmentation), and the only way to close that gap is to iterate over the ENTIRE archive.
- When you add a file/document to the archive and the buffer has to grow, metadata (indices and offsets) stored in the same archive has to be copied again. Which part moves depends on the layout: if the metadata is at the end of the file, it has to move whenever the buffer before it grows; if the metadata is at the beginning, the buffer has to move whenever the metadata region needs to grow:
documents occupied in buffer | metadata
Whereas if the metadata is stored in a separate file, you can just grow the buffer and add a new entry to the metadata file.
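Here is a rough sketch of the two costs I am describing, using an in-memory buffer instead of a real file. The layout (payloads, then a JSON index, then an 8-byte offset to that index) and all function names are my own assumptions for illustration, not any real archive format:

```python
import json

def build(files):
    """Return (data, index) with payloads stored back to back."""
    data, index = bytearray(), {}
    for name, payload in files.items():
        index[name] = [len(data), len(payload)]
        data += payload
    return data, index

def remove(index, name):
    """Drop only the index entry; the payload bytes stay behind as a gap."""
    del index[name]

def compact(data, index):
    """Close the gaps by copying every live payload into a fresh buffer."""
    new_data, new_index = bytearray(), {}
    for name, (offset, length) in sorted(index.items(), key=lambda kv: kv[1][0]):
        new_index[name] = [len(new_data), length]
        new_data += data[offset:offset + length]
    return new_data, new_index

def serialize(data, index):
    """Layout 'documents | metadata': payloads, JSON index, 8-byte index offset."""
    return bytes(data) + json.dumps(index).encode() + len(data).to_bytes(8, "little")

def append(blob, name, payload):
    """Adding a document rewrites the trailing metadata after the new payload."""
    meta_offset = int.from_bytes(blob[-8:], "little")
    index = json.loads(blob[meta_offset:-8])
    index[name] = [meta_offset, len(payload)]
    data = bytearray(blob[:meta_offset]) + payload
    return serialize(data, index)

data, index = build({"a": b"AAAA", "b": b"BB", "c": b"CCCCCC"})
remove(index, "b")
live = sum(length for _, length in index.values())
gap = len(data) - live                      # bytes left unused by the removal
data2, index2 = compact(data, index)        # touches every remaining document
blob = append(serialize(data2, index2), "d", b"DDD")
```

In this layout, `compact` walks every live entry and `append` has to write the whole index again behind the new payload, which is exactly the overhead a separate metadata file would avoid.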
Basically, what I’m saying is that in the very specific scenario where the archive is packed once, like a zip archive, and is read-only from then on, the advantage is with a single archive, because why not? But in pretty much every other case, separating the data into as many files as possible is superior. Am I wrong?
What happens in a SQLite database file that keeps growing? Let’s say you have a 5GB file, you add something to it, and it needs to grow. The fact that the index metadata lives in the same file is only to its detriment, right? Because it then needs to move.
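One way I could at least watch the growth, as a rough sketch using Python's built-in `sqlite3` module. The table `docs`, the index `docs_name`, and the helper `snapshot` are made up for the experiment. The script only measures page counts and file size before and after a batch of inserts; it does not by itself show whether any existing data was moved:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "growth.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, body BLOB)")
conn.execute("CREATE INDEX docs_name ON docs (name)")
conn.commit()

def snapshot(conn, path):
    """Return (page_size, page_count, file_size_in_bytes) for the database."""
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    return page_size, page_count, os.path.getsize(path)

before = snapshot(conn, db_path)

# Insert roughly 1 MB of payload so the file is forced to grow.
conn.executemany(
    "INSERT INTO docs (name, body) VALUES (?, ?)",
    [(f"doc{i}", b"x" * 2000) for i in range(500)],
)
conn.commit()

after = snapshot(conn, db_path)
conn.close()

print("pages before/after:", before[1], after[1])
print("bytes before/after:", before[2], after[2])
```

Running the same measurement against a much larger database, and timing the insert, would show whether the cost of growing depends on how big the file already is.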