I have searched and could not find any information about it. What is the best approach to storing revisions?
I have a website where the user can write a document which can be fairly long (200-300 lines). How do you determine when to make a revision?
Is it not a scalable solution to make a new one whenever the save, because that would be useless to the user when the want to look back, and it would require quite a lot of space.
You could use time and say for every 15 minute they are working on it there would be a revision, but that would sometimes be nothing or the whole document have completely changed.
I could make a diff from the previous revision, and compare by line and look at how many percent of the lines have been changed.
What are other doing revisions?
1
Wikis and similar generally create new revision whenever the user saves and many of them (including stack exchange) merge the revisions if the previous was less than 5 minutes ago.
Any decent backend will only store the changes and having them spread over more revisions usually does not add much actual data. Besides users can only write so fast and text does not take that much space (see Michael’s answer). So the scalability is unlikely an issue.
A standard widely used revision control system is probably best backend. All of them do the delta compression.
When the user saves check if he or she actually changed something before storing a revision.
Additionally you could possibly merge saves that are really close together, depending on your datamodel. Any saves that are more then for instance 10 minutes apart would get a revision, anything smaller would get merged with the previous revision.
Sometimes you see users getting an option to indicate if a save is major or minor. You could use this as input for revising also.
I suggest that modern devices may actually have the space to store the full versions. Space may have been an issue with 5MB Drives but with 256GB drives aplenty that may no longer apply.
So I would do the math, e.g. 5000 users * 100 edits a day * 300 characters = 150000000…. less than one fifth of 1 GB.