Word processors (e.g. Microsoft Word) display documents to the user as styled text. The user can select a part of this text and apply styles to it, and edit the text.
The word processor must (I guess) have an internal representation of the document that indicates which bits of the text displayed to the user should be styled differently. But the user doesn’t interact with this internal representation. They interact with the text, by using the caret and selecting portions of it.
What data structure might the word processor use to efficiently map the text with which the user interacts to its internal model of the document?
8
In computer graphics, there are a couple common spatial subdivision data structures that can quickly tell you whether a particular point in “world” space is inside, on, or near an object in that world.
The ones I’ve worked with are all essentially trees of nested volumes. For example, a bounding volume hierarchy is a tree of nested volumes (like spheres or axis-aligned boxes). You test your point against the bounding volume. If the point is inside, then you recurse into that bounding volume. If the point is not inside, then you can ignore that entire branch of the tree and check the next bounding volume. At the leaves are the actual objects in your model world.
With some approaches, the tree is implicit, but the idea is the same. When you binary search through a sorted array, you’re doing a tree search even though there’s no data structure explicitly modeling the tree.
Two-dimensional versions of these structures can map mouse or caret positions in screen space to a point in document space (and sometimes vice versa). The spatial map is used not only locate the caret, but to optimize the rendering by know which parts don’t have to be “painted.”
The actual document model might be a tree that follows the logical structure of the document (chapters, containing sections, containing paragraphs, containing spans of style settings, containing strings of text, containing characters). Or it might just be a long sequence of characters.
Imagine the entire formatted representation of your document is like a tall bitmap image. What the user sees on the screen is a rectangular portion of that bitmap selected based on the scroll position and zoom settings, so it’s just a little arithmetic to convert an x, y mouse position to a (u, v) coordinate in our “bitmap space.”
Our spatial subdivision structure could be a tree of nested rectangles that carve up the bitmap space with leaf nodes that refer back to positions in document space. This would allow us to query with a (u, v) coordinate to get a document location in O(log n) time.
Unfortunately, it can be difficult to keep a spatial data structure up-to-date as the document is edited. If the user types a character on page 1 of the document, you’ll have to reformat the entire line. And maybe the paragraph. And that might affect the vertical position of everything else down the document.
We can slash the amount of bookkeeping necessary for these ripple effects by using parent-relative coordinates for the rectangles and document positions in our subdivision tree. An edit that adds or deletes a leaf node affects (at most) only the sizes of its ancestors and the offsets of its younger siblings. That’s O(log(n) + m) where n is the length of the document and m is the number of younger siblings.
As with many subdivision algorithms, once you drill deep enough, it’s often faster in practice to finish the computation with a linear algorithm over the remaining items than to recurse all the way to the conclusion. For example, maybe you’d use a subdivision structure that maps a (u, v) coordinate to a particular line of text and then scan the line to figure out the precise character position within that line.
There are an infinite number of ways to do it, but the basic concept isn’t that hard. Say you used html for your internal representation:
<strong>bold text here</strong> non-bold.
If your GUI widget tells you the cursor is after the 4th character, all you have to do is count to the 4th character, but when you come to something inside angle brackets, you don’t count it, but instead keep track of what formatting it is applying or unapplying.
Obviously, for a large document you don’t want to count characters from the beginning every time you press a key, so they use more efficient data structures, but the basic concept is the same. You have some sort of mapping between visible characters and their metadata.
1
I would like to preface this post by saying that I haven’t actually worked on any text editors yet. There may be some standard practice that I am unaware of…that being said, a tree doesn’t seem to make much sense here; what would you have for siblings? If you tried to use a tree, I think you would end up with a linked list, which does work well here.
What is it that you want from your data structure? I assume you will need to move the cursor around quickly, making copy / pasting fast, etc. With a linked list, you have have constant time insertions, so moving the cursor is as simple as removing a special cursor node __c
and inserting it one place forward.
(T) -> (h) -> (__c) -> (i) -> (s)
(T) -> (h) -> (i) -> (__c) -> (s)
It also works well because the majority of a users navigation will be “linear”; they probably move the cursor one space forwards or backwards at a time. The usual downside to linked lists is indexing, but this behavior lends itself well to an iterator.
You will always have cases where the user uses the mouse to click and move the cursor to a much earlier position, and you might want to change the node structure to contain more than just one letter or you will end up iterating through thousands of nodes to move the cursor, but this is the general idea. Maybe you want to make each node a “paragraph” node that points to another linked list of characters? You might want to make the primary node a “formatting” node; the choice is yours.
I assume it is using some kind of tree representation. Nodes of this tree would define styling , eg. all text under this node has this style and end leafs of this tree would be raw text. Then by editing the leaves, you edit the raw text, but formatting remains same. This also allows for nesting of styles.
Selecting the text usually means picking node where selection starts and node where selection ends, thus allowing you select different styled and formatted text. Then by applying formatting to the selection, you apply complex modification to this tree.
Displaying this tree then might result in complex visual document.
You might also check out LaTex, which is pretty much creation of rich visual documents and DTP from it’s internal representation. Because LaTex documents are by definition readable and processable by computers, it might also be used as internal representation of those documents.
1