I’m trying to create a basic Manim animation that shows autoregressive generation: how a language model produces text by predicting tokens from left to right. Currently, I’m using a VGroup which grows as each new token is “predicted”. However, the baselines of the tokens appear “ragged”. How can I position the tokens so that the whole text sits on a common baseline?

Below is my minimal example demonstrating the problem:
    from manim import *


    class textConcat(Scene):
        def construct(self):
            # List of tokens to predict
            tokens = "The cat is hungry".split(" ")

            # Initial token setup
            text = Text(tokens.pop(0), font="Helvetica", font_size=20).set_color(BLUE)
            text_group = (
                VGroup(text).arrange(RIGHT, aligned_edge=DOWN).to_corner(UL, buff=1.0)
            )
            self.play(Write(text_group))

            for token in tokens:
                # machine outputs the next token to the right
                prediction = Text(token, font="Helvetica", font_size=20).set_color(BLUE)

                # Instantiate the prediction behind the machine
                self.play(FadeIn(prediction), run_time=0.01)

                # # Slide the prediction to the right of the machine
                # self.play(prediction.animate.next_to(machine, RIGHT * 4), run_time=0.5)

                # Create an arc path from the right of the machine to the left
                start_point = prediction.get_left()
                end_point = text_group.get_right() + RIGHT * 0.3
                arc_path = ArcBetweenPoints(
                    start_point, end_point, angle=-PI
                )  # Negative for an arc underneath

                # Animate the prediction following the arc path
                self.play(MoveAlongPath(prediction, arc_path), run_time=1.0)

                # Update the VGroup with the new prediction token
                # TODO: Fix alignment of baseline
                text_group.add(prediction)
                self.play(
                    prediction.animate.next_to(text_group[-2], RIGHT),
                    run_time=0.5,
                )

            # Finish animation
            self.wait(2)


    if __name__ == "__main__":
        textConcat().render()
And the result:
I’ve tried capturing the y coordinate of the initial token and forcing the alignment accordingly, e.g.:
    text_group.shift(UP * (text_group.get_y() - text_group[0].get_y()))
But this hasn’t helped.
Any pointers would be much appreciated!
Note: I’m using ManimCommunity v0.18.1.
In Manim, text is really difficult to align because there is no fixed height for text of a given size. Instead, the height of the underlying SVG image is used (I assume). So in your case “The” has a different height and center than “hungry”, because “g” and “y” extend below the baseline.
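To make that concrete, here is a small pure-Python sketch of the geometry, with no Manim involved. The ascent and descent numbers are made up for illustration and are not taken from any real font. It only assumes that each word gets a tight bounding box, and shows that neither centering the boxes (what next_to(..., RIGHT) does vertically by default) nor aligning their bottom edges (aligned_edge=DOWN) puts the baselines at the same height.

```python
# Hypothetical glyph metrics (NOT from a real font): how far each word
# extends above (ascent) and below (descent) its baseline, arbitrary units.
WORD_EXTENTS = {
    "The": (0.72, 0.00),     # no descenders
    "cat": (0.66, 0.00),
    "is": (0.70, 0.00),
    "hungry": (0.72, 0.21),  # "g" and "y" dip below the baseline
}


def baseline_when_centered(word, center_y=0.0):
    """Baseline y when the word's tight bounding box is centered at center_y."""
    ascent, descent = WORD_EXTENTS[word]
    # The box spans [baseline - descent, baseline + ascent], so its center
    # sits at baseline + (ascent - descent) / 2.
    return center_y - (ascent - descent) / 2


def baseline_when_bottom_aligned(word, bottom_y=0.0):
    """Baseline y when the bottom edge of the word's box is placed at bottom_y."""
    _, descent = WORD_EXTENTS[word]
    return bottom_y + descent


if __name__ == "__main__":
    for word in WORD_EXTENTS:
        print(
            word,
            round(baseline_when_centered(word), 3),
            round(baseline_when_bottom_aligned(word), 3),
        )
```

In both columns the baseline of “hungry” ends up at a different height than that of the other words, which is the ragged look described in the question.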
The only solution I can think of is to use a Tex() object: typeset the whole sentence once as a reference copy (the variable token_copy), and move each token to the position of its counterpart in that copy:
    from manim import *


    class textConcat(Scene):
        def construct(self):
            # List of tokens to predict
            tokens = "The cat is hungry".split(" ")

            # Reference layout: the full sentence typeset once, one submobject
            # per token. It is never added to the scene; token_copy[i] only
            # marks where token i belongs.
            token_copy = (
                Tex(*tokens, arg_separator=" ", font_size=30)
                .set_color(RED)
                .to_corner(UL, buff=1.0)
            )

            # Initial token setup
            text = Tex(tokens.pop(0), font_size=30).set_color(BLUE)
            text_group = text.move_to(token_copy[0].get_center())
            self.play(Write(text_group))

            # tokens[i - 1] is token i of the original list (the first was popped)
            for i in range(1, len(tokens) + 1):
                # machine outputs the next token to the right
                prediction = Tex(tokens[i - 1], font_size=30).set_color(BLUE)

                # Instantiate the prediction behind the machine
                self.play(FadeIn(prediction), run_time=0.01)

                # # Slide the prediction to the right of the machine
                # self.play(prediction.animate.next_to(machine, RIGHT * 4), run_time=0.5)

                # Create an arc path that ends at the token's slot in the reference layout
                start_point = prediction.get_left()
                end_point = token_copy[i].get_center()  # was: text_group.get_right() + RIGHT * 0.3
                arc_path = ArcBetweenPoints(
                    start_point, end_point, angle=-PI
                )  # Negative for an arc underneath

                # Animate the prediction following the arc path
                self.play(MoveAlongPath(prediction, arc_path), run_time=1.0)

                # Add the new token to the group of placed tokens
                text_group.add(prediction)
                self.wait(0.5)

            # Finish animation
            self.wait(2)
The code could obviously be optimized further (the for-loop, for example), but I just wanted to show the general idea.
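If you would rather stay with Text() than switch to Tex(), the same reference-copy idea might carry over by slicing one Text object that holds the whole sentence. The sketch below covers only the index bookkeeping. The helper name token_slices is my own, and it relies on the assumption that Text leaves whitespace out of its submobjects, so that token boundaries are cumulative character counts with the spaces skipped. I believe that is how ManimCommunity behaves, but please verify it on your version.

```python
def token_slices(tokens):
    """Return one slice per token into a character sequence without whitespace.

    Hypothetical helper: assumes the sequence being sliced contains the
    characters of all tokens back to back, with no entries for the spaces.
    """
    slices = []
    start = 0
    for token in tokens:
        stop = start + len(token)
        slices.append(slice(start, stop))
        start = stop
    return slices


tokens = "The cat is hungry".split(" ")
slices = token_slices(tokens)

# Stand-in for the submobject list of a Text holding the full sentence
chars = "".join(tokens)

if __name__ == "__main__":
    for token, s in zip(tokens, slices):
        print(token, s, chars[s])
```

In the scene you would then build something like reference = Text(" ".join(tokens), font="Helvetica", font_size=20) once, place it with to_corner(UL, buff=1.0), and use reference[s].get_center() as the end point of the arc for each slice s, exactly as token_copy[i].get_center() is used above. I have not run that part against v0.18.1, so treat it as a starting point rather than a tested solution.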