I have a table of about 1.3 billion rows that is in active use (millions of new entries per day, around the clock, with some periods of lower activity). The database runs on a db.t4g.xlarge instance with gp3 storage (1500 IOPS / 500 MB/s throughput).
I need to add a new index to this table, but because the table is very large and the index covers three columns, building it takes several hours. For the first three hours, operations seem undisturbed and free storage space goes down slowly (about 34 GB consumed over that period). Then the build starts consuming space very quickly: about 70 GB in 15-20 minutes, i.e. roughly 60-80 MB/s of extra writes on top of normal traffic. This latter spurt causes EBSByteBalance% to drop to 0% and the database to become unreachable.
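The statement itself is just a plain three-column index build, something like the sketch below (table and column names are placeholders, not the real schema):

```sql
-- Placeholder names; the real table has ~1.3 billion rows.
CREATE INDEX idx_my_table_col_a_col_b_col_c
    ON my_table (col_a, col_b, col_c);
```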
I assume this is normal behaviour. Maybe in the first phase it's reading heavily from the table, doing some computation, and writing to a temporary index/table, and in the later phase it's committing the data and consolidating it with rows that arrived during the build. But how can I keep this from taking the database down? Taking a long time to finish the operation is perfectly fine; I just need the database to remain usable while it runs.