I am currently running into the following error:
org.postgresql.util.PSQLException: ERROR: insert or update on table "high_level_attributes" violates foreign key constraint "fk_high_level_attributes_track_id__id"
2024-08-17T16:25:04.658340344Z Detail: Key (track_id)=(29132336-080a-444b-a397-20b98a1cc2ee) is not present in table "tracks".
In other words, "dependent data arrives before the source data, and we get an error". As far as I can tell, there is only one way this could happen.
val jsonElements: Sequence<Triple<...>> =
    sequence { yield(Triple(...)) }
I have trimmed out most of the code, but it basically boils down to yield. I am consuming an input stream and yielding three elements at a time from this sequence. There are two relations in this triple:
attribute.trackID -> track.id -> cluster.trackID
The sequence yields the track, attributes, and cluster together as a Triple. The consumer of this sequence adds them to the to-be-inserted lists together. If the sequence yields everything in order, then
for (chunk in jsonElements.chunked(1000)) {
for ((track,attr,mirex) in chunk) {
trackDataList.add(track)
highLevelAttributeDataList.addAll(attr)
mirexClusterDataList.add(mirex)
...
connection.batchInsertTracks(trackDataList)
trackDataList.clear()
commit()
connection.batchInsertHighLevelAttributes(highLevelAttributeDataList)
highLevelAttributeDataList.clear()
commit()
connection.batchInsertMirexClusters(mirexClusterDataList)
mirexClusterDataList.clear()
commit()
connection.createStatement().execute("SELECT 1")
}
}
}
}
must definitely succeed, because there is no case where attribute data is produced before track data; they are all produced and yielded together. In fact, about 25k elements are inserted before everything errors out.
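To test that assumption instead of just trusting it, I could check each chunk before inserting it. Below is a minimal sketch. The class names `TrackData`, `HighLevelAttributeData`, `MirexClusterData`, their fields, and the helper `orphanTrackIds` are placeholders I made up for this post, not my real types, and I am assuming both attributes and clusters carry the track ID they reference.

```kotlin
import java.util.UUID

// Hypothetical shapes; the real classes carry more fields.
data class TrackData(val id: UUID)
data class HighLevelAttributeData(val trackId: UUID)
data class MirexClusterData(val trackId: UUID)

typealias Element = Triple<TrackData, List<HighLevelAttributeData>, MirexClusterData>

// Returns the track IDs referenced by attributes or clusters in this chunk
// that do not belong to any track in the same chunk.
fun orphanTrackIds(chunk: List<Element>): Set<UUID> {
    val trackIds = chunk.map { it.first.id }.toSet()
    val referenced = chunk.flatMap { (_, attrs, cluster) ->
        attrs.map { it.trackId } + cluster.trackId
    }
    return referenced.filterNot { it in trackIds }.toSet()
}
```

If this ever returns a non-empty set for a chunk, the bad reference was already in the yielded triples. If it is always empty and the insert still fails, the problem is more likely somewhere in the insert or commit path.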
So I suspect the following:
sequence 1 yields A, B, C triple
sequence 2 yields D, E, F triple
producer gets the 2 chunks and mixes them up, so now I have an A, E, C triple
E is inserted into the database
D is not inserted into the database, because we have the mixed A, E, C triple
the insert of E fails
Is this possible?
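For reference, this toy version shows how I understand the sequence to behave when a single consumer iterates it. It is only a sketch: `traceLockstep` and the string "triples" are made up for illustration, and it assumes the sequence is iterated from one thread only, which may not match my real setup.

```kotlin
// Records the order of producer and consumer steps for a toy sequence.
fun traceLockstep(n: Int): List<String> {
    val log = mutableListOf<String>()
    val triples = sequence {
        for (i in 1..n) {
            log += "produce $i"
            // All three parts of one triple are yielded in a single call.
            yield(Triple("track$i", "attr$i", "cluster$i"))
        }
    }
    // Same consumption pattern as in my real code, with a smaller chunk size.
    for (chunk in triples.chunked(2)) {
        for ((track, attr, cluster) in chunk) {
            log += "consume $track $attr $cluster"
        }
    }
    return log
}
```

In this toy, every consumed triple keeps its three parts together and the triples arrive in the order they were produced. So if mixing does happen in my real code, I would expect it to come from something outside this pattern, such as shared mutable state in the producer or the sequence being iterated from more than one thread.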