I need to run something on Flink that does the following:
Take a bunch of audio data from Kafka and stream it to the AWS transcription service.
It sounds easy, but my problem is that TranscribeStreamingAsyncClient takes its audio from a Publisher. Publisher is an interface from org.reactivestreams, and it's really hard to get one working unless it publishes data from an InputStream.
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/examples-transcribe.html
I tried to get the publisher to work by reading the audio bytes from a Buffer and even from a Queue, but it always seems to be problematic (i.e., I couldn't keep the stream alive when there was a break in the incoming data). Meanwhile, in their example, everything works flawlessly when the publisher reads from an InputStream, i.e., the one provided by Java's DataLine API, which reads from your microphone.
So my question is: is there a way to construct an InputStream from data coming off of Kafka so that it behaves the same as one reading from your microphone?
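To make the question concrete, here is a rough sketch of the kind of stream I mean. It is only an illustration, not something verified against Transcribe: QueueBackedInputStream, offer and finish are hypothetical names, and the assumption is that some consumer thread hands over the byte[] value of each Kafka record. The idea is that read() blocks while no data is available instead of returning -1, which is roughly how a microphone line behaves during silence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical helper: an InputStream that waits for a producer instead of ending on a gap. */
class QueueBackedInputStream extends InputStream {
    // Marker chunk signalling end of stream; compared by identity, never by content.
    private static final byte[] EOF = new byte[0];

    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();
    private byte[] current = new byte[0]; // chunk currently being read
    private int pos = 0;                  // read position inside 'current'
    private boolean finished = false;

    /** Producer side (assumption: called with each Kafka record value). */
    void offer(byte[] chunk) {
        if (chunk.length > 0) {
            chunks.add(chunk);
        }
    }

    /** Producer side: signals that no more audio will arrive. */
    void finish() {
        chunks.add(EOF);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (finished) {
            return -1;
        }
        // Current chunk exhausted: block until the producer supplies the next one.
        while (pos == current.length) {
            try {
                current = chunks.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for audio");
            }
            pos = 0;
            if (current == EOF) {
                finished = true;
                return -1;
            }
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, buf, off, n);
        pos += n;
        return n;
    }
}
```

A stream like this could presumably be handed to the InputStream-based publisher from the AWS example in place of the microphone stream. Only one thread should read from it, while any number of threads may call offer.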
Thanks