I’ve been working on a Java application that needs to extract metadata from various media files, including video files such as .avi. Initially I used FFmpeg for this, but because of security concerns over known vulnerabilities, I upgraded to the latest FFmpeg version. Unfortunately, even the latest version still had vulnerabilities.
To address this, I switched to using JAVE2, a Java-based media processing library. However, I later discovered that JAVE2 relies on FFmpeg under the hood, which didn’t resolve my security concerns.
I then switched to Apache Tika for metadata extraction. My implementation works well for most formats, but I’ve run into a problem with the duration of .avi files: it simply isn’t present in the metadata Tika returns for them.
Here’s an example of what I’ve implemented so far:
```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class MediaMetadataExtractor {

    public static Integer getDuration(String filePath) {
        File source = new File(filePath);
        try (InputStream input = Files.newInputStream(source.toPath())) {
            AutoDetectParser parser = new AutoDetectParser();
            Metadata metadata = new Metadata();
            parser.parse(input, new BodyContentHandler(), metadata, new ParseContext());

            String durationStr = metadata.get("xmpDM:duration");
            if (durationStr != null) {
                // Parse as double: some parsers report a decimal value such as "12.5",
                // which Long.parseLong would reject. The unit may differ between
                // parsers/Tika versions, so check it against a file of known length.
                double durationMs = Double.parseDouble(durationStr);
                return (int) (durationMs / 1000); // Convert milliseconds to seconds
            }
        } catch (IOException | SAXException | TikaException e) {
            throw new RuntimeException("Error retrieving duration with Apache Tika", e);
        }
        return -1; // Unknown duration
    }

    public static void main(String[] args) {
        String filePath = "path/to/your/video.avi";
        Integer duration = getDuration(filePath);
        System.out.println("Duration: " + duration + " seconds");
    }
}
```
This code works well for other video formats like .mp4, but for .avi files, the duration is not available in the metadata extracted by Apache Tika. Below is an example of the metadata I get for an .avi file:
```
X-TIKA:Parsed-By -> org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By-Full-Set -> org.apache.tika.parser.DefaultParser
Content-Type -> video/x-msvideo
```
As you can see, there’s no duration information.
The file types I am working with are: jpeg, mp3, mp4, png, jpg, gif, mov, avi, bmp.
I only face this issue with .avi files; I can extract the duration for .mov, .mp4, and .mp3 files.
Is there a way to extract the duration of .avi files using Apache Tika?
If Tika cannot extract this information, is there an alternative approach I can use that doesn’t involve FFmpeg due to its vulnerabilities?
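For reference, one FFmpeg-free direction is to read the AVI header directly in plain Java. AVI is a RIFF container whose `hdrl` LIST normally starts with an `avih` chunk (the main AVI header); its `dwMicroSecPerFrame` and `dwTotalFrames` fields give an approximate duration. The sketch below is a minimal illustration under stated assumptions, not a complete AVI parser: the class and method names (`AviDuration`, `readDurationSeconds`) are hypothetical, it trusts the values the muxer wrote into `avih`, and for large OpenDML (AVI 2.0) files the `avih` frame count may only cover the first RIFF segment, so the true total would have to come from other header chunks (such as `dmlh`) that this sketch does not read.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: approximate AVI duration from the 'avih' header, no FFmpeg involved. */
public final class AviDuration {

    private AviDuration() {}

    /** Opens the file and delegates to the stream-based overload. */
    public static double readDurationSeconds(Path path) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return readDurationSeconds(in);
        }
    }

    /** Returns the duration in seconds, or -1 if no 'avih' chunk is found. */
    public static double readDurationSeconds(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (!"RIFF".equals(readFourCc(in))) {
            throw new IOException("Not a RIFF file");
        }
        readUInt32LE(in); // overall RIFF size, not needed here
        if (!"AVI ".equals(readFourCc(in))) {
            throw new IOException("Not an AVI file");
        }
        try {
            while (true) {
                String id = readFourCc(in);
                long size = readUInt32LE(in);
                if ("LIST".equals(id)) {
                    String listType = readFourCc(in);
                    if (!"hdrl".equals(listType)) {
                        // Skip other lists (e.g. 'movi'); size includes the 4-byte list type.
                        skipFully(in, size - 4 + (size & 1));
                    }
                    // For 'hdrl', keep looping so its sub-chunks are read next.
                } else if ("avih".equals(id)) {
                    // Main AVI header fields, in order: dwMicroSecPerFrame, dwMaxBytesPerSec,
                    // dwPaddingGranularity, dwFlags, dwTotalFrames, ...
                    long microSecPerFrame = readUInt32LE(in);
                    readUInt32LE(in);
                    readUInt32LE(in);
                    readUInt32LE(in);
                    long totalFrames = readUInt32LE(in);
                    return (double) microSecPerFrame * totalFrames / 1_000_000.0;
                } else {
                    // Any other chunk: skip its data plus the pad byte for odd sizes.
                    skipFully(in, size + (size & 1));
                }
            }
        } catch (EOFException e) {
            return -1; // reached end of stream without an 'avih' chunk
        }
    }

    private static String readFourCc(DataInputStream in) throws IOException {
        byte[] b = new byte[4];
        in.readFully(b);
        return new String(b, StandardCharsets.US_ASCII);
    }

    /** RIFF stores sizes and header fields as unsigned 32-bit little-endian integers. */
    private static long readUInt32LE(DataInputStream in) throws IOException {
        byte[] b = new byte[4];
        in.readFully(b);
        return Integer.toUnsignedLong(ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt());
    }

    /** InputStream.skip may skip fewer bytes than asked, so loop until done or EOF. */
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException();
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

Usage would look like `AviDuration.readDurationSeconds(Paths.get("path/to/your/video.avi"))`. Reading the stream headers (`strh`, with `dwRate`, `dwScale` and `dwLength`) instead of `avih` would be a possible refinement if the main header turns out to be unreliable for some files.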