I’m trying to grab YouTube video captions using the youtube-captions-scraper package. Most of the time, it works perfectly, but sometimes it fails with the following error: Could not find captions for video: video-id
This error seems to occur randomly, even for videos where captions are available when checked manually.
const { getSubtitles } = require('youtube-captions-scraper');
const videoId = 'some-video-id';
getSubtitles({
videoID: videoId, // YouTube video ID
lang: 'en' // language code
})
.then(captions => {
console.log(captions);
})
.catch(err => {
console.error('Error:', err.message);
});
- The video definitely has captions (I verified them on YouTube).
- The error does not always occur for the same video ID; sometimes the same video will work fine on subsequent attempts.
- I have tried this on different networks and systems, but the behavior is inconsistent.
Has anyone faced this issue, or does anyone know why it happens? Could it be related to rate-limiting or API throttling by YouTube? Any advice on how to handle this more gracefully or avoid the error altogether would be appreciated.
This library contains one small function. You can see the code here.
The part that yields that error is this:
const data = await fetchData(
`https://youtube.com/watch?v=${videoID}`
);
// * ensure we have access to captions data
if (!data.includes('captionTracks'))
throw new Error(`Could not find captions for video: ${videoID}`);
This means the request itself succeeded, but the returned page is not what was expected. It’s probably a defense against scrapers, such as a CAPTCHA test. You may get better results if you use a headless browser like puppeteer to fetch the HTML. Also consider using different proxies for an even better success rate.
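Since the question mentions that the same video ID often works on a later attempt, a retry with exponential backoff may be a cheap first step before reaching for a headless browser. Below is a minimal sketch, not part of the library: `withRetry`, `retries`, and `baseDelayMs` are hypothetical names I made up for illustration, and it assumes that the failure is transient, which may not hold if YouTube is actively blocking your IP.

```javascript
// Hypothetical helper (not part of youtube-captions-scraper):
// retries an async operation, doubling the wait after each failure.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait 500 ms, 1 s, 2 s, ... before the next attempt (with defaults).
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Usage (assumes youtube-captions-scraper is installed):
// const { getSubtitles } = require('youtube-captions-scraper');
// withRetry(() => getSubtitles({ videoID: 'some-video-id', lang: 'en' }))
//   .then(captions => console.log(captions))
//   .catch(err => console.error('Error:', err.message));
```

This retries on any error, including a genuinely missing caption track, so you may want to cap `retries` low or inspect `err.message` before retrying.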
Once you have the HTML, either from puppeteer or from fetch with rotating proxies, you can use the following module, which is based on the library's source code with one additional parameter: the HTML of the video’s page.
import he from 'he';
import axios from 'axios';
import {
find
} from 'lodash';
import striptags from 'striptags';
const fetchData =
typeof fetch === 'function' ?
async function fetchData(url) {
const response = await fetch(url);
return await response.text();
} :
async function fetchData(url) {
const {
data
} = await axios.get(url);
return data;
};
export async function getSubtitles({
videoID,
lang = 'en',
}: {
videoID: string,
lang: 'en' | 'de' | 'fr' | void,
}, html) {
const data = html || await fetchData(
`https://youtube.com/watch?v=${videoID}`
);
// * ensure we have access to captions data
if (!data.includes('captionTracks'))
throw new Error(`Could not find captions for video: ${videoID}`);
const regex = /"captionTracks":(\[.*?\])/;
const [match] = regex.exec(data);
const {
captionTracks
} = JSON.parse(`{${match}}`);
const subtitle =
find(captionTracks, {
vssId: `.${lang}`,
}) ||
find(captionTracks, {
vssId: `a.${lang}`,
}) ||
find(captionTracks, ({
vssId
}) => vssId && vssId.match(`.${lang}`));
// * ensure we have found the correct subtitle lang
if (!subtitle || (subtitle && !subtitle.baseUrl))
throw new Error(`Could not find ${lang} captions for ${videoID}`);
const transcript = await fetchData(subtitle.baseUrl);
const lines = transcript
.replace('<?xml version="1.0" encoding="utf-8" ?><transcript>', '')
.replace('</transcript>', '')
.split('</text>')
.filter(line => line && line.trim())
.map(line => {
const startRegex = /start="([\d.]+)"/;
const durRegex = /dur="([\d.]+)"/;
const [, start] = startRegex.exec(line);
const [, dur] = durRegex.exec(line);
const htmlText = line
.replace(/<text.+>/, '')
.replace(/&amp;/gi, '&')
.replace(/<\/?[^>]+(>|$)/g, '');
const decodedText = he.decode(htmlText);
const text = striptags(decodedText);
return {
start,
dur,
text,
};
});
return lines;
}
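To tie the two pieces together, here is a rough sketch of fetching the watch page with puppeteer and handing the HTML to the module above. It assumes the `puppeteer` package is installed; `fetchWatchPageHtml` and its `launcher` parameter are hypothetical names of mine, and the `./getSubtitles` path in the usage comment is only a placeholder for wherever you save the module. Whether a headless browser actually gets past YouTube's checks is something you would have to verify yourself.

```javascript
// Sketch: load the watch page in headless Chrome and return its HTML.
// `launcher` defaults to the puppeteer package (assumed installed); the
// parameter exists only so a stand-in object can be supplied when testing.
async function fetchWatchPageHtml(videoID, launcher = require('puppeteer')) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    // The captionTracks data is embedded in the page source, so waiting
    // for the DOM to be parsed should be enough (an assumption).
    await page.goto(`https://youtube.com/watch?v=${videoID}`, {
      waitUntil: 'domcontentloaded',
    });
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Usage with the modified module above (placeholder path):
// const { getSubtitles } = require('./getSubtitles');
// const html = await fetchWatchPageHtml('some-video-id');
// const captions = await getSubtitles({ videoID: 'some-video-id', lang: 'en' }, html);
```

Launching a browser for every video is slow, so if you process many IDs it is probably worth reusing one browser instance across requests.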