I am trying to use PDF.js range requests to load a huge PDF. The range requests work, but the first request still loads the whole PDF. According to the official documentation, range requests fetch the remaining parts of the file, so the initial (main) request should be stopped:
Actually, PDF.js is doing just that. PDF is a complicated format; in
most of the cases, the vital data of a PDF document is located at the
end. Depending on browser support and on what web server returns the
HTTP Range Requests headers, PDF.js may automatically start using HTTP
Range Requests to fetch not-yet-loaded portions of a PDF needed for
rendering visible pages, so a document can be rendered without fully
loading it.
But it seems the first request still loads the whole PDF. According to the FAQ answer at https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range, PDF.js should stop loading the whole PDF and fetch the rest of the content with range requests. Am I missing something? This is the PDF URL setting on the client:
"/tex/file/pdf/partial?proj_id=" + pid
and this is the server-side code which implements the partial request handling:
/**
 * when the PDF becomes huge, loading the whole PDF every time wastes too many resources
 * this API provides partial PDF loading to improve performance and save system resources
 */
pub async fn load_partial(
    req: HttpRequest,
    params: actix_web_validator::Query<PdfPartial>,
    login_user_info: LoginUserInfo,
) -> impl Responder {
    let pdf_info = get_proj_latest_pdf(&params.0.proj_id, &login_user_info.userId).await;
    if let Err(err) = pdf_info {
        return box_err_actix_rest_response(err);
    }
    let range_header = req.headers().get("Range");
    let collar_query = CollarQueryParams {
        project_id: params.0.proj_id.clone(),
        user_id: login_user_info.userId,
    };
    let relation = get_collar_relation(&collar_query).await;
    if relation.is_none() {
        return box_err_actix_rest_response(InfraError::AccessResourceDenied);
    }
    if relation.unwrap()[0].collar_status == CollarStatus::Exit as i32 {
        return box_err_actix_rest_response(InfraError::AccessResourceDenied);
    }
    return get_partial_pdf(&pdf_info.unwrap(), range_header);
}
and this is what the get_partial_pdf function looks like:
pub fn get_partial_pdf(lastest_pdf: &LatestCompile, range: Option<&HeaderValue>) -> HttpResponse {
    let proj_base_dir = get_proj_base_dir(&lastest_pdf.project_id);
    let pdf_name = format!(
        "{}{}",
        get_filename_without_ext(&lastest_pdf.file_name),
        ".pdf"
    );
    let pdf_file_path = join_paths(&[proj_base_dir, pdf_name]);
    if range.is_none() {
        let mut file = File::open(pdf_file_path).expect("Failed to open file");
        let mut buf = Vec::new();
        file.read_to_end(&mut buf);
        let metadata = file.metadata().expect("Failed to get metadata");
        let file_size = metadata.len();
        return HttpResponse::PartialContent()
            .insert_header(CacheControl(vec![CacheDirective::NoCache]))
            .append_header(("Accept-Ranges", "bytes"))
            .append_header(("Content-Length", file_size))
            .append_header((
                "Access-Control-Expose-Headers",
                "Accept-Ranges,Content-Range",
            ))
            .content_type("application/pdf")
            .body(buf);
    }
    let mut parts = range.unwrap().to_str().unwrap().split('-');
    let start = parts.next().unwrap_or("0").parse::<u64>().unwrap_or(0);
    let end = parts.next().unwrap_or("0").parse::<u64>().unwrap_or(0);
    let mut file = File::open(pdf_file_path).expect("Failed to open file");
    let metadata = file.metadata().expect("Failed to get metadata");
    let file_size = metadata.len();
    file.seek(SeekFrom::Start(start))
        .expect("Failed to seek file");
    let mut buf = vec![0; (end - start + 1) as usize];
    file.take(end - start + 1)
        .read_exact(&mut buf)
        .expect("Failed to read file");
    let content_range = format!("bytes {}-{}/{}", start, end, file_size);
    return HttpResponse::PartialContent()
        .insert_header(CacheControl(vec![CacheDirective::NoCache]))
        .append_header(("Content-Range", content_range))
        .append_header(("Accept-Ranges", "bytes"))
        .append_header(("Content-Length", file_size))
        .append_header((
            "Access-Control-Expose-Headers",
            "Accept-Ranges,Content-Range",
        ))
        .content_type("application/pdf")
        .body(buf);
}
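One detail worth checking in the range branch above: a `Range` header value normally has the form `bytes=start-end`, so splitting it on `-` alone leaves `bytes=0` as the first piece, which does not parse as a number and falls back to `0`. The following is only a minimal sketch of a stricter parser using the standard library; the name `parse_byte_range` is hypothetical, it handles a single range only, and it is not how actix-web or PDF.js parse the header.

```rust
/// Parse a single-range `Range` header value such as `bytes=0-65535`.
///
/// Returns the inclusive `(start, end)` byte offsets, with `end` clamped to
/// the last byte of the file, or `None` if the value is malformed, uses
/// several ranges, or cannot be satisfied for a file of `file_size` bytes.
/// Hypothetical helper for illustration only.
pub fn parse_byte_range(header: &str, file_size: u64) -> Option<(u64, u64)> {
    if file_size == 0 {
        return None;
    }
    // The unit prefix must be stripped before the numbers can be parsed.
    let spec = header.trim().strip_prefix("bytes=")?;
    // Multiple ranges ("bytes=0-9,20-29") are not handled in this sketch.
    if spec.contains(',') {
        return None;
    }
    let (first, last) = spec.split_once('-')?;
    let (first, last) = (first.trim(), last.trim());
    let last_byte = file_size - 1;

    let (start, end) = match (first.is_empty(), last.is_empty()) {
        // "bytes=-" carries no information.
        (true, true) => return None,
        // "bytes=-N": the final N bytes of the file.
        (true, false) => {
            let n: u64 = last.parse().ok()?;
            if n == 0 {
                return None;
            }
            (file_size.saturating_sub(n), last_byte)
        }
        // "bytes=N-": from offset N to the end of the file.
        (false, true) => (first.parse().ok()?, last_byte),
        // "bytes=N-M": explicit start and end, end clamped to the file.
        (false, false) => {
            let s: u64 = first.parse().ok()?;
            let e: u64 = last.parse().ok()?;
            (s, e.min(last_byte))
        }
    };

    if start > end || start >= file_size {
        return None;
    }
    Some((start, end))
}
```

With a parser like this, a `None` result could be mapped to a `416 Range Not Satisfiable` or to a full response, depending on how strict the server should be.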
On the first request PDF.js does not add a Range header, so the server returns the whole PDF (as a 206 response). PDF.js then detects that the server supports partial requests, and it is supposed to stop the first request and switch to 206 partial requests with a Range header. Everything works fine except that the first request is not stopped and still downloads the whole PDF.
Am I missing something? What should I do to fix this issue? My initial idea was to tell the client only the PDF length, without sending the actual file, so that PDF.js would switch to range requests with minimal data transfer. But I eventually found that the Content-Length must match the actual content.
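To make the Content-Length point concrete: for a partial response, `Content-Length` describes the bytes actually sent (`end - start + 1`), while the total file size only appears after the slash in `Content-Range`. Below is a small sketch of that arithmetic using only the standard library; `partial_headers` is a hypothetical helper name, not part of actix-web.

```rust
/// Compute the `Content-Range` value and the `Content-Length` for a 206
/// response that carries bytes `start..=end` of a file of `file_size` bytes.
///
/// Returns `None` when the range is empty or lies outside the file.
/// Hypothetical helper for illustration only.
pub fn partial_headers(start: u64, end: u64, file_size: u64) -> Option<(String, u64)> {
    if start > end || end >= file_size {
        return None;
    }
    // Length of the body that is actually sent, not of the whole file.
    let content_length = end - start + 1;
    // The total file size is reported after the slash.
    let content_range = format!("bytes {}-{}/{}", start, end, file_size);
    Some((content_range, content_length))
}
```

For example, `partial_headers(0, 65535, 1_000_000)` yields `bytes 0-65535/1000000` with a length of `65536`, whereas the range branch of `get_partial_pdf` above sends `file_size` as the `Content-Length` for every chunk.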