I'm trying to download an HTML page from the internet as Markdown, together with all of its linked images. Here is what I currently use for this:
pandoc http://foo.bar/baz.html -o baz.md --wrap=none --extract-media="./images"
The problem is that the images are saved under SHA1-based filenames, even though I've read the relevant part of the user's guide:

    --extract-media=DIR
    Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the image references in the document so they point to the extracted files. Media are downloaded, read from the file system, or extracted from a binary container (e.g. docx), as needed. The original file paths are used if they are relative paths not containing `..`. Otherwise filenames are constructed from the SHA1 hash of the contents.
I still don't understand how to preserve the original filenames. What exactly should I do to keep regular filenames, like foo.png, instead of getting SHA1-based ones, like 6f10f22599031f605714f6b3f1153b5c12d96084.png?
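To make the mismatch concrete, here is roughly what I observe versus what I would like (the hash is from an actual run; foo.png stands in for whatever name the image has on the page):

    images/6f10f22599031f605714f6b3f1153b5c12d96084.png   <- what I get
    images/foo.png                                        <- what I want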