Question

I've searched high and low to work out how to batch-process files with pandoc.

How do I convert all the HTML files in a folder and its nested subfolders to Markdown?

I'm using OS X 10.6.8.

Solution

You can apply any command across the files in a directory tree using find:

find . -name \*.md -type f -exec pandoc -o {}.txt {} \;

This runs pandoc on every file with a .md suffix, creating a matching file with a .md.txt suffix. (You will need a wrapper script if you want a .txt suffix without the .md, or do ugly things with subshell invocations.) The {} in any word between -exec and the terminating \; is replaced by the filename. For the HTML files in the question, change the pattern to \*.html and the output name to {}.md.
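
Before converting a whole tree, it can help to preview what find would execute. The sketch below assumes the HTML-to-Markdown case from the question. preview_conversions is a hypothetical helper name, and putting echo in front of pandoc makes find print each command instead of running it.

```shell
#!/bin/bash
# Dry run: print the pandoc command for every .html file under a directory.
# preview_conversions is a hypothetical name; $1 is the top directory (default: .).
preview_conversions() {
  # echo stands in front of pandoc, so nothing is converted or overwritten.
  find "${1:-.}" -name '*.html' -type f \
    -exec echo pandoc -f html -t markdown -o '{}.md' '{}' \;
}

# Usage: preview_conversions ~/Sites/filesToMd
```

Once the printed commands look right, removing echo runs the real conversion, which produces files with a .html.md suffix.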

OTHER TIPS

I made a bash script that does not work recursively, but perhaps you could adapt it to your needs:

#!/bin/bash
newFileSuffix=md # we will make all files into .md

# Loop over a glob rather than parsing ls, so names with spaces survive
for path in ~/Sites/filesToMd/*.html
do
  [ -e "$path" ] || continue # nothing matched the glob
  file=${path##*/} # strip the directory part
  filename=${file%.html} # remove suffix
  newname=$filename.$newFileSuffix # make the new filename
#  echo "$newname" # uncomment this line to test for your directory, before you break things
  pandoc "$path" -o "$newname" # perform pandoc operation on the file,
                               # --output to newname (in the current directory)
done
# pandoc Catharsis.html -o test
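
One way to adapt the script so it also descends into nested folders is to let find produce the file list and keep the loop. This is only a sketch: convert_tree is a hypothetical function name, the output is written next to each source file rather than in the current directory, and read -d '' relies on bash.

```shell
#!/bin/bash
# Recursively convert every .html file under a directory to Markdown.
# convert_tree is a hypothetical name; $1 is the top directory (default: .).
convert_tree() {
  local top=${1:-.} file
  # -print0 and read -d '' keep names with spaces or newlines intact.
  find "$top" -type f -name '*.html' -print0 |
  while IFS= read -r -d '' file; do
    # Write foo.md next to foo.html; stdin is redirected so pandoc
    # can never consume the file list coming through the pipe.
    pandoc -f html -t markdown "$file" -o "${file%.html}.md" < /dev/null
  done
}

# Usage: convert_tree ~/Sites/filesToMd
```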

This builds upon the answer by geekosaur to avoid the .old.new extension and use just .new instead. Note that it runs silently, displaying no progress.

find . -type f -name '*.docx' -exec bash -c 'pandoc -f docx -t gfm "$1" -o "${1%.docx}.md"' - '{}' \;
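
If the silence is a problem, a small variation is to add -print ahead of -exec, so find writes each path just before it hands the file to pandoc. The following is a sketch under that assumption, wrapped in a hypothetical convert_docx_verbose function.

```shell
#!/bin/bash
# Convert .docx files to GitHub-flavored Markdown, printing each path first.
# convert_docx_verbose is a hypothetical name; $1 is the top directory (default: .).
convert_docx_verbose() {
  # -print is evaluated before -exec, so the path appears before conversion starts.
  find "${1:-.}" -type f -name '*.docx' -print \
    -exec bash -c 'pandoc -f docx -t gfm "$1" -o "${1%.docx}.md"' - '{}' \;
}

# Usage: convert_docx_verbose .
```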

After the conversion, when you're ready to delete the original format:

find . -type f -name '*.docx' -delete
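
A more cautious cleanup might delete an original only when its converted counterpart exists and is not empty. This is a sketch, not part of the answer above: delete_converted is a hypothetical function name, and the check assumes the .md file sits next to the .docx file.

```shell
#!/bin/bash
# Remove each .docx file only if a non-empty .md with the same base name exists.
# delete_converted is a hypothetical name; $1 is the top directory (default: .).
delete_converted() {
  find "${1:-.}" -type f -name '*.docx' \
    -exec bash -c '[ -s "${1%.docx}.md" ] && rm -- "$1"' - '{}' \;
}

# Usage: delete_converted .
```

Files whose conversion failed or produced an empty result stay in place, so they can be retried.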
Licensed under: CC-BY-SA with attribution