Question

I am using a toolchain that converts Markdown to HTML5 with Pandoc, for insertion into WordPress's visual editor as HTML content.

When it comes to inserting images, WordPress puts what is called a shortcode of the form

[caption id="attachment_100" align="aligncenter" width="300" caption="This is an image caption"]

into the HTML text. This is not really Markdown, but Pandoc interprets it as such and translates each " ... " pair into a <q> ... </q> pair in the HTML output. The resulting shortcode does not work correctly in WordPress.

I need to prevent the conversion of the " ... " but only those which occur within the well-defined [caption ... ] square brackets which are exclusively put in by WordPress and cannot be confused with other content that I put in.

I do not know enough about the Pandoc API or Haskell to write an inline parser/filter to exempt this text fragment from Pandoc processing. The advice I have received on the pandoc mailing list has gone above my head so far, given my lack of acquaintance with Pandoc and Haskell.

I thought of writing a Perl filter but have been strongly dissuaded from using regexps for very good reason.

I am asking here to find out if there is a robust way to make the reverse substitution from <q> ... </q> tags to " ... " only for the text within the [caption ... ] block after it has been run through pandoc, as a post-processing step.

Can someone please suggest how I might go about this?

Many thanks.


Solution

Did you want something like this?

import Data.List (isPrefixOf)
import System.IO

main :: IO ()
main = do
   inh  <- openFile "input.txt"  ReadMode
   outh <- openFile "output.txt" WriteMode
   str <- hGetContents inh
   -- Writing the result forces the lazily read input before the handles are closed.
   hPutStrLn outh (outsideCaption str)
   hClose inh
   hClose outh

-- Copy text unchanged until a "[caption" shortcode begins.
outsideCaption :: String -> String
outsideCaption [] = []
outsideCaption str@(x:xs)
    | isPrefixOf "[caption" str = insideCaption str
    | otherwise                 = x : outsideCaption xs

-- Inside a shortcode: turn <q> and </q> back into ", and leave at the closing ].
insideCaption :: String -> String
insideCaption []       = []
insideCaption (']':xs) = ']' : outsideCaption xs
insideCaption str@(x:xs)
    | isPrefixOf "<q>"  str = '"' : insideCaption (drop 3 str)
    | isPrefixOf "</q>" str = '"' : insideCaption (drop 4 str)
    | otherwise             = x   : insideCaption xs

This piece of code reads a file named "input.txt", performs the substitution you described, and writes the result to "output.txt".

Replacing the current main with:

main = interact outsideCaption 

makes it read from stdin and write to stdout. For example:

[rothesay]Ygfijj: echo "testing <q> [caption<q></q>]" | ./test 
testing <q> [caption""] 
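
To see the effect on something closer to the shortcode from the question, here is a minimal self-contained sketch. The two functions are repeated so that it compiles on its own; the string named sample is a made-up stand-in for Pandoc's output (an assumption, not real output), with one quoted phrase outside the shortcode and two inside it.

```haskell
import Data.List (isPrefixOf)

-- Copy text unchanged until a "[caption" shortcode begins.
outsideCaption :: String -> String
outsideCaption [] = []
outsideCaption str@(x:xs)
    | "[caption" `isPrefixOf` str = insideCaption str
    | otherwise                   = x : outsideCaption xs

-- Inside a shortcode: turn <q> and </q> back into ", and leave at the closing ].
insideCaption :: String -> String
insideCaption []       = []
insideCaption (']':xs) = ']' : outsideCaption xs
insideCaption str@(x:xs)
    | "<q>"  `isPrefixOf` str = '"' : insideCaption (drop 3 str)
    | "</q>" `isPrefixOf` str = '"' : insideCaption (drop 4 str)
    | otherwise               = x   : insideCaption xs

-- Hypothetical sample input, invented for illustration only.
sample :: String
sample = "<p>He said <q>hi</q>.</p> [caption id=<q>attachment_100</q> width=<q>300</q>] done"

main :: IO ()
main = putStrLn (outsideCaption sample)
-- prints: <p>He said <q>hi</q>.</p> [caption id="attachment_100" width="300"] done
```

The <q> tags outside the shortcode are left alone and only those between [caption and the next ] are rewritten. One limitation to keep in mind: the first ] ends the shortcode, so a caption whose text itself contains a ] would stop being rewritten at that point.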
Licensed under: CC-BY-SA with attribution