Changing <q> and </q> tags to " pairs in specific places
Question
I am using a toolchain to convert Markdown to HTML5 with Pandoc, for insertion into WordPress's visual editor as HTML content.
When it comes to inserting images, WordPress puts what is called a shortcode
of the form
[caption id="attachment_100" align="aligncenter" width="300" caption="This is an image caption"]
into the HTML text. This is not really markdown but is so interpreted by Pandoc which translates each " ... "
pair into a <q> ... </q>
pair for HTML output. This does not work correctly in WordPress.
I need to prevent the conversion of the " ... "
but only those which occur within the well-defined [caption ... ]
square brackets which are exclusively put in by WordPress and cannot be confused with other content that I put in.
I do not know enough about the Pandoc API or Haskell to write an inline parser/filter to exempt this text fragment from Pandoc processing. The advice I have received on the pandoc mailing list has so far gone over my head, given my lack of acquaintance with Pandoc and Haskell.
I thought of writing a Perl filter but have been strongly dissuaded from using regexps for very good reason.
I am asking here to find out if there is a robust way to make the reverse substitution from <q> ... </q>
tags to " ... "
only for the text within the [caption ... ]
block after it has been run through pandoc, as a post-processing step.
Can someone please suggest how I might go about this?
Many thanks.
Solution
Did you want something like this?
import Data.List
import System.IO

main = do
    inh <- openFile "input.txt" ReadMode
    outh <- openFile "output.txt" WriteMode
    str <- hGetContents inh
    hPutStrLn outh (outsideCaption str)
    hClose inh
    hClose outh

outsideCaption :: String -> String
outsideCaption [] = []
outsideCaption str@(x:xs)
    | isPrefixOf "[caption" str = insideCaption str
    | otherwise                 = x : outsideCaption xs

insideCaption :: String -> String
insideCaption [] = []
insideCaption (']':xs) = ']' : outsideCaption xs
insideCaption str@(x:xs)
    | isPrefixOf "<q>" str  = '\"' : insideCaption (drop 3 str)
    | isPrefixOf "</q>" str = '\"' : insideCaption (drop 4 str)
    | otherwise             = x : insideCaption xs
This piece of code reads a file named "input.txt", performs the substitution you described, and writes the result to "output.txt".
Replacing the current main with:
main = interact outsideCaption
makes it read from stdin and write to stdout. For example:
[rothesay]Ygfijj: echo "testing <q> [caption<q></q>]" | ./test
testing <q> [caption""]
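For reference, here is a slightly more compact sketch of the same idea as a stdin-to-stdout filter, using stripPrefix from Data.List so the tag lengths are not hard-coded. The names fixCaptions and inCaption are only illustrative. It assumes, as the code above does, that no ']' appears inside the shortcode itself; a ']' in the caption text would end the substitution early.

```haskell
import Data.List (stripPrefix)

-- Outside a shortcode: copy text unchanged until "[caption" opens.
fixCaptions :: String -> String
fixCaptions [] = []
fixCaptions s@(c:cs)
    | Just rest <- stripPrefix "[caption" s = "[caption" ++ inCaption rest
    | otherwise                             = c : fixCaptions cs

-- Inside a shortcode: turn <q> and </q> back into '"' until the
-- first ']' (assumed to be the end of the shortcode).
inCaption :: String -> String
inCaption [] = []
inCaption (']':cs) = ']' : fixCaptions cs
inCaption s@(c:cs)
    | Just rest <- stripPrefix "<q>" s  = '"' : inCaption rest
    | Just rest <- stripPrefix "</q>" s = '"' : inCaption rest
    | otherwise                         = c : inCaption cs

main :: IO ()
main = interact fixCaptions
```

Since it reads stdin and writes stdout, it can go at the end of the conversion pipeline, after pandoc. Any <q> tags outside a [caption ...] block are left as they are.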