Converting ipython notebook files (JSON based) to other formats with pandoc

https://stackoverflow.com/questions/8782677

15-04-2021
|

Question

When attempting to use pandoc to convert JSON based files (.ipynb) from iPython notebook (0.12), I receive an error stating "bad decodeArgs" for the JSON. I suspect that it may be due to the Ubuntu provided version of pandoc that I am using (1.8.1.1). It seems that getting the latest pandoc version requires setting up the Haskell platform which I was not successful doing because of dependency challenges (and really don't want to). I don't want to spend any more time trying to install Haskell if this is not my problem.

Is there a way to get the latest pandoc binaries for Ubuntu without rebuilding it?

Given that iPython notebook is new (and very cool!!), it would be nice to hear about experiences related to translating the JSON to other formats. Perhaps there is a different way to accomplish this other than pandoc.

Solution

Regarding your "keeping up to date with Pandoc", I'm afraid you do need Haskell installed. The best way to do this via the Haskell Platform ("HP") package, and then just like with Ruby, it's a lot more consistent to use the environment's package manager for dependencies than your OS. I've had no trouble getting it working, even in Windoze. . .

I'm sure questions to the Haskell mailing list would result in quick help for a platform as mainstream as Debian/Ubuntu, but you might need to manually install a newer version of HP that what's available through the OS package manager.

Once you get HP up and running, the dev Pandoc is dead easy to compile, and git will keep you up to date with the latest - specific instructions here, currently maintained: https://github.com/jgm/pandoc/wiki/Installing-the-development-version-of-pandoc-1.9

Note that v1.9 has now been officially released if you really don't want to go to the trouble of keeping up to date with the dev cycle, but of course again you won't get it in your OS package manager for quite some time after that (I assume anyway).

========================== Regarding your attempts to treat JSON as a document syntax:

The best syntax inputs for Pandoc at this point are its native markdown+extensions, and reST (especially for Python people/environments), basically maintained as functionally equivalent, although there may be features available in the former that aren't represented in the latter, since John can just add extensions anytime he wants. AFAIK Pandoc hasn't begun to support the Sphinx extensions (yet?)

The JSON format used internally within Pandoc isn't documented (yet?) but it's the native Haskell data type. As the Thomas K notes, there may be some similarity between how the two tools represent data, but probably not enough to treat either as "just another markup format".

However, if you're working on this, it's easy enough to see what Pandoc looks for in the way of JSON input.

pandoc -t json

compare this to

pandoc -t native

and it's easy to see the specs created by Text.Pandoc.Definition and Text.JSON.Generic

Using Pandoc's internal data representation as input would obviously be more stable than a marked up text stream, and others have expressed a desire for documentation on this and it would be a great contribution to the community.

Please do inform the Pandoc mail list of any work done in this area. The crew there is very responsive, including getting quick feedback from John M (the lead developer) himself directly.

OTHER TIPS

I doubt pandoc or any other tool knows what to do with ipynb files yet (at the time of writing, the IPython notebook was released less than a month ago). JSON is just a generic data structure like XML, not a document format.

We're (IPython) working on tools to export notebooks to other formats, but they're not ready for a proper release yet. If you want to help develop that, see this mailing list thread. Hopefully it will be part of the next IPython release.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow