Library/service for extracting information from Microsoft OneNote documents

https://stackoverflow.com/questions/8252360

07-03-2021

Question

Is there a PHP/Ruby library or a web service that enables programmatic extraction of information from Microsoft OneNote documents?

The solution is to be implemented in a web application backend.

I am not looking for Windows-specific solutions, nor for solutions that require users to download application extensions or installable software.

Solution

Here's a cross-platform OneNote parser (.one -> .html). It's pretty primitive, but it's open source and may get you going, in case it helps you parse the file format:

https://github.com/dropbox/onenote-parser

Feel free to use it (Apache license).

OTHER TIPS

Easy solution

You could write your own extractor utility in C# using the Microsoft.Office.Interop.OneNote API.

You can find a detailed walkthrough in this MSDN article; you could then access the content with code similar to this:

using System;
using System.Linq;
using System.Xml.Linq;
using Microsoft.Office.Interop.OneNote;

class Program
{
  static void Main(string[] args)
  {
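    // Requires Windows with OneNote installed (COM interop).
    var onenoteApp = new Application();

    // Fetch the hierarchy of all notebooks, sections, and pages as XML.
    string notebookXml;
    onenoteApp.GetHierarchy(null, HierarchyScope.hsPages, out notebookXml);

    var doc = XDocument.Parse(notebookXml);
    var ns = doc.Root.Name.Namespace;
    // Find the first page named "Test page"; the cast is null-safe if "name" is missing.
    var pageNode = doc.Descendants(ns + "Page")
      .FirstOrDefault(n => (string)n.Attribute("name") == "Test page");
    if (pageNode != null)
    {
      // Fetch and print the full XML content of that page.
      string pageXml;
      onenoteApp.GetPageContent(pageNode.Attribute("ID").Value, out pageXml);
      Console.WriteLine(XDocument.Parse(pageXml));
    }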
  }
}

You can read the API documentation here, which also contains a few examples.

Low level approach

If your environment does not allow you to use this official library, I don't know of a Unix port. Many Office documents are stored as XML, but note that the native .one file is a binary format, so an XML parser alone only helps once the content is available as XML, for example the page XML returned by the API above. Here you have the OneNote format specification (there is a PDF link to the latest update at the top). You may then use the parser of your choice and write your own small utility. My suggestion for Ruby would be libxml.
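
As a rough illustration of the "parser of your choice" step, here is a minimal Ruby sketch. It assumes the page content is already available as XML (for instance the page XML that GetPageContent prints in the C# example above). It uses REXML, which ships with Ruby, rather than libxml, purely to avoid an extra dependency. The sample document, its namespace URI and the helper name extract_page_text are illustrative assumptions, not real exported data.

```ruby
require 'rexml/document'

# Hypothetical, heavily simplified sample of OneNote page XML.
# Real pages carry many more attributes and elements.
SAMPLE_PAGE_XML = <<~XML
  <one:Page xmlns:one="http://schemas.microsoft.com/office/onenote/2013/onenote" name="Test page">
    <one:Title>
      <one:OE><one:T><![CDATA[Test page]]></one:T></one:OE>
    </one:Title>
    <one:Outline>
      <one:OEChildren>
        <one:OE><one:T><![CDATA[First paragraph]]></one:T></one:OE>
        <one:OE><one:T><![CDATA[Second <span style='font-weight:bold'>paragraph</span>]]></one:T></one:OE>
      </one:OEChildren>
    </one:Outline>
  </one:Page>
XML

# Collect the text of every <T> element, in document order.
# Matching on the local name keeps the sketch independent of the namespace URI.
def extract_page_text(page_xml)
  doc = REXML::Document.new(page_xml)
  lines = []
  doc.each_recursive do |el|
    next unless el.name == 'T'
    raw = el.texts.map(&:value).join
    # Text runs may contain inline HTML markup; strip tags naively.
    lines << raw.gsub(/<[^>]+>/, '').strip
  end
  lines
end

puts extract_page_text(SAMPLE_PAGE_XML)
```

The regular expression that strips tags is deliberately crude; a real utility would probably hand that fragment to an HTML parser instead.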

I hope this suits your needs.

Your best bet is to learn how to do XML parsing in PHP/Ruby and analyse OneNote documents to figure out how they're structured. Once you have figured out the .one files, you can use PHP to extract the required information from them. Check this link out; it might help you.
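
One way to start that kind of structural analysis is to count which elements occur in a document. The following Ruby sketch assumes you already have the content as XML (the native .one file itself would first need to be exported or converted). The sample string, the placeholder namespace and the helper name element_histogram are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical miniature export; "urn:example" is a placeholder namespace.
SAMPLE_EXPORT = '<one:Section xmlns:one="urn:example">' \
                '<one:Page name="A"><one:T>x</one:T></one:Page>' \
                '<one:Page name="B"/>' \
                '</one:Section>'

# Count how often each element name (without prefix) occurs in the document.
def element_histogram(xml)
  counts = Hash.new(0)
  REXML::Document.new(xml).each_recursive { |el| counts[el.name] += 1 }
  counts
end

# Print the most frequent element names first.
element_histogram(SAMPLE_EXPORT)
  .sort_by { |name, n| [-n, name] }
  .each { |name, n| puts "#{name}: #{n}" }
```

Running this against a few real exports should show quickly which elements hold the text you care about.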

Licensed under: CC-BY-SA with attribution