Question

I have a large dump of data from an Outlook email account that comes entirely as .msg files. A quick run of Ubuntu's file command revealed that they are "Composite Document File V2 Document" files (whatever that means). I would really like to be able to read these files as plain text. Is that possible at all?

Update: It turns out it wasn't entirely possible to do what I wanted (large-scale data mining on these files), which was a bummer. In case you face the same issue, I made a library to address it: https://github.com/Slater-Victoroff/msgReader

The documentation isn't great, but it's a pretty small library, so it should be self-explanatory.


Solution

I faced the same problem this morning. I didn't find any information on the file format, but it was possible to extract the required information from the files using strings and grep:

strings -e l *.msg | grep pattern

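The -e l option (that's a lowercase L) tells strings to look for 16-bit little-endian characters, which is how the text in these .msg files is stored (UTF-16LE). Without it, strings only looks for 8-bit characters and misses text stored this way.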
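To run the same search over many files while keeping track of which message each match came from, the pipeline can be wrapped in a small loop. This is a minimal sketch: grep_msg is a hypothetical name, and it assumes GNU strings (binutils) and GNU grep, whose --label option makes grep -H print the given name for text read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: search the UTF-16LE text of one or more .msg files
# for a pattern, prefixing each matching line with the file it came from.
# Assumes GNU strings (binutils) and GNU grep (for --label).
grep_msg() {
    pattern=$1
    shift
    for f in "$@"; do
        # -e l: read 16-bit little-endian characters (how .msg stores text)
        # -n 4: minimum string length (the default, spelled out for clarity)
        strings -e l -n 4 "$f" | grep -H --label="$f" -e "$pattern"
    done
}
```

For example: grep_msg 'Subject' *.msg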

This only works if the data you need can be grepped out of the files, i.e. every line you need contains a fixed string or pattern.
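Because this approach depends on knowing a pattern in advance, it can help to look at what strings actually recovers first. A minimal sketch, assuming GNU strings: dump_msg_text is a hypothetical helper that writes each file's UTF-16LE strings to a .txt file next to it (overwriting any existing file with that name).

```shell
#!/bin/sh
# Hypothetical helper: write the UTF-16LE strings of each .msg file to a
# sidecar .txt file, so the text can be browsed before choosing a pattern.
# Assumes GNU strings (binutils).
dump_msg_text() {
    for f in "$@"; do
        # ${f%.msg} strips a trailing .msg extension: mail.msg -> mail.txt
        strings -e l "$f" > "${f%.msg}.txt"
    done
}
```

Run it as dump_msg_text *.msg and then browse or grep the .txt files. Note that strings ends a string at any non-printable character, including line breaks, so a multi-line message body comes out as several separate lines.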

Licensed under: CC-BY-SA with attribution