What's the easiest way to build an F# compiler that runs on the JVM and generates Java bytecode?

Question 1

Another option that should probably be considered is to convert the .NET CLR byte code into JVM byte-code like http://www.ikvm.net does with JVM > CLR byte codes. Although this approach has been considered and dismissed by the fjord owner.

Getting buy-in from the top with option 1) and have the F# compiler team have pluggable backends that could emit Java bytecode sounds in theory like it would produce the most polished solution.

But if you look at other languages that have been ported to different platforms this is rarely the case. Most of the time it's been a rewrite from scratch. But this is also likely due to the original language team having no interest in supporting alternative platforms themselves and that the original host implementation might've not been able to support multiple backends and it's already too slow for this to be a feasible option to start with.

My hunch is a combination of re-writing from scratch and being able to do as much code sharing and automation as possible from the original implementation. E.g. if the test suites could be re-used for both implementations it would take a lot of the burden off the JVM port and go a long way in ensuring language parity.

Question 2

If I really had to do this, I would probably start with the #1 approach - add JVM backend to the existing compiler. But I would also try to argue for a different target VM.

Quotations are not very relevant - as an author of WebSharper I can assure you that while quotations can give you a nice F#-like language to program with, they are restrictive, and not optimized. I imagine that for potential JVM F# users the bar would be a lot higher - full language compatibility and comparable performance. This is very hard.

Take tail calls, for example. In WebSharper we apply heuristics to optimize some local tail calls to loops in JavaScript, but that is not enough - you cannot in general rely on TCO, as you do in general F# libraries. This is ok for WebSharper as our users do not expect to have full F#, but will not be ok for a JVM F# port. I believe most JVM implementations do not do TCO, so it will have to be implemented with some indirection, introducing a performance hit.

An bytecode re-compilation approach mentioned by @mythz sounds very attractive as it allows more than just porting F# - ideally it allows porting more .NET software to the JVM. I worked quite a bit with .NET bytecode analysis on an internal WebSharper 3.0 project - we are looking at the option of compiling .NET bytecode instead of F# quotations to JavaScript. But there are huge challenges there:

A lot of code in BCL is opaque (native) - and you cannot decompile it
The generics model is fairly complicated. I have implemented a JavaScript runtime that models class and method generics, instantiation, type generation, and basic reflection with some precision and reasonable performance. This was difficult enough in dynamic JavaScript with closures and is seems quite difficult to do in a performant way on the JVM - but maybe I just do not see a simple solution.
Value types create significant complications in the bytecode. I am yet to figure this one out for WebSharper 3.0. They cannot be ignored either, as they are used extensively by many libraries you would want ported.
Similarly, basic reflection is used in many real-world .NET libraries - and it is a nightmare to cross-compile in terms of both lots of native code and proper support for generics and value types.

Also, the bytecode approach does not remove the question on how to implement tail calls. AFAIK, Scala does not implement tailcalls. They have certainly the talent and the funding to do that - the fact that they do not, tells me a lot about how practical it is to do TCO on the JVM. For our .NET->JavaScript port I will probably go a similar route - no TCO guarantees unless you specifically ask for trampolining which will work but cost you an order of magnitude (or two) in performance.

Question 3

There is a project that compiles OCaml to the JVM, OCaml-Java: it's pretty complete and in particular can compile the OCaml's compiler (written in OCaml) sources. I'm not sure which aspects of the F# language you're interested in, but if you're mainly looking at getting a mature strict typed functional language to the JVM, that may be a good option.

Question 4

I suspect any approach would be a lot of work, but I think your first suggestion is the only one that would avoid introducing lots of additional incompatibilities and bugs. The compiler's pretty complex and there are a lot of corner cases around overload resolution, etc. (and the spec probably has gaps too), so it seems very unlikely that a new implementation would have consistently compatible semantics.

Question 5

Modify the existing F# compiler to emit Java bytecode and then compile the F# compiler with it? Use a JVM based ML compiler like Yeti to Bootstrap a minimal F# compiler on the JVM?

Porting the compiler shouldn't be that hard if it is written in F#.

I'd probably go the first way, because this is the only way one could hope to keep the new compiler in sync with the .net F# compiler.

Re-write the F# compiler from scratch in Java as the fjord project appears to be attempting?

This is certainly the least elegant approach, IMHO.

Something else?

When the compiler is done, you'll have 90% of the work left to do.

For example, not knowing much F#, but I assume it is easy to use any .NET libraries out there. That means, the basic problem is to port the .NET ecosystem, somehow.

Question 6

I was looking for something in similar lines, though it was more like a F# to Akka translator/compiler. As far as F# -> JVM is concerned, I came across two not quite production ready options:

  1. F# -> [Fjord][1] -> JVM.

  2. F# -> [Funscript][2] -> [Vert.X][3] -> JVM