The JVM supports many languages other than Java, such as Groovy, Clojure, and Scala, which are functional languages, unlike Java (I am referring to Java before version 8, where lambdas are not supported), which doesn’t support functional capabilities. At a high level, what makes the JVM so versatile that it can support both object-oriented and functional languages?
Compared to other VMs, the JVM actually isn’t particularly versatile. It directly supports statically typed OO. For everything else, you have to see what parts you can use, and how you can build everything else your language needs on top of those parts.
For example, until Java 7 introduced the invokedynamic bytecode, it was very hard to implement a dynamically typed OO language on the JVM: you had to use complex workarounds that were bad for performance and resulted in horribly bloated stack traces.
And yet, a bunch of dynamic languages (Groovy, Jython, JRuby among others) were implemented on the JVM before that.
Not because the JVM is so versatile, but because it is so widespread, and because it has very mature, well-supported and high-performing implementations.
And, perhaps even more important, because there is a huge amount of Java code out there doing pretty much anything, and if your language runs on the JVM, you can easily offer facilities to integrate with that code. Basically, having your language run on the JVM is the 21st century version of offering interoperability with C.
The JVM was written to act basically like a CPU: there is a set of instructions, somewhat like assembly, called bytecodes, and the VM runs them. If you can write a compiler that generates a valid set of bytecodes, then the JVM can run them.
Wikipedia has a list of the bytecodes:
http://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
as well as an explanation of how the JVM loads the bytecodes:
http://en.wikipedia.org/wiki/Java_virtual_machine
By using the invoke-style bytecodes, a functional language can execute code regardless of what the source looks like. Also, with the addition of invokedynamic, language implementations like JRuby have been given some flexibility in how they run.
I’ll add that the JVM supports a well-defined and pretty decent memory model (the JMM), which means good support for consistent (albeit low-level) threading behaviour. It also has a powerful Just-In-Time compiler (now more useful for dynamic languages thanks to MethodHandles and invokedynamic).
Last but not least is the JVM’s Garbage Collection sub-system which (with the right tuning) manages memory for you regardless of the language on top.
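As a rough illustration of the MethodHandles mentioned above, here is a minimal sketch of resolving and calling a method by name at run time, which is the kind of late-bound call a dynamic language needs. The class DynamicCallSketch and the helper callIntMethod are hypothetical names made up for this example. Note that Java source has no syntax for an invokedynamic call site; compilers emit that instruction, so this only shows the MethodHandle building block, not how any particular language implementation works.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;

public class DynamicCallSketch {

    // Hypothetical helper: resolve a public no-argument method returning int
    // by name on the receiver's runtime class, then call it via a MethodHandle.
    static int callIntMethod(Object receiver, String name) {
        try {
            MethodHandle handle = MethodHandles.publicLookup().findVirtual(
                    receiver.getClass(), name, MethodType.methodType(int.class));
            // invoke (unlike invokeExact) adapts the Object argument to the
            // receiver type expected by the handle.
            return (int) handle.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException("dynamic call failed: " + name, t);
        }
    }

    public static void main(String[] args) {
        // The method name is only known as a string at run time.
        System.out.println(callIntMethod("hello", "length"));
        System.out.println(callIntMethod(new ArrayList<>(Arrays.asList(1, 2, 3)), "size"));
    }
}
```

The receiver classes used here (String and ArrayList) are public, which is why publicLookup() is enough; a language runtime would typically also cache the resolved handle rather than look it up on every call.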
The key element in this is the separation of the compilation phase from the execution phase. This makes it possible to write other compilers that compile other languages to bytecode.
Bytecode acts much like the machine code of a CPU: you have all the little operations needed to run a program. You can get a variable, do math on it, have conditional operations, etc.
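To make the "little operations" idea concrete, here is a toy stack machine, sketched under loud assumptions: the opcodes (PUSH, LOAD, ADD, MUL, JUMP_IF_ZERO, HALT) and the class ToyStackMachine are invented for this example and are not real JVM bytecodes. It only shows the general shape of loading a variable, doing math on it, and branching.

```java
public class ToyStackMachine {

    // Hypothetical opcodes for this sketch; they are NOT real JVM bytecodes.
    static final int PUSH = 0, LOAD = 1, ADD = 2, MUL = 3, JUMP_IF_ZERO = 4, HALT = 5;

    // Program: if locals[0] == 0, result is 42; otherwise locals[0] * 2 + 1.
    static final int[] PROGRAM = {
        LOAD, 0,            // index 0: push locals[0]
        JUMP_IF_ZERO, 13,   // index 2: pop; if zero, jump to index 13
        LOAD, 0,            // index 4: push locals[0] again
        PUSH, 2,            // index 6
        MUL,                // index 8
        PUSH, 1,            // index 9
        ADD,                // index 11
        HALT,               // index 12
        PUSH, 42,           // index 13
        HALT                // index 15
    };

    // Executes the code and returns the value on top of the stack at HALT.
    static int run(int[] code, int[] locals) {
        int[] stack = new int[16];
        int sp = 0; // next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case LOAD: stack[sp++] = locals[code[pc++]]; break;
                case ADD: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; break; }
                case MUL: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a * b; break; }
                case JUMP_IF_ZERO: { int target = code[pc++]; if (stack[--sp] == 0) pc = target; break; }
                case HALT: return stack[sp - 1];
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(PROGRAM, new int[] {5})); // 5 * 2 + 1 = 11
        System.out.println(run(PROGRAM, new int[] {0})); // takes the branch: 42
    }
}
```

Any compiler that can emit such an int array could target this machine, whatever its source language looks like; the same reasoning applies, at a much larger scale, to compilers that emit JVM bytecode.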
Java also isn’t special. For the JVM, unlike for other VMs, the existence of multiple languages wasn’t even a design goal. For Microsoft’s .NET CIL, the ability to run multiple languages (C#, VB.NET, …) was a key design element, and the Parrot VM from the Perl 6 project also aimed to be a generic VM.
For the fun of it I once created a proof that even PHP’s Zend Engine would allow that.
And frankly this isn’t anything new: even on real hardware you can run multiple languages, e.g. C or Fortran.
The contrast to this separation of compilation and execution is the classic interpreter, like some forms of Basic, shell scripts, etc. These often execute code more or less line by line, without translating it into an intermediate form in between.
The JVM is the first virtual machine I’m aware of which combined garbage collection, performance, and a workable sandbox model. The emergence of many languages targeting the JVM is probably not so much a result of its “versatility”, but rather of the fact that the Java language lacks some significant features that people want in a programming language.
For example, while most machine languages have only half a dozen or so data types (e.g. byte, halfword, word, double-word, single-precision float, and double-precision float), the vast majority of programming languages allow code to use an arbitrary number of user-defined data types. The JVM recognizes a few primitive types similar to those on a typical machine, plus one more type: the Promiscuous Object Reference. The Java language likewise recognizes those primitives and Promiscuous Object References. While a variable may be constrained not to hold references to anything that isn’t of a particular class, the language makes no distinction between any of the following kinds of field of type List<String> that might be held by an instance MyThing of class MyClass:
- A reference to something code knows to be an immutable implementation of List<String>.
- A reference to an instance of a mutable list type which will never be exposed to anything that might mutate it.
- A reference to a mutable list to which, except during the execution of MyThing’s methods, no other reference could possibly exist anywhere in the universe.
- A reference to a mutable list which is owned by some other object, and which that other object would like MyThing to use in some fashion.
- A reference to a mutable list which MyThing owns, but which it has also exposed to some other objects so they may do something with it.
Even though all of those fields could have type List<String>, they hold very different things. An expressive language might allow a distinction among those meanings, but Java does not. Since a language could attach meaning to such things (at least outside generic contexts) and run on the JVM, that leaves a lot of room for JVM-targeted languages to express concepts which Java cannot.
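A small sketch of the point about List<String>, with hypothetical names throughout (the class ListFieldSketch and its fields frozen and shared are invented for this example): two fields share one declared type, yet one holds a private snapshot that never changes and the other holds a list owned by the caller. The compiler sees no difference; only run-time behaviour does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListFieldSketch {

    // Same declared type, very different intent (expressed only in comments).
    private final List<String> frozen; // private unmodifiable snapshot
    private final List<String> shared; // still owned and mutated by the caller

    ListFieldSketch(List<String> callerOwned) {
        this.frozen = Collections.unmodifiableList(new ArrayList<>(callerOwned));
        this.shared = callerOwned;
    }

    int frozenSize() { return frozen.size(); }

    int sharedSize() { return shared.size(); }

    // Both calls compile identically; only one succeeds at run time.
    boolean tryAddToFrozen(String s) { return tryAdd(frozen, s); }

    boolean tryAddToShared(String s) { return tryAdd(shared, s); }

    private static boolean tryAdd(List<String> list, String s) {
        try {
            return list.add(s);
        } catch (UnsupportedOperationException e) {
            return false; // the type system gave no warning about this
        }
    }

    public static void main(String[] args) {
        List<String> owned = new ArrayList<>();
        owned.add("a");
        ListFieldSketch thing = new ListFieldSketch(owned);
        owned.add("b"); // the caller mutates its own list afterwards

        System.out.println(thing.frozenSize());        // 1: snapshot unaffected
        System.out.println(thing.sharedSize());        // 2: change is visible
        System.out.println(thing.tryAddToFrozen("c")); // false
        System.out.println(thing.tryAddToShared("c")); // true
    }
}
```

A language that tracked immutability or ownership in its type system could reject the failing call at compile time while still compiling down to the same JVM reference type.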