Is it really necessary to differentiate between code and data?
Is there any language where no such differentiation exists?
No, it’s definitely not necessary. The von Neumann architecture, which practically all computers today are based upon, pioneered the idea of a stored program, which is nothing more than code stored as data.
That’s at the hardware level though; at the software level, paradigms differ as to how closely data and code are intermingled (or, conversely, how sharply they need to be divided).
On one end of the spectrum, there’s imperative programming, especially the procedural flavors – think Pascal, C, FORTRAN, BASIC (actual BASIC, not VB.NET or anything like that). In these languages, data and code are conceptually separate; you can take pointers to functions and pass them around, but other than that, code is code, and data is data. If you want to treat data as code, you have to pull quite a few tricks, e.g. injecting some assembler code to jump to an address that contains your data.
On the other end, you’ll find functional programming, where functions are data (in a super-hardcore FP idiom, functions are even the only kind of data you can have); but also logic programming (think Prolog), where the entire program is mostly just a data structure describing facts about the problem domain. And of course there’s the dynamic camp, which gives you techniques like monkey-patching, duck typing, introspection / reflection, etc., to have your code inspect and modify itself at run time. Lisp (any Lisp, really) is a particularly interesting candidate here, because it combines functional programming and dynamic features.
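To make the "dynamic camp" point concrete, here is a minimal Python sketch of introspection and monkey-patching. The `Greeter` class and the `shout` function are made up for illustration; nothing here comes from a particular library beyond the standard `inspect` module.

```python
import inspect


class Greeter:
    def greet(self):
        return "Hello"


def shout(self):
    # Replacement behaviour, defined as an ordinary function value.
    return "HELLO!"


g = Greeter()
before = g.greet()

# Introspection: the running program asks which functions the class defines.
method_names = [name for name, _ in inspect.getmembers(Greeter, inspect.isfunction)]

# Monkey-patching: rebind the method on the class at run time.
Greeter.greet = shout
after = g.greet()

print(before, after, method_names)
```

The existing instance `g` picks up the new behaviour because the method is looked up on the class at call time, which is exactly the kind of code-as-mutable-data the paragraph describes.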
In between, there’s object-oriented programming; this idiom mainly inherits the imperative mindset of separate data and code, but at its core, it departs from it a bit by bundling related behavior (code) and state (data) into units (objects). Fast-forward to the point where the idiom has fully unfolded, and you see “design patterns” emerge, many of which pretty much replicate code-as-data features found in other idioms.
All that doesn’t really answer your question though: should code and data be separated? Well, the fact that even the most hardcore code-data-separators occasionally use code generators, lookup tables, and similar techniques that mix them up should tell you that being religious about such a separation is ridiculous. There are quite a few advantages to treating them as separate realms, and often, it is the most straightforward approach. Often, but not always. In many cases, letting go of the separation can lead to cleaner code, less cruft, and more elegant solutions.
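A lookup table is probably the most common way the two get mixed in everyday code. The following is just a sketch in Python, with a hypothetical `OPERATIONS` table and `calculate` helper: the behaviour lives in a dictionary, so adding an operator means adding data rather than another branch.

```python
import operator

# The table is plain data, but its values are code (function objects).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def calculate(symbol, left, right):
    """Look the operation up in the table and apply it."""
    try:
        operation = OPERATIONS[symbol]
    except KeyError:
        raise ValueError(f"unsupported operator: {symbol!r}") from None
    return operation(left, right)


print(calculate("+", 2, 3))
print(calculate("*", 4, 5))
```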
Take this example:
interface Fooable {
    function run();
}
function foo(Fooable f) {
    f.run();
}
// ...
myF = new Fooable() { function run() { print "Hello, world!"; } }
foo(myF);
vs.:
function foo(f) {
    f();
}
myF = function () { print "Hello, world!"; }
foo(myF);
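The two variants above are pseudocode. For anyone who wants to run the comparison, here is one possible rendering in Python. The names `HelloFooable`, `foo_with_object`, and `foo_with_function` are mine, and the functions return the greeting instead of printing it inside `run` so that the result is easy to check.

```python
from abc import ABC, abstractmethod


class Fooable(ABC):
    """Single-method interface, as in the first variant."""

    @abstractmethod
    def run(self) -> str: ...


class HelloFooable(Fooable):
    def run(self) -> str:
        return "Hello, world!"


def foo_with_object(f: Fooable) -> str:
    # Behaviour is wrapped in an object that implements the interface.
    return f.run()


def foo_with_function(f) -> str:
    # Behaviour is passed directly as a value.
    return f()


via_object = foo_with_object(HelloFooable())
via_function = foo_with_function(lambda: "Hello, world!")
print(via_object)
print(via_function)
```

Both calls produce the same greeting; the second simply needs no interface and no class to carry one function around.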
In any case, I consider it important to keep in mind that a separation of code and data is just one way to approach a problem, and that at the machine level, the distinction is questionable at best.
In Lisp dialects, the code and data are both represented as lists, and this facilitates manipulation of code as data, because it is data. But it is also useful to keep in mind the distinction: if you try to run some list that isn’t code, you will get an error, rather than useful output. Sometimes it is more useful to think of them as the same thing, sometimes it is more useful to think of them as different.
For example, a list of the integers from 1 to 5 would look like this in Lisp: (1 2 3 4 5). If you try to run it, you’ll get an error that complains that the number 1 is not a function. You could think of that list as a piece of broken code, but it would be more useful to think of it as data, not code. If you wrote (+ 2 3 4 5), then that is a function call of the ‘+’ function on the arguments 2, 3, 4, and 5. If you wrote it like this: '(+ 2 3 4 5) (note the single quote at the front), then it would be a piece of code that’s treated like data, and it would not be run just yet (although you could run it later).
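For readers without a Lisp at hand, the same idea can be imitated in Python with nested lists and a toy evaluator. This is only an analogy, not how any real Lisp is implemented; `FUNCTIONS` and `evaluate` are hypothetical names, and only `+` and `*` are supported.

```python
import operator
from functools import reduce

# Symbols the toy evaluator knows how to call.
FUNCTIONS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Treat a list as a call: the first element names the function."""
    if not isinstance(expr, list):
        return expr  # atoms such as numbers evaluate to themselves
    if not expr:
        raise TypeError("cannot evaluate an empty list")
    head, *args = expr
    if not isinstance(head, str) or head not in FUNCTIONS:
        raise TypeError(f"{head!r} is not a function")
    return reduce(FUNCTIONS[head], [evaluate(arg) for arg in args])


quoted = ["+", 2, 3, 4, 5]   # like '(+ 2 3 4 5): just a list, nothing has run
result = evaluate(quoted)    # running it later, like (eval quoted)
print(result)

try:
    evaluate([1, 2, 3, 4, 5])    # like running (1 2 3 4 5)
except TypeError as error:
    print(error)
```

The list `quoted` can be inspected, copied, or modified like any other data before it is handed to `evaluate`, which is the point the quote character makes in Lisp.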
The real issue that needs to be considered is “integrity of data”. Segregating code & data still leaves one open to corrupted data via buffer overruns.
Segregating code & data is a solution at one end of a spectrum, and very appropriate for languages (such as C) that enable indiscriminate changes to data (and, by extension, code if not segregated from the data). It’s an extreme and necessary solution to a fundamental problem in such languages. Please note that I’m in no way suggesting that C is flawed – it was designed with such a feature for the “low level” work for which it was intended. It was originally designed to be somewhere between assembly language and “high level languages”.
Yes. Properly segregating code and data is very important.
Code is written beforehand and executed, generally at a fairly high level of trust. Data comes from external sources, and not necessarily from a source that ought to be trusted. And the most destructive of hacks (such as buffer overflows and SQL injections) come from an untrustworthy user finding a way to cause the computer to treat data as code.
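SQL injection is the easiest of these to show in a few lines. Below is a small sketch using Python's standard `sqlite3` module with an in-memory database; the `users` table, its contents, and the helper names `find_unsafe` and `find_safe` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_unsafe(name):
    # The input is spliced into the SQL text, so the database parses it as code.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # The placeholder keeps the input a value; it is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "nobody' OR '1'='1"
leaked = find_unsafe(malicious)   # the OR clause matches every row
nothing = find_safe(malicious)    # no user has that literal name

print(len(leaked), len(nothing))
```

The parameterized version is the "proper segregation" in practice: the query text is fixed code, and whatever the user typed stays data.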
Some programming languages, mostly those influenced by Lisp, as World Engineer noted, like to conflate the two. This is almost like the language designer saying “here, you don’t need to write your own security holes; we’ve given you a free arbitrary code execution vulnerability as a fundamental language feature!”
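The risk described here shows up whenever untrusted text reaches a general-purpose evaluator, in any language that has one. As a rough illustration in Python (used only as a stand-in, with a deliberately harmless payload), compare `eval` with the standard library's `ast.literal_eval`, which accepts literals only:

```python
import ast

# Pretend this string arrived from an untrusted user.
untrusted = "__import__('os').getcwd()"

# eval() treats the text as code and runs it; a real attacker
# would pick something far less harmless than getcwd().
ran = eval(untrusted)

# ast.literal_eval() accepts only literals, so the call is rejected.
try:
    ast.literal_eval(untrusted)
    rejected = False
except ValueError:
    rejected = True

# Plain data still parses fine.
parsed = ast.literal_eval("[1, 2, 3]")

print(type(ran).__name__, rejected, parsed)
```

Whether that makes the feature itself a vulnerability is the matter of opinion argued in this thread; the sketch only shows that the danger lies in which inputs are allowed to reach the evaluator.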