Background
The Wikipedia page on Syntactic Sugar states:
In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language “sweeter” for humans to use: things can be expressed more clearly, more concisely, or in an alternative style that some may prefer.
I don’t really understand what the difference is between Syntactic Sugar and Syntax.
I appreciate the point that the sugary version can be clearer and more concise, and perhaps boil off some boilerplate. But I feel that, at some level, all syntax is essentially doing that: forming an abstraction over what the code gets compiled down to.
From the same Wikipedia page:
Language processors, including compilers, static analyzers, and the like, often expand sugared constructs into more fundamental constructs before processing, a process sometimes called “desugaring”.
As a thought exercise, if I take “often” in this statement to mean “always”: if the difference were just whether the compiler “desugars” the syntax before moving to the next stage, how would a coder who does not know the innards of the compiler know (or care) which syntax is sugared and which is not?
A very much related question on this site “Rigorous Definition of Syntactic Sugar?” has an answer which commences:
IMHO I don’t think you can have a definition for syntactic sugar, because the phrase is BS and is likely to be used by people that talk about “real programmers” using “real tools” on “real operating systems”
Which might indicate to me that perhaps there isn’t a huge difference to the coder using the language?
Perhaps the difference is only perceptible to the compiler writer? Though there may be instances in which it’s helpful for the coder using the language to know what’s under the hood of the Syntactic Sugar? (But perhaps in reality any discourse on the subject tends to use the term as flame bait?)
The heart of the question
So… the short version of the question:
- Is there a real difference between Syntax and Syntactic Sugar?
- Who does it matter to?
Extra Food for Thought
Bonus on topic contradiction:
On the Wikipedia page an example is given:
For instance, in the C language the a[i] notation is syntactic sugar for *(a + i)
Whereas another answer on the above linked question talks about the same example:
Now consider a[i] == *(a + i). Think about any C program that uses arrays in any substantive way.
And summarizes that:
The [] notation facilitates this abstraction. It’s not syntactic sugar.
The opposite conclusion for the same example!
The main difference is that syntax is the grammar defined in a language to let you expose some functionality. Once you can already reach that functionality, any other syntax that lets you do the same thing is considered sugar. That of course runs into odd scenarios about which of the two syntaxes is the sugar, especially since it’s not always clear which came first.
In practice, syntactic sugar is only used to describe syntax added to a language to facilitate ease of use, like making infix lhs + rhs map to lhs.Add(rhs). I would consider C’s array indexing to be syntactic sugar.
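For a concrete feel of that kind of mapping, here is a minimal C++ sketch (the Money type and its member are made-up names; in C++ the infix form resolves to an operator function rather than a method literally named Add, but the idea is the same): the “sweet” infix spelling and the explicit call it stands for do exactly the same thing.

#include <iostream>

struct Money {
    int cents;
    Money operator+(const Money& rhs) const { return Money{cents + rhs.cents}; }
};

int main() {
    Money a{150}, b{75};
    Money c = a + b;           // the "sweet" infix form
    Money d = a.operator+(b);  // the call it stands for, written out explicitly
    std::cout << c.cents << " " << d.cents << "\n";  // prints "225 225"
}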
It matters mostly because elegant designs tend to limit the amount of duplication. Needing (or at least wanting) syntactic sugar is seen by some as a sign of a design failing.
Syntax is what a language processor uses to understand what the constructs of a language mean. Constructs that are deemed to be syntactic sugar also have to be interpreted by the language processor, and thus are part of a language’s syntax.
What sets syntactic sugar apart from the rest of the syntax of a language is that it would be possible to remove the syntactic sugar from the language without affecting the programs that can be written in the language.
To give a more formal definition, I would say:
Syntactic sugar comprises those parts of a language’s syntax whose effects are defined in terms of other syntactic constructs in the language.
This is in no way meant to denigrate syntactic sugar, or the languages that have it, because the use of syntactic sugar often leads to programs whose intent is more understandable.
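As one small illustration of that definition (a minimal C/C++ sketch, and a case that also comes up in a later answer): the standard specifies the effect of a compound assignment E1 op= E2 as that of E1 = E1 op (E2), with E1 evaluated only once, i.e. the sugared form is defined in terms of constructs the language already has.

#include <iostream>

int main() {
    int total = 10;
    total += 5;         // the sugared form
    total = total + 5;  // the more fundamental form it is defined in terms of
    std::cout << total << "\n";  // prints 20
}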
The other answers haven’t mentioned a key concept: abstract syntax; without it, the term “syntactic sugar” doesn’t make any sense.
Abstract syntax defines the elements and structure of languages, and how phrases of that language can be combined to build bigger phrases. Abstract syntax is independent of concrete syntax. The term “syntactic sugar”, as I understand it, refers to concrete syntax.
In general, when designing a language, you’ll want to create concrete syntax for each term of your abstract syntax, so that people can write code in your language using plain text.
Now let’s say you create an awkward concrete syntax for foo. Users complain, and you implement a new concrete syntax to represent the same abstract syntax. The result is that your abstract syntax and semantics haven’t changed, but you now have two concrete syntaxes for the same abstract syntax term.
This, I believe, is what people mean when they say “syntactic sugar” — changes which only affect concrete syntax, but which do not affect abstract syntax or semantics.
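A concrete case of this in standard C++ (a minimal sketch): the language defines and as an alternative token for &&, so the two spellings below are different concrete syntax for the very same abstract syntax node.

#include <iostream>

int main() {
    bool p = true, q = false;
    bool x = p && q;   // one concrete spelling
    bool y = p and q;  // an alternative token for the same abstract syntax
    std::cout << x << " " << y << "\n";  // prints "0 0"
}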
And so the difference between “syntactic sugar” and “concrete syntax” is now clear. To me. 🙂
This interpretation also helps explain what Alan Perlis might have meant when he said “syntactic sugar causes cancer of the semicolon”: all the concrete syntactic sugar in the world can’t fix weak abstract syntax, and all the effort you expend adding that sugar is effort you aren’t spending dealing with the real problem — the abstract syntax.
I should also note that this is solely my opinion; I only believe it because it’s the only interpretation I can think of that makes sense to me.
Syntactic sugar is a subset of the language’s syntax. The basic idea is that there is more than one way to say the same thing.
What makes it difficult to say which pieces are syntactic sugar and which are “pure syntax” are observations like “it’s hard to say which form came first”, “it’s hard to know which way the author of the language intended”, and “it’s somewhat arbitrary to decide which form is simpler”.
What makes it easy to decide which pieces are pure and which are sugary is to ask the question within the frame of a specific compiler or interpreter. The pure syntax is the stuff that the compiler converts directly to machine code, or that the interpreter directly responds to. The sugar is the stuff that first gets turned into other syntax before those direct things happen. Depending on the implementation, this may or may not be the same as what the author intended or even what the language spec claims.
In practice, this is the way that the reality of the matter is decided.
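To make that framing concrete, here is a toy desugaring pass as a minimal C++ sketch (the node names and the AddAssign form are made up, not taken from any real compiler): the sugared node is rewritten into nodes the rest of the pipeline already handles, so everything after this step only ever sees the “pure” forms.

#include <iostream>
#include <memory>
#include <string>

// A toy expression AST. Kinds: Var, Num, Add, Assign, AddAssign (the sugar).
struct Expr {
    std::string kind;
    std::string name;               // used by Var
    int value = 0;                  // used by Num
    std::shared_ptr<Expr> lhs, rhs; // used by the binary kinds
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr node(std::string kind, ExprPtr lhs = nullptr, ExprPtr rhs = nullptr) {
    auto e = std::make_shared<Expr>();
    e->kind = std::move(kind);
    e->lhs = lhs;
    e->rhs = rhs;
    return e;
}

// The desugaring step: rewrite "x += e" into "x = x + e".
// Later stages (type checking, code generation) never see AddAssign at all.
ExprPtr desugar(ExprPtr e) {
    if (!e) return e;
    e->lhs = desugar(e->lhs);
    e->rhs = desugar(e->rhs);
    if (e->kind == "AddAssign")
        return node("Assign", e->lhs, node("Add", e->lhs, e->rhs));
    return e;
}

void print(const ExprPtr& e) {
    if (!e) return;
    if (e->kind == "Var") { std::cout << e->name; return; }
    if (e->kind == "Num") { std::cout << e->value; return; }
    std::cout << "(" << e->kind << " ";
    print(e->lhs);
    std::cout << " ";
    print(e->rhs);
    std::cout << ")";
}

int main() {
    auto x = node("Var");
    x->name = "x";
    auto one = node("Num");
    one->value = 1;

    auto sugared = node("AddAssign", x, one);  // represents: x += 1
    print(desugar(sugared));                   // prints: (Assign x (Add x 1))
    std::cout << "\n";
}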
Really, your first quote from Wikipedia says it all: “…makes things easier to read…”, “…sweeter for humans to use…”.
In writing, shortened forms such as “don’t” or “haven’t” could be considered syntactic sugar.
Usually, syntactic sugar is the part of a language that can be expressed using an existing part of the language (its syntax) without loss of generality, but possibly with some loss of clarity. Sometimes compilers have an explicit desugaring step that transforms the AST created from the source code, applying simple rules to remove the nodes corresponding to sugar.
For example, Haskell has syntactic sugar for monads, with the following rules applied recursively:
do { f } ~> f
do { g; h } ~> g >> do h
do { x <- f; h } ~> f >>= \x -> do h
do { let x = f; h } ~> let x = f in do h
Right now it doesn’t matter exactly what this means – but you can see that the special syntax on the LHS can be transformed into something more basic on the RHS (namely function applications, lambdas and lets). This step allows us to keep the best of both worlds:
- The syntax on the LHS is easier for the programmer (syntactic sugar), expressing existing ideas in a clearer manner
- However, since support for the RHS constructs already exists in the compiler, it doesn’t need to treat the sugar as anything special outside the parser and the desugaring step (except for error reporting).
Similarly, in C you can imagine desugaring rewrite rules (due to operator overloading etc. this is not true for C++):
f->h ~> (*f).h
f[h] ~> *(f + h)
You could imagine writing, without using -> or [], all the C programs that use these constructs today. However, it would be harder for programmers, hence the syntactic sugar was provided (I guess in the ’70s it might have simplified the compilers’ work too). The distinction is also less clear-cut than it looks, as you could technically add the following, perfectly valid, rewrite rule:
*f ~> f[0] -- * and [] have the same expressiveness
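To see these rules in actual code, here is a minimal sketch (written as C++; the Point struct is just a made-up example with no operator overloading involved) where each sugared access sits next to the form it rewrites to.

#include <iostream>

struct Point { int x, y; };

int main() {
    Point pts[2] = { {1, 2}, {3, 4} };
    Point* p = pts;

    int a = p[1].y;        // sugared indexing
    int b = (*(p + 1)).y;  // what the f[h] rule rewrites it to
    int c = p->x;          // sugared member access through a pointer
    int d = (*p).x;        // what the f->h rule rewrites it to

    std::cout << a << " " << b << " " << c << " " << d << "\n";  // prints "4 4 1 1"
}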
Is syntactic sugar bad? Not necessarily – there is a danger that it will be used in cargo-cult fashion, without understanding its deeper meaning. For example, the following functions are equivalent in Haskell, yet many beginners would write the first form without realizing that they are overusing syntactic sugar:
f1 = do x <- doSomething
        return x
f2 = doSomething
In addition, syntactic sugar might overcomplicate the language, or be too narrow to allow generalized idiomatic code. It might also mean that the language is not powerful enough to do certain things easily – whether by design (don’t give developers sharp tools, or a very specific niche language where adding a more powerful construct would hurt other goals) or by omission – and the latter case is what gave syntactic sugar its bad name. If the language is powerful enough to express the same thing with other constructs, without adding syntactic sugar, it is considered more elegant to use those.
I think the most obvious example would be the “+=” syntax in C.
i = i + 1;
and
i += 1;
do exactly the same thing and compile to exactly the same set of machine instructions. The second form saves a couple of characters of typing but, more importantly, makes it very clear that you are modifying a value based on its current value.
I was going to cite the “++” postfix/prefix operator as the canonical example, but realized that it is more than syntactic sugar. There is no way to express the difference between ++i and i++ in a single expression using the i = i + 1 syntax.
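A quick sketch of that difference (plain C++/C): the postfix form yields the operand’s old value while still incrementing it, which a bare i = i + 1 assignment cannot express in a single expression.

#include <iostream>

int main() {
    int i = 5;
    int pre = ++i;   // increments first, then yields the new value: pre == 6, i == 6
    int j = 5;
    int post = j++;  // yields the old value, then increments: post == 5, j == 6
    std::cout << pre << " " << i << " " << post << " " << j << "\n";  // prints "6 6 5 6"
}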
Whatever the original connotation of the phrase was, nowadays it is primarily a pejorative, almost always phrased as “just” or “only” syntactic sugar. It pretty much only matters to programmers who like to do things the unreadable way and want a concise way to justify that to their colleagues. A definition by those who primarily use the term today, from their point of view, would be something like:
Syntax that is redundant with other more widely applicable syntax, in order to provide a crutch for programmers who don’t really understand the language.
That’s why you get two opposite conclusions for the same syntactic element. Your first example on array notation is using the original positive meaning of the term, something similar to Bart’s answer. Your second example is defending array notation against the charge of being syntactic sugar in the pejorative sense. In other words, it is arguing the syntax is a useful abstraction rather than a crutch.
First, I’ll address some of the other answers with a concrete example. The C++11 range-based for loop (much like foreach loops in various other languages)
for (auto value : container) {
    do_something_with(value);
}
is exactly equivalent to (i.e., a sugared version of)
for (auto iterator = begin(container); iterator != end(container); ++iterator) {
    do_something_with(*iterator);
}
Now, despite adding no new abstract syntax or semantics to the language, it does have real utility.
The first version makes the intent (visiting every item in a container) explicit. It also prohibits unusual behaviour, such as modifying the container during traversal, or further advancing iterator in the loop body, or getting the loop conditions subtly wrong. This avoids possible sources of bugs and, in doing so, reduces the difficulty of reading and reasoning about the code.
For example, a one-character mistake in the second version:
for (auto iterator = begin(container); iterator <= end(container); ++iterator) {
    do_something_with(*iterator);
}
gives a one-past-the-end error and undefined behaviour.
So, the sugared version is useful precisely because it’s more restrictive, and thus simpler to trust & understand.
Second, to the original question:
Is there a real difference between Syntax and Syntactic Sugar?
No, “syntactic sugar” is (concrete) language syntax, considered “sugar” because it doesn’t extend the abstract syntax or core functionality of the language. I like Matt Fenwick’s answer on this.
Who does it matter to?
It matters to users of the language as much as any other syntax does, in that sugar is provided to support (and in some sense bless) specific idioms.
Finally, on the bonus question
The [] notation facilitates this abstraction.
this sounds a lot like the definition of syntactic sugar: it supports (and provides the language authors’ blessing for) using pointers as arrays. The p[i] form isn’t really more restrictive than *(p+i), so the only difference is the clear communication of intent (and a slight readability gain).