Maybe I’m missing something, but do so-called markup languages have a concept of semantics equivalent to the one you find in C or C++?
Judging from how such a language is parsed, you don’t have many options and there isn’t much context: you just parse your tokens and apply them to the source code.
I’m tempted to say not only that semantics don’t exist in this case, but that this happens only in markup languages. Am I right?
Common markup languages and common programming languages are alike in that both have syntax and semantics. They differ only in purpose: a programming language is meant to generate executable code for a processor, while a markup language is meant to generate declarative data for a display engine.
Both have a lexical structure. They are made up of tokens like words and numbers and punctuation.
Both have a syntactical structure. The tokens can be arranged in many different ways, and some arrangements are valid while others are not.
Both have a semantic structure. The tokens have meaning, some being predefined and some user-defined, and the way that tokens are arranged has meaning.
If we take the following code fragment:
<div class="yolo">Hello world</div>
The tokens div and class, and the angle brackets, have predefined meanings. The token “yolo” is a string that provides a user-defined class name for the div. The angle brackets separate the markup from the text. These are semantic observations.
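To make the lexical and syntactic side of that fragment concrete, here is a minimal sketch using Python’s standard-library html.parser (an assumption for illustration; any HTML tokenizer would do). The class name EventRecorder is hypothetical. The parser reports what it found, but nothing in its output says what a div or a class is; that knowledge lives in the HTML specification and in whatever consumes these events.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records parser events without interpreting them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


recorder = EventRecorder()
recorder.feed('<div class="yolo">Hello world</div>')
recorder.close()
for event in recorder.events:
    print(event)
```

On this fragment the recorder ends up with three events: a start tag div with the attribute pair ('class', 'yolo'), the text 'Hello world', and an end tag div.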
For both kinds of language, the parsing phase stops at the same point: when it has generated a suitable internal representation (such as an AST).
The next phase for a programming language is to generate code. The next phase for a markup language could be to generate a DOM. That phase depends on the semantics but is not part of them.
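The split between parsing and the phase that follows can be sketched in a few lines, assuming Python’s html.parser as the parser. A hypothetical TreeBuilder turns parser events into nested dictionaries (standing in for the internal representation), and a separate, hypothetical describe function stands in for the later phase: it is the only place where element names acquire a meaning, through a tiny hand-written MEANINGS table. This is an illustration of the layering only, not how a real browser builds a DOM.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny "semantic table": element name -> meaning.
MEANINGS = {"div": "a generic container", "p": "a paragraph"}


class TreeBuilder(HTMLParser):
    """Parsing phase: turns tags and text into a nested structure, nothing more."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-nested input; real HTML parsers recover from errors.
        if len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def describe(node):
    """Later phase: applies meaning to the structure produced by the parser."""
    if isinstance(node, str):
        return f"the text {node!r}"
    meaning = MEANINGS.get(node["tag"], f"an unknown element <{node['tag']}>")
    parts = [meaning]
    if "class" in node["attrs"]:
        parts.append(f"with class {node['attrs']['class']!r}")
    inner = ", ".join(describe(child) for child in node["children"])
    if inner:
        parts.append(f"containing {inner}")
    return " ".join(parts)


builder = TreeBuilder()
builder.feed('<div class="yolo">Hello world</div>')
builder.close()
tree = builder.root["children"][0]
print(describe(tree))
```

The parser would build the same tree for an element the table does not know; only describe notices the difference, which is the point of keeping the two phases apart.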
A language is defined (a) by its syntax, which includes the grammar and the lexical structure or vocabulary, and (b) by its semantics, which is its meaning. In C, the snippet a + 1 has the meaning “add the value of variable a and the integer 1”. In HTML, the snippet <p>foo bar</p> has the meaning “a paragraph element containing the text foo bar”.
So markup languages do have semantics (i.e. a meaning). However, semantics have nothing to do with parsing. During parsing, the input is broken up into tokens according to the lexical structure. These tokens are then arranged according to the grammar of the language. For example, the English sentence “you asked a question” might be arranged as:
subject: 'you',
predicate: (verb: 'ask',
            tense: past,
            object: (article: indefinite,
                     noun: 'question'))
This structure helps in figuring out the meaning, but it is not the meaning itself.
There can also be well-formed sentences that do not have valid semantics. For example, the C snippet a + 1 from above makes no sense if a holds a struct instead of a numeric type. In English, “Colorless green ideas sleep furiously” is an example of a well-formed but nonsensical sentence.
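The gap between well-formed and meaningful can be shown in a few lines. Python is used here as a stand-in for C so that the sketch runs on its own; that substitution is an assumption, and the struct is played by a hypothetical Point dataclass. ast.parse accepts a + 1 no matter what a will hold; the problem only appears when the expression is given meaning by evaluating it. A C compiler would reject the struct case earlier, during semantic analysis at compile time, but in both languages the parser itself is satisfied.

```python
import ast
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical struct-like type standing in for a C struct."""
    x: int
    y: int


source = "a + 1"

# Syntax: the parser only checks that the tokens form a valid expression.
tree = ast.parse(source, mode="eval")
print(ast.dump(tree.body))

# Semantics: the same tree means different things depending on what `a` is.
code = compile(tree, "<snippet>", "eval")
print(eval(code, {"a": 41}))  # a numeric value: the expression is meaningful

try:
    eval(code, {"a": Point(1, 2)})  # a struct-like value: well-formed, but meaningless
except TypeError as err:
    print("semantic error:", err)
```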
Markup languages most definitely have syntax (specific sequences of characters that define tokens) and semantics (meaning to be applied to parsed content). You can’t really have a language without both. You might be able to argue the point for XML (its semantics are specific to each particular format, which effectively makes each format a DSL), but you can’t possibly argue it for HTML.
What HTML lacks is expression statements and variables. Without adding CSS, there is no way to make it perform a calculation by itself, so it is limited to giving semantics to content: it can neither create nor modify content on its own.
Note that the usual way of adding or modifying HTML content on the fly is possible only because HTML has semantics. In particular, you can say “this bunch of text is a computer program in language X; run it” (where X is usually JavaScript).
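A rough sketch of that idea, again assuming Python’s html.parser: the parser itself only sees a start tag named script followed by some text. It is a separate rule, here a hypothetical ScriptCollector, that gives the element its meaning by treating the contents as a program in the language named by the type attribute, assuming the HTML convention that a missing type means JavaScript. Actually running the collected code is left out; a browser would hand it to its script engine at that point.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical rule: treats the text of <script> elements as programs."""

    def __init__(self):
        super().__init__()
        self.programs = []  # list of (language, source) pairs
        self._current = None  # (language, [chunks]) while inside <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Assumption: a missing type attribute means JavaScript, as in HTML.
            language = dict(attrs).get("type") or "text/javascript"
            self._current = (language, [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            language, chunks = self._current
            self.programs.append((language, "".join(chunks)))
            self._current = None


page = """
<p id="out">Loading</p>
<script>document.getElementById("out").textContent = 1 + 2;</script>
"""

collector = ScriptCollector()
collector.feed(page)
collector.close()
for language, source in collector.programs:
    # A browser would now pass `source` to the engine for `language`.
    print(language, "->", source)
```

Note that the calculation 1 + 2 is performed by the script language, not by HTML; HTML only contributes the meaning “this text is a program”.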