How exactly is an Abstract Syntax Tree created?

I think I understand the goal of an AST, and I’ve built a couple of tree structures before, but never an AST. I’m mostly confused because the nodes are text and not number, so I can’t think of a nice way to input a token/string as I’m parsing some code.

For example, when I looked at diagrams of AST’s, the variable and its value were leaf nodes to an equal sign. This makes perfect sense to me, but how would I go about implementing this? I guess I can do it case by case, so that when I stumble upon an “=” I use that as a node, and add the value parsed before the “=” as the leaf. It just seems wrong, because I’d probably have to make cases for tons and tons of things, depending on the syntax.

And then I came upon another problem, how is the tree traversed? Do I go all the way down the height, and go back up a node when I hit the bottom, and do the same for it’s neighbor?

I’ve seen tons of diagrams on ASTs, but I couldn’t find a fairly simple example of one in code, which would probably help.

4

The short answer is that you use stacks. This is a good example, but I’ll apply it to an AST.

FYI, this is Edsger Dijkstra’s Shunting-Yard Algorithm.

In this case, I will use an operator stack and an expression stack. Since numbers are considered expressions in most languages, I’ll use the expression stack to store them.

class ExprNode:
    char c
    ExprNode operand1
    ExprNode operand2

    ExprNode(char num):
        c = num
        operand1 = operand2 = nil

    Expr(char op, ExprNode e1, ExprNode e2):
        c = op
        operand1 = e1
        operand2 = e2

# Parser
ExprNode parse(string input):
    char c
    while (c = input.getNextChar()):
        if (c == '('):
            operatorStack.push(c)

        else if (c.isDigit()):
            exprStack.push(ExprNode(c))

        else if (c.isOperator()):
            while(operatorStack.top().precedence >= c.precedence):
                operator = operatorStack.pop()
                # Careful! The second operand was pushed last.
                e2 = exprStack.pop()
                e1 = exprStack.pop()
                exprStack.push(ExprNode(operator, e1, e2))

            operatorStack.push(c)

        else if (c == ')'):
            while (operatorStack.top() != '('):
                operator = operatorStack.pop()
                # Careful! The second operand was pushed last.
                e2 = exprStack.pop()
                e1 = exprStack.pop()
                exprStack.push(ExprNode(operator, e1, e2))

            # Pop the '(' off the operator stack.
            operatorStack.pop()

        else:
            error()
            return nil

    # There should only be one item on exprStack.
    # It's the root node, so we return it.
    return exprStack.pop()

(Please be nice about my code. I know it’s not robust; it’s just supposed to be pseudocode.)

Anyway, as you can see from the code, arbitrary expressions can be operands to other expressions. If you have the following input:

5 * 3 + (4 + 2 % 2 * 8)

the code I wrote would produce this AST:

     +
    / 
   /   
  *     +
 /    / 
5   3 4   *
         / 
        %   8
       / 
      2   2

And then when you want to produce the code for that AST, you do a Post Order Tree Traversal. When you visit a leaf node (with a number), you generate a constant because the compiler needs to know the operand values. When you visit a node with an operator, you generate the appropriate instruction from the operator. For example, the ‘+’ operator gives you an “add” instruction.

3

There is a significant difference between how an AST is typically depicted in test (a tree with numbers/variables at the leaf nodes and symbols at interior nodes) and how it is actually implemented.

The typical implementation of an AST (in an OO language) makes heavy use of polymorphism. The nodes in the AST are typically implemented with a variety of classes, all deriving from a common ASTNode class. For each syntactical construct in the language you are processing, there will be a class for representing that construct in the AST, such as ConstantNode (for constants, such as 0x10 or 42), VariableNode (for variable names), AssignmentNode (for assignment operations), ExpressionNode (for generic expressions), etc.
Each specific node type specifies if that node has children, how many and possibly of what type. A ConstantNode will typically have no children, an AssignmentNode will have two and a ExpressionBlockNode can have any number of children.

The AST gets built by the parser, who knows what construct it has just parsed, so it can construct the right kind of AST Node.

When traversing the AST, the polymorphism of the nodes comes really into play. The base ASTNode defines the operations that can be performed on the nodes, and each specific node type implements those operations in the specific way for that particular language construct.

Building the AST from the source text is “simply” parsing. How exactly it is done depends upon the parsed formal language and the implementation. You could use parser generators like menhir (for Ocaml), GNU bison with flex, or ANTLR etc etc. It is often done “manually” by coding some recursive descent parser (see this answer explaining why). The contextual aspect of parsing is often done elsewhere (symbol tables, attributes, ….).

However, in practice AST are much more complex than what you believe. For instance, in a compiler like GCC the AST keeps source location information and some typing information. Read about Generic Trees and GIMPLE in GCC and look inside its gcc/tree.def. Consider, if you want to process C or C++ or Fortran or Ada, writing your own GCC plugin. BTW, look also inside the obsolete GCC MELT (which I have designed & implemented), it is relevant to your question. Read the Dragon book. See also this draft report for references, and the RefPerSys project and the source code of many other open source projects (including Ocaml or Guile or SBCL or Jq) on github or gitlab as examples.

2

I know this question is 4+ years old but I feel I should add a more detailed answer.

Abstract Syntax Trees are created no differently from other trees; the more true statement in this case is that Syntax Tree nodes have a variadic amount of nodes AS NEEDED.

An example is binary expressions like 1 + 2
A simple expression like that would create a single root node holding a right and left node that holds the data about the numbers.
In C language, it’d look something like

struct ASTNode;
union SyntaxNode {
    int64_t         llVal;
    uint64_t        ullVal;
    struct {
        struct ASTNode *left, *right;
    } BinaryExpr;
};

enum SyntaxNodeType {
    AST_IntVal, AST_Add, AST_Sub, AST_Mul, AST_Div, AST_Mod,
};

struct ASTNode {
    union SyntaxNode *Data;
    enum SyntaxNodeType Type;
};

Your question was also how to traverse? Traversing in this case is called Visiting Nodes. Visiting each Node requires that you use each node type to determine how to evaluate each Syntax node’s data.

Here’s another example of that in C where I simply print the contents of each node:

void AST_PrintNode(const ASTNode *node)
{
    if( !node )
        return;

    char *opername = NULL;
    switch( node->Type ) {
        case AST_IntVal:
            printf("AST Integer Literal - %llin", node->Data->llVal);
            break;
        case AST_Add:
            if( !opername )
                opername = "+";
        case AST_Sub:
            if( !opername )
                opername = "-";
        case AST_Mul:
            if( !opername )
                opername = "*";
        case AST_Div:
            if( !opername )
                opername = "/";
        case AST_Mod:
            if( !opername )
                opername = "%";
            printf("AST Binary Expr - Oper: '%s' Left:'%p' | Right:'%p'n", opername, node->Data->BinaryExpr.left, node->Data->BinaryExpr.right);
            AST_PrintNode(node->Data->BinaryExpr.left); // NOTE: Recursively Visit each node.
            AST_PrintNode(node->Data->BinaryExpr.right);
            break;
    }
}

Notice how the function recursively visits each node according to what type of node we’re dealing with.

Let’s add a more complex example, an if statement construct! Recall that if statements can also have an optional else clause. Let’s add the if-else statement to our original node structure. Remember that if statements themselves can also have if statements, so a kind of recursion within our node system can occur. Else statements are optional so the elsestmt field can be NULL which the recursive visitor function can ignore.

struct ASTNode;
union SyntaxNode {
    int64_t         llVal;
    uint64_t        ullVal;
    struct {
        struct ASTNode *left, *right;
    } BinaryExpr;
    struct {
        struct ASTNode *expr, *stmt, *elsestmt;
    } IfStmt;
};

enum SyntaxNodeType {
    AST_IntVal, AST_Add, AST_Sub, AST_Mul, AST_Div, AST_Mod, AST_IfStmt, AST_ElseStmt, AST_Stmt
};

struct ASTNode {
    union SyntaxNode *Data;
    enum SyntaxNodeType Type;
};

back in our node visitor print function called AST_PrintNode, we can accommodate the if statement AST construct by adding this C code:

case AST_IfStmt:
    puts("AST If Statementn");
    AST_PrintNode(node->Data->IfStmt.expr);
    AST_PrintNode(node->Data->IfStmt.stmt);
    AST_PrintNode(node->Data->IfStmt.elsestmt);
    break;

As simple as that! In conclusion, the Syntax Tree is not much more than a tree of a tagged union of the tree and its data itself!

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa

How exactly is an Abstract Syntax Tree created?

I think I understand the goal of an AST, and I’ve built a couple of tree structures before, but never an AST. I’m mostly confused because the nodes are text and not number, so I can’t think of a nice way to input a token/string as I’m parsing some code.

For example, when I looked at diagrams of AST’s, the variable and its value were leaf nodes to an equal sign. This makes perfect sense to me, but how would I go about implementing this? I guess I can do it case by case, so that when I stumble upon an “=” I use that as a node, and add the value parsed before the “=” as the leaf. It just seems wrong, because I’d probably have to make cases for tons and tons of things, depending on the syntax.

And then I came upon another problem, how is the tree traversed? Do I go all the way down the height, and go back up a node when I hit the bottom, and do the same for it’s neighbor?

I’ve seen tons of diagrams on ASTs, but I couldn’t find a fairly simple example of one in code, which would probably help.

4

The short answer is that you use stacks. This is a good example, but I’ll apply it to an AST.

FYI, this is Edsger Dijkstra’s Shunting-Yard Algorithm.

In this case, I will use an operator stack and an expression stack. Since numbers are considered expressions in most languages, I’ll use the expression stack to store them.

class ExprNode:
    char c
    ExprNode operand1
    ExprNode operand2

    ExprNode(char num):
        c = num
        operand1 = operand2 = nil

    Expr(char op, ExprNode e1, ExprNode e2):
        c = op
        operand1 = e1
        operand2 = e2

# Parser
ExprNode parse(string input):
    char c
    while (c = input.getNextChar()):
        if (c == '('):
            operatorStack.push(c)

        else if (c.isDigit()):
            exprStack.push(ExprNode(c))

        else if (c.isOperator()):
            while(operatorStack.top().precedence >= c.precedence):
                operator = operatorStack.pop()
                # Careful! The second operand was pushed last.
                e2 = exprStack.pop()
                e1 = exprStack.pop()
                exprStack.push(ExprNode(operator, e1, e2))

            operatorStack.push(c)

        else if (c == ')'):
            while (operatorStack.top() != '('):
                operator = operatorStack.pop()
                # Careful! The second operand was pushed last.
                e2 = exprStack.pop()
                e1 = exprStack.pop()
                exprStack.push(ExprNode(operator, e1, e2))

            # Pop the '(' off the operator stack.
            operatorStack.pop()

        else:
            error()
            return nil

    # There should only be one item on exprStack.
    # It's the root node, so we return it.
    return exprStack.pop()

(Please be nice about my code. I know it’s not robust; it’s just supposed to be pseudocode.)

Anyway, as you can see from the code, arbitrary expressions can be operands to other expressions. If you have the following input:

5 * 3 + (4 + 2 % 2 * 8)

the code I wrote would produce this AST:

     +
    / 
   /   
  *     +
 /    / 
5   3 4   *
         / 
        %   8
       / 
      2   2

And then when you want to produce the code for that AST, you do a Post Order Tree Traversal. When you visit a leaf node (with a number), you generate a constant because the compiler needs to know the operand values. When you visit a node with an operator, you generate the appropriate instruction from the operator. For example, the ‘+’ operator gives you an “add” instruction.

3

There is a significant difference between how an AST is typically depicted in test (a tree with numbers/variables at the leaf nodes and symbols at interior nodes) and how it is actually implemented.

The typical implementation of an AST (in an OO language) makes heavy use of polymorphism. The nodes in the AST are typically implemented with a variety of classes, all deriving from a common ASTNode class. For each syntactical construct in the language you are processing, there will be a class for representing that construct in the AST, such as ConstantNode (for constants, such as 0x10 or 42), VariableNode (for variable names), AssignmentNode (for assignment operations), ExpressionNode (for generic expressions), etc.
Each specific node type specifies if that node has children, how many and possibly of what type. A ConstantNode will typically have no children, an AssignmentNode will have two and a ExpressionBlockNode can have any number of children.

The AST gets built by the parser, who knows what construct it has just parsed, so it can construct the right kind of AST Node.

When traversing the AST, the polymorphism of the nodes comes really into play. The base ASTNode defines the operations that can be performed on the nodes, and each specific node type implements those operations in the specific way for that particular language construct.

Building the AST from the source text is “simply” parsing. How exactly it is done depends upon the parsed formal language and the implementation. You could use parser generators like menhir (for Ocaml), GNU bison with flex, or ANTLR etc etc. It is often done “manually” by coding some recursive descent parser (see this answer explaining why). The contextual aspect of parsing is often done elsewhere (symbol tables, attributes, ….).

However, in practice AST are much more complex than what you believe. For instance, in a compiler like GCC the AST keeps source location information and some typing information. Read about Generic Trees and GIMPLE in GCC and look inside its gcc/tree.def. Consider, if you want to process C or C++ or Fortran or Ada, writing your own GCC plugin. BTW, look also inside the obsolete GCC MELT (which I have designed & implemented), it is relevant to your question. Read the Dragon book. See also this draft report for references, and the RefPerSys project and the source code of many other open source projects (including Ocaml or Guile or SBCL or Jq) on github or gitlab as examples.

2

I know this question is 4+ years old but I feel I should add a more detailed answer.

Abstract Syntax Trees are created no differently from other trees; the more true statement in this case is that Syntax Tree nodes have a variadic amount of nodes AS NEEDED.

An example is binary expressions like 1 + 2
A simple expression like that would create a single root node holding a right and left node that holds the data about the numbers.
In C language, it’d look something like

struct ASTNode;
union SyntaxNode {
    int64_t         llVal;
    uint64_t        ullVal;
    struct {
        struct ASTNode *left, *right;
    } BinaryExpr;
};

enum SyntaxNodeType {
    AST_IntVal, AST_Add, AST_Sub, AST_Mul, AST_Div, AST_Mod,
};

struct ASTNode {
    union SyntaxNode *Data;
    enum SyntaxNodeType Type;
};

Your question was also how to traverse? Traversing in this case is called Visiting Nodes. Visiting each Node requires that you use each node type to determine how to evaluate each Syntax node’s data.

Here’s another example of that in C where I simply print the contents of each node:

void AST_PrintNode(const ASTNode *node)
{
    if( !node )
        return;

    char *opername = NULL;
    switch( node->Type ) {
        case AST_IntVal:
            printf("AST Integer Literal - %llin", node->Data->llVal);
            break;
        case AST_Add:
            if( !opername )
                opername = "+";
        case AST_Sub:
            if( !opername )
                opername = "-";
        case AST_Mul:
            if( !opername )
                opername = "*";
        case AST_Div:
            if( !opername )
                opername = "/";
        case AST_Mod:
            if( !opername )
                opername = "%";
            printf("AST Binary Expr - Oper: '%s' Left:'%p' | Right:'%p'n", opername, node->Data->BinaryExpr.left, node->Data->BinaryExpr.right);
            AST_PrintNode(node->Data->BinaryExpr.left); // NOTE: Recursively Visit each node.
            AST_PrintNode(node->Data->BinaryExpr.right);
            break;
    }
}

Notice how the function recursively visits each node according to what type of node we’re dealing with.

Let’s add a more complex example, an if statement construct! Recall that if statements can also have an optional else clause. Let’s add the if-else statement to our original node structure. Remember that if statements themselves can also have if statements, so a kind of recursion within our node system can occur. Else statements are optional so the elsestmt field can be NULL which the recursive visitor function can ignore.

struct ASTNode;
union SyntaxNode {
    int64_t         llVal;
    uint64_t        ullVal;
    struct {
        struct ASTNode *left, *right;
    } BinaryExpr;
    struct {
        struct ASTNode *expr, *stmt, *elsestmt;
    } IfStmt;
};

enum SyntaxNodeType {
    AST_IntVal, AST_Add, AST_Sub, AST_Mul, AST_Div, AST_Mod, AST_IfStmt, AST_ElseStmt, AST_Stmt
};

struct ASTNode {
    union SyntaxNode *Data;
    enum SyntaxNodeType Type;
};

back in our node visitor print function called AST_PrintNode, we can accommodate the if statement AST construct by adding this C code:

case AST_IfStmt:
    puts("AST If Statementn");
    AST_PrintNode(node->Data->IfStmt.expr);
    AST_PrintNode(node->Data->IfStmt.stmt);
    AST_PrintNode(node->Data->IfStmt.elsestmt);
    break;

As simple as that! In conclusion, the Syntax Tree is not much more than a tree of a tagged union of the tree and its data itself!

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật