How do I deterministically generate strings from a grammar to test a program?

I’m currently learning about fuzzing and testing and there’s a part that I’m not too sure how to do.
I am given grammars like this:

grammar = {
    "<start>": ["<product>;<quantity><separater><status>"],
    "<product>": ["book", "game", "child", "software"],
    "<quantity>": ["0", "1", "2", "3", "5", "10"],
    "<status>": ["pending", "processed", "shipped", "cancelled", "<status>"],
    "<separater>": [";", ""]
}

The strings generated from this grammar will be used as input to test some basic programs (each program calls input() at most once).

Given a value n, I want to generate n input strings that maximise branch coverage.

I currently have this code to generate a string from the grammar:

import random
import time

def tokenise(production):
    """extract tokens from a production rule."""
    tokens = []
    buffer = ""
    in_token = False
    for char in production:
        if char == "<":
            if buffer:  # Flush buffer as a literal text
                tokens.append(buffer)
                buffer = ""
            in_token = True
            buffer += char
        elif char == ">":
            buffer += char
            tokens.append(buffer)
            buffer = ""
            in_token = False
        else:
            buffer += char
    if buffer:  # Catch any remaining literals after the last token
        tokens.append(buffer)
    return tokens

def expand(symbol, grammar, depth=0, max_depth=10):
    """Recursively expands a symbol using the provided grammar."""
    
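    if depth > max_depth:
        # Depth limit reached: cut this branch off. Note that the result
        # may then be a string the grammar cannot actually derive.
        return ''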
    if symbol not in grammar:  # Base case: symbol is a terminal
        return symbol

    production = random.choice(grammar[symbol])

    # Split the production into tokens (both terminals and non-terminals)
    tokens = tokenise(production)
    
    # Recursively expand each token and concatenate the result
    expanded = ''.join(expand(token, grammar, depth+1, max_depth) for token in tokens)
    return expanded

However, this code is very inefficient: it uses random.choice() to generate each string, and I then need to test whether that string increases branch coverage. If it does, I add it to my list of test input strings.
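For reference, the "does this string increase coverage?" check could be sketched with only the standard library along these lines. This is a minimal sketch, not a definitive implementation: PROGRAM is a made-up program under test, run_and_trace and pick_inputs are hypothetical helper names, and "branch coverage" is approximated by the set of (previous line, current line) arcs reported by sys.settrace.

```python
import sys

# Hypothetical program under test (an assumption, not from the question).
PROGRAM = '''
order = input()
parts = order.split(";")
if parts[0] == "book":
    print("book order")
elif parts[0] == "game":
    print("game order")
else:
    print("other order")
if order.endswith("shipped"):
    print("done")
'''

def run_and_trace(source, text):
    """Run `source` with `text` as the reply to input(); return the line arcs executed."""
    code = compile(source, "<target>", "exec")
    arcs = set()
    previous = None

    def tracer(frame, event, arg):
        nonlocal previous
        if frame.f_code.co_filename != "<target>":
            return None  # do not trace frames outside the target program
        if event == "line":
            arcs.add((previous, frame.f_lineno))
            previous = frame.f_lineno
        return tracer

    # Replace input() and silence print() inside the target's namespace only.
    env = {"input": lambda prompt="": text, "print": lambda *args, **kwargs: None}
    sys.settrace(tracer)
    try:
        exec(code, env)
    except (Exception, SystemExit):
        pass  # a crashing input still contributes the arcs it reached
    finally:
        sys.settrace(None)
    return arcs

def pick_inputs(source, candidates, n):
    """Greedily pick up to n candidates, each adding the most uncovered arcs."""
    traces = {c: run_and_trace(source, c) for c in candidates}
    covered, chosen = set(), []
    for _ in range(n):
        # max() returns the first best candidate, so ties break deterministically.
        best = max(candidates, key=lambda c: len(traces[c] - covered))
        if not traces[best] - covered:
            break  # nothing left adds coverage
        chosen.append(best)
        covered |= traces[best]
    return chosen, covered

chosen, covered = pick_inputs(
    PROGRAM, ["book;1;pending", "game;2shipped", "child;0;shipped"], 2)
print(chosen)
```

Each candidate is run exactly once and its arcs are kept, so selecting the n inputs afterwards does not need to rerun the program.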

To make it more efficient, I need to parse the program into a tree using AST, and then use that tree to determine which strings from the grammar to generate in order to cover specific branches. I was also given this hint: “You can reduce the running time of string production in your grammar by preserving the non-terminal productions you generate and using those to generate sets of strings instead of starting from the start state every time.”
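One way to read the part about preserving non-terminal productions is to cache the set of strings each non-terminal can produce and combine those cached sets, instead of re-deriving everything from <start>. The following is a rough sketch under that interpretation; make_enumerator and the depth parameter are hypothetical names, and the tokeniser is a regex stand-in for tokenise() above.

```python
import re
from functools import lru_cache
from itertools import product

GRAMMAR = {
    "<start>": ["<product>;<quantity><separater><status>"],
    "<product>": ["book", "game", "child", "software"],
    "<quantity>": ["0", "1", "2", "3", "5", "10"],
    "<status>": ["pending", "processed", "shipped", "cancelled", "<status>"],
    "<separater>": [";", ""],
}

def tokenise(production):
    """Split a production into non-terminals (<...>) and literal text."""
    return [tok for tok in re.split(r"(<[^<>]+>)", production) if tok]

def make_enumerator(grammar):
    @lru_cache(maxsize=None)
    def strings(symbol, depth):
        """All strings derivable from `symbol` with at most `depth` nested rule applications."""
        if symbol not in grammar:
            return (symbol,)  # terminal text stands for itself
        if depth == 0:
            return ()  # no budget left to expand this non-terminal
        results = []
        for production in grammar[symbol]:
            # Each part is a cached tuple of strings for one token.
            parts = [strings(tok, depth - 1) for tok in tokenise(production)]
            # An empty production has no parts; product() then yields one
            # empty combination, which joins to the empty string.
            for combo in product(*parts):
                results.append("".join(combo))
        # Remove duplicates while keeping a fixed, deterministic order.
        return tuple(dict.fromkeys(results))
    return strings

enumerate_strings = make_enumerator(GRAMMAR)
candidates = enumerate_strings("<start>", 2)
print(len(candidates), candidates[:3])
```

Because the order of the result depends only on the order of the alternatives in the grammar, the same candidates come out on every run, and the recursive "<status>" alternative is cut off by the depth budget rather than by chance.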

But I’m not quite sure how I’d do that. I can identify all the branches using AST, but how would I decide which path to take through the grammar? And how do I track the paths I’ve already taken, so that I take the next best path?
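To make the question concrete, here is the kind of AST-guided heuristic I have in mind, as a sketch only: collect the string constants that appear in if/while conditions and try the grammar alternatives that match them first. PROGRAM is a made-up program under test, and branch_literals and prioritise are hypothetical names. The heuristic only helps when branch conditions compare against literals that also occur in the grammar.

```python
import ast

# Hypothetical program under test (an assumption, not from the question).
PROGRAM = '''
order = input()
parts = order.split(";")
if parts[0] == "book":
    print("book order")
elif parts[0] == "game":
    print("game order")
if order.endswith("shipped"):
    print("done")
'''

GRAMMAR = {
    "<start>": ["<product>;<quantity><separater><status>"],
    "<product>": ["book", "game", "child", "software"],
    "<quantity>": ["0", "1", "2", "3", "5", "10"],
    "<status>": ["pending", "processed", "shipped", "cancelled", "<status>"],
    "<separater>": [";", ""],
}

def branch_literals(source):
    """Collect string constants that appear inside if/while conditions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While)):
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Constant) and isinstance(sub.value, str):
                    if sub.value not in found:
                        found.append(sub.value)
    return found

def prioritise(grammar, literals):
    """Reorder each rule so alternatives named in a branch condition come first."""
    # sorted() is stable, so the original order is kept within each group.
    return {
        symbol: sorted(alternatives, key=lambda alt: alt not in literals)
        for symbol, alternatives in grammar.items()
    }

literals = branch_literals(PROGRAM)
ordered = prioritise(GRAMMAR, literals)
print(literals)
print(ordered["<status>"])
```

An enumerator that walks the alternatives in this order would reach the strings most likely to flip a branch early, and the alternatives already used could be recorded per non-terminal to track which paths have been taken.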

I read about symbolic execution, but I’m not allowed to use it because only Python standard libraries are allowed in the final exam, so it’s better practice to stick to the standard library.

I have tried randomly generating the input but that is very inefficient.


Filed under: Programming knowledge - @ 16:01

Tags: python, grammar, fuzzing
