Current advances in LLMs and generative AI are having a huge impact on software development, today and tomorrow. Today can be described as the "AI as copilot" phase, and an impact on code quality, measured by common metrics such as code churn and violations of the DRY principle, can already be observed.
I believe tomorrow will be the autopilot phase, with more and more code being generated autonomously by AI. While I have an idea of how to ensure functional correctness and performance of the generated code using traditional methods like unit tests, integration tests and performance tests, I wonder:
How can potentially insecure or malicious generated code be run in a secure manner?
For the early stages of evaluating and testing the generated code, these are some ideas:
- Execute the code in an isolated sandbox (see the container sketch below the list)
- Let several different LLMs assess the code for security issues or malicious intent and take a vote (see the voting sketch below the list)
- Perform static code analysis (see the static-analysis sketch below the list)
- Run the code with security guards in place that disallow potentially malicious system calls
- Apply strict firewall rules with an allow list for the required external services, and monitor for blocked traffic
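
A minimal sketch of the sandbox idea, assuming Docker is available and the generated code is a single Python file; the image name, resource limits, and timeout are illustrative choices, not requirements:

```python
import subprocess

def run_in_sandbox(code_path: str, timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run generated code in a locked-down, throwaway Docker container.

    code_path must be an absolute host path so it can be bind-mounted.
    """
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",           # no network access at all
        "--read-only",                 # read-only root filesystem
        "--cap-drop", "ALL",           # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", "256m",            # memory limit
        "--cpus", "0.5",               # CPU limit
        "--pids-limit", "64",          # cap the number of processes
        "--user", "nobody",            # run as an unprivileged user
        "-v", f"{code_path}:/sandbox/generated.py:ro",  # mount the code read-only
        "python:3.12-slim",
        "python", "/sandbox/generated.py",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
```

A container is not a perfect security boundary, though; for truly untrusted code, stronger isolation such as a dedicated VM or a gVisor-style runtime may be worth the extra overhead.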
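
A rough sketch of the voting idea; `ask_model` is a hypothetical wrapper around whatever LLM APIs are in use, and the one-word verdict parsing is deliberately simplistic:

```python
from collections import Counter

REVIEW_PROMPT = (
    "You are a security reviewer. Answer with exactly one word, "
    "SAFE or UNSAFE, for the following code:\n\n{code}"
)

def ask_model(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with real client calls."""
    raise NotImplementedError

def majority_vote(code: str, models: list[str]) -> str:
    """Ask several models for a verdict and return the majority answer."""
    verdicts = []
    for model in models:
        answer = ask_model(model, REVIEW_PROMPT.format(code=code)).strip().upper()
        verdicts.append("UNSAFE" if "UNSAFE" in answer else "SAFE")
    counts = Counter(verdicts)
    # Treat a tie as UNSAFE: err on the side of caution.
    return "SAFE" if counts["SAFE"] > counts["UNSAFE"] else "UNSAFE"
```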
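
And a sketch of the static-analysis step, assuming the generated code is Python and that Bandit is installed; any comparable scanner (Semgrep, CodeQL, or linters for other languages) would slot in the same way:

```python
import json
import subprocess

def scan_with_bandit(code_path: str) -> list[dict]:
    """Run Bandit on the generated file and return its findings as a list."""
    result = subprocess.run(
        ["bandit", "-f", "json", code_path],
        capture_output=True, text=True,
    )
    # Bandit exits non-zero when it finds issues, so parse stdout rather than
    # relying on the return code alone.
    report = json.loads(result.stdout)
    return report.get("results", [])
```

Findings above a chosen severity could then block the code from ever reaching the sandbox stage.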
What else can be added to that list?