In an MIT class, the speaker says that the “Shortest program that fits training data is the best possible generalisation” and that it’s not just philosophy (https://en.wikipedia.org/wiki/Occam’s_razor) but actually a mathematical fact backed by a formal (simple) proof.
What is the name of that mathematical theorem? Is it the “Minimum description length” principle? That’s what ChatGPT told me when I asked, but it doesn’t look like a proven mathematical theorem to me; it looks more like something similar to Occam’s razor, albeit slightly more formalised.