The core iteration loop in the FunSearch paper is so simple:
generate => evaluate => best-shot => generate => ...
I imagine this is similar to how a lot of people ideate with LLMs.
(One difference is that FunSearch keeps a population of candidates rather than a single best one, to avoid getting stuck at local optima.)
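To make the loop concrete, here is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption for illustration: `generate` is a random perturbation standing in for the LLM call, `evaluate` is a made-up score with its peak at 3.0, candidates are plain floats instead of programs, and the island layout with periodic resets (`num_islands`, `reset_every`) is just one simple way to keep a population.

```python
import random


def evaluate(candidate: float) -> float:
    """Hypothetical evaluator: higher is better, best possible score is 0 at 3.0."""
    return -(candidate - 3.0) ** 2


def generate(examples: list[float], rng: random.Random) -> float:
    """Stand-in for the LLM call: perturb the mean of the prompt examples."""
    base = sum(examples) / len(examples)
    return base + rng.gauss(0.0, 0.5)


def search(num_islands: int = 4, steps: int = 200, shots: int = 2,
           reset_every: int = 50, seed: int = 0) -> tuple[float, float]:
    """Run generate => evaluate => best-shot over several islands.

    Returns the best (score, candidate) pair found.
    """
    rng = random.Random(seed)
    # Each island is an independent population of (score, candidate) pairs.
    islands = [[(evaluate(0.0), 0.0)] for _ in range(num_islands)]

    for step in range(1, steps + 1):
        island = rng.choice(islands)
        # best-shot: the top-scoring candidates on this island become the prompt.
        best = sorted(island, reverse=True)[:shots]
        candidate = generate([c for _, c in best], rng)
        island.append((evaluate(candidate), candidate))

        if step % reset_every == 0:
            # Rank islands by their best member, then reseed the worst half
            # from the best member of a randomly chosen surviving island.
            islands.sort(key=max, reverse=True)
            half = num_islands // 2
            for i in range(half, num_islands):
                donor = rng.choice(islands[:half])
                islands[i] = [max(donor)]

    return max(max(island) for island in islands)


if __name__ == "__main__":
    score, candidate = search()
    print(f"best candidate {candidate:.3f} with score {score:.4f}")
```

The toy score has a single peak, so the islands are not really needed here; they only show where the population bookkeeping would sit in the loop.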
As they call out, the biggest challenge is "how do you establish an effective evaluator?", especially for multi-step tasks. (How do we do credit assignment across steps when only the final result gets a score?)