Start with vibes, not evals. For a completely novel AI capability, first explore open-endedly: throw stuff at the wall and see what sticks. Evals become useful once you've converged on a form factor and know which use cases to test against.