State-of-the-art visual understanding and physical-world prediction; enables zero-shot robot planning.