Almost all coverage of the R1 in the press is just a reprint of Apple’s PR, so it says that the
[…] R1 chip processes input from 12 cameras, five sensors, and six microphones to ensure that content feels like it is appearing right in front of the user’s eyes, in real time. R1 streams new images to the displays within 12 milliseconds
Or some simple variation on that. Obviously, 12 milliseconds is “fast”, but that’s not the point of the R1.
If “fast” were all that was required, Apple probably could have gotten a process on the M2 to average 12-millisecond display times, maybe plus or minus a few milliseconds. The point of the R1 is that it guarantees the processing completes within a fixed, known amount of time, every time. That’s what “real-time” means in real-time chips and operating systems: the system lets you express the requirement that a task finish within a given amount of real time, and it can guarantee that deadline before the code ever runs.
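To make “express your requirements” concrete: the closest thing an ordinary process gets on macOS or iOS is the Mach time-constraint scheduling policy, set with thread_policy_set(). Here is a minimal C sketch; the 12 ms period is Apple’s number, while the 4 ms computation and 10 ms constraint budgets are made up purely for illustration, and none of this reflects what visionOS or the R1 actually do internally.

```c
/*
 * Sketch: ask the Mach scheduler for a time-constraint ("real-time") policy.
 * The fields say roughly: "I wake up every `period`, I need up to
 * `computation` of CPU time, and I must be finished within `constraint`
 * of the start of each period."
 */
#include <mach/mach.h>
#include <mach/mach_time.h>
#include <mach/thread_policy.h>
#include <stdio.h>

/* Convert nanoseconds to mach absolute-time units. */
static uint64_t ns_to_abs(uint64_t ns) {
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    return ns * tb.denom / tb.numer;
}

int main(void) {
    thread_time_constraint_policy_data_t policy = {
        .period      = (uint32_t)ns_to_abs(12 * 1000 * 1000), /* a new frame every ~12 ms */
        .computation = (uint32_t)ns_to_abs(4 * 1000 * 1000),  /* illustrative: ~4 ms of CPU work */
        .constraint  = (uint32_t)ns_to_abs(10 * 1000 * 1000), /* illustrative: done within 10 ms */
        .preemptible = TRUE,
    };

    kern_return_t kr = thread_policy_set(
        mach_thread_self(),
        THREAD_TIME_CONSTRAINT_POLICY,
        (thread_policy_t)&policy,
        THREAD_TIME_CONSTRAINT_POLICY_COUNT);

    printf("thread_policy_set: %s\n",
           kr == KERN_SUCCESS ? "accepted" : "rejected");
    return 0;
}
```

The catch is that on a general-purpose OS this is only a strong scheduling hint: the kernel will try to honor the deadline, but it won’t refuse to run your code if it can’t. A dedicated chip like the R1 exists so the deadline can actually be guaranteed rather than merely requested.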
This is not the first time Apple has tried to get “real-time” behavior working for a single element of a device. On the original iPhone, scrolling with your finger was given such high priority that rendering the view’s contents was allowed to fall behind. John Gruber described it well in his 2007 review:
Update: Real-time dragging is such a priority that if the iPhone can’t keep up and render what you’re dragging in real-time, it won’t even try, and you get a checkerboard pattern reminiscent of a transparent Photoshop layer until it catches up (typically, an instant later). I.e. iPhone prioritizes drag animation over the rendering of the contents; feel over appearance.
This was the compromise they had to make in 2007, but in 2023, with the same requirement to prioritize “feel”, they decided to keep appearance this time and instead trade off price, complexity, and battery life.