It's simply that making that many system calls takes that amount of time. Even though everything is hot in the cache, make still needs to call stat on every single file in the kernel source tree to detect which files have changed.
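To get a feel for the cost, here is a minimal sketch that does roughly what make has to do: one stat per file, keeping the newest modification time. The function name `stat_tree` is made up for illustration, and the timing you get depends entirely on the machine and on whether the tree is cached.

```python
import os
import time

def stat_tree(root):
    """Call os.stat once per file under root, the way a build tool
    checking modification times would. Hypothetical helper."""
    calls = 0
    newest = 0.0
    start = time.perf_counter()
    # os.walk issues its own directory-reading calls on top of the stats below.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or cannot be read; skip it
            calls += 1
            newest = max(newest, st.st_mtime)
    elapsed = time.perf_counter() - start
    return calls, newest, elapsed
```

Running `stat_tree` on a kernel checkout twice in a row and comparing the second `elapsed` value with the time of a no-op make would show how much of that time is just the stat calls.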
The suggestion in the comments to allow a Windows-style batched stat is a good idea. This means that more work is done per system call, reducing the number of calls and therefore the amount of time spent saving the CPU state to switch to kernel mode, and then restoring it again to switch back.
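As a rough illustration of what a batched interface looks like from user space, Python's `os.scandir` returns directory entries that can carry metadata along with the names. On Windows the Python documentation says `entry.stat()` normally needs no extra system call, because the directory listing already included the data; on Linux it still costs one call per file, so this sketch shows the shape of the interface rather than a Linux speedup. The name `mtimes_in_dir` is made up for the example.

```python
import os

def mtimes_in_dir(path):
    """Return {name: mtime} for the regular files directly inside path.
    Hypothetical helper built on os.scandir."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            # is_file() can usually be answered from the directory listing itself.
            if entry.is_file(follow_symlinks=False):
                # On Windows this value typically comes from the listing;
                # on Linux it triggers a per-file stat call.
                result[entry.name] = entry.stat(follow_symlinks=False).st_mtime
    return result
```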
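Linus is talking about page fault overhead, not system call overhead. As I understand it, those are quite different things. On x86 a page fault raises an exception that is handled through the interrupt mechanism, whereas for a system call there is a dedicated instruction (SYSENTER/SYSCALL) designed for fast control transfer that avoids the costly interrupt path.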
No, they are not. In my experience, cutting the context switches caused by system calls is one of the easiest ways of boosting throughput in your average unoptimized Linux app, precisely because so many developers ignore the cost of system calls.