This sounds like it's just the cost to traverse the page table, right? ~300 cycl...

raverbashing · on May 1, 2014

This

4k for a page is ridiculous. I'd say it was already ridiculous something like 5 years ago.

4M may be too big (I'm thinking 512k could be a sweet spot)

(or it could just work in chunks and get multiple pages at once; I believe it already does something like that)

MichaelGG · on May 1, 2014

512K pagesize? Wouldn't that add a ton of IO in a lot of scenarios? Like every 1 byte file would now require a 512K event? Large pages (2MB/1GB) are for specialised use where you know you're not going to be paging things in/out too often, right?

IIRC Linus was quite dismissive about having larger-than-4k page sizes as the default.

raverbashing · on May 1, 2014

As opposed to taking 256 pages per MB, and all the overhead of that?

It's probably easier to make something special for small files than making the current system go faster
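
To put rough numbers on both sides of this exchange, here is a minimal sketch of the arithmetic. The helper names `pages_needed` and `wasted_bytes` are made up for illustration, and it only counts pages and padding; it says nothing about actual IO or fault cost.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(nbytes, page_size):
    """Whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def wasted_bytes(nbytes, page_size):
    """Padding left over in the last page when nbytes is stored in whole pages."""
    return pages_needed(nbytes, page_size) * page_size - nbytes


if __name__ == "__main__":
    # Page sizes mentioned in the thread: 4k, 64k, 512k and 2MB.
    for page_size in (4 * KIB, 64 * KIB, 512 * KIB, 2 * MIB):
        print(
            f"page size {page_size // KIB:>5} KiB: "
            f"{pages_needed(MIB, page_size):>3} pages per MiB, "
            f"{wasted_bytes(1, page_size):>7} bytes of padding for a 1-byte file"
        )
```

With 4k pages a MiB takes 256 pages and a 1-byte file leaves 4095 bytes of padding; with 512k pages a MiB takes 2 pages and the same file leaves 524287 bytes of padding.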

noselasd · on May 1, 2014

You don't need to couple memory pages with disk blocks.

justincormack · on May 1, 2014

Switch to PowerPC and get 64k pages by default in most distros!

rwmj · on May 1, 2014

.. and exposing lots of buggy userspace code into the bargain!

dmm · on May 1, 2014

What kind of bugs appear with bigger default page sizes?

rwmj · on May 1, 2014

Lots of userspace makes assumptions about page size being 4k and breaks when it changes. Try looking for:

https://www.google.co.uk/search?q="pagesize"+"64k"+"bug"

https://www.google.co.uk/search?q="pagesize"+"64k"+"issue"
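
One typical form of the assumption, sketched here as an illustration rather than taken from any particular bug report: alignment math done against a hardcoded 4096 instead of the page size queried at runtime. The helper names below are hypothetical, and the bit masking assumes the page size is a power of two.

```python
import mmap

# Ask the running system instead of assuming 4096.
PAGE_SIZE = mmap.PAGESIZE


def round_up_to_page(nbytes, page_size=PAGE_SIZE):
    """Round nbytes up to a multiple of page_size (power of two assumed)."""
    return (nbytes + page_size - 1) & ~(page_size - 1)


def page_base(addr, page_size=PAGE_SIZE):
    """Start address of the page containing addr (power of two assumed)."""
    return addr & ~(page_size - 1)


def is_page_aligned(offset, page_size=PAGE_SIZE):
    """True if offset sits on a page boundary for the given page size."""
    return offset & (page_size - 1) == 0


if __name__ == "__main__":
    print("page size on this machine:", PAGE_SIZE)
    # An offset that is "page aligned" under the 4k assumption
    # is not aligned on a system with 64k pages.
    print(is_page_aligned(4096, 4096), is_page_aligned(4096, 65536))
```

Code that passes such a 4k-aligned offset to something that requires page alignment works on x86 and fails once the page size is 64k.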

Another common one is actually in the kernel where filesystem block sizes are limited to page sizes, so from this point of view large page sizes are better:

https://lwn.net/Articles/591690/

codys · on May 3, 2014

And an even slower page fault mechanism. PPC generally ends up relying on hugepages to try to match pagefault speeds on x86

nhaehnle · on May 1, 2014

The thing is, all the relevant page table data should already be in L1 caches when the page fault handler returns. The TLB miss on iret should not require raw memory lookups and should be much faster than 300 cycles.

More likely, iret has simply always been slow (which is why the various syscall/sysenter extensions were created).