Stack access is cached, sure, but you still ultimately have to keep the caches in sync with actual memory, so you pay for it sooner or later. The additional registers are used extensively, for reasons that should be obvious; you can keep more locals in registers at a time, copy fewer callee-save registers to the stack, and generally avoid a lot of shuffling of values around that's required on x86. For a specific example, 64 bit calling conventions use four (MS) or six (System V) registers for arguments, so most of the time you don't need to put arguments on the stack at all. It also makes the code smaller, which I know is your favourite optimisation - you generate a lot fewer mov instructions when you don't have to keep shuffling stuff to the stack and back.
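For instance, a minimal sketch of my own (not anything from earlier in the thread) - on System V x86-64, all six of these arguments arrive in registers, so the callee never touches the stack for them:

Code:
/* Compile with gcc -O2 -S and compare against -m32: the 64 bit version
 * takes a, b, c, d, e, f in rdi, rsi, rdx, rcx, r8, r9 (MS x64 would
 * use rcx, rdx, r8, r9 and spill the remaining two). */
long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;   /* no stack traffic at -O2 */
}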
kohlrak wrote:
So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason why it shouldn't have gotten slower. How much do those extra registers even ever get touched, outside of avoiding the stack usage, which should be cached?
If you're still convinced that 32 bit should be faster, I recommend you do some tests yourself, and let me know if you come up with any cases where there's a significant speed difference in favour of 32 bit.
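If you do, something along these lines is a starting point (a rough sketch of my own, nothing rigorous) - build the same file with -m64 and -m32 and compare, swapping the body of work() for whatever you actually care about:

Code:
#include <stdio.h>
#include <time.h>

static volatile long sink;   /* keeps the loop from being optimised away */

static void work(void)
{
    long acc = 0;
    for (long i = 0; i < 100000000; i++)
        acc += i ^ (acc >> 3);
    sink = acc;
}

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("%.3f s\n", (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}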
Right, but userland security is as much a feature of the operating system as the kernel security is - I use browser sandboxes as an example because they're an obvious attack vector and one that's received a lot of security attention in recent years, but the ability to communicate with other processes, make syscalls, and the entire concept of userland accounts and privileges is provided by the operating system APIs and kernel. Browser sandboxes are largely a case of making the right calls to the OS to drop privileges (though browser vendors have done a lot of work with OS vendors to improve what privileges they can restrict and so on).
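To make "making the right calls to the OS" concrete, here's a bare-bones POSIX sketch of my own - real sandboxes layer much more on top (seccomp, namespaces, job objects on Windows), and the uid/gid values are placeholders:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Do any privileged setup first (bind low ports, read secrets)... */

    /* ...then drop: group before user, and check both calls. */
    if (setgid(65534) != 0 || setuid(65534) != 0) {   /* e.g. "nobody" */
        perror("drop privileges");
        return EXIT_FAILURE;
    }
    if (setuid(0) == 0) {   /* regaining root must now fail */
        fprintf(stderr, "privileges not fully dropped\n");
        return EXIT_FAILURE;
    }
    /* From here on the OS, not the application, enforces what this
     * process can touch. */
    puts("running unprivileged");
    return EXIT_SUCCESS;
}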
kohlrak wrote:
Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the pareto effect.

I mean, if you want your application not to be controlled by another application then you just... set privileges correctly so it can't be. Hence the whole discussion about browser sandboxes and the like. If there's something you don't want to be able to access your application, your application should be running at a different privilege level. If you can't run it at a different privilege level, you need to question whether your application can really be protected, for all the reasons I listed a few posts ago, regardless of any specific OS features.

kohlrak wrote:
No, the fundamental difference is that I'm looking at the application's point of view, not the user's. Each application is its own store, the IP register being a customer. Applications should talk to each other, not control each other.
An access violation is just a page fault, which generates an interrupt that the kernel handles. It inspects the page fault interrupts it receives and handles them appropriately - if it's a memory page that's been paged out, it pages it back in; if it's an access violation in userland, it passes it to the process' exception handlers or kills it; if it's an access violation in the kernel, it bugchecks; and so forth.
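As a schematic (made-up names, not actual Windows code), the dispatch looks something like:

Code:
/* A sketch of the decision tree described above - illustration only. */
enum fault_outcome { PAGE_IN, RAISE_TO_USER, BUGCHECK };

enum fault_outcome handle_page_fault(int in_kernel, int paged_out,
                                     int access_violation)
{
    if (paged_out)
        return PAGE_IN;        /* valid page sitting on disk: bring it back */
    if (access_violation && !in_kernel)
        return RAISE_TO_USER;  /* process' exception handlers, or it dies   */
    if (access_violation && in_kernel)
        return BUGCHECK;       /* kernel touched a bad address: stop        */
    return BUGCHECK;           /* anything else unexpected: stop            */
}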
kohlrak wrote:
So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves? So then how does Windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is used in practice instead of KeBugCheck?
IRQL errors generally manifest either as page faults when you're above the IRQL at which memory can be paged in, or as priority inversions in the scheduler (the code that detects situations like this is a nightmare, as you might imagine).
Memory pools and other kernel objects generally have various self-checks in them that bugcheck if they break (a toy version is sketched below).
You get the idea.
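Here's that toy version of the pool self-checks (my illustration; the real corruption checks are far more involved) - each block carries a canary in its header, and the free path bugchecks, here just abort(), if it's been stomped:

Code:
#include <stdint.h>
#include <stdlib.h>

#define POOL_CANARY 0xDEADBEEFu   /* placeholder value */

struct pool_header {
    uint32_t canary;
    size_t   size;
};

void *pool_alloc(size_t size)
{
    struct pool_header *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->canary = POOL_CANARY;
    h->size   = size;
    return h + 1;                 /* caller sees the bytes after the header */
}

void pool_free(void *p)
{
    struct pool_header *h = (struct pool_header *)p - 1;
    if (h->canary != POOL_CANARY)
        abort();                  /* the moral equivalent of KeBugCheck */
    free(h);
}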
How do we know this is what happens? Reading Windows Internals and/or kernel debugging your drivers when they crash would be my recommended methods. (The latter is more fun - Windows Internals is a bit dry.) Bugchecks do also report details about what the error was and why it happened, though drivers could fake all that if they really wanted to bugcheck themselves.
Sure, although branch prediction will handle some of this for you, and a simple cache miss is a different situation from a large-scale cache invalidation. But if you're an ISR in kernel you shouldn't need to be doing that anyway - an ISR is expected to complete within a few microseconds (generally just handling the interrupt and setting up for more processing at a lower IRQL if it can't just be handled within that time), so context switching to the interrupt is already a significant chunk of the time it takes to service one. Add a bunch of extra context switches to get this into usermode and back and you've doubled how long every interrupt takes to be serviced.
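That split is the standard ISR/DPC pattern; a schematic WDM-style skeleton (not a buildable driver - DEVICE_CONTEXT and ReadAndAckHardware are stand-ins of mine):

Code:
/* The ISR does the bare minimum at device IRQL, then queues a DPC so
 * the real work happens later at DISPATCH_LEVEL. */
BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID ServiceContext)
{
    DEVICE_CONTEXT *ctx = ServiceContext;

    if (!ReadAndAckHardware(ctx))             /* stand-in: not our interrupt */
        return FALSE;                         /* let the next handler try    */

    KeInsertQueueDpc(&ctx->Dpc, NULL, NULL);  /* defer the heavy lifting     */
    return TRUE;
}

VOID MyDpc(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    /* Runs at DISPATCH_LEVEL: process what the ISR captured, complete
     * requests, queue passive-level work if more is needed. */
}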
kohlrak wrote:
Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.

Are you claiming that you, alone of all the programmers in the world, are capable of writing completely bug-free code 100% of the time? Or are you claiming that you and some set of other programmers in the world are capable of writing completely bug-free code 100% of the time, but they all choose to hide their perfect output somewhere and leave it to all the inferior programmers to write all the software that everyone actually uses? I mean, a brief glance at the CVE database should convince you that people have found bugs in everything that has enough users to make it to the CVE database. I've seen a lot of code, and none of that was bug-free either. And I've run into enough bugs in my own code to be damn sure there are others I haven't found.

kohlrak wrote:
And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the result of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since I'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
But hey, maybe all my professional security experience is wrong and we should just get you to write all the world's code.
(Although maybe not, as your "solution" to race conditions is spoken like somebody who has never had to debug a nontrivial race condition...)
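For the record, here's the classic counterexample, a toy sketch of my own: every return value is checked and it still loses updates, because the bug is in the unsynchronised read-modify-write, not in any call that can fail (build with -pthread):

Code:
#include <stdio.h>
#include <pthread.h>

static long counter;   /* shared, unprotected */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;     /* load, add, store: threads can interleave here */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, bump, NULL) != 0) return 1;
    if (pthread_create(&b, NULL, bump, NULL) != 0) return 1;
    if (pthread_join(a, NULL) != 0) return 1;
    if (pthread_join(b, NULL) != 0) return 1;
    printf("counter = %ld\n", counter);   /* almost never 2000000 */
    return 0;
}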
It doesn't let the application keep the data - you're entirely correct about how register aliasing works, but the point is that the data loaded into the aliased registers necessarily affects the caches, so when it unaliases them and bins all the data, the cache state remains changed. At that point you can just leak the information with a cache timing attack.

kohlrak wrote:
Wait, upon page faulting it actually lets the application keep this data from a page fault instead of unaliasing those registers? The expected reaction has nothing to do with the cache. It should just clear that data. Generally, the way this works is, the actual internal registers and such are not the ones in assembly, and when this commits, the registers are realiased to the ones referenced in assembly, while the fails basically get ignored (write ops to non-registers don't make it to their respective buses).
Also, it doesn't do this upon page faulting - you can't page fault until you're sure you're actually executing the right bit of code. While in speculative execution, page faults are simply queued for handling upon resolution of whatever's put it in speculative mode, and in the meantime execution continues. If you don't end up going down that branch, the fault is just binned with the rest of the state, while if you do the fault is raised and state (minus, as mentioned, the cache) is rolled back to the instruction that faulted.
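The probe half of such a timing attack, sketched (x86-only, GCC/Clang intrinsics; real attacks add fences, calibration, and many samples):

Code:
#include <stdint.h>
#include <x86intrin.h>

/* Time a single load: a small result means the line was already in
 * cache - i.e. it was touched during the squashed speculation. */
static inline uint64_t probe_cycles(volatile uint8_t *addr)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*addr;                   /* the timed load */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}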
Largely, there isn't one any more - like a lot of x86, they're historical cruft that's retained only for backwards compatibility. When the architecture was designed they seemed like a good idea; for a variety of reasons OSes that used them fell by the wayside and the remaining OSes settled on a two-ring model.

kohlrak wrote:
Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
Ring 1 enjoyed a resurgence as a mechanism for implementing virtualisation for a while (the virtualised kernel is moved into ring 1), as an alternative to paravirtualisation, but that's fallen by the wayside too with the advent of proper hardware virtualisation features in x86. I suspect that's why it wasn't removed from long mode like segmentation was, though.