Intel CPU bug found

Anything not relating to the X-Universe games (general tech talk, other games...) belongs here. Please read the rules before posting.

Moderator: Moderators for English X Forum

red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Mon, 8. Jan 18, 03:09

kohlrak wrote:So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason why it should have gotten slower.

How much do those extra registers even ever get touched, outside of avoiding the stack usage, which should be cached?
Stack access is cached, sure, but you still ultimately have to keep the caches in sync with actual memory, so you pay for it sooner or later. The additional registers are used extensively, for reasons that should be obvious; you can keep more locals in registers at a time, copy fewer callee-save registers to the stack, and generally avoid a lot of shuffling of values around that's required on x86. For a specific example, 64 bit calling conventions use four (MS) or six (System V) registers for arguments, so most of the time you don't need to put arguments on the stack at all. It also makes the code smaller, which I know is your favourite optimisation - you generate a lot fewer mov instructions when you don't have to keep shuffling stuff to the stack and back.

If you're still convinced that 32 bit should be faster, I recommend you do some tests yourself, and let me know if you come up with any cases where there's a significant speed difference in favour of 32 bit.
kohlrak wrote:Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the Pareto effect.
Right, but userland security is as much a feature of the operating system as the kernel security is - I use browser sandboxes as an example because they're an obvious attack vector and one that's received a lot of security attention in recent years, but the ability to communicate with other processes, make syscalls, and the entire concept of userland accounts and privileges is provided by the operating system APIs and kernel. Browser sandboxes are largely a case of making the right calls to the OS to drop privileges (though browser vendors have done a lot of work with OS vendors to improve what privileges they can restrict and so on).
kohlrak wrote:No, the fundamental difference is that I'm looking at the application's point of view, not the user's. Each application is its own store. IP register being a customer. Applications should talk to each other, not control each other.
I mean, if you want your application not to be controlled by another application then you just... set privileges correctly so it can't be. Hence the whole discussion about browser sandboxes and the like. If there's something you don't want to be able to access your application, your application should be running at a different privilege level. If you can't run it at a different privilege level, you need to question whether your application can really be protected, for all the reasons I listed a few posts ago, regardless of any specific OS features.
kohlrak wrote:So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves?

So then how does windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is in practice instead of KeBugCheck?
An access violation is just a page fault, which generates an interrupt that the kernel handles. It inspects page fault interrupts it receives and handles them appropriately - if it's a memory page it's paged out, it pages it back in; if it's an access violation in userland it passes it to the process' exception handlers or kills it; if it's an access violation in kernel it bugchecks; and so forth.
IRQL errors generally manifest either as page faults when you're above the IRQL that handles paging memory in or as priority inversions in the scheduler (the code that detects situations like this is a nightmare as you might imagine).
Memory pools and other kernel objects generally have various self-checks in them that bugcheck if they break.
You get the idea.

How do we know this is what happens - by reading Windows Internals and/or kernel debugging your drivers when they crash, would be my recommended methods. (The latter is more fun - Windows Internals is a bit dry.) Bugchecks do also report details about what error it was and why they happened, though drivers could fake all that if they really wanted to bugcheck themselves.
kohlrak wrote:Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.
Sure, although branch prediction will handle some of this for you, and a simple cache miss is a different situation from a large-scale cache invalidation. But if you're an ISR in kernel you shouldn't need to be doing that anyway - an ISR is expected to complete within a few microseconds (generally just handling the interrupt and setting up for more processing at a lower IRQL if it can't just be handled within that time), so context switching to the interrupt is already a significant chunk of the time it takes to service one. Add a bunch of extra context switches to get this into usermode and back and you've doubled how long every interrupt takes to be serviced.
kohlrak wrote:And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Use of declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, and are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the results of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since i'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
Are you claiming that you, alone of all the programmers in the world, are capable of writing completely bug-free code 100% of the time? Or are you claiming that you and some set of other programmers in the world are capable of writing completely bug-free code 100% of the time, but they all choose to hide their perfect output somewhere and leave it to all the inferior programmers to write all the software that everyone actually uses? I mean, a brief glance at the CVE database should convince you that people have found bugs in everything that has enough users to make it to the CVE database. I've seen a lot of code, and none of that was bug-free either. And I've run into enough bugs in my own code to be damn sure there's others I haven't found.

But hey, maybe all my professional security experience is wrong and we should just get you to write all the world's code.

(Although maybe not, as your "solution" to race conditions is spoken like somebody who has never had to debug a nontrivial race condition...)
kohlrak wrote:wait, upon page faulting it actually lets the application keep this data from a page fault instead of unaliasing those registers? The expected reaction has nothing to do with the cache. It should just clear that data. Generally, the way this works is, the actual internal registers and such are not the ones in assembly, and when this commits, the registers are realiased to the ones referenced in assembly, while the failures basically get ignored (write ops to non-registers don't make it to their respective buses).
It doesn't let the application keep the data - you're entirely correct about how register aliasing works, but the point is that the data loaded into the aliased registers necessarily affects the caches, so when it unaliases them and bins all the data the cache state remains changed. At that point you can just leak the information with a cache timing attack.
Also, it doesn't do this upon page faulting - you can't page fault until you're sure you're actually executing the right bit of code. While in speculative execution, page faults are simply queued for handling upon resolution of whatever's put it in speculative mode, and in the meantime execution continues. If you don't end up going down that branch, the fault is just binned with the rest of the state, while if you do the fault is raised and state (minus, as mentioned, the cache) is rolled back to the instruction that faulted.
kohlrak wrote:Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
Largely, there isn't one any more - like a lot of x86, they're historical cruft that's retained only for backwards compatibility. When the architecture was designed they seemed like a good idea; for a variety of reasons OSes that used them fell by the wayside and the remaining OSes settled on a two-ring model.
Ring 1 enjoyed a resurgence as a mechanism for implementing virtualisation for a while (the virtualised kernel is moved into ring 1), as an alternative to paravirtualisation, but that's fallen by the wayside too with the advent of proper hardware virtualisation features in x86. I suspect that's why it wasn't removed from long mode like segmentation was, though.
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Mon, 8. Jan 18, 06:55

red assassin wrote:
kohlrak wrote:So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason why it should have gotten slower.

How much do those extra registers even ever get touched, outside of avoiding the stack usage, which should be cached?
Stack access is cached, sure, but you still ultimately have to keep the caches in sync with actual memory, so you pay for it sooner or later. The additional registers are used extensively, for reasons that should be obvious; you can keep more locals in registers at a time, copy fewer callee-save registers to the stack, and generally avoid a lot of shuffling of values around that's required on x86. For a specific example, 64 bit calling conventions use four (MS) or six (System V) registers for arguments, so most of the time you don't need to put arguments on the stack at all. It also makes the code smaller, which I know is your favourite optimisation - you generate a lot fewer mov instructions when you don't have to keep shuffling stuff to the stack and back.
How often do compilers actually use them, though? In theory, I totally agree, but in practice they get clobbered, so they have to store everything on the stack anyway. And, because of this, I've seen GCC take variables off the stack and create the same thing on the local stack frame, even if they were read-only and never written to. I always thought the point of using HLLs was to let the compiler optimize out this behavior for you, but it seemed to fail in that regard. I assume, by now, they have fixed this. At least I hope they have. That said, that still doesn't mean they're using the extra registers, because if you make one call within that function, you have to assume those registers got clobbered, so it never gets to a point that they're actually needed.
If you're still convinced that 32 bit should be faster, I recommend you do some tests yourself, and let me know if you come up with any cases where there's a significant speed difference in favour of 32 bit.
I wish I could, but I'm not in that position right now (which is why I'm playing X2, and asked you to do a few checks for me).
kohlrak wrote:Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the Pareto effect.
Right, but userland security is as much a feature of the operating system as the kernel security is - I use browser sandboxes as an example because they're an obvious attack vector and one that's received a lot of security attention in recent years, but the ability to communicate with other processes, make syscalls, and the entire concept of userland accounts and privileges is provided by the operating system APIs and kernel. Browser sandboxes are largely a case of making the right calls to the OS to drop privileges (though browser vendors have done a lot of work with OS vendors to improve what privileges they can restrict and so on).
They are useful and should be done, but they should not be relied upon.
kohlrak wrote:No, the fundamental difference is that I'm looking at the application's point of view, not the user's. Each application is its own store. IP register being a customer. Applications should talk to each other, not control each other.
I mean, if you want your application not to be controlled by another application then you just... set privileges correctly so it can't be. Hence the whole discussion about browser sandboxes and the like. If there's something you don't want to be able to access your application, your application should be running at a different privilege level. If you can't run it at a different privilege level, you need to question whether your application can really be protected, for all the reasons I listed a few posts ago, regardless of any specific OS features.
That's the problem, though: the reasons you mentioned above are OS features. Just because they're shared by most OSes at this point doesn't mean they should be. I do hold current practices at fault.
kohlrak wrote:So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves?

So then how does windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is in practice instead of KeBugCheck?
An access violation is just a page fault, which generates an interrupt that the kernel handles. It inspects page fault interrupts it receives and handles them appropriately - if it's a memory page it's paged out, it pages it back in; if it's an access violation in userland it passes it to the process' exception handlers or kills it; if it's an access violation in kernel it bugchecks; and so forth.
In that case, should not the OS restart the drivers or shut off the hardware in question and reload the drivers? Why are GPUs crashing the OS? It's a known issue, so it is happening, and it is entirely preventable, as far as I can tell. So if the drivers aren't calling the KeBugCheck function, despite it actually being available to them, then why is it being called when it's preventable? The only driver you can't afford to let crash is the HDD driver, and even that can be prevented with careful planning of a copy in the kernel, since HDD drivers are low footprint.
IRQL errors generally manifest either as page faults when you're above the IRQL that handles paging memory in or as priority inversions in the scheduler (the code that detects situations like this is a nightmare as you might imagine).
Memory pools and other kernel objects generally have various self-checks in them that bugcheck if they break.
You get the idea.
So why are things other than the kernel crashing if they can all be restarted without shutting down the kernel? If the kernel knows it's not its own code that is making things go boom boom (hence the errors we are seeing in Windows when we get a BSoD saying it's not the kernel), then why is it going down?
How do we know this is what happens - by reading Windows Internals and/or kernel debugging your drivers when they crash, would be my recommended methods. (The latter is more fun - Windows Internals is a bit dry.) Bugchecks do also report details about what error it was and why they happened, though drivers could fake all that if they really wanted to bugcheck themselves.
That's the big problem. There's no reason the drivers should be the ones bug checking. Microsoft, the company who single-handedly broke the JavaScript standards and got away with it, is unable to establish a proper standard for what drivers should be able to assume they can and cannot do? We're talking about a company that sues companies ex parte (that is, without the other company being there), and they can't remove a simple "this driver is not windows certified" tag which is in their domain without dishonor? I find that very hard to believe, especially when Windows has its own register calling convention for 64-bit that is different from everyone else's. They're a bit powerful.
kohlrak wrote:Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.
Sure, although branch prediction will handle some of this for you, and a simple cache miss is a different situation from a large-scale cache invalidation. But if you're an ISR in kernel you shouldn't need to be doing that anyway - an ISR is expected to complete within a few microseconds (generally just handling the interrupt and setting up for more processing at a lower IRQL if it can't just be handled within that time), so context switching to the interrupt is already a significant chunk of the time it takes to service one. Add a bunch of extra context switches to get this into usermode and back and you've doubled how long every interrupt takes to be serviced.
One shouldn't make a speed assumption about an ISR. IRQ handlers, yes, but not a syscall or something. IRQs don't occur at huge frequency to begin with. However, for this reason, and this is reasonable to pull off today, we see a lot of ARM systems dedicating one core to the kernel. The other cores all have their own caches. While some would say that's wasteful, it's more redundancy than bloat. There was some talk before (and I still don't know why it was scrapped, or if they just never got around to it) where systems could have their dedicated cores dynamically allocated depending on what's eating up the time. I'm guessing the problem they had was figuring out how to actually determine if, when, and how these switches would take place. And, no, this wasn't being done with 32-bit vs 64-bit in mind, but, rather, GPU vs CPU modes. However, it shouldn't be too complicated to expand it.
kohlrak wrote:And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Use of declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, and are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the results of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since i'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
Are you claiming that you, alone of all the programmers in the world, are capable of writing completely bug-free code 100% of the time? Or are you claiming that you and some set of other programmers in the world are capable of writing completely bug-free code 100% of the time, but they all choose to hide their perfect output somewhere and leave it to all the inferior programmers to write all the software that everyone actually uses?
No, I'm saying that accountability has been going out the window. Programmers make mistakes, but so does everyone else. You need to accept that mistakes are going to be made, and accept fault when it's your own, instead of pointing the finger and saying "It's his fault!" You improve by learning, and you learn by making mistakes. You want to have a decently secure wrapper for anything that's particularly vulnerable. User input, and especially internet input, you want to treat as volatile material that needs 10 bottles of rubbing alcohol poured on top and a match dropped on top. If you sanitize properly, making things secure should not be difficult. The problem is accountability, and, to be fair to devs, this often comes as a result of managers pushing some sort of deadline. To be fair, I understand deadlines, too, but we're seeing too many silly mistakes on the market for it to be a matter of needing protection. The extra protection will fall under some sort of twist of Parkinson's law, which seems to be where the mentality comes from to keep pushing, only to be the cause of their own problems and inefficiencies. The more you nanny the coder with your security layers, the less they're going to worry about security when they have a manager breathing down their neck telling them it's more important to get the project out on time and to trust the security of the library, since it's advertising its security. Why? Because the people making these decisions don't code. If people were less apathetic about security, the market would focus more on it, and code would be more secure. But, right now, the market of software is focused on saturation, so get your games, suits, and whatever out as fast as possible, and make it "reasonably stable," "reasonably secure," etc. "We can work on bugs later."
I mean, a brief glance at the CVE database should convince you that people have found bugs in everything that has enough users to make it to the CVE database. I've seen a lot of code, and none of that was bug-free either. And I've run into enough bugs in my own code to be damn sure there's others I haven't found.
Oh, I make mistakes and make bugs, but if I screw up I own it. It's a learning experience and a chance to improve how I code when I see my mistakes come out like that.
But hey, maybe all my professional security experience is wrong and we should just get you to write all the world's code.
Or maybe you can accept that insecurity is going to be a thing, forever, and that a better understanding of it instead of trying to make it a magic black box is going to go far in preventing it, especially on newer systems. Programmers who write libraries also write programs and vice versa. Those mistakes are going to happen, one way or another. We have to accept that. Restricting capabilities upstream to nanny downstream is not going to solve the problem, but instead patch it, potentially making the issue worse down the road. Security is everybody's responsibility, but functionality should not be sacrificed so that someone else doesn't have to feel the need to consider security their responsibility.
(Although maybe not, as your "solution" to race conditions is spoken like somebody who has never had to debug a nontrivial race condition...)
If you're following KISS, it should be trivial, unless you're aware of a situation that I'm not. Tell me where I'm wrong, that checking return values and/or using mutexes doesn't work.
kohlrak wrote:wait, upon page faulting it actually lets the application keep this data from a page fault instead of unaliasing those registers? The expected reaction has nothing to do with the cache. It should just clear that data. Generally, the way this works is, the actual internal registers and such are not the ones in assembly, and when this commits, the registers are realiased to the ones referenced in assembly, while the failures basically get ignored (write ops to non-registers don't make it to their respective buses).
It doesn't let the application keep the data - you're entirely correct about how register aliasing works, but the point is that the data loaded into the aliased registers necessarily affects the caches, so when it unaliases them and bins all the data the cache state remains changed. At that point you can just leak the information with a cache timing attack.
Fortunately, side-channel attacks aren't all that effective against most targets. They mostly work on certain types of cryptography and password checks, which are easier to fix. If your operations take equal time to execute regardless of the input data, the attack method becomes ineffective.
Also, it doesn't do this upon page faulting - you can't page fault until you're sure you're actually executing the right bit of code. While in speculative execution, page faults are simply queued for handling upon resolution of whatever's put it in speculative mode, and in the meantime execution continues. If you don't end up going down that branch, the fault is just binned with the rest of the state, while if you do the fault is raised and state (minus, as mentioned, the cache) is rolled back to the instruction that faulted.
Right, but these issues aren't enough on their own to grab the sensitive data. You still need a secondary attack vector. Thus, by this point, this already looks impractical for anything other than a targeted attack.
kohlrak wrote:Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
Largely, there isn't one any more - like a lot of x86, they're historical cruft that's retained only for backwards compatibility. When the architecture was designed they seemed like a good idea; for a variety of reasons OSes that used them fell by the wayside and the remaining OSes settled on a two-ring model.
Ring 1 enjoyed a resurgence as a mechanism for implementing virtualisation for a while (the virtualised kernel is moved into ring 1), as an alternative to paravirtualisation, but that's fallen by the wayside too with the advent of proper hardware virtualisation features in x86. I suspect that's why it wasn't removed from long mode like segmentation was, though.
X86 is a mixed bag of history. Backwards compatibility is nice, but with it come all the problems. I remember the A20 gate issue like it was yesterday. What's funny is, next time it'll be A40. By then, I hope Intel has learned its lesson. Honestly, x86 could benefit from a reboot. ARM's starting to cream x86 in certain markets, and not just from Jazelle.

red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Mon, 8. Jan 18, 11:41

Okay, look. In this post alone you have claimed that:
  • Compilers probably don't use additional x64 registers, because they get clobbered sometimes, despite the trivially observable fact that they do and the result typically runs a little faster than comparable x86
  • That userland security is different from kernel security because userland security should not be relied upon, implying that kernel security is somehow perfect?
  • That "if I have access to the same data, I can do the same calculations and get the same result" is an OS security failing and not, say, a feature of reality
  • That I should explain again why NT chooses to bugcheck on any failure in kernel, presumably because you didn't like the reason the first few times
  • That [various conspiracy theories about Microsoft] therefore [something]
  • That syscalls are interrupts (we had that very discussion earlier in this thread)
  • That when you make a mistake, it's a learning experience, but that when anyone else makes a mistake it's just sloppy practices and failure of accountability (trust me, other people learn from their mistakes too, but a bug in released code is a bug in released code no matter how much you learnt from it)
  • That race conditions are trivially preventable, despite synchronisation being famously one of the hardest problems (guess what - sticking mutexes on everything defeats the point of threading in the first place, not to mention the risk of deadlocking)
  • That literally all the processor, OS, browser, etc developers and security researchers who've been doing all this work on Meltdown are wrong and that it's not usefully exploitable because side-channels are hard
There's a recurring theme in this entire thread, which is that you seem to believe that having written a toy OS one time (which is cool, don't get me wrong, but not unique) makes you an expert on modern hardware, OS design and security issues. When I point out that the real world is a little bit more complicated than this, that things change fast, and that yes, OS developers and security researchers do tend to know what they're talking about and don't just do things the way they do because they're terrible people, you tell me they're all wrong and you, personally, are right. However, you're admittedly too lazy to do any actual research, or even to read the things I give you, and you don't believe that my years of professional experience have taught me anything either, so when reality rears its ugly head you tell me it's wrong too.

I dunno man, I'm sure you're a talented programmer, but in the end you're asserting that everybody else is wrong and you're so right you don't even need to check. As you yourself wisely stated in the other thread, at some point if people don't want to learn there's no point trying any more. It's been fun, but I'm done. Why not put all that talent to use and build a properly secure OS or something?
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

Nanook
Moderator (English)
Moderator (English)
Posts: 27876
Joined: Thu, 15. May 03, 20:57
x4

Post by Nanook » Tue, 9. Jan 18, 00:00

Let's refrain from making this personal, or this thread is in danger of the dreaded 'CLICK'. OK?
Have a great idea for the current or a future game? You can post it in the [L3+] Ideas forum.

X4 is a journey, not a destination. Have fun on your travels.

RegisterMe
Posts: 8903
Joined: Sun, 14. Oct 07, 17:47
x4

Post by RegisterMe » Wed, 10. Jan 18, 10:51

I can't breathe.

- George Floyd, 25th May 2020

Morkonan
Posts: 10113
Joined: Sun, 25. Sep 11, 04:33
x3tc

Post by Morkonan » Sat, 13. Jan 18, 22:21

Related: https://arstechnica.com/information-tec ... -firmware/

A little bit more troubling than the "bypass password by rebooting with a cd in the drive" problem...

i.e.: Reboot, go to BIOS, reset the Intel AMT pass, since nobody bothers to change it from "admin" and enable remote management, /win

Requires physical access, so the only people that can take advantage of it are the hotel maids and the guy sitting next to you at the coffee shop when you take a quick bathroom break. They could do worse and still have a few hurdles to get through. Unless, I guess, you were on the local wifi and so were they, perhaps.

Firmware update, maybe? If so, incoming box bricking...
