intel cpu bug found

Anything not relating to the X-Universe games (general tech talk, other games...) belongs here. Please read the rules before posting.

Moderator: Moderators for English X Forum

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Fri, 5. Jan 18, 19:01

pjknibbs wrote:
kohlrak wrote:
Tracker001 wrote:Looks like any slowdown that may occur will be application specific.
Well, there are two ways of doing this. See, this is another reason why I'm faulting the OS devs: rings 1 and 2 probably won't get the same scrubbing
As far as I know there isn't a single modern OS that actually uses Ring 1 or Ring 2. I think OS/2 made use of Ring 2 for something or other, but Windows has always been Ring 3 (user code) and Ring 0 (kernel). In versions of Windows NT prior to 4.0 all the drivers actually ran under a user-mode process to ensure that a buggy driver wouldn't take down the kernel--in practice, the system would reboot if said user-mode process died anyway, because that would effectively prevent the GUI or I/O from working, so they just bit the bullet and moved the drivers into kernel space starting with NT 4.0.
I thought it used ring 2 for services and ring 1 for drivers. In that case, where are drivers running now? Please tell me it's not what I think it is... It would be all too predictable for Microsoft. Between some of the flaws MS has been well aware of and never bothered to patch until much later (like a double free caused by message boxes), I still think drivers having access to the KeBugCheck function has to be the absolute most destructive of them all. Not only is it straight up intentional, but it's the direct cause of why everything crashes so much.
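To make that concrete: KeBugCheckEx really is a documented kernel export that any loaded kernel-mode driver can call. A minimal sketch of a hypothetical WDM driver that blue-screens the machine the moment it loads (the bugcheck code here is arbitrary, and the skeleton is just illustrative):

Code: Select all

/* crashdrv.c - hypothetical kernel-mode driver; build with the WDK.
 * KeBugCheckEx is the documented routine behind every blue screen. */
#include <ntddk.h>

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(DriverObject);
    UNREFERENCED_PARAMETER(RegistryPath);

    /* Any driver running in kernel mode can take the whole system down
     * with a single call - there is no further permission check. */
    KeBugCheckEx(0xDEADBEEF, 0, 0, 0, 0);

    return STATUS_SUCCESS; /* never reached */
}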
Morkonan wrote:
kohlrak wrote:You ask a bit much from people. On the other hand, I say people deserve the trouble they get into online...
People should have the same healthy respect for their privacy and security online that prompts them to put on their seat-belts before driving off in their car.

BUT, it's just not possible to expect a "user" to be a coder/OS specialist. It can't be expected any more than we can expect everyone to be a surgeon capable of performing exploratory surgery on themselves. So, a compromise has to be reached. Thus, some broad, easy-to-understand steps for users to take are the start of empowering users, even if they're not perfect.
No, but it seems most people don't know how to operate the turn signal, either. You can always click "no" when it asks for admin permissions. You can say, "No, I don't agree to these terms of service." Though, with that one, there's a whole other debate that needs to be had.
Nobody "deserves" disaster. People don't "deserve" to be victims of a crime. If you honestly think that someone deserves to have their life ruined because they visited the wrong site when trying to find a cat gif... Seriously? Some granny deserves to have her computer hit by ransomware or her credit cards stolen just because she was trying to buy her grandkid a birthday gift?
Yet do not people who leave their wallets and purses in plain view in a car with the doors unlocked deserve to have their credit card numbers and/or money stolen? There comes a point when you're pretty much begging for trouble. When users turn a blind eye to evidence that something is spying on them and might be malicious, they're asking for trouble. Doors have locks for a reason. Permissions prompts exist for a reason.
I'm fully in favor of protecting users from themselves, up to a point. Eventually, though, there has to be a way to continue the functionality that everyone wants coupled with the security that everyone needs.
That is correct: security should not come at the cost of freedom. This applies outside of technology, as well. "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." ~ Benjamin Franklin

The more you nanny people, the lower the bar goes, the less people learn, and the more you will need to nanny them. Just look at the welfare state in America. A long time ago, churches did that, and they held people accountable. Nowadays, you need a "caution: hot" label on a steaming cup of joe. Antivirus programs are real fun: "This is a hack tool." "It's a port scanner, Norton, leave it alone. Here, I made an exception rule." "I ignore exception rules for hack tools." "OK, fine, I'm uninstalling you." "*makes taskbar disappear and deletes networking DLL*" Yeah, no thanks. Just because my port scanner wasn't from a professional site doesn't mean you need to tell me what I can and cannot do. Supposedly, everything I ran into was a Norton bug, like ignoring exceptions and deleting those features. Let's be honest, they wanted everything to quit working without Norton so I'd panic and reinstall it, thinking a virus hit me that fast after deleting Norton.

And you know what I miss, even though it was annoying? That Windows Messenger service thing, where you could just create popups and send them over the network. You could turn it off, and it was annoying, but there's nothing quite like sitting on some chat boards and suddenly seeing a pair of fun bags in ASCII art. You could have so much fun with people with that. Did we add a whitelist or blacklist feature to get rid of the spam ones? No? Aw. We just got rid of the feature. Could you imagine if they'd expanded upon it instead? Internet CB radio, man.

User avatar
Morkonan
Posts: 10113
Joined: Sun, 25. Sep 11, 04:33
x3tc

Post by Morkonan » Fri, 5. Jan 18, 19:47

kohlrak wrote:No, but it seems most people don't know how to operate the turn signal, either. You can always click "no" when it asks for admin permissions. You can say, "No, I don't agree to these terms of service." Though, with that one, there's a whole other debate that needs to be had.
That's true. But, do they "deserve" to be hit by a bus? What sort of punishment is fitting for someone who forgets to use their turn signal? (Yes, I am annoyed by this, as well.) And what if they don't see anyone else around? They're "alone" on the road and don't use their turn signal? Should the bus hiding in the bushes be able to jump them, then?
Yet do not people who leave their wallets and purses in plain view in a car with the doors unlocked deserve to have their credit card numbers and/or money stolen? There comes a point when you're pretty much begging for trouble. When users turn a blind eye to evidence that something is spying on them and might be malicious, they're asking for trouble. Doors have locks for a reason. Permissions prompts exist for a reason.
Someone being vulnerable is not an excuse for someone else to take advantage of that vulnerability.

We're dealing with a new jungle, here. There are boogaboos that people haven't yet developed natural instincts to help guard them against.
...That is correct: security should not come at the cost of freedom. This applies outside of technology, as well. "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." ~ Benjamin Franklin...
But, we do have to protect people from themselves to a certain extent, especially when it's only due to ignorance that they have placed themselves at risk. And, we have to protect innocents against criminals, else it's all just anarchy and not worth defining as anything. "Teh Interwebz" is "The Real World." Some people don't understand this, especially certain sorts of criminals who get surprised when the police show up at their door or they get hit with a DMCA letter or, worse, when they use their 1337haxxors skills and someone who's really ticked off shows up at their house with a gun or burns it down, using their 1337haxxors-real-gasoline-and-matches skills.

The internet is not some Brave New World - It's real and it has real-world consequences.
... Let's be honest, they wanted everything to quit working without Norton so I'd panic and reinstall it, thinking a virus hit me that fast after deleting Norton.
Norton started off with "being everything." Remember "Norton Utilities?" That was actually a decent bit of stuff when it came out. Then, Norton decided to default to "warn everyone about everything" and seriously diluted the value of... "warnings." To be fair, though, it was doing what it was supposed to do, just messily.
And you know what I miss, even though it was annoying? That Windows Messenger service thing, where you could just create popups and send them over the network. You could turn it off, and it was annoying, but there's nothing quite like sitting on some chat boards and suddenly seeing a pair of fun bags in ASCII art. You could have so much fun with people with that. Did we add a whitelist or blacklist feature to get rid of the spam ones? No? Aw. We just got rid of the feature. Could you imagine if they'd expanded upon it instead? Internet CB radio, man.
I used to play with messenger on the network, every once in a while. Fun, but it turned out to be a dangerously easy vector for an attack and stayed that way for a long time.

CB Radio for teh interwebz can be found all over the place. Old "AOL Messenger" did that, to a certain extent, as did other apps. These days, there are much more sophisticated apps out there and, of course, there's always IRC. For myself, I miss newsgroups. They're still around, but they're not like they used to be.

I suppose my point is this: The Internet is real. It's the "Real World", not just some exclusive pool where only the initiated get to swim in the deep end. Being the real world, it's going to have to be subjected to our "real world" rules and even our "ethics" and "morality." And, where it manages to crawl free of those constraints, those participating in such activities are still subject to "rules" and "consequences."

User avatar
Morkonan
Posts: 10113
Joined: Sun, 25. Sep 11, 04:33
x3tc

Post by Morkonan » Fri, 5. Jan 18, 21:12

Just a quick note - Firefox has released an update in response to the latest vulnerability announcements: https://www.mozilla.org/en-US/security/ ... efox57.0.4

User avatar
red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Fri, 5. Jan 18, 23:28

kohlrak wrote:IIRC, not for 32bit apps, which still exist and will for a very long time, especially given the speed boost writing 32bit apps provides.
32 bit Windows kernels don't enforce driver signing or PatchGuard by default, presumably for backwards compatibility reasons, so the hypervisor method used on 64 bit won't be necessary. You still need to be administrator to install drivers in the first place, though. 32 bit kernels can hardly be considered modern at this point, and are pretty much dead now - Windows 10 does technically have a 32 bit variant, but it's extremely unusual (I've seen it once!). As for performance, this is nonsense - 32 bit is consistently slower than 64 bit, and for some applications it's a *lot* slower. Here is a pretty comprehensive set of benchmarks, but there are plenty of others. 64 bit x86 has more, wider registers and more vectorised instructions. I'm sure you could produce some pathological cases where 32 bit is faster if you really tried, but as the benchmarks demonstrate it's not common.

kohlrak wrote:x86 and many other processors have special instructions dedicated to debugging. Why can't the debugger be added to the EXE at compile time? I've made my own. Linux and Windows both have APIs for writing your own debuggers within the programs. But no, userland is where the sensitive data is most likely to get collected. In Windows, for example, the libraries for getting text from the user are userland. All you'd have to do is know the program you want to grab data from. In most cases that's IE (Edge is IE, I don't care what people say), Firefox, Chrome, etc., to get sensitive information.
Because enforcing that the debugger live in the same process as the target makes debugging harder for literally no gain - if I want to maliciously extract data from some target process I have permissions to attach a debugger to, I can just inject the debugger code into the executable and then run it. I'm sure you can construct other similar attacks with a little thought. You cannot usefully protect information at a given privilege level from other things with the same or greater privileges.

What you're persistently ignoring is that just because it's in ring 3 doesn't mean it has the same privileges - different user accounts, different elevation levels, sandboxing, virtual secure mode, enclaves, etc all provide mechanisms for doing exactly this, despite it all being ring 3, and yes, this does prevent attaching debuggers, reading memory, and the like.

kohlrak wrote:Of course it's a separate issue. I asked myself why driver signing was brought up, but felt it a good opportunity. It's my device the software's running on. I should be able to take responsibility for my own actions. I shouldn't need my OS provider to be my nanny. I understand unsigned drivers cause bluescreens, but you could also benefit from not giving device drivers the option to bluescreen. KeBugCheck is callable from device drivers. Is it necessary? Absolutely not. Fix your bad design in the first place. Turn it into a wrapper and restart the device, which, IIRC, is what Linux does.
Bad drivers can and do crash Linux just as much as they can crash Windows. Linux does attempt to recover from situations where Windows chooses to crash, but this is explicitly a security decision on the part of Windows - once something has gone wrong, you have absolutely no guarantees about the integrity of the kernel and the only guaranteed safe option is to panic. (Likewise, this is why PatchGuard panics if it detects anything.)

If you don't want drivers to be able to crash your system, you need a microkernel architecture. There are no mainstream OSes which take this approach any more, due to the disadvantages - one of which is that, as pjk pointed out in another post, if a critical driver crashes the only real option left is to reboot the system anyway!

kohlrak wrote:That's not Intel's job. You should learn how to use your tools. You can't sit there and write "this needs to be wrapped between calls to accomplish this for security" for every instruction or function. It's a fool's errand. Instead, the programmer should understand their tools and foresee this themselves.

They must've removed it fairly recently. Anyway, programmers are not toddlers, they shouldn't need nannies. You are supposed to know your tools. I can't nanny every developer out there. What if they make a buffer of size 1024, then realize they want a larger buffer, but only raise the number passed to the read function? How am I supposed to protect the programmer from themselves? The OS devs made some really stupid mistakes here. It's understandable that they made them, just as people who used gets() and got owned. But they need to accept responsibility for a stupid mistake instead of blaming the library provider for not being their nanny (to be fair, I don't really see this from the OS devs).
Would you like to be smug about dumb programmers when security issues happen, or would you like security issues not to happen? "Programmers should understand every possible security flaw" is laughably naive. The best developer in the world still makes mistakes, and most code? Not written by the best developer in the world. Security is sufficiently complex that, for example, we're still finding security issues that have existed in most CPUs in the world for twenty years, and you expect every person on the planet writing code to suddenly be able to find every security issue affecting their code?
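To make the buffer scenario from the quote concrete, here's a minimal C sketch (names made up) of the kind of mismatch that compiles without a single warning:

Code: Select all

#include <unistd.h>

#define BUF_SIZE 1024

/* Hypothetical example: the buffer stays at 1024 bytes, but someone later
 * "enlarges" the read by only bumping the literal passed to read().
 * The result is a stack buffer overflow with no compiler diagnostic. */
void read_request(int fd)
{
    char buf[BUF_SIZE];

    read(fd, buf, 4096); /* should have been sizeof(buf) */
}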

We absolutely need to make every possible effort to make writing secure code easier, because it just isn't possible to do reliably. The new wave of systems languages like Rust is a good step in the right direction, with a philosophy of removing as many ways of shooting oneself in the foot as possible.

kohlrak wrote:So I was right, AMD doesn't allow out of order execution.
No, I (well, they) said they don't do speculative reads. They do out of order execution, as does any modern high performance architecture, which is why they're still vulnerable to some varieties of Spectre.
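For anyone wondering what "some varieties of Spectre" look like in code, the bounds-check-bypass pattern from the published Spectre paper boils down to roughly this (array names are illustrative, and the cache-timing measurement side is omitted):

Code: Select all

#include <stdint.h>
#include <stddef.h>

uint8_t array1[16];
uint8_t array2[256 * 4096];

/* If the branch predictor has been trained to expect x < array1_size,
 * the CPU may speculatively run the body with an out-of-bounds x.
 * The secret-dependent load from array2 then leaves a cache footprint
 * an attacker can measure later, even though the result is never
 * architecturally committed. */
void victim(size_t x, size_t array1_size)
{
    if (x < array1_size) {
        uint8_t secret = array1[x];
        volatile uint8_t tmp = array2[secret * 4096];
        (void)tmp;
    }
}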

kohlrak wrote:I thought it used ring 2 for services and ring 1 for drivers. In that case, where are drivers running now? Please tell me it's not what I think it is... It would be all too predictable for Microsoft.
Rings 1 and 2 are not functionally isolated from ring 0 - if you're running in one of those rings, you can get yourself to ring 0 if you try hard enough. There's little advantage to using them anyway, and other architectures typically just provide two rings. Using ring 1/2 for drivers hasn't been a thing since the OS/2 days - Windows, Linux, etc have never done it.
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

User avatar
Morkonan
Posts: 10113
Joined: Sun, 25. Sep 11, 04:33
x3tc

Post by Morkonan » Fri, 5. Jan 18, 23:53

AFAIK, all the major CPU mfrs are banding together to come up with solutions to this issue.

But, what are the implications for cloud architecture? They're going to have to implement solutions, too, right?

User avatar
red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Sat, 6. Jan 18, 01:36

Morkonan wrote:AFAIK, all the major CPU mfrs are banding together to come up with solutions to this issue.

But, what are the implications for cloud architecture? They're going to have to implement solutions, too, right?
There's a good summary of responses from various involved companies here: https://arstechnica.com/gadgets/2018/01 ... -about-it/

The short version is, there are OS- and hypervisor-level mitigations for Meltdown and some Spectre variants. Intel is deploying microcode updates for more recent processors which enable other Spectre mitigations, and they've also stated that future processors won't be vulnerable to Meltdown. Work is also ongoing at the compiler level, and by the developers of particularly at-risk applications such as browsers, to add further Spectre mitigations.
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

pjknibbs
Posts: 41359
Joined: Wed, 6. Nov 02, 20:31
x4

Post by pjknibbs » Sat, 6. Jan 18, 07:35

Tamina wrote:Does somebody know if this bug affects Ryzen?
Can't find any information via Google, it always points to articles about Intel.
As with other AMD chips, Ryzen is not affected by the Meltdown bug, but it *is* affected by Spectre.

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Sat, 6. Jan 18, 17:54

I felt like ignoring this post, but then I decided it was better to respond.
red assassin wrote:
kohlrak wrote:IIRC, not for 32bit apps, which still exist and will for a very long time, especially given the speed boost writing 32bit apps provides.
32 bit Windows kernels don't enforce driver signing or PatchGuard by default, presumably for backwards compatibility reasons, so the hypervisor method used on 64 bit won't be necessary. You still need to be administrator to install drivers in the first place, though. 32 bit kernels can hardly be considered modern at this point, and are pretty much dead now - Windows 10 does technically have a 32 bit variant, but it's extremely unusual (I've seen it once!). As for performance, this is nonsense - 32 bit is consistently slower than 64 bit, and for some applications it's a *lot* slower. Here is a pretty comprehensive set of benchmarks, but there are plenty of others. 64 bit x86 has more, wider registers and more vectorised instructions. I'm sure you could produce some pathological cases where 32 bit is faster if you really tried, but as the benchmarks demonstrate it's not common.
Starting with 64bit, the processor can switch between 32bit and 64bit modes almost seamlessly, just like ARM can switch between instruction sets (which is why they recommend mixing Thumb and regular ARM code: Thumb code is smaller, but ARM code can be faster for certain tasks that require operations that have to be done in a roundabout way with Thumb code [like DIV]).

Furthermore, the CPU enhancements that 64bit CPUs have are still available to 32bit code running on 64bit machines (SSE3 and up, for example). The advantage comes from the fact that 64bit instructions are 125% or more of the size of their 32bit equivalents. I've actually seen coders (I haven't checked GCC on a 64bit machine lately [and GCC has improved a lot over the past couple of years with optimization], since my Linux boxes are 32bit [actually, one is 64bit hardware, but the Android build is still 32bit, and I don't have much of a choice on that: Samsung Galaxy Tab E]) write "64bit code" where the only thing that bothered with 64bit was the pointers. The next kicker: the implemented 64bit address space is actually only around 40-48 bits on current chips, so most programs would survive without it, even.

The reason for the slowdown is that when you compile code for 32bit, it assumes Pentium 4. I don't know if it's Wirth's law, or they just haven't gotten around to "32bit on 64bit compiling" yet. Naturally, you'll want to keep libraries in 64bit mode so they can handle programs without a "proxy address" or something. Intel also planned this out so that you could optimize processes by using 32bit mode. Unlike 16bit code running on a 32bit processor, 32bit code running on a 64bit processor is "sign extended" or "zero extended" so that you can easily switch back and forth (although Linux doesn't feel the need to allow this, since code should be open source for it, anyway). What this translates to is, for a simple optimization example, instead of turning "return 0" into:

Code: Select all

mov rax, 0
ret
which is 8 bytes (7 for the mov in its shortest encoding, plus 1 for the ret), you can do:

Code: Select all

xor eax, eax
ret
which is 3 bytes (2 for the xor, plus 1 for the ret). The general rule of optimization for the past few years (given how much faster execution has gotten relative to memory) is that the best optimization you can do is make your code fit in the cache, rather than pick individually faster operations (and eax, 0 also zeroes the register, but its encoding costs an extra byte). So, the trick is, unless you're doing "long int" math in all your calculations, you gain a major speed boost by using 32bit code. If you have a copy of GCC, try checking for me whether it doesn't already use the E-prefix registers (eax and friends) for regular int calculations, instead of down-promoting after using the R-prefix registers.
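For what it's worth, this is easy to check without digging through a big binary; a minimal test (compile with gcc -O2 -S on an x86-64 box and read the generated .s file):

Code: Select all

/* zero.c - on x86-64, gcc -O2 typically emits
 *     xorl %eax, %eax
 *     ret
 * for this function, i.e. it already uses the 32-bit register form
 * rather than a 64-bit mov. */
int zero(void)
{
    return 0;
}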
kohlrak wrote:x86 and many other processors have special dedicated instructions to debugging. Why can't the debugger be added to the EXE on compile time? I've made my own. Linux and Windows both have APIs for writing your own debuggers within the programs. But no, userland is where the sensitive data is most likely to get collected. In windows, for example, the libraries for getting text from the user are userland. All you'd have to do is know the program you want to grab data from. For most cases, IE (edge is IE, i don't care what people say), Firefox, Chrome, etc., to get sensitive information.
Because enforcing that the debugger live in the same process as the target makes debugging harder for literally no gain - if I want to maliciously extract data from some target process I have permissions to attach a debugger to, I can just inject the debugger code into the executable and then run it. I'm sure you can construct other similar attacks with a little thought. You cannot usefully protect information at a given privilege level from other things with the same or greater privileges.

What you're persistently ignoring is that just because it's in ring 3 doesn't mean it has the same privileges - different user accounts, different elevation levels, sandboxing, virtual secure mode, enclaves, etc all provide mechanisms for doing exactly this, despite it all being ring 3, and yes, this does prevent attaching debuggers, reading memory, and the like.
Right, but those are superficial mechanisms. Even chroot gives a warning that you shouldn't use it for security purposes. I was actually going to use it for that purpose with a PHP app I was making, only to get that warning and change my mind. Frankly, the OS shouldn't be providing methods for programs to scan for other programs and then modify them. It's not necessary at all, and just serves as an extra hole. Use the things that our processors gave us for debugging, instead of some external solution which often has a hard time "connecting to the process." Heck, even within the programming languages there are often constructs for debugging (like try, throw, and catch in C++) built into the code. You can use defines to enable and disable this debugging code to optimize.
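A minimal sketch of the define-guarded debug code mentioned above (the macro name is just an example):

Code: Select all

#include <stdio.h>

/* Build with -DDEBUG to keep the extra logging; leave it out and the
 * preprocessor strips it away entirely, so release builds pay nothing. */
#ifdef DEBUG
#define DBG_LOG(msg) fprintf(stderr, "[debug] %s\n", (msg))
#else
#define DBG_LOG(msg) ((void)0)
#endif

int divide(int a, int b)
{
    DBG_LOG("entering divide");
    return (b != 0) ? a / b : 0;
}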
kohlrak wrote:Of course it's a separate issue. I asked myself why driver signing was brought up, but felt it a good opportunity. It's my device the software's running on. I should be able to take responsibility for my own actions. I shouldn't need my OS provider to be my nanny. I understand unsigned drivers cause bluescreens, but you could also benefit from not giving device drivers the option to bluescreen. KeBugCheck is callable from device drivers. Is it necessary? Absolutely not. Fix your bad design in the first place. Turn it into a wrapper and restart the device, which, IIRc, is what Linux does.
Bad drivers can and do crash Linux just as much as they can crash Windows. Linux does attempt to recover from situations where Windows chooses to crash, but this is explicitly a security decision on the part of Windows - once something has gone wrong, you have absolutely no guarantees about the integrity of the kernel and the only guaranteed safe option is to panic. (Likewise, this is why PatchGuard panics if it detects anything.)

If you don't want drivers to be able to crash your system, you need a microkernel architecture. There's no mainstream OSes which take this approach any more, due to the disadvantages - one of which is that, as pjk pointed out in another post, if a critical driver crashes the only real option left is to reboot the system anyway!
What piece of equipment is so critical that its driver crashing requires rebooting the whole system? As long as you have CPU and RAM, your system can restart the hardware (since all drivers can be stored on the HD, aside from the HDD driver, which should always be in RAM). As far as I can tell, this *IS* what Linux does. My video card has crashed already, for example, and Linux just restarted the GPU itself instead of flashing the caps lock light with a blank screen like it normally does with a kernel panic.
kohlrak wrote:That's not intel's job. You should learn how to use your tools. You can't sit there and write "this needs to be wrapped between calls to accomplish thsi for security" for every instruction or function. It's a fool's errand. Instead, the programmer should understand their tools and foresee this themselves.

They must've removed it fairly recently. Anyway, programmers are not toddlers, they shouldn't need nannies. You are supposed to know your tools. I can't nanny every developer out there. What if they make a buffer of size 1024, but then realize they want a larger buffer, but only raise the read function's number? How am i supposed to protect the programmer from themselves? The OS devs made some really stupid mistakes here. It's understandable that they made them, just as people who used gets() and got owned. But they need to accept responsibility for a stupid mistake instead of blaming the library provider for not being their nanny (to be fair, i dn't really see this from the OS devs).
Would you like to be smug about dumb programmers when security issues happen, or would you like security issues not to happen? "Programmers should understand every possible security flaw" is laughably naive. The best developer in the world still makes mistakes, and most code? Not written by the best developer in the world. Security is sufficiently complex that, for example, we're still finding security issues that have existed in most CPUs in the world for twenty years, and you expect every person on the planet writing code to suddenly be able to find every security issue affecting their code?
Oh, I understand, but there's no reason we should assume developers can't write secure code, either. They should own up to their mistakes when they make them, instead of pointing fingers and blaming everyone else. But that's why we have patching. We can't blame a certain few "elites" like Intel for the mistakes of a dumb programmer. We could work on improving coding education, as well. A lot of people today leave college with less knowledge and experience than me, and I'm just a cowboy coder with no degree, yet I can code my own toy OS (entirely in assembly, without stealing code from Linux like GRUB or some other boot loader) and programming language, which baffles most people with degrees for some unknown reason. People these days seem to get programming degrees like black belts out of a McDojo. It's not like good education is too hard for people to understand, but we've simplified it to the degree that half the people don't even know half the language they're coding in. I had arguments with teachers over whether or not to teach & and |, when students were confused that their code compiled with & and | but didn't produce the results they were expecting (since they wanted && and ||). This is what creates these kinds of coders: they assume that since most errors produce a compiler error, if it compiles but has bugs it's probably an off-by-one error or something like that, rather than a mistyped ==, &&, ||, or even a stray unary * (ending up with a pointer dereference instead of a multiply).
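The & vs && confusion is easy to demonstrate; a minimal sketch (values picked to show the trap):

Code: Select all

#include <stdio.h>

int main(void)
{
    int logged_in = 2;  /* "truthy", but not 1 */
    int is_admin  = 1;

    if (logged_in && is_admin)      /* logical AND: taken */
        puts("&&: access granted");

    if (logged_in & is_admin)       /* bitwise AND: 2 & 1 == 0, silently skipped */
        puts("&: access granted");

    return 0;
}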
We absolutely need to make every possible effort to make writing secure code easier, because it just isn't possible to do reliably. The new wave of systems languages like Rust is a good step in the right direction, with a philosophy of removing as many ways of shooting oneself in the foot as possible.
If you can't shoot yourself in the foot, you often can't get anything done. I remember having this one really annoying issue where I tried to store an IPv4 address in a uint32 so I could encode it, but the only way I could do it was with a plethora of really inefficient shifts and such, because type-casting was an error instead of a warning. Took me days to finally write it; then I wrote it in 32bit x86 assembly in a few hours. People say you're not supposed to be able to do things faster in assembly, yet I did, because these protections were getting in the way. I was constantly fighting casts in that code.
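For reference, the packing itself only needs a few shifts and explicit casts in plain C; a minimal sketch (function names made up for the example):

Code: Select all

#include <stdint.h>

/* Pack four dotted-quad octets into a host-order uint32_t and pull one
 * back out. The casts are written out, so this compiles cleanly even
 * with warnings treated as errors. */
static uint32_t ipv4_pack(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

static uint8_t ipv4_octet(uint32_t ip, int index) /* index 0 = leftmost octet */
{
    return (uint8_t)(ip >> (24 - 8 * index));
}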
kohlrak wrote:So i was right, amd doesn't allow out of order execution.
No, I (well, they) said they don't do speculative reads. They do out of order execution, as does any modern high performance architecture, which is why they're still vulnerable to some varieties of Spectre.
That's even worse. Speculative reads (a.k.a. speculative execution) are branch prediction. It's a staple of x86's pipeline optimization. I really miss ARM's original answer to it (conditional instructions, but they switched to branch prediction to save on instruction encodings so they could fit more instructions in a smaller space). Now I'm really skeptical of AMD not being affected.
kohlrak wrote:I thought it used ring 2 for services and ring 1 for drivers. In that case, where are drivers running now? Please tell me it's not what I think it is... It would be all too predictable for Microsoft.
Rings 1 and 2 are not functionally isolated from ring 0 - if you're running in one of those rings, you can get yourself to ring 0 if you try hard enough. There's little advantage to using them anyway, and other architectures typically just provide two rings. Using ring 1/2 for drivers hasn't been a thing since the OS/2 days - Windows, Linux, etc have never done it.
In what way can rings 1 and 2 jump into ring 0? I never read anything like that in the Intel manuals. Have you coded software implementing this stuff before?

User avatar
Tamina
Moderator (Deutsch)
Moderator (Deutsch)
Posts: 4543
Joined: Sun, 26. Jan 14, 09:56

Post by Tamina » Sat, 6. Jan 18, 17:57

pjknibbs wrote:
Tamina wrote:Does somebody know if this bug affects Ryzen?
Can't find any information via Google, it always points to articles about Intel.
As with other AMD chips, Ryzen is not affected by the Meltdown bug, but it *is* affected by Spectre.
Thank you very much :)

Code: Select all

And if a forum villain says something naughty, then I'll fetch my cactus and it stings, stings, stings.
  /l、 
゙(゚、 。 7 
 l、゙ ~ヽ   / 
 じしf_, )ノ 

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Sat, 6. Jan 18, 18:53

I'm going to fork the education point in a new thread, because i think it's worth talking about separately.

User avatar
red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Sat, 6. Jan 18, 22:12

kohlrak wrote:Starting with 64bit, the processor can switch between 32bit and 64bit modes almost seamlessly, just like ARM can switch between instruction sets (which is why they recommend mixing Thumb and regular ARM code: Thumb code is smaller, but ARM code can be faster for certain tasks that require operations that have to be done in a roundabout way with Thumb code [like DIV]).

Furthermore, the CPU enhancements that 64bit CPUs have are still available to 32bit code running on 64bit machines (SSE3 and up, for example). The advantage comes from the fact that 64bit instructions are 125% or more of the size of their 32bit equivalents. I've actually seen coders (I haven't checked GCC on a 64bit machine lately [and GCC has improved a lot over the past couple of years with optimization], since my Linux boxes are 32bit [actually, one is 64bit hardware, but the Android build is still 32bit, and I don't have much of a choice on that: Samsung Galaxy Tab E]) write "64bit code" where the only thing that bothered with 64bit was the pointers. The next kicker: the implemented 64bit address space is actually only around 40-48 bits on current chips, so most programs would survive without it, even.

The reason for the slowdown is that when you compile code for 32bit, it assumes Pentium 4. I don't know if it's Wirth's law, or they just haven't gotten around to "32bit on 64bit compiling" yet. Naturally, you'll want to keep libraries in 64bit mode so they can handle programs without a "proxy address" or something. Intel also planned this out so that you could optimize processes by using 32bit mode. Unlike 16bit code running on a 32bit processor, 32bit code running on a 64bit processor is "sign extended" or "zero extended" so that you can easily switch back and forth (although Linux doesn't feel the need to allow this, since code should be open source for it, anyway). What this translates to is, for a simple optimization example, instead of turning "return 0" into:

Code: Select all

mov rax, 0
ret
which is 8 bytes (7 for the mov in its shortest encoding, plus 1 for the ret), you can do:

Code: Select all

xor eax, eax
ret
which is 3 bytes (2 for the xor, plus 1 for the ret). The general rule of optimization for the past few years (given how much faster execution has gotten relative to memory) is that the best optimization you can do is make your code fit in the cache, rather than pick individually faster operations (and eax, 0 also zeroes the register, but its encoding costs an extra byte). So, the trick is, unless you're doing "long int" math in all your calculations, you gain a major speed boost by using 32bit code. If you have a copy of GCC, try checking for me whether it doesn't already use the E-prefix registers (eax and friends) for regular int calculations, instead of down-promoting after using the R-prefix registers.
I think you have a misunderstanding about the difference between register size and processor mode going on here. In long mode (i.e. 64 bit), you can of course still touch eax etc (just as you can touch ax, al and ah from both 64 and 32 bit modes, for that matter) to work with 32 bit values, and it's good practice to use the size of variable you actually need. This is very much a separate thing from a mode transition to 32 bit protected mode, which allows you to run code written for 32 bit x86 - you can't just point a processor in long mode at 32 bit code and expect it to work. Reasons for this include: a) some of the opcodes have changed (several blocks of 32 bit instructions were moved to make room for the new 64 bit instructions), b) page table layout is necessarily different, and c) some architectural features have changed, e.g. segmentation is gone in long mode. If you have a process running in 32 bit compatibility mode and you want to call other code that's 64 bit (which includes syscalls to the kernel, since that will be 64 bit in this case), something needs to mode transition. This is handled on Windows by WOW64 and on Linux by the kernel when it receives a syscall, and it isn't a free operation as it's a context switch.

Distros tend to build 32 bit versions to maintain some degree of backwards compatibility, so yes, some of the extensions are disabled, which explains some of the cases where 64 bit code is significantly faster (generally stuff which is particularly suited to vectorisation). It certainly doesn't make 64 bit code any slower than 32. If you're compiling yourself it's easy enough to turn all the extensions back on (it's -march=native for gcc) and test. I ran a quick test with xz (since it's easy to compile and very CPU-intensive) - compressing a 1GB random file (reading from a ramdisk and writing to /dev/null to avoid disk bandwidth being an issue), with identical compile options other than the architecture (i.e. optimisation and use of all available extensions are enabled), 32 bit takes ~7m40s and 64 bit takes ~7m10s (across a few repeated runs, variance is ~5s, on my Core i5-3350P). It isn't super significant, but it's also definitely faster on 64 bit!

Optimisation is also quite a bit more complicated than "just make the code as small as possible", or -Os and -O2 would be the same thing (or /Ot vs /Os on MSVC).
kohlrak wrote:Right, but those are superficial mechanisms. Even chroot gives a warning that you shouldn't use it for security purposes. I was actually going to use it for that purpose with a PHP app I was making, only to get that warning and change my mind. Frankly, the OS shouldn't be providing methods for programs to scan for other programs and then modify them. It's not necessary at all, and just serves as an extra hole. Use the things that our processors gave us for debugging, instead of some external solution which often has a hard time "connecting to the process." Heck, even within the programming languages there are often constructs for debugging (like try, throw, and catch in C++) built into the code. You can use defines to enable and disable this debugging code to optimize.
"Superficial"? You go ahead and write me a Chrome sandbox escape (that is, given native code execution in a Chrome renderer process, gain execution elsewhere on the system), for example, and tell me just how superficial those mechanisms are. I'll wait.

Actually, on second thoughts, if you write a Chrome sandbox escape, you probably just want to sell it, as it's worth at least tens of thousands of dollars.

As I said, if you're the same privilege level as another process there is no way of preventing data access. Modify the executable on disk to write the information you want out; just read the same files and do the same calculations as it does; modify one of the libraries it loads to do that; write your own process loader that injects code to read the information you want and use that to start the process; etc etc... You don't need to be able to attach a debugger/ReadProcessMemory/etc the obvious ways to get there.
kohlrak wrote:What piece of equipment is so critical that its driver crashing requires rebooting the whole system? As long as you have CPU and RAM, your system can restart the hardware (since all drivers can be stored on the HD, aside from the HDD driver, which should always be in RAM). As far as I can tell, this *IS* what Linux does. My video card has crashed already, for example, and Linux just restarted the GPU itself instead of flashing the caps lock light with a blank screen like it normally does with a kernel panic.
Your HDD driver crashes and corrupts itself in memory. Any driver crashes and sends junk data to whatever was reading from it when it went wrong, throwing random spanners into the works of the rest of your system. And so forth. The Linux kernel makes an attempt to decide whether it thinks a driver crash is recoverable, in which case it will try (it refers to this as an "oops"), or non-recoverable, in which case it will panic. However, it's definitely not right all the time - I've had my system do some very bizarre things after oopses that weren't as recoverable as it thought they were. (Writing kernel code is an exciting experience, believe me.) Windows chooses not to take that risk for a variety of reasons, including security.

At any rate, this isn't the only issue with microkernels - all of the extra ring transitions to talk to your drivers introduce extra latency, extra complexity (which means, ironically, more risk of things going wrong, among other disadvantages), and architectural issues introduced by inability to share state (which means more copying of things). Anyway, kernel design philosophy isn't really my area - macrokernels won, I just have to deal with them. If you want to know more about why, read some of Torvalds' thoughts about them.
kohlrak wrote:Oh, I understand, but there's no reason we should assume developers can't write secure code, either. They should own up to their mistakes when they make them, instead of pointing fingers and blaming everyone else. But that's why we have patching. We can't blame a certain few "elites" like Intel for the mistakes of a dumb programmer. We could work on improving coding education, as well. A lot of people today leave college with less knowledge and experience than me, and I'm just a cowboy coder with no degree, yet I can code my own toy OS (entirely in assembly, without stealing code from Linux like GRUB or some other boot loader) and programming language, which baffles most people with degrees for some unknown reason. People these days seem to get programming degrees like black belts out of a McDojo. It's not like good education is too hard for people to understand, but we've simplified it to the degree that half the people don't even know half the language they're coding in. I had arguments with teachers over whether or not to teach & and |, when students were confused that their code compiled with & and | but didn't produce the results they were expecting (since they wanted && and ||). This is what creates these kinds of coders: they assume that since most errors produce a compiler error, if it compiles but has bugs it's probably an off-by-one error or something like that, rather than a mistyped ==, &&, ||, or even a stray unary * (ending up with a pointer dereference instead of a multiply).
"There's no reason we should assume developers can't write secure code"? How about, like, all code ever written? Coding is hard to start with, but security is incredibly hard. Understanding machine behaviour on a deep enough level to get exploits is hard; keeping track of all the classes of exploit is hard; not making any mistakes even when you know what you're doing is hard. Writing completely safe buffer handling code with no mistakes in is hard enough even when you're an expert who knows how all of the exploits for it work, and other classes of bug are harder to understand and harder to reason about. And besides, people come up with new exploitation approaches that nobody has had to deal with before pretty often, which means a program developed perfectly to the state of the art today might be laughably insecure tomorrow. If your code is complicated enough to do anything useful or to have security boundaries, I can pretty much guarantee it has security flaws.

I don't disagree for a moment that the state of programming education is terrible, or that the average developer is terrible, but I don't think any of this is a fixable problem - what exactly are you going to do about it? You can't stop people programming. And even if it *was* fixable, as I say, the best developer in the world is still going to produce insecure code. But you can improve the tools.
kohlrak wrote:If you can't shoot yourself in the foot, you often can't get anything done. I remember having this one really annoying issue where I tried to store an IPv4 address in a uint32 so I could encode it, but the only way I could do it was with a plethora of really inefficient shifts and such, because type-casting was an error instead of a warning. Took me days to finally write it; then I wrote it in 32bit x86 assembly in a few hours. People say you're not supposed to be able to do things faster in assembly, yet I did, because these protections were getting in the way. I was constantly fighting casts in that code.
The compiler probably optimises away anything particularly weird you do. Either way, sounds like you were either using something storing the address in a particularly odd way or just doing it wrong, because the only way I've ever seen IPv4 addresses represented is a string or something that's just a typedef for a uint32_t.

The irony here is that a more modern, safer language would likely have a type for addresses that handles all of this for you and saves trying to blindly cast things at all.
kohlrak wrote:That's even worse. Speculative reads (a.k.a. speculative execution) are branch prediction. It's a staple of x86's pipeline optimization. I really miss ARM's original answer to it (conditional instructions, but they switched to branch prediction to save on instruction encodings so they could fit more instructions in a smaller space). Now I'm really skeptical of AMD not being affected.
I was probably inadequately clear here - I said "they do not do the speculative reads that are responsible", meaning it's the specific vulnerable cross-privilege read, not a general case of "never speculatively read anything, including instructions". But to save further argument I dug up the specific quote from AMD:
The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
kohlrak wrote:In what way can rings 1 and 2 jump into ring 0? I never read anything like that in the Intel manuals. Have you coded software implementing this stuff before?
The most obvious issue is that the page tables have a single bit for supervisor mode, there's no per-ring granularity. Anything less than ring 3 is supervisor mode. (See the paging documentation in Intel's manuals.) If you can read and write to all memory used by code in ring 0, it's clearly trivial to gain execution in ring 0.
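A rough sketch of the relevant page-table-entry bits (the bit positions follow the Intel SDM paging chapter; the helper function is just illustrative):

Code: Select all

#include <stdint.h>

/* Low flag bits of an x86 page-table entry. There is a single
 * User/Supervisor bit: rings 0, 1 and 2 all count as "supervisor",
 * so paging alone cannot keep ring 1/2 code out of ring 0's memory. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_WRITE   (1ULL << 1)
#define PTE_USER    (1ULL << 2) /* set: ring 3 may access; clear: rings 0-2 only */

static int user_accessible(uint64_t pte)
{
    return (pte & PTE_PRESENT) && (pte & PTE_USER);
}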
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

pjknibbs
Posts: 41359
Joined: Wed, 6. Nov 02, 20:31
x4

Post by pjknibbs » Sat, 6. Jan 18, 23:05

red assassin wrote: At any rate, this isn't the only issue with microkernels - all of the extra ring transitions to talk to your drivers introduce extra latency, extra complexity (which means, ironically, more risk of things going wrong, among other disadvantages), and architectural issues introduced by inability to share state (which means more copying of things).
This is another reason that Microsoft moved the drivers into the kernel starting with NT4--having them in a separate user-mode process required a lot of crosstalk between the kernel and user mode which slowed things down a lot.

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Sun, 7. Jan 18, 16:16

red assassin wrote:
kohlrak wrote:Starting with 64bit, the processor can switch between 32bit and 64bit modes almost seamlessly, just like ARM can switch between instruction sets (which is why they recommend mixing Thumb and regular ARM code: Thumb code is smaller, but ARM code can be faster for certain tasks that require operations that have to be done in a roundabout way with Thumb code [like DIV]).

Furthermore, the CPU enhancements that 64bit CPUs have are still available to 32bit code running on 64bit machines (SSE3 and up, for example). The advantage comes from the fact that 64bit instructions are 125% or more of the size of their 32bit equivalents. I've actually seen coders (I haven't checked GCC on a 64bit machine lately [and GCC has improved a lot over the past couple of years with optimization], since my Linux boxes are 32bit [actually, one is 64bit hardware, but the Android build is still 32bit, and I don't have much of a choice on that: Samsung Galaxy Tab E]) write "64bit code" where the only thing that bothered with 64bit was the pointers. The next kicker: the implemented 64bit address space is actually only around 40-48 bits on current chips, so most programs would survive without it, even.

The reason for the slowdown is that when you compile code for 32bit, it assumes Pentium 4. I don't know if it's Wirth's law, or they just haven't gotten around to "32bit on 64bit compiling" yet. Naturally, you'll want to keep libraries in 64bit mode so they can handle programs without a "proxy address" or something. Intel also planned this out so that you could optimize processes by using 32bit mode. Unlike 16bit code running on a 32bit processor, 32bit code running on a 64bit processor is "sign extended" or "zero extended" so that you can easily switch back and forth (although Linux doesn't feel the need to allow this, since code should be open source for it, anyway). What this translates to is, for a simple optimization example, instead of turning "return 0" into:

Code: Select all

mov rax, 0
ret
which is 8 bytes (7 for the mov in its shortest encoding, plus 1 for the ret), you can do:

Code: Select all

xor eax, eax
ret
which is 3 bytes (2 for the xor, plus 1 for the ret). The general rule of optimization for the past few years (given how much faster execution has gotten relative to memory) is that the best optimization you can do is make your code fit in the cache, rather than pick individually faster operations (and eax, 0 also zeroes the register, but its encoding costs an extra byte). So, the trick is, unless you're doing "long int" math in all your calculations, you gain a major speed boost by using 32bit code. If you have a copy of GCC, try checking for me whether it doesn't already use the E-prefix registers (eax and friends) for regular int calculations, instead of down-promoting after using the R-prefix registers.
I think you have a misunderstanding about the difference between register size and processor mode going on here. In long mode (i.e. 64 bit), you can of course still touch eax etc (just as you can touch ax, al and ah from both 64 and 32 bit modes, for that matter) to work with 32 bit values, and it's good practice to use the size of variable you actually need. This is very much a separate thing from a mode transition to 32 bit protected mode, which allows you to run code written for 32 bit x86 - you can't just point a processor in long mode at 32 bit code and expect it to work. Reasons for this include: a) some of the opcodes have changed (several blocks of 32 bit instructions were moved to make room for the new 64 bit instructions), b) page table layout is necessarily different, and c) some architectural features have changed, e.g. segmentation is gone in long mode. If you have a process running in 32 bit compatibility mode and you want to call other code that's 64 bit (which includes syscalls to the kernel, since that will be 64 bit in this case), something needs to mode transition. This is handled on Windows by WOW64 and on Linux by the kernel when it receives a syscall, and it isn't a free operation as it's a context switch.
Which ones were moved? Last time I checked for x86, they used the "reserved" instructions and turned them into prefixes, which is a staple of x86's ability to have variable-length instructions. The context switching ends up being part of the general task switching anyway, so it's not as much as you think. IIRC, it's a matter of pointing the segment registers in the right direction, since the format is basically the same. TBH, I don't have experience in this matter, just reading. I've stayed pretty close to 32bit transitions, which, agreed, are a pain. Intel, however, understood this would be a problem (since BIOSes stayed 16bit, many 32bit OSes constantly had to switch back and forth and it was a major pain, but even that switching isn't as bad as you think, and I do have experience with that), so they tried to simplify the mode switching when it came to switching between pmode and long mode. And, frankly, what these compilers do just to call a function is more complex than the switching between 32bit and 16bit code (only marginally, if it's not doing the useless mov operations that I've seen GCC do on 32bit x86 code).
Distros tend to build 32 bit versions to maintain some degree of backwards compatibility, so yes, some of the extensions are disabled, which explains some of the cases where 64 bit code is significantly faster (generally stuff which is particularly suited to vectorisation). It certainly doesn't make 64 bit code any slower than 32. If you're compiling yourself it's easy enough to turn all the extensions back on (it's -march=native for gcc) and test. I ran a quick test with xz (since it's easy to compile and very CPU-intensive) - compressing a 1GB random file (reading from a ramdisk and writing to /dev/null to avoid disk bandwidth being an issue), with identical compile options other than the architecture (i.e. optimisation and use of all available extensions are enabled), 32 bit takes ~7m40s and 64 bit takes ~7m10s (across a few repeated runs, variance is ~5s, on my Core i5-3350P). It isn't super significant, but it's also definitely faster on 64 bit!
But your example still says nothing, since the vectorization still isn't enabled. If it is, then GCC still has a hard time with 32bit x86 code (and I remember it having a hard time). The 64bit instructions are longer. If it's using the 32bit equivalents, as it should, the 64bit code will be marginally slower due to occasionally having to handle pointers, even if they're predictable in lower memory. There's nothing inherent about 32bit that makes it slower than 64bit, while the opposite is true.
Optimisation is also quite a bit more complicated than "just make the code as small as possible", or -Os and -O2 would be the same thing (or /Ot vs /Os on MSVC).
It is more complex, but it's a general rule - one Intel picked out and creamed AMD with a few years back, when AMD was spending all sorts of money on making individual instructions execute faster while Intel spent comparatively little just making the caches larger. Intel figured out that the biggest bottleneck on CPUs today is cache misses. The fact that hello-world programs compile to something so large should be enough to point out the source of the problem: Wirth's Law.
kohlrak wrote:Right, but those are superficial mechanisms. Even chroot gives a warning that you shouldn't use it for security purposes. I was actually going to use it for that purpose with a PHP app I was making, only to get that warning and change my mind. Frankly, the OS shouldn't be providing methods for programs to scan for other programs and then modify them. It's not necessary at all, and just serves as an extra hole. Use the things that our processors gave us for debugging, instead of some external solution which often has a hard time "connecting to the process." Heck, even within the programming languages there are often constructs for debugging (like try, throw, and catch in C++) built into the code. You can use defines to enable and disable this debugging code to optimize.
"Superficial"? You go ahead and write me a Chrome sandbox escape (that is, given native code execution in a Chrome renderer process, gain execution elsewhere on the system), for example, and tell me just how superficial those mechanisms are. I'll wait.
Just because it'd take a lot of work to pull off doesn't mean it's not superficial. It's more like a sandbag bunker as opposed to a steel one. Those are fairly superficial, yet have fun getting through one without an explosive, which is a lot of money and work.
Actually, on second thoughts, if you write a Chrome sandbox escape, you probably just want to sell it, as it's worth at least tens of thousands of dollars.
So would a decent cryptor, but those things are open source, even. A fool and his money are soon parted.
As I said, if you're the same privilege level as another process there is no way of preventing data access. Modify the executable on disk to write the information you want out; just read the same files and do the same calculations as it does; modify one of the libraries it loads to do that; write your own process loader that injects code to read the information you want and use that to start the process; etc etc... You don't need to be able to attach a debugger/ReadProcessMemory/etc the obvious ways to get there.
No, you don't, but using the fact that you can break a store's window to rob the place as an excuse for putting the key under the front door mat is inexcusable.
kohlrak wrote:What piece of equipment is too critical to need rebooting the whole system, but can have a driver that can crash? As long as you have CPU and RAM, your system can restart the hardware (since all drivers can be stored on the HD aside from the HDD driver which should always be in RAM). As far as i can tell, this *IS* what Linux does. My video card has crashed already, for example, and linux just restarted the GPU itself instead of going flashing the capslock light with a blank screen like it normally does with a kernel panic.
Your HDD driver crashes and corrupts itself in memory. Any driver crashes and sends junk data to whatever was reading from it when it went wrong, throwing random spanners into the works of the rest of your system. And so forth. The Linux kernel makes an attempt to decide whether it thinks a driver crash is recoverable, in which case it will try (it refers to this as an "oops"), or non-recoverable, in which case it will panic. However, it's definitely not right all the time - I've had my system do some very bizarre things after oopses that weren't as recoverable as it thought they were. (Writing kernel code is an exciting experience, believe me.) Windows chooses not to take that risk for a variety of reasons, including security.
Yeah, but that doesn't mean all drivers should have access to the crash function. The majority of windows crashes have been GPU crashes, so if MS wants to limit the number of BSoDs, the first thing it should do is take it away from them. But, that comes back to, you have RAM and CPU, with innate drivers, and the HDD driver. The HDD driver should not get corrupted in memory, but should it, it should cause a kernel panic. However, that same driver should be part of the kernel. The rest should not need to panic, as anything else can be restarted (or, at least, the device makers should have the option to cut the power and re-establish power, instead of demanding everything go down).
At any rate, this isn't the only issue with microkernels - all of the extra ring transitions to talk to your drivers introduce extra latency, extra complexity (which means, ironically, more risk of things going wrong, among other disadvantages), and architectural issues introduced by inability to share state (which means more copying of things).
Have you ever written transition code? It's really a lot less than you think. Just read a tutorial on getting to protected mode from real mode, or, if you'd like, I'd just give you the code here, as I've actually written some. Long mode is a different story, but I've read it is easier by design, due to how many issues came from the 16-bit to 32-bit transition. They realized they didn't want to make the same mistake again.
Anyway, kernel design philosophy isn't really my area - macrokernels won, I just have to deal with them. If you want to know more about why, read some of Torvalds' thoughts about them.
I developed a microkernel. The short answer is, it's easier to convince companies to write drivers for macro-kernels, because they want as many options as possible, even if you're giving away your stability to them.
kohlrak wrote:Oh, I understand, but there's no reason we should assume developers can't write secure code, either. They should own up to their mistakes when they make them, instead of pointing their fingers and blaming everyone else. But that's why we have patching. We can't blame a certain few "elites" like Intel for the mistakes of a dumb programmer. We could also work on improving coding education as well. A lot of people today leave college with less knowledge and experience than me, and I'm just a cowboy coder with no degree, yet I can code my own toy OS (entirely in assembly, without stealing code from Linux like grub or some other boot loader) and programming language, which baffles most people with degrees for some unknown reason. People these days seem to get programming degrees like black belts out of a McDojo. It's not that good education is too hard for people to understand, but we've simplified it to the degree that half the people don't even know half the language they're coding in. I had arguments with teachers over whether or not to teach & and |, when students were confused that their code compiled with & and | but didn't produce the results they were expecting (since they wanted to use && and ||). This is what creates these kinds of coders: they assume that since most errors produce a compiler error, code that compiles but has bugs probably just has an off-by-one error or something like that, rather than a mistyped ==, &&, ||, or even a stray unary * (ending up with a pointer dereference instead of a multiply).
"There's no reason we should assume developers can't write secure code"? How about, like, all code ever written? Coding is hard to start with, but security is incredibly hard. Understanding machine behaviour on a deep enough level to get exploits is hard; keeping track of all the classes of exploit is hard; not making any mistakes even when you know what you're doing is hard. Writing completely safe buffer handling code with no mistakes in is hard enough even when you're an expert who knows how all of the exploits for it work, and other classes of bug are harder to understand and harder to reason about. And besides, people come up with new exploitation approaches that nobody has had to deal with before pretty often, which means a program developed perfectly to the state of the art today might be laughably insecure tomorrow. If your code is complicated enough to do anything useful or to have security boundaries, I can pretty much guarantee it has security flaws.
KISS makes it easy, actually. Generally, the idea of stack busting is to take advantage of an injected callback or a stack overflow. If you assume incoming data could be malicious, it's much, much easier to avoid. If you don't mind a loss of efficiency but are determined to use a stack, you make a simple stack jail. And yes, there are often instructions specifically to pull this off (Intel has a BOUND instruction). And is it easy to pull off? Absolutely.
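To make that concrete, here's a minimal sketch of the idea (my own illustration, not code from this thread): treat the caller-supplied length as hostile and bound it before anything touches the fixed stack buffer.
[code]
#include <cstddef>
#include <cstring>

// Hypothetical handler for untrusted input: the length is validated against
// the fixed-size stack buffer before any copy, so a hostile length can't
// smash the return address.
bool handle_packet(const char *data, std::size_t len) {
    char buf[256];                       // fixed-size stack buffer
    if (data == nullptr || len >= sizeof(buf))
        return false;                    // reject rather than truncate silently
    std::memcpy(buf, data, len);
    buf[len] = '\0';
    // ... parse buf ...
    return true;
}
[/code]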
I don't disagree for a moment that the state of programming education is terrible, or that the average developer is terrible, but I don't think any of this is a fixable problem - what exactly are you going to do about it? You can't stop people programming. And even if it *was* fixable, as I say, the best developer in the world is still going to produce insecure code. But you can improve the tools.
And this is precisely where we made the mistake: once you assume the tools should nanny the programmer, then whenever a tool doesn't nanny you, you've made a false assumption that it would. Your failure to use my library securely is not my fault. I can choose to nanny you if I want, but if I'm not making any claims of nannying you, you can't call it my bug. It's your bug.
kohlrak wrote:If you can't shoot yourself in the foot, you often can't get anything done. I remember having this one really annoying issue where I tried to write some code that stored an IPv4 address in a uint32 so I could encode it, but the only way I could do it was with a plethora of really inefficient shifts and such, because type-casting was an error instead of a warning. It took me days to finally write it; then I wrote it in 32-bit x86 assembly in a few hours. People say you're not supposed to be able to do things faster in assembly, yet I did, because these protections were getting in the way. I was constantly fighting casts in that code.
The compiler probably optimises away anything particularly weird you do. Either way, sounds like you were either using something storing the address in a particularly odd way or just doing it wrong, because the only way I've ever seen IPv4 addresses represented is a string or something that's just a typedef for a uint32_t.

The irony here is that a more modern, safer language would likely have a type for addresses that handles all of this for you and saves trying to blindly cast things at all.
Which would actually make the problem worse, because instead of it being a pain in the rear to get around the type casting, it simply wouldn't let me try. That's a great idea.
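For reference, packing a dotted quad into the uint32 representation mentioned above only takes a few shifts and casts; a minimal sketch (my example, not the code from the story):
[code]
#include <cstdint>

// Pack four dotted-quad octets into one host-order 32-bit value.
constexpr std::uint32_t pack_ipv4(std::uint8_t a, std::uint8_t b,
                                  std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8)  |  std::uint32_t(d);
}

static_assert(pack_ipv4(192, 168, 0, 1) == 0xC0A80001u, "sanity check");
[/code]
(On POSIX systems, inet_pton already hands you the network-order 32-bit value directly, so even the shifts are usually unnecessary.)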
kohlrak wrote:That's even worse. Speculative reads (aka speculative execution) is branch prediction. It's a staple of x86's pipeline optimization. I really miss ARM's original response to it (conditional instructions, but they switched to branch prediction to save on instruction values so they could fit more instructions in a smaller space). Now i'm really skeptical of AMD not being affected.
I was probably inadequately clear here - I said "they do not do the speculative reads that are responsible", meaning it's the specific vulnerable cross-privilege read, not a general case of "never speculatively read anything, including instructions". But to save further argument I dug up the specific quote from AMD:
Well, thank you for clarifying.
The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Wait, so they only allow it when it's allowed? They think the intel code wouldn't hit rock bottom on a page fault? They're confusing me, here, on how they're different from intel.
kohlrak wrote:In what way can rings 1 and 2 jump into ring 0? I never read anything like that in the intel manuals. Have you coded software implementing this stuff, before?
The most obvious issue is that the page tables have a single bit for supervisor mode, there's no per-ring granularity. Anything less than ring 3 is supervisor mode. (See the paging documentation in Intel's manuals.) If you can read and write to all memory used by code in ring 0, it's clearly trivial to gain execution in ring 0.
Maybe we're thinking of different types of pages, because the ones I use are 2 bits, which allow 4 different rings (0-3).

red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11

Post by red assassin » Sun, 7. Jan 18, 23:32

kohlrak wrote:Which ones were moved? Last time i checked for x86, they used the "reserved" instructions and turned them into prefixes, which is a staple of x86's ability to have variable length instructions. The context switching ends up being part of the general task switching, anyway, so it's not as much as you think. IIRC, it's a matter of pointing the segment registers in the right direction, since the format is basically the same. TBH, i don't have experience in this matter, just reading. I've stayed pretty close to 32bit transitions, which, agreed, are a pain. Intel, however, understood this would be a problem (since BIOSes stayed 16bit, many 32bit OSes constantly had to switch back and forth and it was a major pain, but even that switching isn't as bad as you think, and i do have experience with that), so they tried to simplify the mode switching when it came to switching between pmode and long mode. And, frankly, what these compilers do just to call a function is more complex than the switching between 32bit and 16bit code (only marginally, if it's not doing the useless mov operations that i've seen GCC do on 32bit x86 code).
0x40-0x4f (the single byte inc/dec instructions) in x86 are reassigned to the REX prefix in x86_64. Additionally, 0x63 (ARPL) becomes MOVSXD.
Context switching isn't that expensive in the grand scheme of things, sure, but it's still not free and can take in the order of microseconds, as it necessarily invalidates significant chunks of the cache.
kohlrak wrote:But your example still says nothing, since the vectorization still isn't enabled. If it is, then GCC has a hard time with 32bit x86 code, still (and i remember it having a hard time). The 64bit instructions are longer. If it's using the 32bit equivalents, as it should, the 64bit code will be marginally slower due to occasionally having to handle pointers, even if they're predictable in lower memory. There's nothing inherent about 32bit that makes it slower than 64bit, while the opposite is true.
-march=native -O3 enables vectorisation with any extensions supported on the current processor in any version of gcc from the last few years. I have checked and AVX vectorised loops are emitted on both 32 and 64 bit versions of my build.

The additional memory and cache overhead from 64 bit pointers is so small as to be negligible under most circumstances, while 32 bit suffers significantly from having half as many general-purpose registers - it's harder to optimise and requires much heavier usage of the stack, which is necessarily slow. 32 bit also suffers in any cases where maths on 64-bit integers is required.
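As a concrete illustration of the auto-vectorisation being discussed (a generic example, not red assassin's actual xz build): compiled with g++ -O3 -march=native, a loop like this is typically emitted as AVX code on both the 32-bit and 64-bit targets.
[code]
#include <cstddef>

// With -O3 -march=native, gcc will usually vectorise this loop using
// whatever SIMD width the host CPU supports (SSE/AVX/AVX-512).
void saxpy(float *y, const float *x, float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
[/code]
gcc's -fopt-info-vec output will tell you which loops it vectorised, if you want to check for yourself.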
kohlrak wrote:Just because it'd take a lot of work to pull off doesn't mean it's not superficial. It's more like a sandbag bunker as opposed to a steel one: those are fairly superficial, yet have fun getting through one without explosives, which cost a lot of money and work.
I'm genuinely confused what you think the difference is that makes one superficial and the other not, given they're both of comparable difficulty to breach and the mechanism is roughly the same. The kernel provides to userland a selection of endpoints which can be called to do various things; if you find the right sort of bug in the handling of those endpoints you can compromise the kernel. The kernel and various userland processes also provide a selection of endpoints which can be called by other userland processes to do various things in userland, taking into account security boundaries there, and if you find the right sort of bug in the handling of one of those you can escalate privileges in userland. Arguably the attack surface to escape from the Chrome sandbox is smaller than the kernel attack surface from higher-privileged userland; all you can really do from the sandbox is send IPC messages to other Chrome processes (you can't even make most syscalls!).
kohlrak wrote:No, you don't, but using the fact that you can break a store's window to rob the place as an excuse for putting the key under the front door mat is inexcusable.
But this analogy doesn't make sense - the store here has the same privileges as you do - it's your store. You can do whatever you like with it. If the government came along and announced you were no longer allowed to enter your store through the front door, you'd a) think it was stupid, and b) enter your damn store via another means. If the government came along and stopped you from using any means of accessing your store it would be a pretty damn useless store. This analogy is getting a little strained, but the point is you're asserting that functionally useless security boundaries should be added where it doesn't make sense to have a boundary, while simultaneously (in the section prior to this one) pretending that the security boundaries that do exist are superficial.
kohlrak wrote:Yeah, but that doesn't mean all drivers should have access to the crash function. The majority of windows crashes have been GPU crashes, so if MS wants to limit the number of BSoDs, the first thing it should do is take it away from them. But, that comes back to, you have RAM and CPU, with innate drivers, and the HDD driver. The HDD driver should not get corrupted in memory, but should it, it should cause a kernel panic. However, that same driver should be part of the kernel. The rest should not need to panic, as anything else can be restarted (or, at least, the device makers should have the option to cut the power and re-establish power, instead of demanding everything go down).
I'm not sure you quite understand how NT kernel bugchecks work. It's generally not that drivers are calling KeBugCheck() themselves. There are a bunch of circumstances under which the kernel can crash - common ones include any access violation, a variety of exciting IRQL issues, pool corruption, and various ways of mishandling kernel objects. Generally when the kernel crashes due to driver issues, the driver has faulted in one of these ways and the kernel fault handler is bugchecking. As I say, in some cases some of these issues may be theoretically recoverable, but it comes with no guarantee of continued system stability and certainly compromised security, so NT opts to bugcheck.
kohlrak wrote:Have you ever written transition code? It's really a lot less than you think. Just read a tutorial on getting to protected mode from real mode, or, if you'd like, I'd just give you the code here, as I've actually written some. Long mode is a different story, but I've read it is easier by design, due to how many issues came from the 16-bit to 32-bit transition. They realized they didn't want to make the same mistake again.
It's not that actively transitioning mode is hard, it's a) efficiency cost of all the context switching (which, as above, invalidates caches) and data copying you have to do, and b) the increased architectural complexity of the system is a significant maintenance overhead.
kohlrak wrote:KISS makes it easy, actually. Generally, the idea of stack busting is to take advantage of an injected callback or a stack overflow. If you assume incoming data could be malicious, it's much, much easier to avoid. If you don't mind a loss of efficiency but are determined to use a stack, you make a simple stack jail. And yes, there are often instructions specifically to pull this off (Intel has a BOUND instruction). And is it easy to pull off? Absolutely.
You... do realise that secure coding is a bit more complicated than avoiding stack buffer overflows, right? Heap overflows, memory management issues (use after free and the like), race conditions, integer overflows, authentication problems, error handling mistakes... these are just the most common exploitable bug classes I can think of off the top of my head! They're all easy to screw up accidentally even when you know what you're doing, never mind when you haven't spent years learning about code security in particular. I absolutely guarantee you that any code written by you or anybody else of non-trivial complexity contains bugs.
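To make one of those bug classes concrete, here's the classic integer-overflow-into-heap-overflow pattern and its fix (an illustrative sketch, not code from any project mentioned here):
[code]
#include <cstdint>
#include <cstdlib>
#include <cstring>

// BUGGY: count * sizeof(uint32_t) can wrap around, so malloc() gets a tiny
// size while the loop still writes `count` elements past the allocation.
std::uint32_t *copy_records_bad(const std::uint32_t *src, std::size_t count) {
    std::uint32_t *dst = static_cast<std::uint32_t *>(
        std::malloc(count * sizeof(std::uint32_t)));   // may wrap
    if (!dst) return nullptr;
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i];                               // heap overflow when it wraps
    return dst;
}

// SAFER: reject counts that would overflow the size calculation.
std::uint32_t *copy_records(const std::uint32_t *src, std::size_t count) {
    if (count > SIZE_MAX / sizeof(std::uint32_t)) return nullptr;
    std::uint32_t *dst = static_cast<std::uint32_t *>(
        std::malloc(count * sizeof(std::uint32_t)));
    if (!dst) return nullptr;
    std::memcpy(dst, src, count * sizeof(std::uint32_t));
    return dst;
}
[/code]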
kohlrak wrote:
The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Wait, so they only allow it when it's allowed? They think the intel code wouldn't hit rock bottom on a page fault? They're confusing me, here, on how they're different from intel.
Have you had this entire discussion with me without actually looking up how Meltdown works? The problem is that while speculatively executing, Intel processors can successfully read memory where the access would generate a page fault due to the page permissions, load the result into caches, and continue executing, only resolving the page fault at the point at which the speculative section would be committed or rolled back. AMD isn't vulnerable because the access check is performed, and will interrupt speculative execution, at the point of read.
kohlrak wrote:Maybe we're thinking of different types of pages, because the ones I use are 2 bits, which allow 4 different rings (0-3).
Intel Software Developer Manual 3A part 1, section 4.6.1 wrote:Every access to a linear address is either a supervisor-mode access or a user-mode access. For all instruction fetches and most data accesses, this distinction is determined by the current privilege level (CPL): accesses made while CPL < 3 are supervisor-mode accesses, while accesses made while CPL = 3 are user-mode accesses.

[...]

Access rights are also controlled by the mode of a linear address as specified by the paging-structure entries controlling the translation of the linear address. If the U/S flag (bit 2) is 0 in at least one of the paging-structure entries, the address is a supervisor-mode address. Otherwise, the address is a user-mode address.
You're probably thinking of segments, which did provide a 2 bit CPL field.
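For what it's worth, the check the manual describes comes down to one bit test per paging-structure entry; a minimal sketch, assuming you're handed the raw 64-bit entry:
[code]
#include <cstdint>

// Per the SDM text quoted above: bit 0 is Present, bit 2 is User/Supervisor.
// If U/S is clear in any entry on the walk, the address is supervisor-only.
constexpr bool pte_present(std::uint64_t entry)         { return entry & (1ULL << 0); }
constexpr bool pte_user_accessible(std::uint64_t entry) { return entry & (1ULL << 2); }
[/code]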
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Mon, 8. Jan 18, 01:33

red assassin wrote:
kohlrak wrote:Which ones were moved? Last time i checked for x86, they used the "reserved" instructions and turned them into prefixes, which is a staple of x86's ability to have variable length instructions. The context switching ends up being part of the general task switching, anyway, so it's not as much as you think. IIRC, it's a matter of pointing the segment registers in the right direction, since the format is basically the same. TBH, i don't have experience in this matter, just reading. I've stayed pretty close to 32bit transitions, which, agreed, are a pain. Intel, however, understood this would be a problem (since BIOSes stayed 16bit, many 32bit OSes constantly had to switch back and forth and it was a major pain, but even that switching isn't as bad as you think, and i do have experience with that), so they tried to simplify the mode switching when it came to switching between pmode and long mode. And, frankly, what these compilers do just to call a function is more complex than the switching between 32bit and 16bit code (only marginally, if it's not doing the useless mov operations that i've seen GCC do on 32bit x86 code).
0x40-0x4f (the single byte inc/dec instructions) in x86 are reassigned to the REX prefix in x86_64. Additionally, 0x63 (ARPL) becomes MOVSXD.


Interesting. These differences, though, make 32bit seem even faster, seeing as 32bit code in 32bit mode doesn't have to use the longer instructions, though that's small in the big scheme of things, since they don't use them often in the first place. I do stand corrected. It wasn't to make room, though. It was to not take as deep of a performance hit.
Context switching isn't that expensive in the grand scheme of things, sure, but it's still not free and can take in the order of microseconds, as it necessarily invalidates significant chunks of the cache.
Well, you should avoid transitioning a lot. You should treat it like a task switch, especially given that jumping up to the kernel is often implemented as one anyway.
kohlrak wrote:But your example still says nothing, since the vectorization still isn't enabled. If it is, then GCC has a hard time with 32bit x86 code, still (and i remember it having a hard time). The 64bit instructions are longer. If it's using the 32bit equivalents, as it should, the 64bit code will be marginally slower due to occasionally having to handle pointers, even if they're predictable in lower memory. There's nothing inherent about 32bit that makes it slower than 64bit, while the opposite is true.
-march=native -O3 enables vectorisation with any extensions supported on the current processor in any version of gcc from the last few years. I have checked and AVX vectorised loops are emitted on both 32 and 64 bit versions of my build.
So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason the 32-bit build should have come out slower.
The additional memory and cache overhead from 64 bit pointers is so small as to be negligible under most circumstances, while 32 bit suffers significantly from having half as many general-purpose registers - it's harder to optimise and requires much heavier usage of the stack, which is necessarily slow. 32 bit also suffers in any cases where maths on 64-bit integers is required.
How much do those extra registers even get touched, outside of avoiding the stack usage, which should be cached?
kohlrak wrote:Just because it'd take a lot of work to pull off doesn't mean it's not superficial. It's more like a sandbag bunker as opposed to a steel one: those are fairly superficial, yet have fun getting through one without explosives, which cost a lot of money and work.
I'm genuinely confused what you think the difference is that makes one superficial and the other not, given they're both of comparable difficulty to breach and the mechanism is roughly the same. The kernel provides to userland a selection of endpoints which can be called to do various things; if you find the right sort of bug in the handling of those endpoints you can compromise the kernel. The kernel and various userland processes also provide a selection of endpoints which can be called by other userland processes to do various things in userland, taking into account security boundaries there, and if you find the right sort of bug in the handling of one of those you can escalate privileges in userland. Arguably the attack surface to escape from the Chrome sandbox is smaller than the kernel attack surface from higher-privileged userland; all you can really do from the sandbox is send IPC messages to other Chrome processes (you can't even make most syscalls!).

Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the pareto effect.
kohlrak wrote:No, you don't, but using the fact that you can break a store's window to rob the place as an excuse for putting the key under the front door mat is inexcusable.
But this analogy doesn't make sense - the store here has the same privileges as you do - it's your store. You can do whatever you like with it. If the government came along and announced you were no longer allowed to enter your store through the front door, you'd a) think it was stupid, and b) enter your damn store via another means. If the government came along and stopped you from using any means of accessing your store it would be a pretty damn useless store. This analogy is getting a little strained, but the point is you're asserting that functionally useless security boundaries should be added where it doesn't make sense to have a boundary, while simultaneously (in the section prior to this one) pretending that the security boundaries that do exist are superficial.
No, the fundamental difference is that I'm looking at it from the application's point of view, not the user's. Each application is its own store, with the IP register as a customer. Applications should talk to each other, not control each other.
kohlrak wrote:Yeah, but that doesn't mean all drivers should have access to the crash function. The majority of windows crashes have been GPU crashes, so if MS wants to limit the number of BSoDs, the first thing it should do is take it away from them. But, that comes back to, you have RAM and CPU, with innate drivers, and the HDD driver. The HDD driver should not get corrupted in memory, but should it, it should cause a kernel panic. However, that same driver should be part of the kernel. The rest should not need to panic, as anything else can be restarted (or, at least, the device makers should have the option to cut the power and re-establish power, instead of demanding everything go down).
I'm not sure you quite understand how NT kernel bugchecks work. It's generally not that drivers are calling KeBugCheck() themselves.
So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves?
There are a bunch of circumstances under which the kernel can crash - common ones include any access violation, a variety of exciting IRQL issues, pool corruption, and various ways of mishandling kernel objects. Generally when the kernel crashes due to driver issues, the driver has faulted in one of these ways and the kernel fault handler is bugchecking. As I say, in some cases some of these issues may be theoretically recoverable, but it comes with no guarantee of continued system stability and certainly compromised security, so NT opts to bugcheck.
So then how does Windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is what's used in practice, instead of KeBugCheck?
kohlrak wrote:Have you ever written transition code? It's really a lot less than you think. Just read a tutorial on getting to protected mode from real mode, or, if you'd like, I'd just give you the code here, as I've actually written some. Long mode is a different story, but I've read it is easier by design, due to how many issues came from the 16-bit to 32-bit transition. They realized they didn't want to make the same mistake again.
It's not that actively transitioning mode is hard, it's a) efficiency cost of all the context switching (which, as above, invalidates caches) and data copying you have to do, and b) the increased architectural complexity of the system is a significant maintenance overhead.
Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.
kohlrak wrote:KISS makes it easy, actually. Generally, the idea of stack busting is to take advantage of an injected callback or a stack overflow. If you assume incoming data could be malicious, it's much, much easier to avoid. If you don't mind a loss of efficiency but are determined to use a stack, you make a simple stack jail. And yes, there are often instructions specifically to pull this off (Intel has a BOUND instruction). And is it easy to pull off? Absolutely.
You... do realise that secure coding is a bit more complicated than avoiding stack buffer overflows, right? Heap overflows, memory management issues (use after free and the like), race conditions, integer overflows, authentication problems, error handling mistakes... these are just the most common exploitable bug classes I can think of off the top of my head! They're all easy to screw up accidentally even when you know what you're doing, never mind when you haven't spent years learning about code security in particular. I absolutely guarantee you that any code written by you or anybody else of non-trivial complexity contains bugs.
And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Use of declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, and are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the results of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since i'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
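For the "wrap it in a class" point, the usual C++ shape of that idea is RAII; a generic sketch (not anyone's production code):
[code]
#include <cstdint>
#include <memory>
#include <vector>

// RAII: the buffer's lifetime is tied to the owning object's scope, so there
// is no separate free() call to forget, double-free, or use after freeing.
struct Packet {
    std::vector<std::uint8_t> payload;   // releases its memory when Packet dies
};

void process() {
    auto pkt = std::make_unique<Packet>();
    pkt->payload.resize(1500);
    // ... fill and use pkt ...
}   // pkt and its payload are freed here automatically
[/code]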
kohlrak wrote:
The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Wait, so they only allow it when it's allowed? They think the intel code wouldn't hit rock bottom on a page fault? They're confusing me, here, on how they're different from intel.
Have you had this entire discussion with me without actually looking up how Meltdown works? The problem is that while speculatively executing, Intel processors can successfully read memory where the access would generate a page fault due to the page permissions, load the result into caches, and continue executing, only resolving the page fault at the point at which the speculative section would be committed or rolled back. AMD isn't vulnerable because the access check is performed, and will interrupt speculative execution, at the point of read.
Wait, upon page faulting it actually lets the application keep this data instead of unaliasing those registers? The expected reaction has nothing to do with the cache; it should just clear that data. Generally, the way this works is that the actual internal registers are not the ones named in assembly, and when the work commits, the internal registers are re-aliased to the ones referenced in assembly, while the failed paths basically get ignored (their write ops never make it to the respective buses).
kohlrak wrote:Maybe we're thinking of different types of pages, because the ones I use are 2 bits, which allow 4 different rings (0-3).
Intel Software Developer Manual 3A part 1, section 4.6.1 wrote:Every access to a linear address is either a supervisor-mode access or a user-mode access. For all instruction fetches and most data accesses, this distinction is determined by the current privilege level (CPL): accesses made while CPL < 3 are supervisor-mode accesses, while accesses made while CPL = 3 are user-mode accesses.

[...]

Access rights are also controlled by the mode of a linear address as specified by the paging-structure entries controlling the translation of the linear address. If the U/S flag (bit 2) is 0 in at least one of the paging-structure entries, the address is a supervisor-mode address. Otherwise, the address is a user-mode address.
Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
You're probably thinking of segments, which did provide a 2 bit CPL field.
Yep, you're right.

red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11

Post by red assassin » Mon, 8. Jan 18, 03:09

kohlrak wrote:So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason the 32-bit build should have come out slower.

How much do those extra registers even get touched, outside of avoiding the stack usage, which should be cached?
Stack access is cached, sure, but you still ultimately have to keep the caches in sync with actual memory, so you pay for it sooner or later. The additional registers are used extensively, for reasons that should be obvious; you can keep more locals in registers at a time, copy fewer callee-save registers to the stack, and generally avoid a lot of shuffling of values around that's required on x86. For a specific example, 64 bit calling conventions use four (MS) or six (System V) registers for arguments, so most of the time you don't need to put arguments on the stack at all. It also makes the code smaller, which I know is your favourite optimisation - you generate a lot fewer mov instructions when you don't have to keep shuffling stuff to the stack and back.

If you're still convinced that 32 bit should be faster, I recommend you do some tests yourself, and let me know if you come up with any cases where there's a significant speed difference in favour of 32 bit.
kohlrak wrote:Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the pareto effect.
Right, but userland security is as much a feature of the operating system as the kernel security is - I use browser sandboxes as an example because they're an obvious attack vector and one that's received a lot of security attention in recent years, but the ability to communicate with other processes, make syscalls, and the entire concept of userland accounts and privileges is provided by the operating system APIs and kernel. Browser sandboxes are largely a case of making the right calls to the OS to drop privileges (though browser vendors have done a lot of work with OS vendors to improve what privileges they can restrict and so on).
kohlrak wrote:No, the fundamental difference is that I'm looking at it from the application's point of view, not the user's. Each application is its own store, with the IP register as a customer. Applications should talk to each other, not control each other.
I mean, if you want your application not to be controlled by another application then you just... set privileges correctly so it can't be. Hence the whole discussion about browser sandboxes and the like. If there's something you don't want to be able to access your application, your application should be running at a different privilege level. If you can't run it at a different privilege level, you need to question whether your application can really be protected, for all the reasons I listed a few posts ago, regardless of any specific OS features.
kohlrak wrote:So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves?

So then how does Windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is what's used in practice, instead of KeBugCheck?
An access violation is just a page fault, which generates an interrupt that the kernel handles. It inspects the page fault interrupts it receives and handles them appropriately - if it's a memory page that's been paged out, it pages it back in; if it's an access violation in userland, it passes it to the process' exception handlers or kills it; if it's an access violation in the kernel, it bugchecks; and so forth.
IRQL errors generally manifest either as page faults when you're above the IRQL that handles paging memory in or as priority inversions in the scheduler (the code that detects situations like this is a nightmare as you might imagine).
Memory pools and other kernel objects generally have various self-checks in them that bugcheck if they break.
You get the idea.

How do we know this is what happens - by reading Windows Internals and/or kernel debugging your drivers when they crash, would be my recommended methods. (The latter is more fun - Windows Internals is a bit dry.) Bugchecks do also report details about what error it was and why they happened, though drivers could fake all that if they really wanted to bugcheck themselves.
kohlrak wrote:Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.
Sure, although branch prediction will handle some of this for you, and a simple cache miss is a different situation from a large-scale cache invalidation. But if you're an ISR in kernel you shouldn't need to be doing that anyway - an ISR is expected to complete within a few microseconds (generally just handling the interrupt and setting up for more processing at a lower IRQL if it can't just be handled within that time), so context switching to the interrupt is already a significant chunk of the time it takes to service one. Add a bunch of extra context switches to get this into usermode and back and you've doubled how long every interrupt takes to be serviced.
kohlrak wrote:And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Use of declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, and are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the results of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since i'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
Are you claiming that you, alone of all the programmers in the world, are capable of writing completely bug-free code 100% of the time? Or are you claiming that you and some set of other programmers in the world are capable of writing completely bug-free code 100% of the time, but they all choose to hide their perfect output somewhere and leave it to all the inferior programmers to write all the software that everyone actually uses? I mean, a brief glance at the CVE database should convince you that people have found bugs in everything that has enough users to make it to the CVE database. I've seen a lot of code, and none of that was bug-free either. And I've run into enough bugs in my own code to be damn sure there's others I haven't found.

But hey, maybe all my professional security experience is wrong and we should just get you to write all the world's code.

(Although maybe not, as your "solution" to race conditions is spoken like somebody who has never had to debug a nontrivial race condition...)
kohlrak wrote:Wait, upon page faulting it actually lets the application keep this data instead of unaliasing those registers? The expected reaction has nothing to do with the cache; it should just clear that data. Generally, the way this works is that the actual internal registers are not the ones named in assembly, and when the work commits, the internal registers are re-aliased to the ones referenced in assembly, while the failed paths basically get ignored (their write ops never make it to the respective buses).
It doesn't let the application keep the data - you're entirely correct about how register aliasing works, but the point is that the data loaded into the aliased registers necessarily affects the caches, so when it unaliases them and bins all the data the cache state remains changed. At that point you can just leak the information with a cache timing attack.
Also, it doesn't do this upon page faulting - you can't page fault until you're sure you're actually executing the right bit of code. While in speculative execution, page faults are simply queued for handling upon resolution of whatever's put it in speculative mode, and in the meantime execution continues. If you don't end up going down that branch, the fault is just binned with the rest of the state, while if you do the fault is raised and state (minus, as mentioned, the cache) is rolled back to the instruction that faulted.
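To make the cache-timing half of this concrete, here's a minimal sketch of the measurement side only (a generic flush-and-reload probe, assuming an x86 CPU and GCC/Clang intrinsics; it is not Meltdown exploit code):
[code]
#include <cstdint>
#include <cstdio>
#include <x86intrin.h>   // __rdtscp, _mm_clflush, _mm_mfence

// Time a single read of *p in TSC cycles.
static std::uint64_t time_read(const volatile std::uint8_t *p) {
    unsigned aux;
    std::uint64_t start = __rdtscp(&aux);
    (void)*p;                                        // the access being timed
    std::uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main() {
    static std::uint8_t probe[4096];
    probe[0] = 1;                                    // bring the line into cache
    std::uint64_t hot = time_read(probe);            // fast: cache hit

    _mm_clflush(probe);                              // evict the line
    _mm_mfence();
    std::uint64_t cold = time_read(probe);           // slow: cache miss

    std::printf("cached: %llu cycles, flushed: %llu cycles\n",
                (unsigned long long)hot, (unsigned long long)cold);
}
[/code]
In a Meltdown-style attack, the transient load indexes into an array of such lines, and the attacker recovers the secret byte by finding which line comes back hot.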
kohlrak wrote:Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
Largely, there isn't one any more - like a lot of x86, they're historical cruft that's retained only for backwards compatibility. When the architecture was designed they seemed like a good idea; for a variety of reasons OSes that used them fell by the wayside and the remaining OSes settled on a two-ring model.
Ring 1 enjoyed a resurgence as a mechanism for implementing virtualisation for a while (the virtualised kernel is moved into ring 1), as an alternative to paravirtualisation, but that's fallen by the wayside too with the advent of proper hardware virtualisation features in x86. I suspect that's why it wasn't removed from long mode like segmentation was, though.
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

kohlrak
Posts: 136
Joined: Thu, 28. Dec 17, 11:47

Post by kohlrak » Mon, 8. Jan 18, 06:55

red assassin wrote:
kohlrak wrote:So then what was the bottleneck? If the instructions are smaller and the code stayed in your control, there's no reason the 32-bit build should have come out slower.

How much do those extra registers even get touched, outside of avoiding the stack usage, which should be cached?
Stack access is cached, sure, but you still ultimately have to keep the caches in sync with actual memory, so you pay for it sooner or later. The additional registers are used extensively, for reasons that should be obvious; you can keep more locals in registers at a time, copy fewer callee-save registers to the stack, and generally avoid a lot of shuffling of values around that's required on x86. For a specific example, 64 bit calling conventions use four (MS) or six (System V) registers for arguments, so most of the time you don't need to put arguments on the stack at all. It also makes the code smaller, which I know is your favourite optimisation - you generate a lot fewer mov instructions when you don't have to keep shuffling stuff to the stack and back.
How often do compilers actually use them, though? In theory, i totally agree, but in practice they get clobbered so they have to store everything on the stack anyway. And, because of this, i've seen GCC take variables off the stack and create the same thing on the local stack frame, even if they were read-only and never written to. I always thought the point of using HLLs was to let the compiler optimize out this behavior for you, but it seemed to fail in that regard. I assume, by now, they have fixed this. At least I hope they have. That said, that still doesn't mean they're using the extra registers, because if you make one call within that function, you have to assume those registers got clobbered, so it never gets to a point that they're actually needed.
If you're still convinced that 32 bit should be faster, I recommend you do some tests yourself, and let me know if you come up with any cases where there's a significant speed difference in favour of 32 bit.
I wish i could, but i'm not in that position right now (which is why i'm playing X2, and asked you to do a few checks for me).
kohlrak wrote:Fixing those endpoints affects a much, much larger code base, while code in a browser to do the same thing is a bit small potatoes. I'm not saying it's not important, but the focus should be on the pareto effect.
Right, but userland security is as much a feature of the operating system as the kernel security is - I use browser sandboxes as an example because they're an obvious attack vector and one that's received a lot of security attention in recent years, but the ability to communicate with other processes, make syscalls, and the entire concept of userland accounts and privileges is provided by the operating system APIs and kernel. Browser sandboxes are largely a case of making the right calls to the OS to drop privileges (though browser vendors have done a lot of work with OS vendors to improve what privileges they can restrict and so on).
They are useful and should be done, but they should not be relied upon.
kohlrak wrote:No, the fundamental difference is that I'm looking at it from the application's point of view, not the user's. Each application is its own store, with the IP register as a customer. Applications should talk to each other, not control each other.
I mean, if you want your application not to be controlled by another application then you just... set privileges correctly so it can't be. Hence the whole discussion about browser sandboxes and the like. If there's something you don't want to be able to access your application, your application should be running at a different privilege level. If you can't run it at a different privilege level, you need to question whether your application can really be protected, for all the reasons I listed a few posts ago, regardless of any specific OS features.
That's the problem, though: the reasons you mentioned above are OS features. Just because they're shared by most OSes at this point doesn't mean they should be. I do hold current practices to fault.
kohlrak wrote:So, the bug checks happen, the bug checks are available to the drivers, but the bug checks are not called by the drivers themselves?

So then how does Windows know the driver is misbehaving? Would this not require self-reporting by the driver? And how does the driver report this to the kernel? And how do we know this method is what's used in practice, instead of KeBugCheck?
An access violation is just a page fault, which generates an interrupt that the kernel handles. It inspects the page fault interrupts it receives and handles them appropriately - if it's a memory page that's been paged out, it pages it back in; if it's an access violation in userland, it passes it to the process' exception handlers or kills it; if it's an access violation in the kernel, it bugchecks; and so forth.
In that case, shouldn't the OS restart the drivers, or shut off the hardware in question and reload the drivers? Why are GPUs crashing the OS? It's a known issue, so it is happening, and it's entirely preventable as far as I can tell. So if the drivers aren't calling the KeBugCheck function, despite it actually being available to them, then why is it being called when the situation is preventable? The only driver you can't afford to let crash is the HDD driver, and even that can be protected with careful planning and a copy kept in the kernel, since HDD drivers have a low footprint.
IRQL errors generally manifest either as page faults when you're above the IRQL that handles paging memory in or as priority inversions in the scheduler (the code that detects situations like this is a nightmare as you might imagine).
Memory pools and other kernel objects generally have various self-checks in them that bugcheck if they break.
You get the idea.
So why are things other than the kernel crashing the system if they can all be restarted without shutting down the kernel? If the kernel knows it's not its own code that is making things go boom (hence the errors we see in Windows when we get a BSoD saying it's not the kernel), then why is it going down?
How do we know this is what happens - by reading Windows Internals and/or kernel debugging your drivers when they crash, would be my recommended methods. (The latter is more fun - Windows Internals is a bit dry.) Bugchecks do also report details about what error it was and why they happened, though drivers could fake all that if they really wanted to bugcheck themselves.
That's the big problem. There's no reason the drivers should be the ones bugchecking. Microsoft, the company that single-handedly broke the JavaScript standards and got away with it, is unable to establish a proper standard for what drivers should be able to assume they can and cannot do? We're talking about a company that sues other companies ex parte (that is, without the other company being there), and they can't remove a simple "this driver is not Windows certified" tag, which is in their domain, without dishonor? I find that very hard to believe, especially when Windows has its own register calling convention for 64-bit that is different from everyone else's. They're a bit powerful.
kohlrak wrote:Thanks to the size of the code, you hit cache misses any time you make a large jump, regardless. You should assume, when you make syscalls or library calls, that you're making a big enough jump to cache miss, regardless of what library you're using, unless it happens to be statically linked.
Sure, although branch prediction will handle some of this for you, and a simple cache miss is a different situation from a large-scale cache invalidation. But if you're an ISR in kernel you shouldn't need to be doing that anyway - an ISR is expected to complete within a few microseconds (generally just handling the interrupt and setting up for more processing at a lower IRQL if it can't just be handled within that time), so context switching to the interrupt is already a significant chunk of the time it takes to service one. Add a bunch of extra context switches to get this into usermode and back and you've doubled how long every interrupt takes to be serviced.
One shouldn't make a speed assumption about an ISR. IRQ handlers, yes, but not a syscall or the like, and IRQs don't come in at a huge frequency to begin with. However, for this reason (and this is reasonable to pull off today), we see a lot of ARM systems dedicating one core to the kernel. The other cores all have their own caches. While some would say that's wasteful, it's more redundancy than bloat. There was some talk before (and I still don't know why it was scrapped, or if they just never got around to it) of systems having their dedicated cores dynamically allocated depending on what's eating up the time. I'm guessing the problem they had was figuring out how to actually determine if, when, and how those switches would take place. And, no, this wasn't being done with 32-bit vs 64-bit in mind, but rather GPU vs CPU modes. However, it shouldn't be too complicated to expand it.
kohlrak wrote:And these are all basic issues that you should be aware of and should have no problems preventing. Either I'm gifted, or you underestimate the simplicity of these issues. Heap overflows are the result of bad memory management. Use of declaring and freeing memory should be made into a class or something in an automatic way, and you shouldn't be doing it willy-nilly constantly throughout the code without some sort of wrapper. Race conditions you can almost always solve by checking your return values. Integer overflows should not be much of a problem, and are likely application specific, and often revolve around doing a simple bounds check. "Authentication problems" is vague, and sounds like puffery. Error handling mistakes are the results of really, really bad habits (like not checking return values). I have them, too, and have been working at it, since i'm starting to release more of my own code into the wild. I have no excuses other than being lazy.
Are you claiming that you, alone of all the programmers in the world, are capable of writing completely bug-free code 100% of the time? Or are you claiming that you and some set of other programmers in the world are capable of writing completely bug-free code 100% of the time, but they all choose to hide their perfect output somewhere and leave it to all the inferior programmers to write all the software that everyone actually uses?
No, I'm saying that accountability has been going out the window. Programmers make mistakes, but so does everyone else. You need to accept that mistakes are going to be made, and accept fault when it's your own, instead of pointing the finger and saying "it's his fault!" You improve by learning, and you learn by making mistakes. You want to have a decently secure wrapper for anything that's particularly vulnerable. User input, and especially internet input, you want to treat as volatile material that needs ten bottles of rubbing alcohol poured on top and a match dropped on it. If you sanitize properly, making things secure should not be difficult. The problem is accountability, and, to be fair to devs, this often comes as a result of managers pushing some sort of deadline. I understand deadlines too, but we're seeing too many silly mistakes on the market for it to be a matter of needing protection. The extra protection falls under some twist of Parkinson's law, which seems to be where the mentality comes from to keep pushing, only to become the cause of their own problems and inefficiencies. The more you nanny the coder with your security layers, the less they're going to worry about security when they have a manager breathing down their neck telling them it's more important to get the project out on time and to trust the security of the library, since it advertises its security. Why? Because the people making these decisions don't code. If people were less apathetic about security, the market would focus more on it, and code would be more secure. But right now the software market is focused on saturation, so get your games, suites, and whatever out as fast as possible, and make them "reasonably stable," "reasonably secure," and so on. "We can work on bugs later."
I mean, a brief glance at the CVE database should convince you that people have found bugs in everything that has enough users to make it to the CVE database. I've seen a lot of code, and none of that was bug-free either. And I've run into enough bugs in my own code to be damn sure there's others I haven't found.
Oh, I make mistakes and write bugs, but when I screw up I own it. It's a learning experience and a chance to improve how I code when I see my mistakes come out like that.
But hey, maybe all my professional security experience is wrong and we should just get you to write all the world's code.
Or maybe you can accept that insecurity is going to be a thing forever, and that understanding it better, rather than trying to make it a magic black box, will go far in preventing it, especially on newer systems. Programmers who write libraries also write programs, and vice versa. Those mistakes are going to happen one way or another; we have to accept that. Restricting capabilities upstream to nanny downstream doesn't solve the problem, it just patches it, potentially making the issue worse down the road. Security is everybody's responsibility, but functionality should not be sacrificed so that someone else doesn't have to feel that security is their responsibility.
(Although maybe not, as your "solution" to race conditions is spoken like somebody who has never had to debug a nontrivial race condition...)
If you're following KISS, it should be trivial, unless you're aware of a situation that I'm not. Tell me where I'm wrong: where does checking return values and/or using mutexes not work?
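For the trivial case, the mutex version looks like this - a minimal sketch assuming C++11 threads (compile with -pthread):

#include <iostream>
#include <mutex>
#include <thread>

// Two threads hammer a shared counter; the lock makes each
// read-modify-write atomic, so the obvious data race goes away.
int counter = 0;
std::mutex counter_lock;

void work() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> guard(counter_lock);
        ++counter;
    }
}

int main() {
    std::thread a(work), b(work);
    a.join();
    b.join();
    std::cout << counter << '\n';   // 200000 every time with the lock in place
    return 0;
}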
kohlrak wrote:Wait, upon page faulting it actually lets the application keep this data instead of unaliasing those registers? The expected reaction has nothing to do with the cache; it should just clear that data. Generally, the way this works is that the actual internal registers are not the ones named in assembly, and when an instruction commits, the internal registers are re-aliased to the ones referenced in assembly, while the failed ones basically get ignored (writes to the non-committed registers never make it to their respective buses).
It doesn't let the application keep the data - you're entirely correct about how register aliasing works, but the point is that the data loaded into the aliased registers necessarily affects the caches, so when it unaliases them and bins all the data the cache state remains changed. At that point you can just leak the information with a cache timing attack.
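For context, the measurement half of such a cache-timing probe looks roughly like this - a sketch assuming x86 intrinsics and g++; the speculative load and probe array of a real Meltdown proof-of-concept are deliberately left out:

#include <x86intrin.h>   // __rdtscp, _mm_clflush, _mm_mfence, _mm_lfence
#include <cstdint>
#include <cstdio>

// Times a single access to 'addr'. A short time means the line was already
// cached (e.g. pulled in by speculatively executed code); a long time means
// it had to come from memory.
static std::uint64_t probe(volatile std::uint8_t* addr) {
    unsigned int aux;
    _mm_mfence();
    std::uint64_t start = __rdtscp(&aux);
    (void)*addr;                              // the access being timed
    std::uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

int main() {
    static std::uint8_t line[64];
    _mm_clflush(line);                        // evict it: first probe is "cold"
    std::printf("cold: %llu cycles\n", (unsigned long long)probe(line));
    std::printf("hot:  %llu cycles\n", (unsigned long long)probe(line));
    return 0;
}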
Fortunately, side-channel attacks aren't all that effective against most targets. They mostly matter for certain kinds of cryptography and password checks, which are easier to fix: if your operations take the same time to execute regardless of the input data, the attack becomes ineffective.
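The usual fix for the password-check case is a comparison whose running time doesn't depend on where the first mismatch is. A minimal sketch in C++ (the function name is made up):

#include <cstddef>
#include <cstdint>

// Compares two equal-length buffers without bailing out at the first
// mismatch: the loop always runs to the end, so the running time leaks
// only the length, not where (or whether) the contents differ.
bool equal_constant_time(const std::uint8_t* a, const std::uint8_t* b,
                         std::size_t n) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < n; ++i)
        diff |= static_cast<std::uint8_t>(a[i] ^ b[i]);
    return diff == 0;
}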
Also, it doesn't do this upon page faulting - you can't page fault until you're sure you're actually executing the right bit of code. While in speculative execution, page faults are simply queued for handling upon resolution of whatever's put it in speculative mode, and in the meantime execution continues. If you don't end up going down that branch, the fault is just binned with the rest of the state, while if you do the fault is raised and state (minus, as mentioned, the cache) is rolled back to the instruction that faulted.
Right, but these issues aren't enough on their own to grab the sensitive data; you still need a secondary attack vector. By that point, this already looks impractical for anything other than a targeted attack.
kohlrak wrote:Then what's the point of CPL 1 and CPL 2? I feel like we're missing a bit of extra text here, otherwise it should've gone to 2 rings only.
Largely, there isn't one any more - like a lot of x86, they're historical cruft that's retained only for backwards compatibility. When the architecture was designed they seemed like a good idea; for a variety of reasons OSes that used them fell by the wayside and the remaining OSes settled on a two-ring model.
Ring 1 enjoyed a resurgence as a mechanism for implementing virtualisation for a while (the virtualised kernel is moved into ring 1), as an alternative to paravirtualisation, but that's fallen by the wayside too with the advent of proper hardware virtualisation features in x86. I suspect that's why it wasn't removed from long mode like segmentation was, though.
x86 is a mixed bag of history. Backwards compatibility is nice, but with it come all the problems. I remember the A20 gate issue like it was yesterday; what's funny is that next time it'll be A40. By then, I hope Intel has learned its lesson. Honestly, x86 could benefit from a reboot. ARM is starting to cream x86 in certain markets, and not just because of Jazelle.

red assassin
Posts: 4613
Joined: Sun, 15. Feb 04, 15:11
x3

Post by red assassin » Mon, 8. Jan 18, 11:41

Okay, look. In this post alone you have claimed that:
  • Compilers probably don't use additional x64 registers, because they get clobbered sometimes, despite the trivially observable fact that they do and the result typically runs a little faster than comparable x86
  • That userland security is different from kernel security because userland security should not be relied upon, implying that kernel security is somehow perfect?
  • That "if I have access to the same data, I can do the same calculations and get the same result" is an OS security failing and not, say, a feature of reality
  • That I should explain again why NT chooses to bugcheck on any failure in kernel, presumably because you didn't like the reason the first few times
  • That [various conspiracy theories about Microsoft] therefore [something]
  • That syscalls are interrupts (we had that very discussion earlier in this thread)
  • That when you make a mistake, it's a learning experience, but that when anyone else makes a mistake it's just sloppy practice and a failure of accountability (trust me, other people learn from their mistakes too, but a bug in released code is a bug in released code no matter how much you learnt from it)
  • That race conditions are trivially preventable, despite synchronisation being famously one of the hardest problems (guess what - sticking mutexes on everything defeats the point of threading in the first place, not to mention the risk of deadlocking; see the sketch just after this list)
  • That literally all the processor, OS, browser, etc developers and security researchers who've been doing all this work on Meltdown are wrong and that it's not usefully exploitable because side-channels are hard
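To put those last two bullets in concrete terms, here is the classic two-lock deadlock in miniature (a C++ sketch; the lock names are invented, and the sleeps are only there to make the bad interleaving near-certain):

#include <chrono>
#include <mutex>
#include <thread>

std::mutex accounts_lock, audit_lock;

// Each thread grabs its first lock, pauses, then tries the other one.
// Once both hold their first lock, neither can ever get its second:
// a deadlock, with no crash and no return value to check.
void thread_a() {
    std::lock_guard<std::mutex> first(accounts_lock);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::lock_guard<std::mutex> second(audit_lock);
}

void thread_b() {
    std::lock_guard<std::mutex> first(audit_lock);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::lock_guard<std::mutex> second(accounts_lock);
}

int main() {
    std::thread a(thread_a), b(thread_b);
    a.join();   // never returns once both threads hold their first lock
    b.join();
    return 0;
}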
There's a recurring theme in this entire thread, which is that you seem to believe that having written a toy OS one time (which is cool, don't get me wrong, but not unique) makes you an expert on modern hardware, OS design and security issues. When I point out that the real world is a little bit more complicated than this, that things change fast, and that yes, OS developers and security researchers do tend to know what they're talking about and don't just do things the way they do because they're terrible people, you tell me they're all wrong and you, personally, are right. However, you're admittedly too lazy to do any actual research, or even to read the things I give you, and you don't believe that my years of professional experience have taught me anything either, so when reality rears its ugly head you tell me it's wrong too.

I dunno man, I'm sure you're a talented programmer, but in the end you're asserting that everybody else is wrong and you're so right you don't even need to check. As you yourself wisely stated in the other thread, at some point if people don't want to learn there's no point trying any more. It's been fun, but I'm done. Why not put all that talent to use and build a properly secure OS or something?
A still more glorious dawn awaits, not a sunrise, but a galaxy rise, a morning filled with 400 billion suns - the rising of the Milky Way

Nanook
Moderator (English)
Posts: 27829
Joined: Thu, 15. May 03, 20:57
x4

Post by Nanook » Tue, 9. Jan 18, 00:00

Let's refrain from making this personal, or this thread is in danger of the dreaded 'CLICK'. OK?
Have a great idea for the current or a future game? You can post it in the [L3+] Ideas forum.

X4 is a journey, not a destination. Have fun on your travels.

RegisterMe
Posts: 8903
Joined: Sun, 14. Oct 07, 17:47
x4

Post by RegisterMe » Wed, 10. Jan 18, 10:51

I can't breathe.

- George Floyd, 25th May 2020
