I wish that processor architecture were not so committed to an obsolete model of software. For example, it has become clear over the last few years that the sloppy shared-memory thread model of programming has multiple drawbacks. Sharing of data, in particular, should be tightly controlled. Yet processor architectures are “optimized” for unstructured sharing of data between threads, with enormously complex “snooping” caches running at high speed to inspect all transactions on what have become essentially cache buses and initiating complex transactions when conflicts are detected. I put “optimized” in quotes because the performance is terrible anyway: violations of locality cannot easily be cured. That is, caches are designed to permit totally random sharing of memory without software action, yet performance still gets hammered whenever cache locality is poor. Essentially, what we have is an expensive (both in transistors and power) fix for poorly designed software. The software does not get much advantage out of it, but the hardware protects it from the logical (though not the temporal) consequences of sloppy data sharing. In point of fact, most threads do not share much memory, and the ones that do could be rewritten to explicitly pass control of memory to each other.
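To make "explicitly pass control of memory" concrete, here is a minimal sketch of one thread handing whole buffers to another through a queue instead of both threads touching the same data. The names `producer`, `consumer`, and `run` are invented for illustration, and Python does not enforce the handoff: dropping the producer's reference is only a convention here, whereas a language with real ownership rules could check it at compile time.

```python
import queue
import threading


def producer(outbox: queue.Queue, n_buffers: int, size: int) -> None:
    """Fill buffers and hand each one off; never touch it again afterwards."""
    for i in range(n_buffers):
        buf = bytearray([i % 256]) * size  # 'size' bytes, each equal to i % 256
        outbox.put(buf)                    # control of the buffer passes to the consumer
        del buf                            # convention only: drop our reference
    outbox.put(None)                       # sentinel: no more buffers


def consumer(inbox: queue.Queue, totals: list) -> None:
    """Receive buffers one at a time; each is used by this thread alone."""
    while True:
        buf = inbox.get()
        if buf is None:
            break
        totals.append(sum(buf))


def run(n_buffers: int = 4, size: int = 1024) -> list:
    """Run one producer and one consumer; return the per-buffer byte sums."""
    handoff = queue.Queue(maxsize=2)       # bounded, so the producer cannot run far ahead
    totals = []                            # written only by the consumer thread
    threads = [
        threading.Thread(target=producer, args=(handoff, n_buffers, size)),
        threading.Thread(target=consumer, args=(handoff, totals)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

The only point of contact between the two threads is the queue, so the data itself is never shared in the unstructured sense: at any moment exactly one thread is working on a given buffer.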
Sadly, we are in a situation where processor architectures are constrained to run obsolete software, and software is in turn constrained to try to take advantage of obsolete hardware. All of this is good only for electric power utilities and manufacturers of air conditioning equipment.
A second, related area of obsolete design is memory mapping and paging. This is treated as a problem that was solved for all time by a hierarchical paging model finalized in the 1980s. Is a paging model designed for machines with 10 MB of memory and a 100 MB swap partition on a slow disk a good model for machines with 1 GB or 10 GB of memory and a couple of hundred spare gigabytes on disk, or maybe 40 spare on solid-state drives? Well, if you give it a little thought, you can see a lot of potential problems. In the good old days, paging in 512-byte or 4 KB blocks from a disk drive organized around such blocks, to and from a highly constrained memory where you really wanted to avoid wasting even a couple of kilobytes, required one design. But what happens when the entire concept of disk blocks disappears on the drive, and a system with 10 MB of unused memory is almost certainly stalled? What kinds of memory use patterns are good for LAMP-type systems where jobs do not appear and disappear? You can look at processor design and operating systems sometimes and think that, just maybe, 1970s computer science departments are not the ultimate inspiration for all architecture.
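A rough bit of arithmetic shows how far the scale has moved. The sketch below counts how many pages are needed to cover a given amount of memory at the two block sizes mentioned above. It assumes binary units (1 MB taken as 1024 × 1024 bytes), and the helper names `pages_needed` and `page_counts` are made up for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover memory_bytes (rounded up)."""
    return -(-memory_bytes // page_bytes)  # ceiling division on integers


def page_counts(memory_bytes: int, page_sizes=(512, 4 * KIB)) -> dict:
    """Map each page size to the page count for the given memory size."""
    return {size: pages_needed(memory_bytes, size) for size in page_sizes}


if __name__ == "__main__":
    for label, mem in (("10 MB", 10 * MIB), ("10 GB", 10 * GIB)):
        for size, count in page_counts(mem).items():
            print(f"{label} of memory in {size}-byte pages: {count} pages")
```

At 4 KB pages, 10 MB of memory is 2,560 pages while 10 GB is 2,621,440, so the same bookkeeping unit now has to be tracked about a thousand times as often.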
See also a synchronous processor