The hardware/ISA ecosystem is equally split, particularly among ARM, i386/x86, x86_64, Microchip/Atmel, and now RISC-V.
Fragmentation also exists in the computer graphics industry, which in many ways has outpaced CPU growth in recent times.
Hardware acceleration layer/API design is primarily split between Khronos and Microsoft, and now Apple and Google are joining in. Even for the two leading providers, backwards compatibility has proven to be a difficult problem, leading to increased segmentation and overhead.

I only see these problems getting worse in the future, given the slowdown of traditional Moore's Law, the increasing use of accelerators and FPGAs, IoT proliferation, diverse memory types and caching mechanisms, hyperscaling, and so on.

In the past, segmentation has been foreseen and, in the case of character-set localization, avoided. This is why the Unicode Consortium was established, independent of any single company or organization. The Unicode Consortium has standardized more than 100,000 characters to date, and its adoption is virtually universal.

Theoretically, I see no reason why this cannot be accomplished in the bitcode/Turing realm.

@Gilles Unicode acts as a superset of its predecessor, ASCII; it can be encoded as UTF-8, UTF-16, or UTF-32. Modern microprocessors are no strangers to expanding type codes in their instruction decoding pipelines. Any machine can emulate larger types than its baseline, and architecture-level support for this is not uncommon.
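To make the superset claim concrete, here is a minimal UTF-8 encoder sketch (my illustration, not part of the original question): code points below 128 are emitted as a single byte identical to their ASCII encoding, while larger code points expand into multi-byte sequences, much like a variable-length instruction encoding.

```c
#include <stdio.h>
#include <stdint.h>

/* Encode one Unicode code point (U+0000..U+10FFFF) as UTF-8.
   Returns the number of bytes written to out (1..4), or 0 if cp
   is outside the Unicode range. */
static int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                 /* ASCII range: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {         /* two-byte sequence */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {       /* three-byte sequence */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp < 0x110000) {      /* four-byte sequence */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

int main(void) {
    /* 'A' (ASCII), 'é', '€', and an emoji: 1, 2, 3 and 4 bytes. */
    uint32_t samples[] = { 0x41, 0xE9, 0x20AC, 0x1F600 };
    for (int i = 0; i < 4; i++) {
        unsigned char buf[4];
        int n = utf8_encode(samples[i], buf);
        printf("U+%04X ->", (unsigned)samples[i]);
        for (int j = 0; j < n; j++) printf(" %02X", buf[j]);
        printf("\n");
    }
    return 0;
}
```

Note that the ASCII byte 0x41 comes out unchanged: every valid ASCII file is already a valid UTF-8 file.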

While there's definitely a lot of additional complexity, Unicode is basically just a table of characters and arbitrary natural numbers assigned to them. There is very little semantics to Unicode code points. The main value of Unicode is just to get us to all agree on the same arbitrary mapping. ISAs/etc. have far more semantics and constraints and are therefore much less arbitrary.
– Derek Elkins, Jul 28 at 2:15

I don't think this is really anything to do with computer science. If anything, it's an engineering thing. Also, most of your tags were completely irrelevant, so I've removed them.
– David Richerby, Jul 28 at 20:34

@DavidRicherby I was just considering this. Is Software Engineering more appropriate?
– Holden, Jul 28 at 20:52

@Holden Possibly. I'm not too familiar with what's on-topic on that site. Their help centre should give some guidance.
– David Richerby, Jul 28 at 21:19

You are vastly overstating the positive aspects of Unicode, I believe. 1) Unicode is far from universal: there are lots of languages, scripts, glyphs, and characters missing. 2) Unicode is far from unified. 3) Unicode is insanely complex, and complexity is the one thing you don't want in an ISA.
– Jörg W Mittag, Aug 2 at 23:18

2 Answers

Unicode is a suitable encoding for most scripts because it includes characters from all the scripts. The analog would be an instruction set architecture that includes instructions from all processors. This would be huge and impractical to implement whether in hardware or in software. It would even be contradictory: how do you reconcile an architecture with 16-bit words with an architecture with 64-bit words? an architecture that allows unaligned word accesses with one that doesn't? an architecture that guarantees the presence of an MMU with one that guarantees that addresses used by software map directly to addresses in physical memory? an architecture that guarantees that all memory accesses take exactly the same amount of time with one that has memory caches?
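To make one of these conflicts concrete, here is a small C sketch (my addition, assuming a typical little-endian target) of the unaligned-access problem: the type-punning read below compiles to a single load on x86, which tolerates unaligned addresses, but it is undefined behaviour in ISO C precisely because strict-alignment architectures may trap on it.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    unsigned char buf[8] = {0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00};

    /* Portable: copy the bytes; the compiler lowers this to whatever the
       target ISA allows (one unaligned load on x86, a byte-wise sequence
       on strict-alignment targets). */
    uint32_t v;
    memcpy(&v, buf + 1, sizeof v);
    printf("portable read:  0x%08X\n", (unsigned)v);

    /* Non-portable: undefined behaviour in ISO C. It may "work" on x86,
       but can raise an alignment fault on strict-alignment ISAs such as
       some ARM configurations or SPARC. */
    uint32_t u = *(uint32_t *)(buf + 1);
    printf("unaligned read: 0x%08X\n", (unsigned)u);

    return 0;
}
```

A single standard ISA would have to pick one of these behaviours and penalise everyone who depends on the other.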

A realistic ISA can't combine what all the ISAs do. You have to make choices, and these choices depend on your objectives. If you're building a processor for real-time processing, a guarantee that memory accesses take the same amount of time is crucial. If you're building a processor for fast numerical or symbolic computation, caching memory accesses is essential for performance. A good instruction set for general-purpose computing is not the same as one for graphical rendering, and machine learning calls for yet another approach. There is no one-size-fits-all.

Why isn't there a standard model of vehicle? Why are there so many different models of cars and airplanes and bicycles and motorcycles and helicopters and submarines and so on?

Maybe using the term ISA so broadly isn't appropriate? I understand the constraints of hardware design well enough... I do, however, believe there is a better, more unified approach to program representation and execution; if anything, LLVM and RISC-V are great steps in this direction. LLVM proves that a level of interoperability can exist; have you seen this? github.com/KhronosGroup/SPIRV-LLVM
– Holden, Jul 28 at 0:03

RISC-V is beginning to prove that most architectures aren't that different, although capitalism may disagree.
– Holden, Jul 28 at 0:04

Unicode can be encoded as UTF-8, UTF-16, or UTF-32. Also, I can tell you from experience that modern microprocessors are no strangers to expanding type codes when it comes to instruction decoding...
– Holden, Jul 28 at 0:14

There is a very strong demand to be able to exchange textual information. EVERYBODY wants to be able to exchange textual information. Billions of people.

How many people want to use different code representations? Really only compiler writers, and nobody else. But compiler writers are mostly happy with the one bytecode representation they are using: Java bytecode if you write a Java compiler, LLVM IR if you use Clang, and so on.
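For instance (my illustration, not the answerer's), the same trivial function already has a well-established lowering in each ecosystem, and the commands to inspect them are part of the respective toolchains:

```c
#include <stdio.h>

/* A trivial function with two established "portable" representations:
 *   - LLVM IR:      clang -S -emit-llvm square.c   (produces square.ll)
 *   - JVM bytecode: the equivalent Java method, inspected with javap -c
 * Neither community has much incentive to abandon its own format. */
int square(int x) { return x * x; }

int main(void) {
    printf("%d\n", square(7)); /* prints 49 */
    return 0;
}
```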

And with Unicode, the effort is just writing a huge document and maintaining some tables. With a bytecode representation, you can't just standardise on one; you then have to go through the huge effort of turning it into actual code running on some machine.

Summary: no standard exists because the benefit is much smaller, and the cost much higher, than with standardising Unicode.