Android Emulator gets Quick Boot

Today, we are excited to announce Quick Boot for the Android Emulator. With Quick Boot, you can launch the Android Emulator in under 6 seconds. Quick Boot works by snapshotting an emulator session so you can reload in seconds. Quick Boot was first released with Android Studio 3.0 in the canary update channel and we are excited to release the feature as a stable update today.

It’s 1000000 times more useful. A modern PC will process more data in 1 second than a C64 did during its whole lifetime. Let’s face it – old computers sucked, same as today’s state-of-the-art PCs will suck 20 years from now. It’s nothing more than a bunch of plastic and wires.

[q]It’s 1000000 times more useful. A modern PC will process more data in 1 second than a C64 did during its whole lifetime. Let’s face it – old computers sucked, same as today’s state-of-the-art PCs will suck 20 years from now. It’s nothing more than a bunch of plastic and wires. [/q]

This actually adds credence to cybergorf’s point. Given the fact that hardware has gotten so much better, one would expect modern software to perform many times better than it does. When it comes to clock time, real and significant hardware gains have largely been offset by software inefficiencies.

Sure, we’re tempted to say Android is 1000 times more complex, but in all seriousness the entire kernel should load in the blink of an eye given how fast flash storage is, and by past standards it would be inexcusable for an app loader to take so long to load itself on such fast hardware. The truth of the matter is that the software industry has left optimization on the back burner, arguing that hardware improvements make software optimization irrelevant. This is a very common justification in the field of software development, and if that’s the industry’s consensus, then so be it. But we shouldn’t lie to ourselves and pretend that modern inefficiencies are intrinsically due to additional complexity; no, we must recognize that the art of software optimization has gotten lost along the way.

A. Kernel boot time has nothing to do with storage performance and has everything to do with the complexity of what you call a “PC”. While the C64 had a very limited set of devices that were initialized from ROM, a modern kernel needs to support thousands upon thousands of CPU types, chipsets, devices, etc., most of them with unbelievably complex initialization sequences. If you don’t believe me, look at the driver source code of modern GPUs or 25/40/100 GbE network devices.

B. The amount of optimization that goes into the kernel these days is a million miles ahead of the type-a-couple-of-1000s-of-ASM-LOC-and-shove-them-into-a-ROM approach that was used to design the C64.

[q]A. Kernel boot time has nothing to do with storage performance and has everything to do with the complexity of what you call a “PC”. While the C64 had a very limited set of devices that were initialized from ROM, a modern kernel needs to support thousands upon thousands of CPU types, chipsets, devices, etc., most of them with unbelievably complex initialization sequences. If you don’t believe me, look at the driver source code of modern GPUs or 25/40/100 GbE network devices. [/q]

In all honesty, most of the complexity is bloat. Even Linus Torvalds has acknowledged Linux has a bloat problem. Android is tuned for very specific hardware. If the kernel on your Android phone does “support thousands upon thousands of CPU types, chipsets, devices, etc.”, then whoever built it did a pretty bad job of trimming it down to only the hardware present on the device. Know what I mean? The build is supposed to be optimized for that specific hardware.

[q]B. The amount of optimization that goes into the kernel these days is a million miles ahead of the type-a-couple-of-1000s-of-ASM-LOC-and-shove-them-into-a-ROM approach that was used to design the C64.

In all honesty, most of the complexity is bloat. Even Linus Torvalds has acknowledged Linux has a bloat problem. Android is tuned for very specific hardware. If the kernel on your Android phone does “support thousands upon thousands of CPU types, chipsets, devices, etc.”, then whoever built it did a pretty bad job of trimming it down to only the hardware present on the device. Know what I mean? The build is supposed to be optimized for that specific hardware. [/q]

The OP in this sub thread was comparing C64 to a PC. He wasn’t talking about Android, nor was I.

Arguably not true by performance per clock…

Actually, the number of rows/sec PostgreSQL-on-Linux can store or index is a billion years ahead of any database that existed in the C64 or even the early PC days. Even if you construct odd metrics such as rows/sec per MHz or rows/sec per disk RPM, the performance difference is simply staggering.

Same goes for networking (packets/sec per MHz) and graphics (triangles/sec per MHz).

Sure, but it doesn’t explain why the overhead is so much greater.

Sure it does.

Back in the C64 or even in the XT days, a graphics card was nothing more than a dual-ported memory chip.

Today a graphics card is a huge SMP CPU that’s expected to push billions of vertices and handle complex requests simultaneously. How can you possibly expect to keep programming such a beast by calling ‘int 10h’?

Networking is no different. How can you possibly compare a C64 modem that was barely capable of pushing 1200 bps via the simplest of interfaces (a serial port) to a multi-PCI-E 100GbE network device that includes smart buffering, packet filtering and load balancing?

[q]IMHO it’s true of most code.

Being a system developer, I can’t say that I care much for user-facing code.

The OP in this sub thread was comparing C64 to a PC. He wasn’t talking about Android, nor was I. [/q]

The actual OP was likely speaking generically, but nevertheless we can talk in terms of PCs if you like. Do you have any reason to believe there’s less bloat on PCs?

[q]Sure it does.

Back in the C64 or even in the XT days, a graphics card was nothing more than a dual-ported memory chip.

Today a graphics card is a huge SMP CPU that’s expected to push billions of vertices and handle complex requests simultaneously. How can you possibly expect to keep programming such a beast by calling ‘int 10h’?

Networking is no different. How can you possibly compare a C64 modem that was barely capable of pushing 1200 bps via the simplest of interfaces (a serial port) to a multi-PCI-E 100GbE network device that includes smart buffering, packet filtering and load balancing? [/q]

The memory-mapped devices are significantly faster than the legacy PIO ones, and on top of this the bus speeds have increased dramatically. Hardware initialization time is so fast that a stopwatch would be too slow to measure it. Most of the time is a result of software deficiencies. While complexity can contribute to software deficiencies, it’s not the inherent cause of slowdowns on modern hardware that you are making it out to be.

One problem is that network drivers, graphics drivers, audio drivers, printer drivers, USB drivers, etc. come in packages of 10-100+ MB, which is quite unnecessary and can end up causing delays and consuming system resources. At least modern SSDs are so fast that they help mask the worst I/O bottlenecks caused by bloat, but a lot of it is still happening under the hood.

I appreciate that fast hardware is considered much cheaper than optimizing code. However, there’s little question that legacy programmers were writing more optimized software; that’s really the gist of what we’re saying. It was really out of necessity, since on old hardware they couldn’t afford to be wasteful like we are today.

[q]The actual OP was likely speaking generically, but nevertheless we can talk in terms of PCs if you like. Do you have any reason to believe there’s less bloat on PCs? [/q]

Bloat is a general term that doesn’t mean much.

Moreover, I was commenting about bloat within the kernel, so I suggest we talk about specific cases of *bloat* within the kernel.

[q]The memory-mapped devices are significantly faster than the legacy PIO ones, and on top of this the bus speeds have increased dramatically. Hardware initialization time is so fast that a stopwatch would be too slow to measure it. Most of the time is a result of software deficiencies. While complexity can contribute to software deficiencies, it’s not the inherent cause of slowdowns on modern hardware that you are making it out to be. [/q]

Yeah. But you expect far more from your hardware than you did 30 years ago.

The Linux kernel boots within 2-5 seconds; within this window it needs to configure dozens of devices, load complex firmware, program these devices (in the case of GPUs, network devices and RAID controllers), set up virtualization (including virtualized devices) and start user mode.

The work being executed during these 5 seconds is billions of times more complex than what io.sys and msdos.sys did during the MS-DOS days.

[q]One problem is that network drivers, graphics drivers, audio drivers, printer drivers, USB drivers, etc. come in packages of 10-100+ MB, which is quite unnecessary and can end up causing delays and consuming system resources. [/q]

A. This is an issue only on Windows machines. Linux drivers are quite small and never bundle a billion useless applications.

B. Most of the bloat comes from user-facing applications (again) that have nothing to do with the actual driver.

C. Having a large number of garbage applications doesn’t necessarily have any effect on actual performance. E.g. nVidia’s fat management application doesn’t necessarily cost a single FPS in any game it manages (quite the opposite).

[q]At least modern SSDs are so fast that they help mask the worst I/O bottlenecks caused by bloat, but a lot of it is still happening under the hood. [/q]

Sure it does.

I doubt that you’ll be willing to live with C64 functionality these days.

Anecdotal evidence: I’m considered a Neanderthal because I can actually work a full day out of a text console + VIM.

[q]I appreciate that fast hardware is considered much cheaper than optimizing code. However, there’s little question that legacy programmers were writing more optimized software; that’s really the gist of what we’re saying. It was really out of necessity, since on old hardware they couldn’t afford to be wasteful like we are today. [/q]

Look, being a kernel / system developer, I still write asm code from time to time. But I should be honest: even if I really, really try, I seldom get any meaningful performance increase compared to cleanly written C code + GCC optimization. (And writing cross-platform asm is a real b**ch.)

Even if you’re talking about UI: it’s very easy to write highly optimized code when you’re dealing with simple requirements. Complexity (what you consider bloat) usually trails requirements.

[q]The Linux kernel boots within 2-5 seconds; within this window it needs to configure dozens of devices, load complex firmware, program these devices (in the case of GPUs, network devices and RAID controllers), set up virtualization (including virtualized devices) and start user mode.

The work being executed during these 5 seconds is billions of times more complex than what io.sys and msdos.sys did during the MS-DOS days. [/q]

Obviously DOS didn’t even have drivers for everything; hardware was frequently accessed directly from application software. DOS+Windows (WFW) would be a better comparison to a modern OS.

Sure, we can allow for some additional complexity, but it certainly isn’t on the order of billions of times more complex. Slow disk access was always the big bottleneck, but sending instructions to initialize hardware isn’t, be it a network card, graphics card, sound card, mouse, etc. These can all be up and usable practically instantaneously if the software is ready to use them, but it usually is not.

[q]A. This is an issue only on Windows machines. Linux drivers are quite small and never bundle a billion useless applications. [/q]

I’d agree that Linux drivers are better in this respect, although there’s room for improvement. Most users running Linux use default kernels that were not optimized for their system and carry tons of unused baggage.

[q]B. Most of the bloat comes from user-facing applications (again) that have nothing to do with the actual driver.

C. Having a large number of garbage applications doesn’t necessarily have any effect on actual performance. E.g. nVidia’s fat management application doesn’t necessarily cost a single FPS in any game it manages (quite the opposite). [/q]

Yes, certainly user-facing applications are responsible for loading a lot more media. That’s improved considerably with SSDs, although sometimes the amount of waste is just ridiculous (like my TP-Link tray icon software with a 10 MB resident footprint to run a simple GUI, or the 20 MB Realtek applet to control my sound card). It’s not always a developer’s fault either; modern compilers can output extremely bloated binaries. Obviously these fit very comfortably in the amount of RAM we have these days, but it would never have shipped this way in the past.

[q]Sure it does.

I doubt that you’ll be willing to live with C64 functionality these days.

Anecdotal evidence: I’m considered a Neanderthal because I can actually work a full day out of a text console + VIM. [/q]

My own Linux distro doesn’t even have a GUI (or VIM for that matter, only the BusyBox light version of vi).

[q]Look, being a kernel / system developer, I still write asm code from time to time. But I should be honest: even if I really, really try, I seldom get any meaningful performance increase compared to cleanly written C code + GCC optimization. (And writing cross-platform asm is a real b**ch.)

Even if you’re talking about UI: it’s very easy to write highly optimized code when you’re dealing with simple requirements. Complexity (what you consider bloat) usually trails requirements. [/q]

You are saying that you rarely see any performance increase compared to C. That might be true when looking at x86 code. It might be true when optimizing a single routine. The worst bloat arises when the compiler puts those routines together, however. When programming in ASM and writing routines, one rarely ever touches the stack (except for the return). Usually you can manage to hold things in registers. In compiled code, almost every function call comes with saving to the stack, setting up the stack frame, doing some short stuff and rewinding the whole thing again. That is a p.i.t.a.! Moreover, the difference between a routine (and the functions it’s calling) residing completely in L1 and something ‘slightly’ larger can be enormous.
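To make that concrete, here is a toy sketch of my own (the function names and the GCC/Clang noinline attribute are purely illustrative, and this is not a benchmark): the same tiny helper, once forced through a real call on every iteration and once left inlinable.

[code]
// Toy illustration: paying the calling convention per iteration vs. letting
// the compiler keep the work inside the loop.
#include <cstddef>
#include <cstdint>

// Forced out-of-line (GCC/Clang attribute): each use below is a real call,
// so arguments and results must travel through the ABI registers and the
// loop has to re-establish its state around every call.
__attribute__((noinline))
static std::uint64_t mix_call(std::uint64_t x) {
    return (x ^ (x >> 33)) * 0xff51afd7ed558ccdULL;
}

// Left inlinable: the optimizer can fold this straight into the loop body.
static inline std::uint64_t mix_inline(std::uint64_t x) {
    return (x ^ (x >> 33)) * 0xff51afd7ed558ccdULL;
}

std::uint64_t sum_with_calls(const std::uint64_t* v, std::size_t n) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += mix_call(v[i]);    // call/return on every element
    return acc;
}

std::uint64_t sum_inlined(const std::uint64_t* v, std::size_t n) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += mix_inline(v[i]);  // typically compiles to a plain loop, no calls
    return acc;
}
[/code]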

You are saying that the kernel has to initialize a lot of hardware. That is true. This should run in parallel whenever possible, and it probably does. You will notice yourself, nonetheless, that it works anything but perfectly.

The kernel itself is a good example of how you can fuck things up by taking the wrong path. In the Unix world ‘everything is a file’. The kernel offers an interface where you can access all the hardware things and settings as files. This is the MOST INEFFICIENT way of handling things and is done that way for historical reasons. ‘Everything is memory’ being the right approach, of course… I still see many applications where files are read in line by line. This only works because CPU speeds are out the wazoo; otherwise it is dead inefficient. You’d be surprised to see how many layers you are ‘trespassing’ when tracing such a call. It’s worse than your average onion. Needless to say, replacing those reads with memory mapping would be the best approach, while reading in the whole thing and then parsing it afterwards is the second best (with a lot of distance in between).
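As a rough sketch of the difference (my own illustration, POSIX-only, function names invented): the first version funnels every line through the stream layer and, ultimately, read() calls, while the second maps the file once and walks it as plain memory.

[code]
// Sketch: the same job done through the file/stream layers vs. memory mapping.
#include <cstddef>
#include <fstream>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Line-by-line: each getline goes through the C++ stream, its buffer and
// ultimately read() syscalls, i.e. several layers and copies per chunk.
std::size_t count_lines_streamed(const char* path) {
    std::ifstream in(path);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}

// Memory-mapped: the file is mapped once and scanned as plain memory; the
// kernel pages it in on demand, with no per-line calls or copies.
std::size_t count_newlines_mapped(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 0; }
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    std::size_t n = 0;
    if (p != MAP_FAILED) {
        const char* data = static_cast<const char*>(p);
        for (off_t i = 0; i < st.st_size; ++i)
            if (data[i] == '\n') ++n;
        munmap(p, static_cast<std::size_t>(st.st_size));
    }
    close(fd);
    return n;
}
[/code]

The two don’t count exactly the same thing when the last line has no trailing newline, but that’s beside the point; what matters is how many layers each byte passes through.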

Working with embedded devices, I find myself surprised many times at how fast 84 MHz can actually be. I have moderately complicated routines which can do several tens of thousands of iterations per second on such a little thing. It is shocking to come back to your PC and find an application taking more than a second for something that shouldn’t take even a hundredth of that on today’s hardware. Those seconds add up to make matters worse.
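As a back-of-the-envelope check (my own arithmetic, taking ‘several tens of thousands’ as roughly 50,000): 84 MHz divided by 50,000 iterations per second leaves about 1,700 clock cycles per iteration, while a 3 GHz desktop core that spends a full second on a trivial task has burned roughly 3 billion cycles doing it.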

So no, we are far off from anything even close to efficient these days.

As an adolescent, I used to crack software (protections) for fun. I started off on the Commodore Amiga with 68k assembler. In order to crack, one had to disassemble (parts of) the software and put in a fix. Let’s just say that the code the compilers created was far from optimal. Even if there was an entire routine missing, I could usually ‘create some space’ by simply optimizing the crappy compiler code.

I also cracked a bit of x86 software. Eventually I stopped, as I got bored.

Today’s code is so bloated that often you can’t even tell earth from water. Today’s software does with 30 MB what we would have done in less than 2 KB using assembler. Draw your own conclusions…

I feel the need to share an anecdote. A while ago, I was developing an application for an embedded device in C++. There was a need for ‘biased’ random generator functions, of the kind introduced with the C++11 random number distributions. Unfortunately the compiler in use didn’t support C++11 yet. As the test device had plenty of flash, I decided to pull in the random stuff via Boost… Soon I was surprised to find that my application had grown more than 10x in size and was now taking up more than half of the flash memory. An investigation revealed that the binary now even included internationalization routines! I’m guessing that the damn I18N had to be set up as well; this, on a device with only a few LEDs.

The compiler (and even more so the linker) was set to eliminate unused code/functions.

Luckily, a compiler supporting C++11 was ready a few days later. Now the random stuff takes up only a few KB (though arguably still more than needed).
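For anyone who hasn’t used them, here is a minimal sketch of the kind of ‘biased’ choice the C++11 <random> header gives you out of the box (the weights and the LED-pattern scenario are made up for the example):

[code]
// Minimal C++11 example: a weighted ("biased") random choice using <random>.
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(12345);  // fixed seed so the demo is repeatable
    // Pick one of three hypothetical LED patterns with 70/20/10 % probability.
    std::discrete_distribution<int> pick({0.7, 0.2, 0.1});

    int counts[3] = {0, 0, 0};
    for (int i = 0; i < 10000; ++i)
        ++counts[pick(rng)];

    std::printf("pattern 0: %d  pattern 1: %d  pattern 2: %d\n",
                counts[0], counts[1], counts[2]);
    return 0;
}
[/code]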

Have you ever wasted your time looking at the source code of Mozilla’s Firefox? I went crazy after half an hour. There are tons of functions doing (more or less) the same thing. Obviously, they lost track of things. This is not only problematic in terms of code bloat and program efficiency, but also a security problem: you find an error in one function and you can be almost certain that there are more functions with the same error under a different name… what a mess! 😛

Bottom line: complexity is just an excuse for not polishing the building blocks. Today’s software “foundations” are often already rotten at the core.