Posted
by
timothy
on Sunday October 23, 2011 @08:27PM
from the all-that-dessert-makes-you-sluggish dept.

ozmanjusri writes "New smartphones may be lightweight, compact objects, but their OSs are anything but. Ice Cream Sandwich will need workstations with no less than 16 GB RAM to build the source code, twice the amount Gingerbread needed. It will take 5 hours to compile on a dual quad-core 2+GHz workstation, and need 80GB disk space for all AOSP configs. Android developers are also being warned to be cautious of undocumented APIs: 'In almost every case, there's only one reason for leaving APIs undocumented: We're not sure that what we have now is the best solution, and we think we might have to improve it, and we're not prepared to make those commitments to testing and preservation. We're not claiming that they're "Private" or "Secret" — How could they be, when anyone in the world can discover them? We're also not claiming they're forbidden: If you use them, your code will compile and probably run.'"

If I remember correctly, you could build OpenOffice on Gentoo with -pipe and various other performance tuning flags, and the hardware requirement was only a minimum of 512MB. And every other package, including big stuff like the kernel and KDE, could be built with only 64MB... though 256MB was recommended.

I would guess the default Android build is optimized for the Google Android team, and so speed is the most important factor; they probably use a build server with multiple processors and big memory.

Mmm, no. Third party modders do a lot of work, and make some really awesome builds, with all kinds of customizations and new features. Cyanogenmod, for instance. Quite the opposite of working for a large company with resources, their developers are now actually being hired by big companies because of their freelance work.

To me, that sounds like 5 hours of CPU time summed across parallel compile jobs. So if it were a single-threaded compilation job, the task would take much, much longer in wall-clock terms.

Yes, it does SOUND that way... It's very "truthy" that way...

Relying on /. summaries just makes you look like an idiot, when you're just one quick and easy click away from the source. Surely, if you can't be bothered to put that much effort in, then you must not have enough time to write up a response, either...

Verbatim quote from TFA:
"5+ hours of CPU time for a single build, 25+ minutes of wall time"

One of the better ways to optimize C++ code for building with GCC is to put all of the source code into one big file (an amalgamation). Or you can build it as a few independent modules, but each compilation unit is still quite large. Then you build it with -O3. In GCC, the amount of RAM and CPU used in an -O3 compile goes up by quite a lot as the code size in a single module increases. I am not sure what the exact function is, but I think it's exponential.

I run Gentoo and usually run make with -j5 on a tmpfs, and I've never managed to use even half of my 8GB RAM building anything from chromium to firefox to openoffice. And I certainly don't skimp on my CFLAGS.

Maybe if you build this thing on a tmpfs and run -j50 or something you'd need that kind of RAM, but seriously...

Plus, since parallel make tends to limit itself to a single module at a time in most build systems it is hard to get the parallelism to be all that high anyway.

One of the better ways to optimize C++ code for building with GCC is to put all of the source code into one big file (an amalgamation). Or you can build it as a few independent modules, but each compilation unit is still quite large. Then you build it with -O3. In GCC, the amount of RAM and CPU used in an -O3 compile goes up by quite a lot as the code size in a single module increases. I am not sure what the exact function is, but I think it's exponential.

This would easily explain the RAM and CPU usage.

And when LLVM/Clang gets full concurrency support in 3.1, you can bet Google will put GCC in the back of the bus, with LLVM/Clang taking over the wheel.

Are you being obtuse? No, you're obviously not inlining 12GB worth of functions, I was just giving an example of an optimization that benefits from combining the code into one compilation unit, so you can make an intelligent decision on what can be optimized away.

And like I said, it was just an example. There's obviously more optimizations than just inlining.

Even well-separated and intelligently written software benefits from the compiler doing smart things, and to do that, it needs as much information as possible. Also, a *ton* of the memory usage for compiling something that large comes just from the bookkeeping required to keep track of debug information.

I can't think of any good reason that it would take 16 GB to compile anything.

Well, how much RAM would you think should be needed to compile Android? If you're packing 5 hours of CPU time into ~25 minutes of wall time, then obviously your parallel compiles are going to be chewing up a lot of RAM. If you reduced the number of parallel builds it would reduce the amount of RAM required - and also take a lot longer.

I'll meet you half way, say instead of a 16 GB or a 640K environment we use an 8 GB build environment?

Of course you could use 8GB, it would just take longer, as either the data would have to be swapped to disk or you would end up running fewer parallel processes.

Nothing arbitrary about it, deliberate would be a better word to use.

Ok then, if it isn't just a baseless suggestion then how is it specifically that you would re-organize the Android project to achieve a no-compromise solution wrt RAM utilization and compilation time? You clearly think they're doing it wrong so what would you do differently?

Yes, there certainly are. The most obvious reason is code optimization. If your target device is something relatively light on resources like a mobile phone, then you probably want to optimize very aggressively. All forms of optimization require context. For something like "false && statement" all the required context for optimizing away the statement is very nearby. Something like the return value optimization [wikipedia.org] needs to know about the entire function. So far we're considering the easy stuff. If you want to go all out and get into whole program optimization [wikipedia.org] then some optimizations cannot be guaranteed to be safe without knowing the entire program.

Now if "compile" refers to the entire build process, then we're also probably talking about some serious static analysis. Checking for things like "can this function ever throw?" or "is this code reachable?" or "is the memory allocated here always eventually freed?" also requires an awful lot of context to check. In the worst case each of these questions requires knowing all of the code to answer.

The more you analyze at once, the better cross-function optimizations you can do. Think very deep inlining, loop unrolling, propagation of conditions that let the compiler skip various checks or completely optimize out some code, etc.

What does this have to do with "being predictable"? Compiler optimizations, by definition, don't change semantics of code so long as it's written right. If you're relying on something that's undefined behavior according to the language (or your compiler) spec, you're doing it wrong.

There's no line between compiler optimizations and bad organization of code - these two are completely orthogonal. As GGGGGP noted, a common way [sqlite.org] to optimize code real well, regardless of how it's written, is to concat all the .c/.cpp files into one.

Maybe after you've actually worked on a professional C/C++ compiler you will be able to point out BOTH the positives and negatives of Unity/Bulk Builds. Until then, just because YOU are ignorant of Unity/Bulk Builds and have obviously never used them doesn't mean other people would be willing to trade their 5 min compile + 2 min link times for 1+ hr builds using the inefficient precompiled-header approach.

I have to wonder if the 16GB "requirement" is more of a recommendation and/or a bunch of default settings that deliberately avoid the disk as much as possible, and keep as many cores as you can throw at the job busy by compiling every little bit and piece in parallel...

On the one hand, with 16GB of RAM in a desktop/light workstation at 4x4GB only running around $100 (with the more workstation-friendly 2x8GB with ECC only twice that), it seems rather pointless to burn any developer time on trying to optimize the RAM needs of building the entire OS. RAM is cheap.

On the other hand, I have to wonder what they could possibly be doing to the process of compiling what is basically a weird-but-not-unrecognizable linux distro to make it that RAM hungry.

I have to wonder if the 16GB "requirement" is more of a recommendation and/or a bunch of default settings that deliberately avoid the disk as much as possible

I have to wonder if the 16 GB requirement is real.

Reading the blog linked in the summary, there is no source mentioned. The author completely fails to mention how they came across this information. And that's even ignoring the bad English (obviously not the author's first language).

I think I'll wait for a more trustworthy source to confirm or deny this.

And if you read that original source, you'll see that they are recommendations for building future development machines:

- 6GB of download.
- 25GB disk space to do a single build.
- 80GB disk space to build all AOSP configs at the same time.
- 16GB RAM recommended, more preferred, anything less will measurably benefit from using an SSD.
- 5+ hours of CPU time for a single build, 25+ minutes of wall time, as measured on my workstation (dual-E5620 i.e. 2x quad-core 2.4GHz HT, with 24GB of RAM, no SSD).

Looking at those specs, maybe it's about time to think about switching Android to a modular architecture. There is no reason why the complete build needs to be made in one go. It's like running Gentoo and compiling the entire system from source when you really just want to upgrade an application. Or like the old OpenOffice days, when they bundled everything, so compiling it required compiling every library it depended on, plus compiling Python and everything else that was embedded.

I'm really curious as to what it is that they're doing that's going to require more RAM than most of these devices have in total storage space.

Being lame. Apparently, being a certified smart person does not necessarily imply understanding how to code efficiently, including not understanding how to code a build efficiently. An excellent example of how Google hobbles its development pace by its fear of embracing community developers.

Just by way of illustration, there was one guy on my team at Google whose solution to a QA problem was to run the whole build under QEMU, increasing the latency of the build by two orders of magnitude - turning it into an overnight process, in fact. And this guy's approach had the full support of my manager; even being tech lead of the team, I was not allowed to overrule that braindamage. No technical reason at all, just pure process for its own sake. This kind of mess is endemic at Google.

The compiler has no visibility into whether the memory space it is executing in is actually mapped to physical ram.

If the compiler doesn't know its own resident size, then how does top know the compiler's resident size, and how does ps aux know the compiler's resident size? I imagine that if a program detects that it's being swapped out, it might be able to adjust its CPU/memory tradeoffs at runtime.

Quick question for those with giant codebases such as this. How the heck do you test and debug the software with those kinds of lag times? Do you split everything up into smaller pieces or something? If so, then surely there are cases where you need to test something that requires EVERYTHING to be compiled. I can imagine such shot-in-the-dark scenarios to be the stuff of pure nightmares.

Unless the build system is screwed up, recompiling after a change should be relatively fast. Usually source code is stored as lots of smaller files, and each file is compiled separately to produce a separate object file (e.g., .o). Then next time a rebuild is requested, the system should notice what changed, and only rebuild the needed parts. Some steps take the same time on every build (e.g., the final link), but a rebuild shouldn't take anywhere near as long as a full build.
There are lots of build tools, including make, cmake, and so on. If you use the venerable "make" tool, you might want to read Miller's "Recursive Make Considered Harmful":
http://aegis.sourceforge.net/auug97.pdf [sourceforge.net]
Cue the lovers and haters of "make", here. :-)

In game programming, Incredibuild is a common tool for that. You run it on everyone's machine and it integrates with Visual Studio. Lets you reduce build time a ton, since you have a lot of resources to use. Also tends to scale nicely: the larger the project, the more people working on it, and thus the more computers available, and so on. You can, of course, have dedicated servers just for compiling, but many places don't bother, just having it use idle time from office systems.

Unless the build system is screwed up, recompiling after a change should be relatively fast. Usually source code is stored as lots of smaller files [...] Then next time a rebuild is requested, the system should notice what changed, and only rebuild the needed parts.

Back in my college days we had to submit a compilation job on the mainframe, and then wait around for a couple hours for someone to put the printout containing the results (or more likely a crash dump) into the appropriate mail box slot. Then you had to wait your turn to submit a revised copy. (No, this wasn't that long ago -- 89, 90, something like that -- but the community college I went to still taught their Cobol & assembly classes on an older mainframe -- 3270 terminals though, no punch cards).

Seems pretty unlikely unless you were in a deep backwater. Interactive terminals became commonplace very early in the 1980s. It wasn't uncommon to work on a batch processing system until the mid 1980s, but not with results delivered on paper.

The development model you're talking about properly dates to the 1960s and into 1970s, in backwaters where the future had yet to penetrate. FFS, the Xerox Alto [wikipedia.org] was introduced in 1973.

As others have said, you don't recompile the entire thing because you changed one integer, but as others have not said, you really should be testing in smaller chunks. You are not perfect enough to vomit out something that takes 5 hours of CPU time (which on the given systems is about a half hour of real time) perfectly the first time you try.

It's much easier to write a chunk and make sure it works than to write a freaking monster blob and go hunting for a chain reaction.

While it is a lot of RAM compared to what many system have, it really isn't a big deal these days. 4GB DDR3 sticks are $25 or less each, and that is for high quality RAM. Regular, consumer grade, LGA1155 boards support 4 of them. So for $100 you can have 16GB on a normal desktop system. My home system I type this on has 16GB for that reason. It was so cheap I decided "Why not?"

They actually can support more; with 8GB sticks you can have 32GB on a standard desktop, but those are still expensive.

The enthusiast X79 LGA2011 boards coming out will have 8 sockets and thus handle 64GB. Of course, beyond that there are workstation boards, which cost a lot more, but not as much as you might first think.

At any rate, 16GB is now a "regular desktop" amount of RAM. Standard boards the likes of which you get in cheap ($1000 or less) towers support that much, and it only costs $100 to get. It is quite a realistic thing to require, for something high end.

Well, it is an amount of RAM you could cram into a brand new regular desktop, but it certainly isn't something you'd find on an average desktop. I think I have two slots free in mine so I could bump it up to 16GB, but that is $50 I don't really need to spend. I rarely am using more than half of my RAM as it is, though the extra obviously helps with caching/etc.

Android has always been RAM-intensive, and it makes sense, since you have no choice but to build an entire OS at once.

My nearly two year old desktop has 12GB of ram in it, and can take 24GB (i7 930 based system). My next build which should be in December or January will likely have 32GB and expandable to 96GB (Dual socketed X79).

TFA: "5+ hours of CPU time for a single build, 25+ minutes of wall time, as measured on a workstation (dual-E5620 i.e. 2x quad-core 2.4GHz HT, with 24GB of RAM, no SSD)."
/. summary: "It will take 5 hours to compile on a dual quad-core 2+GHz workstation"

Of course that was the /. summary. Most people, including me, have never heard of "wall time" and I was trying to figure it out, until I did the calculation and figured out that "wall time" is the clock on the wall, not some fancy compiler term I've never heard before. (5 hours of CPU time spread over 25 minutes of wall time works out to roughly 12 cores busy on average.)

Hyperthreading! With 8 physical cores, the OS sees 16 logical cores. 16 threads can be running at once. Though technically only 8 are running, because there aren't extra copies of the adders, multipliers, etc., whenever one thread is waiting on memory, or stalled because a branch misprediction emptied the pipeline, the other thread can run instantly. Normally it's stupid to context switch due to a branch error or cache miss, since that will probably be resolved before you can finish the very expensive context switch.

...does a phone really require an OS of that complexity? Don't get me wrong, I have a current generation Android smartphone I bought 2-3 months ago, 4G enabled, it even has an HDMI out, and I completely comprehend that a modern smartphone is essentially a fully fledged computer.

That being said, it's still a phone. And in fact, it's horrible at it. To redial the last number I have to press 3 buttons (1 physical, 2 virtual) and suffer through 3s+ of erratic lag or more.

It's not the OS complexity. It's the laziness of programmers, who want everything abstracted through seven layers until they can write single lines of code that magically expand, through black-box code, into what they think they want done - not that they can ever know for sure. And the poor compiler works overtime, because it has to redo the same work every time someone writes object.method().

There are reasons why so many of the kernel developers (including Linus Torvalds) are adamantly against using "higher-level" languages in the kernel.

And the poor compiler works overtime, because it has to redo the same work every time someone writes object.method().

A larger problem is the hacked-up Java VM Google came up with to work around licensing problems with Sun/Oracle (fat lot of good that did them). Dalvik is a fairly poor implementation of a Java VM - on the same hardware, the embedded version of Java runs much faster.

I feel the OP's pain. I have a pretty fast android device (Archos 43) and the UI feels smooth about half the time; the other half it's flaky and unresponsive. My wife's iPod touch never so much as hiccups. I still use a cheapo feature phone.

I have the same issue with my smartphone. Considering buying a second phone, a more traditional "dumb phone", to go with it. My LG P500 has pretty poor sound quality even. For the rest it's a great device.

That, and I don't understand why Android can't just read its contacts from an LDAP server.

It's not just the OS, I believe. It's the whole kit that takes that long to build. Remember, there are all the debugging tools, the emulator, the virtual machine, the OS, and all the user interface goodies. In the OS you then have drivers for every different HDMI graphics chip that Android supports natively, drivers for the different radios, bluetooth, usb, drivers for the touch screens, sound chip drivers.

How long does it take you to compile the whole linux kernel, not just the few parts you use?

Unless the entire program is in 1 gigantic 8-billion-billion-billion-line file, why would it need that many resources, or even be able to use 16 GB of RAM? Assuming it is like a normal program, would it not just be a large collection of relatively small files that are compiled one after the other (theoretically, number of CPUs + 1 threads running at a time, with that many files being compiled concurrently, being the optimal solution)? And I just do not see how you could ever use up 16 GB at any one time.

This article would be shocking, but considering that 16 GB of memory - especially the dual-channel DDR3 used for the i5 and consumer i7's - is so cheap, less than $100, this article doesn't have any shock value. It's just informative. It's letting us know the 'recommended' memory and giving more nerds an excuse to add more RAM. That is, the NERDS that don't already have 24 gigs for their virtual machines. :P

Note that this is the entire system, so building Ice Cream Sandwich from source is just like rebuilding an entire Linux distribution from scratch from the ground up, from the kernel down to the user tools and the windowing system, etc. If you've ever used Gentoo, this should sound familiar. I wonder if some of the tools that Gentoo users are familiar with to help speed along
compilation, such as distcc [gentoo.org], could help with this.

Is this a case of preprocessing gone wrong? Sometimes preprocessing can be a monster, because it blows up each .c file into a monster file due to (almost) every .h file being included, which leads to long compile times. This is especially the case when you have large numbers of .c files. Preprocessing was invented as a hack at a time when memory was too small for the whole source to be compiled in one go. But when applied to large systems, it causes the compile time to increase, because every piece of code in every included header gets reprocessed for each .c file.

I got a rather workable gaming/development laptop (HP) from an office supply store just 3 years ago. It supports a max of 8 gigs of ram. Granted, it was the cheapest laptop with an nvidia 9600GTM with a gig of dedicated ram (really, a better mobile graphics chip than most of the 100 and 200 series), which is to say not the cheapest thing on the shelf.

True, I didn't buy it for Android development, and making the trade-off of a few extra minutes of wall time versus a whole new computer is worth it, in the short run at least.

16 GB is far more than any desktop user should need, and most laptops simply cannot hold that much, so it's creating a sharp demarcation between user and developer. This is bad. You want your advance users to naturally transition into becoming developers, and making your codebase inaccessible for them prevents that.

IOW, most people have suggestions for improvement for any tool they use. Ideally, it would be trivial for someone to download the source, modify it, recompile, test, and submit improvements.

16 GB is far more than any desktop user should need, and most laptops simply cannot hold that much, so it's creating a sharp demarcation between user and developer.

I have 8GB in my development system at work. With two copies of Eclipse sucking up a gigabyte each, if I try to compile my C++ software without shutting down the old version, I go over 8GB and start to swap.

16GB would definitely be beneficial; I put 16GB of 1.6GHz DDR3 in my home server when I built it earlier this year and it cost under $150. ECC RAM for the work system would cost more, but would probably pay for itself in a few weeks.

If the undocumented API changes or disappears, be ready to either (1) change your code, or (2) emulate the old API. Nothing nefarious -- just too damned lazy to document something that might be unstable.

Human nature - if you document it, people will expect it to be stable (no matter what you may say to the contrary). Undocumented APIs have a built-in "we told you so" flavour to them.

The D programming language compiles much faster than C++. It was designed to be easier to lex and parse.

So how much of the compile time for C/C++ code is spent processing the characters in the source code and how much is spent processing the intermediate representation and turning it into machine code? If the answer is "most of the time is spent doing the latter", then "designed to be easier to lex and parse" doesn't help much.

(And how much of Android is C/C++, and how much of it is in their Java dialect? How much of that time is spent translating C/C++ to machine code, and how much of it is spent translating the Java?)

As far as I'm concerned, demanding that all free software be simple enough that anybody could buy a machine capable of compiling it in a reasonable time isn't exactly evil, but it's massively assholic. "Free" doesn't mean "simple enough that a cheap machine can compile it in a reasonable amount of time".