I have been programming in higher-level languages (Python, C#, VBA, VB.NET) for around 10 years, and I have absolutely no understanding of what's going on "under the hood."

What are the benefits of learning assembly, and how will it help me as a programmer? Can you point me to a resource that shows exactly how the code I write in a higher-level language connects to what happens in assembly?

If you want to know what happens under the hood specifically in .NET, you might want to learn more about CIL. It's similar to assembly in some ways, but much higher level. Because of that, it's easier to understand than actual assembly.
– svick, Jul 13 '12 at 18:25

If you learn assembly, you can avoid thinking you're optimizing a for loop by declaring variables outside of it. example
– StriplingWarrior, Jul 13 '12 at 19:36

Oh my god. You just reminded me of the assembly language class I took in college about a year ago. It's just amazing to see how extremely simple things that we take for granted are translated into hundreds or even thousands of smaller, lower-level operations. Computers are extraordinary machines.
– Radu Murzea, Jul 13 '12 at 21:03

Learning assembly will bestow upon you a deep and abiding love for the concept of programming languages that protect you from EVER having to write complex code in assembly again.
– Shadur, Jul 14 '12 at 5:57

18 Answers

You'll understand that function calls are not free and why the call stack can overflow (e.g., in recursive functions). You'll understand how arguments are passed to function parameters and the ways in which that can be done (copying memory, pointing to memory).
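As a rough illustration of the call-stack point, here is a minimal Python sketch. The function names `depth` and `overflows` are made up for this example. It relies on the fact that Python caps recursion depth and raises `RecursionError` when the cap is exceeded, rather than letting the process crash.

```python
import sys

def depth(n):
    """Recurse n times; each call adds one more frame to the call stack."""
    if n == 0:
        return 0
    return 1 + depth(n - 1)

def overflows(n):
    """Return True if recursing n levels deep exceeds the interpreter's limit."""
    try:
        depth(n)
        return False
    except RecursionError:
        return True

# Shallow recursion fits comfortably within the limit.
print(depth(50))

# Asking for more frames than the limit allows fails.
print(overflows(sys.getrecursionlimit() + 100))
```

Every call costs a stack frame, which is why deep recursion fails even though each individual call looks trivial.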

You'll understand that memory is not free and how valuable automatic memory management is. Memory is not something that you "just have". In reality it needs to be managed, taken care of and, most importantly, not forgotten (because you need to free it yourself).

What it boils down to is that all the things we write in C# or Python need to be translated into a sequence of basic actions that a computer can execute. It's easy to think of a computer in terms of classes, generics and list comprehensions but these only exist in our high-level programming languages.

We can think of language constructs that look really nice but that don't translate very well to a low-level way of doing things. By knowing how it really works, you'll understand better why things work the way they do.

Except that after a few weeks of asm, you'll start to think of C as a high level programming language. Unless you're talking to low level embedded device developers, saying that out loud will cause most people to think you're a little bit crazy.
– Dan Neely, Jul 13 '12 at 19:53

@Dan: It's kinda funny how these terms change over time. 20 years ago, when I was getting started programming, if you'd asked someone that, they'd have said "Of course C is a high-level language!" That should be obvious; it provides a standardized heap and memory access model. And that's some serious abstraction away from the hardware; in a low-level language, you have to keep track of all the memory addresses yourself, or if you're doing something really fancy, you write your own heap allocator! So I have to wonder, what are the criteria that make something high-level or low-level today?
– Mason Wheeler, Jul 13 '12 at 22:14

High-level/low-level isn't a binary. A well-rounded programmer who's written both assembly and Python in her career might consider C or C++ a mid-level language.
– Russell Borogove, Jul 14 '12 at 1:31

It will give you a better understanding of what is happening "under the hood": how pointers work, what register variables mean, and how the architecture works in general (memory allocation and management, parameter passing by value or by reference, etc.).

For a quick peek using C, how about this?

#include <stdio.h>
int main(void)
{
    puts("Hello World.");
    return 0;
}

Compile with gcc -S so.c and take a look at the assembly output in so.s:

@Izkata ha ha .. good one, I didn't even notice that. I have a standard so.c file for stackoverflow questions (like I have so.py, so.awk etc) to test out things quickly. So.S .. :)
– Levon, Jul 13 '12 at 19:55

If you compile with gcc -O -c -g -Wa,-ahl=so.s so.c you can see the assembly output for each line of C code. This makes it a little easier to understand what is going on.
– Mackie Messer, Jul 13 '12 at 22:28

Yes, the output is long. You can search for 5:so.c to find the code for line 5 of so.c.
– Mackie Messer, Jul 13 '12 at 22:53

Though it's true that you probably won't find yourself writing your next customer's app in assembly, there is still much to gain from learning it. Today, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.

The fact of the matter is, the more complex high-level languages become, and the more ADTs (abstract data types) are written, the more overhead is incurred to support them. In the case of .NET, that overhead perhaps shows up as bloated MSIL. Imagine if you knew MSIL. This is where assembly language shines.

Assembly language is as close to the processor as you can get as a programmer, so a well-designed algorithm can be blazingly fast; assembly is great for speed optimization. It's all about performance and efficiency. Assembly language gives you complete control over the system's resources. Much like on an assembly line, you write code to push single values into registers and to deal with memory addresses directly to retrieve values or pointers.

To write in assembly is to understand exactly how the processor and memory work together to "make things happen". Be warned: assembly language is cryptic, and an application's source code is much, much larger than it would be in a high-level language. But make no mistake about it, if you are willing to put in the time and the effort to master assembly, you will get better, and you will become a standout in the field.

Additionally, I'd recommend this book because it has a simplified version of computer architecture:
Introduction to Computing Systems: From Bits and Gates to C and Beyond, 2/e
Yale N. Patt, University of Texas at Austin
Sanjay J. Patel, University of Illinois at Urbana/Champaign

This describes what ASM is used for and mentions that HLLs are bloated, but the only specific benefit given for learning ASM is writing super-fast code. Yes, but even if you learn ASM, how likely are you to actually incorporate it in apps, assuming you write business apps and not hardware controllers or device drivers?
– Jon of All Trades, Jul 13 '12 at 21:48

@Jon, I really don't see why you would if you are developing business software. It's one thing if you are a DBA, or writing a compiler, or have limited memory space, but I don't think many people touch it frequently. Optimization, which is the biggest reason to write in assembly, is mostly taken care of by the compiler. Sometimes it helps when tracking down memory leaks.
– notkilroy, Jul 14 '12 at 1:03

I quite disagree. An automated optimizer can often beat out a human programmer in creating speedy assembly.
– DeadMG, Jul 15 '12 at 22:25

You should be familiar with the system one level 'deeper' than the one you are working at. Skipping too far down in one go isn't bad, but may not be as helpful as one would desire.

A programmer in a high level language should learn a lower level language (C is an excellent option). You don't need to go all the way to assembly to have an appreciation of what goes on under the covers when you tell the computer to instantiate an object, or create a hash table, or a set - but you should be able to code them.

For a Java programmer, learning some C would help with memory management and argument passing. Writing some of the extensive Java library in C would go a long way toward understanding when to use which implementation of Set (do you want a hash? or a tree?). Dealing with char* in a threaded environment will help you understand why String is immutable.

Taken to the next level... A C programmer should have some familiarity with assembly, and assembly types (oft found in embedded systems shops) would likely do well with understanding things at the level of gates. Those who work with gates should know some quantum physics. And those quantum physicists, well, they are still trying to figure out what the next abstraction is.

One level deeper is about right. I tend to go for a couple, but assuming that x86 assembly knowledge is worth the investment compared to studying MSIL for a C# programmer is asking for too much. As someone who studied assembly and solid state physics in uni, I don't think that knowing the physics of gate design has helped me at all, besides graduating with a degree in electronics.
– Muhammad Alkarouri, Jul 14 '12 at 20:13

I used to know x86 assembly very well. It helped a little when assembly came up in my courses, it came up once during an interview, and it helped me prove that a compiler (Metrowerks) was generating bad code. It's fascinating how the computer actually works, and I feel intellectually richer for having learned it. It was also very fun to play with at the time.

However, today's compilers are better at generating assembly than almost anyone on almost any piece of code. Unless you're writing a compiler or checking that your compiler is doing the right thing, you are probably wasting your time by learning it.

I admit that many questions that C++ programmers still usefully ask are informed by knowing assembly. For example: should I use stack or heap variables? should I pass by value or by const reference? In almost all cases, however, I think that these choices should be made based on code readability rather than computational time savings. (E.g., use stack variables whenever you want to limit a variable to a scope.)

I don't agree. If you have extensive knowledge about a certain algorithm and a good grasp of the hardware, it is usually possible to create assembly code that is better optimized than what the compiler can create since it has to play it safe. Knowing roughly how your code is translated into assembly also helps when doing optimizations.
– Leo, Jul 14 '12 at 17:24

Since you didn't mention C or C++ in the list of languages you know, I would STRONGLY recommend learning them well before even thinking about assembly. C or C++ will give you all the basic concepts that managed languages hide from you, and you will understand most of the concepts mentioned on this page using one of the most important languages you could use in real-world projects. It is a real added value to your programming skills. Please be aware that assembly is used in very specific areas and is not nearly as useful as C or C++.

I would even go further and say that you should not dive into assembly before understanding how unmanaged languages work. It is almost a prerequisite.

You should learn assembly if you want to go even further down, to know exactly how each and every construct of the language is built. It is informative, but it is a whole different level of complexity.

Using Python (CPython) as an example, if you start getting weird crashes or poor performance, knowing how to debug C code can be very useful, as can knowing its ref-counting memory management method. This would also help you know when (or if) to write something as a C extension, and so on...
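To make the ref-counting remark a bit more concrete, here is a small sketch that watches CPython's reference count change. It uses `sys.getrefcount`, which reports a CPython implementation detail, so other Python implementations may behave differently. The helper name `refcount_delta` is invented for this example.

```python
import sys

def refcount_delta(extra_refs):
    """Return how much the reference count of one object grows
    after a list stores extra_refs additional references to it."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj] * extra_refs   # the list holds extra_refs references to obj
    after = sys.getrefcount(obj)
    del holder                    # dropping the list releases those references
    return after - before

print(refcount_delta(3))
```

Only the difference between the two readings is meaningful here; the absolute numbers include temporary references made by the call itself.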

To answer your question in this case, knowledge of assembly really wouldn't help an experienced Python developer (it's too many steps down in abstraction - anything done in Python would result in many many assembly instructions)

..but, if you are experienced with C, then knowing "the next level down" (assembly) would indeed be useful.

Similarly, if you are using CoffeeScript then it's (very) useful to know JavaScript. If you are using Clojure, knowledge of Java/JVM is useful.

This idea also works outside of programming languages: if you are using assembly, it's a good idea to be familiar with how the underlying hardware functions. If you are a web designer, it's a good idea to know how the web application is implemented. If you are a car mechanic, it's a good idea to have some knowledge of physics.

There is no definitive answer, since programmers are not all of one type. Do you NEED to know what lurks underneath? If so, then learn it. Do you merely want to learn it, out of curiosity? If so, then learn it. If it will have no practical benefit to you, then why bother?

Does one need a mechanic's level of knowledge just to drive a car? Does a mechanic need an engineer's level of knowledge just to work on a car? This is a serious analogy. A mechanic can be a very good, productive mechanic without diving to an engineer's depth of understanding of the vehicles he maintains. Same for music. Do you really need to plumb the complexities of melody, harmony and rhythm to be a good singer or player? No. Some exceptionally talented musicians can't read a lick of sheet music, let alone tell you the difference between Dorian and Lydian modes. If you want to, fine, but no, you don't need to.

If you are a web developer, assembly has no practical use that I can think of. If you are in embedded systems or something really specialized, then it might be necessary, but if it were, you'd know it.

Actually, what would probably be best for you would be a class that doesn't (to my knowledge) exist anywhere: It would be a class that combines a brief overview of machine/assembler language and storage addressing concepts with a tour through compiler construction, code generation, and runtime environments.

The problem is that with a high-level, far-away-from-the-hardware language like C# or Python you don't really appreciate the fact that every move you make turns into hundreds if not thousands of machine instructions, and you don't tend to comprehend how a few lines of a high-level language can cause vast amounts of storage to be accessed and modified. It's not so much that you need to know precisely what is going on "beneath the covers", but you need to have an appreciation for the scope of what's happening, and a general conception of the types of things that occur.

My answer to this question has evolved relatively recently. The existing answers cover what I would have said in the past. Actually, this is still covered by the top answer - the "appreciate the constructs in higher-level programming" point, but it's a special-case that I think is worth mentioning...

According to this Jeff Atwood blog post, which references a study, understanding assignment is a key issue in understanding programming. Learner programmers either understand that the notation just represents steps that the computer follows, and reason by the steps, or else get perpetually confused by misleading analogies to mathematical equations etc.

Well, if you understand the following from 6502 assembler...

LDA variable
CLC
ADC #1
STA variable

That really is just the steps. Then when you learn to translate that to an assignment statement...

variable = variable + 1;

You don't need a misleading analogy to a mathematical equation - you already have a correct mental model to map it to.

EDIT - of course if the explanation you get of LDA variable is basically ACCUMULATOR = variable, which is exactly what you get from some tutorials and references, you end up back where you started and it's no help at all.
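With that caveat in mind, readers who already read Python may still find it useful to step through the four instructions mechanically. The following is a toy interpreter, not a real 6502 emulator: the `run` function, the tuple encoding of instructions and the dictionary standing in for memory are all invented for this sketch, and only the four instructions used above are handled.

```python
def run(program, memory):
    """Execute a tiny, made-up subset of 6502-style instructions in order."""
    accumulator = 0
    carry = 0
    for op, arg in program:
        if op == "LDA":      # load a value from memory into the accumulator
            accumulator = memory[arg]
        elif op == "CLC":    # clear the carry flag
            carry = 0
        elif op == "ADC":    # add an immediate value plus carry, keeping 8 bits
            total = accumulator + arg + carry
            carry = 1 if total > 0xFF else 0
            accumulator = total & 0xFF
        elif op == "STA":    # store the accumulator back into memory
            memory[arg] = accumulator
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

increment = [("LDA", "variable"), ("CLC", None), ("ADC", 1), ("STA", "variable")]
memory = run(increment, {"variable": 41})
print(memory["variable"])
```

Each pass through the loop is one step, and the value only changes in memory when the final store runs.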

I learned 6502 assembler as my second language, the first being Commodore Basic, and I hadn't really learned much of that at the time - partly because there was so little to learn, but also because assembler just seemed so much more interesting back then. Partly the times, partly because I was a 14 year old geek.

I don't recommend doing what I did, but I wonder if studying a few very simple examples in a very simple assembler language might be a worthwhile preliminary to learning higher-level languages.

Writing in assembly will not give you a magic increase in speed: because of the amount of detail you have to manage (register allocation etc.), you will probably end up writing the most trivial algorithm possible.

Additionally, with modern processors (read: designed after the 1970s-80s), assembly will not give you enough detail to know what is going on (that is, on most processors). Modern PUs (CPUs and GPUs) are quite complex as far as instruction scheduling goes. Knowing the basics of assembly (or pseudo-assembly) will let you understand computer architecture books and courses, which provide further knowledge (caches, out-of-order execution, MMU etc.). Usually you don't need to know a complex ISA to understand them (MIPS 5 is quite popular IIRC).

for i from 0 to N
    for j from 0 to N
        for k from 0 to N
            A[i][j] += B[i][k] + C[k][j]

It may be 'good enough' for your purpose (if it is a 4x4 matrix it might be compiled to vector instructions anyway). However, there are quite important programs that work on massive arrays - how do you optimize them? If you write the code in assembly you might get a few % of improvement (unless you do what most people do and write it in the naive way too, underutilizing registers and loading/storing to memory constantly, and so end up with a slower program than in an HL language).

However, you can swap two lines and magically gain performance (why? I leave it as 'homework') - IIRC, depending on various factors, for large matrices it can be as much as 10x.

for i from 0 to N
    for k from 0 to N
        for j from 0 to N
            A[i][j] += B[i][k] + C[k][j]
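Before relying on a reordering like this, it is worth checking that both loop orders compute the same thing. Below is a minimal Python sketch of that check. The function names are invented for the example, the update is copied from the pseudocode as written, and integers are used so the comparison is exact. It makes no attempt to measure speed, since any gain depends on the language, compiler and hardware.

```python
def accumulate_ijk(B, C):
    """Apply the update with loops ordered i, j, k."""
    n = len(B)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                A[i][j] += B[i][k] + C[k][j]
    return A

def accumulate_ikj(B, C):
    """Apply the same update with the j and k loops swapped."""
    n = len(B)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                A[i][j] += B[i][k] + C[k][j]
    return A

B = [[1, 2], [3, 4]]
C = [[5, 6], [7, 8]]
print(accumulate_ijk(B, C) == accumulate_ikj(B, C))
```

In the second ordering the innermost loop walks along one row of A and one row of C, rather than jumping down a column of C on every step.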

That said, there is ongoing work on compilers that can do this (Graphite for GCC and Polly for anything using LLVM). They are even capable of transforming it into (sorry - I'm writing blocking from memory):

for i from 0 to N
    for K from 0 to N/n
        for J from 0 to N/n
            for kk from 0 to n
                for jj from 0 to n
                    k = K*n + kk
                    j = J*n + jj
                    A[i][j] += B[i][k] + C[k][j]
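The blocked loop nest can be checked the same way. This sketch assumes the matrix size N is a multiple of the tile size n (the pseudocode has the same implicit assumption), uses invented function names, and again only verifies that the result is unchanged, not that it runs faster.

```python
def accumulate_plain(B, C):
    """Reference version: loops ordered i, k, j with no tiling."""
    N = len(B)
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for k in range(N):
            for j in range(N):
                A[i][j] += B[i][k] + C[k][j]
    return A

def accumulate_blocked(B, C, n):
    """Same update with the k and j loops split into tiles of size n."""
    N = len(B)
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for K in range(N // n):
            for J in range(N // n):
                for kk in range(n):
                    for jj in range(n):
                        k = K * n + kk
                        j = J * n + jj
                        A[i][j] += B[i][k] + C[k][j]
    return A

size = 4
B = [[r * size + c for c in range(size)] for r in range(size)]
C = [[r - c for c in range(size)] for r in range(size)]
print(accumulate_blocked(B, C, 2) == accumulate_plain(B, C))
```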

To summarise - knowing the basics of an assembly language lets you dig into various 'details' of processor design, which in turn lets you write faster programs. It might be good to know the differences between RISC/CISC or VLIW/vector processor/SIMD/... architectures. However, I would not start with x86, as it tends to be quite complicated (possibly ARM too) - knowing what a register is etc. is IMHO sufficient for a start.

Normally it's VERY important for debugging purposes. What do you do when the system breaks in the middle of an instruction and the error makes no sense? It's much less of an issue with .NET languages so long as you're only using safe code--the system will almost always shield you from what's going on under the hood.

Here is the obligatory Car Analogy

High Level == Turn By Turn Directions

Imagine Python ( or any other high level language ) is like giving someone directions on how to drive from Point A to Point B, all at the same high level, turn by turn.

Turn by turn directions don't bother with details about what car you are driving, or any of the minutiae involved with starting the car or using the brakes or anything like that. They just tell you what to do ( Python ) and assume you ( Python Runtime ) can make the machine ( computer ) do what it needs to do to accomplish the goal ( getting to Point B ).

Assembly == Microscopic Burdensome Context Dependent Details

Imagine having to tell someone every minute detail about how to build a car before they can even start the engine.

Then, depending on the car, you have to explain how to start it ( keys vs buttons vs voice activation, etc ), when and how to put it in gear ( based on transmission type, manual, auto, cvt, etc ), and then when to select each gear and how to select it each time. When and how to press the accelerator pedal, in absolute values of how much. When and how to apply the brakes ( manual or power or anti-lock ), in absolute values of how much.

Now you have the finest-grained control over exactly what happens when, but you also have to be exactly correct to a level of detail that is extremely hard to manage. And that only works if you know the details of all these components ( CPU, FPU, etc. ), because ASM is machine specific.

But it would be extremely burdensome to communicate to another how to get from the same Point A to Point B in this manner.

Do we really need to know every detail of automotive engineering to drive a car?

Windows, Apple, AOL, MSN, Geocities, MySpace and Facebook all show us that no, we don't have to know any of this stuff to use a computer.

As a developer that does only General Programming with High Level Languages and doesn't write compilers or anything else that translates directly to machine code ( virtual machines with JIT compilers, bytecode runtimes with JIT compilers, etc ) knowing ASM isn't really going to do you much good.

Knowing how an internal combustion engine that is driven by an ECU you don't control works isn't going to make you a better driver because you can't directly affect the output of the system as a whole.

Pseudo Assembly

Learning how Python creates its byte code, or Java creates its byte code might be a better intermediary step because it is basically platform agnostic assembly code. These might be better places to start, you can write the byte code yourself and execute it in both cases.
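As an easy first step in that direction, Python's standard `dis` module will show the byte code CPython generates for a function. The function `increment` below is just an illustration, and the exact instruction names printed vary between Python versions, so treat the listing as something to explore rather than a fixed reference.

```python
import dis

def increment(variable):
    variable = variable + 1
    return variable

# Collect the name of each bytecode instruction CPython generated.
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
print(opnames)

# Human-readable listing, similar in spirit to an assembly dump.
dis.dis(increment)
```

Even this one-line assignment expands into separate load, add and store steps, which is the same shape you would see in real assembly.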

In short I think the answer is because you can do more if you learn assembly. Learning assembly grants access to the realms of embedded device programming, security penetration and circumvention, reverse engineering and system programming which are very hard to work in if you don't know assembler.

As for learning it to improve program performance, this is doubtful in applications programming. Most of the time there are so many things to focus on first before ever hitting this level of optimization like optimizing your i/o access on both disk and network, optimizing how you build the GUI, choosing the right algorithms, maxing out all your cores, running on the best hardware money can buy and switching from interpreted to compiled languages. Unless you're creating software for other end users, hardware is cheap compared to a programmer's hourly wage, especially with cloud availability.

Also, you have to weigh increased program execution speed against the readability of your code after you get hit by a bus, quit, or come back to the code base to change it a year after you wrote the last version.

Also learn Lisp; see Structure and Interpretation of Computer Programs, groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures. This video course will teach you everything you need to know, including algorithms (how to do everything based on a few primitive commands, one lisp primitive and some assembler provocatives).

Finally, if you must learn assembler, learn an easy one like ARM (it is also used in about 4 times more devices than x86).

Well, the answer is simply that the language you are using must be interpreted or compiled into assembler in the end, no matter the language or the machine.

The design of languages derives from the way the CPU works. More on low level programs, less on high level programs.

I will end by saying that it is not only a little assembler you need to know but also CPU architecture, which you learn by learning assembler.

Some examples: there are many Java programmers who do not understand why this does not work, and even fewer who know what happens when you run it.

String a = "X";
String b = "X";
if( a==b) return true;

If you knew a little assembler, you would always know that the content of a memory location is not the same thing as the number in the pointer variable that "points" to that location.
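The distinction between "same contents" and "same location" can be sketched in Python, where `==` compares contents and `is` compares identity (so the spelling differs from Java, but the idea is the same). Lists are used here instead of strings to keep string interning out of the picture; the variable names are only for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second object with the same contents
c = a           # a second name for the same object

same_contents = (a == b)    # compares what the two objects hold
same_object = (a is b)      # compares which object each name refers to
alias_is_same = (c is a)

print(same_contents, same_object, alias_is_same)
```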

Even worse, even in published books you will read something like "in Java, primitives are passed by value and objects by reference", which is completely incorrect. ALL arguments in Java are passed by value, and Java can NOT pass objects to functions, only pointers, which are passed by value.
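Python hands functions a copy of the reference in the same way, so the effect can be sketched there without any Java tooling. The function names `rebind` and `mutate` are made up for this example.

```python
def rebind(items):
    """Point the local name at a new list; the caller's list is untouched."""
    items = ["replaced"]
    return items

def mutate(items):
    """Change the object that the caller's reference also points to."""
    items.append("added")

original = ["x"]

rebind(original)
after_rebind = list(original)   # snapshot: still the original contents

mutate(original)
after_mutate = list(original)   # snapshot: the shared object was changed

print(after_rebind, after_mutate)
```

Reassigning the parameter only changes the function's own copy of the reference, while mutating through it changes the one object both sides can see.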

If you know assembler it's obvious what's going on; if not, it is so complicated to explain that most authors just give you a white lie.

Of course, the ramifications of these are subtle but can get you in real trouble later on. If you know assembler it's a non-issue; if not, you are in for a long, long night of debugging.

Your first paragraph is completely incorrect: languages aren't compiled into ASM, they are compiled into Machine Code. Interpreters don't compile into ASM either, they interpret the code or byte code and call functions or methods on precompiled machine code.
– Jarrod Roberson, Jul 13 '12 at 18:27

Every single thing you claim about Java is incorrect as well. Starting with String a = "X"; String b = "X"; if( a==b) return true; where a==b is in fact true because of something called String interning that the compiler does. All the other Java statements are wrong as well. Java doesn't have pointers, it has references, which are not the same thing. And none of that has anything to do with assembler in any fashion. Java passes primitives by value as well as references by value. Java doesn't have pointers so it can't pass them by anything. Again, all irrelevant to knowing ASM.
– Jarrod Roberson, Jul 13 '12 at 18:33

@FrankComputer last time I looked gcc compiled C/C++/fortran/java/ada/etc to internal byte code, and internal byte code to assembler. It then dispatches this assembler code to an assembler to convert it to machine code.
– richard, Jul 14 '12 at 21:06