I've read a lot of examples on locking threads, but why should you lock them?
From my understanding, when you start threads without joining them, they compete with the main thread and with each other for resources and then execute, sometimes simultaneously, sometimes not.

No, I'm not asking about the GIL; I know its limitations in Python and am happy with them. The question is about locking threads with acquire() and release(), unrelated to the GIL (other than it having "lock" in the name).
–
MistahX Jun 18 '11 at 0:32


Retagged, as not exclusive to python at all.
–
Alex R. Jun 18 '11 at 0:40

Retagged as python; I am referring to the lock methods in the threading module, forgot to add that in.
–
MistahX Jun 18 '11 at 0:50


You say "lock functions like lock() and acquire in the threading module", but there are no such functions. There is a Lock() factory that returns a Lock object, which has acquire() and release() methods. This has nothing to do with "locking a thread", it's just a basic mutex.
–
Nicholas Knight Jun 18 '11 at 1:28

2 Answers

A lock allows you to force multiple threads to access a resource one at a time, rather than all of them trying to access the resource simultaneously.

As you note, usually you do want threads to execute simultaneously. However, imagine that you have two threads and they are both writing to the same file. If they try to write to the same file at the same time, their output is going to get intermingled and neither thread will actually succeed in putting into the file what it wanted to.
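As a minimal sketch of that idea (the file name, tags, and helper function here are invented for illustration, not from the original post): each thread takes the lock around one complete write, so lines never interleave.

```python
import threading, tempfile, os

log_lock = threading.Lock()

def write_lines(path, tag, n):
    for i in range(n):
        # Hold the lock for one whole open/write, so this line
        # cannot be intermingled with the other thread's output.
        with log_lock:
            with open(path, "a") as f:
                f.write(f"{tag}-{i}\n")

path = os.path.join(tempfile.mkdtemp(), "shared.log")
threads = [threading.Thread(target=write_lines, args=(path, tag, 100))
           for tag in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # 200 complete, un-mangled lines
```

Without the lock, the two threads' writes could still happen to come out whole on some systems, but nothing guarantees it; the lock makes it a guarantee.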

Now maybe this problem won't come up all the time. Most of the time, the threads won't try to write to the file all at once. But sometimes, maybe once in a thousand runs, they do. So maybe you have a bug that occurs seemingly at random and is hard to reproduce and therefore hard to fix. Ugh!

Or maybe... and this has happened at the company I work for... you have such bugs but don't know they're there because hardly any of your customers have more than 4 CPUs. Then they all start buying 16-CPU boxes... and your software runs as many threads as there are CPU cores, so now there are 4 times as many threads and suddenly you're crashing a lot or getting the wrong results.

So anyway, back to the file. To prevent the threads from stepping on each other, each thread must acquire a lock on the file before writing to it. Only one thread can hold the lock at a time, so only one thread can write to the file at a time. The thread holds the lock until it is done writing to the file, then releases the lock so another thread can use the file.

If the threads are writing to different files, this problem never arises. So that's one solution: have your threads write to different files, and combine them afterward if necessary. But this isn't always possible; sometimes, there's only one of something.

It doesn't have to be files. Suppose you are trying to simply count the number of occurrences of the letter "A" in a bunch of different files, one thread per file. You think, well, obviously, I'll just have all the threads increment the same memory location each time they see an "A." But! When you go to increment the variable that's keeping the count, the computer reads the variable into a register, increments the register, and then stores the value back out. What if two threads read the value at the same time, increment it at the same time, and store it back at the same time? They both start at, say, 10, increment it to 11, store 11 back. So the counter's 11 when it should be 12: you have lost one count.
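A short sketch of the fix, using threading.Lock so the read-increment-store becomes atomic (the sample texts and function name are made up for illustration):

```python
import threading

count = 0
lock = threading.Lock()

def count_letter_a(text):
    global count
    for ch in text:
        if ch == "A":
            # count += 1 is read-modify-write, three separate steps;
            # holding the lock makes them one indivisible unit.
            with lock:
                count += 1

texts = ["BANANA", "ALFALFA", "AARDVARK"]
threads = [threading.Thread(target=count_letter_a, args=(t,)) for t in texts]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # 3 + 3 + 3 = 9
```

Note that even in CPython, `count += 1` compiles to several bytecode operations, so a thread switch can land in the middle of it; the GIL does not save you here.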

Acquiring locks can be expensive, since you have to wait until whoever else is using the resource is done with it. This is why Python's Global Interpreter Lock is a performance bottleneck. So you may decide to avoid using shared resources at all. Instead of using a single memory location to hold the number of "A"s in your files, each thread keeps its own count, and you add them all up at the end (similar to the solution I suggested with the files, funnily enough).
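That lock-free alternative can be sketched like this: each thread writes only its own slot of a results list, so nothing is shared and no lock is needed (names and sample data are illustrative):

```python
import threading

def count_a_local(text, results, i):
    # Each thread keeps its own count in its own slot;
    # no shared counter means no race and no lock.
    results[i] = text.count("A")

texts = ["BANANA", "ALFALFA", "AARDVARK"]
results = [0] * len(texts)
threads = [threading.Thread(target=count_a_local, args=(t, results, i))
           for i, t in enumerate(texts)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results)  # combine the per-thread counts at the end
print(total)
```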

Okay, this makes sense, but the locks I'm talking about lock threads, not files? So what do they do?
–
MistahX Jun 18 '11 at 0:54


@MistahX: No, they lock whatever you decide they lock. You use them as you see fit to prevent multiple threads from doing the same thing at the same time. They're primitives, you build from them what you need. If you wrap access to a file in a lock, you're effectively "locking" that file.
–
Nicholas Knight Jun 18 '11 at 1:23

You don't lock a thread; a thread acquires or releases a lock on some shared resource. I suppose a thread that is waiting for a lock to be released could be said to be "locked" but that isn't common parlance.
–
kindall Jun 18 '11 at 1:26
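The answer below walks through a two-thread example whose setup code is not reproduced in this copy of the thread. A minimal sketch consistent with the numbers in the walkthrough might look like the following; the starting values x = 8 and y = 10 are inferred from the arithmetic, not taken from the original post.

```python
import threading

# Inferred starting values: a = 2 * y = 20 implies y = 10,
# and 22 + x = 30 implies x = 8.
x = 8
y = 10

def task():
    global y
    a = 2 * y
    b = y / 5
    y = a + b + x
    print(y)

t1 = threading.Thread(target=task)
t2 = threading.Thread(target=task)
t1.start()
t2.start()
t1.join()
t2.join()
```

Depending on how the two threads interleave, the final value of y (and what each thread prints) varies from run to run.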

Now, you might know that either one of these threads could complete and print first. You would expect to see both output 30.

But they might not.

y is a shared resource, and in this case the bits that read and write y are part of what is called a "critical section" and should be protected by a lock. The reason is that you aren't guaranteed whole units of work: either thread can gain or lose the CPU at any time.

Think about it like this:

t1 is happily executing code and it hits

a = 2 * y

Now t1 has a = 20 and stops executing for a while. t2 becomes active while t1 waits for more CPU time. t2 executes:

a = 2 * y
b = y / 5
y = a + b + x

At this point the global variable y = 30.

t2 stops for a bit and t1 picks up again. It executes:

b = y / 5
y = a + b + x

Since y was 30 when b was set, b = 6 and y is now set to 34.

The order of the prints is non-deterministic as well: you might get the 30 first or the 34 first.

Locking necessarily makes this section of code serial: only one thread at a time. But if your entire program is sequential, you shouldn't be using threads anyway. The idea is that you gain speedup in proportion to the percentage of your code that can execute outside locks and run in parallel. This is (one reason) why using threads on a 2-core system doesn't double performance for everything.

The lock itself is also a shared resource, but it has to be: once one thread acquires the lock, all other threads trying to acquire the *same* lock will block until it is released. Once it is released, the first waiting thread to acquire it will in turn block all the other waiting threads.
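To tie it together, here is a sketch of protecting that critical section with threading.Lock; the starting values x = 8 and y = 10 are again inferred from the walkthrough's arithmetic, not from the original post.

```python
import threading

x = 8   # illustrative values, inferred from the walkthrough
y = 10
lock = threading.Lock()

def task():
    global y
    # The whole read-modify-write on y is now one indivisible unit:
    # a thread that loses the CPU mid-section still holds the lock,
    # so the other thread cannot enter until it finishes.
    with lock:
        a = 2 * y
        b = y / 5
        y = a + b + x
        print(y)

t1 = threading.Thread(target=task)
t2 = threading.Thread(target=task)
t1.start()
t2.start()
t1.join()
t2.join()
# Whichever thread runs first computes 30; the second then computes 74.
```

The order in which the two threads acquire the lock is still non-deterministic, but the two updates can no longer interleave, so the final value of y is always the same.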