I have a Python method that takes a list of tuples of the form (string, float) and returns a list of strings that, if combined, would not exceed a certain limit. I am not splitting sentences to hit the output length exactly, but making sure to stay within one sentence's length of the desired output length.

For example:
s: [('Where are you',1),('What about the next day',2),('When is the next event',3)]

@atlantis: Your variable name "max_length" AND your "would not exceed a certain limit" AND your "make sure the output word length does not exceed max_length" contradict what you say in comments. Please edit your question so that it is consistent with what you really want to do.
– John Machin, Nov 8 '11 at 20:54

So, are you looking for the shortest set of strings that has at least the given number of words? That is what your example seems to be doing. Also, what is the point of the number in the pairs? Must we choose the first string before any of the others?
– 101100, Nov 8 '11 at 21:43

Also, in your examples, the output is slightly bigger than the set maximum number of words. If you want the number of words to always be less than the limit (but still the maximum possible), then just put yield after checking against the limit:
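A minimal sketch of what checking before yielding might look like. This is not the answerer's original snippet: the name `take_within_limit` and the parameter names are assumptions, and it treats the limit as inclusive (the total never exceeds it). It also stops as soon as the limit is hit exactly, so no further item is examined.

```python
def take_within_limit(s_tuples, max_length):
    """Yield strings in order while the running word count stays <= max_length.

    Hypothetical helper: s_tuples is a list of (string, float) pairs and
    the float is ignored here.
    """
    tot_len = 0
    for text, _score in s_tuples:
        tot_len += len(text.split())
        # Check against the limit first, so an item that would push the
        # total past the limit is never yielded.
        if tot_len > max_length:
            break
        yield text
        # Nothing more can fit once the limit is reached exactly.
        if tot_len == max_length:
            break


s = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(' '.join(take_within_limit(s, 5)))
```

With a limit of 5 this keeps only the first sentence, because adding the second would bring the total to 8 words.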

In your final snippet, you will never get exactly the max_length.
– John Machin, Nov 8 '11 at 21:21

@JohnMachin You are right. I'll edit the answer. It's usually good practice to check the solution on the 'corner' cases (which I didn't do here).
– ovgolovin, Nov 8 '11 at 21:23

Now re-read the first sentence of your answer and apply it to your final snippet.
– John Machin, Nov 8 '11 at 21:47

@JohnMachin Sorry, but it's already implemented. The problem in the OP's question was that he calculated l+=... and did nothing with it on the current iteration; its use was deferred until the next iteration, where he compared it with max_length. In my code these operations come together.
– ovgolovin, Nov 8 '11 at 21:56

Sorry*2, but l+=... is irrelevant. The issue is that in your final snippet, when tot_len == max_length, it does not break after yielding, it goes around once more (if the input is not depleted) and uselessly calculates the length of the next item. This behaviour qualifies as "defer the breaking out of loop if the maximum length is reached to the next iteration"
– John Machin, Nov 8 '11 at 22:21

Your code doesn't stop when the limit is reached. "max_length" is a bad name ... it is NOT a "maximum length", your code allows it to be exceeded (as in your first example) -- is that deliberate? "l" is a bad name; let's call it tot_len. You even keep going when tot_len == max_length. Your example shows joining with a space but your code doesn't do that.
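A minimal sketch of a loop that addresses these points, assuming the intended behavior is to stop at the first sentence that brings the word count up to or past the target. The function name `join_until_target` and the parameter name `target_len` are hypothetical, not taken from the question:

```python
def join_until_target(s_tuples, target_len):
    """Join whole strings with spaces until the word count reaches target_len.

    Hypothetical helper: the result may exceed target_len by less than one
    sentence, because sentences are never split.
    """
    chosen = []
    tot_len = 0
    for text, _score in s_tuples:
        chosen.append(text)
        tot_len += len(text.split())
        # Stop in the same iteration in which the target is reached,
        # including the case tot_len == target_len.
        if tot_len >= target_len:
            break
    return ' '.join(chosen)


s = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(join_until_target(s, 5))
```

For a target of 5 this returns the first two sentences (8 words), since the second sentence cannot be split.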

Yes, max_length does control the number of words in the output, but not precisely. I am not splitting sentences to preserve the output length, but making sure to stay within a sentence length from the desired output length. For an output length of 5, I can't split the second sentence.
– atlantis, Nov 8 '11 at 20:38

When you post an answer, it should be an actual attempt to answer the question. As it is now, it should be a comment. If this is just some kind of placeholder answer, you should never do this.
– Jeff Mercado, Nov 8 '11 at 20:41

@atlantis okay i see, so it's not the maximum length, but it indicates which string will be your last one.
– aeroNotAuto, Nov 8 '11 at 20:42

@JeffMercado well i'm still continuing to type the answer i was going to provide, but i wanted to see if i could get more information from the op while i finished it up. i don't have the ability to comment yet. i guess i should just not answer in the future, sorry.
– aeroNotAuto, Nov 8 '11 at 20:43

Understandable. Please try to complete your answer then now that you have the information you wanted. Otherwise it will end up being deleted.
– Jeff Mercado, Nov 8 '11 at 20:45

If NumPy is available, the following solution using a list comprehension works.

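import numpy as np

def join_until_limit(s_tuples, max_length):  # hypothetical wrapper name
    # Cumulative word counts of the strings, in order.
    s_cumlen = np.cumsum([len(s[0].split()) for s in s_tuples])
    # Get the index of the last clause to append.
    append_until = np.sum(s_cumlen < max_length)
    return ' '.join([s[0] for s in s_tuples[:append_until+1]])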

For clarity: s_cumlen contains the cumulative sums of the word counts of your strings.
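To make the cumulative sums concrete, here is a sketch of the same idea that swaps NumPy for `itertools.accumulate` from the standard library. The function names are hypothetical, and the input is the example list from the question:

```python
from itertools import accumulate


def cumulative_word_counts(s_tuples):
    """Running total of word counts, one entry per (string, float) pair."""
    return list(accumulate(len(text.split()) for text, _score in s_tuples))


def join_by_cumulative_count(s_tuples, max_length):
    """Pure-Python counterpart of the cumulative-sum approach (hypothetical name)."""
    cumulative = cumulative_word_counts(s_tuples)
    # Count the strings whose running total is still below max_length ...
    append_until = sum(total < max_length for total in cumulative)
    # ... and include one more, so the output reaches or passes max_length.
    return ' '.join(text for text, _score in s_tuples[:append_until + 1])


s = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(cumulative_word_counts(s))          # [3, 8, 13]
print(join_by_cumulative_count(s, 5))     # Where are you What about the next day
```

Only the first running total (3) is below 5, so the slice keeps the first two strings.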