This list works together with a wave file by acting as a cutlist, so the desired parts are 1->2, 4->6, 12->15, and so on.

If the distance between the End-Time-In-Seconds of the previous element and the Start-Time-In-Seconds of the current element is below a threshold in seconds (I call it Pausendauer, "pause duration"), I merge those two elements. For example, if the threshold is 3 seconds, the list becomes

Start-Time-In-Seconds;End-Time-In-Seconds
1;6
12;15
...

If the distance between Start-Time-In-Seconds and End-Time-In-Seconds is below a threshold in seconds (I call it Minimallänge, "minimum length"), I discard that sample. For example, if the threshold is 4 seconds, the list becomes

Start-Time-In-Seconds;End-Time-In-Seconds
1;6
...

What could an algorithm look like that iterates (intelligently) through all combinations of Minimallänge and Pausendauer to aim at a certain number of entries? Example:

The target number of entries is 3. Given that target, the algorithm should iterate (intelligently) through all combinations of Minimallänge and Pausendauer to output something like this:

Start-Time-In-Seconds;End-Time-In-Seconds
1;12
18;20
50;100

That is the entire output. Note that I did not append "..." here, as the final list is to consist of exactly three entries.
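The two rules can be sketched as follows (a minimal Python sketch of my own; the function name, the tuple representation, and the merge-before-discard order are assumptions, not part of the original tooling):

```python
# Minimal sketch of the two rules above; the function name, the tuple
# representation, and the merge-before-discard order are assumptions.

def filter_cutlist(chunks, pausendauer, minimallaenge):
    """chunks: sorted list of (start, end) times in seconds."""
    merged = []
    for start, end in chunks:
        # Rule 1 (Pausendauer): merge when the gap to the previous
        # element is below the threshold.
        if merged and start - merged[-1][1] < pausendauer:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # Rule 2 (Minimallänge): discard samples shorter than the threshold.
    return [(s, e) for s, e in merged if e - s >= minimallaenge]

print(filter_cutlist([(1, 2), (4, 6), (12, 15)], 3, 4))  # [(1, 6)]
```

With Pausendauer 3 the first two entries merge to 1;6, and with Minimallänge 4 the 3-second entry 12;15 is discarded, reproducing both examples above.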

Some background: the wave file contains several interviews recorded continuously, with pauses in between. A VAD (voice activity detector) gave me the areas where it presumes voice to be. Since I know the total number of conversations (e.g. 3; usually more, which is why this makes sense), my goal is to determine them automatically. The cutlist is the raw output of my VAD, which I want to turn into a usable cutlist for ffmpeg.

Hi, on this site the community will often edit your question to improve it and make it a better fit for the site. Rolling back such a change is generally frowned upon. In this case that final sentence was probably edited because it gives the impression that you are not interested in the reasoning behind solving this problem but just want a solution; as the FAQ (programmers.stackexchange.com/faq) says: "we are looking for questions that inspire answers that explain “why” and “how”."
–
Joris Timmermans, Mar 7 '13 at 13:34

The problem is a bit under-specified. For example, suppose the ranges are 2;12, 14;24, 26;36, 38;48 and the desired number of entries is three. What is the desired outcome?
–
Eric Lippert, Mar 7 '13 at 13:56

@EricLippert This will not happen, as the wave file contains several interviews recorded continuously with pauses in between. A VAD gave me the areas where it presumes voice to be. As I know the total number of conversations (e.g. 3), my goal is to filter them automatically. The cutlist is the raw output of my VAD.
–
user1505034, Mar 7 '13 at 14:04

3 Answers

Let us consider a series of audio chunks separated by pauses, and let L_i be the length of chunk i, and P_i the pause between chunk i and chunk i+1. So we have:

[ chunk 0, L_0 = 15s ]..(P_0 s of silence)..[ chunk 1, L_1 = 7s ]...

If we merge chunks wherever P_i < P, we will get anywhere from a minimum of 1 chunk (when P > max(P_i)) to a maximum of N (when P <= min(P_i)).

If we reject chunks of length less than L, the pauses will merge: by discarding chunk C_j, the pause between C_(j-1) and C_(j+1) becomes P_(j-1) + L_j + P_j, and therefore the number of superchunks for any given P will increase.

The number of chunks for any given L will then monotonically decrease with increasing P, from a maximum of C_L, the number of chunks longer than L.

Plotted as a surface over (L, P), the resulting chunk count therefore falls off towards large L and large P.

So the area of interest will be sort of L-shaped (not necessarily one "cell" wide or tall), and seen from above, it could look like this:

#
##
###
###
#######
########
#########

So, given that one "exploration" of the array is going to cost O(N), you could start with a suitable value of (L, P), e.g. (0, 0), and "walk" the array, increasing L until you meet two adjacent cells, one above and one at or below the desired count.

# 0
## 1
### 2
### 3
######A 4
####98765
#######

(Here, 0...9 and A label the iterations. Note that at iteration 6 you also check the cell "above" the 6, just as 4 is "above" the 5, so those steps cost double.)

The cost decreases from O(L'·P') (where L' is the maximum length you consider and P' is the maximum pause) to O(L' + P').
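The walk can be sketched as a staircase search over the grid (an illustrative Python sketch, assuming the chunk count is monotonically non-increasing in both L and P; `count_fn` is a hypothetical callback that runs the discard+merge filter and returns the number of resulting chunks, and this version starts from the corner with the fewest chunks rather than (0, 0), but the contour-tracing idea is the same):

```python
# Staircase search over a grid where count_fn(L, P) is assumed to be
# monotonically non-increasing in both L (minimum length) and P (pause
# threshold). count_fn is a hypothetical callback running the filter.

def walk_boundary(count_fn, target, l_max, p_max, step=1.0):
    hits = []          # (L, P) pairs achieving exactly the target count
    l, p = 0.0, p_max  # corner with the fewest chunks
    while l <= l_max and p >= 0:
        c = count_fn(l, p)
        if c == target:
            hits.append((l, p))
        if c <= target:
            p -= step  # too few entries (or exact): relax the pause threshold
        else:
            l += step  # too many entries: raise the minimum length
    return hits
```

Each step moves one cell along either axis, so the walk costs O(L'/step + P'/step) filter evaluations instead of exploring the whole O(L'·P') grid.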

But there is a major catch: what happens if an "intra-conversation" pause is longer than an "inter-conversation" pause?

I mean, if the interval between interviews is longer than any interval within the interviews, then all of the above is redundant: just hunt for the longest pauses (one fewer than the number of interviews), and those will be the pauses between interviews.

What happens on the other hand if there is one "internal" pause that is longer than the space between interviews? Then, the above algorithm (actually, any length-based algorithm I can think of, unless the average length of an interview is known and reliable, and the extra pause is not too near to the beginning or end of the interview) will choose that pause as an interview splitter, and whatever is before (or after) will be assigned to the adjacent interview.

To address this issue, I think you need to do a deeper inspection, maybe classifying chunks based on their frequency distribution. You might still misattribute the first or last chunk of an interview if the interviewer is the same in two adjacent interviews and there is no reliable "script" (e.g. the interviews are always closed by the interviewer, etc.).

I think this is an interesting question, and I don't have the solution. However, bear with me: this is going to be long, and it is not an implementation or even an answer (I deserve the downmodding in advance) but a rephrasing of the question with additional remarks and observations that attempt to condition the problem, which may lead you onto the path of finding an implementation.

Remark: I wrote this before the explanation of the actual problem was added in the comments, so this may be overly generic, but I'll still post it.

Consider an ordered list of non-overlapping time chunks with a start and end time (where end time > start time).

We have a given filter with parameters pause_threshold and minimal_length that, in order:

Merges all time chunks t0 and t1 where t1.starttime - t0.endtime < pause_threshold.
This can be done in one pass; the merges do not affect the distances between the remaining time chunks.

Discards all time chunks t0 where t0.endtime - t0.starttime < minimal_length.
This can also be done in one pass, but I am assuming here that it must happen after the merge pass, because that pass definitely affects the lengths of time chunks.

The actual question is to devise an algorithm for the following: for a given finite time-chunk list L and count c, determine pause_threshold and minimal_length such that after the two passes the list contains exactly c entries.

Observations:

A valid upper bound for pause_threshold is slightly larger than the largest gap between two adjacent time chunks in L. This is easy to see: using this value in pass 1 would merge all the chunks, resulting in only one entry, which is already overkill.

The total set of pause_thresholds to try is finite: it is the set of all unique distances between time chunks in L.

Similarly, the minimal_length parameter is also bounded. If you choose it slightly larger than the length of the longest chunk, all chunks will be discarded, so that is an upper bound for the minimal_lengths to try. A bounded finite set of minimal_lengths to try is the set of unique chunk lengths in L, plus 0 (the "no discards" value).

Now you know the problem is bounded: you can simply try all possible combinations from the two sets and see if any of them arrives at a solution (i.e. the number of entries in the resulting list after applying the filter equals c).
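This brute force can be sketched in Python (illustrative only; `apply_filter` restates the merge-then-discard passes so the sketch is self-contained, the epsilon nudges implement "slightly larger than" each candidate, and the length candidates are recomputed per merge result because merging changes chunk lengths):

```python
# Brute force over the two finite candidate sets derived above.

def apply_filter(chunks, pause_threshold, minimal_length):
    merged = []
    for start, end in chunks:
        if merged and start - merged[-1][1] < pause_threshold:
            merged[-1] = (merged[-1][0], end)  # pass 1: close small gaps
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= minimal_length]  # pass 2

def solve_brute_force(chunks, c, eps=1e-9):
    gaps = [b[0] - a[1] for a, b in zip(chunks, chunks[1:])]
    # Candidates: 0 ("no effect") plus "slightly larger than" each
    # distinct gap; length candidates come from the merged result.
    for p in [0.0] + sorted({g + eps for g in gaps}):
        merged = apply_filter(chunks, p, 0.0)  # merge pass only
        for m in [0.0] + sorted({e - s + eps for s, e in merged}):
            if len(apply_filter(chunks, p, m)) == c:
                return p, m
    return None  # no combination yields exactly c entries
```

With at most n pause candidates, n length candidates, and an O(n) filter per trial, this naive search is O(n^3) in the worst case.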

This analysis does not reveal whether an answer is always possible, and it's trivially easy to prove that it is not the case in general: just consider a starting list L with fewer than c entries.

That observation leads to another angle of attack on the algorithm: the inductive attack.

If L has fewer than c entries, no solution is possible.

If L has exactly c entries, it is already trivially correct, so you want no merges or discards. A valid (but not unique) solution is pause_threshold = 0 and minimal_length = 0.

If L has n > c entries, then n-c entries will have to be eliminated through merging or discarding. A merge and a discard have exactly the same effect: each reduces the number of chunks in the list by 1. Therefore you need n-c merges, n-c discards, or a mix of merges and discards totalling n-c. This is where it becomes tricky, because you may not have unique pause lengths between chunks, nor unique chunk lengths (before or after merges).

The reason it gets tricky with non-unique lengths or pauses is because you won't have a unique mapping of threshold values to the number of items eliminated. For example consider a chunk list with lengths [1 3 3 5 7]. Pick minimal_length value 2 and you eliminate 1 value. Pick 4 and you eliminate 3. There is no value you can pick to eliminate just 2, so you cannot solve it with discards alone.
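The example can be checked directly (an illustrative Python snippet; `eliminated` is a hypothetical helper counting the chunks a given threshold would discard):

```python
# The tie problem from the example above: with chunk lengths [1, 3, 3, 5, 7],
# no discard threshold eliminates exactly 2 chunks.

lengths = [1, 3, 3, 5, 7]

def eliminated(minimal_length):
    return sum(1 for l in lengths if l < minimal_length)

print(eliminated(2))  # 1 chunk removed (the length-1 chunk)
print(eliminated(4))  # 3 chunks removed (1, 3 and 3)
# Any threshold in (1, 3] removes 1 chunk, any in (3, 5] removes 3;
# the tied pair of 3s means the count jumps straight from 1 to 3.
```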

...
I'm going to have to cut it short here, but I hope this can be the start of constructive community work on an interesting question!

I usually just ignore the problem with duplicates (non-unique pauses), because in reality the data are basically continuous (audio samples). The fact that they are discretized is a bit of a distraction.
–
nneonneo, Mar 7 '13 at 17:28

Suppose the input list has n entries and the target count is k. There are then n-1 gaps between adjacent entries. The Pausendauer determines which gaps are closed and which ones are not, and so there are at most n-1 useful possibilities for the Pausendauer (values between "useful" possibilities don't change the set of closed gaps, so they need not be tested).

After the Pausendauer step closes some number of gaps, the Minimallänge step discards some number of segments. Because we have a specific target of k output segments, Minimallänge must be set such that it discards all but the k longest segments. Therefore, you can find Minimallänge simply by looking up the length of the kth-largest segment and setting Minimallänge just below that value (e.g. that length minus one, for integer seconds).

Therefore, we have an algorithm that runs in at most O(n^2 log n) time: it tests each of the O(n) Pausendauer possibilities, and for each Pausendauer it sorts the segments by length and finds the kth-largest segment to set Minimallänge.

Observe that this means that, for any Pausendauer, there is always a Minimallänge that produces the desired number of outputs (ignoring ties). Therefore, you may want to slap on an additional constraint to minimize the parameters, e.g. find the solution (P, M) that minimizes P + M, or something like that.
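Putting the two observations together, the search might be sketched like this (illustrative Python; `find_parameters` is my naming, the keep-if-at-least-the-kth-largest-length rule replaces the "minus one" phrasing above, and the function returns None when ties make the target unreachable):

```python
# For each useful Pausendauer (one per distinct gap, plus 0), merge the
# gaps it closes, then derive the Minimallänge from the k-th largest
# merged segment. Ties make the final count check fail, in which case
# the next Pausendauer candidate is tried.

def find_parameters(chunks, k, eps=1e-9):
    if not chunks:
        return None
    gaps = [b[0] - a[1] for a, b in zip(chunks, chunks[1:])]
    for p in [0.0] + sorted({g + eps for g in gaps}):
        # Pausendauer pass: close every gap smaller than p.
        merged = [list(chunks[0])]
        for start, end in chunks[1:]:
            if start - merged[-1][1] < p:
                merged[-1][1] = end
            else:
                merged.append([start, end])
        if len(merged) < k:
            continue
        # Minimallänge: keep segments at least as long as the k-th largest.
        lengths = sorted((e - s for s, e in merged), reverse=True)
        m = lengths[k - 1]
        kept = [seg for seg in merged if seg[1] - seg[0] >= m]
        if len(kept) == k:  # fails only when ties keep too many segments
            return p, m
    return None
```

On Jonathan Rich's counterexample below (three length-1 segments, k = 2), every candidate either ties on length or over-merges, so this sketch correctly reports no solution.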

Observe that this means that, for any Pausendauer, there is always a Minimallänge that produces the desired number of outputs. It's trivial to produce a set that proves that that isn't true: [1-2,4-5,7-8], P=2, C=2
–
Jonathan Rich, Mar 7 '13 at 17:58

@JonathanRich: Ties don't count. (See my comment to MadKeithV's answer). Note that I assert ties don't matter for this application because the data is essentially continuous in nature. (I'm sorry this wasn't clear in the original post).
–
nneonneo, Mar 7 '13 at 17:59

@nneonneo what would be k in your code?
–
user1505034, Mar 8 '13 at 14:03