s represents start time
d represents duration
w represents word
(the number of r tag is NOT fixed and changes from row to row of xml_table)

The idea now is to sift through each row, and within each XML, calculate the longest consecutive duration when a 'SIL' or 's' appears as a in the w attribute and then to return this in a new table as longest_pause (i.e longest consecutive SIL/s duration) with word_id and word_data also.

So in the above example xml we have three consecutive periods where the longest_pause can occur where the total durations are 100 (100), 900 (350+550) and 350 (200 + 50 + 100) and therefore the longest_pause is 900 so 900 would be returned.

I added a ID field that is used in a recursive CTE to walk through the nodes and calculating a running sum where w is SIL or s. Then fetching the longest duration from the CTE using TOP(1) ... ORDER BY.

If you instead want the row in xml_table with the longest duration you can do like this.

The recursive CTE part works the same as before but but for multiple rows at the same time and it is getting the value for duration from the XML directly using the column node that is incremented for each iteration. The query against the CTE uses row_number() to find the longest duration for each row.

Have you considered using something like python instead?

You can query the SQL to get the data, then use regular expressions to extract the values from the XML, calculate the value wanted, then insert it back into the results table.

I recently did something slightly similar and decided doing the processing in python was a much easier way to do it if that's possible for you