How do I truncate an HTML string without breaking the HTML code?

Sometimes on a website you just want to show the first few hundred characters of a text as an introduction and link to the full text. But simply using PHP's substr() function is likely to break the HTML code or cut words in half. The PHP function below lets you trim an HTML string while keeping the markup intact and words complete. The code is from the CakePHP framework.
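The general idea can be sketched as follows, assuming a tag-aware approach: count only the visible text, copy tags through untouched, remember which tags are still open, and close them after the ending marker. This is a Python illustration, not the CakePHP implementation. The names `truncate_html`, `ending`, `exact` and the void-tag list are assumptions chosen to mirror the discussion. For simplicity, entities such as `&amp;` are counted as raw characters, `>` inside attribute values is not supported, and a word split across a tag boundary may be partly kept.

```python
import re

# Split HTML into tags ("<...>") and the text runs between them.
TOKEN_RE = re.compile(r"(<[^>]*>)")
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")
# Elements that never take a closing tag (assumed list, not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


def truncate_html(html, length, ending="...", exact=False):
    """Keep at most `length` visible characters of `html`, append `ending`,
    and close any tags left open. Tags do not count toward `length`."""
    tokens = [t for t in TOKEN_RE.split(html) if t]
    if sum(len(t) for t in tokens if not t.startswith("<")) <= length:
        return html  # already short enough, return unchanged

    out, open_tags, remaining = [], [], length
    for token in tokens:
        if token.startswith("<"):
            m = TAG_RE.match(token)  # None for comments, doctypes, etc.
            if m:
                closing, name = m.group(1) == "/", m.group(2).lower()
                if closing:
                    # Forget the most recent matching open tag.
                    for i in range(len(open_tags) - 1, -1, -1):
                        if open_tags[i] == name:
                            del open_tags[i]
                            break
                elif name not in VOID_TAGS and not token.rstrip(">").rstrip().endswith("/"):
                    open_tags.append(name)
            out.append(token)  # tags are copied whole, never cut
            continue
        if len(token) <= remaining:
            out.append(token)
            remaining -= len(token)
            continue
        # This text run crosses the limit: cut it inside the text only.
        piece = token[:remaining]
        if not exact and not token[remaining].isspace():
            space = piece.rfind(" ")  # last space in *text*, never in a tag
            piece = piece[:space] if space != -1 else ""  # drop the partial word
        out.append(piece)
        break

    out.append(ending)
    out.extend(f"</{name}>" for name in reversed(open_tags))
    return "".join(out)
```

For example, `truncate_html("<p>This is a <b>bold</b> statement about HTML</p>", 15)` gives `<p>This is a <b>bold</b>...</p>`. Because the word-boundary search only looks at text runs, a space inside a tag can never be chosen as the cut point.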

I just ported this over to C# for a project. I think there might be a bug in your ‘exact’ logic, though. Consider the case where your truncated HTML ends with an EMBED tag.

In this case, the last space is *within* the EMBED tag, so your ‘exact’ code would cut the tag in half and produce invalid HTML.
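The failure is easy to reproduce, assuming the ‘exact’ step simply cuts at the last space anywhere in the already-truncated string, as the comment describes. The markup below is a hypothetical stand-in for the EMBED example. It is not the commenter's actual HTML.

```python
# Hypothetical truncated string that ends with an EMBED tag containing spaces.
truncated = 'Some intro text <embed src="clip.swf" width="400" />'

# Word-boundary step that ignores markup: cut at the last space anywhere.
naive = truncated[:truncated.rfind(" ")]
print(naive)  # Some intro text <embed src="clip.swf" width="400"
```

The result ends with an unterminated `<embed` tag, which is exactly the invalid HTML described above.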

I’m trying to implement a regex that finds spaces that are not between < and > characters, and then truncate at the last match instead. Let me know if you have any better ideas (or if I’m wrong altogether). =)
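One way to express that regex idea is sketched below, assuming tags never contain a literal `>` inside attribute values. A space is treated as inside a tag when scanning forward reaches a `>` before any `<`. A negative lookahead excludes those spaces. The helper name `cut_at_last_text_space` is hypothetical. Any tags left open after the cut still have to be closed separately.

```python
import re

# Match a space only if it is NOT followed by "...>" with no "<" in between,
# i.e. only spaces that sit outside a tag.
SPACE_OUTSIDE_TAGS = re.compile(r" (?![^<>]*>)")


def cut_at_last_text_space(truncated):
    """Cut at the last space that is outside any tag (hypothetical helper)."""
    last = None
    for m in SPACE_OUTSIDE_TAGS.finditer(truncated):
        last = m  # keep the final match
    return truncated if last is None else truncated[:last.start()]
```

Applied to `'Some intro text <embed src="clip.swf" width="400" />'`, this returns `'Some intro text'`, leaving the EMBED tag out instead of cutting it. Note that a tag with no closing `>` at all (one already cut in half) is not recognised as a tag by this pattern.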

I just ran into an issue because I used spaces inside my tags (for example, between attributes). One of those spaces was treated as the last space in the text, so the tag was cut right in the middle. Moreover, the broken tag was not in the list of open tags.
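A complementary safeguard for this case is sketched below, assuming the cut can still land inside a tag. It drops any tag fragment left at the very end of the truncated string, so a half-written tag is never emitted (and never needs to be tracked as open). The helper name `drop_partial_tag` is hypothetical.

```python
import re

# A "<" with no ">" after it, up to the end of the string, is a cut tag.
PARTIAL_TAG_AT_END = re.compile(r"<[^>]*$")


def drop_partial_tag(truncated):
    """Remove a tag fragment left at the end of a truncated string (hypothetical helper)."""
    return PARTIAL_TAG_AT_END.sub("", truncated)
```

For example, `drop_partial_tag('Read the <a href="/full" class="more')` returns `'Read the '`, while complete markup such as `'<p>fine</p>'` is returned unchanged.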