How to Trim Strings and Keep HTML Tags – Snippet Included

There are a lot of methods for trimming strings that take into account the number of chars versus the number of words, but things get a bit complicated when we need to keep the HTML tags at the same time. Don’t they?

Consistent appearance

When displaying content in archives, one of the challenges is making all the items look similar, or at least roughly the same size. This depends directly on the content of each post.

We can use the native WordPress function `wp_trim_words`, which returns the text trimmed to a specified number of words (it also strips all HTML tags). That does not always work as intended: some words are longer than others, so the length of the trimmed strings can vary a lot.
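
To make the variance concrete, here is a small PHP sketch. `trim_words_sketch()` is a hypothetical helper that simply keeps the first N whitespace-separated words. It only approximates word-based trimming and is not the WordPress implementation of `wp_trim_words`.

```php
<?php
// Hypothetical stand-in for word-based trimming (NOT the WordPress code):
// keep the first $num_words whitespace-separated words.
function trim_words_sketch( string $text, int $num_words ): string {
	$words = preg_split( '/\s+/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );
	return implode( ' ', array_slice( $words, 0, $num_words ) );
}

$short = trim_words_sketch( 'A cat sat on a mat by the door today', 5 );
$long  = trim_words_sketch( 'Internationalization considerations complicate straightforward implementations everywhere', 5 );

// Same number of words, very different number of characters.
echo mb_strlen( $short ), ' vs ', mb_strlen( $long ), "\n";
```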

How to solve this

So, to make the items we display more uniform, we want a function that counts characters rather than words. There are many ways to do that, but things get a bit complicated when we also need to keep the markup (for example, links and bold or italic text).
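
The two obvious shortcuts both fall short. The sketch below uses plain PHP and a made-up example string: cutting the raw HTML with `mb_substr()` counts the markup too and can stop in the middle of a tag, while stripping the tags first gives a clean cut but loses the link.

```php
<?php
// Made-up example fragment: a link followed by plain text.
$html = '<a href="https://example.com">Read the full story</a> and more text';

// 1. Cutting the raw HTML counts the markup and stops inside the <a> tag.
$naive = mb_substr( $html, 0, 20 );

// 2. Stripping the tags first gives a clean cut, but the link is gone.
$plain = mb_substr( strip_tags( $html ), 0, 20 );

var_dump( $naive, $plain );
```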

PHP function

This is a simple snippet you can use to achieve this. It calls the WordPress function `strip_shortcodes()`, so it is meant to run inside WordPress.

/**
 * Get short HTML, keeping specific tags.
 *
 * @param  string  $string     The initial string to be truncated.
 * @param  integer $max_len    The maximum number of visible chars (tags and the trailing string are not counted).
 * @param  string  $end_string Trailing string appended after the trimmed text.
 * @param  string  $allow_tags HTML tags to preserve, in the format accepted by strip_tags().
 * @param  bool    $break      Cut the last word so the text fills the exact length (defaults to false).
 * @return string
 */
function get_short_html( $string, $max_len = 80, $end_string = '...', $allow_tags = '<a><b><strong><em><i>', $break = false ) {
	if ( empty( $string ) || mb_strlen( $string ) <= $max_len ) {
		return $string;
	}

	// Prepare the string for the match.
	$string = strip_shortcodes( $string );
	$string = str_replace( array( "\r\n", "\r", "\n", "\t" ), ' ', $string ); // phpcs:ignore
	$string = preg_replace( '/\>/i', '> ', $string );
	$string = preg_replace( '/\</i', ' <', $string );
	$string = preg_replace( '/[\x00-\x1F\x7F]/u', '', $string );
	$string = str_replace( array( '&nbsp;', "\xC2\xA0" ), ' ', $string ); // Non-breaking spaces become regular spaces.
	$string = preg_replace( '/\s+/', ' ', $string );
	$string = preg_replace( '/\s\s+/', ' ', trim( strip_tags( $string, $allow_tags ) ) );
	$string = html_entity_decode( $string );

	// Check for HTML tags and plain text.
	// Capture every tag (including self-closed ones) as a separate item.
	$words_tags  = preg_split( '/(<[^>]+>)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE );
	$current_len = 0;
	$collection  = [];
	$opened_tags = [];
	if ( ! empty( $words_tags ) ) {
		foreach ( $words_tags as $item ) {
			if ( $current_len >= $max_len ) {
				// No need to continue.
				break;
			}
			if ( substr_count( $item, '<' ) && substr_count( $item, '>' ) ) {
				// This is a tag, let's collect it.
				$collection[] = $item;
				if ( substr_count( $item, '</' ) ) {
					// This is an ending tag, let's remove the opened one.
					array_pop( $opened_tags );
				} elseif ( substr_count( $item, '/>' ) ) {
					// This is a self-closed tag, nothing to do.
					continue;
				} else {
					// This is an opening tag, let's add it to the opened list.
					$t = explode( ' ', $item );
					array_push( $opened_tags, substr( $t[0], 1 ) );
				}
			} else {
				// This is a plain text, let's assess the length and maybe collect it.
				$words = preg_split( '/\s/i', $item, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE );
				if ( ! empty( $words ) ) {
					foreach ( $words as $word ) {
						// Add + 1 as spaces count too.
						$new_length = $current_len + mb_strlen( $word ) + 1;
						if ( $new_length <= $max_len ) {
							$collection[] = $word . ' ';
						} elseif ( true === $break ) {
							// Keep only as many chars as still fit (minus the trailing space).
							$diff = $max_len - $current_len - 1;
							if ( $diff > 0 ) {
								$collection[] = mb_substr( $word, 0, $diff ) . ' ';
							}
						}
						$current_len = $new_length;
						if ( $current_len >= $max_len ) {
							break;
						}
					}
				}
			}
		}
	}

	$string = implode( '', $collection );
	if ( ! empty( $opened_tags ) ) {
		// There were some HTML tags opened that need to be closed.
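		// Close them in reverse order (innermost first). The stored name may
		// still end with '>' (e.g. 'b>' for '<b>'), so normalize it first.
		foreach ( array_reverse( $opened_tags ) as $tag ) {
			$string .= '</' . rtrim( $tag, '>' ) . '>';
		}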
	}

	// One final round of preparing the returned string.
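	$string = preg_replace( '/<[^\/>][^>]*><\/[^>]+>/', '', $string ); // Drop empty tag pairs.
	// Move the spaces found before closing tags after them, so "bold </b>text" becomes "bold</b> text".
	$string = preg_replace( '/\s+((?:<\/[^>]+>)+)/', '$1 ', $string );
	$string = preg_replace( '/(\s+\,+)+/', ',', $string );
	$string = preg_replace( '/(\s+\.+)+/', '.', $string );
	$string = trim( $string );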

	// Maybe append the custom ending to the trimmed string.
	$string .= ( ! empty( $end_string ) ) ? ' ' . $end_string : '';

	return $string;
}

The function above lets you keep only the HTML tags you want (it works for both self-closed and container tags), append a trailing string (such as a "read more" text), and trim either to a fixed length or to a close length without breaking the last word (see the `$break` parameter).
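
The core idea can also be shown in isolation. What follows is a simplified, standalone sketch, not the function above: the hypothetical `truncate_html_sketch()` counts only the text outside tags, keeps a stack of opened tag names, and closes whatever is still open in reverse order. It assumes well-formed input without self-closed tags, stray `<` characters, or shortcodes, and it cuts mid-word.

```php
<?php
/**
 * Hypothetical sketch: trim the visible text of an HTML fragment to
 * $max_len chars and close the tags left open, innermost first.
 * Assumes well-formed markup with no self-closed tags.
 */
function truncate_html_sketch( string $html, int $max_len ): string {
	// Split into tag tokens and text tokens, keeping the tags.
	$tokens = preg_split( '/(<[^>]+>)/', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE );
	$out    = '';
	$len    = 0;
	$open   = array(); // Stack of currently open tag names.

	foreach ( $tokens as $token ) {
		if ( $len >= $max_len ) {
			break;
		}
		if ( '<' === $token[0] ) {
			$out .= $token;
			if ( '/' === $token[1] ) {
				array_pop( $open ); // Closing tag: the innermost open tag is done.
			} elseif ( preg_match( '/^<([a-z][a-z0-9]*)/i', $token, $m ) ) {
				$open[] = $m[1]; // Opening tag: remember its name.
			}
			continue;
		}
		// Plain text: keep only the chars that still fit.
		$take = min( mb_strlen( $token ), $max_len - $len );
		$out .= mb_substr( $token, 0, $take );
		$len += $take;
	}

	// Close whatever is still open, innermost first.
	foreach ( array_reverse( $open ) as $tag ) {
		$out .= '</' . $tag . '>';
	}
	return $out;
}

echo truncate_html_sketch( 'Some <b>bold and <i>italic</i></b> text', 16 ), "\n";
// Some <b>bold and <i>it</i></b>
```

Here the cut falls inside the `<i>` element, so both `</i>` and `</b>` are appended, innermost first. Using a stack is what keeps nested tags properly ordered.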