How to Trim Strings and Keep HTML Tags – Snippet Included
There are many ways to trim a string by its number of characters rather than its number of words, but things get a bit complicated when we also need to keep the HTML tags, don't they?
Consistent Appearance
When displaying content on archive pages, one of the challenges is to make all the items look similar, or at least have almost the same size. How large an item ends up depends directly on the content of each post.
We can use the native WordPress function wp_trim_words, which returns a substring containing a specified number of words. That does not always work as intended: some words are longer than others, so the length of the trimmed string can vary a great deal.
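To see why a word count alone gives uneven results, here is a minimal sketch in plain PHP. The helper `trim_to_words()` and both sample sentences are made up for this illustration; the helper only mimics the word-counting idea and is not the `wp_trim_words` implementation.

```php
<?php
/**
 * Hypothetical helper, for illustration only: keep the first $num_words words.
 */
function trim_to_words( $text, $num_words ) {
    $words = preg_split( '/\s+/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );
    return implode( ' ', array_slice( $words, 0, $num_words ) );
}

$short_words = trim_to_words( 'We go to the sea on a hot day', 5 );
$long_words  = trim_to_words( 'Internationalization requires considerable architectural preparation beforehand today', 5 );

// Both excerpts contain exactly five words, yet their lengths are far apart.
echo mb_strlen( $short_words ), "\n"; // 16
echo mb_strlen( $long_words ), "\n";  // 68
```

Two archive items trimmed this way would take up very different amounts of space, even though both respect the same word limit.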
How to Solve This?
To make the displayed items more uniform, we want a function that counts characters rather than words. There are many ways to do that, but things get complicated when we also need to keep the markup (for example, links, bold, or italic text).
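As a quick illustration of the problem, the sketch below cuts an HTML string with `mb_substr` alone. The sample markup is invented for this example.

```php
<?php
$html = 'Read <a href="/about-us">more about the team</a> and <strong>join us</strong>.';

// Character-based trimming knows nothing about markup.
$cut_inside_tag = mb_substr( $html, 0, 20 );
$cut_after_tag  = mb_substr( $html, 0, 30 );

echo $cut_inside_tag, "\n"; // Read <a href="/about
echo $cut_after_tag, "\n";  // Read <a href="/about-us">more
```

The first cut lands in the middle of the opening tag, and the second leaves the link open. Either result can break the layout of everything that follows it on the page.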
PHP Snippet
This is a simple snippet you can use to achieve this.
/**
 * Get short HTML, keeping specific tags.
 *
 * @param string  $string     The initial string to be truncated.
 * @param integer $max_len    The maximum number of chars for the returned string.
 * @param string  $end_string Trailing string.
 * @param string  $allow_tags Preserve HTML tags.
 * @param bool    $break      Break the last word to a fixed length (defaults to false).
 * @return string
 */
function get_short_html( $string, $max_len = 80, $end_string = '...', $allow_tags = '<a><b><strong><em><i>', $break = false ) {
    if ( empty( $string ) || mb_strlen( $string ) <= $max_len ) {
        return $string;
    }

    // Prepare the string for the match.
    $string = strip_shortcodes( $string );
    $string = str_replace( array( "\r\n", "\r", "\n", "\t" ), ' ', $string ); // phpcs:ignore
    $string = preg_replace( '/\>/i', '> ', $string );
    $string = preg_replace( '/\</i', ' <', $string );
    $string = preg_replace( '/[\x00-\x1F\x7F]/u', '', $string );
    // Turn non-breaking spaces (entity or UTF-8 character) into regular spaces.
    $string = str_replace( array( '&nbsp;', "\xC2\xA0" ), ' ', $string );
    $string = preg_replace( '/\s+/', ' ', $string );
    $string = preg_replace( '/\s\s+/', ' ', trim( strip_tags( $string, $allow_tags ) ) );
    $string = html_entity_decode( $string );

    // Split into HTML tags (including self-closed ones) and plain text.
    $words_tags  = preg_split( '/(<[^>]+>)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE );
    $current_len = 0;
    $collection  = [];
    $opened_tags = [];
    if ( ! empty( $words_tags ) ) {
        foreach ( $words_tags as $item ) {
            if ( $current_len >= $max_len ) {
                // No need to continue.
                break;
            }
            if ( substr_count( $item, '<' ) && substr_count( $item, '>' ) ) {
                // This is a tag, let's collect it.
                $collection[] = $item;
                if ( substr_count( $item, '</' ) ) {
                    // This is an ending tag, let's remove the opened one.
                    array_pop( $opened_tags );
                } elseif ( substr_count( $item, '/>' ) ) {
                    // This is a self-closed tag, nothing to do.
                    continue;
                } else {
                    // This is an opening tag, let's add its bare name to the opened list.
                    $t = explode( ' ', $item );
                    array_push( $opened_tags, trim( $t[0], '<>' ) );
                }
            } else {
                // This is plain text, let's assess the length and maybe collect it.
                $words = preg_split( '/\s+/', $item, -1, PREG_SPLIT_NO_EMPTY );
                if ( ! empty( $words ) ) {
                    foreach ( $words as $word ) {
                        // Add + 1 as spaces count too.
                        $new_length = $current_len + mb_strlen( $word ) + 1;
                        if ( $new_length <= $max_len ) {
                            $collection[] = $word . ' ';
                        } elseif ( true === $break ) {
                            // Cut the last word so the text stops at the limit.
                            $collection[] = mb_substr( $word, 0, $max_len - $current_len ) . ' ';
                        }
                        $current_len = $new_length;
                        if ( $current_len >= $max_len ) {
                            break;
                        }
                    }
                }
            }
        }
    }

    $string = implode( '', $collection );
    if ( ! empty( $opened_tags ) ) {
        // Some HTML tags are still open; close them, innermost first.
        foreach ( array_reverse( $opened_tags ) as $tag ) {
            $string .= '</' . $tag . '>';
        }
    }

    // One final round of preparing the returned string.
    $string = trim( $string );
    $string = preg_replace( '/<[^\/>][^>]*><\/[^>]+>/', '', $string );
    $string = preg_replace( '/(\s+\<\/+)+/', '</', $string );
    $string = preg_replace( '/(\s+\,+)+/', ',', $string );
    $string = preg_replace( '/(\s+\.+)+/', '.', $string );

    // Maybe append the custom ending to the trimmed string.
    $string .= ( ! empty( $end_string ) ) ? ' ' . $end_string : '';
    return $string;
}
The function above lets you keep only the HTML tags you want (both self-closed and container tags), append a trailing string (such as "read more"), and trim either to a fixed length or to a very close length without breaking the last word.
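The tag bookkeeping is the part that keeps the markup valid after the cut. As a simplified, standalone sketch of that idea (not the function above), the hypothetical helper `close_open_tags()` below tracks opening tags on a stack and closes whatever is still open, innermost first. It assumes well-nested markup and does not handle void elements written without a slash, such as `<br>`.

```php
<?php
/**
 * Hypothetical helper, for illustration only: append closing tags
 * for the elements that were left open in a truncated HTML string.
 */
function close_open_tags( $html ) {
    $open = array();
    // Group 1 is the optional leading slash, group 2 is the tag name.
    preg_match_all( '/<(\/?)([a-z][a-z0-9]*)\b[^>]*>/i', $html, $matches, PREG_SET_ORDER );
    foreach ( $matches as $m ) {
        if ( '/' === $m[1] ) {
            // Closing tag: the most recently opened element is done.
            array_pop( $open );
        } elseif ( '/>' !== substr( $m[0], -2 ) ) {
            // Opening tag (not self-closed): remember its name.
            $open[] = $m[2];
        }
    }
    // Close the remaining elements in reverse order of opening.
    foreach ( array_reverse( $open ) as $tag ) {
        $html .= '</' . $tag . '>';
    }
    return $html;
}

$fixed = close_open_tags( 'Read <a href="/about-us"><strong>more' );
echo $fixed, "\n"; // Read <a href="/about-us"><strong>more</strong></a>
```

Reversing the stack matters: closing the tags in the order they were opened would produce overlapping elements instead of properly nested ones.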