I have HTML strings stored in a database, and I want to extract only the plain text content, stripping out all the HTML tags. What is the best way to do this in PHP?
You can use PHP’s DOMDocument class to parse the HTML and extract the text content. Here’s how you can do it:
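<?php
require_once 'includes/connection.inc.php'; // provides the mysqli $connection

// Remove every element with the given tag name, including its contents.
function removeElementsByTagName($tagName, $document)
{
    $nodeList = $document->getElementsByTagName($tagName);
    // The node list is live, so walk it backwards while removing nodes.
    for ($nodeIdx = $nodeList->length; --$nodeIdx >= 0;) {
        $node = $nodeList->item($nodeIdx);
        $node->parentNode->removeChild($node);
    }
}

// Only the HTML column is needed.
$fetch_data = "SELECT content FROM emails";
$fetch_data_conn = mysqli_query($connection, $fetch_data);

libxml_use_internal_errors(true); // suppress warnings about malformed HTML
while ($row = mysqli_fetch_assoc($fetch_data_conn)) {
    $content = $row["content"];
    if ($content === null || trim($content) === '') {
        continue; // loadHTML() throws a ValueError on an empty string in PHP 8+
    }

    $doc = new DOMDocument();
    // loadHTML() assumes ISO-8859-1 unless the HTML declares a charset.
    // Assuming the stored HTML is UTF-8, encode non-ASCII characters as
    // numeric entities first so they are not garbled.
    $doc->loadHTML(mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();

    // strip_tags() would keep the text inside these elements, so drop them first.
    removeElementsByTagName('script', $doc);
    removeElementsByTagName('style', $doc);
    removeElementsByTagName('link', $doc);

    $textContent = strip_tags($doc->saveHTML());
    echo $textContent;
    echo "<hr>";
}
?>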
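This code parses the HTML with DOMDocument, removes the script, style and link elements, and then runs strip_tags on the serialized document to remove the remaining tags, leaving just the text content. The elements are removed first because strip_tags deletes only the tags themselves, not what is between them, so inline JavaScript and CSS would otherwise end up in the output. The result still contains HTML entities such as &amp;, which is what you want when echoing it into a page as above; if you need to store or process it as real plain text, decode it with html_entity_decode() first.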
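If you want to reuse the conversion outside the database loop, or test it on its own, the same idea can be wrapped in a function. This is a sketch and a variation rather than the answer's exact code: instead of serializing the document and calling strip_tags, it reads textContent from the root element, which already decodes entities such as &amp;, and it collapses runs of whitespace. The function name htmlToText and its $dropTags parameter are hypothetical, and it assumes the stored HTML is UTF-8 and that the mbstring extension is installed.

```php
<?php
// Hypothetical helper (a sketch, not the original answer's code): convert an
// HTML string to plain text. Assumes UTF-8 input and the mbstring extension.
function htmlToText(string $html, array $dropTags = ['script', 'style']): string
{
    // loadHTML() throws a ValueError for an empty string in PHP 8+.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed HTML
    // Without a declared charset, libxml reads the bytes as ISO-8859-1, so turn
    // every non-ASCII character into a numeric entity (e.g. &#233;) first.
    $doc->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove elements whose contents are code or styling, not readable text.
    foreach ($dropTags as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // The list is live, so iterate backwards while removing nodes.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // textContent concatenates all remaining text nodes, with entities decoded.
    $text = $root->textContent;

    // Collapse whitespace runs left behind by line breaks and indentation.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}
```

Because the returned text is decoded, characters such as < come back as literal characters, so escape it again before sending it to a browser, for example echo htmlspecialchars(htmlToText($row["content"])), "<hr>"; inside the loop.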