Parse HTML using PHP

January 3, 2011

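I found this function useful for collecting the text inside every occurrence of a given tag in an HTML string. Pass 1 as the third argument to parse the input as strict XML instead of HTML. Thanks to the original provider, www.phpro.org.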

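function getTextBetweenTags($tag, $html, $strict = 0)
{
    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** discard white space (set before loading so it takes effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** collect parser warnings for broken markup instead of printing them ***/
    $previous = libxml_use_internal_errors(true);

    /*** load the html into the object ***/
    if ($strict == 1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** drop the collected warnings and restore the old error setting ***/
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }
    /*** return the results ***/
    return $out;
}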

$html = '<body>
<h1>Heading</h1>
<a href="http://phpro.org">PHPRO.ORG</a>
<p>paragraph here</p>
<p>Paragraph with a <a href="http://phpro.org">LINK TO PHPRO.ORG</a></p>
<p>This is a broken paragraph
</body>';

$content = getTextBetweenTags('a', $html);

foreach( $content as $item )
{
echo $item.'<br />';
}
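
The function above returns only the text inside each tag. If you also need the link target, the same DOM approach can read attributes. Here is a minimal sketch of that idea; getLinks and $sample are names I made up for illustration, not part of the original phpro.org code, and it assumes you only care about the href attribute of a tags.

```php
<?php
// Hypothetical helper (not from the original post): returns the text and
// the href attribute of every <a> element in an HTML string.
function getLinks($html)
{
    $dom = new DOMDocument;

    // Collect parser warnings instead of printing them, since real-world
    // HTML is often broken; restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = array();
    foreach ($dom->getElementsByTagName('a') as $link) {
        $out[] = array(
            'text' => $link->nodeValue,
            // getAttribute() returns an empty string when href is missing
            'href' => $link->getAttribute('href'),
        );
    }
    return $out;
}

$sample = '<body>
<a href="http://phpro.org">PHPRO.ORG</a>
<p>Paragraph with a <a href="http://phpro.org">LINK TO PHPRO.ORG</a></p>
</body>';

foreach (getLinks($sample) as $link) {
    echo $link['text'] . ' => ' . $link['href'] . "\n";
}
```

Each entry is a small array with 'text' and 'href' keys, so the loop prints one line per link with its text followed by its target.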


You are currently reading Parse HTML using php at ARP's Web Blog.
