DomCrawler filterXPath for emails

  domcrawler, filter, symfony, xpath

In my project I am trying to use filterXPath on emails. I fetch an email via IMAP and load its HTML body into a DomCrawler:

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler();
$crawler->addHtmlContent($mail->textHtml); // mail HTML content, UTF-8

Now to my issue: I only want the plain text of the mail body, but it should keep all line breaks, spaces, etc. The result should look exactly like the mail does, just as plain text without HTML (still with \n, \r etc.).

For that reason I tried using $crawler->filterXPath('//body/descendant-or-self::*/text()') to get every text node inside the mail.

However, my test mail contains HTML like this:

            <a href="mailto:[email protected]">
                <span style="color:#0563C1">[email protected]</span>
            <a href="">
                <span style="color:#0563C1"></span>

In my mail this looks like [email protected] · (in one single line).

With my filterXPath call I get multiple nodes, which results in the following (multiple lines):

[email protected]

I know that a carriage return (\r) in the HTML is probably the problem, but since I can't change the HTML in the mail, I need another solution. As mentioned before, in the mail it is only a single line.
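To make the effect easier to see, here is a minimal, self-contained sketch that runs the same XPath with PHP's built-in DOMDocument and DOMXPath instead of DomCrawler. The markup and the address user@example.com are made-up placeholders modelled on the snippet above, not the real mail. Which whitespace-only nodes survive parsing may depend on the libxml version, so no exact output is claimed.

```php
<?php
// Hypothetical sample markup; user@example.com is a placeholder address.
$html = "<html><body><p><a href=\"mailto:user@example.com\">\r\n"
      . "    <span style=\"color:#0563C1\">user@example.com</span>\r\n"
      . "</a></p></body></html>";

$doc = new DOMDocument();
@$doc->loadHTML($html); // "@" hides warnings about sloppy mail HTML

$xpath = new DOMXPath($doc);
$dump = [];
foreach ($xpath->query('//body/descendant-or-self::*/text()') as $node) {
    // json_encode() escapes control characters, so whitespace becomes visible.
    $dump[] = json_encode($node->nodeValue);
}
echo implode(PHP_EOL, $dump), PHP_EOL;
```

Every text node is reported separately, so line breaks and indentation that sit between the tags in the source can show up as extra entries even though a mail client renders everything on one line.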

Please keep in mind that my solution has to work for every mail. I do not know what the mail HTML looks like, and it can change every time, so I need a generic solution.

I also tried strip_tags; it does not change the result at all.

My current approach:

$crawler = new Crawler();
$crawler->addHtmlContent($mail->textHtml); // mail HTML content, UTF-8

$text = "";
foreach ($crawler->filterXPath('//body/descendant-or-self::*/text()') as $element) {
    $part = trim($element->textContent);
    if ($part) {
        $text .= "|" . $part . "|\n"; // pipes make whitespace visible
    }
}
echo $text;

|[email protected]|
| |
| |
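One direction that might work generically is to stop treating each text node as its own line and instead imitate how a mail client lays out HTML: collapse whitespace inside text nodes to a single space, and start a new line only at <br> and at block-level elements. The following is a rough sketch under those assumptions, not a complete HTML-to-text converter. The function name htmlToPlainText, the list of block elements and the sample markup are all hypothetical. It uses the built-in DOM classes so it runs on its own; with DomCrawler the root node could be taken from $crawler->filterXPath('//body')->getNode(0).

```php
<?php
// Hypothetical helper (not part of DomCrawler): flattens a DOM subtree to text.
function htmlToPlainText(DOMNode $root): string
{
    // Assumed, non-exhaustive list of elements that start a new line.
    $blocks = ['p', 'div', 'tr', 'li', 'ul', 'ol', 'table', 'blockquote',
               'h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
    $out = '';

    $walk = function (DOMNode $node) use (&$walk, &$out, $blocks): void {
        if ($node instanceof DOMText) {
            // Line breaks and indentation in the source count as one space.
            $out .= preg_replace('/[ \t\r\n\f]+/', ' ', $node->nodeValue);
            return;
        }
        if (!$node instanceof DOMElement) {
            return; // comments, processing instructions, ...
        }
        $name = strtolower($node->nodeName);
        if ($name === 'br') {
            $out .= "\n";
            return;
        }
        if ($name === 'script' || $name === 'style') {
            return; // never visible in a rendered mail
        }
        $isBlock = in_array($name, $blocks, true);
        $out .= $isBlock ? "\n" : '';
        foreach ($node->childNodes as $child) {
            $walk($child);
        }
        $out .= $isBlock ? "\n" : '';
    };
    $walk($root);

    $out = preg_replace('/ *\n */', "\n", $out);   // no spaces around line breaks
    $out = preg_replace('/ {2,}/', ' ', $out);     // collapse runs of spaces
    $out = preg_replace('/\n{3,}/', "\n\n", $out); // at most one blank line
    return trim($out);
}

// Usage with made-up markup; the address and phone text are placeholders.
$doc = new DOMDocument();
@$doc->loadHTML('<html><body><p>Hello,</p><p>
    <a href="mailto:user@example.com">
        <span>user@example.com</span>
    </a>
    <a href="">
        <span>Tel. 123</span>
    </a><br>Bye</p></body></html>');

$text = htmlToPlainText($doc->getElementsByTagName('body')->item(0));
echo $text, PHP_EOL;
```

Only ASCII whitespace is collapsed here, so non-breaking spaces stay untouched, which is roughly what a renderer does as well. The block list and the blank-line handling would probably need tuning against real mails.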

Source: Symfony Questions