When scraping HTML pages, it can be useful to select elements with CSS selectors. So I wrote a nifty function that converts a CSS selector into an XPath query, which can then be used to select the elements with DOMXPath.
The function can be found below:
<?php
/**
 * Convert a CSS selector into an XPath query
 *
 * @param string $selector The CSS selector
 * @return string
 */
function buildXPathQuery($selector)
{
    // make sure the selector is a string
    $selector = (string) $selector;

    // the CSS selector patterns; preg_replace() applies them in this order
    $cssSelector = array(
        // E F: Matches any F element that is a descendant of an E element
        '/(\w)\s+(\w)/',
        // E > F: Matches any F element that is a child of an element E
        '/(\w)\s*>\s*(\w)/',
        // E:first-child: Matches element E when E is the first child of its parent
        // (the whole element name is captured, because the replacement moves it)
        '/(\w+|\*):first-child/',
        // E + F: Matches any F element immediately preceded by an element E
        '/(\w)\s*\+\s*(\w)/',
        // E[foo]: Matches any E element with the "foo" attribute set (whatever the value)
        '/(\w)\[([\w\-]+)]/',
        // E[foo="warning"]: Matches any E element whose "foo" attribute value is exactly equal to "warning"
        '/(\w)\[([\w\-]+)\=\"([^\"]*)\"]/',
        // div.warning: HTML only. The same as DIV[class~="warning"]
        // (a selector that starts with a bare .warning is not handled; write *.warning instead)
        '/(\w+|\*)?\.([\w\-]+)+/',
        // E#myid: Matches any E element with id-attribute equal to "myid"
        '/(\w+)+\#([\w\-]+)/',
        // #myid: Matches any element with id-attribute equal to "myid"
        '/\#([\w\-]+)/'
    );

    // the XPath equivalents, in the same order as the patterns above
    $xPathQuery = array(
        '\1//\2',
        '\1/\2',
        '*[1]/self::\1',
        '\1/following-sibling::*[1]/self::\2',
        '\1 [ @\2 ]',
        '\1[ @\2 = "\3" ]',
        '\1[ contains( concat( " ", @class, " " ), concat( " ", "\2", " " ) ) ]',
        '\1[ @id = "\2" ]',
        '*[ @id = "\1" ]'
    );

    // return the query, anchored so it matches anywhere in the document
    return '//' . preg_replace($cssSelector, $xPathQuery, $selector);
}
?>
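To give an idea of how the result is meant to be used, here is a minimal sketch that runs such a query with PHP's DOM extension. The sample markup and the variable names are made up for illustration, and the query string is hardcoded to what buildXPathQuery('div.warning') returns, so the snippet runs on its own.

```php
<?php
// Sample markup, made up for this example.
$html = '<html><body>'
      . '<div class="big warning">Disk almost full</div>'
      . '<div class="note">All good</div>'
      . '<p class="warning">Not a div</p>'
      . '</body></html>';

// Load the markup; internal error handling keeps libxml quiet about sloppy HTML.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

// The XPath query that buildXPathQuery('div.warning') produces.
$query = '//div[ contains( concat( " ", @class, " " ), concat( " ", "warning", " " ) ) ]';

// Run the query and collect the text of every matching element.
$xpath = new DOMXPath($document);
$matches = array();
foreach ($xpath->query($query) as $node) {
    $matches[] = $node->textContent;
}

print_r($matches);
```

Only the first div ends up in $matches: the second div has a different class, and the paragraph carries the class but is not a div.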
In an upcoming post you'll see why I really needed it.