I want to get elements from a website using XPath expressions. I'm using the built-in Microsoft library MSXML2 to do it, but it doesn't have a getElementByXpath() method.
In this old thread I found a very good function, getXPathElement(), shown below, that gets an element by XPath. It works fine, but only with full XPath expressions, and I need to find elements with an XPath that matches on text.
For example, to get the element that contains the text HTML Editors from the URL https://www.w3schools.com/html/, the full XPath is "/html/body/div[4]/div/div/a[3]",
but a text-based alternative could be "//a[text()[contains(.,'HTML Editors')]]".
With this second XPath, the function fails. Is there a way to evaluate this kind of XPath expression?
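To make the goal concrete, here is a minimal sketch of what the text-based XPath is meant to select, written as a plain DOM loop rather than a real XPath evaluation. The name getAnchorByText and its parameters are hypothetical and not part of the original code. It is hard-wired to <a> elements and uses innerText, so unlike text()[contains(., ...)] it also matches text inside child elements. It is not a general XPath engine.

```vba
' Hypothetical helper (an assumption, not from the original thread):
' returns the first <a> element whose text contains sText, or Nothing.
Public Function getAnchorByText(sText As String, oDoc As Object) As Object
    Dim oLink As Object

    Set getAnchorByText = Nothing

    ' Walk every <a> element in the document, in document order
    For Each oLink In oDoc.getElementsByTagName("a")
        ' vbBinaryCompare keeps the match case-sensitive, like XPath contains()
        If InStr(1, oLink.innerText, sText, vbBinaryCompare) > 0 Then
            Set getAnchorByText = oLink
            Exit Function
        End If
    Next oLink
End Function
```

Called as getAnchorByText("HTML Editors", html) with the html document from Main, it returns Nothing when no link matches, so the result should be checked before reading innerText.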
BTW: I know that Selenium has that option, but from what I've seen, it requires installing the Selenium driver in a roundabout way, since there is no direct binding for VBA, and I'd like to avoid extra installations if possible.
This is my current code:
Sub Main()
    Dim url As String
    Dim oHttp As New MSXML2.XMLHTTP60
    Dim elem As HTMLBaseElement

    ' Download the page synchronously
    url = "https://www.w3schools.com/html/"
    oHttp.Open "GET", url, False
    oHttp.send

    ' Load the response into an MSHTML document
    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText

    Set elem = getXPathElement("/html/body/div[4]/div/div/a[3]", html)
    ' getXPathElement returns Nothing when no element matches
    If Not elem Is Nothing Then Debug.Print elem.innerText
End Sub

Public Function getXPathElement(sXPath As String, objElement As Object) As HTMLBaseElement
    Dim sXPathArray() As String
    Dim sNodeName As String
    Dim sNodeNameIndex As String
    Dim sRestOfXPath As String
    Dim lNodeIndex As Long
    Dim lCount As Long

    ' Split the xpath statement; element 1 is the first step, e.g. "div[4]"
    sXPathArray = Split(sXPath, "/")
    sNodeNameIndex = sXPathArray(1)
    If Not InStr(sNodeNameIndex, "[") > 0 Then
        ' No predicate: take the first child with this name
        sNodeName = sNodeNameIndex
        lNodeIndex = 1
    Else
        ' Numeric predicate: "div[4]" -> name "div", index 4
        sXPathArray = Split(sNodeNameIndex, "[")
        sNodeName = sXPathArray(0)
        lNodeIndex = CLng(Left(sXPathArray(1), Len(sXPathArray(1)) - 1))
    End If

    ' Drop the leading "/" and the step just parsed
    sRestOfXPath = Right(sXPath, Len(sXPath) - (Len(sNodeNameIndex) + 1))
    Set getXPathElement = Nothing

    ' Count down through same-named children until the requested one is reached
    For lCount = 0 To objElement.ChildNodes().Length - 1
        If UCase(objElement.ChildNodes().Item(lCount).nodeName) = UCase(sNodeName) Then
            If lNodeIndex = 1 Then
                If sRestOfXPath = "" Then
                    ' Last step: this child is the result
                    Set getXPathElement = objElement.ChildNodes().Item(lCount)
                Else
                    ' More steps remain: recurse into this child
                    Set getXPathElement = getXPathElement(sRestOfXPath, objElement.ChildNodes().Item(lCount))
                End If
            End If
            lNodeIndex = lNodeIndex - 1
        End If
    Next lCount
End Function