What I am using:
C#, Selenium (v 4.20), NUnit, Visual Studio
The Scenario:
I am trying to assert that all the text on a given web page is visible and is also the correct text. I have a handful of pages to verify, all of them containing very large amounts of text. I can locate the element for each paragraph like any other element and assert that its .Text matches what I expect. However, this is incredibly time consuming, and it becomes hard to maintain when those elements are subject to change.
Is there a better way of doing this? For example, could I get the page source text and compare it to a file of expected text saved in my solution? That seems like a somewhat 'heavy' operation, but I don't know of any other way.
I have tried asserting each element holding each paragraph of text. While this works for the most part, it is incredibly time consuming. Another issue is that text from the same paragraph or section can appear across separate elements, making some areas difficult to assert properly. I expect there must be a better way of doing this.
Use Selenium's displayed check (the Displayed property in the C# bindings; is_displayed in Python) to collect all the elements that are visible. Then read .Text on those elements to gather all the visible text. Split the visible text into individual sentences, split your required text into individual sentences, and check that each required sentence exists within the visible sentences. If every required sentence is found, the page passes; otherwise it fails.
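A minimal NUnit sketch of that approach, assuming an IWebDriver created elsewhere (e.g. in a [SetUp] method) and an expected-text file in the solution; the file name expected-text.txt is a placeholder:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using NUnit.Framework;
using OpenQA.Selenium;

[TestFixture]
public class PageTextTests
{
    private IWebDriver driver; // assumed to be initialised in a [SetUp] method

    [Test]
    public void AllExpectedSentencesAreVisible()
    {
        // Collect the text of every displayed element on the page. Parent and
        // child elements will report overlapping text, but that is harmless
        // because we only do set-membership checks below.
        var visibleText = driver.FindElements(By.XPath("//body//*"))
            .Where(e => e.Displayed)
            .Select(e => e.Text)
            .Where(t => !string.IsNullOrWhiteSpace(t));

        // Normalise both sides the same way so text split across several
        // elements still matches the expected file.
        var visibleSentences = new HashSet<string>(
            SplitSentences(string.Join(" ", visibleText)));

        var expectedSentences = SplitSentences(File.ReadAllText("expected-text.txt"));

        foreach (var sentence in expectedSentences)
        {
            Assert.That(visibleSentences, Does.Contain(sentence),
                $"Expected sentence not visible: {sentence}");
        }
    }

    // Crude sentence splitter: break on terminal punctuation, collapse whitespace.
    private static IEnumerable<string> SplitSentences(string text) =>
        text.Split(new[] { '.', '!', '?' }, System.StringSplitOptions.RemoveEmptyEntries)
            .Select(s => Regex.Replace(s, @"\s+", " ").Trim())
            .Where(s => s.Length > 0);
}
```

Note that Selenium's .Text property already returns only rendered (visible) text, so reading the body element's .Text in a single call is a faster alternative to walking every element, at the cost of losing per-element detail.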
This shouldn't be very time consuming since it is completely automated, but anything using Selenium is going to be slower than a plain HTTP fetch plus an HTML parser (requests + Beautiful Soup in Python; HttpClient + HtmlAgilityPack would be the rough C# analogue) because you are driving a real browser. If you are checking a lot of pages, I'd recommend first running the same text comparison against the raw HTML, and only once a page has passed that check running the Selenium visibility check as a second pass. This will be a lot faster assuming the majority of the pages you are checking fail the first check.
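Since the question is C#, a rough analogue of that pre-check might use HttpClient and the HtmlAgilityPack NuGet package; this sketch simply tests whether every expected sentence occurs in the page's flattened text:

```csharp
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class FastTextCheck
{
    private static readonly HttpClient Client = new HttpClient();

    // Returns true if every expected sentence occurs in the raw page text.
    public static async Task<bool> PageContainsAllAsync(
        string url, string[] expectedSentences)
    {
        var html = await Client.GetStringAsync(url);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText flattens the document to its text content; it does NOT
        // account for CSS visibility - that is what the Selenium pass is for.
        var pageText = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        return expectedSentences.All(s => pageText.Contains(s));
    }
}
```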
Make sure you implement rate limiting if you are accessing the same website repeatedly.
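A minimal way to throttle, reusing the hypothetical FastTextCheck helper above; the one-second interval is an arbitrary placeholder to tune per site:

```csharp
using System;
using System.Threading.Tasks;

public static class RateLimitedRunner
{
    public static async Task CheckAllAsync(string[] urls, string[] expectedSentences)
    {
        foreach (var url in urls)
        {
            bool passedFastCheck =
                await FastTextCheck.PageContainsAllAsync(url, expectedSentences);

            if (passedFastCheck)
            {
                // ...run the slower Selenium visibility check for this page here...
            }

            // Fixed-interval throttle so repeated requests don't hammer the site.
            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}
```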