Thiết kế website giá rẻ

Question

I am writing a script that opens the following page in Chrome:
https://isfinder.biotoul.fr/. The script has to open the “TOOLS” dropdown menu and select “Blast”. It then uploads a file in FASTA format, changes the parameter Evalue to 0.01 under the heading “Algorithm parameters”, and runs BLAST. Once the page has loaded completely, it saves the result page to a file named blast_results_TA373.html in a specific folder. This part works.

I was able to download this page. I can also get the links in the table for each query node under the heading “Sequence producing significant alignment”, for example the link for ISEc1 and many others, but I am unable to get the contents of those links.

I want to save each linked page to a file named, for example, jobtitle_query_node_ISEc5.html in a specific folder, which I am unable to do. Please help me achieve this task and tell me where I made a mistake.
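
The target file name can be assembled from three pieces: the job title, the query node, and the IS name carried in the `name` parameter of each link. The following is only a minimal sketch, assuming the links have the `ficheIS.php?name=...` form listed further down in this question. The helper name `result_filename` and the underscore-separated pattern are illustrative choices, not something the site prescribes.

```python
from urllib.parse import urlparse, parse_qs


def result_filename(job_title, query_id, href):
    """Build a name such as TA373_NODE_195_ISEc5.html (hypothetical pattern)."""
    # parse_qs turns the query string 'name=ISEc5' into {'name': ['ISEc5']}
    params = parse_qs(urlparse(href).query)
    # Fall back to a placeholder when the link carries no 'name' parameter
    is_name = params.get("name", ["NoIdentifier"])[0]
    return f"{job_title}_{query_id}_{is_name}.html"


print(result_filename(
    "TA373",
    "NODE_195",
    "https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5",
))
```

Parsing the query string instead of splitting on `=` keeps the name intact even if a link ever carries more than one parameter.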

I wrote code to extract the query identifiers and the links, which gives output like this:

Query Identifier: 123
Query Identifier: 123
Query Identifier: 123
Query Identifier: 2699
Query Identifier: 2699
Query Identifier: 2699
Link: https://www-is.biotoul.fr/under_construct.php
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISVsa13
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISSe1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISCysp7
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=IS1H
Link: https://www-is.biotoul.fr/scripts/ficheIS.ph
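
The output above lists identifiers and links separately, so it does not show which link belongs to which query. One possible way to keep them together is to split the saved page source at each "Query=" marker and collect the links found in each piece. This is a sketch under assumptions: that every query's hits appear after its own "Query=" marker and before the next one, and that the hrefs appear in the page source as absolute URLs (relative ones would need to be joined to the site's base URL). The function name `links_by_query` and the sample HTML are made up for illustration.

```python
import re

QUERY_MARK = re.compile(r"Query=")
TAG = re.compile(r"<[^>]+>")
FICHE_HREF = re.compile(r'href="([^"]*ficheIS\.php\?name=[^"]+)"')


def links_by_query(page_source):
    """Map each query identifier to the ficheIS links that follow it."""
    grouped = {}
    # The first piece is everything before the first "Query=", so skip it
    for chunk in QUERY_MARK.split(page_source)[1:]:
        words = TAG.sub(" ", chunk).split()   # strip tags, split on whitespace
        if not words:
            continue
        query_id = words[0]                   # first token after "Query="
        grouped.setdefault(query_id, []).extend(FICHE_HREF.findall(chunk))
    return grouped


# Invented sample that only mimics the assumed layout of the results page
sample = (
    '<b>Query=</b> 123 length=500<pre>'
    '<a href="https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5">ISEc5</a>'
    '<a href="#BL_ORD_ID_1">42</a></pre>'
    '<b>Query=</b> 2699 length=900<pre>'
    '<a href="https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1">ISEc1</a></pre>'
)
print(links_by_query(sample))
```

In the script, `webBrowser.page_source` (or the saved blast_results file) would take the place of `sample`.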

I also tried to filter out the unwanted links but could not do so completely. The links I am interested in look like this:

Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISVsa13
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISSe1
Link: https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISCysp7
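
Since all of these links share the same path and a `name` parameter, an allow-list test may be simpler than excluding every unwanted pattern one by one. A minimal sketch, assuming the links of interest always end in `/scripts/ficheIS.php` and carry a non-empty `name`; the helper `is_fiche_link` is a hypothetical name.

```python
from urllib.parse import urlparse, parse_qs


def is_fiche_link(href):
    """Return True only for .../scripts/ficheIS.php?name=<something> links."""
    if not href:                      # get_attribute("href") may return None
        return False
    parts = urlparse(href)
    return (
        parts.path.endswith("/scripts/ficheIS.php")
        and bool(parse_qs(parts.query).get("name"))
    )


hrefs = [
    "https://www-is.biotoul.fr/under_construct.php",
    "https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc5",
    "https://www-is.biotoul.fr/scripts/ficheIS.php?name=ISEc1",
    "https://www-is.biotoul.fr/general_information.php",
]
wanted = [h for h in hrefs if is_fiche_link(h)]
print(wanted)
```

Only the two `ficheIS.php?name=...` entries pass the test; the other two are dropped without having to be listed explicitly.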

Here is my code:

# Necessary webdrivers need to be imported
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import StaleElementReferenceException
import os
import time

# This is for Chrome. If Firefox is needed,
# use webdriver.Firefox() instead.

webBrowser = webdriver.Chrome()

# This will open the ISfinder site in Chrome
webBrowser.get('https://www-is.biotoul.fr/index.php')

# Find and click on the 'TOOLS' link
tools_link = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.LINK_TEXT, 'TOOLS')))
tools_link.click()

# Find and click on the 'Blast' link
blast_link = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Blast')))
blast_link.click()

# Locate and interact with form elements
file_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='file']")))
file_input.send_keys("/Users/somil/Desktop/gene_bank.file/TA373.fasta")


# Enter job title
job_title_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input")))
job_title_input.send_keys("TA373")

# Get the job title entered
job_title = job_title_input.get_attribute('value')

# Locate the Evalue input field within the Algorithm parameters fieldset

evalue_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='expect']")))

# Clear the current value (if any) and input your custom value
evalue_input.clear()

# Set the input value
evalue_input.send_keys("0.01")


# Submit the task
submit_button = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.CLASS_NAME, "boutonblast")))
submit_button.click()

# Wait until the page is fully loaded
WebDriverWait(webBrowser, 60).until(EC.presence_of_all_elements_located((By.XPATH, "//*")))
time.sleep(100)  # adjust the wait time as needed

# Create a directory based on job title
directory_name = f'{job_title}.IS_finder_results'
os.makedirs(directory_name, exist_ok=True)

# Once the content is fully loaded, download the page
page_content = webBrowser.page_source

# Save the page content to a file within the directory
file_name = f'blast_results_{job_title}.html'
file_path = os.path.join(directory_name, file_name)
with open(file_path, 'w', encoding='utf-8') as f:
    f.write(page_content)
###################################################PART_1_COMPLETE################################################################################

###############PART_2###############################

# Find all elements containing "Query" sections
query_sections = webBrowser.find_elements(By.XPATH, '//b[starts-with(text(), "Query=")]')

# Iterate over each "Query" section
for query_section in query_sections:
    # Extract the parent element's text
    parent_text = query_section.find_element(By.XPATH, './..').text

    # Extract the query identifier
    try:
        query_identifier = parent_text.split('=')[1].split()[0]
        print("Query Identifier:", query_identifier)
    except IndexError:
        print("Error: Unable to extract query identifier from:", parent_text)

# Find the first <a> element on the page
continue_link = webBrowser.find_element(By.TAG_NAME, 'a')

# Find all elements with the href attribute
elements_with_href = webBrowser.find_elements(By.XPATH, "//*[@href]")

# Iterate over all elements with the href attribute
for elem in elements_with_href:
    try:
        # Get the href attribute value
        href = elem.get_attribute("href")

        # Exclude links leading to NCBI and links starting with # and containing #BL_ORD_ID
        if "ncbi" not in href and not href.startswith("#") and "#BL_ORD_ID" not in href:
            print("Link:", href)
    except StaleElementReferenceException:
        print("Element is stale. Refinding...")
        # Refind the element
        elem = webBrowser.find_element(By.XPATH, f'//a[@href="{href}"]')
        # Get the href attribute value again
        href = elem.get_attribute("href")

Then I modified this code to perform the required task:

# Necessary webdrivers need to be imported
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException
import os
import time

# This is for Chrome. If Firefox is needed,
# use webdriver.Firefox() instead.

webBrowser = webdriver.Chrome()

# This will open the ISfinder site in Chrome
webBrowser.get('https://www-is.biotoul.fr/index.php')

# Find and click on the 'TOOLS' link
tools_link = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.LINK_TEXT, 'TOOLS')))
tools_link.click()

# Find and click on the 'Blast' link
blast_link = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Blast')))
blast_link.click()

# Locate and interact with form elements
file_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='file']")))
file_input.send_keys("/Users/somil/Desktop/gene_bank.file/TA373.fasta")


# Enter job title
job_title_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input")))
job_title_input.send_keys("TA373")

# Get the job title entered
job_title = job_title_input.get_attribute('value')

# Locate the Evalue input field within the Algorithm parameters fieldset

evalue_input = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='expect']")))

# Clear the current value (if any) and input your custom value
evalue_input.clear()

# Set the input value
evalue_input.send_keys("0.01")


# Submit the task
submit_button = WebDriverWait(webBrowser, 60).until(EC.element_to_be_clickable((By.CLASS_NAME, "boutonblast")))
submit_button.click()

# Wait until the page is fully loaded
WebDriverWait(webBrowser, 60).until(EC.presence_of_all_elements_located((By.XPATH, "//*")))
time.sleep(100)  # adjust the wait time as needed

# Create a directory based on job title
directory_name = f'{job_title}.IS_finder_results'
os.makedirs(directory_name, exist_ok=True)

# Once the content is fully loaded, download the page
page_content = webBrowser.page_source

# Save the page content to a file within the directory
file_name = f'blast_results_{job_title}.html'
file_path = os.path.join(directory_name, file_name)
with open(file_path, 'w', encoding='utf-8') as f:
    f.write(page_content)
###################################################PART_1_COMPLETE################################################################################

###############PART_2###############################
# Find all elements containing "Query" sections
query_sections = webBrowser.find_elements(By.XPATH, '//b[starts-with(text(), "Query=")]')

# Iterate over each "Query" section
for query_section in query_sections:
    # Extract the parent element's text
    parent_text = query_section.find_element(By.XPATH, './..').text

    # Extract the query identifier
    try:
        query_identifier = parent_text.split('=')[1].split()[0]
        print("Query Identifier:", query_identifier)
    except IndexError:
        print("Error: Unable to extract query identifier from:", parent_text)

    # Find all elements with the href attribute again (to avoid StaleElementReferenceException)
    elements_with_href = webBrowser.find_elements(By.XPATH, "//*[@href]")

    # Iterate over all elements with the href attribute
    for elem in elements_with_href:
        try:
            # Get the href attribute value
            href = elem.get_attribute("href")

            # Exclude unwanted links
            exclude_links = [
                "biotoul.fr/styles/", "biotoul.fr/blast/", "biotoul.fr/index.php"
            ]
            if any(link in href for link in exclude_links):
                continue

            # Exclude links leading to NCBI and links starting with # and containing #BL_ORD_ID
            if "ncbi" not in href and not href.startswith("#") and "#BL_ORD_ID" not in href:
                print("Link:", href)

                # Get the content of the link
                webBrowser.get(href)
                link_content = webBrowser.page_source

                # Extract the part after '?name=' from the href
                identifier = href.split('=')[1] if '=' in href else 'NoIdentifier'

                # Save the content to a file within the specific directory
                file_name = f'{job_title}.{identifier}.{query_identifier}.html'
                file_path = os.path.join(directory_name, file_name)
                with open(file_path, 'w', encoding='utf-8') as f:
                    f.write(link_content)

                print(f"Content saved for Link: {href}")
        except StaleElementReferenceException:
            print("Element is stale. Refinding...")
            # Refind the element
            elem = webBrowser.find_element(By.XPATH, f'//a[@href="{href}"]')
            # Get the href attribute value again
            href = elem.get_attribute("href")
        except NoSuchElementException:
            print(f"No such element found for href: {href}. Skipping...")
            continue

I get the following error:

Query Identifier: NODE_195
Link: https://www-is.biotoul.fr/general_information.php
Content saved for Link: https://www-is.biotoul.fr/general_information.php
Element is stale. Refinding...
Traceback (most recent call last):
  File "/Users/somil/Downloads/navigate_4.py", line 95, in <module>
    href = elem.get_attribute("href")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py", line 178, in get_attribute
    attribute_value = self.parent.execute_script(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 407, in execute_script
    return self.execute(command, {"script": script, "args": converted_args})["value"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 347, in execute
    self.error_handler.check_response(response)
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found
  (Session info: chrome=124.0.6367.158); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#stale-element-reference-exception
Stacktrace:
0   chromedriver                        0x0000000104e63ae8 chromedriver + 5217000
1   chromedriver                        0x0000000104e5b723 chromedriver + 5183267
2   chromedriver                        0x00000001049cd527 chromedriver + 406823
3   chromedriver                        0x00000001049dd814 chromedriver + 473108
4   chromedriver                        0x00000001049de10a chromedriver + 475402
5   chromedriver                        0x00000001049d3595 chromedriver + 431509
6   chromedriver                        0x00000001049de13b chromedriver + 475451
7   chromedriver                        0x00000001049d3595 chromedriver + 431509
8   chromedriver                        0x00000001049d189e chromedriver + 424094
9   chromedriver                        0x00000001049d4bfa chromedriver + 437242
10  chromedriver                        0x0000000104a5b6a4 chromedriver + 988836
11  chromedriver                        0x0000000104a3b702 chromedriver + 857858
12  chromedriver                        0x0000000104a5a6bf chromedriver + 984767
13  chromedriver                        0x0000000104a3b4a3 chromedriver + 857251
14  chromedriver                        0x0000000104a0bfe3 chromedriver + 663523
15  chromedriver                        0x0000000104a0c92e chromedriver + 665902
16  chromedriver                        0x0000000104e21a00 chromedriver + 4946432
17  chromedriver                        0x0000000104e27ab4 chromedriver + 4971188
18  chromedriver                        0x0000000104e024fe chromedriver + 4818174
19  chromedriver                        0x0000000104e285c9 chromedriver + 4974025
20  chromedriver                        0x0000000104df2784 chromedriver + 4753284
21  chromedriver                        0x0000000104e4ac78 chromedriver + 5115000
22  chromedriver                        0x0000000104e4ae37 chromedriver + 5115447
23  chromedriver                        0x0000000104e5b343 chromedriver + 5182275
24  libsystem_pthread.dylib             0x00007ff80f5051d3 _pthread_start + 125
25  libsystem_pthread.dylib             0x00007ff80f500bd3 thread_start + 15


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/somil/Downloads/navigate_4.py", line 125, in <module>
    elem = webBrowser.find_element(By.XPATH, f'//a[@href="{href}"]')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 741, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 347, in execute
    self.error_handler.check_response(response)
  File "/Users/somil/miniconda3/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href="https://www-is.biotoul.fr/general_information.php"]"}
  (Session info: chrome=124.0.6367.158); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
0   chromedriver                        0x0000000104e63ae8 chromedriver + 5217000
1   chromedriver                        0x0000000104e5b723 chromedriver + 5183267
2   chromedriver                        0x00000001049cd527 chromedriver + 406823
3   chromedriver                        0x0000000104a18ff2 chromedriver + 716786
4   chromedriver                        0x0000000104a19181 chromedriver + 717185
5   chromedriver                        0x0000000104a5d1d4 chromedriver + 995796
6   chromedriver                        0x0000000104a3b72d chromedriver + 857901
7   chromedriver                        0x0000000104a5a6bf chromedriver + 984767
8   chromedriver                        0x0000000104a3b4a3 chromedriver + 857251
9   chromedriver                        0x0000000104a0bfe3 chromedriver + 663523
10  chromedriver                        0x0000000104a0c92e chromedriver + 665902
11  chromedriver                        0x0000000104e21a00 chromedriver + 4946432
12  chromedriver                        0x0000000104e27ab4 chromedriver + 4971188
13  chromedriver                        0x0000000104e024fe chromedriver + 4818174
14  chromedriver                        0x0000000104e285c9 chromedriver + 4974025
15  chromedriver                        0x0000000104df2784 chromedriver + 4753284
16  chromedriver                        0x0000000104e4ac78 chromedriver + 5115000
17  chromedriver                        0x0000000104e4ae37 chromedriver + 5115447
18  chromedriver                        0x0000000104e5b343 chromedriver + 5182275
19  libsystem_pthread.dylib             0x00007ff80f5051d3 _pthread_start + 125
20  libsystem_pthread.dylib             0x00007ff80f500bd3 thread_start + 15

Please help me, as I am unable to understand this error or to achieve this task.
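
The traceback is consistent with how Selenium handles element references: once `webBrowser.get(href)` loads a new page, every element found on the results page becomes stale, and looking the link up again fails because the browser is no longer on the results page. The second exception is also raised inside an `except` block, where the sibling `except NoSuchElementException` clause cannot catch it. One possible way around this, sketched under the assumption that only the URL strings are needed from the results page, is to read every `href` into a plain list before navigating anywhere. The function names `collect_hrefs` and `save_pages` are hypothetical, and `driver` stands for the `webBrowser` object in the script.

```python
import os


def collect_hrefs(elements):
    """Read href strings while the elements are still attached to the page."""
    hrefs = []
    for elem in elements:
        href = elem.get_attribute("href")
        if href and href not in hrefs:    # skip None and duplicates
            hrefs.append(href)
    return hrefs


def save_pages(driver, hrefs, directory, prefix):
    """Visit each URL and write its page source to <directory>/<prefix>.<name>.html."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for href in hrefs:
        driver.get(href)                  # safe: hrefs are strings, not elements
        # Same naming idea as in the script: the part after the first '='
        name = href.split("=", 1)[1] if "=" in href else "NoIdentifier"
        path = os.path.join(directory, f"{prefix}.{name}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(driver.page_source)
        saved.append(path)
    return saved
```

With the script above this might be called as `hrefs = collect_hrefs(webBrowser.find_elements(By.XPATH, "//a[@href]"))` followed by `save_pages(webBrowser, hrefs, directory_name, job_title)`, after reducing `hrefs` to the links of interest. Because no element is touched after the first navigation, the stale-element handler is no longer needed.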

Unable to get content from the link using Selenium