I’ve been trying to retrieve BibTeX citations from article/paper titles using the scholarly library in Python (specifically, the scholarly.search_pubs function). However, after roughly 20–60 queries in a 24-hour period, Google Scholar starts rate limiting me. The resulting error message (provided in both image and text form):
Error Message in image form
MaxTriesExceededException: Cannot Fetch from Google Scholar.
To start, this is my BibTeX retrieval code (also provided as an image); earlier in the code I pip install scholarly and pybtex:
BibTeX retrieval code
from scholarly import scholarly

def get_bibtex_citation(paper_title):
    # Search for the publication by title
    search_query = scholarly.search_pubs(paper_title)
    try:
        # Get the first result from the search query
        publication = next(search_query)
        # Fill in the details of the publication
        filled_publication = scholarly.fill(publication)
        # Get the BibTeX citation for the filled publication
        bibtex_citation = scholarly.bibtex(filled_publication)
        return bibtex_citation
    except StopIteration:
        return "No publication found with the given title."

paper_title = ""  # <----=----{Input Paper Name Into Here}----=----
bibtex_citation = get_bibtex_citation(paper_title)
print(bibtex_citation)
It works (for the most part; I still have to fix an unrelated bug, but I digress) until the rate limit kicks in.
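Since the quota seems to be per query, one way to stretch it is to cache results locally so the same title is never sent to Google Scholar twice. A minimal sketch (the cache file name and the fetch callback are my own choices, not part of scholarly):

```python
import json
import os

CACHE_FILE = "bibtex_cache.json"  # assumption: any writable path works

def cached_citation(paper_title, fetch, cache_file=CACHE_FILE):
    """Return a cached BibTeX entry, calling fetch(title) only on a cache miss."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)
    if paper_title not in cache:
        cache[paper_title] = fetch(paper_title)  # e.g. get_bibtex_citation
        with open(cache_file, "w") as f:
            json.dump(cache, f)
    return cache[paper_title]
```

Called as cached_citation(title, get_bibtex_citation), this only spends quota on titles that haven’t been looked up before.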
After checking scholarly’s documentation, I found that I should theoretically be able to use a proxy to avoid Google Scholar’s rate limits. So I tried some of scholarly’s proxy tools (code provided as both image and text):
Proxy Code
from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
success = pg.FreeProxies()  # returns a success flag; worth checking before use_proxy
scholarly.use_proxy(pg)
However, it didn’t work at all; all it did was add about 20 seconds to each query before failing with the same error message.
After searching a bit on some random forums, I saw suggestions to use Selenium instead of proxies, because free proxies don’t work that well: too many people share them, so they’re slow and often blocked. I don’t care much about the time, and I’d happily use proxies even if they added a reasonable delay per query, as long as they prevented the rate limiting, but I digress.
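For what it’s worth, the “overloaded free proxies” problem can be worked around generically by trying a list of proxies in order and moving on when one fails. This is a hedged sketch, not scholarly’s API; fetch is a placeholder for whatever performs the actual Scholar query:

```python
import time

def try_with_proxies(fetch, proxies, delay=1.0):
    """Call fetch(proxy) with each proxy in turn, returning the first
    result that doesn't raise; raises RuntimeError if every proxy fails.
    fetch is a placeholder for the code that performs the Scholar query."""
    last_error = None
    for proxy in proxies:
        try:
            return fetch(proxy)
        except Exception as e:  # free proxies fail often; keep trying the next one
            last_error = e
            time.sleep(delay)  # brief pause before the next attempt
    raise RuntimeError(f"all {len(proxies)} proxies failed") from last_error
```

This doesn’t make any individual free proxy more reliable, but it does mean one dead proxy doesn’t sink the whole query.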
After that, I watched two videos on Selenium with Python.
(This video: https://www.youtube.com/watch?v=Xjv1sY630Uc)
(And this video: https://www.youtube.com/watch?v=b5jt2bhSeXs)
I was able to set up Selenium after installing Chrome Canary and the respective ChromeDriver version for Canary (windows x64).
I have some simple code here that shows me messing around with selenium (in both image and text):
Just starting to use selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

PATH = ""  # <--- path to chromedriver in here
driver = webdriver.Chrome(service=Service(PATH))  # Selenium 4 takes a Service object

driver.get("")  # <--- website in here
print(driver.title)

search = driver.find_element(by=By.CLASS_NAME, value="element to look for")  # <--- put class name of element to look for in here
search.send_keys("test")
search.send_keys(Keys.RETURN)
time.sleep(1)
driver.quit()
However, even after watching these tutorials, I haven’t been able to write Selenium code that gets me closer to my goal. My current idea is to use Selenium to set up an interface where I can solve the CAPTCHAs for my scholarly program myself, so I don’t get rate limited. If anyone could give me some guidance or help, I would appreciate it!
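On the manual-CAPTCHA idea: one shape that might work is a loop that loads the page, checks whether it looks like a CAPTCHA, and blocks until a human solves it in the visible browser window. A sketch under stated assumptions: fetch_with_manual_captcha and looks_like_captcha are my own names, and the CAPTCHA-detection heuristic would need tuning against real Scholar pages:

```python
def fetch_with_manual_captcha(driver, url, looks_like_captcha, prompt=input):
    """Load url; while the page looks like a CAPTCHA, block until the user
    confirms they solved it in the visible browser, then request it again."""
    driver.get(url)
    while looks_like_captcha(driver.page_source):
        prompt("Solve the CAPTCHA in the browser window, then press Enter... ")
        driver.get(url)  # re-request once the human has solved the challenge
    return driver.page_source
```

With Selenium this might be called as fetch_with_manual_captcha(webdriver.Chrome(...), url, lambda html: "captcha" in html.lower()); the key point is that the script pauses instead of failing when a challenge appears.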