Obsidian-Main/21.01. Programming/Python/selectolax.md

date: 2024-11-10
time: 16:55:41

Can be used as a replacement for Beautiful Soup.

Why Beautiful Soup is overrated:

Speed: Parsing becomes slow when the document is very large.

Thread blocking: Like Requests, it was not designed with async in mind. Every parse call blocks the calling thread, which makes it ill-suited for asynchronous scraping pipelines.
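The blocking concern applies to any synchronous parser, not only Beautiful Soup. One common workaround is to hand the parse call to a worker thread with `asyncio.to_thread` so the event loop stays free. The following is a minimal sketch under assumptions: it uses the standard library's `html.parser` purely as a stand-in for whichever synchronous parser is in use, and the names `TitleParser`, `parse_title`, and `parse_title_async` are hypothetical.

```python
import asyncio
from html.parser import HTMLParser as StdlibHTMLParser


class TitleParser(StdlibHTMLParser):
    """Stand-in synchronous parser that collects the <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(markup: str) -> str:
    # Blocking call: the calling thread does nothing else until it returns.
    parser = TitleParser()
    parser.feed(markup)
    return parser.title.strip()


async def parse_title_async(markup: str) -> str:
    # Run the blocking parse in a worker thread so the event loop can keep
    # servicing other coroutines (for example, pending downloads).
    return await asyncio.to_thread(parse_title, markup)


async def main() -> list[str]:
    pages = [f"<html><head><title>Page {i}</title></head></html>" for i in range(3)]
    # gather() returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(parse_title_async(p) for p in pages))


titles = asyncio.run(main())
print(titles)
```

Offloading keeps the event loop responsive, but it does not make the parsing itself faster; a quicker parser addresses that part separately.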

What you should use instead: selectolax

selectolax is a lesser-known library built on the Modest and Lexbor HTML engines (not libxml2, which is what lxml uses). It offers better performance and lower memory consumption.

```python
from selectolax.parser import HTMLParser

html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)
text = tree.css("p")[0].text()
print(text)  # Output: Test
```

With selectolax you keep familiar CSS-selector-based HTML parsing while gaining a substantial speedup, which makes it well suited to data-intensive web scraping tasks.

“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.

References