vault backup: 2025-03-04 11:17:00

2025-03-04 11:17:00 +08:00
parent d1e51bfd2f
commit ff12c4f4ca
161 changed files with 1 additions and 2 deletions
--- a/Programming/Python/selectolax.md
+++ b/Programming/Python/selectolax.md
@@ -0,0 +1,35 @@
+---
+tags: 
+aliases: 
+date: 2024-11-10
+time: 16:55:41
+description:
+---
+
+**Can be used as a replacement for [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).**
+
+## **Why [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) Is Overrated:**
+
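+**Speed:** It is slow on very large documents. Beautiful Soup is written in pure Python on top of a pluggable parser (`html.parser`, `lxml` or `html5lib`), and it builds a full Python object tree. That tree costs both time and memory.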
+
+**Thread blocking:** Much like `Requests` itself, it has no async API. Parsing runs synchronously and blocks the calling thread, so inside an asyncio-based scraper a large document stalls the event loop.
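+
+The blocking problem applies to any synchronous parser, selectolax included, because neither library offers an async API. A common workaround in an asyncio scraper is to hand each parse to a worker thread with `asyncio.to_thread` (Python 3.9+). The event loop then keeps serving network I/O while parsing runs. This is a minimal sketch, assuming selectolax is installed; `extract_title` and `parse_all` are hypothetical helper names. Offloading keeps the loop responsive, but it does not by itself make parsing faster.
+
+```python
+import asyncio
+from typing import Callable
+
+
+def extract_title(html: str) -> str:
+    """Hypothetical helper: return the <title> text using selectolax (assumed installed)."""
+    from selectolax.parser import HTMLParser  # third-party, imported lazily
+
+    node = HTMLParser(html).css_first("title")
+    return node.text(strip=True) if node is not None else ""
+
+
+async def parse_all(pages: list[str], parse: Callable[[str], str] = extract_title) -> list[str]:
+    """Run a synchronous parse function on each page in worker threads.
+
+    asyncio.to_thread moves each blocking call off the event loop, so other
+    coroutines (for example pending HTTP requests) keep running meanwhile.
+    """
+    tasks = [asyncio.to_thread(parse, page) for page in pages]
+    # gather returns results in the same order as the input pages
+    return await asyncio.gather(*tasks)
+```
+
+For example, `asyncio.run(parse_all(pages))` returns the titles in the same order as `pages`. For heavily CPU-bound workloads, the usual next step is a `concurrent.futures.ProcessPoolExecutor` used through `loop.run_in_executor`.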
+
+## **What to use instead:** `selectolax`
+
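+`selectolax` is a less well-known library that binds to fast C HTML engines rather than building a pure-Python object tree: Modest through `selectolax.parser`, and Lexbor through `selectolax.lexbor`. This gives better performance and lower memory consumption.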
+
+```python
+from selectolax.parser import HTMLParser
+
+html_content = "<html><body><p>Test</p></body></html>"
+tree = HTMLParser(html_content)
+text = tree.css("p")[0].text()
+print(text)  # Output: Test
+```
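+
+Here is a slightly larger sketch, assuming a recent selectolax release that ships both backends. It extracts link text and `href` attributes twice: once with the Modest-based `HTMLParser` and once with the Lexbor-based `LexborHTMLParser`. Both share the same `css()` / `text()` / `attributes` interface. The HTML snippet and the `nav_links` helper are made up for illustration.
+
+```python
+from selectolax.lexbor import LexborHTMLParser  # Lexbor engine
+from selectolax.parser import HTMLParser        # Modest engine
+
+html_content = """
+<html><body>
+  <a href="/docs" class="nav">Docs</a>
+  <a href="/blog" class="nav">Blog</a>
+  <a class="nav">No link</a>
+</body></html>
+"""
+
+
+def nav_links(tree) -> list[tuple[str, str]]:
+    """Return (text, href) pairs for every <a class="nav"> that has an href."""
+    links = []
+    for node in tree.css("a.nav"):
+        href = node.attributes.get("href")  # attributes is a plain dict
+        if href:
+            links.append((node.text(strip=True), href))
+    return links
+
+
+# Both backends should give the same result for this simple document.
+print(nav_links(HTMLParser(html_content)))
+print(nav_links(LexborHTMLParser(html_content)))
+```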
+
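+In practice, `selectolax` covers the parsing most scraping jobs need (CSS selection, text and attribute extraction) with a much faster parser. That makes it a good fit for data-intensive scraping. Its API is not a drop-in replacement for Beautiful Soup's, though. It is built around CSS selectors (`css()`, `css_first()`) rather than methods like `find_all()`, so existing code has to be ported.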
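+
+To check the speed claim on your own pages, a small timing harness is more reliable than general statements. This sketch assumes both `beautifulsoup4` and `selectolax` are installed. `best_time`, `with_bs4` and `with_selectolax` are hypothetical helpers, and no results are claimed here. The Beautiful Soup backend matters: `"html.parser"` is pure Python, while `"lxml"` (if installed) is usually considerably faster. Compare the configuration you actually use.
+
+```python
+import timeit
+from typing import Callable
+
+
+def best_time(parse: Callable[[str], object], html: str, number: int = 20, repeat: int = 5) -> float:
+    """Return the best average seconds per call of parse(html) over several repeats."""
+    timings = timeit.repeat(lambda: parse(html), number=number, repeat=repeat)
+    return min(timings) / number
+
+
+def with_bs4(html: str) -> list[str]:
+    """Hypothetical helper: collect <p> texts with Beautiful Soup (assumed installed)."""
+    from bs4 import BeautifulSoup  # third-party, imported lazily
+
+    return [p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p")]
+
+
+def with_selectolax(html: str) -> list[str]:
+    """Hypothetical helper: collect <p> texts with selectolax (assumed installed)."""
+    from selectolax.parser import HTMLParser  # third-party, imported lazily
+
+    return [p.text() for p in HTMLParser(html).css("p")]
+```
+
+For example, build a large page such as `"<p>row</p>" * 10_000` wrapped in `<html><body>`. Then compare `best_time(with_bs4, page)` with `best_time(with_selectolax, page)`. Taking the minimum over repeats reduces noise from other processes, which is the approach the `timeit` documentation recommends.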
+
+> **“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.**
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)