vault backup: 2025-03-04 11:17:00

This commit is contained in:
2025-03-04 11:17:00 +08:00
parent d1e51bfd2f
commit ff12c4f4ca
161 changed files with 1 additions and 2 deletions

View File

@@ -0,0 +1,35 @@
---
tags:
aliases:
date: 2024-11-10
time: 16:55:41
description:
---
**可以用來代替[Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)**
## **Why [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is Overrated:**
**Speed:** Not very fast, when the size of a document is very big.
**Thread blocking:** Much like `Requests` itself, it is not designed with async in mind, which certainly makes it ill-suited for scraping dynamic websites.
## **Instead What you should use:** `selectolax`
`selectolax` is a less famous library that uses `libxml2` for better performance and with less memory consumption.
```python
from selectolax.parser import HTMLParser
html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)
text = tree.css("p")[0].text()
print(text) # Output: Test
```
As it will turn out, by using `Selectolax`, you retain the same HTML parsing capabilities but with much-enhanced speed, making it ideal for web scraping tasks that are quite data-intensive.
> **“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.**
# 參考來源
- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)