---
tags:
aliases:
date: 2024-11-10
time: 16:55:41
description:
---
**Can be used as a replacement for [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)**

## **Why [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is Overrated:**
**Speed:** It is slow on very large documents.

**Thread blocking:** Like `Requests`, it was not designed with async in mind, so parsing blocks the calling thread, which makes it ill-suited for scraping dynamic websites.
## **What You Should Use Instead:** `selectolax`

`selectolax` is a lesser-known library that wraps the Modest and Lexbor HTML engines for better performance and lower memory consumption.
```python
from selectolax.parser import HTMLParser

html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)  # parse the HTML string into a tree
text = tree.css("p")[0].text()  # first <p> match, then its text content
print(text)  # Output: Test
```
With `selectolax`, you keep comparable HTML parsing capabilities at much higher speed, which makes it a good fit for data-intensive web scraping tasks.
> **“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.**
# References

- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)