---
tags:
aliases:
date: 2024-11-10
time: 16:55:41
description:
---
**Can be used as a replacement for [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)**
## **Why [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is Overrated:**
**Speed:** It slows down noticeably on very large documents.
**Thread blocking:** Much like `Requests`, it was not designed with async in mind. Parsing is a synchronous call that blocks the calling thread until it returns, which makes it an awkward fit for high-concurrency async scrapers.
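The blocking behaviour can be worked around in async code by moving the parse onto a worker thread. The sketch below is only an illustration under stated assumptions: `TitleParser`, `parse_title`, and `parse_many` are hypothetical names, and the standard library's `html.parser` stands in for whichever blocking parser you actually use, so that the example runs without extra dependencies.

```python
import asyncio
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Stand-in blocking parser (assumption): collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(html: str) -> str:
    """Synchronous parse; hypothetical helper standing in for any blocking parser call."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


async def parse_many(pages: list[str]) -> list[str]:
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread, so the event loop can keep servicing other coroutines.
    return await asyncio.gather(*(asyncio.to_thread(parse_title, p) for p in pages))


pages = [
    "<html><head><title>One</title></head></html>",
    "<html><head><title>Two</title></head></html>",
]
titles = asyncio.run(parse_many(pages))
print(titles)
```

This keeps the event loop responsive but does not make the parsing itself any faster, so a quicker parser still pays off.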
## **What You Should Use Instead:** `selectolax`
`selectolax` is a lesser-known library built on the Modest and Lexbor HTML engines, both written in C, which gives it better performance and lower memory consumption.
```python
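from selectolax.parser import HTMLParser

html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)  # parse the document once

# css_first returns the first matching node (or None if nothing matches)
text = tree.css_first("p").text()
print(text)  # Output: Test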
```
With `selectolax` you keep CSS-selector-based HTML parsing but gain much better speed, which makes it well suited to data-intensive web scraping tasks.
> **“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.**
# References
- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)