diff --git a/21. 資料收集/Programming/Python/Plotly.md b/21. 資料收集/Programming/Python/Plotly.md
new file mode 100644
index 0000000..3733652
--- /dev/null
+++ b/21. 資料收集/Programming/Python/Plotly.md
@@ -0,0 +1,34 @@
+---
+tags:
+aliases:
+date: 2024-11-10
+time: 16:58:43
+description:
+---
+
+**Can be used in place of [Matplotlib](https://matplotlib.org/)**
+
+Yes, `Matplotlib` is a classic; it’s virtually the standard choice for visualizing data in Python. But to be frank, it can feel like using an axe for delicate brain surgery, and its syntax is a little verbose, if we’re being honest. If you’re not creating highly customized visualizations, there are better options with a more straightforward syntax.
+
+## Why [Matplotlib](https://matplotlib.org/) is Overrated:
+
+**Clunky syntax:** Even simple charts can take a surprising number of lines to plot.
+
+**Outdated default style:** The default style is configurable, but out of the box it isn’t exactly inspiring or, for that matter, particularly readable.
+
+## What You Should Replace It With: Plotly
+
+When clean, interactive visuals matter and you don’t want a pile of code, `Plotly` is a great choice. It is especially useful when you need to share visuals quickly or embed them in web presentations.
+
+```python
+import plotly.express as px
+
+df = px.data.iris()
+fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
+fig.show()
+```
+
+With `Plotly`, you immediately get interactive graphs with good default visuals. The code is more concise, and tooltips and zooming are included by default.
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)
diff --git a/21. 資料收集/Programming/Python/Polars.md b/21. 資料收集/Programming/Python/Polars.md
new file mode 100644
index 0000000..f771bf3
--- /dev/null
+++ b/21. 資料收集/Programming/Python/Polars.md
@@ -0,0 +1,34 @@
+---
+tags:
+aliases:
+date: 2024-11-10
+time: 16:57:31
+description:
+---
+
+**Can be used in place of [pandas](https://pandas.pydata.org/)**
+
+Now, listen up: `Pandas` is great for data exploration and medium-sized datasets. But people use it for everything, as if it were a magic solution to every data problem, and quite frankly, it isn’t. Working with `Pandas` on huge datasets can make your machine sound like a sputtering fan engine, and the memory overhead just doesn’t make sense for some workflows.
+
+## **Why [pandas](https://pandas.pydata.org/) Is Overrated:**
+
+**Memory Usage:** Because `Pandas` keeps its data in memory, operations on large datasets can hurt performance badly.
+
+**Limited Scalability:** Scaling with `Pandas` isn’t easy. It was never designed for big data.
+
+## What You Should Use Instead: Polars
+
+`Polars` is a fast DataFrame library written in Rust and built on the Apache Arrow memory format. It is optimized for memory efficiency and multithreaded performance, which makes it a good fit when you want to crunch data without heating up your CPU.
+
+```python
+import polars as pl
+
+df = pl.read_csv("big_data.csv")
+filtered_df = df.filter(pl.col("value") > 50)
+print(filtered_df)
+```
+
+**Why `Polars`?** It can process data that would bring `Pandas` to its knees, often in a fraction of the time. It also supports lazy evaluation, meaning it computes only what is needed.
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)
diff --git a/21. 資料收集/Programming/Python/PyTorch.md b/21. 資料收集/Programming/Python/PyTorch.md
new file mode 100644
index 0000000..e773329
--- /dev/null
+++ b/21. 資料收集/Programming/Python/PyTorch.md
@@ -0,0 +1,43 @@
+---
+tags:
+aliases:
+date: 2024-11-10
+time: 17:00:12
+description:
+---
+
+**Can be used in place of [scikit-learn](https://scikit-learn.org/stable/)**
+
+I know, `Scikit-Learn` isn’t supposed to be a deep learning library, but people use it as if it were. It is incredibly handy for quick prototyping and traditional machine learning models, but when it comes to neural networks, it’s just not in the same league as a library designed with tensors in mind.
+
+## Why [scikit-learn](https://scikit-learn.org/stable/) is Overrated:
+
+**No GPU Support:** Training on GPUs can be a game changer for deep learning, but `Scikit-Learn` offers no GPU training for neural networks.
+
+**Not Optimized for Neural Networks:** `Scikit-Learn` wasn’t designed for deep learning; using it that way practically guarantees poor results.
+
+## What You Should Use Instead: PyTorch
+
+`PyTorch` is more general and supports GPUs, which makes it well suited to deep learning projects. It’s also Pythonic, so it will feel natural to anyone coming from `Scikit-Learn`, but with much more power.
+
+```python
+import torch
+import torch.nn as nn
+import torch.optim as optim
+
+# Define a simple model
+model = nn.Sequential(
+    nn.Linear(10, 5),
+    nn.ReLU(),
+    nn.Linear(5, 2)
+)
+
+# Define optimizer and loss
+optimizer = optim.SGD(model.parameters(), lr=0.01)
+loss_fn = nn.CrossEntropyLoss()
+```
+
+If you’re serious about deep learning, you’ll want a library built for the task at hand, which will save you from these limitations and inefficiencies. With `PyTorch`, you can fine-tune models and use GPUs to your heart’s content.
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)
\ No newline at end of file
diff --git a/21. 資料收集/Programming/Python/httpx.md b/21. 資料收集/Programming/Python/httpx.md
new file mode 100644
index 0000000..3e41f95
--- /dev/null
+++ b/21. 資料收集/Programming/Python/httpx.md
@@ -0,0 +1,38 @@
+---
+tags:
+aliases:
+date: 2024-11-10
+time: 16:54:12
+description:
+---
+
+**Can be used in place of [requests](https://pypi.org/project/requests/)**
+
+## **Why [requests](https://pypi.org/project/requests/) is Overrated:**
+
+**Blocking I/O:** `Requests` is synchronous, so each call blocks until the previous one finishes. That is less than ideal for I/O-bound programs.
+
+**Heavy:** It has loads of convenience baked in, but that comes at a cost in speed and memory footprint. That’s not a big deal in a simple script, but in larger systems it can become a resource hog.
+
+## **What You Should Use Instead:** `httpx`
+
+For concurrent requests, `httpx` provides a similar API with asynchronous support. If you make many API calls, it can save time and resources by processing those requests concurrently.
+
+```python
+import asyncio
+
+import httpx
+
+async def fetch_data(url):
+    async with httpx.AsyncClient() as client:
+        response = await client.get(url)
+        return response.json()
+
+# Simple and non-blocking: the coroutine must be run on an event loop
+data = asyncio.run(fetch_data("https://api.example.com/data"))
+```
+
+> **Pro Tip:** Asynchronous requests can cut processing time considerably when the task is web scraping or ingesting data from remote sources.
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)
diff --git a/21. 資料收集/Programming/Python/selectolax.md b/21. 資料收集/Programming/Python/selectolax.md
new file mode 100644
index 0000000..1d4e028
--- /dev/null
+++ b/21. 資料收集/Programming/Python/selectolax.md
@@ -0,0 +1,35 @@
+---
+tags:
+aliases:
+date: 2024-11-10
+time: 16:55:41
+description:
+---
+
+**Can be used in place of [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)**
+
+## **Why [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is Overrated:**
+
+**Speed:** It is not very fast when documents are very large.
+
+**Thread blocking:** Like `Requests`, it was not designed with async in mind, which makes it ill-suited to asynchronous scraping pipelines.
+
+## **What You Should Use Instead:** `selectolax`
+
+`selectolax` is a lesser-known library built on the Modest and Lexbor engines, which gives it better performance and lower memory consumption.
+
+```python
+from selectolax.parser import HTMLParser
+
+html_content = "<p>Test</p>"
+tree = HTMLParser(html_content)
+text = tree.css("p")[0].text()
+print(text)  # Output: Test
+```
+
+With `selectolax`, you keep similar HTML parsing capabilities with much better speed, which makes it a good fit for data-intensive web scraping tasks.
+
+> **“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.**
+
+# References
+- [5 Overrated Python Libraries (And What You Should Use Instead) | by Abdur Rahman | Nov, 2024 | Python in Plain English](https://python.plainenglish.io/5-overrated-python-libraries-and-what-you-should-use-instead-106bd9ded180)