52: Scraping Data from the Web
Two of the most popular uses of Python are Data Science and Web Scraping, and they go together because web scraping typically feeds data to your data science pipeline. If you have an application that needs beer sales data, then scraping it off the ATF TTB website is probably your only solution. If you need to train a GPT model on text, then scraping it off various forum websites is a good option. The web has so much data available, but it's in unfriendly visual formats.
Web scraping is also a great beginner topic for many of the same reasons as data munging:
- It's something everyone understands because they use browsers all day long. Most people have some concept of what a web page is.
- Web scraping doesn't require a ton of theory or computer science knowledge. You just need a way to get a web page and parse the raw HTML for what you want.
- It's easy to manually download a page you want to study and then work on it for as long as you want.
- Just like data munging, the code is almost never "elegant." You're free to create the worst hacks possible to get it working and then refine later.
- It's also a very important part of many data science projects. Data science needs data. The web has a ton of data.
- Web scraping also leads you to automated testing of web applications, so you get double the education by learning it.
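To make "get a web page and parse the HTML" concrete, here is a minimal sketch using only Python's standard library. It is one possible approach, not necessarily the one this module uses later. The names `LinkCollector`, `fetch`, `extract_links`, and the `SAMPLE_PAGE` contents are all made up for illustration, and `fetch` assumes the page is UTF-8 text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html):
    """Return every link target found in a string of raw HTML."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# A tiny made-up page standing in for one you saved from your browser.
SAMPLE_PAGE = """
<html><body>
  <h1>Reports</h1>
  <a href="/reports/2023.html">2023</a>
  <a href="/reports/2024.html">2024</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

Working from a saved string like `SAMPLE_PAGE` means you can study a page for as long as you want without hitting the site again. Once the parsing works, something like `extract_links(fetch(url))` should do the same against a live page.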