recipe-scrapers Core API Explained: Data Extraction Techniques from Beginner to Advanced

Project repository (GitCode mirror): https://gitcode.com/gh_mirrors/re/recipe-scrapers

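recipe-scrapers is a Python package for extracting recipe data from a wide range of websites. It offers a simple API for retrieving a recipe's title, ingredients, cooking instructions, and other key fields, so you do not have to deal with the details of HTML parsing yourself.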

Quick Start: Installation and Basic Usage

To start using recipe-scrapers, first install the package with pip:

pip install recipe-scrapers

Once it is installed, you can import it in your Python code. The following simple example shows how to extract information from a recipe page:

from recipe_scrapers import scrape_me

# URL of the recipe page to scrape (placeholder; use a page from a supported site)
url = "https://example.com/recipe"

# Create a scraper object
scraper = scrape_me(url)

# Extract the recipe information
print("Title:", scraper.title())
print("Ingredients:", scraper.ingredients())
print("Instructions:", scraper.instructions())

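Core API Walkthrough

The Main Scraper Classes

At the heart of recipe-scrapers are its Scraper classes, all of which inherit from AbstractScraper. Each supported website has a corresponding Scraper class, for example:

AllRecipes: scrapes recipes from the AllRecipes website
BBCGoodFood: scrapes recipes from the BBC Good Food website
SeriousEats: scrapes recipes from the Serious Eats website

These classes are defined in individual files under the recipe_scrapers directory, such as recipe_scrapers/allrecipes.py and recipe_scrapers/bbcgoodfood.py.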
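To make the one-class-per-site idea concrete, here is a minimal sketch of how a URL's host name could be mapped to a scraper class. It is illustrative only: SCRAPERS_BY_HOST, pick_scraper, and the two stand-in classes are hypothetical names invented for this example, and the code does not reproduce the library's internal lookup.

```python
from urllib.parse import urlparse


class AllRecipesStandIn:
    """Stand-in for a site-specific scraper class (hypothetical)."""


class BBCGoodFoodStandIn:
    """Stand-in for a site-specific scraper class (hypothetical)."""


# Hypothetical registry: host name -> scraper class.
SCRAPERS_BY_HOST = {
    "allrecipes.com": AllRecipesStandIn,
    "bbcgoodfood.com": BBCGoodFoodStandIn,
}


def pick_scraper(url):
    """Return the class registered for the URL's host, or None if unsupported."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same site.
    if host.startswith("www."):
        host = host[len("www."):]
    return SCRAPERS_BY_HOST.get(host)


print(pick_scraper("https://www.allrecipes.com/recipe/12345"))
print(pick_scraper("https://example.com/recipe"))
```

A lookup of this kind is why an unsupported URL can be rejected before any page content is parsed.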

Common Methods

Every Scraper class provides a set of methods for extracting recipe information. The most commonly used ones are listed below:

1. title()

The title() method returns the recipe's title. For example:

print(scraper.title())  # prints the recipe title

2. ingredients()

The ingredients() method returns a list of the ingredients the recipe calls for. For example:

ingredients = scraper.ingredients()
for ingredient in ingredients:
    print(ingredient)

3. instructions()

The instructions() method returns the recipe's cooking steps, usually as a single string. An instructions_list() method is also available and returns the steps as a list. For example:

print(scraper.instructions())  # prints the cooking steps as one string

# Or iterate over the individual steps
steps = scraper.instructions_list()
for step in steps:
    print(step)
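If you only have the single string from instructions(), you can split it into steps yourself. The following is a minimal sketch, assuming the steps are separated by newlines; the helper name split_steps and the sample text are made up for illustration.

```python
def split_steps(instructions):
    """Split an instructions string into a list of non-empty, trimmed steps."""
    return [line.strip() for line in instructions.splitlines() if line.strip()]


# Sample text standing in for the value returned by scraper.instructions().
sample = "Preheat the oven.\n\nMix the flour and sugar.\nBake for 20 minutes."

for number, step in enumerate(split_steps(sample), start=1):
    print(f"{number}. {step}")
```

Dropping blank lines keeps the step numbering stable even when the source page separates steps with extra whitespace.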

4. total_time()

The total_time() method returns the total time needed to prepare the dish, in minutes. For example:

print("Total time:", scraper.total_time(), "minutes")
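A raw minute count is awkward to read for longer recipes. The sketch below converts it into hours and minutes. It assumes the value is a whole number of minutes, and it treats None as "no time available", which is an assumption made for this example and not a statement about what every scraper returns. The helper name format_minutes is hypothetical.

```python
def format_minutes(total_minutes):
    """Render a minute count as e.g. '1 h 30 min'; None becomes 'unknown'."""
    if total_minutes is None:
        return "unknown"
    hours, minutes = divmod(int(total_minutes), 60)
    if hours and minutes:
        return f"{hours} h {minutes} min"
    if hours:
        return f"{hours} h"
    return f"{minutes} min"


# Values standing in for what scraper.total_time() might return.
for value in (45, 90, 120, None):
    print(value, "->", format_minutes(value))
```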

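Advanced Usage

Handling Exceptions

When using recipe-scrapers you may run into various error conditions, such as an unsupported website or a page whose structure has changed. The package provides exception classes for these cases, defined in recipe_scrapers/_exceptions.py.

Common exceptions include:

WebsiteNotImplementedError: raised when a website is not supported
ElementNotFoundInHtml: raised when a required element cannot be found in the page

You can catch these exceptions with a try-except block: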

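from recipe_scrapers import scrape_me
from recipe_scrapers._exceptions import (
    ElementNotFoundInHtml,
    WebsiteNotImplementedError,
)

url = "https://example.com/recipe"  # placeholder URL

try:
    scraper = scrape_me(url)
    # Extract information
    print(scraper.title())
except WebsiteNotImplementedError:
    print("This website is not supported")
except ElementNotFoundInHtml as e:
    print(f"Element not found: {e}")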
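Because a single missing field can raise partway through extraction, it can help to collect fields one at a time. The sketch below shows one possible pattern. The safe_field helper and the FakeScraper stand-in are hypothetical, and the code catches the generic Exception so that it runs without the library installed; in real code you would likely catch the specific exceptions listed above.

```python
def safe_field(scraper, method_name, default=None):
    """Call scraper.<method_name>() and return default if it is missing or raises."""
    method = getattr(scraper, method_name, None)
    if method is None:
        return default
    try:
        return method()
    except Exception:  # narrow this to the library's exceptions in real code
        return default


class FakeScraper:
    """Stand-in object that mimics a scraper with one failing field."""

    def title(self):
        return "Pancakes"

    def total_time(self):
        raise ValueError("total time not found on page")


scraper = FakeScraper()
recipe = {name: safe_field(scraper, name) for name in ("title", "total_time", "yields")}
print(recipe)
```

With this approach one unavailable field leaves a default value in the result, so the rest of the recipe is still returned.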

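Custom Scrapers

If you need to scrape recipes from a website that recipe-scrapers does not support, you can create your own Scraper class. A custom Scraper should inherit from AbstractScraper and implement the necessary methods.

See the templates/scraper.py file for guidance on creating a custom Scraper.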
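The general shape of a custom scraper can be sketched as follows. To keep the example runnable without the library, it subclasses AbstractScraperStandIn, a simplified stand-in written for this example, not the real AbstractScraper. The site myfoodblog.example, the class MyFoodBlog, and the HTML layout are all hypothetical. Regular expressions are used only to avoid extra dependencies; a real scraper would normally use a proper HTML parser.

```python
import re


class AbstractScraperStandIn:
    """Simplified stand-in for the library's AbstractScraper (hypothetical)."""

    def __init__(self, html, url):
        self.html = html
        self.url = url

    @classmethod
    def host(cls):
        raise NotImplementedError

    def title(self):
        raise NotImplementedError

    def ingredients(self):
        raise NotImplementedError


class MyFoodBlog(AbstractScraperStandIn):
    """Hypothetical scraper for an imaginary site, myfoodblog.example."""

    @classmethod
    def host(cls):
        return "myfoodblog.example"

    def title(self):
        # Take the text of the first <h1> element as the recipe title.
        match = re.search(r"<h1[^>]*>(.*?)</h1>", self.html, re.S)
        return match.group(1).strip() if match else ""

    def ingredients(self):
        # Collect every list item marked with the "ingredient" class.
        items = re.findall(r'<li class="ingredient">(.*?)</li>', self.html, re.S)
        return [item.strip() for item in items]


page = """
<h1>Tomato Soup</h1>
<ul>
  <li class="ingredient">4 tomatoes</li>
  <li class="ingredient">1 onion</li>
</ul>
"""
scraper = MyFoodBlog(page, "https://myfoodblog.example/tomato-soup")
print(scraper.title(), scraper.ingredients())
```

The template file mentioned above remains the authoritative starting point; this sketch only illustrates the pattern of one subclass overriding one method per field.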

A Practical Example

The following complete example shows how to use recipe-scrapers to scrape recipe information from several websites:

from recipe_scrapers import scrape_me
from recipe_scrapers._exceptions import WebsiteNotImplementedError


def get_recipe_info(url):
    """Return a dict of recipe fields, or a dict with an "error" key on failure."""
    try:
        scraper = scrape_me(url)
        return {
            "title": scraper.title(),
            "ingredients": scraper.ingredients(),
            "instructions": scraper.instructions(),
            "total_time": scraper.total_time(),
        }
    except WebsiteNotImplementedError:
        return {"error": "This website is not supported"}
    except Exception as e:
        return {"error": str(e)}


# Test several URLs (placeholders; replace them with real recipe pages)
urls = [
    "https://www.allrecipes.com/recipe/12345",
    "https://www.bbcgoodfood.com/recipes/67890",
    "https://www.seriouseats.com/recipe/abcde",
]

for url in urls:
    print(f"URL: {url}")
    info = get_recipe_info(url)
    if "error" in info:
        print(f"Error: {info['error']}")
    else:
        print(f"Title: {info['title']}")
        print(f"Ingredients: {info['ingredients'][:3]}...")  # show only the first 3 ingredients
        print(f"Total time: {info['total_time']} minutes")
    print("-" * 50)