How to Get Google Page Ranking in Python (Code Example)

November 16, 2021, 21:11:39

Learn how to use the Google Custom Search Engine API in Python to get the ranking position of a specific page for a keyword.

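Google Custom Search Engine (CSE) is a search engine that lets developers add search functionality to their applications, whether desktop applications, websites, or mobile apps.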

Being able to track your ranking on Google is handy, especially if you own a website and want to monitor how your pages rank as you write or edit articles.

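In this tutorial, we will write a Python script that uses the CSE API to get the page ranking of your domain. Before we dive in, make sure you have the CSE API set up and ready to go; if not, check the tutorial on getting started with the Custom Search Engine API in Python.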

Once your search engine is up and running, install requests so that we can make HTTP requests easily:

pip3 install requests

Open a new Python file and follow along. Let's start by importing the modules and defining our variables:

import requests
import urllib.parse as p

# get the API KEY here: https://developers.google.com/custom-search/v1/overview
API_KEY = "<INSERT_YOUR_API_KEY_HERE>"
# get your Search Engine ID on your CSE control panel
SEARCH_ENGINE_ID = "<INSERT_YOUR_SEARCH_ENGINE_ID_HERE>"
# target domain you want to track
target_domain = "xyz.com"
# target keywords
query = "google custom search engine api python"

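Again, see the tutorial on getting started with the Custom Search Engine API, which shows how to get API_KEY and SEARCH_ENGINE_ID. target_domain is the domain you want to search for, and query is the target keyword. For example, to track stackoverflow.com for the keyword "convert string to int python", set target_domain and query to those two values respectively.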

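CSE lets us see the first 10 pages of results, with 10 results per page, so there are 100 URLs to check in total. The code block below iterates over each page and looks for the domain name in the results: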

for page in range(1, 11):
    print("[*] Going for page:", page)
    # calculating start 
    start = (page - 1) * 10 + 1
    # make API request
    # URL-encode the query so characters such as & or # cannot break the request
    url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={p.quote_plus(query)}&start={start}"
    data = requests.get(url).json()
    # fall back to an empty list when the response has no "items" key
    search_items = data.get("items", [])
    # a boolean that indicates whether `target_domain` is found
    found = False
    for i, search_item in enumerate(search_items, start=1):
        # get the page title
        title = search_item.get("title")
        # page snippet
        snippet = search_item.get("snippet")
        # alternatively, you can get the HTML snippet (bolded keywords)
        html_snippet = search_item.get("htmlSnippet")
        # extract the page url
        link = search_item.get("link")
        # extract the domain name from the URL
        domain_name = p.urlparse(link).netloc
        if domain_name.endswith(target_domain):
            # get the page rank
            rank = i + start - 1
            print(f"[+] {target_domain} is found on rank #{rank} for keyword: '{query}'")
            print("[+] Title:", title)
            print("[+] Snippet:", snippet)
            print("[+] URL:", link)
            # target domain is found, exit out of the program
            found = True
            break
    if found:
        break
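
The index arithmetic used in the loop can be checked in isolation. The sketch below assumes 10 results per page, as described above; the helper names `start_index` and `overall_rank` are introduced here for illustration and are not part of the original script.

```python
RESULTS_PER_PAGE = 10  # assumption: each CSE results page holds 10 items


def start_index(page):
    """Return the 1-based `start` parameter for a 1-based page number."""
    return (page - 1) * RESULTS_PER_PAGE + 1


def overall_rank(page, position):
    """Return the overall rank of the item at 1-based `position` on `page`."""
    return start_index(page) + position - 1


# page number, its start index, and the rank of the third item on that page
for page in (1, 2, 10):
    print(page, start_index(page), overall_rank(page, 3))
```

For example, the third result on page 2 has start index 11 and therefore an overall rank of 13.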

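After making an API request for each page, we iterate over the results and extract the domain name with the urllib.parse.urlparse() function to see whether it matches our target_domain. We use the endswith() method rather than the equality operator (==) because we don't want to miss URLs that start with www or another subdomain.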
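
One caveat with a plain `endswith()` check is that it also matches unrelated hosts whose names merely end with the same characters (for example, `notxyz.com` ends with `xyz.com`). A stricter variant, sketched below, accepts only the exact domain or a true subdomain. The helper name `matches_domain` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def matches_domain(link, target_domain):
    """Return True if `link` is on `target_domain` or one of its subdomains."""
    # .hostname is lowercased and excludes any port; it is None for relative URLs
    host = urlparse(link).hostname or ""
    target = target_domain.lower()
    return host == target or host.endswith("." + target)


print(matches_domain("https://www.xyz.com/article/use-gcse-api-in-python", "xyz.com"))  # True
print(matches_domain("https://xyz.com/", "xyz.com"))     # True
print(matches_domain("https://notxyz.com/", "xyz.com"))  # False
```

If this stricter behavior is wanted, the `if domain_name.endswith(target_domain):` line in the loop could call such a helper with `link` instead.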

The script is done. Here is my execution output (after filling in my API key and search engine ID, of course):

[*] Going for page: 1
[+] xyz.com is found on rank #3 for keyword: 'google custom search engine api python'
[+] Title: How to Use Google Custom Search Engine API in Python - Python ...
[+] Snippet: 10 results ... Learning how to create your own Google Custom Search Engine and use its
Application Programming Interface (API) in Python.
[+] URL: https://www.xyz.com/article/use-gcse-api-in-python

Great, the website ranks third for that keyword. Here is another example run:

[*] Going for page: 1
[*] Going for page: 2
[+] xyz.com is found on rank #13 for keyword: 'make a bitly url shortener in python'
[+] Title: How to Make a URL Shortener in Python - Python Code
[+] Snippet: Learn how to use Bitly and Cuttly APIs to shorten long URLs programmatically
using requests library in Python.
[+] URL: https://www.xyz.com/article/make-url-shortener

This time the script went on to the second page because the domain was not found on the first. As mentioned earlier, it keeps going until page 10 and then stops.
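
To reuse the lookup outside a script, the loop could be wrapped in a function that returns the rank, or `None` when the domain does not appear in the first 10 pages. This is only a sketch: `find_rank` and its `fetch_items` parameter are names introduced here, and `fetch_items` stands in for whatever function performs the API request for a given `start` index and returns the list of result items. The demo uses fake data instead of calling the API.

```python
from urllib.parse import urlparse


def find_rank(fetch_items, target_domain, max_pages=10, per_page=10):
    """Return the 1-based rank of the first result on `target_domain`, or None.

    `fetch_items(start)` is a hypothetical callable that returns the list of
    result dicts (each with a "link" key) for the given 1-based start index.
    """
    for page in range(1, max_pages + 1):
        start = (page - 1) * per_page + 1
        items = fetch_items(start) or []  # treat a missing result list as empty
        for i, item in enumerate(items, start=1):
            domain_name = urlparse(item.get("link", "")).netloc
            if domain_name.endswith(target_domain):
                return i + start - 1
    return None


# fake results standing in for API responses: the target is third on page 2
def fake_fetch(start):
    if start == 11:
        return [
            {"link": "https://example.org/a"},
            {"link": "https://example.net/b"},
            {"link": "https://www.xyz.com/article/make-url-shortener"},
        ]
    return [{"link": f"https://example.org/{start + n}"} for n in range(10)]


print(find_rank(fake_fetch, "xyz.com"))      # 13
print(find_rank(fake_fetch, "missing.dev"))  # None
```

With the real API, `fetch_items` would wrap the `requests.get(...)` call from the script and return `data.get("items")`.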