Finding All Classes on a Web Page with BeautifulSoup

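  This post writes a program that finds all the classes used on the page at a given website URL. Beautiful Soup has no built-in method that returns every class on a page, so the program has to collect them itself. In this post, 晓得博客 (Xiaode Blog) shows how to use BeautifulSoup to find all the classes on a web page.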

  Required modules:

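  bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It is not part of the Python standard library. To install it, run the following command in a terminal (the package on PyPI is named beautifulsoup4; the module you import is bs4):

pip install beautifulsoup4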

  requests: Requests lets you send HTTP/1.1 requests very easily. It is not part of the Python standard library either. To install it, run the following command in a terminal:

pip install requests

1. Finding classes in a given HTML document

  • Create the HTML document.
  • Import the module.
  • Parse the content with BeautifulSoup.
  • Look up the data by class name.
# html code
html_doc = """<html><head><title>Welcome to geeksforgeeks</title></head>
<body>
<p class="title"><b>Geeks</b></p>


<p class="body">geeksforgeeks a computer science portal for geeks
</body>
"""

# import module
from bs4 import BeautifulSoup

# parse html content
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first element whose class is "body" (class_ because class is a Python keyword)
print(soup.find(class_="body"))

Output:
<p class="body">geeksforgeeks a computer science portal for geeks
</p>
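  find() returns only the first element that matches. A minimal sketch, using a trimmed and properly closed copy of the html_doc above, of three related lookups: find_all() for every element with a given class, select() with a CSS class selector, and class_=True to match any tag that has a class attribute at all:

```python
from bs4 import BeautifulSoup

# Trimmed, properly closed copy of the html_doc used above
html_doc = """<html><head><title>Welcome to geeksforgeeks</title></head>
<body>
<p class="title"><b>Geeks</b></p>
<p class="body">geeksforgeeks a computer science portal for geeks</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() stops at the first match; find_all() returns a list of every match
body_paragraphs = soup.find_all(class_="body")
print(len(body_paragraphs))

# select() takes a CSS selector; ".title" matches class="title"
titles = soup.select(".title")
print(titles[0].get_text())

# class_=True matches every tag that has a class attribute
tagged = [tag.name for tag in soup.find_all(class_=True)]
print(tagged)
```

  The last lookup is the building block for listing every class on a page, which is what the next section does for a live URL.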

2. Finding all classes at a URL

  • Import the modules.
  • Send a request to the URL.
  • Pass the response content to the BeautifulSoup() constructor.
  • Iterate over all tags and collect their class names.
# Import Module
from bs4 import BeautifulSoup
import requests

# Website URL
URL = 'https://learnpython.com/blog/'

# class list set
class_list = set()

# Page content from Website URL (raise an error if the request fails)
page = requests.get(URL, timeout=10)
page.raise_for_status()

# parse html content
soup = BeautifulSoup(page.content, 'html.parser')

# get all tags
tags = {tag.name for tag in soup.find_all()}

# iterate all tags
for tag in tags:

	# find all element of tag
	for i in soup.find_all( tag ):

		# if tag has attribute of class
		if i.has_attr( "class" ):

			if len( i['class'] ) != 0:
				class_list.add(" ".join( i['class']))

print( class_list )


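Output (the exact set depends on the live page at the time of the request, and the order of a set is arbitrary):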
{'main-menu', 'main-menu__item main-menu__item--logout hide', 'site-header-home-navigation-hamburger-link', 'page-item active', 'blog-list-summary-info', 'main-menu__item main-menu__item--create-free-account', 'footer__bottom-text', 'footer__social-share', 'site-navigation', 
'footer__hr', 'blog-post-date', 'site-header-home-navigation-below showOnLogged hide', 'logout__full-name user-name-element', 'footer__main-section', 'footer__social-share-item', 'site-header-home-navigation-layer-menu-icon middle', 'main-menu__item main-menu__item--courses', 
'summary-blog-post-meta-list-author-name', 'footer__policies-list', 'site-header-home-navigation-layer-menu-icon top', 'logout-modal__link', 
'blog-post-featured-image blog-list-feature-image tall', 'site-header-home-navigation-hamburger-wrapper pages hide', 'site-header-home-navigation-hamburger-item showOnLogged hide', 'page-link', 'footer__assistance', 'library-modal__layer', 'summary-read-more-link button--link', 
'logout-modal__avatar avatar',
 'site-header-home-navigation-hamburger-layout', 'library-modal modal', 'learnpy-blog-navigation', 'site-header-home-navigation-layer-menu-cover-active', 'page-item disabled', 'footer__wrapper', 'footer__quick-link-list-item', 'summary-read-more blog-list', 
'footer__assistance-content', 'footer__quick-links', 'site-header-home-navigation-hamburger-item site-header-home-navigation-hamburger-item--articles', 'site-header', 'site-header-home-navigation-hamburger', 'button--primary', 
'site-header-home-navigation-layer-menu pages', 'site-header-home-navigation-below-item-button button--primary', 'blog-list-summary', 'logout-modal modal', 'logout-modal__separator-border', 
'blog-list-feature-image-link', 'footer__header', 'pagination', 'learnpy-blog-navigation-wrapper', 'to-top home', 'logout-modal__intro', 'blog-list-first-article', 'site-header-home-navigation-hamburger-item site-header-home-navigation-hamburger-item--courses', 'site-header-home-navigation-layer-menu-icon bottom', 'site-header-home-navigation-below-item-link', 
'site-header-home-navigation-layer-menu-icon pages', 
'button--footer', 'logout-modal__layer modal__layer', 'logout-modal__window modal__window', 'footer__follow-us', 'lazyload', 'learnpy-blog-navigation-item active', 'summary__content', 'footer__quick-link-list', 'summary-blog-post-meta-author-link', 'main-menu__item main-menu__item--library', 'blog-list-summary-title', 'logout__avatar avatar', 'footer', 
'site-header-home-navigation-below-item-link button--link', 
'footer__policies-list-item', 'page-item', 'blog-list-header-gradient', 'logout-modal__link logout-modal__link--logout', 'footer__vertabelo-link', 'blog-list-content tab-content', 'blog-list-header-background', 'main-menu__item main-menu__item--log-in', 'logout-modal__name user-name-element', 'button--ghost', 'footer__copyright', 'site-header-home-navigation-below hideOnLogged',
 'site-header-home-navigation-below-item', 
'blog-list-container', 'library-modal__window modal__window', 'site-header-home-navigation-layer-menu-cover', 'site-logo', 'footer__logo'}
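  The nested loop above calls find_all() once per distinct tag name, so it walks the whole document many times. Beautiful Soup treats class as a multi-valued attribute (tag['class'] is a list of strings), so a single pass over the tags that have a class attribute yields the same set. The sketch below uses a hypothetical helper, collect_classes, that takes an HTML string so it can be tried without a network request. It returns both the joined class strings (like the program above prints) and the individual class names. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def collect_classes(html):
    """Hypothetical helper: return (joined class strings, individual class names)."""
    soup = BeautifulSoup(html, "html.parser")
    joined = set()      # e.g. 'page-item active', as in the program above
    individual = set()  # e.g. 'page-item' and 'active' as separate entries
    # class_=True matches every tag that has a class attribute
    for tag in soup.find_all(class_=True):
        classes = tag["class"]  # a list, because class is multi-valued
        if classes:  # mirrors the original len(...) != 0 check
            joined.add(" ".join(classes))
            individual.update(classes)
    return joined, individual


# Made-up sample markup for illustration
sample = """
<ul class="pagination">
  <li class="page-item active"><a class="page-link" href="#">1</a></li>
  <li class="page-item"><a class="page-link" href="#">2</a></li>
</ul>
"""

joined, individual = collect_classes(sample)
print(sorted(joined))      # ['page-item', 'page-item active', 'page-link', 'pagination']
print(sorted(individual))  # ['active', 'page-item', 'page-link', 'pagination']
```

  To apply it to a live page, pass page.text (or page.content) from the requests call above. Collecting individual names is usually more useful for writing selectors, since 'page-item active' is two classes, not one.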


晓得博客 (Xiaode Blog). All rights reserved | Original content unless otherwise noted.
When reposting, please keep the link: https://www.pythonthree.com/python-beautifulsoup-find-all-class/
