
Python crawler: generating User-Agent strings

程序员文章站 2023-11-06 23:54:46

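Some websites use anti-scraping measures. A fairly basic one checks the User-Agent field of the request headers to decide whether the request comes from a browser.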
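To make the idea concrete, here is a hypothetical sketch of what such a basic check might look like on the server side. The function name `looks_like_browser` and the list of rejected tokens are illustrative assumptions, not any real site's rule. The sketch also shows why the check is weak: it trusts whatever string the client chooses to send.

```python
import re
from typing import Optional

# Hypothetical, deliberately naive server-side check: treat a request as coming
# from a browser only if its User-Agent is present, starts with "Mozilla/",
# and does not name a common HTTP client library or command-line tool.
NON_BROWSER = re.compile(r'python-urllib|python-requests|curl|wget', re.I)


def looks_like_browser(user_agent: Optional[str]) -> bool:
    if not user_agent:
        return False  # header missing or empty
    if NON_BROWSER.search(user_agent):
        return False  # default UA of a scripting library or CLI tool
    return user_agent.startswith('Mozilla/')


print(looks_like_browser('Python-urllib/3.11'))  # False
print(looks_like_browser('Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36'))  # True
```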

When scraping such sites, you need to send a browser-like User-Agent.
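With the standard library's `urllib.request`, the default User-Agent is `Python-urllib/<version>`, which is exactly what a check like this catches. The minimal sketch below overrides it by passing a `headers` dict to `Request`. The helper name `build_request`, the URL and the hard-coded UA string are assumptions for illustration; in practice the string would come from a list such as `user-agent.txt`. The actual network call is left commented out.

```python
import urllib.request

# Hypothetical browser-like User-Agent; normally picked from user-agent.txt
BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36')


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    # urllib stores header names via str.capitalize(), i.e. as 'User-agent'
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = build_request('https://example.com/', BROWSER_UA)
print(req.get_header('User-agent'))

# Sending it requires network access:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     html = resp.read()
```

Because the header is set on the request itself, urllib's opener does not add its own default `User-agent` on top of it.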

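import random
import re
from typing import Dict, Iterator, List, Union


class UserAgent:
    '''
    Loads User-Agent strings from user-agent.txt (one per line) and returns
    random ones, either from the whole list or filtered by browser.
    '''

    # Path of the User-Agent list file
    __filepath = 'user-agent.txt'

    # The single shared instance (singleton)
    __instance = None

    # Whether the file has already been read
    __loaded = False

    # Browser keyword ('chrome', 'firefox', 'msie', 'opera') -> User-Agent strings
    __dict: Dict[str, List[str]] = {}

    # Every User-Agent string in the file, in file order
    __list: List[str] = []

    # Initialization: read the file and group the entries by browser keyword
    def __init__(self):
        # Python calls __init__ on every UserAgent() call, even though
        # __new__ returns the same object; read the file only once so the
        # lists do not fill up with duplicates.
        if self.__loaded:
            return
        reg = re.compile(r'firefox|chrome|msie|opera', re.I)
        with open(self.__filepath, 'r', encoding='utf_8_sig') as f:
            for line in f:
                ua = line.strip()
                if not ua:
                    continue  # skip blank lines
                match = reg.search(ua)
                if match:
                    self.__dict.setdefault(match.group().lower(), []).append(ua)
                self.__list.append(ua)
        self.__loaded = True

    # Singleton: every UserAgent() call returns the same object
    def __new__(cls):
        if cls.__instance is None:
            cls.__instance = super().__new__(cls)
        return cls.__instance

    # Google Chrome
    @property
    def chrome(self) -> str:
        return random.choice(self.__dict['chrome'])

    # Firefox
    @property
    def firefox(self) -> str:
        return random.choice(self.__dict['firefox'])

    # Internet Explorer (matched by the 'MSIE' token)
    @property
    def ie(self) -> str:
        return random.choice(self.__dict['msie'])

    # Opera
    @property
    def opera(self) -> str:
        return random.choice(self.__dict['opera'])

    # Any browser, chosen at random from the whole list
    def random(self) -> str:
        # `random` here is the module: the method name lives in the class
        # namespace, which function bodies do not see
        return random.choice(self.__list)

    # Iteration: yields every User-Agent string in file order
    def __iter__(self) -> Iterator[str]:
        return iter(self.__list)

    # Indexing and slicing: ua[0] or ua[0:5]
    def __getitem__(self, index: Union[int, slice]) -> Union[str, List[str]]:
        return self.__list[index]


ua = UserAgent()
print(ua.random())

# Print every entry:
# for n in ua:
#     print(n)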
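Overriding `__new__` makes every constructor call return the same object. Python still runs `__init__` each time, whenever `__new__` returns an instance of the class. So a loader that reads a file in `__init__` needs a guard, or it appends the file's lines again on every call. The minimal sketch below shows this behaviour; the class `Singleton` and its `init_calls` counter are hypothetical and exist only for the demonstration.

```python
class Singleton:
    """Hypothetical minimal singleton using the same __new__ pattern."""
    _instance = None
    init_calls = 0  # how many times __init__ has run

    def __new__(cls):
        # Create the object only on the first call; afterwards reuse it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Runs on every Singleton() call, because __new__ returned an instance of cls
        Singleton.init_calls += 1


a = Singleton()
b = Singleton()
print(a is b, Singleton.init_calls)  # True 2
```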


user-agent.txt

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36
Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36
Mozilla/5.0 (Windows NT 4.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36
Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.3319.102 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2117.157 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1866.237 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/4E423F
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36 Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.517 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36
......
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; zh-cn) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
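The classification only looks for the tokens firefox, chrome, msie and opera. Entries such as the Safari line at the end of the listing therefore land only in the full list: `random()` can return them, but no browser-specific property does. The hypothetical `group_user_agents` below repeats that grouping on an in-memory file, so it can be tried without the real file. Here `io.StringIO` stands in for `user-agent.txt`, holding two lines copied from the listing plus a blank line.

```python
import io
import re
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

BROWSER_RE = re.compile(r'firefox|chrome|msie|opera', re.I)


def group_user_agents(f: TextIO) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (browser keyword -> UAs, all UAs), skipping blank lines."""
    by_browser: Dict[str, List[str]] = defaultdict(list)
    all_agents: List[str] = []
    for line in f:
        ua = line.strip()
        if not ua:
            continue
        m = BROWSER_RE.search(ua)
        if m:
            by_browser[m.group().lower()].append(ua)
        all_agents.append(ua)  # every non-blank line, matched or not
    return dict(by_browser), all_agents


sample = io.StringIO(
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/41.0.2228.0 Safari/537.36\n'
    '\n'
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; zh-cn) AppleWebKit/533.18.1 '
    '(KHTML, like Gecko) Version/5.0.2 Safari/533.18.5\n'
)
groups, agents = group_user_agents(sample)
print(len(agents), {k: len(v) for k, v in groups.items()})  # 2 {'chrome': 1}
```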


user-agent.txt on Baidu Netdisk

Link: https://pan.baidu.com/s/1ramkIyjVSI2_GXbxypj1Dg
Extraction code: hak8