Simulating a Facebook login in Python with requests (scraping post data with Python)

程序员文章站 2023-11-06 23:28:10
Requirements

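At work I need to collect post data from FB. At present FB only allows posts in groups to be scraped publicly; posts by individual users can only be scraped after logging in to FB. Analyzing the login flow shows that the body of the POST request contains an encrypted piece of data, as shown below: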

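  1. The request URL is: https://www.facebook.com/
  2. After the username and password are entered, the page goes to https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110
  3. The URL above turns out to be a POST request, and the parameters it needs are:
    [screenshot: form data of the POST request]
    Only the email parameter shows up and there is no password. Analyzing the HTML returned by the URL in step 1, however, turns up the following snippet:
    [screenshot: HTML snippet from the page in step 1]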
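
The analysis in step 3 amounts to listing the input fields of the login form in the page HTML. Below is a minimal sketch of that inspection using only the standard library. The names `FormFieldParser` and `inspect_form`, the sample HTML, and its `token` field are hypothetical and only illustrate the idea; the real page contains different and more numerous fields. The form id `login_form` and the field names `email` and `pass` are the ones used by the script in this article.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL and the named <input> fields of one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def inspect_form(html, form_id="login_form"):
    """Return (action, fields) for the form with the given id."""
    parser = FormFieldParser(form_id)
    parser.feed(html)
    return parser.action, parser.fields


# Hypothetical sample page, NOT the real Facebook markup.
SAMPLE_HTML = """
<form id="login_form" method="post" action="/login/?login_attempt=1&amp;lwv=110">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="pass">
</form>
"""

action, fields = inspect_form(SAMPLE_HTML)
print(action)
print(sorted(fields))
```

`HTMLParser` already decodes entities such as `&amp;` inside attribute values, so the returned action needs no manual replacement.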

Implementation

# -*- coding: utf-8 -*-

import requests
import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = 'https://www.facebook.com'

user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " \
             "Chrome/76.0.3809.87 Safari/537.36"
cookie = 'locale=en_US;'
headers = {
    'user-agent': user_agent,
    'accept-language': 'en-US,en;q=0.5',
    'cookie': cookie
}

session = requests.session()
session.headers.update(headers)
response = session.get(url=base_url)
html = response.text
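# Flags must be given to re.compile(); the second argument of a compiled
# pattern's findall() is a start position, not a flag.
pattern = re.compile(r'<form.*?action="(.*?)"', re.S)
action = pattern.findall(html)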
action_url = [url for url in action if 'login/device-based' in url]
if action_url:
    action_url = action_url[0].replace('&amp;','&')
else:
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', attrs={'method': 'post', 'id': 'login_form', 'action':True})
    if form:
        action_url = form['action'].replace('&amp;','&')
if not action_url:
    print('Get Login Url Error')
    action_url = 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110'
login_url = urljoin(base_url, action_url)
data = {
    'email': 'your email or phone number',
    'pass': 'your password'
}
r = session.post(login_url, data=data)
cookies = requests.utils.dict_from_cookiejar(session.cookies)
print(cookies)
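
The script prints the cookie dictionary but does not check whether the login actually worked. The sketch below shows one way to check and to store the cookies for later runs. It assumes that Facebook sets a `c_user` cookie after a successful login; that assumption is not stated in the original article, and the cookie values shown are made up. The helper names `is_logged_in`, `save_cookies` and `load_cookies` are hypothetical.

```python
import json
import os
import tempfile


def is_logged_in(cookies):
    """Heuristic: treat a non-empty `c_user` cookie as a successful login.

    Assumption: the site sets `c_user` only for authenticated sessions.
    """
    return bool(cookies.get("c_user"))


def save_cookies(cookies, path):
    """Write the cookie dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


def load_cookies(path):
    """Read a cookie dict back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Made-up cookie dicts standing in for the output of
# requests.utils.dict_from_cookiejar(session.cookies).
failed = {"locale": "en_US", "datr": "xxxx"}
succeeded = {"locale": "en_US", "c_user": "1000000000", "xs": "xxxx"}

path = os.path.join(tempfile.gettempdir(), "fb_cookies.json")
save_cookies(succeeded, path)
restored = load_cookies(path)
print(is_logged_in(failed), is_logged_in(restored))
```

A restored dictionary can be loaded into a new session with `requests.utils.cookiejar_from_dict`, so later runs may be able to skip the login request while the cookies remain valid.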

You can also log in through m.facebook.com. The procedure is exactly the same as in the www.facebook.com script:

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
import re

base_url = 'https://m.facebook.com'

user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " \
             "Chrome/76.0.3809.87 Safari/537.36"
cookie = 'locale=en_US;'
headers = {
    'user-agent': user_agent,
    'accept-language': 'en-US,en;q=0.5',
    'cookie': cookie
}

session = requests.session()
session.headers.update(headers)
response = session.get(url=base_url)
html = response.text
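# Flags must be given to re.compile(); the second argument of a compiled
# pattern's findall() is a start position, not a flag.
pattern = re.compile(r'<form.*?action="(.*?)"', re.S)
action = pattern.findall(html)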
action_url = [url for url in action if 'login/device-based' in url]
if action_url:
    action_url = action_url[0].replace('&amp;','&')
else:
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', attrs={'method': 'post', 'id': 'login_form', 'action':True})
    if form:
        action_url = form['action'].replace('&amp;','&')
if not action_url:
    print('Get Login Url Error')
    action_url = '/login/device-based/regular/login/?refsrc=https%3A%2F%2Fm.facebook.com%2F&lwv=100&refid=8'
login_url = urljoin(base_url, action_url)
data = {
    'email': 'your email or phone number',
    'pass': 'your password'
}
r = session.post(login_url, data=data)
cookies = requests.utils.dict_from_cookiejar(session.cookies)
print(cookies)
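
The two scripts differ only in the base URL and the fallback action. As a sketch, the shared URL-resolution step could be pulled into one function that runs on any HTML string. The function name `resolve_login_url` and the sample markup are hypothetical; the `login/device-based` filter and the fallback URL are taken from the scripts in this article. This version uses `html.unescape` instead of replacing `&amp;` by hand, and it omits the BeautifulSoup fallback to stay within the standard library.

```python
import re
from html import unescape
from urllib.parse import urljoin

# re.S lets the pattern match a <form> tag that spans several lines.
FORM_ACTION = re.compile(r'<form[^>]*?action="(.*?)"', re.S)


def resolve_login_url(base_url, html, fallback):
    """Return the absolute login URL found in html, or the fallback."""
    for action in FORM_ACTION.findall(html):
        if "login/device-based" in action:
            return urljoin(base_url, unescape(action))
    return urljoin(base_url, fallback)


# Hypothetical markup, not the real page.
sample = ('<form id="login_form" method="post"\n'
          '      action="/login/device-based/regular/login/?refsrc=x&amp;lwv=100">')

mobile_url = resolve_login_url("https://m.facebook.com", sample, "/login/")
desktop_url = resolve_login_url(
    "https://www.facebook.com",
    "<html></html>",
    "https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110",
)
print(mobile_url)
print(desktop_url)
```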

Original article: https://blog.csdn.net/minghao2164/article/details/107077621