Four Ways to Handle Captchas in Python Selenium UI Automation

2020-02-16 11:28:00

Source: reposted

Contributed by: a site user

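This article describes four ways to handle captchas in Python Selenium UI automation. The details are as follows.

Test environment

Windows 7 + Firefox 50 + geckodriver (the Firefox browser driver) + Python 3 + Selenium 3

The four ways to handle captchas in Selenium UI automation are: removing the captcha, setting a universal code, captcha recognition with tesseract, and logging in by adding cookies. This article focuses on captcha recognition with tesseract and on cookie-based login.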

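1. Removing the captcha

Remove the captcha so that the site can be logged into with just a username and password.

2. Setting a universal code

Set a universal code: whatever captcha is displayed, entering the universal code always logs in successfully.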
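The universal-code idea can be illustrated with a minimal server-side sketch. It is only an assumption about how such a check might look: the function name `captcha_is_valid`, the `test_mode` switch, and the value of `UNIVERSAL_CODE` are all hypothetical and do not come from any particular site.

```python
# Hypothetical universal code; a real project would read it from test-only configuration.
UNIVERSAL_CODE = "test-8888"


def captcha_is_valid(user_input, expected, test_mode=False):
    """Return True if the input matches the real captcha.

    In test mode the universal code is accepted as well, whatever the real captcha is.
    """
    if test_mode and user_input == UNIVERSAL_CODE:
        return True
    # Real captchas are compared case-insensitively in this sketch.
    return user_input.lower() == expected.lower()
```

Restricting the universal code to a test-mode switch keeps it from being accepted in production.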

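3. Captcha recognition with tesseract

Prerequisites

tesseract, download: https://github.com/parrot-office/tesseract/releases/tag/3.5.1

Python 3.x, download: https://www.python.org/downloads/

pillow (an image-processing library for Python 3)

After installing Python, install the pillow library with pip install pillow. Then copy tesseract.exe and the tessdata folder from the tesseract package into the directory that contains the test scripts. By default tessdata contains eng.traineddata and osd.traineddata; to recognize Chinese, download the corresponding language pack yourself.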
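Before running the scripts it can help to confirm that the files are where the setup expects them. The following is a small sketch under that assumption; the helper name `missing_tesseract_files` is hypothetical, and the layout it checks (tesseract.exe next to a tessdata folder, which is the name Tesseract uses for its language-data directory) follows the setup described above.

```python
import os


def missing_tesseract_files(base_dir, langs=("eng",)):
    """Return the required Tesseract files that are absent from base_dir."""
    required = [os.path.join(base_dir, "tesseract.exe")]
    # One <lang>.traineddata file is needed per language to be recognized.
    required += [
        os.path.join(base_dir, "tessdata", lang + ".traineddata") for lang in langs
    ]
    return [path for path in required if not os.path.isfile(path)]
```

Calling `missing_tesseract_files(os.getcwd(), langs=("eng", "osd"))` from the script directory returns an empty list when everything is in place.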

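The two main files are as follows. TesseractPy3.py calls tesseract from Python code to recognize the captcha. code.py uses selenium to fetch the captcha image, then calls the function in TesseractPy3 to obtain the captcha text and log in to the site automatically.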
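The image-capture step that code.py is described as performing can be sketched as follows. This is not the article's code.py: the function names `captcha_box` and `save_captcha_image` and the file names are hypothetical, and the sketch assumes the captcha element is visible in the viewport without scrolling, so that its page coordinates line up with the screenshot.

```python
def captcha_box(location, size, scale=1.0):
    """Convert an element's location and size into a (left, upper, right, lower) box.

    location and size are dicts shaped like Selenium's WebElement.location
    ({'x': ..., 'y': ...}) and WebElement.size ({'width': ..., 'height': ...}).
    scale compensates for display scaling between page and screenshot pixels.
    """
    left = int(location["x"] * scale)
    upper = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    lower = int((location["y"] + size["height"]) * scale)
    return (left, upper, right, lower)


def save_captcha_image(driver, element, page_png="page.png", captcha_png="captcha.png"):
    """Screenshot the page, crop out the captcha element, and save it."""
    from PIL import Image  # Pillow; imported here so captcha_box works without it

    driver.save_screenshot(page_png)
    box = captcha_box(element.location, element.size)
    with Image.open(page_png) as page:
        page.crop(box).save(captcha_png)
    return captcha_png
```

The saved file can then be passed to `image_to_string` from TesseractPy3.py, for example `image_to_string("captcha.png", "eng")`.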

TesseractPy3.py

# coding=utf-8
import logging
import os
import subprocess
import traceback

from PIL import Image  # provided by the Pillow library

TESSERACT = 'tesseract'        # name of the local command to invoke
TEMP_IMAGE_NAME = "temp.bmp"   # temporary file for the converted image
TEMP_RESULT_NAME = "temp"      # base name of the temporary file holding the recognized text
CLEANUP_TEMP_FLAG = True       # whether to delete the temporary files afterwards
INCOMPATIBLE = True            # whether to retry with a converted image


def image_to_scratch(image, TEMP_IMAGE_NAME):
    # Save the image in a format tesseract can read
    image.save(TEMP_IMAGE_NAME, dpi=(200, 200))


def retrieve_text(TEMP_RESULT_NAME):
    # Read the recognized text
    with open(TEMP_RESULT_NAME + '.txt', 'r') as inf:
        return inf.read()


def perform_cleanup(TEMP_IMAGE_NAME, TEMP_RESULT_NAME):
    # Delete the temporary files
    for name in (TEMP_IMAGE_NAME, TEMP_RESULT_NAME + '.txt', "tesseract.log"):
        try:
            os.remove(name)
        except OSError:
            pass


def call_tesseract(image, result, lang):
    # Run tesseract, which writes the recognized text to <result>.txt
    args = [TESSERACT, image, result, '-l', lang]
    proc = subprocess.Popen(args)
    proc.communicate()  # wait for tesseract to finish


def image_to_string(image, lang, cleanup=CLEANUP_TEMP_FLAG, incompatible=INCOMPATIBLE):
    # Recognize the text in the image file. If the first attempt fails and
    # incompatible is True, convert the image to .bmp and try again.
    # If cleanup is True, delete the temporary files afterwards.
    # Returns None if recognition fails; the traceback is logged.
    logging.basicConfig(filename='tesseract.log')
    try:
        try:
            call_tesseract(image, TEMP_RESULT_NAME, lang)
            text = retrieve_text(TEMP_RESULT_NAME)
        except Exception:
            if incompatible:
                image = Image.open(image)
                image_to_scratch(image, TEMP_IMAGE_NAME)
                call_tesseract(TEMP_IMAGE_NAME, TEMP_RESULT_NAME, lang)
                text = retrieve_text(TEMP_RESULT_NAME)
            else:
                raise
        return text
    except Exception:
        logging.error(traceback.format_exc())
    finally:
        if cleanup:
            perform_cleanup(TEMP_IMAGE_NAME, TEMP_RESULT_NAME)
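The text that `image_to_string` reads back from the result file usually carries trailing whitespace, and the function returns None when recognition fails, so the result is worth normalizing before it is typed into the login form. The sketch below is one way to do that, assuming the captcha consists only of ASCII letters and digits; the helper name `clean_captcha_text` and its `length` parameter are hypothetical.

```python
import re


def clean_captcha_text(raw, length=None):
    """Strip whitespace and stray symbols from OCR output.

    Returns None when the input is None or, if length is given,
    when the cleaned text does not have exactly that many characters.
    """
    if raw is None:
        return None
    # Keep only ASCII letters and digits (an assumption about the captcha alphabet).
    text = re.sub(r"[^0-9A-Za-z]", "", raw)
    if length is not None and len(text) != length:
        return None
    return text
```

A None result can be treated as a signal to refresh the captcha and try again, since OCR on distorted images often misreads characters.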
