Published 2019-09-03 · Updated 2022-02-22 · Python notes · 2 min read (about 305 characters)

Image text recognition (OCR) in Python

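What does the actual work here is the Tesseract OCR engine. If the text to recognize contains Chinese, you also need the Simplified Chinese language data file, chi_sim.traineddata.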

I am on Windows, so I downloaded the "Windows Installer made with MinGW-w64". During installation, watch for the option that expands into a list of language data files, and tick chi_sim.traineddata there.
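To confirm that the Chinese data really was installed, one option is to ask Tesseract which languages it knows about. The sketch below is not part of the original setup. It assumes `tesseract` is reachable on PATH and that `tesseract --list-langs` prints a header line followed by one language code per line. The helper names `parse_langs` and `installed_langs` are made up for illustration.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` style output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the header line starts with "List of", e.g.
    # "List of available languages (2):"; everything else is a language code.
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs():
    """Return installed language codes, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some versions may print the list to stderr, so both streams are parsed.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("chi_sim installed:", "chi_sim" in langs)
```

If this prints False even though the installer option was ticked, the likely cause is that Tesseract is not on PATH yet rather than that the data file is missing.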

I am using Python 3.7. First install the required dependencies:

pip install pytesseract
pip install pillow

I use PyCharm, where Alt+Enter adds the missing imports directly.

The complete code is as follows:

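import pytesseract
from PIL import Image

# Open the image to recognize (the path is relative to the working directory).
image = Image.open("../pic/c.png")
# lang="chi_sim" selects the Simplified Chinese data. "--psm 6" treats the
# image as a single uniform block of text. Tesseract 4 and later expect the
# double-dash form; Tesseract 3.x used "-psm 6".
code = pytesseract.image_to_string(image, lang="chi_sim", config="--psm 6")
print(code)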

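Running it as-is may fail with an error saying that tesseract cannot be found. Other people I have seen fix this by editing the pytesseract source file directly: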

tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

What I did instead was add C:\Program Files (x86)\Tesseract-OCR\ to the PATH environment variable.

The change may not take effect until PyCharm is restarted.
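A quick way to check whether the PATH change has been picked up is to look for the executable from Python. The following is a minimal sketch, not the original author's code. `find_tesseract` is a hypothetical helper, and the fallback path is simply the install directory mentioned above, which may differ on your machine.

```python
import os
import shutil

# Install location mentioned in this post; adjust it for your machine.
DEFAULT_EXE = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_EXE):
    """Return the path to the tesseract executable, or None if not found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Not on PATH: fall back to the known install location, if it exists.
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract not found; check PATH or the install directory")
    else:
        print("using", exe)
        # With pytesseract installed, the path can be set at runtime
        # instead of editing the library source:
        #   import pytesseract
        #   pytesseract.pytesseract.tesseract_cmd = exe
```

Setting `pytesseract.pytesseract.tesseract_cmd` in your own script has the same effect as editing the library file, but it survives a `pip` upgrade of pytesseract.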

OCR is not 100% accurate. In my tests, the output sometimes had a few extra or missing characters.
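One kind of extra character that can show up with Chinese text is whitespace inserted between adjacent characters. Assuming that is what you are seeing, a small post-processing step along these lines may help. This is an illustrative sketch: `strip_cjk_spaces` is a hypothetical helper, and it only covers the basic CJK Unified Ideographs range (U+4E00 to U+9FFF). It does nothing about wrong or missing characters.

```python
import re

# Matches spaces or tabs that sit directly between two CJK ideographs.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]+(?=[\u4e00-\u9fff])")


def strip_cjk_spaces(text):
    """Remove whitespace between Chinese characters; leave other spacing alone."""
    return _CJK_GAP.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("图 像 文 字 识 别 with Tesseract OCR"))
```

Spaces between Latin words, and between a Chinese character and a Latin word, are left untouched, so mixed-language lines stay readable.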

References:

https://blog.csdn.net/github_33304260/article/details/79155154

Permalink: https://jingzhouzhao.github.io/archives/99680cca.html

Author: 太阳当空赵先生

Published: 2019-09-03

Updated: 2022-02-22

#Python OCR