
Installing Tesseract on Mac and Using Tess4J for OCR

程序员文章站 2022-07-06 10:39:27

Tesseract is an open-source OCR engine that supports many languages. Official repository: https://github.com/tesseract-ocr/tesseract

Documentation: https://tesseract-ocr.github.io/docs/

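1. Installing Tesseract on macOS

Installing with the command brew install --with-training-tools tesseract now fails with Error: invalid option: --with-training-tools, because the --with-training-tools option is not available. Since I wanted the training tools installed as well, I ended up building from source: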

# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
 
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install

Next, download the language data.

Download the .traineddata files you need and copy them into the tessdata folder.

Language data repository: https://github.com/tesseract-ocr/tessdata
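To confirm that the copied files ended up where expected, the tessdata folder can be scanned for .traineddata files. The following is a minimal sketch, not part of Tesseract or Tess4J: the class name TessdataLanguages is hypothetical, and the default path /usr/local/share/tessdata is an assumption about where a source build with the default prefix places the folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists the language codes available in a tessdata folder.
class TessdataLanguages {

    private static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes of all *.traineddata files directly inside tessdataDir. */
    static List<String> list(Path tessdataDir) throws IOException {
        try (Stream<Path> files = Files.list(tessdataDir)) {
            return files
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(SUFFIX))
                    // Strip the suffix so "chi_sim.traineddata" becomes "chi_sim".
                    .map(name -> name.substring(0, name.length() - SUFFIX.length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass your own tessdata folder as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/local/share/tessdata");
        System.out.println(list(dir));
    }
}
```

If the language you plan to use (for example chi_sim) is missing from the printed list, the corresponding file has not been copied into that folder.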

Once all of the steps above are done, run the following in a terminal to check the recognition result: tesseract 111.jpg stdout
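The same command-line check can be driven from Java, which is handy for verifying the installation before bringing in Tess4J. This is only a sketch under assumptions: TesseractCommand is a hypothetical class, the tesseract binary is assumed to be on the PATH, and the -l option is used to select a language explicitly (without it the command is the same as the one shown above).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds and runs the "tesseract <image> stdout" command.
class TesseractCommand {

    /** Builds the argument list; language may be null to use Tesseract's default. */
    static List<String> build(String imagePath, String language) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add("stdout");
        if (language != null && !language.isEmpty()) {
            cmd.add("-l");
            cmd.add(language);
        }
        return cmd;
    }

    /** Runs the command, forwarding its output to this process, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command line; call run(...) to actually execute it.
        System.out.println(String.join(" ", build("111.jpg", "chi_sim")));
    }
}
```

Calling TesseractCommand.run(TesseractCommand.build("111.jpg", "chi_sim")) executes the command and prints the recognized text to the console, provided that Tesseract and the chi_sim language data are installed.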


Installation reference: https://www.freesion.com/article/1345377723/

 

2. OCR in Java with Tess4J

Add the Tess4J Maven dependency:

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.1</version>
</dependency>

Run the recognition demo:

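import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Tess4jOcrTest {

    // The original snippet used an undeclared "log"; an SLF4J logger is assumed here.
    private static final Logger log = LoggerFactory.getLogger(Tess4jOcrTest.class);

    public static void main(String[] args) {
        String basePath = "/Users/seapeak/Desktop/";
        test1(basePath + "555.jpg");
    }

    /**
     * Runs OCR on the image at the given path and logs the recognized text.
     *
     * @param path path of the image file
     */
    public static void test1(String path) {
        File file = new File(path);
        ITesseract it = new Tesseract();
        // If the tessdata directory has not been moved, use ".":
        // it.setDatapath(".");
        // If the tessdata directory has been moved, point to its location:
        it.setDatapath("/Users/seapeak/Desktop/it/java/tesseract/tessdata/");
        // Use chi_sim when the text is mostly Chinese; use eng when it is mostly English.
        it.setLanguage("chi_sim");
        try {
            String result = it.doOCR(file);
            log.info("OCR result: {}", result);
        } catch (TesseractException e) {
            // Passing the exception as the last argument logs the full stack trace.
            log.error("Tess4jOcrTest TesseractException", e);
        }
    }

}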

If execution fails with the language-not-found error below, call setDatapath with the location of your tessdata directory.

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Reference: https://blog.csdn.net/chenhailonghp/article/details/102704842
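One way to avoid this error is to check, before calling setDatapath, which directory actually contains the language file. The sketch below is an illustration rather than part of Tess4J: TessdataLocator is a hypothetical class, and the candidate directories are whatever you choose to pass in.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: picks the first candidate directory that holds <language>.traineddata.
class TessdataLocator {

    static Optional<Path> find(List<Path> candidates, String language) {
        for (Path dir : candidates) {
            if (Files.isRegularFile(dir.resolve(language + ".traineddata"))) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Usage: java TessdataLocator <language> <dir> [<dir> ...]
        if (args.length < 2) {
            System.out.println("usage: TessdataLocator <language> <dir> [<dir> ...]");
            return;
        }
        List<Path> candidates = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            candidates.add(Paths.get(args[i]));
        }
        Optional<Path> found = find(candidates, args[0]);
        System.out.println(found.map(Path::toString).orElse("no tessdata directory found"));
    }
}
```

In the demo, the resulting path could then be passed on as it.setDatapath(found.get().toString()), and a clear message can be logged when no candidate matches instead of waiting for Tesseract to fail.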

 

3. Using the training tools (to be continued)

See also:

https://blog.csdn.net/guanzhen3657/article/details/81138868?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase

https://blog.csdn.net/dcrmg/article/details/53677739

https://www.freesion.com/article/1345377723/

https://blog.csdn.net/kangshuaibing/category_7973951.html

https://blog.csdn.net/u010670689/article/details/78374623
