FileNotFoundError: [Errno 2] No such file or directory: 'errors.out' (error in the final example of section 5.6 of Natural Language Processing with Python)

Programmer Article Station, 2022-03-15 15:06:55
I ran the last example from Chapter 5 of Natural Language Processing with Python under Python 3.7:

from nltk.tbl import demo as brill_demo
brill_demo.demo()
print(open("errors.out").read())

and got the following error:

Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 332, in <module>
    print(open("errors.out").read())
FileNotFoundError: [Errno 2] No such file or directory: 'errors.out'

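Taken literally, the message says the file does not exist, and indeed there was no errors.out in the current directory. Searching turned up no ready-made solution, so I asked on Stack Overflow. My suspicion was a version difference in the nltk.tbl.demo module: perhaps newer versions produce an errors.out-style file in some other way?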
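Before digging into NLTK, it can help to confirm where Python is actually looking for the file, because a relative name like "errors.out" is resolved against the current working directory. The helper below is a minimal sketch; `locate` is a hypothetical name of mine, not part of NLTK or the book.

```python
from pathlib import Path


def locate(name):
    """Resolve a relative file name against the current working directory.

    Returns a (exists, absolute_path) tuple. `locate` is a hypothetical
    helper written for this post, not an NLTK function.
    """
    path = Path.cwd() / name
    return path.exists(), path


found, path = locate("errors.out")
print("looked for:", path)
print("exists:", found)
```

If the printed path is not the directory you expected, the script is being run from somewhere else; if it is, the file really was never written.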

So I looked at the source of the nltk/tbl/demo module, and sure enough there is a similar function:

def demo_error_analysis():
    """
    Writes a file with context for each erroneous word after tagging testing data
    """
    postag(error_output="errors.txt")
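A function like this can also be found without opening the source file, by listing the functions a module defines together with the first line of each docstring. This is only a sketch: `list_functions` is a hypothetical helper, and it is run against the standard-library module `textwrap` so that it works without NLTK installed.

```python
import inspect
import textwrap  # stand-in module so the sketch runs without NLTK


def list_functions(module):
    """Map each function defined in `module` to the first line of its docstring."""
    summary = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if obj.__module__ != module.__name__:
            continue  # skip functions that were only imported into the module
        doc = inspect.getdoc(obj) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


for name, first_line in list_functions(textwrap).items():
    print(f"{name}: {first_line}")
```

With NLTK installed, passing the module imported by `from nltk.tbl import demo` instead of `textwrap` should list demo_error_analysis among the results.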

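According to its docstring, demo_error_analysis() does exactly what is needed: it writes a file like errors.out. The obvious plan, then, is to call demo_error_analysis() first and read the generated file afterwards: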

brill_demo.demo_error_analysis()

Things are rarely that simple, though. Running it fails with:

Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 331, in <module>
    brill_demo.demo_error_analysis()
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 124, in demo_error_analysis
    postag(error_output="errors.txt")
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 322, in postag
    u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n" #
TypeError: can't concat str to bytes

Following the path in the traceback leads to the code that raised the error:

    # writing error analysis to file
    if error_output is not None:
        with open(error_output, "w") as f:
            f.write("Errors for Brill Tagger %r\n\n" % serialize_output)
            f.write(
                u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"
            )
        print("Wrote tagger errors including context to {0}".format(error_output))

So the error is a type mismatch on the following line, where the joined output of error_list is encoded and then concatenated with "\n":

 u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"

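After consulting an article on this error, the cause became clear: encode() returns a bytes object, which cannot be concatenated directly with a str. Calling decode() on the result converts the bytes back to a str.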

The fix is therefore simple. Change the line to:

u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8").decode() + "\n" #add .decode()

(Your editor may ask you to confirm the change. Go ahead and make it; if you are worried, add a comment noting what you changed, as I did above.)
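The type mismatch can be reproduced on its own, without NLTK. In the sketch below, `rows` is a hypothetical stand-in for the strings returned by error_list(gold_data, taggedtest), and the three function names are mine. It shows the failing expression, the encode-then-decode fix used above, and a variant that never encodes at all, which produces the same string.

```python
# Hypothetical stand-in for the list of strings produced by error_list(...).
rows = ["Soon/NN->RB", "that/IN->WDT"]


def join_broken(lines):
    # bytes + str: raises TypeError on Python 3
    return "\n".join(lines).encode("utf-8") + "\n"


def join_roundtrip(lines):
    # The fix used in this post: decode the bytes back to str (UTF-8 by default)
    return "\n".join(lines).encode("utf-8").decode() + "\n"


def join_plain(lines):
    # Equivalent result without the encode/decode round trip
    return "\n".join(lines) + "\n"


try:
    join_broken(rows)
except TypeError as exc:
    print("TypeError:", exc)

print(join_roundtrip(rows) == join_plain(rows))
```

Because the file is opened in text mode ("w"), f.write() expects a str, so either of the two working variants is acceptable there.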

After this small change, run it again:

brill_demo.demo_error_analysis()

This time it works:

Loading tagged data from treebank... 
Read testing data (200 sents/5251 wds)
Read training data (800 sents/19933 wds)
Read baseline data (800 sents/19933 wds) [reused the training set]
Trained baseline tagger
    Accuracy on test set: 0.8366
Training tbl tagger...
TBL train (fast) (seqs: 800; tokens: 19933; tpls: 24; min score: 3; min acc: None)
Finding initial useful rules...
    Found 12799 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  23  23   0   0  | POS->VBZ if Pos:PRP@[-2,-1]
  18  19   1   0  | NN->VB if Pos:-NONE-@[-2] & Pos:TO@[-1]
  14  14   0   0  | VBP->VB if Pos:MD@[-2,-1]
  12  12   0   0  | VBP->VB if Pos:TO@[-1]
  11  11   0   0  | VBD->VBN if Pos:VBD@[-1]
  11  11   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:VBP@[2]
  10  11   1   0  | VBN->VBD if Pos:PRP@[-1]
   9  10   1   0  | VBD->VBN if Pos:VBZ@[-1]
   8   8   0   0  | NN->VB if Pos:MD@[-1]
   7   7   0   1  | VB->NN if Pos:DT@[-1]
   7   7   0   0  | VB->VBP if Pos:PRP@[-1]
   7   7   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:VBZ@[2]
   7   8   1   0  | IN->RB if Word:as@[2]
   6   6   0   0  | VBD->VBN if Pos:VBP@[-2,-1]
   6   6   0   1  | IN->WDT if Pos:-NONE-@[1] & Pos:VBD@[2]
   5   5   0   0  | POS->VBZ if Pos:-NONE-@[-1]
   5   5   0   0  | VB->VBP if Pos:NNS@[-1]
   5   5   0   0  | VBD->VBN if Word:be@[-2,-1]
   4   4   0   0  | POS->VBZ if Pos:``@[-2]
   4   4   0   0  | VBP->VB if Pos:VBD@[-2,-1]
   4   6   2   3  | RP->RB if Pos:CD@[1,2]
   4   4   0   0  | RB->JJ if Pos:DT@[-1] & Pos:NN@[1]
   4   4   0   0  | NN->VBP if Pos:NNS@[-2] & Pos:RB@[-1]
   4   5   1   0  | VBN->VBD if Pos:NNP@[-2] & Pos:NNP@[-1]
   4   4   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:MD@[2]
   4   8   4   0  | VBD->VBN if Word:*@[1]
   4   4   0   0  | JJS->RBS if Word:most@[0] & Word:the@[-1] & Pos:DT@[-1]
   3   3   0   0  | VBD->VBN if Pos:VBN@[-1]
   3   4   1   0  | VBN->VB if Pos:TO@[-1]
   3   4   1   1  | IN->RB if Pos:.@[1]
   3   3   0   0  | JJ->RB if Pos:VBD@[1]
   3   3   0   0  | PRP$->PRP if Pos:TO@[1]
   3   3   0   0  | NN->VBP if Pos:NNS@[-1] & Pos:DT@[1]
   3   3   0   0  | VBP->VB if Word:n't@[-2,-1]
Trained tbl tagger in 2.45 seconds
    Accuracy on test set: 0.8572
Tagging the test data
Wrote tagger errors including context to errors.txt

An errors.txt file has now appeared in the current directory.

The last step is to read the file and print it:

print(open("errors.txt").read())

The output looks like this (excerpt):

Errors for Brill Tagger None

             left context |    word/test->gold     | right context
--------------------------+------------------------+--------------------------
                          |      Soon/NN->RB       | ,/, T-shirts/NNS *ICH*-1/
n/IN the/DT corridors/NNS |      that/IN->WDT      | *T*-2/-NONE- carried/VBD 
NNS that/WDT *T*-2/-NONE- |    carried/VBN->VBD    | the/DT school/NN 's/POS f
D the/DT school/NN 's/POS |    familiar/NN->JJ     | red-and-white/JJ GHS/NNP 
ool/NN 's/POS familiar/JJ |  red-and-white/NN->JJ  | GHS/NNP logo/NN on/IN the
iliar/JJ red-and-white/JJ |      GHS/NN->NNP       | logo/NN on/IN the/DT fron
/NN ,/, the/DT shirts/NNS |     read/VBP->VBD      | ,/, ``/`` We/PRP have/VBP
,/, ``/`` We/PRP have/VBP |      all/DT->PDT       | the/DT answers/NNS ./. ''
JJ colleagues/NNS are/VBP |      angry/NN->JJ      | at/IN Mrs./NNP Yeargin/NN
n/NNP Rice/NNP ,/, who/WP |   *T*-100/NN->-NONE-   | had/VBD discovered/VBN th
VBD discovered/VBN the/DT |      crib/JJ->NN       | notes/NNS ./.
             ``/`` We/PRP |      work/NN->VBP      | damn/RB hard/RB at/IN wha
    ``/`` We/PRP work/VBP |      damn/NN->RB       | hard/RB at/IN what/WP we/
/IN what/WP we/PRP do/VBP |   *T*-101/NN->-NONE-   | for/IN damn/RB little/JJ 
VBP *T*-101/-NONE- for/IN |      damn/NN->RB       | little/JJ pay/NN ,/, and/
...
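The listing above came from a bare open(...).read(), which prints the whole file and never closes the handle explicitly. A slightly more careful variant, sketched below, uses a with block and returns only the first few lines. `read_head` is a hypothetical helper name. The encoding is deliberately left at the platform default, on the assumption that the file was written by the open(error_output, "w") call shown earlier, which also uses the default.

```python
import os
from itertools import islice


def read_head(path, limit=20):
    """Return up to `limit` lines from the start of a text file."""
    # No encoding argument: reading with the platform default matches a file
    # that was written with the platform default.
    with open(path) as f:
        return [line.rstrip("\n") for line in islice(f, limit)]


# Only print when the file has actually been generated.
if os.path.exists("errors.txt"):
    for line in read_head("errors.txt", 10):
        print(line)
```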

With that, the original problem is solved.

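I am writing this up at the tail end of Double Eleven. The problem cost me two or three hours, and I hope these notes help whoever runs into it next.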

Original article: https://blog.csdn.net/qq_36332660/article/details/109632925