How to process every word in a file with Python 3

程序员文章站 2023-04-08 09:14:13


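This article shows, with a worked example, how to process every word in a file in Python 3. It is shared here for reference. The implementation is as follows: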

'''
Created on Dec 21, 2012
Process every word in a file
@author: liury_lab
'''
import re

# Built-in open() decodes UTF-8 and translates newlines in text mode,
# so neither codecs.open() nor the obsolete 'U' mode flag is needed
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in line.split():   # a word is any run of non-whitespace
            print(word, end='|')

# If the definition of a word changes, use a regular expression instead,
# e.g. a word is a sequence of letters, digits, underscores, hyphens or apostrophes
print()
print('************************************************************************')
re_word = re.compile(r"[\w'-]+")    # raw string keeps \w as a regex escape
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for mo in re_word.finditer(line):
            print(mo.group(0), end='|')

# Wrap the loop in a generator that reads the file passed in as file_path
def words_of_file(file_path, line_to_words=str.split):
    with open(file_path, encoding='utf-8') as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

print()
print('************************************************************************')
for word in words_of_file('d:/text.txt'):
    print(word, end='|')

# Same generator, with words defined by the pattern passed in as repattern
def words_by_re(file_path, repattern=r"[\w'-]+"):
    re_word = re.compile(repattern)

    def line_to_words(line):
        for mo in re_word.finditer(line):
            yield mo.group(0)  # the original book used return here, which gave wrong results, so it was changed to yield
    return words_of_file(file_path, line_to_words)

print()
print('************************************************************************')
for word in words_by_re('d:/text.txt'):
    print(word, end='|')
print()
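To compare the two word definitions without a file at d:/text.txt, here is a minimal sketch that reworks the generator to read from any open text stream and feeds it an in-memory `io.StringIO`. The names `words_of_stream`, `regex_splitter`, `first_match_only` and `SAMPLE` are hypothetical, introduced only for this illustration. The sketch also shows why `yield` matters in the regex helper, assuming the `return` version simply returned inside the loop: only the first match of each line comes back, as a plain string, and the caller's loop then walks that string character by character.

```python
import io
import re
from typing import Callable, Iterable, Iterator, TextIO

# Same word definition as the article: \w characters, hyphens and apostrophes
WORD_PATTERN = r"[\w'-]+"

# Hypothetical sample text standing in for d:/text.txt
SAMPLE = "Hello, world!\nIt's a well-known test.\n"


def words_of_stream(stream: TextIO,
                    line_to_words: Callable[[str], Iterable[str]] = str.split) -> Iterator[str]:
    """Yield every word of an already-open text stream, one line at a time."""
    for line in stream:
        yield from line_to_words(line)


def regex_splitter(pattern: str = WORD_PATTERN) -> Callable[[str], Iterator[str]]:
    """Return a line_to_words function that yields each regex match in a line."""
    compiled = re.compile(pattern)

    def line_to_words(line: str) -> Iterator[str]:
        for mo in compiled.finditer(line):
            yield mo.group(0)

    return line_to_words


def first_match_only(line: str) -> str:
    """Assumed 'return' variant: hands back only the first match as a plain string."""
    for mo in re.finditer(WORD_PATTERN, line):
        return mo.group(0)
    return ""  # no match on this line: an empty string yields no words


if __name__ == "__main__":
    print("|".join(words_of_stream(io.StringIO(SAMPLE))))
    # Hello,|world!|It's|a|well-known|test.
    print("|".join(words_of_stream(io.StringIO(SAMPLE), regex_splitter())))
    # Hello|world|It's|a|well-known|test
    print("|".join(words_of_stream(io.StringIO(SAMPLE), first_match_only)))
    # H|e|l|l|o|I|t|'|s
```

Taking a stream instead of a path keeps the word-splitting logic testable; a path-based wrapper such as the article's `words_of_file` can open the file with `open(path, encoding='utf-8')` inside a `with` block and `yield from words_of_stream(f, line_to_words)`. Note that `str.split` leaves punctuation attached ("Hello,"), while the regex strips it.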

I hope this article is helpful for your Python programming.
