Processing Text with ANTLR

程序员文章站 2022-07-08 16:42:52

...

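Reference:

Processing Text with Antlr
https://www.ibm.com/developerworks/cn/java/j-lo-antlrtext/index.html
That article is very well written, but unfortunately it dates from 2011 and targets an ANTLR version that differs considerably from current ones. Its code does not compile, and even after the compile errors are fixed, the tests do not produce correct results. What follows is a rewrite for ANTLR 4.2.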

The new project is built with Maven and Ant and needs the following files:

pom.xml
build.xml
SqlExtrator.g4 (grammar file)
SqlExtrator.clj (test input file)
Test.java (test code)

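How to test:

First run the Ant compile task. It generates the lexer and parser code from the grammar and compiles it.

Then run the Ant test task. It parses SqlExtrator.clj with TestRig and displays the parse tree in a GUI window.

Alternatively, use Test.java to call the lexer and parser programmatically.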

[Screenshot: running the Ant task]


pom.xml


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.xxx.lang</groupId>
	<artifactId>fieldTypeUpdate</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>fieldTypeUpdate</name>
	<url>http://maven.apache.org</url>


	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>3.8.1</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.antlr</groupId>
			<artifactId>antlr4</artifactId>
			<version>4.2</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.codehaus.mojo</groupId>
				<artifactId>build-helper-maven-plugin</artifactId>
				<version>1.8</version>
				<executions>
					<execution>
						<id>add-source</id>
						<phase>generate-sources</phase>
						<goals>
							<goal>add-source</goal>
						</goals>
						<configuration>
							<sources>
								<source>src/generated/java</source>
							</sources>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>

</project>

build.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <project basedir="." default="test" name="mylang">
    <property environment="env"/>
    <property name="debuglevel" value="source,lines,vars"/>
    <property name="target" value="1.8"/>
    <property name="source" value="1.8"/>
    <property name="language" value="sqlExtrator"/>
    <path id="mylang.classpath">
    	<pathelement location="lib/antlr-2.7.7.jar"/>
		<pathelement location="lib/antlr-runtime-3.5.jar"/>
		<pathelement location="lib/antlr4-4.2.jar"/>
		<pathelement location="lib/antlr4-annotations-4.2.jar"/>
		<pathelement location="lib/antlr4-runtime-4.2.jar"/>
		<pathelement location="lib/junit-3.8.1.jar"/>
		<pathelement location="lib/org.abego.treelayout.core-1.0.1.jar"/>
		<pathelement location="lib/ST4-4.0.7.jar"/>
		<pathelement location="lib/stringtemplate-3.2.1.jar"/>
    </path>
    
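    <!-- The test target runs TestRig from this ANTLR 4.7.1 jar, where the class is org.antlr.v4.gui.TestRig; ANTLR 4.2 ships it as org.antlr.v4.runtime.misc.TestRig -->
    <path id="antlr.classpath">
        <pathelement location="antlr-4.7.1-complete.jar"/>
    </path>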
    
    <path id="compile.path">
    	 <pathelement location="target/classes"/>
    </path>
    

    
    <target name="clean">
        <delete dir="target"></delete>
    	<delete dir="src/main/java/com/xxx/lang/mylang/${language}"></delete>
    </target>
    
    <target depends="clean" name="gen">
        <echo message="generate java from g4 file"/>
       <java classname="org.antlr.v4.Tool" fork="yes" failonerror="true">
    				<classpath refid="mylang.classpath"/>
    				<arg value="src/main/resources/SqlExtrator.g4"/>
    				<arg line="-package "/>
    				<arg value="com.xxx.lang.mylang.${language}"/>
    				<arg line="-o "/>
    				<arg value="src/main/java/com/xxx/lang/mylang/${language}/"/>
       				<arg value="-visitor"/>
       				<arg value="-no-listener"/>
       				<arg value="-encoding"/>
       				<arg value="UTF-8"/>
    			</java>
    </target>
    
    <target depends="gen" name="compile">
        <echo message="compile generate java file"/>
         <mkdir dir="target/classes"/>
        <javac debug="true" debuglevel="${debuglevel}" destdir="target/classes" includeantruntime="false" source="${source}" target="${target}">
            <src path="src/main/java"/>
        	<compilerarg line="-encoding UTF-8 "/>
            <classpath refid="mylang.classpath"/>
        </javac>
    </target>

    	
    
    <target name="test"  description="Run the main class" >
    			<java classname="org.antlr.v4.gui.TestRig" fork="yes" failonerror="true">
    				<classpath refid="antlr.classpath"/>
    				<classpath refid="compile.path"/>
    				<sysproperty key="file.encoding" value="UTF-8"/>
    				<arg value="com.xxx.lang.mylang.${language}.SqlExtrator"></arg>
    				<arg value="sql"></arg>
    				<arg value="-gui"></arg>
    				<arg value="src/test/java/SqlExtrator.clj"></arg>
    			</java>
    </target>
    	

</project>

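SqlExtrator.g4 grammar file (first version). This grammar recognizes only the characters covered by its lexer rules. Any other character causes an error.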

	

grammar SqlExtrator; 


WS : (' ' |'\t' |'\r' |'\n' )+  ; 
 
INT: '0'..'9' + ;   
  
ID : ('a'..'z' |'A'..'Z' |'_' ) ('a'..'z' |'A'..'Z' |'_' |'0'..'9' )*;

EOL: ('\n' | '\r' | '\r\n')*;

SUCCESS:'DB20000I  The SQL command completed successfully.'EOL  ; 

SqlFrg :'INSERT INTO SYSA.' ID '(' ID ',' ID ')' WS 'VALUES' '(\'' ID '\',\'' INT '\')'EOL ;
 

txt:mysql=SqlFrg {System.out.println($mysql.text);} SUCCESS;

sql:(txt)+;

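The second version of the grammar makes several changes. It turns WS, ID, INT and EOL into fragment rules, drops EOL from SqlFrg and SUCCESS, and merges the parser rules into one. Most importantly, it adds this line:
FILTER: .? -> skip;
FILTER is a catch-all rule: it matches one arbitrary character and discards it (-> skip).

This works because an ANTLR lexer always takes the longest possible match, and ties go to the rule defined first. As a result, SqlFrg and SUCCESS win wherever they apply, and FILTER only consumes characters at positions where nothing else matches.

Turning ID, INT and WS into fragments matters as well. Otherwise a stray word such as document would still become an ID token, and the parser rule cannot accept that token.

Note that .? is not a non-greedy operator. It means "zero or one arbitrary character" and is greedy. The non-greedy forms put a ? after a quantifier, as in .*? or .+?. A plain . behaves the same way here. Writing .? instead may cause ANTLR to warn that a non-fragment lexer rule can match the empty string.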

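Reference:

Wildcard Operator and Nongreedy Subrules

Greedy and non-greedy modes in regular expressions

1. What greedy and non-greedy matching mean

For example: String str = "abcaxc"; Pattern p = Pattern.compile("ab.*c");

Greedy matching: by default, a regular expression matches as many characters as possible. This is called greedy matching. Matching str against the pattern p above gives abcaxc (ab.*c).

Non-greedy matching: the match stops as soon as it succeeds, consuming as few characters as possible. Matching str against the non-greedy pattern ab.*?c gives abc.

2. How to choose between the two modes

Greedy is the default. Putting a question mark ? directly after a quantifier makes it non-greedy.

Quantifiers:
{m,n}: from m to n times
*: any number of times (zero or more)
+: one or more times
?: zero or one time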

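The greedy and non-greedy behaviour described above can be checked with java.util.regex. Below is a small sketch; the class GreedyDemo and the helper firstMatch are names made up for this example. The last two calls also show that a lone ? after . is the greedy optional, and ?? is its non-greedy counterpart. This is the distinction that matters for the FILTER rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    /** Returns the first substring of input matched by regex, or null if there is none. */
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "abcaxc";
        // Greedy: .* first takes the rest of the string, then backtracks to the last 'c'
        System.out.println(firstMatch("ab.*c", str));
        // Non-greedy: .*? grows one character at a time and stops at the first 'c'
        System.out.println(firstMatch("ab.*?c", str));
        // .? is a greedy optional: it takes one character when it can
        System.out.println(firstMatch("ab.?", str));
        // .?? is the non-greedy optional: it prefers to take nothing
        System.out.println(firstMatch("ab.??", str));
    }
}
```

Compiled and run on its own, it prints abcaxc, abc, abc and ab, one per line.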

	
grammar SqlExtrator; 



SqlFrg :'INSERT INTO SYSA.' ID '(' ID ',' ID ')' WS 'VALUES' '(\'' ID '\',\'' INT '\')' ;
 


fragment WS : (' ' |'\t' |'\r' |'\n' )+  ; 
 
fragment ID: ('a'..'z' |'A'..'Z' |'_' ) ('a'..'z' |'A'..'Z' |'_' |'0'..'9' )*; 

fragment INT: '0'..'9' + ;   

fragment EOL: '\n' | '\r' | '\r\n';

 SUCCESS:'DB20000I  The SQL command completed successfully.' ;



sql : (SqlFrg SUCCESS {System.out.println($SqlFrg.text);})+ ;

FILTER: .? -> skip;
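To make the FILTER idea concrete without the ANTLR toolchain, here is a minimal plain-Java sketch that imitates what the second grammar does. At each position it tries regex translations of SqlFrg and SUCCESS and keeps the longer match; otherwise it skips one character, as FILTER does. It then keeps each SqlFrg that is directly followed by SUCCESS, as the start rule does. The class FilterSketch, its method extract and the regexes are hypothetical illustrations written for this example, not ANTLR-generated code. The sketch also has none of ANTLR's error recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterSketch {

    // Hypothetical regex stand-ins for the grammar's ID, INT and WS fragments
    private static final String ID = "[A-Za-z_][A-Za-z_0-9]*";
    private static final String INT = "[0-9]+";
    private static final String WS = "[ \\t\\r\\n]+";

    // SqlFrg : 'INSERT INTO SYSA.' ID '(' ID ',' ID ')' WS 'VALUES' '(\'' ID '\',\'' INT '\')' ;
    private static final Pattern SQL_FRG = Pattern.compile(
            "INSERT INTO SYSA\\." + ID + "\\(" + ID + "," + ID + "\\)" + WS
            + "VALUES\\('" + ID + "','" + INT + "'\\)");

    // SUCCESS : 'DB20000I  The SQL command completed successfully.' ;
    private static final Pattern SUCCESS = Pattern.compile(
            Pattern.quote("DB20000I  The SQL command completed successfully."));

    /** Returns the SqlFrg texts that are immediately followed by a SUCCESS token. */
    public static List<String> extract(String text) {
        List<String> kinds = new ArrayList<>();  // token types, in order
        List<String> texts = new ArrayList<>();  // token texts, in order
        int pos = 0;
        while (pos < text.length()) {
            int end = match(SQL_FRG, text, pos);
            String kind = "SqlFrg";
            int successEnd = match(SUCCESS, text, pos);
            if (successEnd > end) {  // longest match wins, as in an ANTLR lexer
                end = successEnd;
                kind = "SUCCESS";
            }
            if (end < 0) {           // no rule matched: skip one character, like FILTER
                pos++;
                continue;
            }
            kinds.add(kind);
            texts.add(text.substring(pos, end));
            pos = end;
        }
        // Parser step: keep each SqlFrg that is directly followed by SUCCESS
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < kinds.size(); i++) {
            if (kinds.get(i).equals("SqlFrg") && kinds.get(i + 1).equals("SUCCESS")) {
                result.add(texts.get(i));
            }
        }
        return result;
    }

    /** End offset of a match of p starting exactly at pos, or -1 if there is none. */
    private static int match(Pattern p, String text, int pos) {
        Matcher m = p.matcher(text);
        m.region(pos, text.length());
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String sample = "INSERT INTO SYSA.IF_EMPUSRRLA(USRNUM,EMPNUM) VALUES('U037508','275159') \n"
                + "DB20000I  The SQL command completed successfully. \n\n"
                + "document.write(v+' test is '+result+'</br>');\n\n"
                + "INSERT INTO SYSA.IF_USRSTNRLA(USRNUM,STNNUM) VALUES('U037710','00026') \n"
                + "DB20000I  The SQL command completed successfully.\n";
        for (String sql : extract(sample)) {
            System.out.println(sql);
        }
    }
}
```

The sample input in main is copied from SqlExtrator.clj. On that input the sketch prints the same two INSERT statements as the ANTLR version, and the document.write line is skipped character by character.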

SqlExtrator.clj test file


INSERT INTO SYSA.IF_EMPUSRRLA(USRNUM,EMPNUM) VALUES('U037508','275159') 
DB20000I  The SQL command completed successfully. 

document.write(v+' test is '+result+'</br>');// this line causes errors with the first version of the grammar

INSERT INTO SYSA.IF_USRSTNRLA(USRNUM,STNNUM) VALUES('U037710','00026') 
DB20000I  The SQL command completed successfully.

Test.java test code

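import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

// Lexer and parser generated by the Ant "gen" target (-package com.xxx.lang.mylang.${language})
import com.xxx.lang.mylang.sqlExtrator.SqlExtratorLexer;
import com.xxx.lang.mylang.sqlExtrator.SqlExtratorParser;

public class Test {

	public static void main(String[] args) {
		// Input file: the first command-line argument, or the original hard-coded path
		String filename = args.length > 0 ? args[0]
				: "D:\\workplace\\fieldTypeUpdate\\src\\test\\java\\SqlExtrator.clj";

		// try-with-resources closes the file; read it as UTF-8 to match the Ant build
		try (Reader in = new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8)) {
			ANTLRInputStream input = new ANTLRInputStream(in);        // character stream
			SqlExtratorLexer lexer = new SqlExtratorLexer(input);      // characters -> tokens
			CommonTokenStream tokens = new CommonTokenStream(lexer);
			SqlExtratorParser parser = new SqlExtratorParser(tokens);  // tokens -> parse
			// Start rule "sql"; its embedded action prints every INSERT statement it matches
			parser.sql();
			System.out.println("done!");
		} catch (IOException e) {
			// Also covers FileNotFoundException
			e.printStackTrace();
		}
	}
}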

Console output of the test run:

INSERT INTO SYSA.IF_EMPUSRRLA(USRNUM,EMPNUM) VALUES('U037508','275159')

INSERT INTO SYSA.IF_USRSTNRLA(USRNUM,STNNUM) VALUES('U037710','00026')

done!
