Appendix to the MMSEG Chinese word segmentation tool
http://sites.google.com/site/sekewei/home/download/dictionary.zip
---
readme.txt
File list:

getdic.c             source to retrieve the dictionary in use
getdic.exe           binary to retrieve the dictionary in use
    build: gcc -o getdic -I../src getdic.c ../src/lexicon.c

mmsegmnt.c           source to segment a chunk in simple mode
getchunk.c           source to get a chunk for segmenting
mmseg.exe            binary to segment words
    build: gcc -o mmseg -I. mmseg.c search.c segment.c b5char.c iofiles.c lexicon.c message.c ..\dictionary\getchunk.c ..\dictionary\mmsegmnt.c
    run:   mmseg inp.txt out.txt ../lexicon/ complex verbose
    usage: mmseg in out lexicon [complex|simple*] [verbose|standard|quiet*]
    Options marked * are the defaults (simple and quiet).
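The usage line above amounts to three required arguments followed by two optional keywords. As a rough illustration only, the sketch below shows one way such a command line could be parsed with the stated defaults. The names `mmseg_opts`, `seg_mode`, `log_level` and `parse_mmseg_args` are hypothetical and are not taken from mmseg.c, and accepting the keywords in either order is an assumption.

```c
#include <string.h>

/* Hypothetical names for illustration; not taken from mmseg.c. */
enum seg_mode  { MODE_SIMPLE, MODE_COMPLEX };
enum log_level { LOG_QUIET, LOG_STANDARD, LOG_VERBOSE };

struct mmseg_opts {
    const char *in_path;      /* text to segment */
    const char *out_path;     /* segmented output */
    const char *lexicon_dir;  /* directory holding the lexicon tables */
    enum seg_mode mode;
    enum log_level level;
};

/* Fill *o from argv. Returns 0 on success, -1 on a usage error. */
int parse_mmseg_args(int argc, char **argv, struct mmseg_opts *o)
{
    int i;

    /* Program name + 3 required arguments + at most 2 keywords. */
    if (argc < 4 || argc > 6)
        return -1;

    o->in_path = argv[1];
    o->out_path = argv[2];
    o->lexicon_dir = argv[3];

    /* Defaults stated in the readme: simple and quiet. */
    o->mode = MODE_SIMPLE;
    o->level = LOG_QUIET;

    /* Assumption: the optional keywords may appear in either order. */
    for (i = 4; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "complex") == 0)
            o->mode = MODE_COMPLEX;
        else if (strcmp(a, "simple") == 0)
            o->mode = MODE_SIMPLE;
        else if (strcmp(a, "verbose") == 0)
            o->level = LOG_VERBOSE;
        else if (strcmp(a, "standard") == 0)
            o->level = LOG_STANDARD;
        else if (strcmp(a, "quiet") == 0)
            o->level = LOG_QUIET;
        else
            return -1;  /* unknown keyword */
    }
    return 0;
}
```

A caller would pass `argc` and `argv` from `main` and print the usage line when the function returns -1.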
mklex1.c             source to build dictionary tables CHR?.TXT
mklex1.exe           binary to build dictionary tables CHR?.TXT
    build: gcc -o mklex1 -I. ..\dictionary\mklex1.c
    run:   mklex1 TSAIWORD.TXT

mklex2.c             source to build dictionary tables CHR?.LEX
mklex2.exe           binary to build dictionary tables CHR?.LEX
    build: gcc -o mklex2 -I. ..\dictionary\mklex2.c
    run:   mklex2 .

mklex3.c             source to build dictionary tables CHR?.INX
mklex3.exe           binary to build dictionary tables CHR?.INX
    build: gcc -o mklex3 -I. ..\dictionary\mklex3.c
    run:   mklex3 .

mmsegwords_raw.txt   retrieved dictionary in use (137546 items)
mmsegwords.txt       revised dictionary (137555 items)
mmsegwords_sort.txt  revised dictionary, sorted (137555 items)
TSAIWORD.TXT         separate sorted dictionary by Tsai (137425 items)
diff.txt             difference between mmsegwords_sort.txt and TSAIWORD.TXT
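Because both word lists are sorted, a difference file like the one listed above can be produced in a single merge-style pass. The sketch below is only an illustration of that idea, not the procedure actually used to create diff.txt. The function name `diff_sorted` and the `<` / `>` output markers are hypothetical, and it assumes both lists are sorted in plain `strcmp()` byte order and already loaded into memory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Walk two word lists sorted in strcmp() byte order (an assumption)
 * and print the entries that appear in only one of them:
 *   "< word"  word is only in list a
 *   "> word"  word is only in list b
 * Returns the number of entries printed.
 */
size_t diff_sorted(const char **a, size_t na,
                   const char **b, size_t nb, FILE *out)
{
    size_t i = 0, j = 0, ndiff = 0;

    while (i < na || j < nb) {
        int c;

        if (i == na)
            c = 1;                      /* a exhausted: rest of b is unique */
        else if (j == nb)
            c = -1;                     /* b exhausted: rest of a is unique */
        else
            c = strcmp(a[i], b[j]);

        if (c == 0) {                   /* same word in both lists */
            i++;
            j++;
        } else if (c < 0) {             /* a[i] sorts first: only in a */
            fprintf(out, "< %s\n", a[i++]);
            ndiff++;
        } else {                        /* b[j] sorts first: only in b */
            fprintf(out, "> %s\n", b[j++]);
            ndiff++;
        }
    }
    return ndiff;
}
```

Each list is read once, so the cost grows linearly with the combined number of entries. If the files are sorted in some other collation order, the comparison function would need to match it.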
Friday, July 10, 2009
dictionary appendix to MMSEG