Friday, July 10, 2009

dictionary appendix to MMSEG


appendix to the mmseg Chinese word segmentation tool
http://sites.google.com/site/sekewei/home/download/dictionary.zip
---
readme.txt

filelist:
getdic.c source to retrieve dictionary in use
getdic.exe binary to retrieve dictionary in use
gcc -o getdic -I../src getdic.c ../src/lexicon.c

mmsegmnt.c source to segment a chunk in simple mode
getchunk.c source to get a chunk for segmenting
mmseg.exe binary to segment words
gcc -o mmseg -I. mmseg.c search.c segment.c b5char.c iofiles.c lexicon.c message.c ..\dictionary\getchunk.c ..\dictionary\mmsegmnt.c
mmseg inp.txt out.txt ../lexicon/ complex verbose
usage: mmseg in out lexicon [complex|simple*] [verbose|standard|quiet*]
default options are simple and quiet
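
The usage line above can be read as three required paths followed by two optional keywords, each with a default marked by `*`. The following is only a rough sketch of how such a command line might be parsed; it is not taken from mmseg.c. The names `mmseg_opts`, `seg_mode`, `verbosity` and `parse_mmseg_args` are hypothetical, and accepting the two keywords in either order is an assumption (the usage line shows the mode first).

```c
#include <string.h>

/* Hypothetical option record; names are illustrative, not from mmseg.c. */
enum seg_mode  { MODE_SIMPLE, MODE_COMPLEX };
enum verbosity { V_QUIET, V_STANDARD, V_VERBOSE };

struct mmseg_opts {
    const char *in_path;      /* input text file */
    const char *out_path;     /* output text file */
    const char *lexicon_dir;  /* directory holding the lexicon tables */
    enum seg_mode mode;
    enum verbosity verbosity;
};

/* Returns 0 on success, -1 if the arguments do not match the usage line. */
int parse_mmseg_args(int argc, char **argv, struct mmseg_opts *o)
{
    int i;

    /* program name + 3 required paths + at most 2 optional keywords */
    if (argc < 4 || argc > 6)
        return -1;

    o->in_path     = argv[1];
    o->out_path    = argv[2];
    o->lexicon_dir = argv[3];
    o->mode        = MODE_SIMPLE;  /* documented default */
    o->verbosity   = V_QUIET;      /* documented default */

    /* Assumption: the optional keywords may appear in either order. */
    for (i = 4; i < argc; i++) {
        if (strcmp(argv[i], "complex") == 0)
            o->mode = MODE_COMPLEX;
        else if (strcmp(argv[i], "simple") == 0)
            o->mode = MODE_SIMPLE;
        else if (strcmp(argv[i], "verbose") == 0)
            o->verbosity = V_VERBOSE;
        else if (strcmp(argv[i], "standard") == 0)
            o->verbosity = V_STANDARD;
        else if (strcmp(argv[i], "quiet") == 0)
            o->verbosity = V_QUIET;
        else
            return -1;             /* unknown keyword */
    }
    return 0;
}
```

With the example command shown earlier (inp.txt, out.txt, ../lexicon/, complex, verbose), this sketch would select complex mode and verbose output; with only the three paths it falls back to simple and quiet.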

mklex1.c source to build dictionary tables CHR?.TXT
mklex1.exe binary to build dictionary tables CHR?.TXT
gcc -o mklex1 -I. ..\dictionary\mklex1.c
mklex1 TSAIWORD.TXT

mklex2.c source to build dictionary tables CHR?.LEX
mklex2.exe binary to build dictionary tables CHR?.LEX
gcc -o mklex2 -I. ..\dictionary\mklex2.c
mklex2 .

mklex3.c source to build dictionary tables CHR?.INX
mklex3.exe binary to build dictionary tables CHR?.INX
gcc -o mklex3 -I. ..\dictionary\mklex3.c
mklex3 .

mmsegwords_raw.txt retrieved dictionary in use (137546 items)
mmsegwords.txt revised dictionary (137555 items)
mmsegwords_sort.txt revised dictionary sorted (137555 items)
TSAIWORD.TXT separate sorted dictionary by Tsai (137425 items)
diff.txt difference between mmsegwords_sort.txt and TSAIWORD.TXT
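
Because both word lists are sorted, a difference like the one in diff.txt can in principle be produced by one merge-style pass over the two lists. The sketch below illustrates that idea only; it is not the tool that generated diff.txt. The function name `diff_sorted` and the "< " / "> " output prefixes are hypothetical, and it assumes both lists are sorted in `strcmp()` byte order, which may not match the ordering actually used in the files.

```c
#include <stdio.h>
#include <string.h>

/*
 * Walk two string arrays that are both sorted in strcmp() order and
 * report the entries that occur in only one of them.
 * Entries only in a are written as "< word", entries only in b as
 * "> word". Pass out == NULL to count without printing.
 * Returns the number of entries that differ.
 */
size_t diff_sorted(const char **a, size_t na,
                   const char **b, size_t nb, FILE *out)
{
    size_t i = 0, j = 0, ndiff = 0;

    while (i < na && j < nb) {
        int c = strcmp(a[i], b[j]);
        if (c == 0) {            /* same word in both lists */
            i++;
            j++;
        } else if (c < 0) {      /* a[i] is missing from b */
            if (out) fprintf(out, "< %s\n", a[i]);
            i++;
            ndiff++;
        } else {                 /* b[j] is missing from a */
            if (out) fprintf(out, "> %s\n", b[j]);
            j++;
            ndiff++;
        }
    }
    /* Whatever remains in either list has no counterpart. */
    for (; i < na; i++, ndiff++)
        if (out) fprintf(out, "< %s\n", a[i]);
    for (; j < nb; j++, ndiff++)
        if (out) fprintf(out, "> %s\n", b[j]);

    return ndiff;
}
```

Each list is read once, so the cost is linear in the total number of entries, which is comfortable for lists of about 137,000 items each.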
