Monday, May 26, 2014

NLP (Natural Language Processing) libraries for the Korean language

There are two main NLP libraries available for the Korean language. The first one below is closed source with a time-limited license, and the second is open source.

1. Kookmin NLP library
(http://nlp.kookmin.ac.kr/)

This library has a better dictionary and an automatic word-spacing feature that produces good output. It has a long development history and is considered one of the best NLP libraries for Korean. Its features include automatic word spacing, morphological analysis, noun extraction, and more.
However, the license is free only for non-commercial use.
The download page is at http://nlp.kookmin.ac.kr/HAM/kor/download.html
However, the latest version can be found here: http://cafe.daum.net/nlpk


2. Hannanum project
(http://kldp.net/projects/hannanum)

The good thing about this project is that it is fully open source.
It was developed by KAIST graduates in Java.
Its dictionaries and grammar rules can be modified and improved.
It is also available in R (you can use it by installing the KoNLP package).
I liked the project's simplicity and openness. However, its dictionary lacks many common Korean words (I had to add 세월호 manually because it was missing), it has no automatic word spacing, and some grammar rules are missing.

Conclusion:


Kookmin NLP library

Pros: better automatic word spacing, rich dictionary, better recognition of nouns, verbs, adjectives, etc.
Cons: closed source, license is free only for non-commercial use, command-line based

Hannanum project

Pros: open-source, Java library, free, simplicity, available in R
Cons: small dictionary, no automatic word spacing, missing grammar rules (e.g. 학생들 was not recognized, even though 학생 was in the dictionary)
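
To make the 학생/학생들 problem concrete, here is a minimal Python sketch of a purely dictionary-based noun lookup. It is a toy model that assumes analysis boils down to exact dictionary matches plus one optional rule for the plural suffix 들. The class ToyNounAnalyzer and its methods are hypothetical; this is not the API of Hannanum or the Kookmin library, and real morphological analyzers are far more involved.

```python
# Hypothetical toy analyzer (NOT the Hannanum or Kookmin API):
# exact dictionary lookup, a user dictionary, and an optional
# rule that strips the Korean plural suffix -들.

PLURAL_SUFFIX = "들"  # plural marker, e.g. 학생 + 들 -> 학생들


class ToyNounAnalyzer:
    """Looks up tokens in a noun dictionary, optionally stripping -들."""

    def __init__(self, nouns, use_plural_rule=True):
        self.nouns = set(nouns)
        self.use_plural_rule = use_plural_rule

    def add_user_word(self, word):
        # Same idea as adding a missing word (e.g. 세월호) by hand.
        self.nouns.add(word)

    def analyze(self, token):
        """Return (stem, suffix) if the token is a known noun, else None."""
        if token in self.nouns:
            return (token, "")
        stem = token[:-len(PLURAL_SUFFIX)]
        if (self.use_plural_rule
                and token.endswith(PLURAL_SUFFIX)
                and stem in self.nouns):
            return (stem, PLURAL_SUFFIX)
        return None


if __name__ == "__main__":
    no_rule = ToyNounAnalyzer(["학생"], use_plural_rule=False)
    with_rule = ToyNounAnalyzer(["학생"])
    print(no_rule.analyze("학생들"))    # None: only exact matches succeed
    print(with_rule.analyze("학생들"))  # ('학생', '들')

    print(with_rule.analyze("세월호"))  # None: word missing from dictionary
    with_rule.add_user_word("세월호")
    print(with_rule.analyze("세월호"))  # ('세월호', '')
```

Without the suffix rule, having 학생 in the dictionary does not help with 학생들, which matches the behavior described above; a missing word like 세월호 fails until it is added to the user dictionary.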
