Python provides multiple libraries for handling text data. Some of the prominent ones are:
1) Natural Language Toolkit (NLTK)
One of the most commonly used libraries for tokenization, lemmatization, stemming, parsing, chunking, and POS tagging.
2) Gensim
This library is mainly used for topic modelling, document indexing, and similarity retrieval over large corpora. Algorithms in Gensim are memory-independent with respect to corpus size, so they can process input larger than the available RAM.
3) spaCy
An open-source NLP library designed mainly for production use on large volumes of data. It supports tokenization for over 49 languages.
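
To make the tokenization and stemming mentioned above concrete, here is a minimal sketch using NLTK only. It assumes NLTK is installed (`pip install nltk`); the sample sentence is made up for illustration. `wordpunct_tokenize` and `PorterStemmer` were chosen because they need no extra corpus download, unlike NLTK's lemmatizer or POS tagger.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Illustrative sample sentence (not from any real corpus).
text = "The cats are running."

# Regex-based tokenizer: splits into runs of word characters and punctuation.
tokens = wordpunct_tokenize(text)

# Porter stemming strips common suffixes to reduce words to a stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(f"{token} -> {stem}")
```

With this input, "cats" is reduced to "cat" and "running" to "run". Stems are not always dictionary words; lemmatization is the usual choice when that matters, at the cost of needing additional NLTK data.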