spaCy.io

Blog

Natural Language Processing Software Badly Needs Some Deprecation Notices

by on

Imagine: you try to use Google Translate, but it asks you to first select which model you want. The new, awesome deep-learning model is there, but so are lots of others. You pick one that sounds fancy, but it turns out it's a 20-year old experimental model trained on a corpus of oven manuals. You are not interested in over manuals.

Of course, this is not how Google Translate operates. They make sure the model you use is good. This is what spaCy does, too. But most natural language understanding libraries, it's just not anybody's job to delete obsolete models. There's also a real reluctance to editorialize. Some advantage can be found for every model. Like, is it really fair to call that oven-specific model obsolete? In some ways we still have a lot to learn from its principled approach. And what if someone needs to translate an oven manual?

Have a look through the GATE software. There's a lot there, developed over 12 years and many person-hours. But there's approximately zero curation. The philosophy is just to provide things. It's up to you to decide what to use.

This is bad. It's bad to provide an implementation of MiniPar, and have it just...sit there, with no hint that it's 20 years old and should not be used. The RASP parser, too. Why are these provided? Worse, why is there no warning? Unless you want to investigate the history of the field, there's no reason to execute these programs in 2015.

Check out how Dekang Lin, the author of Minipar, presents the software – with reference to a benchmark on a Pentium II. This is the right way to archive the program. In this form its status is clear.

Various people have asked me why I decided to make a new Python NLP library, spaCy, instead of supporting the NLTK project. There are many things I dislike about the NLTK code-base, but the lack of curation is really my key complaint: the project simply doesn't throw anything away, and it refuses to call any technique or implementation good or bad.

In March NLTK announced the inclusion of a more up-to-date dependency parsing algorithm, based on the linear-time algorithm everyone is now using. There was some excitement about this, as this type of parser really should get much better accuracy than the other algorithms NLTK includes. But can you tell which of these parsers is the new one?

The best parser there – the new one – is called "transition parser". But it's still not actually good. Unfortunately, the NLTK implementation is based on Nivre's original 2003 paper, instead of using the recent research; and they use external, general-purpose machine learning libraries, instead of a simple custom implementation that would perform much better. Together these limitations mean the performance of the model is terrible, relative to the current state-of-the-art.

I happened to visit the NLTK issue tracker while they were discussing the transition-based parser, so I linked them to my post explaining how to implement this parser in 500 lines of Python. I got a "thanks but no thanks", and the issue was abruptly closed. Another researcher's offer from 2012 to implement this type of model also went unanswered.

An enormous amount of work has gone into, and is still going into, making NLTK an easily accessible way for computer science students to learn a little bit about linguistics, or for linguistics students to learn a little bit about computer science. I respect that work.

But nowhere does it say that if you want to really build something, or do up-to-date research, NLTK isn't for you. NLTK claims it can serve that use-case. But it can't. The implication is that if you use the models provided in NLTK, e.g. its chunker, tagger, dependency parser etc, these will be roughly equivalent to what you'll get elsewhere. But they're not. The gulf in quality is enormous. NLTK does not even know how its POS tagger was trained. The model is just this .pickle file that's been passed around for 5 years, its origins lost to time. This is not okay.

I think open source software should be very careful to make its limitations clear. It's a disservice to provide something that's much less useful than you imply. It's like offering your friend a lift and then not showing up. It's totally fine to not do something – so long as you never suggested you were going to do it. There are ways to do worse than nothing.