Entity extraction, or named entity recognition (NER), is a key technology in Natural Language Processing, especially for privacy and data protection. It consists of detecting proper names, place names, companies, brands, and similar entities, as well as structured information such as addresses, telephone numbers, and URLs. For example, if you need to comply with the General Data Protection Regulation (GDPR), you need to pay close attention to entity extraction technologies, since locating personal data in text is a prerequisite for protecting it.
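
To make the task concrete, here is a minimal sketch of the simplest kind of entity extraction: pattern matching for telephone numbers and URLs. It is only an illustration and does not reflect how Bitext or any of the engines compared in this report work. The `PATTERNS` table and the `extract_entities` function are hypothetical names introduced for this example, and the regular expressions are deliberately simplistic.

```python
import re

# Illustrative patterns only (assumption): real phone and URL formats vary
# far more widely, and names, places, and companies need a trained model.
PATTERNS = {
    "URL": re.compile(r"https?://[^\s<>\"']+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def extract_entities(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda entity: entity[2])


sample = "Call +34 912 345 678 or visit https://example.com/contact for details."
for entity in extract_entities(sample):
    print(entity)
```

Patterns like these can cover rigidly formatted items, but proper names, places, and brands have no fixed shape, which is why dedicated NER engines are typically used for them.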

This report compares Bitext’s entity extraction software with three other engines (CRFSuite, Stanford, and SENNA) on OntoNotes 5, a well-known dataset released in 2013. OntoNotes 5 is a large annotated corpus comprising various genres of text (news, conversational telephone speech, weblogs, Usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic).

