Vainu NER Outperforms Global Benchmarks, Setting New Standards for How AI Can Comprehend Natural Language in a B2B Environment
Vainu’s mission is to build the most comprehensive database of all the companies in the world. A crucial part of this mission is the ability to determine when companies are mentioned in natural language.
Named Entity Recognition (NER) is a subset of natural language processing that focuses on understanding when an entity is mentioned in free text.
For example, a sentence like “Apple’s logo is beautiful but why has the apple been bitten?” has the named entity (Apple) and the word “apple” in different contexts.
Many company names are also ordinary words, so searching by text pattern alone is not sufficient to detect when a company is actually being mentioned.
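To make the limitation concrete, here is a minimal Python sketch of a pattern-only search. The function name `naive_company_mentions` and the one-entry company list are hypothetical and used only for illustration; this is not Vainu's implementation.

```python
import re

def naive_company_mentions(text, company_names):
    """Return (name, matched_text) pairs for every case-insensitive whole-word match."""
    hits = []
    for name in company_names:
        # Whole-word, case-insensitive match: no notion of context at all.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

sentence = "Apple's logo is beautiful but why has the apple been bitten?"
hits = naive_company_mentions(sentence, ["Apple"])
print(hits)  # both the company and the fruit are reported as mentions
```

The search reports two hits, one for the company and one for the fruit. Making the match case-sensitive would help in this sentence but would fail as soon as the fruit starts a sentence, which is why context-aware recognition is needed.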
State-of-the-Art Neural Network Predictions with Mind-Blowing Amounts of Training Data
Vainu previously relied on Google’s NER technology, but it soon became evident that a more advanced solution could be built by using company data rather than only recognizing linguistic patterns.
Vainu analyzes approximately 1.5 million news stories about companies every day, and its database holds data on about 120 million companies. The raw data needed to create the training set was already there.
By creating effective tools, employing a large number of human workers, and carefully applying cross-validation, in which the same piece of information was evaluated by different workers, the VainuLab team converted the raw data into a massive, structured training set.
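One common way to combine several workers' judgements of the same item is majority voting with an agreement threshold. The sketch below assumes that approach. The article does not describe Vainu's actual tooling, so the function `aggregate_labels`, the label names, and the `min_agreement` parameter are all hypothetical.

```python
from collections import Counter

def aggregate_labels(judgements, min_agreement=0.6):
    """Keep an item only if enough workers agree on its most common label.

    judgements maps an item id to the list of labels given by different workers.
    """
    accepted = {}
    for item_id, labels in judgements.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        # Share of workers who chose the winning label.
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = label
    return accepted

judgements = {
    "mention-1": ["company", "company", "company"],
    "mention-2": ["company", "not_company", "not_company"],
    "mention-3": ["company", "not_company"],  # workers disagree
}
accepted = aggregate_labels(judgements, min_agreement=0.6)
print(accepted)
```

In this example the first two items enter the training set with their majority label, while the third is dropped because the workers split evenly. Raising the threshold trades training set size for label quality.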
From there, employing current state-of-the-art deep learning technology and validation of results enabled Vainu to create the most accurate company NER in the world today.
“Like with most of the machine learning tasks in the world, the largest task for us has been generating the training set that meets our quality standards and creating the technology to build it,” said Tuomas Rasila, CTO and co-founder of Vainu.
What Is the Value of This Technology?
“Beyond the original use case of collecting vast amounts of publicly available information about companies of the world, this technology could potentially be used for a number of tasks like searching companies in unorganized textual databases through corporate databases and emails,” said Riko Nyberg, Head of AI VainuLabs.
The technology is currently used as part of Vainu’s company intelligence platform and offered as part of its technology stack for corporate customers, but it may find its way to wider audiences in upcoming releases, according to Rasila.
Test Results: How Vainu’s NER Fares Compared to Global Benchmarks
As described, recognizing the correct companies is the most crucial element for Vainu’s company-centered service. Thus, to serve the overall mission of Vainu, the one measure where Vainu’s NER must crush all other services is the recognition of companies. And this is the case in all the languages in which Vainu processes unorganized textual data.
Widely respected benchmarks for NER services are Google NER and Stanford NER. However, neither provides a solution for Finnish, Swedish or Norwegian, so in these languages Vainu NER sets the bar by default. For context, in Swedish, Vainu NER’s F1 score for recognizing companies was already as high as 85.89%.
And how about English, the most researched language in named entity recognition? On the English test set provided by Stanford, the overall F1 score was 94.20% for Vainu NER versus 92.99% for Stanford NER, so Vainu outperformed Stanford’s NER.
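For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision (the share of predicted entities that are correct) and recall (the share of true entities that were found). The sketch below computes it over exact-match entity spans. The spans are made-up illustration data rather than Vainu's or Stanford's test data, and the function `f1_score` is a hypothetical helper, not the evaluation script used for the figures above.

```python
def f1_score(gold, predicted):
    """F1 over exact-match entities, each given as a (start, end, label) tuple."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct / all predicted
    recall = true_positives / len(gold)          # correct / all true entities
    return 2 * precision * recall / (precision + recall)

# Illustrative data only: four true company mentions, four predictions, three correct.
gold = {(0, 5, "ORG"), (20, 27, "ORG"), (40, 46, "ORG"), (60, 66, "ORG")}
predicted = {(0, 5, "ORG"), (20, 27, "ORG"), (40, 46, "ORG"), (70, 75, "ORG")}
score = f1_score(gold, predicted)
print(f"F1 = {score:.2%}")
```

Here precision and recall are both 3/4, so the F1 score is 75%. A system only scores high when it both finds most company mentions and avoids false alarms such as the bitten apple.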