Working with Nicknames: Dictionary or Thesaurus?
Posted by Mary Holstege on 27 November 2018 09:58 AM
Robbie, Bobby, Rob, and Bob are all nicknames for Robert; Johnny, John, and Jon can all stand in for Jonathan. When you work with person names, nicknames can make it hard to tell whether two records refer to the same person unless you have a tool that helps you identify these variants. So should you use a custom stemming dictionary? A stemming thesaurus? Are there other options? Here, we compare options for stemming person names in MarkLogic to help you decide which approach is right for you.
Stemming Dictionary
When stemming names using a dictionary, all of the following apply:
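To make the dictionary idea concrete, here is a small, language-neutral sketch in plain Python rather than MarkLogic code. It assumes a hypothetical `NICKNAME_STEMS` table that maps each name variant to one canonical stem and applies the same mapping both when indexing and when querying, which is the essential idea behind stemmed search with a custom dictionary. The `stem`, `build_index`, and `search` helpers are illustrative only and do not correspond to any MarkLogic API.

```python
# Hypothetical nickname table (NOT a MarkLogic dictionary format):
# every variant maps to one canonical stem; unknown names stem to themselves.
NICKNAME_STEMS = {
    "robert": "robert", "robbie": "robert", "bobby": "robert",
    "rob": "robert", "bob": "robert",
    "jonathan": "jonathan", "johnny": "jonathan", "john": "jonathan",
    "jon": "jonathan",
}


def stem(token: str) -> str:
    """Reduce a name token to its canonical stem (case-insensitive)."""
    t = token.lower()
    return NICKNAME_STEMS.get(t, t)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index-time stemming: file each document under the stems of its tokens."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Query-time stemming: any variant lands on the same index key."""
    return index.get(stem(query), set())


docs = {"d1": "Bob Smith", "d2": "Robert Smith", "d3": "Jon Jones"}
index = build_index(docs)
print(sorted(search(index, "Bobby")))  # ['d1', 'd2']
```

Because every variant collapses to the same index key, a query for any nickname is still a single-term lookup; adding more nicknames grows only the table, not the query.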
Stemming Thesaurus
When stemming names using a thesaurus, consider:
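For comparison, a thesaurus leaves the indexed text alone and expands the query instead. The sketch below is again plain Python with a hypothetical `THESAURUS` table and illustrative helper names (not MarkLogic's thesaurus functions); it turns one query term into an OR over every name in the matching entry and checks those names against the unstemmed document tokens.

```python
# Hypothetical thesaurus (NOT MarkLogic's thesaurus format):
# each head term lists its alternatives.
THESAURUS = {
    "robert": ["robbie", "bobby", "rob", "bob"],
    "jonathan": ["johnny", "john", "jon"],
}


def expand(term: str) -> set[str]:
    """Expand a query term into itself plus every name in any entry containing it."""
    t = term.lower()
    expanded = {t}
    for head, alts in THESAURUS.items():
        group = {head, *alts}
        if t in group:
            expanded |= group
    return expanded


def search(docs: dict[str, str], query: str) -> set[str]:
    """OR the expanded terms together against the raw (unstemmed) tokens."""
    terms = expand(query)
    return {
        doc_id
        for doc_id, text in docs.items()
        if terms & {tok.lower() for tok in text.split()}
    }


docs = {"d1": "Bob Smith", "d2": "Robert Smith", "d3": "Jon Jones"}
print(sorted(search(docs, "Bobby")))  # ['d1', 'd2']
print(len(expand("Bob")))  # 5
```

Each query grows with the size of its entry (a query for Bob becomes a five-term OR), which is one plausible reason the trade-off tips toward a dictionary when the set of alternatives gets large.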
And… Entity Extraction?
It would be overkill for this person-name stemming use case, but it is worth pointing out a trick using entity extraction. Feed in query strings to
Bottom Line
If you have a large set of alternatives, or care about language context, go with the stemming dictionary.
Additional Resources
The post Working with Nicknames: Dictionary or Thesaurus? appeared first on MarkLogic.