Knowledgebase:
How do spell:suggest() and spell:suggest-detailed() arrive at their suggestions?
29 August 2013 02:44 PM

Introduction

spell:suggest() and spell:suggest-detailed() do not rank candidates on character differences alone. In addition to comparing the supplied string with the strings in your dictionaries character by character, they factor in how different the strings are phonetically.
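The two kinds of difference can be inspected separately with the built-in functions spell:levenshtein-distance() and spell:double-metaphone(). The following is only a sketch: it assumes that the phonetic keys used internally by spell:suggest() resemble the keys returned by spell:double-metaphone(), which this article does not confirm.

```xquery
xquery version "1.0-ml";

(: Compare the two spellings used in this article on both axes. :)
let $typed := "acknowledgment"
let $candidate := "acknowledgement"
return (
  (: Character-level difference: one inserted "e", so 1 is expected. :)
  spell:levenshtein-distance($typed, $candidate),
  (: Phonetic keys for each spelling. Assumption: these resemble the
     keys that spell:suggest() compares internally. :)
  spell:double-metaphone($typed),
  spell:double-metaphone($candidate)
)
```

Two words can be close on one axis and far apart on the other, so a candidate has to be acceptable on both before it is suggested.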

Detail

An undocumented option, <phonetic-distance>, can be passed to raise the phonetic-distance threshold, which is 1 by default. For example, consider the following:

xquery version "1.0-ml";

spell:suggest-detailed(
  ('customDictionary.xml'),
  'acknowledgment',
  <options xmlns="http://marklogic.com/xdmp/spell">
    <phonetic-distance>2</phonetic-distance>
  </options>
)

=>

<spell:suggestion original="acknowledgment"
    dictionary="customDictionary.xml"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:spell="http://marklogic.com/xdmp/spell">
  <spell:word distance="9" key-distance="2" word-distance="45"
      levenshtein-distance="1">acknowledgement</spell:word>
</spell:suggestion>

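Note that the "distance-threshold" option corresponds to the "distance" attribute in the result, and the "phonetic-distance" option corresponds to "key-distance". In this example the key-distance is 2, so the suggestion would not be expected under the default phonetic-distance threshold of 1.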
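Both thresholds can be supplied in the same options node. The sketch below reuses the dictionary and word from the example above and prints the attributes each threshold is compared against. The value 50 for distance-threshold is an arbitrary illustration, the order of the option elements is an assumption, and because phonetic-distance is undocumented its behavior may differ between MarkLogic versions.

```xquery
xquery version "1.0-ml";

(: Illustrative values only; tune them against your own dictionary. :)
let $options :=
  <options xmlns="http://marklogic.com/xdmp/spell">
    <distance-threshold>50</distance-threshold>
    <phonetic-distance>2</phonetic-distance>
  </options>
let $result :=
  spell:suggest-detailed("customDictionary.xml", "acknowledgment", $options)
(: List each suggestion with the two values the thresholds apply to. :)
for $word in $result/spell:word
return
  fn:concat(
    fn:string($word),
    ": distance=", $word/@distance,
    ", key-distance=", $word/@key-distance
  )
```

Listing the attributes this way makes it easier to see which threshold is admitting or excluding a given suggestion before changing either value.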

Also note that increasing the phonetic-distance may cause spell:suggest() and spell:suggest-detailed() to use significantly more CPU. Metaphones are short keys, so a larger distance may match a very large fraction of the dictionary, and each of those matches must then be checked by the distance algorithms.
