Best Practices for Using BI Tools with MarkLogic
05 June 2020 07:34 PM
This knowledgebase article contains critical tips and best practices you'll need to know to best use MarkLogic Server with your favorite BI Tools.
BI Tool Q&A
Q: What's a TDE? Is that a Tableau Data Extract?
A: In MarkLogic terms, TDE stands for Template Driven Extraction. A template is a document (XML or JSON) that declares how a view is to be populated. It defines a context -- the root path of all the documents that are involved in this view -- then, for each column in the view, it defines a column name, type, and a path to the data inside the document. You can define the value of a column using several pieces of data in the document, plus some functions, even some programming operations such as IF. For example, if your documents have the "last-updated" year and month and day in different parts of the document, your Template can pull in those three pieces, concatenate them, then cast the result as a date.
Q: When modifying TDEs, do I need to reindex?
A: TDEs map an SQL-like view on top of MarkLogic. If you change an existing view, you do need to reindex the database. Before kicking off a resource- and time- intensive reindex, however, be aware that there are some TDE configurations that cannot be updated. You can read more about exactly which kinds of TDEs may or may not be updated at the following knowledgebase article: Updating a TDE View.
Q: Can MarkLogic handle queries that require a large number of columns?
A: Yes, but you'll want to pay attention to potential performance impacts. In general, it's much better to spread a large number of columns across multiple TDEs, instead of having a single TDE containing all those same columns. Data modeling is also important here - TDEs should be meaningful with regard to their intended use. Definitely check out MLU's Data Modeling Series, in particular Progressive Transformation using the Envelope Pattern and Impact of Normalization: Lessons Learned.
Q: What are some common patterns and antipatterns for good performance with BI tools?
A: First, avoid using Nullable columns in filters and drilldowns. There are optimizations in MarkLogic Server's SQL engine to detect patterns with "null" - but different BI tool generate their code in different ways and can sometimes result in code that circumvents those optimizations. In general, if performance is a priority, it's usually better to use an actual value such as "N/A" or "0".
Second, enable Query Reduction or similar options in your BI tool of choice. Without this option, if you choose to filter on a year - say "2018" - and then also select "2019", multiple SQL queries will be sent to MarkLogic in quick succession unnecessarily.
Q: What do I need to watch out for when connecting my BI tool to MarkLogic?
A: If performance is a priority, exercise caution when using joins. In general, the best practice is to create collections of data in MarkLogic that represent the subsets of data needed externally as closely as possible. You can learn more about what tools are available to see how many and what kind of joins are being used by your query in the What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server? knowledgebase article, and you can learn more about how to create more meaningful data models and subsets of your data models in the aforementioned MLU's Data Modeling Series, as well as in the MarkLogic World presentation Getting the Most from MarkLogic Semantics (also available in video form).