What is MarkLogic Data Hub?
MarkLogic’s Data Hub increases data integration agility, in contrast to time-consuming upfront data modeling and ETL. By grouping all of an entity’s data into one consolidated record, along with that data’s context and history, a MarkLogic Data Hub provides a 360° view of data across silos. You can ingest data from various sources into the Data Hub, standardize it, and then more easily consume it in downstream applications. For more details, please see our Data Hub documentation.
Note: Prior to version 5.x, Data Hub was known as Data Hub Framework (DHF).
Takeaways:
- In contrast to previous versions, Data Hub 5 is largely configuration-based. Upgrading to Data Hub 5 will require either:
- Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
- Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
- It’s very important to verify the “Version Support” information on the Data Hub GitHub README.md before installing or upgrading to any major Data Hub release
Pre-requisites:
Before installing Data Hub, check that your MarkLogic Server version is supported by the Data Hub version you plan to install. For details, see our version compatibility matrix. Other pre-requisites can be seen here.
New installations of Data Hub
We always recommend installing the latest Data Hub version compatible with your current MarkLogic Server version. For example:
- If you are running MarkLogic Server 9.0-7, install the most recent compatible Data Hub version (5.0.2), even though previous Data Hub versions (such as 5.0.1, 5.0.0, 4.x and 3.x) also work with server version 9.0-7.
- Similarly, if you are running MarkLogic Server 9.0-6, the recommended Data Hub version is 4.3.1 rather than the previous versions 4.0.0, 4.1.x, 4.2.x and 3.x.
Note: A specific MarkLogic server version can be compatible with multiple Data Hub versions and vice versa, which allows independent upgrades of either Data Hub or MarkLogic Server.
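The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the `COMPATIBLE_DATA_HUB` table and the function names are hypothetical, and the table contains only the concrete versions named in this article rather than the full compatibility matrix.

```python
# Illustrative excerpt only: these are the versions named in this article,
# not the full version compatibility matrix.
COMPATIBLE_DATA_HUB = {
    "9.0-7": ["5.0.2", "5.0.1", "5.0.0"],
    "9.0-6": ["4.3.1", "4.0.0"],
}


def version_key(version):
    """Turn '4.3.1' into (4, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def recommended_data_hub(server_version):
    """Return the newest Data Hub version listed for the given server version."""
    candidates = COMPATIBLE_DATA_HUB.get(server_version)
    if not candidates:
        raise LookupError(f"No matrix entry for MarkLogic Server {server_version}")
    return max(candidates, key=version_key)


print(recommended_data_hub("9.0-7"))  # 5.0.2
print(recommended_data_hub("9.0-6"))  # 4.3.1
```

Always take the real values from the version compatibility matrix; the table here exists only to make the rule concrete.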
Upgrading from a previous version
- To determine your upgrade path, first find your current Data Hub version in the “Can upgrade from” column in the version compatibility matrix.
- While Data Hub should generally work with future server versions, it’s always best to run the latest Data Hub version that's also explicitly listed as compatible with your installed MarkLogic Server version.
- If required, make sure to upgrade your MarkLogic Server version to be compatible with your desired Data Hub version. You can upgrade MarkLogic Server and Data Hub independently of each other as long as you are running a version of MarkLogic Server that is compatible with the Data Hub version you plan to install. If you are running an older version of MarkLogic Server, then you must upgrade MarkLogic Server first, before upgrading Data Hub.
Note: Data Hub is not designed to be 'backwards' compatible with any MarkLogic Server version earlier than the one listed with the release. For example, you can’t use Data Hub 3.0.0 on MarkLogic Server 9.0-4. You’ll need to either downgrade to Data Hub 2.0.6 while staying on MarkLogic Server 9.0-4, or upgrade MarkLogic Server to version 9.0-5 while staying on Data Hub 3.0.0.
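If you script this check, compare server versions numerically rather than as strings. The sketch below assumes the "major.minor-patch" form used in this article (for example 9.0-4); the helper names are hypothetical, and the minimum version passed in must come from the compatibility matrix.

```python
import re


def server_key(version):
    """Parse a MarkLogic Server version such as '9.0-4' into (9, 0, 4)."""
    match = re.fullmatch(r"(\d+)\.(\d+)-(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized server version: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_supported(server_version, minimum_server_version):
    """True if the installed server is at or above the listed minimum."""
    return server_key(server_version) >= server_key(minimum_server_version)


# Data Hub 3.0.0 needs MarkLogic Server 9.0-5 or later (from the note above).
print(is_supported("9.0-4", "9.0-5"))  # False
print(is_supported("9.0-5", "9.0-5"))  # True

# Plain string comparison gets a two-digit patch number wrong:
print("9.0-10" >= "9.0-9")              # False
print(is_supported("9.0-10", "9.0-9"))  # True
```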
- Example 1 - Scenario where you DO NOT NEED to upgrade MarkLogic Server:
- Current Data Hub version: 4.0.0
- Target Data Hub version: 4.1.x
- MarkLogic Server version: 9.0-9
- The “Can upgrade from” value for the target version shows 2.x, which means you need to be on at least Data Hub 2.x. Since the current Data Hub version is 4.0.0, this requirement has been met.
- Unless there is a strong reason for choosing 4.1.x, we highly recommend upgrading to the latest 4.x version compatible with MarkLogic Server 9.0-9, which in this example is 4.3.2. Consequently, the recommended upgrade path here becomes 4.0.0-->4.3.2 instead of 4.0.0-->4.1.x.
- Since 9.0-9 is supported by the recommended Data Hub version 4.3.2, there is no need to upgrade MarkLogic Server.
- Hence, the recommended path is Data Hub 4.0.0-->4.3.2
- Example 2 - Scenario where you NEED to upgrade MarkLogic Server:
- Current Data Hub version: 3.0.0
- Target Data Hub version: 5.0.2
- MarkLogic Server version: 9.0-6
- The “Can upgrade from” value for the target version shows Data Hub version 4.3.1, which means you need to be on at least 4.3.x (4.3.1 or 4.3.2, depending on your MarkLogic Server version). Since the current Data Hub version 3.0.0 doesn’t satisfy this requirement, the upgrade path after this step becomes Data Hub 3.0.0-->4.3.x
- As per the matrix, the latest compatible Data Hub version for 9.0-6 is 4.3.1, so the path becomes 3.0.0-->4.3.1
- From the matrix, the minimum supported MarkLogic Server version for 5.0.2 is 9.0-7, so you will have to upgrade your MarkLogic Server version before upgrading your Data Hub version to 5.0.2.
- Because 9.0-7 is supported by all 3 versions under consideration (3.0.0, 4.3.1 and 5.0.2), the recommended path can be either:
- Upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub to 5.0.2
- Upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade Data Hub to 5.0.2
- Recall that Data Hub 5 moved to a configuration-based approach from previous versions’ code-based approach. Upgrading to Data Hub 5 from a previous major version will require either:
- Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
- Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
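The reasoning in Example 2 can be sketched as a small planning function. This is a hedged illustration under stated assumptions: `MATRIX`, `plan_upgrade` and the key names are hypothetical, the table holds only the values quoted in Example 2, and the sketch uses the “Can upgrade from” value itself as the intermediate Data Hub version (which matches Example 2, where 4.3.1 is also the latest version compatible with 9.0-6). It produces the first of the two orderings listed above and does not cover the legacy flow conversion.

```python
import re

# Hypothetical excerpt of the matrix, limited to values quoted in Example 2.
MATRIX = {
    "5.0.2": {"min_server": "9.0-7", "can_upgrade_from": "4.3.1"},
}


def dh_key(version):
    """Turn a Data Hub version such as '4.3.1' into (4, 3, 1)."""
    return tuple(int(part) for part in version.split("."))


def server_key(version):
    """Turn a server version such as '9.0-6' into (9, 0, 6)."""
    match = re.fullmatch(r"(\d+)\.(\d+)-(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized server version: {version!r}")
    return tuple(int(group) for group in match.groups())


def plan_upgrade(current_dh, target_dh, server):
    """Return the ordered upgrade steps as a list of strings."""
    entry = MATRIX[target_dh]
    steps = []
    # Step through the "Can upgrade from" version if we are below it.
    if dh_key(current_dh) < dh_key(entry["can_upgrade_from"]):
        steps.append(f"Upgrade Data Hub {current_dh} --> {entry['can_upgrade_from']}")
        current_dh = entry["can_upgrade_from"]
    # Upgrade the server if it is below the target's minimum.
    if server_key(server) < server_key(entry["min_server"]):
        steps.append(
            f"Upgrade MarkLogic Server {server} --> at least {entry['min_server']}"
        )
    steps.append(f"Upgrade Data Hub {current_dh} --> {target_dh}")
    return steps


for step in plan_upgrade("3.0.0", "5.0.2", "9.0-6"):
    print(step)
```

When the current Data Hub version already satisfies “Can upgrade from” and the server already meets the minimum, the function returns a single step, which corresponds to the situation in Example 1.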
Links for Reference:
https://docs.marklogic.com/datahub/upgrade.html