Apache NiFi: The Easiest Way to Ingest Relational Data to MarkLogic®
Posted by Matt Allen on 13 September 2018 11:00 PM
We are excited to announce support for using Apache NiFi to ingest data into MarkLogic. Apache NiFi is an open source tool for distributing and processing data. When used alongside MarkLogic, it’s a great tool for building ingestion pipelines. NiFi has an intuitive drag-and-drop UI and has over a decade of development behind it, with a big focus on security and governance.

Ingesting Relational Data to NoSQL with NiFi

One of the historical challenges to adopting new NoSQL databases is getting legacy relational data migrated over. Relational databases store data in rows and columns in a highly normalized form. MarkLogic, a multi-model NoSQL database, stores data as JSON and XML documents and RDF triples. Typically, you group data into natural “entities” that are modeled as documents, and you add RDF triples to capture meaningful relationships among the entities.

NiFi helps to naturally group your data by either converting relational rows to small documents or joining groups of rows together into hierarchical structures using primary/foreign key relationships. With new MarkLogic processors, this data then moves quickly into MarkLogic with minimal configuration and high performance. The NiFi approach uses the data model that already exists in the relational database to the extent possible, avoiding costly, fragile and slow ETL jobs.

Existing approaches such as MarkLogic Content Pump (mlcp) still work well for getting data into MarkLogic. But NiFi makes the whole process of ingesting relational data to MarkLogic faster and easier. And you don’t need to buy a separate ETL tool.

If you are interested and want to become an expert, read the white paper that discusses why you should Rethink Data Modeling, or watch the presentation on Becoming a Document Modeling Guru.

Here is an example:

The above screenshot shows a simple process for getting relational data into MarkLogic. An SQL query is executed to get data out of a relational system.
Then, a NiFi processor converts the resulting Avro serialized data to JSON, and the JSON data is put into MarkLogic.

Watch this five-minute demo that shows how to get relational data ingested into MarkLogic using NiFi.

Main Benefits of Using Apache NiFi

NiFi is designed and built to handle real-time data flows at scale. But NiFi is not advertised as an ETL tool, and we don’t think it should be used for traditional ETL. The sweet spot for NiFi is handling the “E” in ETL. It extracts data easily and efficiently. If necessary, it can do some minimal transformation work along the way. We think it’s better to let the database (i.e., MarkLogic) take care of the data transformation and harmonization.

The main benefits of NiFi include the following:
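To make the “extract plus minimal transformation” idea concrete, here is a minimal sketch of the reshaping described earlier: rows joined through a primary/foreign key relationship into one hierarchical JSON document per entity. It is plain Python for illustration only, not NiFi or MarkLogic code. The table contents, the column names and the rows_to_documents helper are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical rows, shaped like the results of two SQL queries.
customers = [
    {"customer_id": 1, "name": "Ada Lopez"},
    {"customer_id": 2, "name": "Sam Chen"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.5},
    {"order_id": 12, "customer_id": 2, "total": 12.0},
]


def rows_to_documents(parents, children, key, child_field):
    """Nest child rows under their parent row using a shared key column."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key from the nested copy; the parent already carries it.
        nested = {k: v for k, v in child.items() if k != key}
        grouped[child[key]].append(nested)
    # One document per parent row, with its children embedded as a list.
    return [{**parent, child_field: grouped.get(parent[key], [])} for parent in parents]


documents = rows_to_documents(customers, orders, "customer_id", "orders")
for doc in documents:
    print(json.dumps(doc))
```

Each printed line is one self-contained JSON document for a customer with its orders embedded, which is the kind of “entity” document the post describes loading into MarkLogic. Any heavier harmonization would be left to the database, as suggested above.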
Key Concepts with Apache NiFi

The main concepts to understand when using NiFi are dataflows, processors and connections. You create a dataflow by wiring together processors with connections. A dataflow can be saved as a template, and these templates can be combined into more complex flows and reused or replicated across servers.

The following table from Hortonworks provides a very nice summary of the individual components and how they map to dataflow programming:
Source: Hortonworks

How NiFi Works with MarkLogic

Using NiFi with MarkLogic is similar to using NiFi with any other database: you just need to use the processors specifically built for getting data in and out of MarkLogic. There are currently two processors built for MarkLogic: the PutMarkLogic processor for ingesting data into MarkLogic and the QueryMarkLogic processor for querying documents in MarkLogic. Both of these processors are built on top of MarkLogic’s Data Movement SDK. The lists of capabilities below give a general idea of what each processor can do.

Capabilities of the PutMarkLogic Processor
Capabilities of the QueryMarkLogic Processor
Getting Started with NiFi and MarkLogic

The steps below illustrate how fast and easy it is to get started using NiFi with MarkLogic.

Download NiFi

Download the NiFi binaries from http://nifi.apache.org/download.html. Make sure you’re on the latest release of NiFi (1.7). Unpack (i.e., unzip) the tar or zip file in a directory of your choice (for example: /abc).

Download Processors

Clone the MarkLogic/nifi-nars repository on GitHub to get the MarkLogic-specific processors.

Organize Files

Place the MarkLogic-specific processor files in the correct directory. To do this, copy the two .nar files provided by MarkLogic in the zip folder into the lib folder (nifi-1.7.0/lib) of the unpacked NiFi distribution.

Start NiFi

Go to the Apache NiFi Development Quickstart and find the Decompress and Launch sections. You do not need to follow the decompress instructions, because you have already unpacked the distribution. Make sure that you are in the directory of your NiFi installation. If not, change your directory using a command (e.g., “cd /abc/nifi-1.7.0”). Now you are ready to follow the launch instructions provided in the Apache NiFi Development Quickstart for your particular environment.

Run NiFi

Now you’re ready to run NiFi using your browser. Point a web browser at http://localhost:8080/nifi/ to open NiFi. Make sure you are running MarkLogic version 9.0+.

MarkLogic Resources
Apache NiFi Resources
The post Apache NiFi: The Easiest Way to Ingest Relational Data to MarkLogic® appeared first on MarkLogic.