Text Extraction Security | MarkLogic Support


Text Extraction Security 08 November 2016 02:03 PM
Introduction

Binary documents often carry associated metadata. An image, for example, may record when and where it was taken. MarkLogic Server can extract this metadata from binary documents (e.g. images, MS Office and Adobe PDF files) using XQuery built-in functions and conversion pipelines that rely on third-party software. This article describes the security vulnerabilities reported for text extraction and the MarkLogic releases that contain the resolution.

Details

MarkLogic Server's built-in function xdmp:document-filter extracts metadata and text from binary documents as XHTML. Additionally, the server's xdmp:pdf-convert() function and the Content Processing Framework (CPF) help convert HTML, Adobe PDF and Microsoft Office documents to XML. These mechanisms rely on third-party software, namely the Iceni "Argus PDF converter" and "Perceptive Document Filters" from Lexmark, to extract text and metadata from a wide variety of document formats.

Both Iceni and Lexmark have recently issued security alerts for vulnerabilities in these products and have incorporated fixes into their most recent releases. They have published the following CVEs:

For Iceni:
CVE-2016-8333 and CVE-2016-8335: An exploitable stack-based buffer overflow vulnerability.
The latest version of Iceni (v6.6.5) patches the security issues listed above.

For Lexmark:
CVE-2016-5646: An exploitable heap overflow vulnerability exists in the Compound Binary File Format (CBFF) parser functionality of the Lexmark Perceptive Document Filters library.
CVE-2016-4336: An exploitable out-of-bounds write vulnerability exists in the Bzip2 parsing of the Perceptive Document Filters.
CVE-2016-4335: An exploitable buffer overflow vulnerability exists in the XLS parsing of the Perceptive Document Filters conversion functionality.

These are considered vulnerabilities of "High" severity, based on CVSS base scores in excess of 7.0. A carefully crafted PDF, CBFF, Bzip2, or XLS file could be used to cause a buffer overflow, which can result in arbitrary code execution. The latest version of Lexmark Isys (v11.3) patches the security issues listed above.

Resolution

MarkLogic has issued an update which includes these fixes. The latest releases of MarkLogic Server versions 7 (7.0-6.8) and 8 (8.0-6), which incorporate the latest fixes for Iceni and Lexmark Isys, are available for download from our Community website.

References

For more information on the Lexmark security issues, see:
http://support.lexmark.com/index?page=content&id=TE811&modifiedDate=08/26/16&userlocale=EN_US&locale=en

Further details on the Iceni issues can be found at:
https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-8333
https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-8335
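
Example: checking a server version against the patched releases

As an illustration of the Resolution above, the following Python sketch checks whether a version string is at or above the patched release named in this article for its release line (7.0-6.8 or 8.0-6). It is a minimal sketch, not a MarkLogic tool. The helper names (parse_version, includes_filter_fixes) are hypothetical, and the code assumes version strings of the form "major.minor-patch[.subpatch]", such as "8.0-5.4". Release lines that the article does not mention are rejected rather than guessed at.

```python
import re

# Minimum patched releases named in the article, keyed by release line.
PATCHED_MINIMUMS = {
    (7, 0): (6, 8),  # 7.0-6.8
    (8, 0): (6, 0),  # 8.0-6
}

# Assumed format: major.minor-patch[.subpatch], e.g. "7.0-6.8" or "8.0-6".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)-(\d+)(?:\.(\d+))?$")


def parse_version(version):
    """Split a string such as '7.0-6.8' into ((7, 0), (6, 8))."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, sub = match.groups()
    # A missing subpatch (as in '8.0-6') is treated as 0.
    return (int(major), int(minor)), (int(patch), int(sub or 0))


def includes_filter_fixes(version):
    """Return True if the version is at or above the patched release for its line."""
    line, patch = parse_version(version)
    if line not in PATCHED_MINIMUMS:
        # The article only names fixes for the 7.0 and 8.0 lines.
        raise ValueError(f"release line {line[0]}.{line[1]} is not covered by the article")
    return patch >= PATCHED_MINIMUMS[line]
```

For example, includes_filter_fixes("8.0-5.4") evaluates to False, while includes_filter_fixes("7.0-6.8") evaluates to True. The version string itself would have to come from the server being checked; how it is obtained is outside the scope of this sketch.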