Latest Updates
Founders Online: A Lesson in Performance
Posted by Matt Allen on 17 April 2014 10:00 AM

This post summarizes the MarkLogic World talk “Planning for Growth with and without Performance Metering,” given by David Sewell, Editorial and Technical Manager for University of Virginia Press, with support from Tim Finney, Lead Programmer, based in Perth, Australia.

MarkLogic is great for large enterprises running large applications, but it is also great for small shops that want to do great things. Founders Online launched in summer 2013 and provides public access to almost 150,000 searchable documents from six of the founding fathers: George Washington, Benjamin Franklin, John Adams, Thomas Jefferson, Alexander Hamilton, and James Madison. The site, a joint venture between the University of Virginia Press and the National Archives, is powered by MarkLogic and is fast and scalable, delivering sub-second response times to thousands of concurrent users. Surprisingly, Founders Online was developed by a very small team on a relatively small budget.

Here are some quick facts about the project:

  • Small Team:  1.5 dedicated FTE to develop the site
  • Big Data:  150,000 searchable documents with an average size of 2MB
  • Fast Queries:  15,300 documents in 0.02 seconds
  • Serious Scale:  120 ms response time with five thousand concurrent users

So how did the Founders Online team achieve such high performance? According to David Sewell, three key elements made the difference:

1.  Leverage the XML Data Model

All of the text from the letters was transcribed and transformed to XML. Each letter was then stored as an individual document within MarkLogic, making up a collection of 150,000 documents. For querying the XML, the team avoided using XPath node traversal, which was too slow and created hard-coded links and expansions. An example of the simple code in production for search queries is below:

search:search(
	$q-full,
	c:map-search-options($map),
	$start,
	$length
)

2.  Rethink the Code

The team had to get away from legacy code and strategies and embrace new approaches. To help, they relied heavily on MarkLogic’s Query Performance and Tuning Guide. The team also used the XQMVC framework, which is like many other MVC frameworks for languages such as Java, Python, PHP, and Ruby, except that XQMVC is designed specifically for building complex applications in XQuery. Some of the other key things the team did included:

  • Used maps instead of session fields
  • Used run-time switches
  • Ignored bottlenecks possibly deriving from search internals

With the new architecture, they were able to query 15,300 documents in 0.02 seconds.
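
As an illustration of the "maps instead of session fields" item, here is a minimal sketch of carrying per-request state in a MarkLogic map. The keys and values (q, start, length) are hypothetical and are not taken from the Founders Online code; only the built-in map:map, map:put, and map:get functions are used.

```xquery
xquery version "1.0-ml";

(: Hypothetical request state, kept in an in-memory map
   rather than in server-side session fields. :)
let $state := map:map()
let $_ := map:put($state, "q", "liberty")
let $_ := map:put($state, "start", 1)
let $_ := map:put($state, "length", 10)
return
  (: Read the values back wherever the request is handled. :)
  fn:concat(
    "query=", map:get($state, "q"),
    " start=", map:get($state, "start"),
    " length=", map:get($state, "length")
  )
```

A map like this can be passed between functions for the duration of a request, so nothing has to be written to or read from session storage along the way.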

An example of the application code showing a lexicon function is below:

(: Count the mapData id values that occur more than once in the collection. :)
let $publ := 'JSMN'
let $duplicates :=
	cts:element-attribute-values(
		xs:QName('FGEA:mapData'),
		fn:QName('', 'id'),
		(),
		(),
		cts:collection-query($publ)
	)[cts:frequency(.) gt 1]
return count($duplicates)

3. Rely Heavily on Caching

The team moved from dynamic to static wherever possible, both in rendering and in search results, by relying on caching. They did this by deploying Nginx as a front-end caching proxy; creating an HTML cache in MarkLogic to avoid the need for run-time XSLT rendering; and caching output from searches, facets, and result pages in the database for potential reuse. The documents in the search cache are stored as binaries in MarkLogic to avoid index overhead. Because the cached documents are not indexed, a document call simply pulls the content in as XHTML, which is very efficient. An example of the code is below:

binary {
  xs:hexBinary(
    xs:base64Binary(
      xdmp:base64-encode(
        xdmp:quote($HTML-node)
      )
    )
  )
}
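
To show how a binary cache document of this kind might be written and later read back, here is a minimal sketch. The cache URI and the stand-in HTML fragment are assumptions made for illustration, and the check-then-store flow is a generic cache pattern, not necessarily how the Founders Online code is organized.

```xquery
xquery version "1.0-ml";

(: Hypothetical cache location and stand-in rendered output. :)
declare variable $cache-uri := "/cache/html/example.bin";
declare variable $html :=
  <div xmlns="http://www.w3.org/1999/xhtml"><p>Rendered letter</p></div>;

(: Look for a previously cached binary at the URI. :)
let $cached := fn:doc($cache-uri)/binary()
return
  if (fn:exists($cached))
  then
    (: Cache hit: decode the binary back to text and re-parse it as XML. :)
    xdmp:unquote(xdmp:binary-decode($cached, "UTF-8"))
  else (
    (: Cache miss: store the serialized HTML as a binary document,
       then return the freshly rendered node. :)
    xdmp:document-insert(
      $cache-uri,
      binary {
        xs:hexBinary(
          xs:base64Binary(
            xdmp:base64-encode(xdmp:quote($html))
          )
        )
      }
    ),
    $html
  )
```

On the first request the fragment is stored and returned directly; later requests find the binary and skip rendering entirely.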

With this approach to caching, the site showed serious improvements in query speed. A 90-page document that took 19 seconds of query time on the old platform could be delivered in as little as 1.86 ms. IBM’s Global Technology Services also load-tested the application and found that, even with 5,000 concurrent users, average response time was still only 120 ms.

[Figure: Founders Online performance testing results]

*Load testing by IBM Global Technology Services using SOASTA, Inc.

Using these tactics to optimize performance, the Founders Online team was able to build a successful app that will eventually support 90 volumes comprising over 175,000 of the founders’ letters.

Uses of Semantic Technology in Financial Services
Posted by Alicia Saia on 15 April 2014 01:29 PM

My colleague Amir Halfon – MarkLogic’s CTO of Global Financial Services – just posted a new addition to his “Big Data Blog” describing how Financial Services organizations can benefit from Semantic Web Technology.

In the post, he lays out five different use cases – Customer 360, Data Provenance, Reference Data, Pre-Trade Analytics and Decision Support, and Compliance – and gives a high-level overview of the reasons why (and how) this type of non-relational technology is useful for each. Common to all the examples is the way Semantic Technology can help add meaning and context to data, without extensive human intervention, rigid data modeling, or costly ETL cycles.

If you’re interested in a quick introduction to uses of Semantic Web Technology, I recommend you check it out.

Copyright © 2014 MarkLogic Corporation. All Rights Reserved. MARKLOGIC® is a registered trademark of MarkLogic Corporation.