Supported Server Versions
Version | Original Release Date | Current Release | Windows Version | End of Life Date
MarkLogic Server 8.0 | February 6, 2015 | 8.0-4.2 | 8.0-4.2 | In Circulation
MarkLogic Server 7.0 | November 14, 2013 | 7.0-6 | 7.0-6 | In Circulation
MarkLogic Server 6.0 | September 12, 2012 | 6.0-6 | 6.0-6.1 | June 26, 2016
Latest Updates
Data Strategy Factors for Threat Management
Posted by Billy Sokol on 29 January 2016 09:54 AM

In the wake of the extremist attacks in Paris and in San Bernardino, California, some suggested the tragedies were a complete failure of intelligence services. That would be hyperbole. However, both of these attacks do serve to highlight the difficulty in protecting modern democratic societies from small group and lone gunman threats. The countries that most treasure “openness” and civil liberties walk a particularly fine line.

Short of creating a Stasi-like surveillance state, government and industry have to ask themselves: is there something we can do better in the area of screening and risk assessment from an information technology standpoint? As it is now, by the time you get to watch-listing, the aperture of situational awareness is often too small.

Over the last year, MarkLogic Global Public Sector has engaged with National Security and Public Safety leadership in Europe, North America, Asia, and Australia. Increasingly, post-attack investigations of acts of extremism, whether committed by homegrown or transnational extremists, are beginning to reveal data strategy shortcomings that impact threat management and the screening processes. These shortcomings limit the ability of agencies, whether at the country or regional level, to quickly evolve and adjust to shifting threats. Here are the four most pressing challenges.


System proliferation impacts threat management

Challenge #1: Data and IT Systems Proliferation

Various agencies have many systems from which information must be aggregated. Each of these systems was developed organically and bought for specific reasons, along a different timeline, by a different department. These were procured rationally with market studies and analysis of alternatives. Requests for Information were issued; industry days may have been conducted. Tenders were posted and perhaps even a narrowed list of choices was evaluated based on functionality and cost reasonableness. These systems were then implemented largely according to plan and met or exceeded the stated requirements.

For example, an intelligence agency may have procured Geospatial Information Systems, Link Analysis, Case Management, Biometrics, and search tools, all at different times, for different reasons, and with different funding. These systems and tools perform exactly or nearly what is expected of them. However, what was never considered was integrating these systems so that new information applications could be created: applications agile enough to respond to new threats, integrate new sensors or screening methods, and match evolving analytical intelligence techniques.

The solution to this problem can’t be “replace everything.” Not only is that economically untenable, but it’s technocentric – ignoring the reality that users are trained on their old systems, relatively productive with them, and used to all of their quirks. Equally, the answer can’t be incremental, point-to-point integration of these systems – which dooms organizations to a forever-loop of engineering and increasingly complex maintenance and quality assurance. You’re fighting against the arithmetic of systems engineering at that point.

System proliferation, especially when it results from department-to-department or agency-to-agency bureaucracy, directly impacts a government’s ability to protect its borders by making it likely that some of the data is orphaned or even unmanaged.



Challenge #2: Silos of “Excellence”

Sometimes the above-described proliferation takes the form of stove-piped analytical environments. Organized around applications or integrated systems such as statistics, link analysis, GeoINT, Signals, and OSINT, these environments present the organization with a double-edged sword. In exchange for a robust user experience and productivity within that single discipline, data feeds, works-in-progress, and even finished intelligence products are trapped in their own silos. This arrangement ends up complicating interoperability, creating synchronization and data consistency issues, and reducing the return on investment from initiatives such as data center consolidation and adoption of cloud architectures (whether private, commercial, or hybrid).

Using individual applications, each with its own database, as the focus of fused information (particularly objects and entities related to people, organizations, events, places, and chronologies) greatly limits counter-extremism and border control organizations’ ability to adapt to threats. It also diverts money, time, and resources to the care and feeding of infrastructure rather than operations and analysis.



Challenge #3: Multiple Communities of Interest

If you look at a complex function such as Threat Management, Screening, and the subsequent watch-listing, the reality is that the interests of multiple groups or communities of interest are at play. Besides Public Safety and Law Enforcement at local, state, national, and international levels, those responsible for monitoring and operating critical infrastructure may all have reasons to see similar data, but they will use it in vastly different ways. Furthermore, information coming from these stakeholders may require granular security controls at the attribute or value level so the cooperating organizations can live up to “need to share” while still safeguarding sensitive content such as sources and methods. Even beyond this, when you look at end-to-end activities related to extremism, such as Organized Crime, Human Smuggling & Trafficking, and illegal commerce around drugs and weapons, the interplay between poverty, education, availability of social services, and transportation is extremely complex. It’s hard enough for Defense, Intelligence, and Law Enforcement to share information (just from a political standpoint). But what does the architecture look like that allows you to bring together the indicators and warnings from the social fabric that would provide even the roughest idea of what pushes young men and women, already at risk due to hopelessness, economic strife, isolation, and issues of language, culture, and religion, toward extremism? The greatest care has to be taken to respect Personally Identifiable Information, privacy, and even health records.

If threat management systems aren’t designed from the ground up with the expectation that there are multiple communities of interest involved in combatting extremism, true information sharing and collaboration will be elusive.



Challenge #4: Making Data Science Operational

There’s no doubt that innovation in the area of big data and data science is going to transform many different aspects of Security and Public Safety. However, looking broadly across industries including government, financial services, healthcare and life sciences, it is clear that current investments in data science are experimental in nature. This characterization is not meant to denigrate any of these efforts, including those specifically focused on security, border control, and public safety. However, there seem to be two gaps that need to be better addressed:

  1. Data Science has to conform to the scientific method. The rigor applied to creating a screening algorithm for border control has to be the same as what you would apply to any other experiment. Over time, this will undoubtedly improve as the “science” of data science matures. However, today, in many organizations data science is being pursued as if you’re seeing what will grow in a petri dish without isolating one variable at a time.
  2. The IT architecture surrounding data science, frequently a collection of open source tools anchored by Hadoop, requires so much effort and time to wire together that, instead of being a platform on which to conduct experiments, it becomes the experiment itself.

What’s needed is a platform that can both support the scientific process unimpeded and operationalize the algorithms, models, filters, and pattern detectors created ‘on the workbench.’

The system architectures and tools thus far evangelized and adopted for Data Science are, likely, not sufficient to transfer these insights to an operational environment. For threat management, the feedback loop between algorithm and model creation and real-world application has to be dependable and rigorously real-time.

The answer to the above challenges is not purely technical. There are significant organizational, cultural, and process changes that have to be addressed. However, based on what we’ve seen in working with dozens of National Security and Public Safety organizations around the world, there are aspects of threat management and watch-listing processes that may be better managed by taking a fresh look at Data Strategy.


What Do We Mean by Data Strategy?

Because of the above four challenges, as well as the variety and volatility of the data that is critical to providing situational awareness to counter-extremism operations, a coherent approach to data curation, integration, governance, sharing, and security is difficult. A data strategy must encompass all of this and do so in a way that spans applications, systems, communities of interest, organizations, and even nations. Data should be managed independently of individual applications. The strategy has to incorporate data life cycle, stakeholder attributes, access controls, master data management, and geospatial, temporal, and semantic context.

When we look broadly at data strategy it’s clear we are at a crossroads. Across National Security, Defense, and Public Safety organizations, there are dozens of systems implemented and procured in an unsynchronized way. As we discussed above, there’s no way to do wholesale modernization that requires “rip and replace” of all systems.

There is another approach.


The Operational Data Hub

One answer may lie in an enterprise architectural pattern known as an Operational Data Hub (ODH).

An ODH brings all of the relevant mission data together regardless of format or schema. It provides the ability to index all structured, unstructured, semantic, geospatial, temporal, metadata, and security information and securely expose all of this information for search, data matching/alerting, and exploration via tools such as link analysis, geospatial information systems, and statistical packages. Integration and dissemination are made simpler by providing hooks into data and functionality via RESTful web services.
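As a rough illustration of the pattern described above, the sketch below keeps records from different systems in whatever shape they arrive and builds one shared word index across them. It is a toy, in-memory stand-in under stated assumptions: the class name `OperationalDataHub`, the `ingest` and `search` methods, the source names, and the sample records are all invented for this example and are not MarkLogic APIs or real data.

```python
from collections import defaultdict


def leaf_values(node):
    """Yield every scalar value found anywhere inside a nested dict or list."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item)
    else:
        yield node


class OperationalDataHub:
    """Hypothetical toy hub: stores records as-is and indexes every word in them."""

    def __init__(self):
        self.records = {}               # record id -> (source name, original record)
        self.index = defaultdict(set)   # lowercase token -> ids of records containing it

    def ingest(self, source, record):
        # No schema is imposed: the record is stored in its original shape.
        record_id = len(self.records) + 1
        self.records[record_id] = (source, record)
        for value in leaf_values(record):
            for token in str(value).lower().split():
                self.index[token].add(record_id)
        return record_id

    def search(self, term):
        # One query runs across every source that has been ingested.
        ids = sorted(self.index.get(term.lower(), set()))
        return [self.records[i] for i in ids]


hub = OperationalDataHub()
hub.ingest("case_management", {"case": 17, "subject": {"name": "Jane Doe"}})
hub.ingest("gis", {"place": "Harbor District", "observed": ["Jane Doe", "vehicle"]})
hub.ingest("reports", {"title": "Weekly summary", "body": "No activity near harbor"})

hits = hub.search("jane")
sources = [source for source, _ in hits]
```

The point of the sketch is only that the three source systems never had to agree on a schema before their records became searchable together; a production hub would add the security, geospatial, temporal, and semantic indexing the text describes.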

The ODH is not designed for analytics or business intelligence, but the ETL, aggregation, and related data management time, resources, and complexity associated with analytics or data science will be greatly reduced. The ODH is a way to avoid all of the point-to-point integration typical of complex information environments. The ODH also reduces the need for costly wholesale IT modernization efforts.


Object-Based Intelligence & Production

While an ODH provides one mechanism to organize, search, and tag all of the relevant data, something more is needed, because entities such as people, organizations, events, observations, and chronologies are central to counter-extremism and threat management work. Those involved with counter-terrorism and threat management need to be able to create, share, discover, and relate these entities or objects. Each of these comprises multiple attributes, each potentially with multiple values. Specialized metadata denoting pedigree, provenance, timespan validity, analyst comments, and security tags can also be included. Object Based Production provides the counter-extremism and security community with a way to make the intelligence life cycle more dynamic.
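One way to picture such an object is sketched below: an entity holds named attributes, each attribute can hold several values, and each value carries its own provenance, validity date, and security tag so that sharing can be decided value by value. All class names, field names, tags, and sample values here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeValue:
    value: str
    provenance: str      # which source reported this value
    valid_from: str      # ISO date from which the value is held to be valid
    security_tag: str    # label a reader must hold to see this value


@dataclass
class Entity:
    entity_id: str
    kind: str                                        # e.g. "person", "organization"
    attributes: dict = field(default_factory=dict)   # name -> list of AttributeValue

    def add(self, name, attribute_value):
        # An attribute may accumulate several values from different sources.
        self.attributes.setdefault(name, []).append(attribute_value)

    def view_for(self, reader_tags):
        """Return only the attribute values whose security tag the reader holds."""
        visible = {}
        for name, values in self.attributes.items():
            allowed = [v.value for v in values if v.security_tag in reader_tags]
            if allowed:
                visible[name] = allowed
        return visible


person = Entity("e-001", "person")
person.add("name", AttributeValue("Jane Doe", "border_control", "2015-03-01", "shared"))
person.add("name", AttributeValue("J. Doe", "open_source", "2015-06-12", "shared"))
person.add("informant", AttributeValue("source-42", "field_office", "2015-07-01", "restricted"))

partner_view = person.view_for({"shared"})
owner_view = person.view_for({"shared", "restricted"})
```

In this sketch a partner agency holding only the `shared` tag sees both name values but not the informant reference, which is the kind of attribute-level control that lets organizations share while protecting sources and methods.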

Both the ODH and these Object & Entity services expose shortcomings in RDBMS platforms, which would crumble under the variability of the content and user types.

There’s another equally powerful reason to consider moving to an Object or Entity Services approach to counter-extremism. The existing intelligence life cycle is characterized by:

  • Collection
  • Processing
  • Exploitation
  • Dissemination

Some countries spend billions on Collection, hundreds of millions on Processing, put their best people and analytical tools on Exploitation, then shove everything into PDFs and PPTs for Dissemination, trapping important insights and data about threats, people, organizations, and locations inside these files that are hard to discover and relate.

The promise of an Object- or Entity-based approach is liberating facts from the confines of their underlying sources or summary documents. This means all of the cooperating agencies and even countries can more flexibly and securely share the information they need to combat extremism.

The False Dichotomy – Enterprise RDBMS Products Versus NoSQL Projects

In general, there is little debate that the Relational Database Management System (RDBMS) epoch is far closer to its end than its beginning. Innovation has slowed, and those areas where investments are made by the leading RDBMS vendors provide only diminishing marginal utility. These frequently reflect defensive moves to keep the product seemingly competitive with newer approaches like NoSQL. Under examination, the embrace of newer modalities is cursory and is little more than a way to pull unstructured data into the RDBMS core.

Built upon RDBMS, enterprise architectural patterns such as data warehouses or data marts do address some of the data challenges. However, they begin and end with highly structured sources. RDBMS-based data warehouses and data marts are inflexible and brittle. This is largely due to the need to express and organize data in harmonized and normalized ‘star’ schemas. This ends up flattening out the kind of ad hoc and nested document-based information that is so vital to the counter-extremism and threat management mission.
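The flattening effect can be shown with a small sketch. The function below splits one nested report into rows for two fixed tables; it is a simplified normalization, not a full star schema, and the table layout, the `shred` function, and the sample report are assumptions made up for this example.

```python
def shred(document, doc_id):
    """Split one nested report into flat rows for two hypothetical tables."""
    # Only fields that were anticipated as columns survive the split.
    report_row = {"report_id": doc_id, "title": document["title"]}
    sighting_rows = [
        {"report_id": doc_id, "position": i, "place": s["place"], "note": s.get("note")}
        for i, s in enumerate(document.get("sightings", []))
    ]
    return report_row, sighting_rows


report = {
    "title": "Field report",
    "sightings": [
        {"place": "Harbor District", "note": "two vehicles"},
        {"place": "Rail Yard"},   # no note: ends up as an empty column value
    ],
    "analyst_comment": "ad hoc field with no column to land in",
}

report_row, sighting_rows = shred(report, doc_id=1)

# Top-level fields of the document that have no home in either table.
dropped = set(report) - set(report_row) - {"sightings"}
```

Here the nested sightings are scattered into a second table and must be re-joined to reconstruct the report, while the unanticipated `analyst_comment` field is lost until someone changes the schema.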

Certainly many NoSQL databases can answer Challenges 1 and 2, with their promise to take in any type of data. But the bulk of NoSQL options available are open source projects, not enterprise products. Open source is very alluring: seemingly lower licensing costs, the ability to tailor for a particular organization or mission, and the innovation of entire communities working on common problems. However, these NoSQL databases are anemic when it comes to data consistency, disaster recovery and backup, replication, ability to handle all data types, and government-grade security. That leaves Challenge #3 unmet, unless organizations take on the heavy software engineering tasks akin to those typically done by independent software vendors, not customers.

The bottom line is that counter-extremism and security operations present data management challenges both for legacy RDBMS and for less-than-enterprise-ready NoSQL. The panacea would be a database that has the agility of NoSQL and the reliability of enterprise relational.

To Reboot Your Data Strategy

If rethinking your data strategy is central to rethinking counter-extremism and security operations, then how do you get started? Well, there are a few things your organization can do right away:

  • Consider the variety of data needed to do Threat Management. Is it documents? Video? Biometrics?
  • Are the sources volatile and variable?
  • Before you build a relational data warehouse, consider how many of the sources started out in JSON or XML and then were deconstructed into tables.
  • If you have a data warehouse and it doesn’t include all documents, PDFs, PowerPoint slides, and other content on your network share drive – imagine the work the users need to do to bring together all data.
  • If your business processes revolve around entity management and geospatial analysis, consider how operations could be improved if entities and geospatial features were application-independent and managed by an operational data hub.
  • If business processes rely on business intelligence reports to answer “can I find…” type questions – consider what impact integrated search & DBMS services can have on individual productivity.
  • Lastly, much change is often focused on data strategy initiatives: “if it isn’t broken, don’t fix it…” This is an important mantra when considering an Operational Data Hub (ODH). Frequently, investments in legacy systems can be extended with an ODH because it removes the burden of point-to-point interoperability from the individual application or system.

Summary

When considering the scope of challenges associated with Threat Management the answer is rarely ‘just buy and implement product X’. These are tough problems that have to be addressed at many levels: Laws, policies, funding, organizational culture, process changes, and technology. However, as governments and agencies consider how to address Data Strategy challenges, the advantages of implementing an Operational Data Hub or moving away from static intelligence life cycles via Object or Entity methods become more apparent — and more crucial.




How Healthcare.gov Dared Greatly and Won
Posted by Jonathan Bakke on 22 January 2016 08:55 PM

Since the third annual Healthcare.gov open enrollment season began on Nov. 1, Centers for Medicare and Medicaid officials said, 2.8 million people have selected plans in the federal marketplace, and one million of them are new customers.1

– Robert Pear, New York Times


As open enrollment continues, millions of Americans are visiting Healthcare.gov. After its inauspicious start, today’s health insurance marketplace has come a long way from the cost sinkhole many predicted it would become. Making its debut to a great deal of political and technical skepticism, the marketplace, also known as “Obamacare,” has been transformed from the poster child of political controversy into a model digital data hub. It shows the federal government’s ability to use modern technologies to manage exponentially growing volumes of complex, unstructured data with 99.9 percent uptime and user response times under 0.01 of a second.2

Now, that’s nothing to laugh at.


Necessity Is the Mother of Invention

Born of necessity, the federal government’s turn toward new generation database technology might not have ever happened if Healthcare.gov hadn’t launched directly into crisis mode when it debuted in 2013. You may remember, or think you remember, the story, but for the sake of posterity, let’s wind back the clock to the beginning.

President Obama’s Affordable Care Act (ACA) passed in 2010, mandating that an online health insurance marketplace be up, running and available to the public by October 2013. Healthcare.gov would be a data hub capable of pulling together a wide array of information to enable consumer eligibility and verification processes. Having never before attempted a data integration project of this scope and magnitude, the federal government tasked the Centers for Medicare and Medicaid Services (CMS) to develop and launch the marketplace.

Healthcare.gov soon became the biggest challenge in CMS’s 50-year history. What made it so difficult to pull off? Behind its deceptively consumer-friendly veneer, the site is built upon a data hub of amazingly complex information, including user identity and IRS records, insurance plan offerings from multiple providers, and information stored in legacy systems hosted and maintained in various states. All of this data flows into the marketplace in variable formats from diverse systems but has to be delivered to end-users seamlessly and securely. The integrated data also needs to be fully operationalized into a helpful, frustration-free experience for consumers – many of whom initially only reluctantly visit the online marketplace to avoid being fined for lack of health insurance coverage.

Compounding these data integration challenges was an ambitious timeline. The ACA mandated the system would go live on October 1, 2013. However, the broad scope of this project made the timeline extremely unrealistic given the state of data integration technology supported by the big database software companies.

As you may have heard, the initial launch of Healthcare.gov wasn’t exactly smooth sailing. So, what went wrong and how was it fixed?


Business as Usual Wasn’t Going to Cut It

From the start, CMS approached the Healthcare.gov project like most other federal projects, awarding contracts to suppliers of proven and safe, yet older technologies. This was a big mistake. Once development began, it quickly became obvious that the older generation database technology underlying the online marketplace wouldn’t handle the wide variety of data formats used by all of the states, insurers and other federal agencies flowing into the system. It also became clear that these database technologies wouldn’t be able to scale up quickly once consumers hit the site in droves on the first day of open enrollment. Ultimately, these shortcomings meant that applications built on top of the existing relational database wouldn’t function in a way that adequately supported the CMS’s mission.

A year into development, Henry Chao, the Deputy CIO at CMS, made a daring decision (remember, we’re talking about a federal agency, not Google!) to change course and adopt new technologies that effectively supported CMS’s modern marketplace initiative, saying, “When things were bad, we had the option to pivot, to scale out of a poorly written application.” The new database technology “gave us a set of options that would not have been possible with other technologies.”

To make up for lost time, CMS brought in an entire emergency team of hundreds of people from many of the most forward-thinking tech companies in the U.S., including VMware, MarkLogic and Red Hat, along with tech professionals from United Healthcare and Accenture. Some of Silicon Valley’s best and brightest came aboard, too.

With a more agile, scalable Enterprise NoSQL database now underlying the system, the newly formed team was able to jump right in and reconfigure or replace applications – all without incurring an increase in database interference, data loss or security risks. This would ultimately prove critical to shaving months off the development time in late 2013.

After a much-reported difficult launch, the site was quickly stabilized by Thanksgiving of 2013. And within a year’s time, the new team succeeded in transforming Healthcare.gov into the “Amazon.com” of the health insurance world – a feat that wouldn’t have been possible without Henry Chao’s savvy decision to try an innovative approach to modern data management.


Millions Now Have Access to Affordable Healthcare

Over the past two years, the operations of the health insurance marketplace have only gotten better. At the close of open enrollment in February 2015, Healthcare.gov had enabled over 11.7 million enrollments or automatic re-enrollments.3

This year, 38 states will rely on Healthcare.gov. When the new enrollment period began this last November, the site offered some highly useful new features, including a cost calculator feature that makes it easier for consumers to understand their total costs under various health plans. Jocelyn A. Guyer, a health policy analyst at the law firm Manatt, Phelps & Phillips, told The New York Times, “The new website helps consumers understand the importance of looking beyond premiums to consider deductibles and other out-of-pocket costs when selecting a plan.”4

General perception of the health insurance marketplace has changed dramatically since its first rollout in 2013. Today, Healthcare.gov is running on some of the most modern technology in government, if not in business on the whole. And CMS is proving what can be accomplished when historically unchanging federal agencies dare to adopt modern technology and innovative thinking. It doesn’t have to be rocket science. You just have to have the right people and the right technology in place – and the willingness to try something new.

Sources
1. NYTimes.com: Affordable Care Act Plans Get 1 Million New Subscribers
2. MarkLogic.com: A New Data Prescription: How Healthcare.gov Was Delivered
3. NYTimes.com: Affordable Care Act Plans Get 1 Million New Subscribers
4. NYTimes.com: Revamped HealthCare.Gov Opens With New Tools for Gauging True Cost of Insurance





Copyright © 2016 MarkLogic Corporation. All Rights Reserved. MARKLOGIC® is a registered trademark of MarkLogic Corporation.