Descriptive Metadata: Media and Entertainment’s Digital Twin Blockbuster
Posted by Michael Malgeri on 07 October 2019 12:55 PM

Before anyone could prove it by experiment, the great Albert Einstein postulated that if one of a pair of twins travels at near the speed of light and then returns, he or she will have aged less than the twin that stayed behind. This came to be known as the “Twin Paradox.” Basically, the twins’ lives diverged because the environment surrounding one twin fell out of sync with the other’s. Specifically, time passed more slowly for the things, people, processes and relationships surrounding one twin, “relative” to those surrounding the other, causing one twin’s environment to drift from its counterpart. While there are heartwarming stories of separated twins meeting up later in life only to find they’ve maintained uncanny similarities, the effects of dislocation are inescapable.

In our world of global enterprises, products and services have multiple representations, one might say “twins,” throughout the organization. For reasons other than Einstein’s time dilation, information about these products and services can get out of sync, causing problems in production, distribution and consumer use of these goods. Broadly speaking, problems can range from increased costs throughout the enterprise to lost revenue from missed opportunities.

The notion of a “digital twin” has already emerged in the manufacturing sector. One of its goals is to facilitate the monitoring, reporting and integration of information about products and services in order to keep their respective environments in sync. Another goal is to maintain constant “situational awareness” of the environment in order to act with agility. Digital twins accomplish this by collecting data through ubiquitous embedded sensors (IoT devices). The net results are fewer cost overruns, reduced risk and additional revenue.

Applying the digital-twin concept to media and entertainment might seem like a stretch. Today’s M&E enterprises are already mostly digitized, so where does IoT play a role? While the industries differ in many ways, this article explores how extending manufacturing’s digital-twin concept can deliver benefits to M&E.

More on Digital Twins

Before making the case for a borrowed metaphor, it helps to understand digital twins in greater depth. In one of its simpler forms, a digital twin can be a computer model of something physical. For years, engineers have used computer-aided design (CAD), computer-aided manufacturing (CAM) and computer-aided engineering (CAE) software, collectively known as CAD/CAM/CAE. This software allows users to digitally design, build and analyze a nearly countless variety of parts and assemblies, and to simulate their behavior in real-world environments. Such modeling and simulation is far less costly than building physical prototypes and allows for iterative development. The net result not only reduces cost but also lowers risk and provides deeper insight into the product and its processes throughout its lifecycle. Opportunities emerge from this greater situational awareness.

Digital twins can model not just physical assets, but other elements in the surrounding environment. IoT sensors embedded in parts, assemblies, manufacturing equipment and test equipment produce timely information. Information also comes from people with cyber-physical devices acting as participants across the supply and logistics chain. Trends are spotted, warning signs are detected, mistakes are corrected and innovative insight is mined. Through the agile use of information-management technology, the entire product life cycle can be virtually modeled, observed and then influenced, yielding the benefits described above.

With that background, let’s see how some of these ideas carry over in the media and entertainment industry.

Digital Twins in M&E

We’re not at the point where CGI can create a digital representation of “insert your favorite actor here” and have it act indistinguishably from its real counterpart. However, “Gemini Man,” in which the CGI character still depended on Will Smith to provide its motion, makes us consider that such a time may arrive in the not-too-distant future.

Assuming all intellectual property rights have been negotiated, and the CGI character gets a properly outfitted, on-location Winnebago, it may become routine to prototype movie scenes and test them with audiences and critics, often at significantly lower cost, before real actors are called onto the set.

What about digital twins in distribution? Nowadays, nearly every organization in every industry represents some aspect of its product or service in digital form and has some automated process for distributing it. From sales literature and support tickets to product downloads (software, books, music, entertainment, reports, etc.), the entire process can be tracked at multiple points for accuracy and efficiency, and analyzed for opportunity.

The same can be said for packaging, invoicing and payment for licensable material. Content licensing in media and entertainment follows a complex chain of events: it begins with the terms of sale captured in legal contracts, proceeds through “release packages” that consist of various titles and their entitlements, and culminates in complex revenue recognition through invoicing and payment processing. Digitizing as much of this distribution lifecycle as possible brings the benefits already described.

There’s yet another player in the world of digital twins in M&E that arguably has the largest impact on this dynamic industry. That player is descriptive metadata.

What Is Descriptive Metadata?

This may be basic for some in M&E, but let’s review it. Descriptive metadata captures the non-technical, non-administrative aspects of content. It’s primarily concerned with the relationship with the audience, i.e., what the audience sees, hears, feels (emotionally), surmises and anticipates; put another way, what they experience.

Descriptive metadata includes, but is not limited to, the following dimensions of media:

  • Characters
  • Actors
  • Genre
  • Location
  • Gender
  • Setting
  • Tone
  • Theme
  • Plot
  • Emotions
  • General descriptive keywords and phrases

Most of this metadata has been, and still is, captured by humans who place metadata tags on each time-coded scene, but machine learning in the form of automated image and audio processing is taking on some of the workload. Image processing has advanced to the point of recognizing not just a face, but whose face. Audio processing, along with motion detectors and camera-outfitted athletes, could realistically replace an announcer at sporting events: “LeBron dribbles to the top of the key, he steps back beyond the three-point line, he stops, jumps, shoots, and the Lakers win it by one! The crowd roars!”

The time-coding aspect is important because it not only makes it much easier to find any given frame of content but also enables interesting, time-relative queries. Here’s an example query:

Find all of the scenes across our catalog, in action movies, where a male hero says, “I love you,” then, within five seconds after saying it, he’s kissing the heroine.

A query like this could be further restricted by location (in the jungle, in Europe, in NYC), time period (roaring twenties, WWII, 60s) or even additional time criteria such as, “the heroine slaps the hero within five seconds after being kissed.”
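To make the time-relative idea concrete, here is a minimal Python sketch of the example query, assuming a hypothetical in-memory list of time-coded tags. The `Tag` fields, the `followed_within` helper and the sample titles are illustrative inventions, not a MarkLogic schema or API. Gender is not modeled (a "hero" character stands in for "male hero"), and the brute-force pairwise scan stands in for the time-range indexes a real platform would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Tag:
    """Hypothetical time-coded descriptive metadata tag (illustrative fields)."""
    title_id: str
    start: float    # seconds from the start of the title
    character: str  # e.g. "hero", "heroine"
    event: str      # e.g. "says: I love you", "kisses heroine"


def followed_within(tags: List[Tag], genres: Dict[str, str], genre: str,
                    first: Callable[[Tag], bool],
                    second: Callable[[Tag], bool],
                    window: float) -> List[Tuple[str, float, float]]:
    """Return (title_id, t1, t2) for every tag matching `first` at t1 that is
    followed by a tag matching `second` at t2 in the same title, with
    0 <= t2 - t1 <= window, restricted to titles of the given genre."""
    hits = []
    for a in tags:
        if genres.get(a.title_id) != genre or not first(a):
            continue
        for b in tags:  # brute-force pairing; a real system would use indexes
            if (b.title_id == a.title_id and second(b)
                    and 0 <= b.start - a.start <= window):
                hits.append((a.title_id, a.start, b.start))
    return hits


# Hypothetical sample data: two action titles.
tags = [
    Tag("T1", 100.0, "hero", "says: I love you"),
    Tag("T1", 103.5, "hero", "kisses heroine"),  # 3.5 s later: matches
    Tag("T2", 50.0, "hero", "says: I love you"),
    Tag("T2", 58.0, "hero", "kisses heroine"),   # 8 s later: too late
]
genres = {"T1": "action", "T2": "action"}


def said_it(t: Tag) -> bool:
    return t.character == "hero" and t.event == "says: I love you"


def kissed(t: Tag) -> bool:
    return t.character == "hero" and t.event == "kisses heroine"


print(followed_within(tags, genres, "action", said_it, kissed, window=5.0))
# [('T1', 100.0, 103.5)]
```

Further restrictions, such as location or time period, would simply be extra predicates over the same kind of tags.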

In certain streaming video modes, one could also analyze the effects of inserting different types of pop-up ads or recommendations shortly after a specified sequence of events. The query scenarios, and their associated benefits, multiply once a platform exists that can readily leverage this type of information.

Descriptive Metadata and Digital Twins

How does descriptive metadata remain true to the digital-twin analogy? When an audience watches a feature film or TV show, they see, hear, feel (emotionally), surmise, anticipate and experience through their senses. Some may remember their experiences longer than others and recall them with greater or lesser fidelity. Descriptive metadata serves as a reminder and a log of those experiences, for humans and for machines, if it is captured in a machine-readable way (more on this later).

Descriptive metadata is a conceptual replica of what producers believe audiences will experience when viewing their content, just as a human twin is, in a sense, a physical replica of a sibling, and a CAD/CAM/CAE model is a digital replica of a physical part or assembly.

Because descriptive metadata is granular information that can be semantically related across dimensions of time, location, characters, actors, emotions and story components, it enables content providers to succeed at “know your product” (KYP) initiatives. It also meshes nicely with “know your customer” (KYC) and “direct to consumer” (DTC) initiatives. In the spirit of Einstein, one could express this as a “direct to consumer” equation:

Direct to Consumer = (Know Your Product) + (Know Your Consumer)

Or more concisely:

DTC = KYP + KYC

That equation is more symbolic than quantitative; however, it does convey a path toward business value.

Business Value of Descriptive Metadata and the DTC Equation

Information that fully describes a content provider’s product, typically called a “title,” can help deliver value as follows:

  • Reduce cost by not having to rely on select individuals (experts) to locate content.
  • Reduce cost and increase efficiency by eliminating redundant work, e.g., when looking for something that was already found.
  • Increase revenue by searching the entire studio catalog to suggest related content to licensees.
  • Increase revenue by finding content quickly for reuse on new projects.
  • Increase revenue by pursuing opportunities not previously possible.
  • Increase revenue by addressing speculative use cases and experimenting.
  • Increase revenue by rapidly finding memorable moments for prospective licensees.
  • Reduce the risk and associated costs of non-compliance in IP rights and regulations. About 20% of tagging is currently done by AI (machine learning and image recognition) and 80% by humans. Certain compliance checks require perfection (e.g., checking for nudity), which is achieved more efficiently through a combination of machine learning and human review.

These are the real benefits an enterprise gains when it knows its product. The M&E industry can knock the ball out of the park by also knowing its consumer. Knowledge of the customer (KYC) comes from a variety of sources. Some typical ones are as follows:

  • Demographic information stored in profiles from company websites
  • Financial information from sales campaigns
  • Interest information from marketing campaigns
  • Sentiment information from surveys and social media
  • Engagement activity from streaming media applications

The more enterprises can combine and leverage KYP and KYC data in meaningful analytics, the better they can market directly to consumers (DTC) with targeted products and services that build greater consumer engagement and loyalty.
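As a toy illustration of the “direct to consumer” equation, the sketch below scores each title’s descriptive keywords (KYP) against one consumer’s interests (KYC) and ranks the titles. The data, the `rank_titles` helper and the choice of Jaccard similarity are assumptions made for illustration; real recommendation engines draw on far richer signals.

```python
from typing import Dict, List, Set

# Hypothetical KYP data: descriptive metadata keywords per title.
titles: Dict[str, Set[str]] = {
    "G1": {"action", "jungle", "romance"},
    "G2": {"comedy", "nyc"},
    "G3": {"action", "wwii"},
}

# Hypothetical KYC data: one consumer's interests, e.g. gathered from
# profiles, campaign responses and streaming engagement.
interests: Set[str] = {"action", "wwii", "documentary"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_titles(titles: Dict[str, Set[str]], interests: Set[str]) -> List[str]:
    """Return titles with any overlap, best match first (ties broken by title ID)."""
    scored = [(jaccard(tags, interests), tid) for tid, tags in titles.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [tid for score, tid in scored if score > 0]


print(rank_titles(titles, interests))  # ['G3', 'G1']
```

The richer the descriptive metadata on the KYP side, the more precise this kind of matching can become.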

In the final section, let’s look at how everything we’ve discussed can be enhanced through a multi-model, semantically enabled data hub service, which helps avoid the twin paradox and improves the chances of creating a digital-twin blockbuster.

Data Hub Service for a Digital-Twin Blockbuster

If a digital-twin paradox is defined as unintegrated, out-of-sync digital models, a digital-twin blockbuster is the exact opposite. There are a number of platforms capable of building digital twins. One that’s gaining attention is the data hub service.

As can be seen in the figure below, when compared to other anything as a service (XaaS) offerings, the service goes beyond database as a service (DBaaS) to deliver the following out-of-the-box features:

  • Pay as you go model
  • Auto scalability, both up and down
  • High availability
  • Backup
  • Configurable security that integrates with enterprise-wide systems
  • Smart, agile data mastering
  • Data harmonization (integration) to an entity model
  • Multi-model database (document and semantic graph)
  • Geospatial indexing and search
  • Bitemporal search
  • Data provenance and data lineage
  • Time-based descriptive metadata modeling
  • Semantically enriched integrated search

The first five are table stakes in this XaaS category. The remaining features, however, when delivered at scale with the others, can bring a digital twin to blockbuster status. Let’s see how.

As the number and type of data sources grow within and outside an organization, matching and merging semantically identical entities becomes both more important and more complex. Smart, agile data mastering bypasses traditional enterprise master data management (MDM) initiatives, which are time- and resource-intensive and often fail. It provides tools for mastering data as needed, with greater speed and less risk.
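The basic shape of match and merge is sketched below under simple, hypothetical rules: two records match when their normalized titles and release years agree, and merging keeps the first non-empty value for each field while recording which sources contributed. The field names, the `normalize` and `match_and_merge` helpers and the sample records are all assumptions; this does not reflect how any particular mastering product scores or merges matches.

```python
import re
from typing import Any, Dict, List, Tuple


def normalize(title: str) -> str:
    """Lowercase, drop punctuation and a leading article so variants compare equal."""
    t = re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()
    return re.sub(r"^(the|a|an) ", "", t)


def match_and_merge(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Match records on (normalized title, year) and merge each match group.
    For each field the first non-None value wins; contributing sources are kept."""
    merged: Dict[Tuple[str, Any], Dict[str, Any]] = {}
    for rec in records:
        key = (normalize(rec["title"]), rec.get("year"))
        target = merged.setdefault(key, {"sources": []})
        for field, value in rec.items():
            if field != "source" and value is not None:
                target.setdefault(field, value)
        target["sources"].append(rec["source"])
    return list(merged.values())


# Hypothetical records: one film seen by two systems, plus an unrelated film.
records = [
    {"source": "sales", "title": "The Jungle Run", "year": 2019, "rating": None},
    {"source": "metadata", "title": "Jungle Run!", "year": 2019, "rating": "PG-13"},
    {"source": "sales", "title": "Night Harbor", "year": 2018},
]

for entity in match_and_merge(records):
    print(entity)
```

Deterministic rules like these are easy to reason about; more realistic matching typically weighs several fields and tolerates near-misses.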

Data harmonization (integration) to an entity model is a precursor to mastering, because different data sources more often than not represent semantically identical information in different data models. Their data types and structures need to be denormalized into a business entity, such as a title, which can then be semantically related to other entities in the pursuit of 360 views.
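A minimal sketch of harmonization follows, assuming two hypothetical source shapes: a contracts system that nests the asset and stores runtime as `HH:MM`, and a catalog that is flat and stores runtime in seconds. Both are mapped into one hypothetical `Title` entity, after which the two records can be compared directly. The TitleID comes from the example later in this article; everything else is invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Title:
    """Hypothetical canonical business entity for a title."""
    title_id: str
    name: str
    runtime_minutes: int


def from_contracts(rec: Dict[str, Any]) -> Title:
    """Map a hypothetical contracts-system record (nested; runtime as "HH:MM")."""
    asset = rec["asset"]
    hours, minutes = asset["runtime"].split(":")
    return Title(asset["id"], asset["name"], int(hours) * 60 + int(minutes))


def from_catalog(rec: Dict[str, Any]) -> Title:
    """Map a hypothetical catalog record (flat; runtime in seconds)."""
    return Title(rec["TitleID"], rec["TitleName"], rec["RuntimeSec"] // 60)


a = from_contracts({"asset": {"id": "G77777777", "name": "Jungle Run", "runtime": "01:59"}})
b = from_catalog({"TitleID": "G77777777", "TitleName": "Jungle Run", "RuntimeSec": 7140})
print(a == b)  # True: both sources now describe the same Title entity
```

Once records share one entity shape, the match-and-merge step of mastering has something consistent to compare.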

In the case of time-based, descriptive metadata, dimensions such as character, plot, theme and tone, to name a few, need to be matched with a content’s time slices and semantically related to other dimensions. This requires separate data models. For instance, see the following figure:

The value “Karen Gillan,” from the ActorName dimension, is captured in a document model with a five-second time slice into the feature with a TitleID of G77777777. Karen Gillan is also related to her IMDb ID using a subject->predicate->object model, known as a “semantic triple.”
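The two models in that example can be approximated with plain Python data structures. In this sketch the document fields follow the names used above (`TitleID`, `ActorName`), while the time-slice offsets, the `hasImdbId` predicate and the IMDb ID value are placeholders, not real data. A production system would typically store triples with IRIs in a triple store and query them with SPARQL; the `objects` helper here is only a stand-in for that lookup.

```python
from typing import List, Set, Tuple

# Document model: one time-sliced descriptive metadata document.
# TitleID and ActorName come from the example; the offsets are hypothetical.
doc = {
    "TitleID": "G77777777",
    "TimeSlice": {"startSec": 725, "endSec": 730},  # a five-second slice
    "ActorName": "Karen Gillan",
}

# Graph model: subject -> predicate -> object triples.
# "hasImdbId" and "nm-PLACEHOLDER" are placeholders, not real IMDb data.
triples: Set[Tuple[str, str, str]] = {
    ("Karen Gillan", "hasImdbId", "nm-PLACEHOLDER"),
}


def objects(subject: str, predicate: str) -> List[str]:
    """Return all objects linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Use both models together: from a time slice to the actor's IMDb ID.
slice_len = doc["TimeSlice"]["endSec"] - doc["TimeSlice"]["startSec"]
print(doc["TitleID"], slice_len, objects(doc["ActorName"], "hasImdbId"))
# G77777777 5 ['nm-PLACEHOLDER']
```

The document carries the time-coded facts, and the triples carry the relationships; the final lookup joins the two.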

This brings us to a discussion of a key technology, the data hub, which enables descriptive metadata to become M&E’s digital-twin blockbuster.

A data hub with this type of multi-modeling, along with the power to query both models simultaneously, delivers targeted, 360 views in a variety of use cases (more to follow on this).

A data hub with geospatial indexing and search could have been put into the table stakes category, given its importance as a feature in location-driven applications, popularized by mobile devices. However, not all XaaS offerings deliver this feature in an integrated way at scale.
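To show the kind of question a geospatial search answers, the sketch below filters hypothetical scene locations to those within a radius of a point, using the haversine great-circle distance. A geospatial index would answer the same question without scanning every scene; the linear scan here only illustrates the semantics. The scene IDs and the `scenes_within` helper are hypothetical, and the coordinates are approximate landmark locations.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical scene locations (approximate landmark coordinates).
scenes: List[Dict] = [
    {"scene": "S1", "lat": 40.7580, "lon": -73.9855},  # Times Square, NYC
    {"scene": "S2", "lat": 48.8584, "lon": 2.2945},    # Eiffel Tower, Paris
]


def scenes_within(lat: float, lon: float, radius_km: float) -> List[str]:
    """Linear scan; a geospatial index would avoid checking every scene."""
    return [s["scene"] for s in scenes
            if haversine_km(lat, lon, s["lat"], s["lon"]) <= radius_km]


print(scenes_within(40.7128, -74.0060, 25.0))  # ['S1']
```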

A data hub with bitemporal search addresses a more esoteric need. It is popular in use cases where proof must be provided not only of when something occurred in the real world, but also of when the SYSTEM that tracks such facts knew about it. While this feature is often required in the financial services, insurance and medical industries, it may find applicability in M&E for regulatory compliance.
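The distinction between the two time axes is easier to see in code. In this minimal sketch, each version of a hypothetical territory-license record carries a valid-time range (when it was true in the real world) and a system-time range (when the system believed it). A correction is recorded as new versions rather than by overwriting, so the system can still answer what it knew on an earlier date. The `Fact` type, the `as_of` helper and the dates are illustrative, not a MarkLogic API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One version of a hypothetical territory-license record.
    Both ranges are half-open: [from, to)."""
    territory: str
    licensed: bool
    valid_from: date   # when this became true in the real world
    valid_to: date
    system_from: date  # when the system started believing this version
    system_to: date    # date.max means "still the current belief"


def as_of(facts: List[Fact], territory: str,
          valid_at: date, known_at: date) -> Optional[Fact]:
    """What did the system, as of `known_at`, believe was true at `valid_at`?"""
    for f in facts:
        if (f.territory == territory
                and f.valid_from <= valid_at < f.valid_to
                and f.system_from <= known_at < f.system_to):
            return f
    return None


facts = [
    # Recorded 1 Jan 2019: licensed in FR for all of 2019.
    Fact("FR", True, date(2019, 1, 1), date(2020, 1, 1), date(2019, 1, 1), date(2019, 6, 1)),
    # Correction recorded 1 Jun 2019: the license actually ended on 1 Apr 2019.
    Fact("FR", True, date(2019, 1, 1), date(2019, 4, 1), date(2019, 6, 1), date.max),
    Fact("FR", False, date(2019, 4, 1), date(2020, 1, 1), date(2019, 6, 1), date.max),
]

# As known on 1 May, was FR licensed on 15 Apr? Yes (before the correction).
print(as_of(facts, "FR", date(2019, 4, 15), date(2019, 5, 1)).licensed)  # True
# As known on 1 Jul, the same question gets the corrected answer.
print(as_of(facts, "FR", date(2019, 4, 15), date(2019, 7, 1)).licensed)  # False
```

Keeping both axes is what lets an auditor ask not just "was this licensed?" but "what did we believe at the time?"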

A data hub with data provenance and lineage supports governance, which plays an important role in intellectual property and entitlements tracking and in regulatory compliance.

A data hub with semantically enhanced search against semantically enriched, time-coded descriptive metadata models can retrieve precise information for targeted use cases. Some examples are:

  • Role-based searches – Finance, marketing, sales and creative team members will receive different results from the same query that are semantically related to their individual roles.
  • Targeted licensing – Searches by or for licensees are enhanced by relationships with licensee profiles, past searches or pertinent dynamic information, such as current events in a licensee’s location.
  • New project development – Creative team members can find useful assets among clips, images, sound bites and script lines to help with new projects.
  • Asset reuse – Whether for licensing or new project development, asset reuse not only cuts cost but it extends the life of existing assets and brings in new revenue.
  • Consumer targeting – Whether through recommendation engines, sales or marketing campaigns, individual consumers or precisely segmented groups can be effectively targeted by matching the right product and consumer information through semantic search.

Finally, machine learning is becoming increasingly important for creating and leveraging all kinds of data, including descriptive metadata. In M&E, a data hub service that can train and run its own machine learning models, as well as run externally developed ones, will provide an essential added service for some or all of the use cases described above. Recommendation engines, customer segmentation and target marketing are just three machine learning use cases that implement the “direct to consumer” equation by analyzing data about products (KYP) and matching it to what is known about consumers (KYC).

That’s a Wrap

The success of digital twins in manufacturing suggests that similar ideas can be leveraged in other industries, such as media and entertainment. In all the ways we’ve described, descriptive metadata, along with a data hub service (DHS) such as the one MarkLogic provides, can help M&E enterprises avoid the twin paradox and create digital-twin blockbusters. Albert Einstein would be proud.

The post Descriptive Metadata: Media and Entertainment’s Digital Twin Blockbuster appeared first on MarkLogic.
