Overview of the Employee-Department Data Model Using Semantics
Our data model (Figure 1) describes employees and departments in the human resources repository of a fictional company, GlobalCorp. The model is based on an example included in the GitHub repository of the MarkLogic Data Hub Framework (DHF).
Figure 1: Sample data model with departments and employees
The two main classes are Department and Employee. A department has an ID (departmentId) and a name (departmentName). An employee has an ID (employeeId), name (firstName, lastName), salary (baseSalary, bonus), employment details (status, effectiveDate, hireDate, title), plus addresses, phone numbers, and emails. The last three are complex types, hence the four additional classes (Address, Phone, Email, and GeoCoordinates, a type within Address), which are datatypes used by Employee.
Relationships are particularly significant in this model. For example, an employee reportsTo another employee, and an employee is a memberOf a department. We might physically represent these relationships by using document references or containment. The employee document, for example, could contain an attribute called memberOf, whose value is the ID of the corresponding department. However, GlobalCorp has decided to represent these relationships using semantic triples instead, for the following reasons:
Since reporting structures and department memberships change frequently, GlobalCorp prefers to keep the representation of these relationships soft and to maintain current relationships by updating triples rather than by re-routing document references.
GlobalCorp recently acquired rival firm ACME and has decided to use the standard W3C organizational ontology to represent that merger semantically. Having already started down the semantic road, GlobalCorp has decided to use the same ontology to represent human resource relationships.
GlobalCorp realizes the potential of SPARQL to run powerful human resource queries, such as building an organizational reporting tree without having to traverse employee documents.
If you look carefully at the model, you will see it is peppered with semantic, or “sem”, stereotypes. The model makes use of the Entity Services UML profile included in the toolkit. The profile defines semantic and other stereotypes used to map UML to Entity Services. Using these stereotypes, GlobalCorp is able to describe in the model the IRIs, RDF types, and RDFS labels of employees and departments. It also relates these entities using predicates defined by the W3C organizational ontology.
Here is a breakdown of the semantic stereotypes used in GlobalCorp’s model:
semType stereotype: Both the Department and Employee classes bear the stereotype semType, which associates an RDF type with each class. Department’s RDF type is https://www.w3.org/ns/org#OrganizationalUnit. Thus, from a semantic perspective, a department is an organizational unit as defined by the W3C organizational ontology. Employee’s RDF type is Agent, from the friend-of-a-friend (FOAF) ontology.
IRI definitions: The Department and Employee classes each define an IRI, which uniquely identifies a department or employee when it is used as the subject or object of a semantic triple. In each class we nominate one attribute to serve as the IRI, stereotyping that attribute as semIRI. For Department, that attribute is deptIRI; for Employee, it is empIRI. Notice that each of the IRI attributes also bears the stereotypes xCalculated and exclude. Thus, these attributes are merely calculated fields used to help construct triples; they will not be included in the XML document representation of the department or employee. The concat tag indicates how the IRI’s value is calculated. For example, deptIRI is the concatenation of “http://www.w3.org/ns/org#d” and the department ID.
RDFS labels: Each class also defines an RDFS label. In semantics, it is good practice to associate a user-friendly label with the IRI. As with the IRI, we nominate one attribute in each class to serve as the label and stereotype it as semLabel. For Department, that attribute is departmentName. For Employee, it is empLabel. Notice that departmentName is not a calculated field; it is a full-fledged attribute that will also appear in the department’s XML document. empLabel, on the other hand, is an excluded field whose value is calculated from the firstName and lastName attributes.
Employee reportsTo Employee: The association shown as reportsTo, which relates one employee to another, is a semProperty with the predicate https://www.w3.org/ns/org#reportsTo. Thus if employee A reports to employee B, we construct a triple whose subject is the IRI of employee A (employee A’s empIRI), whose predicate is the one given, and whose object is the IRI of employee B (employee B’s empIRI). Notice the exclude stereotype; the XML representation of an employee will not contain the reportsTo element. We will maintain the relationship solely using a triple.
Employee memberOf Department: The association between Employee and Department shown as memberOf is a semProperty with predicate https://www.w3.org/ns/org#memberOf. The triple we create has the employee’s empIRI as subject, the predicate given, and the department’s deptIRI as object. This relationship is excluded from the document.
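The stereotype breakdown above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the code the toolkit generates. It assumes the namespace http://www.w3.org/ns/org# for both IRIs and predicates, an "e" prefix for employee IRIs (inferred from the deptIRI example), a single space between first and last name in the label, and a hypothetical input record whose reportsTo and memberOf keys hold plain IDs.

```python
from typing import NamedTuple

# Assumed namespace and vocabulary IRIs (not taken from the toolkit's output).
ORG = "http://www.w3.org/ns/org#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_AGENT = "http://xmlns.com/foaf/0.1/Agent"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def dept_iri(department_id) -> str:
    """semIRI for Department: concat of the 'org#d' prefix and the department ID."""
    return f"{ORG}d{department_id}"


def emp_iri(employee_id) -> str:
    """semIRI for Employee; the 'e' prefix is an assumption mirroring deptIRI."""
    return f"{ORG}e{employee_id}"


def employee_triples(emp: dict) -> list:
    """Build the triples implied by the stereotypes for one employee record.

    `emp` is a hypothetical dict with employeeId, firstName, lastName and
    optional reportsTo (manager's employee ID) and memberOf (department ID).
    """
    iri = emp_iri(emp["employeeId"])
    label = f'{emp["firstName"]} {emp["lastName"]}'  # semLabel (empLabel)
    triples = [
        Triple(iri, RDF_TYPE, FOAF_AGENT),  # semType
        Triple(iri, RDFS_LABEL, label),
    ]
    if emp.get("reportsTo") is not None:  # semProperty reportsTo
        triples.append(Triple(iri, ORG + "reportsTo", emp_iri(emp["reportsTo"])))
    if emp.get("memberOf") is not None:  # semProperty memberOf
        triples.append(Triple(iri, ORG + "memberOf", dept_iri(emp["memberOf"])))
    return triples


if __name__ == "__main__":
    record = {"employeeId": 114, "firstName": "Earl", "lastName": "Garza",
              "reportsTo": 1, "memberOf": 4}
    for triple in employee_triples(record):
        print(triple)
```

Because reportsTo and memberOf carry the exclude stereotype, a real pipeline would read them from the source data only to build triples and would leave them out of the persisted document.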
Specifying the stereotypes in the model is beneficial because the toolkit’s transform module, which maps the UML model to Entity Services, understands these semantic stereotypes and generates code that creates triples from the content of the document. Figure 2 shows the code the toolkit generates to create employee triples; every aspect of this code arises from the semantic stereotypes:
Figure 2: Auto-generated code creating triples based on semantic stereotypes
Figure 3 shows some example triples describing employee 114, his superior, and his department. He is a FOAF agent named Earl Garza who reports to Ruth Shaw (employee 1) and is a member of R&D (department 4).
Subject | Predicate | Object
org#e114 | rdf:type | FOAF Agent
org#e114 | rdfs:label | “Earl Garza”
org#e114 | foaf/name | “Earl Garza”
org#e114 | org#reportsTo | org#e1
org#e114 | org#memberOf | org#d4
org#e1 | rdfs:label | “Ruth Shaw”
org#d4 | rdfs:label | “R&D”
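To illustrate how triples like these support reporting-tree queries without touching employee documents, here is a minimal Python sketch. It holds a few of the example triples as plain tuples in the figure's abbreviated notation and walks the reportsTo predicate upward. The SPARQL string shows what an equivalent property-path query might look like; it is not executed here, and its prefixes are assumptions. The function name and the one-manager-per-employee simplification are also assumptions made for illustration.

```python
# Illustrative SPARQL 1.1 equivalent (not executed in this sketch).
REPORTING_CHAIN_SPARQL = """
PREFIX org:  <http://www.w3.org/ns/org#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?managerLabel WHERE {
  org:e114 org:reportsTo+ ?manager .
  ?manager rdfs:label ?managerLabel .
}
"""

# A subset of the example triples, as (subject, predicate, object) tuples.
TRIPLES = [
    ("org#e114", "rdfs:label", "Earl Garza"),
    ("org#e114", "org#reportsTo", "org#e1"),
    ("org#e1", "rdfs:label", "Ruth Shaw"),
]


def management_chain(triples, employee):
    """Return the IRIs of everyone above `employee`, nearest manager first.

    Assumes each employee reports to at most one other employee.
    """
    manager_of = {s: o for s, p, o in triples if p == "org#reportsTo"}
    chain = []
    seen = {employee}  # guards against accidental cycles in the data
    current = employee
    while current in manager_of:
        current = manager_of[current]
        if current in seen:
            break
        seen.add(current)
        chain.append(current)
    return chain


if __name__ == "__main__":
    labels = {s: o for s, p, o in TRIPLES if p == "rdfs:label"}
    for manager in management_chain(TRIPLES, "org#e114"):
        print(manager, labels.get(manager))
```

In a triple store, the property path reportsTo+ performs this transitive walk inside the query engine, which is the capability the reporting-tree use case relies on.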
Besides the triples relating employees and departments, GlobalCorp also loads triples describing the merger into the final database. These triples use the W3C organizational ontology. Figure 5 summarizes how they are structured, representing Global and ACME as organizations and recording the organizational change event in which ACME merged into Global.
Subject | Predicate | Object
org#Global | rdf:type | org#Organization
org#ACME | rdf:type | org#Organization
org#ACMETakeover | rdf:type | org#ChangeEvent
org#ACMETakeover | org#originalOrganization | org#ACME
org#ACMETakeover | org#resultingOrganization | org#Global
Figure 5: Example triples describing organizations and organizational change events
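As a small illustration of how the change-event structure can be queried, the following Python sketch stores the merger triples as tuples, written throughout with the abbreviated org# notation, and finds the organizations that resulted from any change event in which a given organization was the original. The function name is hypothetical, and a real deployment would express this as a SPARQL query against the database rather than in application code.

```python
# The merger triples from Figure 5 as (subject, predicate, object) tuples.
MERGER_TRIPLES = [
    ("org#Global", "rdf:type", "org#Organization"),
    ("org#ACME", "rdf:type", "org#Organization"),
    ("org#ACMETakeover", "rdf:type", "org#ChangeEvent"),
    ("org#ACMETakeover", "org#originalOrganization", "org#ACME"),
    ("org#ACMETakeover", "org#resultingOrganization", "org#Global"),
]


def successors(triples, organization):
    """Return organizations resulting from change events that `organization` entered."""
    # Step 1: change events whose original organization is the one given.
    events = {
        s for s, p, o in triples
        if p == "org#originalOrganization" and o == organization
    }
    # Step 2: the resulting organizations of those events.
    return sorted(
        o for s, p, o in triples
        if p == "org#resultingOrganization" and s in events
    )


if __name__ == "__main__":
    print(successors(MERGER_TRIPLES, "org#ACME"))
```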
Further Learning
To learn more about semantics and the MarkLogic Data Hub Framework, refer to the following resources: