Logicians going back at least as far as Charles S. Peirce  – and computer scientists as early as the entity-relationship (ER) model from the mid-1970s  – have made the relations-attributes distinction for predicates relating to data objects. There are both conceptual and practical bases for these distinctions. This article elaborates upon the relations-attributes distinction in the UMBEL Attributes Ontology. I also try to precisely define my terms because terminology is overlapping and confusing amongst competing data models and standards.
In one of its first communiques in 1999 regarding the Resource Description Framework, its sponsor, the W3C (World Wide Web Consortium), noted the RDF data model was a member of the entity-relationship modeling family. Though for its own reasons the W3C chose to label the relationships between things as “properties” in RDF, for practical modeling reasons the E-R model distinction into attributes and relations provides additional explanatory power. It also provides a more useful and tractable means for modeling the connections between things, an essential requirement for efficient data interoperability. Understanding these distinctions is an important basis for understanding the structure and design of UMBEL’s Attributes Ontology.
Recap of the UMBEL Attributes Ontology
My prior article introduced the Attributes Ontology (AO), a new module and extension about to be released for UMBEL (Upper Mapping and Binding Exchange Layer). UMBEL is both a vocabulary and an ontology reference structure for concepts. UMBEL’s role is to help match the discussion of topics and things across the Web.
The spoke-and-hub design  that UMBEL provides for concepts led to an obvious question of how the relationships (“properties”) of things can be similarly organized and managed. Referring again to the prior article, we put forward the following stack (as initially informed by Pietranik and Nguyen ) about how to look at the data interoperability space from a semantic technologies perspective:
The Ontology Stack
The prior article looked across the stack and noted the importance of relying on open standards. It noted the growing availability of public datasets and knowledge bases to inform our understanding and structure of the various layers in this stack. In this article, however, we mostly concentrate on the Properties layer, and its split between relations and attributes.
Definitions within the Attributes Ontology
Our symbols and the referents in our language are, as Peirce pointed out, a consensual process of society to converge upon the meaning of the words in our language. This is always the tricky thing about language: We think we know what the words or terms mean because of the converging process, but in our individual interpretations we still may have slight differences about what exactly these words or terms encompass and mean. Further, when one is dealing with semantics and meaning, being explicit about the terms used is all the more essential.
Thus, here are the exact definitions as used within UMBEL and the Attributes Ontology:
To illustrate why defining terms is so important, let’s look at three of the terms above, and how they are described in other nomenclatures or terminological systems:
UMBEL/AO Terminology | Terminology Used Elsewhere
So, please try to keep in mind the definitions listed in the bullets above when looking at the distinctions in UMBEL or the Attributes Ontology.
Pragmatic Reasons for a Properties Split
Fundamentals matter. If someone were to ask you how the world was organized, what would you say? Does it even matter?
If it does matter, as it does for many with responsibility for getting divisions or people or accounting systems or companies to work with one another (that is, data interoperability), then you eventually have to grapple with the “how the world is organized” question. In my early training in plant systematics, we learned about ‘lumpers and splitters’, taxonomists with different world views that emphasized similarities or distinctions in how they cataloged the world. Other perspectives and worldviews provide similar dichotomies or spectra in how to define and characterize the world. The challenge is how to get multiple parties to buy into an approach to data interoperability that is grounded at a very fundamental level, on a basis that all parties can agree to as a foundation.
For a decade, I have believed that one part of that foundation should be grounded in RDF . I like the simplicity of the RDF subject – property – object ‘triple’ statement. I like that RDF is an open standard. I like the ability of simple RDF statements to be combined into more complicated and sophisticated vocabularies and languages, and then ontologies. I like RDF’s applicability to any form of information, its expression in multiple serializations, and its ability to represent virtually any form of data in the wild. Though I think I understand the critics who want more expressiveness still, such as in concept models or higher-order logics, I think we are still working out the basics of human concepts and languages in a machine-understandable context. That is why I continue to try to work in the RDF and OWL sandbox, even though I suspect they will eventually be supplanted by more capable constructs.
Yet one of the things I don’t like about RDF is the semantics and terminology of the ‘property’ construct. I think it conflates relationships between things, which help us to organize and understand connections between objects in the world, with how we describe those things. While OWL provides some improvements in that we now can distinguish between data, object and annotation properties , those distinctions do not really get at the fundamental conflation of the ‘property’ construct. I further suspect that one of the reasons we have yet to see ABox or instance data-level ontologies to help the data interoperability question — what I first discussed in the introduction to the Attributes Ontology — is this very same conflation.
Fortunately, the extensibility of RDF and OWL via ontologies gives us a method for cleaving this conflation apart.
If we assume we can tease apart the fundamental nature and coverage of ‘properties’, what are the basic conceptual splits that represent this construct? From the aspect of data interoperability, I think we can see three: relations, attributes, and annotations.
The first conceptual split for ‘properties’ is relations. At the most fundamental level, we have things in the world and relationships or connections between those things. When we say things like dogs are a kind of mammal or Lassie is a dog, we are categorizing things by type. When we say that hair or toenails are parts of a mammal we are relating parts of an animal to a whole. When we say that mammals have hair but birds have feathers we are drawing distinctions between the two animal types. These kinds of statements tend to place the objects in our world in relation to one another. By so doing, we provide an organized view of the things in the world and give those very same things context. In all of these cases, our statements specify a relation between things that, combined with other relations, provide a schema or conceptualization of how things in the world relate to one another.
Some of these organizing principles are mental and intellectual constructs for how we group things together, such as dogs and people as mammals or mammals and birds as animals. Some of these organizing principles are ideas or concepts such as truth, beauty and conflict, a richness of terminology that gives us further explanatory power for how to place and give context to the things in our world. Relations between things are thus ultimately contextual in nature; they help to place our understanding of things in connection to other things. This is the portion of RDF ‘properties’ that we call relations, and they are explicitly excluded from our Attributes Ontology.
On the other hand, we also distinguish some things from other things by the nature of their characteristics: what we can observe and describe about a given thing. So, we describe shapes, sizes, weights, ages, colors and characteristics of things with increasingly nuanced vocabularies. We note that grasses have linear or simple leaves, oaks have serrated or wavy-shaped leaves, and carrots have branched or compound leaves. We distinguish hair color, eye color, place of birth, current location and a myriad of factors. Each one of these factors becomes an attribute for that object, with the specific values (simple v wavy v compound) distinguishing instances from one another. Attributes are the second conceptual split for ‘properties’.
These same distinctions were described by Chen in his attempt to find a common ground across network, relational and entity set models in his E-R model. These are represented in pictorial form in the Wikipedia entity-relationship model article.
Further, in a later elaboration of where his E-R modeling ideas arose, Chen was able to correlate these relationships to natural language, which I have updated to reflect the terminology herein:
Word Sense | AO Component
common noun | concept / entity type / entity
proper noun | entity
transitive verb | relation
intransitive verb | attribute
adjective | attribute
adverb | attribute (property)
Mapping of Word Senses to Attributes Ontology Components
Note that attributes may also apply to the relation-type of property.
The third conceptual split for ‘properties’ is annotations, or metadata or “data about data”, which can apply to anything. Annotations give us a way to describe the circumstances and provenance of the item at hand. Annotations capture the circumstances or conditions or contexts or observations for the thing at hand. Where did we discover or find it? When did we find or elaborate upon it? By whom or when was it found or elaborated? What is our commentary about it? While these are all external elaborations of the thing at hand, and not intrinsic to the nature of the thing, they are all characterizations about a given thing. In these regards, annotations have as their focus a given object, similar to what is true for attributes. As a result, we have included annotations in the Attributes Ontology as well.
Thus, with respect to RDF ‘triples’, we can now map the three parts of the assertion statement as follows:
subject | property | object
concept, entity type, entity | relation, attribute | object, value
Mapping of an RDF ‘Triple’ to Attributes Ontology Components
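To make the three-way split concrete, here is a minimal sketch in Python. It is not part of UMBEL or the Attributes Ontology: the property names, the lookup table and the `split_triples` helper are all hypothetical, chosen only to echo the kinds of examples used in this article. The sketch assumes each property has already been assigned by hand to one of the three categories.

```python
from enum import Enum


class PropertyKind(Enum):
    RELATION = "relation"      # connects a thing to another thing
    ATTRIBUTE = "attribute"    # describes a characteristic of the thing itself
    ANNOTATION = "annotation"  # metadata or commentary about the thing


# Hypothetical, hand-assigned classification of a few property names.
PROPERTY_KINDS = {
    "subClassOf": PropertyKind.RELATION,
    "partOf": PropertyKind.RELATION,
    "hairColor": PropertyKind.ATTRIBUTE,
    "placeOfBirth": PropertyKind.ATTRIBUTE,
    "discoveredBy": PropertyKind.ANNOTATION,
    "comment": PropertyKind.ANNOTATION,
}


def split_triples(triples):
    """Group (subject, property, object) triples by the kind of their property."""
    buckets = {kind: [] for kind in PropertyKind}
    for subject, prop, obj in triples:
        kind = PROPERTY_KINDS.get(prop)
        if kind is None:
            raise KeyError(f"unclassified property: {prop}")
        buckets[kind].append((subject, prop, obj))
    return buckets


# Illustrative triples only; the values are made up for the example.
triples = [
    ("Dog", "subClassOf", "Mammal"),
    ("Lassie", "hairColor", "brown"),
    ("Lassie", "comment", "example record"),
]
buckets = split_triples(triples)
```

In this sketch only the relation bucket connects one thing to another; the attribute and annotation buckets both have a single object as their focus, which is why the two are kept together in the Attributes Ontology.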
This mapping sets the overall context for how the Attributes Ontology relates to the basic RDF building blocks.
From the Theoretical to the Pragmatic
These kinds of distinctions are not new. In philosophy, related distinctions have been drawn about intrinsic v extrinsic properties  or intensionality v extensionality . For conceptual models with specific reference to ontologies, Wand et al  in 1999 were making the distinction between intrinsic properties (akin to what we term attributes herein) and mutual properties between things (what we term relations). Unfortunately, at that time, the conventions of RDF had not yet become prevalent and the idea of annotation properties had not yet emerged (from OWL). These later distinctions are important, but the Wand et al discussion still is helpful to elucidate the same pragmatic and theoretical considerations.
More recently, the DERA initiative from the University of Trento has embraced these same distinctions . Unfortunately, no ontology supporting these viewpoints has yet been made public.
We are thus pretty much in virgin territory. While having a sound conceptual and theoretical basis is essential, which apparently we do, the real reasons for carving out an attribute perspective on RDF properties are pragmatic. Since attributes are the properties of an entity, we can better interoperate entity data by concentrating on those aspects that let us match data in one set of records to similar data in totally different records. By building a new vocabulary and structure upon RDF, we can provide a more sophisticated handling of ‘properties’ than RDF or OWL alone can provide in their native forms.
Specifically, an attribute focus, expressed in an Attributes Ontology, which conforms with open standards and is designed explicitly as a reference grounding, gives us these advantages:
Making these distinctions operational, in part, is the purpose of the Attributes Ontology.
As a relatively cutting-edge effort, we expect some false steps and likely hiccups as we move to put in place this reference structure. However, we think this effort to be both innovative and essential to ongoing use of semantic technologies to tackle the decades-long challenges of data interoperability.
Other Related Concepts
Besides the other links mentioned in this article, here are some additional articles on Wikipedia that provide other and varied perspectives on the concepts and terminology used herein:
The next article in this series introducing the Attributes Ontology will discuss the related basis for segregating out ‘entities’ in UMBEL. The entities and attributes work closely with one another to aid data mapping and interoperability.
For more discussion of Peirce in relation to semantic Web issues, see . According to the ER model entry in Wikipedia, it is in accord with philosophic and theoretical traditions from the time of the Ancient Greek philosophers: Socrates, Plato and Aristotle (428 BC) through to modern epistemology, semiotics and logic reflecting the views of Peirce, Frege, Russell and Carnap.
Peter Chen, 1976. “The Entity-Relationship Model – Toward a Unified View of Data”, in ACM Transactions on Database Systems 1 (1): 9–36, March 1976.
Ralph R. Swick and Henry S. Thompson, eds., 1999. The Cambridge Communiqué, a World Wide Web Consortium (W3C) Note, 7 October 1999.
Patrick Hayes, 2004. “RDF Semantics,” a W3C Recommendation, February 2004. See http://www.w3.org/TR/rdf-mt/.
A spoke-and-hub design (n-1) for data mapping is tremendously more efficient than the most common approach of pairwise mappings (a quadratic function). For example, ten datasets would require 9 composite mappings in a spoke-and-hub design versus 45 in a pairwise approach. And, of course, datasets themselves contain tens to thousands of attributes, compounding the map scaling problem further.
Marcin Pietranik and Ngoc Thanh Nguyen, 2011. “Attribute Mapping as a Foundation of Ontology Alignment,” N.T. Nguyen, C.-G. Kim, and A. Janiak (Eds.): ACIIDS 2011, LNAI 6591, pp. 455–465, 2011.
UMBEL is based on the OpenCyc version of the Cyc knowledge base; Cyc uses the term “collections” to refer to RDF classes.
You sometimes see entities defined as “self-contained”; that is not strictly followed here. What is more important is being a nameable thing with attributes.
M. K. Bergman, 2012. “Give Me a Sign: What Do Things Mean on the Semantic Web?,” in AI3:::Adaptive Information blog, January 12, 2012. See especially the discussion of the toucan.
Some of the terminology sources are E-R, concept maps, RDF, Cyc, descriptive logics, data modeling (of various types), OWL, etc. Most particularly, entities are sometimes referred to as including concepts; the Attributes Ontology does not do so.
I have been but one of many arguing for the benefits of RDF. For my views, see M. K. Bergman, 2009. “Advantages and Myths of RDF” in AI3:::Adaptive Information blog, April 8, 2009.
See Pascal Hitzler, Markus Krötzsch, Bijan Parsia, Peter F. Patel-Schneider, Sebastian Rudolph, eds., 2012. W3C Recommendation, World Wide Web Consortium, 11 December 2012, for the annotation – object – datatype property distinction.
Peter Pin-Shan Chen, 1997. “English, Chinese and ER diagrams,” in Data & Knowledge Engineering 23, no. 1 (1997): 5-16.
Stanford Encyclopedia of Philosophy, 2012. “Intrinsic vs. Extrinsic Properties”, online article, first published January 5, 2002; substantive revision December 23, 2012.
At least for Carnap, he thought “…the full meaning of a concept is constituted by two aspects, its intension and its extension. The first part comprises the embedding of a concept in the world of concepts as a whole, i.e. the totality of all relations to other concepts. The second part establishes the referential meaning of the concept, i.e. its counterpart in the real or in a possible world”.
Yair Wand, Veda C. Storey, and Ron Weber, 1999. “An Ontological Analysis of the Relationship Construct in Conceptual Modeling,” in ACM Transactions on Database Systems (TODS) 24, no. 4 (1999): 494-528, December 1999. Also see Jeffrey Parsons and Yair Wand, 2003. “Attribute-Based Semantic Reconciliation of Multiple Data Sources,” in Journal on Data Semantics LNCS 2800.
Fausto Giunchiglia and Biswanath Dutta, 2011. “DERA: A Faceted Knowledge Organization Framework,” Technical Report # DISI-11-457, University of Trento, March 2011; submitted to the International Conference on Theory and Practice of Digital Libraries 2011 (TPDL’2011). DERA, in fact, stands for domain, entities, relations and attributes, but is mostly derived from the work of S.R. Ranganathan, the 20th century Indian library theorist and mathematician. DERA appears to have evolved into later projects, but the provenance is unclear.
This provides a logical invitation to a similar ‘relations ontology’ for capturing the role and relationship aspects of RDF properties. Such relations stipulate topographical relationships, hierarchical relationships (subClassOf, fatherOf, daughterOf), mereological relationships (partOf, isComponent), role relationships (isBossOf, hasTeacher, isKeyInfluencer) or approximation relationships (isLike, isAbout, relatesTo).
Actually, this is a bit more complex than the definition. In OWL modeling, a “concept” may also act as an instance (“individual” in RDF terminology) through what is known as metamodeling; see M. K. Bergman, 2010. “Metamodeling in Domain Ontologies” in AI3:::Adaptive Information blog, September 20, 2010.
The semantic Web does not yet have the complete infrastructure for supporting data interoperability. Most ontology mapping or alignment efforts have focused on concepts, or the class structure of the schema. Comparatively little has been done on instance mapping or predicate (property) mapping . Yet these considerations should reside at the heart of how semantic Web technologies can assist data interoperability.
We began the UMBEL (Upper Mapping and Binding Exchange Layer) vocabulary and ontology as a reference structure for concepts, a means to help match the discussion of topics and things across the Web. As such, UMBEL is part of a fairly robust library of upper ontologies that are meant to provide the grounding references for what information is about. Domains as diverse as biomedicine, banking, oil and gas, municipal governments, retail, marine organisms and the environment — among many others — have effectively leveraged upper ontologies to get diverse datasets and vocabularies to relate to one another. This is much welcomed, to be sure, and a good indicator of how semantic technologies can begin to approach getting data to interoperate.
Here is one way to look at the data interoperability space from a semantic technologies perspective (as initially informed by Pietranik and Nguyen):
The Ontology Stack
The overall semantics of the structure — indeed, how the structure itself is defined — comes from which ontology languages and vocabularies are used. From an expressiveness standpoint, particularly in conceptual relations or domain schema, there are a variety of standards and specifications from which to choose . We also have pretty good reference ontologies for many domains and what is called the upper levels. We are also starting, through efforts such as Wikipedia (DBpedia and Wikidata), schema.org, Freebase and OKKAM, to get referencable datasets of entities and their attributes, sometimes organized by type.
Reference groundings for properties, on the other hand, have received virtually no attention. SIO, the Semanticscience Integrated Ontology, is one attempt to provide a reference structure for properties in the science domain. The approach is exemplary, but still lacks the scope required of a general grounding vocabulary. QUDT, the Quantities, Units, Dimensions and Data Types Ontologies, provides a standard vocabulary for measurement quantities, but lacks the scope to capture non-quantitative measures for describing things. Both SIO and QUDT should inform and contribute to a still-needed broader treatment of how to describe entities. That is the purpose of the Attributes Ontology in the forthcoming new release of UMBEL.
Attributes within the Semantic Technology Stack
The properties in RDF triples (s – p – o) relate two things, the subject and object, to one another. One pragmatic way to understand properties, which are the predicates or verbs of these triple statements, is that they fall into two broad categories. The first category comprises the properties between or among different things; they are extrinsic to the subject at hand. These relations stipulate hierarchical relationships (subClassOf, fatherOf, daughterOf), mereological relationships (partOf, isComponent), role relationships (isBossOf, hasTeacher, isKeyInfluencer) or approximation relationships (isLike, isAbout, relatesTo). In these statements, both subjects and objects are concepts or identifiable things (entities).
However, the second category of properties, attribute properties, has a different nature. Attribute properties — attributes for short — are characteristics of an entity or entity type (class). They describe the entity at hand in the nature of key-value pairs. The key is the attribute, and the value is the literal value or object reference. In broad terms, attributes are the specifics of what is contained in a data record for a given instance. Multiple instances, or records, make up what is known as a dataset.
Attribute properties are intrinsic or descriptive properties. The combination of possible attributes for a given entity constitutes the intensional definition of that object. This use of the term attribute is consistent with its research sense as a descriptive characteristic of an object or its computing sense as being a factor of a given object. In the spirit of this inclusive sense of how attributes describe a given thing, we also include annotations and metadata as part of the attributes category of properties as well. All attribute properties provide a description or characteristic for the entity at hand.
Here are some example key-value pairs about me, the entity Mike Bergman, to illustrate the diversity of how attributes may describe things:
hair : red
college : Pomona College
mood : happy
spouse : Wendy
cat : Snuffles
location : 41°41′18″N 91°35′12″W
dateEntered : 02/16/2015
country : USA
city : Iowa City, IA
occupation : CEO
avocation : flyfishing, cooking
lastBook :
The infoboxes in Wikipedia are another example of such attribute types and values. Note that the values may vary widely as to units or quantities or even links to other things. Also note that the order in which the key-value pairs are presented does not matter, and that some values refer to other objects (shown as links).
Virtually any data format or data serialization in existence can be expressed in such key-value pairs. Further, related types of entities have related attributes, such that attribute relationships are an alternative way to describe typologies. My attributes, as a human, are quite similar to attributes for other humans, and somewhat close to other mammals. But my attributes are very different than those for a worm or an automobile.
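As a rough illustration of how a record of key-value pairs maps onto RDF-style statements, the following Python sketch turns a flat record into (subject, attribute, value) triples. The function name `record_to_triples` and the identifier `Mike_Bergman` are assumptions made up for this example; the keys and values are a subset of the pairs listed earlier. It also assumes that a multi-valued attribute is held as a list.

```python
def record_to_triples(entity_id, record):
    """Turn a flat key-value record into (subject, attribute, value) triples.

    A list value is treated as a multi-valued attribute and yields
    one triple per member.
    """
    triples = []
    for key, value in record.items():
        values = value if isinstance(value, list) else [value]
        for single in values:
            triples.append((entity_id, key, single))
    return triples


# A subset of the example pairs above; "avocation" carries two values.
record = {
    "hair": "red",
    "college": "Pomona College",
    "city": "Iowa City, IA",
    "avocation": ["flyfishing", "cooking"],
}
triples = record_to_triples("Mike_Bergman", record)
```

Because each triple stands alone, the order of the pairs in the record carries no meaning, and two records about different humans can be compared attribute by attribute.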
Even simple attributes can pose a challenge for mapping, absent a grounding framework. My name, for example, is Michael Kermit Bergman, which is often provided as Michael Bergman, Mike Bergman, M K Bergman, mkbergman or Michael K Bergman, and the fields that can capture those variants can capture one to four name parts, all called something different. References, rules, semsets (synonyms, jargon and aliases), and coherent organization are needed to ground all of these variants into a common form.
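One simple way to picture the role of a semset is as a lookup from normalized variants to a single preferred label. The sketch below, in Python, is only an illustration under that assumption: the `SEMSETS` table, the `normalize` rules and the `ground` function are hypothetical and do not describe how UMBEL itself stores or resolves semsets. Real name matching would also need rules for name parts, ordering and ambiguity that this sketch ignores.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation and collapse runs of whitespace."""
    without_punct = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", without_punct).strip().lower()


# Hypothetical semset: one preferred label and its known variants.
SEMSETS = {
    "Michael Kermit Bergman": [
        "Michael Bergman",
        "Mike Bergman",
        "M K Bergman",
        "mkbergman",
        "Michael K Bergman",
    ],
}

# Build a lookup from every normalized label to its preferred form.
LOOKUP = {}
for preferred, variants in SEMSETS.items():
    for label in [preferred, *variants]:
        LOOKUP[normalize(label)] = preferred


def ground(label):
    """Return the preferred label for a variant, or None if it is unknown."""
    return LOOKUP.get(normalize(label))
```

For example, `ground("M. K. Bergman")` and `ground("mkbergman")` both resolve to the same preferred label, while an unknown name returns `None` rather than a guess.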
Attribute properties may be quantitative (with a measurable value), qualitative, descriptive or annotative. In many cases, the actual value of an attribute is a literal or numeric value, but it may also be an object, as when the value is a member of an enumerable set or its own defined entity. Describing something as having a color characteristic of red, for example, may result in a literal assignment of the string “red” or it may refer to another object definition where red is specified as to its chromatic properties. Further, if my idea of red were in context with my own personal record (as above), then the referent is more properly something like red hair. Semantics (and, thus, context) matter in data interoperability. I will further describe the rationale and importance of the relation-attribute property split in a following article.
The purpose of semantic technologies is to overcome some 40 categories of semantic heterogeneity, as I most recently discussed in . One interesting aspect is the large number of semantic differences that may be ascribed to attributes, as this table from  shows (see the yellow entries):
Class | Category | Subcategory | Examples | Type
LANGUAGE | Encoding | Ingest Encoding Mismatch | For example, ANSI v UTF-8 | Concept
LANGUAGE | Encoding | Ingest Encoding Lacking | Mis-recognition of tokens because not being parsed with the proper encoding | Concept
LANGUAGE | Encoding | Query Encoding Mismatch | For example, ANSI v UTF-8 in search | Concept
LANGUAGE | Encoding | Query Encoding Lacking | Mis-recognition of search tokens because not being parsed with the proper encoding | Concept
LANGUAGE | Languages | Script Mismatch | Variations in how parsers handle, say, stemming, white spaces or hyphens | Concept
LANGUAGE | Languages | Parsing / Morphological Analysis Errors (many) | Arabic languages (right-to-left) |
We can see that attribute heterogeneities may apply to the attribute itself (the key in a key-value pair), as to what it may contain and what it may refer to, as well as to the actual values and their units and measures. These aspects are important, in that they are the very ones we mean when we talk of data.
Rationale for an Attributes Ontology
When we combine the descriptions of things, we need ways to overcome these sources of semantic heterogeneities. As with concepts, it would be extremely helpful to have a similar attributes vocabulary, and one which is organized according to some logical attribute schema. This combination of vocabulary and schema defines what constitutes an attributes ontology. It can also be a reference grounding for how to relate data from different datasets to one another. Providing this grounding is the driving rationale for UMBEL’s new Attributes Ontology.
Benefits
In addition to this overarching rationale in data interoperability, a reference Attributes Ontology brings with it a number of benefits:
These benefits can be realized in any data integration or interoperability setting. However, the benefits are particularly strong for these use cases:
As we have noted many times, these uses also benefit from the incremental and open world ability to expand the scope of the data integration at any point in time.
Description of the Attributes Ontology
We have recognized the importance of the attributes category going back to the first introduction of SuperTypes in UMBEL v.0.80 in 2010 . We noted then that many of the concepts in UMBEL were devoted to how to describe things and the units or quantities associated with their values. We could also see the potential value in having a reference for mapping data characteristics and values.
The first creation of the Attributes SuperType — also introduced in UMBEL v.0.80 in 2010 — aggregated into one place related OpenCyc concepts regarding these descriptors. Working with this category over time surfaced, again, the underlying coherence and use of OpenCyc. We found that UMBEL (via its OpenCyc extraction) already had a strong, logical undergirding to support an organized representation of attributes. Once we understood these patterns, we were able to go back to OpenCyc and better capture other aspects of its attribute structure that we had earlier overlooked. We then added a few aggregate categories to UMBEL to provide a cleaner organization. UMBEL now understands and organizes some 2000 different descriptive attributes.
Over a period of years we did research on exemplars in these areas, with the limited results as first mentioned, notably QUDT and SIO, and also DERA . We also enlisted input from the semantic Web mailing list and were not able to find a suitable extant reference structure . We find it perplexing more work has not been done in this area. We do abhor a vacuum!
Nonetheless, we were able to combine the 2000 attributes infrastructure of OpenCyc into the following upper level of the Attributes Ontology structure:
AttributeValues StringObject StringDatatype_Unlimited List_Information FrequentlyAskedQuestionsList MailingList AlphabeticalList Index_List_Information BullettedFormat UnitOfMeasure UnitOfDistance InternationalUnitOfMeasure UnitOfMeasure_Common NaturalLanguage Encrypted AuthenticationSource Persistence Distribution Uniform_PersistenceDistribution UnitOfMeasureConcept Ratio CollectionType Phase EmptyCollection Preference Quantity AttachmentAttribute WrittenInfo StructuredInfo VisualInfo AudioInfo LogicalFieldAttribute TruthValue
AttributeTypes DescriptiveAttributes Definition_PCW VisualPattern SpatialThingTypeByShape ShapeAttributes Color Name Title EnumeratedAttributes EconomicalQuantity DispositionalQuantity MentalQuantity PhysicalQuantity Quality SocialQuantity MeasurableQuantity TotallyOrderedQuantityType QuantityType NonAspectualQuantity EnvironmentalQuantity ActionAttributeLevelQuantity EmotionalQuantityType LocationAttributes OrientationAttributes GeographicalPlace MappableAttributes ContactLocation PopulatedPlace TimeAttributes HistoricTemporalThing Time_Quantity EventAttributes TimeInterval TemporalThing IdentificationAttributes ContactLocation ReferenceWork IDString UniqueID SituationAttributes Situation
Upper Structure of the Attributes Ontology
Note the structure above roughly splits into two parts. The first, AttributeValues, captures the various ways and measures that may be applied to actual values. We foresee a key mapping to QUDT in this part. The second part of the structure, AttributeTypes, organizes the nature of various attributes into similar, logical categories.
We have also added some experimental predicates to the UMBEL vocabulary for mapping domains, ranges and specific external properties to reference attributes. See the ongoing specification in the UMBEL Annex L documentation for other pertinent details.
Though the Attributes Ontology has a bit more structure, it too is a module that segregates out specific attributes into its own files. About 2000 of the UMBEL reference concepts are tagged as attributes; about two-thirds of those, or 1275, are specific attributes that are assigned to the Attributes Ontology, which is also the container for the attributes module.
To our knowledge, the Attributes Ontology (AO) will be the first publicly released attempt to provide an explicit modeling framework for data attributes and values. We expect there to be hiccups and improvements to be made as we work with the system. We expect quite a few release iterations, and experimentation and change. We will retain an experimental designation of the new UMBEL properties and the Attributes Ontology itself until we gain better working comfort with the system.
The Additional UMBEL Entities Module
This new UMBEL Attributes Ontology is being accompanied by the creation of another UMBEL component, the Entities Module. This new module, designed in a similar way to the Geo Module that was released in version 1.05, tags all entities as such and places another 12,000 instances into a separate module. A hierarchy of about 15,000 entity types (and their descriptions and relationships) remains in UMBEL core.
Like the Geo Module, itself comprised of entity instances, the Entities Module may be invoked or not for a given use of UMBEL. The ability to filter on entities and SuperTypes is also a powerful new feature. The fact that there is major disjunction among the SuperTypes also adds to the power of queries and retrievals.
Thus, with the attributes module that is now part of the Attributes Ontology, there are now three separate but invokable modules in addition to the UMBEL core. The Geo, Entities or Attributes modules may be included or not in any given UMBEL deployment.

Pending Releases
After five years of sporadically intense thinking, Structured Dynamics is extremely pleased to first formally express our ideas about how to manage and model data and its attributes using the underlying machinery of semantic technologies. We welcome use and commentary on our approach and the Attributes Ontology.
We will be releasing UMBEL v.1.20 by the end of March with various improvements, including the Entities Module and Attributes Ontology noted above. We are also updating the UMBEL documentation and have added Annexes K and L that describe the Clojure-based UMBEL generation process and the specifics underlying the Attributes Ontology. Shortly thereafter we expect to provide a new minor release that will provide mappings between the UMBEL Attributes Ontology and DBpedia and schema.org properties.
For the time being, we will be focused on refining our use of UMBEL for data interoperability, specifically for attributes. However, we note that the ontology structure used in this article also flags roles and relations as another possible gap. This gap is likely to be the next major focus in UMBEL’s research agenda.

For example, the relative status of various ontology mapping efforts is covered, among others, in Fei Wu and Daniel S. Weld, 2008. “Automatically Refining the Wikipedia Infobox Ontology,” WWW 2008, April 21–25, 2008, Beijing, China; Lorena Otero-Cerdeira, Francisco J. Rodríguez-Martínez, and Alma Gómez-Rodríguez, 2015. “Ontology Matching: A Literature Review,” Expert Systems with Applications 42, no. 2 (2015): 949-971; and Marcin Pietranik and Ngoc Thanh Nguyen, 2011. “Attribute Mapping as a Foundation of Ontology Alignment,” N.T. Nguyen, C.-G. Kim, and A. Janiak (Eds.): ACIIDS 2011, LNAI 6591, pp. 455–465, 2011. I also discuss the relatively poor state of mapping predicates between entities in many articles. See, for example, commentary on sameAs in M.K. Bergman, 2011. “Making Connections Real,” in AI3:::Adaptive Information, January 31, 2011. See also reference  and the follow-on discussion in .

The basic approach to this stack diagram was suggested by a figure in Marcin Pietranik and Ngoc Thanh Nguyen, 2011. “Attribute Mapping as a Foundation of Ontology Alignment,” N.T. Nguyen, C.-G. Kim, and A. Janiak (Eds.): ACIIDS 2011, LNAI 6591, pp. 455–465, 2011.

W3C standards exist for RDF, RDFS and OWL; also, Common Logic and conceptual graphs provide higher-order capabilities. We use OWL 2 in our efforts. Some rationale for this choice is provided in M.K. Bergman, 2010. “Metamodeling in Domain Ontologies,” in AI3:::Adaptive Information, September 20, 2010.

One relevant effort, but which has not yet posted details or an ontology, is Fausto Giunchiglia and Biswanath Dutta, 2011. “DERA: A Faceted Knowledge Organization Framework,” Technical Report # DISI-11-457, University of Trento, March 2011; submitted to the International Conference on Theory and Practice of Digital Libraries 2011 (TPDL’2011).

When posted, the reference to the follow-on article will be listed here.

First posted in M.K. Bergman, 2014. “Big Structure and Data Interoperability,” in AI3:::Adaptive Information, August 18, 2014.

Concept is the shorthand used for the schema or classes or TBox. Attribute is the shorthand used for instance data or entities and their ABox. I segregate class-relation properties (predicates) from instance-describing properties (attributes).

There are more than 100 converters of various record and data structure types to RDF. These converters (also sometimes known as translators or ‘RDFizers’) generally take some input data records with varying formats or serializations and convert them to a form of RDF serialization (such as RDF/XML or N3), often with some ontology matching or characterizations. See this listing of known RDFizers.

See M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” in AI3:::Adaptive Information, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.

See this earlier (2010) version of Annex G: UMBEL SuperTypes Documentation to the UMBEL specifications.

See this thread on the linked open data (LOD) mailing list from July 2014.

See further the UMBEL Annex K: UMBEL Generator and UMBEL Annex L: Attributes Ontology and Version 1.20 to the UMBEL specifications (still being completed).
Distant supervision, earlier or alternatively called self-supervision or weak supervision, is a method that uses knowledge bases to label entities automatically in text; the labeled text is then used to extract features and train a machine learning classifier. The knowledge bases provide coherent positive training examples and avoid the high cost and effort of manual labelling. The method is generally more effective than unsupervised learning, though with similar reduced upfront effort. Large knowledge bases such as Wikipedia or Freebase are often used as the KB basis.
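The labeling step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any particular paper's method: the `KB` dictionary, the sample sentences and the `distant_label` function are all hypothetical, and entity matching is reduced to simple substring tests.

```python
# Toy knowledge base: (subject, object) pairs mapped to a relation name.
# All names and sentences here are illustrative assumptions.
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Marie Curie", "Warsaw"): "born_in",
}

SENTENCES = [
    "Barack Obama was born in Honolulu in 1961.",
    "Marie Curie grew up in Warsaw, the city of her birth.",
    "Barack Obama visited Honolulu over the holidays.",
    "Warsaw is the capital of Poland.",
]


def distant_label(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            if subj in sentence and obj in sentence:
                labeled.append((sentence, subj, obj, relation))
    return labeled


for example in distant_label(SENTENCES, KB):
    print(example)
```

The third sentence is also labeled `born_in` even though it does not express that relation. Automatic labels of this kind are noisy, which is the trade-off for avoiding manual annotation; the labeled sentences would then feed feature extraction and classifier training.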
The first acknowledged use of distant supervision was Craven and Kumlien in 1999 (#11 below, though they used the term weak supervision); the first use of the formal term distant supervision was in Mintz et al. in 2009 (#21 below). Since then, the field has been a very active area of research.
Here are forty of the more seminal papers in distant supervision, with annotated comments for many of them:
Something very broad and profound has been happening over the recent past. It is not something that can be tied to a single year or a single event. It is also something that is quite complex in that it is a matrix of forces, some causative and some derivative, all of which tend to reinforce one another to perpetuate the trend. The trend that I am referring to is openness, and it is a force that is both creative and destructive, and one that in retrospect is also inevitable given the forces and changes underlying it.
It is hard to gauge exactly when the blossoming of openness began, but by my lights the timing corresponds to the emergence of open source and the Internet. Early bulletin board systems (BBS) often were distributed with source code, and these systems foreshadowed the growth of the Internet. While the Internet itself may be dated to ARPA efforts from 1969, it is really more the development of the Web around 1991 that signaled the real growth of the medium.
Over the past quarter century, the written use of the term “open” has increased more than 40% in frequency in comparison to terms such as “near” or “close”, a pretty remarkable change in usage for more-or-less common terms, as this figure shows:
Though the idea of “openness” is less common than “open”, its change in written use has been even more spectacular, with its frequency more than doubling (112%) over the past 25 years. The change in growth slope appears to coincide with the mid-1980s.
Because “openness” is more of a mindset or force (a point of view, if you will), it is not itself a discrete thing, but an idea or concept. In contemplating this world of openness, we can see quite a few separate, yet sometimes related, strands that provide the weave of the “openness” definition:
In looking at the factors above, we can ask two formative questions. First, is the given item above primarily a causative factor for “openness” or something that has derived from a more “open” environment? And, second, does the factor have an overall high or low impact on the question of openness? Here is my own plotting of these factors against these dimensions:
Early expressions of the “openness” idea help cause the conditions that lead to openness in other areas. As those areas also become more open, a positive reinforcement is passed back to earlier open factors, all leading to a virtuous circle of increased openness. Though perhaps not strictly “open,” various other related factors also could be plotted on this matrix, such as the democratization of knowledge, broader access to goods and services, more competition, “long tail” access and phenomena, and, in truly open environments, more diversity and more participation.
Once viewed through the umbrella lens of “openness”, it starts to become clear that all of these various “open” aspects are totally remaking information technology and human interaction and commerce. The impacts on social norms and power and governance are just as profound. Though many innovations have uniquely shaped the course of human history, from literacy to mobility to communication to electrification or computerization, none appear to have matched the speed of penetration nor the impact of “openness”.

Separating the Chicken from the Egg
So, what is driving this phenomenon? From where did the concept of “openness” arise?
Actually, this same matrix helps us hypothesize one foundational story. Look at the question of what is causative and what might be its source. The conclusion appears to be the Internet, specifically the Web, as reinforced and enabled by open-source software.
Relatively open access to an environment of connectivity guided by standard ways to connect and contribute began to fuel still further connections and contributions. The positive values of access and connectivity via standard means, in turn, reinforced the understood value of “openness”, leading to still further connections and engagement. More openness is like the dropped sand grain that causes the entire sand dune to shift.
The Web with its open access and standards has become the magnet for open content and data, all working to promote derivative and reinforcing factors in open knowledge, education and government:
The engine of “openness” tends to reinforce the causative factors that created “openness” in the first place. More knowledge and open aspects of collaboration lead to still further content and standards that lead to further open derivatives. In this manner “openness” becomes a kind of engine that promotes further openness and innovation.
There is a kind of open logic (largely premised on the open world assumption) that lies at the heart of this engine. Since new connections and new items are constantly arising and fueling the openness engine, new understandings are constantly being bolted on to the original starting understandings. This accretive model of growth and development is similar to the depositive layers of pearls or the growth of crystals. The structures grow according to the factors governing the network effect, and the nature of the connected growth structures may be represented and modeled as graphs. “Openness” appears to be a natural force underlying the emerging age of graphs.

Openness is Both Creative and Destructive . . .
“Openness”, like the dynamism of capitalism, is both creative and destructive. The effects are creative, actually transformative, because of the new means of collaboration that arise based on the new connections between new understandings or facts. “Open” graphs create entirely new understandings as well as provide a scaffolding for still further insights. The fire created from new understandings pulls in new understandings and contributions, all sucking in still more oxygen to keep the innovation cycle burning.
But the creative fire of openness is also destructive. Proprietary software, excessive software rents, silo’ed and stovepiped information stores, and much else are being consumed and destroyed in the wake of openness. Older business models, and indeed existing suppliers, are in the path of this open conflagration. Private and “closed” solutions are being swept before the openness firestorm. The massive storehouse of legacy kindling appears likely to fuel the openness flames for some time to come.
“Openness” becomes a form of adaptive life, changing the nature, value and dynamics of information and who has access to it. Though much of the old economy is, and will be, swept away in this destructive fire, new and more fecund growth is replacing it. From the viewpoint of the practitioner on the ground, I have not seen a more fertile innovation environment in information technology in more than thirty years of experience.

. . . and Seemingly Inevitable
Once the proper conditions for “openness” were in place, it now seems inevitable that today’s open circumstances would unfold. The Internet, with its (generally) open access and standards, was a natural magnet to attract and promote open-source software and content. A hands-off, unregulated environment has allowed the Internet to innovate, grow, and adapt at an unbelievable rate. So much unconnected dry kindling exists to stoke the openness fire for some time to come.
Of course, coercive state regimes can control the Internet to varying degrees and have limited innovation in those circumstances. Also, any change to more “closed” and less “open” an Internet may also act over time to starve the openness fire. Examples of such means to slow openness include imposing Internet regulation, limiting access (technically, economically or by fiat), moving away from open standards, or limiting access to content. Any of these steps would starve the innovation fire of oxygen.

Adapting to the Era of Openness
The forces impelling openness are strong. But, these observations certainly provide no proof for cause-and-effect. The correspondence of “openness” to the Internet and open source may simply be coincidence. But my sense suggests a more causative role is likely. Further, these forces are strong, and are sweeping before them much in the way of past business practices and proprietary methods.
In all of these regards “openness” is a woven cord of forces changing the very nature and scope of information available to humanity. “Openness”, which has heretofore largely lurked in the background as some unseen force, now emerges as a criterion by which to judge the wisdom of various choices. “Open” appears to contribute more and be better aligned with current forces. Business models based on proprietary methods or closed information generally are on the losing side of history.
For these forces to remain strong and continue to contribute material benefits, the Internet and its content in all manifestations need to remain unregulated, open and generally free. The spirit of “open” remains just that, and dependent on open and equal access and rights to the Internet and content.

The data is from Google book trends data based on this query (inspect the resulting page source to obtain the actual data); the years 2009 to 2014 were projected based on prior actuals to 1980; percentage term occurrences were converted to term frequencies by 1/n.

All links and definitions in this section were derived from Wikipedia.

See M.K. Bergman, 2014. “The Value of Connecting Things – Part I: A Foundation Based on the Network Effect,” AI3:::Adaptive Information blog, September 2, 2014.

See M.K. Bergman, 2012. “The Age of the Graph,” AI3:::Adaptive Information blog, August 12, 2012; and John Edward Terrell, Termeh Shafie and Mark Golitko, 2014. “How Networks Are Revolutionizing Scientific (and Maybe Human) Thought,” Scientific American, December 12, 2014.

Creative destruction is a term from the economist Joseph Schumpeter that describes the process of industrial change from within whereby old processes are incessantly destroyed and replaced by new ones, leading to a constant change of economic firms that are winners and losers.
It is not unusual to see articles touting one programming language or listing the reasons for choosing another, but they are nearly always written from the perspective of the professional developer. As an executive with much experience directing software projects, I thought a management perspective could be a useful addition to the dialog. My particular perspective is software development in support of knowledge management, specifically leveraging artificial intelligence, semantic technologies, and data integration.
Context is important in guiding the selection of programming languages. C is sometimes a choice for performance reasons, such as in data indexing or transaction systems. Java is the predominant language for enterprise applications and enterprise-level systems. Scripting languages are useful for data migrations and translations and quick applications. Web-based languages of many flavors help in user interface development or data exchange. Every one of the hundreds of available programming languages has a context and rationale that is argued by advocates.
We at Structured Dynamics have recently made a corporate decision to emphasize the Clojure language in the specific context of knowledge management development. I’d like to offer our executive-level views for why this choice makes sense to us. Look to the writings of SD’s CTO, Fred Giasson, for arguments related to the perspective of the professional developer.

Some Basic Management Premises
Languages wax and wane in popularity, and market expectations and requirements shift over time. Twenty years ago, Java brought forward a platform-independent design well-suited for client-server computing, and was (comparatively) quickly adopted by enterprises. At about the same time Web developments added browser scripting languages to the mix. Meanwhile, hardware improvements overcame many previous performance limitations in favor of easier-to-use syntaxes and tooling. Hardly anyone programs in assembler anymore. Sometimes, as with Flash, circumstances and competition may lead to a rapid (and unanticipated) abandonment.
The fact that such transitions naturally occur over time, and the fact that distributed and layered architectures are here to stay, has led to my design premise to emphasize modularity, interfaces and APIs. From the browser, client and server sides we see differential timings of changes and options. It is important that piece parts be able to be swapped out in favor of better applications and alternatives. Irrespective of language, architectural design holds the trump card in adaptive IT systems.
Open source has also had a profound influence on these trends. Commercial and product offerings are no longer monolithic and proprietary. Rather, modern product development is often more based on assembly of a diversity of open source applications and libraries, likely written in a multitude of languages, which are then assembled and “glued” together, often with a heavy reliance on scripting languages. This approach has certainly been the case with Structured Dynamics’ Open Semantic Framework, but OSF is only one example of this current trend.
The trend to interoperating open source applications has also raised the importance of data interoperability (or ETL in various guises) plus reconciling semantic heterogeneities in the underlying schema and data definitions of the contributing sources. Language choices increasingly must recognize these heterogeneities.
I have believed strongly in the importance of interest and excitement by the professional developers in the choice of programming languages. The code writers — be it for scripting or integration or fundamental app development — know the problems at hand and read about trends and developments in programming languages. The developers I have worked with have always been the source of identifying new programming options and languages. Professional developers read much and keep current. The best programmers are always trying and testing new languages.
I believe it is important for management within software businesses to communicate anticipated product changes within the business to its developers. I believe it is important for management to signal openness and interest in hearing the views of its professional developers in language trends and options. No viable software development company can avoid new upgrades of its products, and choices as to architecture and language must always be at the forefront of version planning.
When developer interest and broader external trends conjoin, it is time to do serious due diligence about a possible change in programming language. Tooling is important, but not dispositive. Tooling rapidly catches up with trending and popular new languages. As important as tooling is the “fit” of the programming language to the business context and the excitement and productivity of the developers to achieve that fit.
“Fitness” is a measure of adaptiveness to a changing environment. Though I suppose some of this can be quantified — as it can in evolutionary biology — I also see “fit” as a qualitative, even aesthetic, thing. I recall sensing the importance of platform independence and modularity in Java when it first came out, results (along with tooling) that were soon borne out. Sometimes there is just a sense of “rightness” or alignment when contemplating a new programming language for an emerging context.
Such is the case for us at Structured Dynamics with our recent choice to emphasize Clojure as our core development language. This choice does not mean we are abandoning our current code base, just that our new developments will emphasize Clojure. Again, because of our architectural designs and use of APIs, we can make these transitions seamlessly as we move forward.
However, for this transition, unlike the prior ones noted above, I wanted to be explicit as to the reasons and justifications. Presenting these reasons for Clojure is the purpose of this article.

Brief Overview of Clojure
Clojure is a relatively new language, first released in 2007. Clojure is a dialect of Lisp, explicitly designed to address some Lisp weaknesses in a more modern package suitable to the Web and current, practical needs. Clojure is a functional programming language, which means it has roots in the lambda calculus and functions are “first-class citizens” in that they can be passed as arguments to other functions, returned as values, or assigned to variables and stored in data structures. These features make the language well suited to mathematical manipulations and the building up of more complicated functions from simpler ones.
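The idea of functions as first-class citizens can be shown in a short sketch. The example below is written in Python rather than Clojure, purely as an illustration; the function names and the `pipeline` dictionary are hypothetical.

```python
def compose(f, g):
    """Return a new function that applies g first, then f."""
    return lambda x: f(g(x))


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# Functions are passed as arguments (to compose and to map), returned as
# values (from compose), and stored in a data structure (the dictionary).
pipeline = {"double_then_add": compose(add_one, double)}

result = list(map(pipeline["double_then_add"], [1, 2, 3]))
print(result)  # [3, 5, 7]
```

Building `double_then_add` from two simpler functions is a small instance of constructing more complicated functions from simpler ones.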
Clojure was designed to run on the Java Virtual Machine (JVM) (now expanded to other environments such as ClojureScript and Clojure CLR) rather than any specific operating system. It was designed to support concurrency. Other modern features were added in relation to Web use and scalability. Some of these features are elaborated in the rationales noted below.
As we look to the management reasons for selecting Clojure, we can really lump them into two categories: a) those that arise mostly from Lisp, as the overall language basis; and b) specific aspects added to Clojure that overcome or enhance the basis of Lisp.

Reasons Deriving from Lisp
Lisp (defined as a list processing language) is one of the older computer languages around, dating back to 1958, and has evolved to become a family of languages. “Lisp” has many variants, with Common Lisp one of the most prevalent, and many dialects that have extended and evolved from it.
Lisp was invented as a language for expressing algorithms. Lisp has a very simple syntax and (in base form) comparatively few commands. Lisp syntax is notable (and sometimes criticized) for its common use of parentheses, in a nested manner, to specify the order of operations.
Lisp is often associated with artificial intelligence programming, since it was first specified by John McCarthy, the acknowledged father of AI, and was the favored language for many early AI applications. Many have questioned whether Lisp has any special usefulness to AI or not. Though it is hard to point to any specific reason why Lisp would be particularly suited to artificial intelligence, it does embody many aspects highly useful to knowledge management applications. It was these reasons that caused us to first look at Lisp and its variants when we were contemplating language alternatives:
Clojure was invented by Rich Hickey, who knew explicitly what he wanted to accomplish leveraging Lisp for new, more contemporary uses. (Though some in the Lisp community have bristled against the idea that dialects such as Common Lisp are not modern, the points below really make a different case.) Some of the design choices behind Clojure are unique and quite different from the Lisp legacy; others leverage and extend basic Lisp strengths. Thus, with Clojure, we can see both a better Lisp, at least for our stated context, and one designed for contemporary environments and circumstances.
Here are what I see as the unique advantages of Clojure, again in specific reference to the knowledge management context:
I’m sure had this article been written from a developer’s perspective, different emphases and different features would have arisen. There is no perfect programming language and, even if there were, its utility would vary over time. The appropriateness of programming languages is a matter of context. In our context of knowledge management and artificial intelligence applications, Clojure is our due diligence choice from a business-level perspective.
There are alternatives to the points raised herein, like Scheme, Erlang or Haskell. Scala offers some of the same JVM benefits as noted. Further, tooling for Clojure is still limited (though growing), and it requires Java to run and develop. Even with extensions and DSLs, there is still the initial awkwardness of learning Lisp’s mindset.
Yet, ultimately, the success of a programming language is based on its degree of use and longevity. We are already seeing very small code counts and productivity gains from our use of Clojure. We are pleased to see continued language dynamism from such developments as Transit and transducers. We think many problem areas in our space (from data transformations and lifting, to ontology mapping, and then machine learning and AI and integrations with knowledge bases, all under the control of knowledge workers versus developers) lend themselves to Clojure DSLs of various sorts. We have plans for these DSLs and look forward to contributing them to the community.
We are excited to now find an aesthetic programming fit with our efforts in knowledge management. We’d love to see Clojure become the go-to language for knowledge-based applications. We hope to work with many of you in helping to make this happen.

I have also been involved with the development of two new languages, Promula and VIQ, and conducted due diligence on C#, Ruby and Python, but none of these languages were ultimately selected.

Native apps on smartphones are likely going through the same transition.

As of the date of this article, Clojure is in version 1.6.0.

See M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems. OWA is a formal logic assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true. OWA is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption. The OWA limits the kinds of inference and deductions an agent can make to those that follow from statements that are known to the agent to be true. OWA is useful when we represent knowledge within a system as we discover it, and where we cannot guarantee that we have discovered or will discover complete information. In the OWA, statements about knowledge that are not included in or inferred from the knowledge explicitly recorded in the system may be considered unknown, rather than wrong or false. Semantic Web languages such as OWL make the open world assumption. Also, you can search on OWA on this blog.

Paul Graham, 1993. “Programming Bottom-Up,” is a re-cap on Graham’s blog related to some of his earlier writings on programming in Lisp. By “bottom up” Graham means “. . . changing the language to suit the problem. . . . Language and program evolve together.”

A really nice explanation of this approach is in James Donelan, 2013. “Code is Data, Data is Code,” on the Mulesoft blog, September 26, 2013.

Rich Hickey is a good public speaker. Two of his seminal videos related to Clojure are “Are We There Yet?” (2009) and “Simple Made Easy” (2011).

My Sweet Tools listing of knowledge management software is dominated by Java, with about half of all apps in that language.

See the Clojure EDN; also Transit, EDN’s apparent successor.

structEDN is a straightforward RDF serialization in EDN format.

For transducers in Clojure version 1.7, see this Hickey talk, “Transducers” (2014).
Structured Dynamics has just released version 3.1 of the Open Semantic Framework and announced the update of the OSF Web site. OSF is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is made available under the Apache 2 license.
Enhancements to OSF version 3.1 include:
OSF version 3.1 is available for download from GitHub.
More details on the release can be found on Frédérick Giasson’s blog. Fred is OSF’s lead developer. William (Bill) Anderson also made key contributions to this release.