The Ontolog Forum, the online community of practice for ontologists, has just announced a public online meeting on Cognonto. I will be giving the roughly 60 min presentation, to be followed by 30 min of open discussion, on December 7 at 9:30am Pacific / 12:30pm Eastern / 6:30pm CET / 5:30pm GMT / 1730 UTC. The public is invited.
According to the announcement:
KBpedia is a recently announced knowledge structure that integrates six major knowledge bases (OpenCyc, Wikipedia, Wikidata, GeoNames, DBpedia, UMBEL) under the KBpedia Knowledge Ontology (KKO). KBpedia’s explicit purpose is to provide a foundation for knowledge-based artificial intelligence by supporting the (nearly) automatic creation of training corpuses and positive and negative training sets and feature sets for deep, unsupervised and supervised machine learning. KKO is the upper ontology for KBpedia, and is guided by the universal categories (Firstness, Secondness, Thirdness) of Charles S. Peirce. [This talk] will discuss what KBpedia is, how it is organized and constructed, and why KKO offers some new approaches to vexing metaphysical questions in ontology design related to the knowledge representation of entities, relations, attributes, concepts, and natural kinds. The discussion period will hopefully highlight next potentials and important open questions.
I hope to see you there!
Dataversity has just published an article on Cognonto based on an interview with me and a review of our online materials. The article, “Cognonto Takes On Knowledge-Based Artificial Intelligence,” does an excellent job of summarizing the venture. The writer, Jennifer Zaino, did a fantastic job capturing our discussions and framing the Cognonto story. Thanks!
I especially like that the article begins with the simple words, knowledge-based artificial intelligence, which is the quintessential description of what Cognonto is about. Jennifer then goes on to explain the genesis of the venture, the central role of its knowledge structure KBpedia, and the basis of this knowledge graph grounded in the triadic logic of the 19th century philosopher, Charles Sanders Peirce.
We will use this article for months to come in helping others understand what we are doing and why. It is always refreshing to see intelligent, well-written journalism. Thanks, Jenny.
Cognonto today published two new use cases on how to further leverage KBpedia, its knowledge structure that integrates six major knowledge bases (Wikipedia, Wikidata, OpenCyc, GeoNames, DBpedia and UMBEL), plus mappings to another 20 leading knowledge vocabularies. KBpedia provides a foundation for knowledge-based artificial intelligence (KBAI) by supporting the (nearly) automatic creation of training corpuses and positive and negative training sets and feature sets for deep, unsupervised and supervised machine learning.
The two new use cases are in: 1) dynamic tests and refinements of machine learners enabled by KBpedia’s fast creation of training sets and corpuses and reference (‘gold’) standards; and 2) KBpedia’s unique aspects that provide context for various entity types. With these two additions, Cognonto has now published six diverse use cases:
Each use case is summarized according to the problem and our approach to solving it and the benefits that result. The use cases themselves present general workflows and code snippets for how the use case was tackled.
We will continue to publish use cases using Cognonto’s technologies and KBpedia as they arise.
I am pleased to point to a new invited article, “Wrestling Knowledge into Computable Intelligence,” published today on ODBMS.org. My article provides a high-level summary of recent trends in knowledge-based artificial intelligence and the mindsets and designs necessary to move KBAI forward. I think the article provides a pretty good summary (if I say so myself!) of the approach we take at Cognonto.
I’d like to thank Roberto Zicari for the invite to write in a style and brevity not typical of my normal articles. Enjoy!
Cognonto today released version 1.10 of its KBpedia knowledge structure. KBpedia integrates six major knowledge bases (Wikipedia, Wikidata, OpenCyc, GeoNames, DBpedia and UMBEL), plus mappings to another 20 leading knowledge vocabularies, under the KBpedia Knowledge Ontology (KKO). KBpedia’s explicit purpose is to provide a foundation for knowledge-based artificial intelligence (KBAI) by supporting the (nearly) automatic creation of training corpuses and positive and negative training sets and feature sets for deep, unsupervised and supervised machine learning.
This new release focused on two major updates. First, certain aspects of the upper structure of the KKO were streamlined. And, second, KBpedia’s core typologies, which capture the overwhelming majority of reference concepts that are classified as entity types, were further organized to create tighter taxonomic structures.
The upper portion of the KBpedia knowledge graph required cleanup because it was still using some of the abstract-tangible distinctions used in Cyc. These distinctions were no longer used with the adoption of the universal categories of Charles S. Peirce (see my earlier article for more on this architectural design). This cleanup resulted in removing nearly 25% of the upper level links from the prior version (which were superfluous to the disjoint design of KBpedia). The typology organizations are part of an ongoing effort to streamline and tighten these structures.
Last week Cognonto’s CTO, Fred Giasson, described the general build processes we have in place for KBpedia. This release is another example of that process in action.
KBpedia contains nearly 40,000 reference concepts (RCs) and about 20 million entities. The combination of these and KBpedia’s structure results in over 6 billion logical connections across the system, as these KBpedia statistics show:

Measure                                          Value
No. KBpedia reference concepts (RCs)             39,052
No. mapped vocabularies                          27
    Core knowledge bases                         6
    Extended vocabularies                        21
No. mapped classes                               138,987
    Core knowledge bases                         137,322
    Extended vocabularies                        1,665
No. typologies (SuperTypes)                      63
    Core entity types                            33
    Other core types                             5
    Extended                                     25
Typology assignments                             372,967
No. of “triples” in KBpedia ontology             1,347,818
No. aspects                                      80
Direct entity assignments                        68,026,551
Inferred entity aspects                          204,704,905
No. unique entities                              19,643,718
Inferred no. of entity mappings                  2,541,684,526
Total no. of “triples”                           3,689,849,183
Total no. of inferred and direct assertions      6,251,177,427

KBpedia v. 1.10 Statistics
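As a quick arithmetic check on the v. 1.10 statistics, the short Python sketch below confirms that the reported breakdowns add up to their totals. The grouping of rows into totals and parts is my reading of the table. The final check records only that three of the figures happen to sum to the reported grand total; that the total was actually derived this way is an assumption, not something stated here.

```python
# Figures copied from the KBpedia v. 1.10 statistics.
# Each entry is (reported total, [reported parts]); the grouping is an assumption.
breakdowns = {
    "mapped vocabularies": (27, [6, 21]),
    "mapped classes": (138_987, [137_322, 1_665]),
    "typologies": (63, [33, 5, 25]),
}


def consistent(total, parts):
    """Return True when the parts add up to the reported total."""
    return total == sum(parts)


for name, (total, parts) in breakdowns.items():
    status = "ok" if consistent(total, parts) else "mismatch"
    print(f"{name}: parts sum to {sum(parts):,}, reported {total:,} ({status})")

# Observation only: these three figures sum to the reported grand total.
# Whether the grand total is defined this way is an assumption.
total_triples = 3_689_849_183
inferred_entity_mappings = 2_541_684_526
unique_entities = 19_643_718
reported_total_assertions = 6_251_177_427

grand_total_matches = (
    total_triples + inferred_entity_mappings + unique_entities
    == reported_total_assertions
)
print("grand total matches:", grand_total_matches)
```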
This release of KBpedia is part of an ongoing series of releases to improve and extend the knowledge structure, as well as to increase its mappings to still additional external vocabularies. You can inspect the upper portions of the KBpedia knowledge graph on the Cognonto Web site. Also, if you have an ontology editor, you can download and inspect the open source KKO directly.
About Cognonto
The insight behind Cognonto is that existing knowledge bases can be staged to automate much of the tedium and reduce the costs now required to set up and train machine learners for knowledge purposes. Cognonto’s mission is to make knowledge-based artificial intelligence (KBAI) cheaper, repeatable, and applicable to enterprise needs.
Cognonto (a portmanteau of ‘cognition’ and ‘ontology’) exploits large-scale knowledge bases and semantic technologies for machine learning, data interoperability and mapping, and fact and entity extraction and tagging. Cognonto puts its insight into practice through a knowledge structure, KBpedia, designed to support AI, and a management framework, the Cognonto Platform, for integrating enterprise and external data to gain the advantage of KBpedia’s structure.
Cognonto automates away much of the tedium and reduces costs in many areas. Cognonto offers a number of use cases for how the Cognonto Platform and KBpedia in combination with enterprise information assets may be applied.
Knowledge is inherently dynamic and constantly changing. We learn new things; make connections between things that were previously hidden; revise our understandings in light of new discoveries; and embrace new domain relationships and facts. In the case of Cognonto’s knowledge graph, KBpedia, and its six major contributing knowledge bases (KBs) and mappings to a further 20 ontologies, this dynamism takes place at warp speed. It is evident simply from the thousands of changes made daily to each of Wikipedia and Wikidata, two of KBpedia’s major KBs.
Cognonto’s services are based on three capabilities. The first is KBpedia, which we have discussed elsewhere. The second is the Cognonto Platform, the means for accessing and using KBpedia in conjunction with enterprise or domain information. The third is our set of build and testing routines, scripts, logs and processes. It is this third capability that keeps KBpedia current, and it is essential infrastructure for our entire suite of services.
Cognonto’s CTO, Frédérick Giasson, has today published on LinkedIn an overview article on the principal components within this build and testing infrastructure. This is significant, but largely hidden, work. We have honed this infrastructure over a period of years, and are continuously adding to our roster of scripts and procedures.
What is remarkable about this infrastructure is the speed with which we can completely rebuild KBpedia from scratch (less than two hours) and the shortness of the entire cycle of producing a new major version of the system (less than two weeks). This infrastructure is all the more impressive when one considers that KBpedia has and maps to hundreds of thousands of concepts, millions of entities, and billions of assertions. Yet, despite this complexity, each new build of KBpedia is logically consistent, satisfiable, and coherent. Our build and testing scripts are what help ensure this quality.
Fred’s article explains this infrastructure in greater detail. Our build and testing infrastructure brings essential stability to Cognonto’s overall offerings. Great work, Fred!
We have expanded the search function for Cognonto’s KBpedia knowledge graph. When first released last month, the KBpedia search was limited to reference concepts (RCs) only. With today’s upgrade, all of KBpedia’s 20 million entities can now be searched.
Though you can start investigating the Knowledge Graph (KG) simply by clicking links, to really discover items of specific interest you will need to do a search. These search functions for the Knowledge Graph are described below.
Two Search Options
The Knowledge Graph (KG) may be searched for either reference concepts (RCs) or entities.
On the main KG page (and for all other pages in the KG section except for actual results pages), the search box has a dropdown list function next to the search button. Via this dropdown, you may select to search either Reference Concepts or Entities. Depending on your selection, that choice also shows in the title to the search box. Whatever choice you make in the dropdown list is retained until you select a different option. The default option when you first encounter the Knowledge Graph is to search Reference Concepts.
Once your search specification is entered in the search box, you must click the Search button to invoke the actual search.
Reference Concepts (RC) Search
If you pick the Reference Concepts main search option, there are two different behaviors available to you.
With Autocompletion
Note that our search box has Reference Concepts as its title. As you type in the search box, the system’s autocompletion will return RC candidates in a dropdown list. Only preferred labels (the canonical “name”) and terminal URI strings will match for autocomplete. Semsets and terms in the RC’s description will not match.
Each newly entered character in the search box narrows the possible matching results. If your query string ceases prompting with RC candidates, there are no matches for the current substring query on preferred labels or URI fragments in the system.
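The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Cognonto’s implementation: the `ReferenceConcept` record, the `example.org` URIs, the sample data, and the choice of case-insensitive substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceConcept:
    """Hypothetical RC record; the field names are illustrative only."""
    uri: str
    pref_label: str
    semset: list = field(default_factory=list)  # alternative labels, not searched
    description: str = ""                       # not searched by autocomplete


def uri_fragment(uri):
    """Return the terminal string of a URI (the part after the last '/')."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def autocomplete(query, concepts):
    """Return RCs whose preferred label or URI fragment contains the query.

    Semsets and descriptions are deliberately ignored, as the text describes.
    Case-insensitive substring matching is an assumption.
    """
    q = query.lower()
    if not q:
        return []
    return [
        rc for rc in concepts
        if q in rc.pref_label.lower() or q in uri_fragment(rc.uri).lower()
    ]


concepts = [
    ReferenceConcept("http://example.org/rc/Mammal", "mammal",
                     ["mammalian"], "Warm-blooded vertebrates"),
    ReferenceConcept("http://example.org/rc/MarineMammal", "marine mammal",
                     ["sea mammal"], "Mammals that live in the ocean"),
    ReferenceConcept("http://example.org/rc/Bird", "bird",
                     ["avian"], "Feathered vertebrates"),
]

# Each extra character narrows the candidate list.
print([rc.pref_label for rc in autocomplete("mam", concepts)])
print([rc.pref_label for rc in autocomplete("marine m", concepts)])
# A semset term yields no autocomplete candidates.
print([rc.pref_label for rc in autocomplete("avian", concepts)])
```

When a query such as the semset term above returns nothing, that is the cue to fall back to the broader search without autocompletion.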
There are some styling and naming conventions used to assign RC preferred labels and URI fragments; as you come to recognize them over time, they can help you formulate more effective RC queries. Realize, however, that there are multiple ways RCs can be named and referenced. If autocomplete does not match what you think the RC name might be, try the search without autocompletion, described in the next section.
If you do get matches to the query string, you will be presented with one or more live link options to specific RCs in the dropdown list box. Pick the option you are seeking to go to the RC Record for your specific reference concept of interest.
Without Autocompletion
The search without autocompletion is a broader one, and is often useful when you have been unable to formulate an effective query for the preferred label of an RC of interest. To conduct a search without autocompletion, simply provide a query string in the search box without picking one of the autocompletion dropdown prompts (should they occur). Clicking the Search Concepts button will take you to a standard search of the concepts in KBpedia. Doing so brings you to a paginated set of search results pages, with 20 results per page.
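The paging behavior is simple to picture in code. The sketch below assumes results arrive as an ordered Python list and that pages are numbered from 1; only the page size of 20 comes from the text, and the function names are hypothetical.

```python
import math

PAGE_SIZE = 20  # results per page, as stated in the text


def paginate(results, page, page_size=PAGE_SIZE):
    """Return the results shown on the given 1-based page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return results[start:start + page_size]


def page_count(total_results, page_size=PAGE_SIZE):
    """Return how many result pages are needed for the given number of hits."""
    return math.ceil(total_results / page_size)


hits = [f"result-{i}" for i in range(1, 46)]  # 45 hypothetical hits
print(page_count(len(hits)))    # number of pages needed
print(paginate(hits, 3))        # the final, partial page
```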
Once you are on a results listing page, your search options change. Again you are given a dropdown list box, whereby you may restrict the actual concept search to one of these fields:
When you pick a result from the search list, you are taken to the RC Record report (see the How to Use the Knowledge Graph page).
Entities Search
The standard search box gives you a dropdown where you can choose to conduct an Entities or Reference Concepts search (see first figure above). If you choose Entities, that is so indicated in the label to the search button.
When the Entities search is chosen, you have a choice to restrict the actual search to one of these fields:
Whichever of these four choices you make, the selection appears as the title to the search box and remains in effect until you change the dropdown option. The default when you first invoke the Entities search is ‘All’.
To actually conduct the search, you need to click on the Search Entities button.
Depending on which optional field you selected, the results count will vary. Obviously, the largest number of results arises from the ‘All’ field choice.
Entities Search Results
Search results for entities generally present fewer details than those for RCs, as this figure shows:
Some results are limited to a title (prefLabel). Others may include altLabels or descriptions, depending on the nature of the record. Each result has an icon to the upper right indicating the source of the entity record. Clicking that icon (background highlighted text) presents the standard entity results listing as described on the How to Use the Knowledge Graph page.
Cognonto, our recently announced venture in knowledge-based artificial intelligence (KBAI), has just published three use cases. Two of these use cases are based on extending KBpedia with enterprise or domain data. KBpedia is the KBAI knowledge structure at the heart of Cognonto.
The Cognonto Web site contains longer descriptions of these use cases, with statistics and results where appropriate. We intend to continue to publish more use cases. Notable ones will be broadly announced.
Use Case #1: Word Embedding Corpuses
word2vec is an artificial intelligence ‘word embedding’ model that can establish similarities between terms. These similarities can be used to cluster or classify documents by topic, or to characterize them by sentiment, or for recommendations. The rich structure and entity types within Cognonto’s KBpedia knowledge structure can be used, with one or two simple queries, to create relevant domain “slices” of tens of thousands of documents and entities upon which to train word2vec models. This approach eliminates the majority of effort normally associated with word2vec for domain purposes, enabling available effort to be spent on refining the parameters of the model for superior results.
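To make the idea of a domain “slice” concrete, here is a minimal Python sketch. It does not use KBpedia or any Cognonto API: the type hierarchy, the entity records, and the function names are all hypothetical stand-ins. The sketch selects every entity typed at or below a chosen concept, then tokenizes the associated text into the list-of-token-lists form that word2vec implementations commonly accept as training input.

```python
def subtypes(root, children):
    """Return the root type plus every type below it in the hierarchy.

    `children` maps a type name to its direct subtypes (hypothetical data).
    """
    seen, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def domain_slice(root, children, entities):
    """Select the entities whose type falls at or under the root type."""
    wanted = subtypes(root, children)
    return [entity for entity in entities if entity["type"] in wanted]


def to_sentences(entities):
    """Tokenize each entity's text naively (lowercase, whitespace split)."""
    return [entity["text"].lower().split() for entity in entities]


# Hypothetical type hierarchy and entity records.
children = {
    "Person": ["Musician", "Athlete"],
    "Musician": ["Guitarist", "Pianist"],
}
entities = [
    {"name": "A", "type": "Guitarist", "text": "Plays electric guitar in a blues band"},
    {"name": "B", "type": "Pianist", "text": "Classical pianist and composer"},
    {"name": "C", "type": "Athlete", "text": "Olympic sprinter"},
]

music_slice = domain_slice("Musician", children, entities)
sentences = to_sentences(music_slice)
print([entity["name"] for entity in music_slice])
print(sentences)
```

The point of the design is that the selection step is a single query against the type structure, so the effort can go into tuning the model rather than assembling the corpus.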
Some key findings are:
KBpedia provides a rich set of 20 million entities in its standard configuration. However, by including relevant entity lists, which may already be in the possession of the enterprise or from specialty domain datasets, significant improvements can be achieved across all of the standard metrics used for entity recognition and tagging. Here is an example of the standard metrics applied by Cognonto in its efforts:
Cognonto’s standard methodology also includes the creation of reference, or “gold”, standards for measuring the benefits of adding more data or performing other tweaks on the entity extraction algorithms.
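The specific metrics Cognonto applies are not reproduced here. As an illustration of how a gold standard is used, the Python sketch below assumes the three measures most commonly reported for entity recognition and tagging (precision, recall and F1) and compares a tagger’s output to the gold standard before and after adding an extra entity list. The documents, entity names, and results are invented for the example.

```python
def score(predicted, gold):
    """Compare predicted entity mentions against a gold standard.

    Both arguments are sets of (document id, entity) pairs.
    """
    true_pos = len(predicted & gold)
    false_pos = len(predicted - gold)
    false_neg = len(gold - predicted)
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold standard and tagger outputs.
gold = {
    ("doc1", "Acme Corp"), ("doc1", "Springfield"),
    ("doc2", "Jane Doe"), ("doc2", "Paris"),
}
baseline = {("doc1", "Acme Corp"), ("doc2", "Paris"), ("doc2", "London")}
with_private_list = baseline | {("doc2", "Jane Doe")}

baseline_scores = score(baseline, gold)
extended_scores = score(with_private_list, gold)
print(baseline_scores)
print(extended_scores)
```

Because the gold standard stays fixed, any change in the scores can be attributed to the added data or the algorithm tweak being tested.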
Some key findings from this use case in adding private data to KBpedia include:
The Cognonto Mapper includes standard baseline capabilities found in other mappers such as string and label comparisons, attribute comparisons, and the like. But, unlike conventional mappers, the Cognonto Mapper is able to leverage both the internal knowledge graph structure and its use of typologies (most of which do not overlap with one another) to add structural comparators as well. These capabilities lead to more automation at the front end of generating good, likely mapping candidates, leading to faster acceptance by analysts of the final mappings. This approach is in keeping with Cognonto’s philosophy to emphasize “semi-automatic” mappings that combine fast final assignments with the highest quality. Maintaining mapping quality is the sine qua non of knowledge-based artificial intelligence.
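A rough sketch of how a label comparison and a structural comparator might combine is shown below. This is not the Cognonto Mapper: the record layout, the `rc:` identifiers, the 0.6 threshold, and the use of Python’s `difflib` for string similarity are all assumptions chosen to keep the example self-contained. The structural step simply discards any candidate whose typology is declared disjoint with that of the source concept, leaving a ranked list for an analyst to accept or reject.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Case-insensitive string similarity between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_mappings(source, targets, disjoint_pairs, threshold=0.6):
    """Rank likely mapping candidates for analyst review.

    A target is dropped when its typology is disjoint with the source's
    typology (structural comparator) or when its label similarity falls
    below the threshold (label comparator).
    """
    candidates = []
    for target in targets:
        pair = frozenset((source["typology"], target["typology"]))
        if pair in disjoint_pairs:
            continue  # cannot be the same concept if the typologies are disjoint
        similarity = label_similarity(source["label"], target["label"])
        if similarity >= threshold:
            candidates.append((target["id"], round(similarity, 2)))
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical data.
disjoint_pairs = {frozenset(("Animals", "Products"))}
source = {"label": "Jaguar", "typology": "Animals"}
targets = [
    {"id": "rc:JaguarCar", "label": "Jaguar car", "typology": "Products"},
    {"id": "rc:Jaguars", "label": "Jaguars", "typology": "Animals"},
    {"id": "rc:Jaguar", "label": "jaguar", "typology": "Animals"},
]

ranked = candidate_mappings(source, targets, disjoint_pairs)
print(ranked)
```

In this toy case the car concept has a similar label but is removed by the disjointness test, which is the kind of pruning that string comparison alone cannot provide.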
Some key findings from this use case are:
See the original use case links for further details, code examples, and results and statistics. As noted, we will announce additional use cases as they are published.
Every knowledge structure used for knowledge representation (KR) or knowledge-based artificial intelligence (KBAI) needs to be governed by some form of conceptual schema. In the semantic Web space, such schemas are known as “ontologies”, since they attempt to capture the nature or being (Greek ὄντος, or ontos) of the knowledge domain at hand. Because the word ‘ontology’ is a bit intimidating, a better variant has proven to be the knowledge graph (because all semantic ontologies take the structural form of a graph). In Cognonto’s KBAI efforts, we tend to use the terms ontology and knowledge graph interchangeably.
Cognonto uses the KBpedia Knowledge Ontology (KKO) as its upper conceptual schema. KKO is the structure by which Cognonto’s hundreds of thousands of reference concepts and millions of entities are organized. This article presents an overview and rationale for this KKO structure. Subsequent articles will delve into specific aspects of KKO where warranted.
A Grounding in Peirce’s Triadic Logic
The upper structure of the KBpedia Knowledge Ontology (KKO) is informed by the triadic logic and basic categories of Charles Sanders Peirce. If the relation of this triadic design to knowledge representation appears a bit opaque, please refer to the introduction of this series in the prior article, The Irreducible Truth of Threes. The key point is that ‘threes’ are the fewest by which to model context and perspective, essential to capture the nature of knowledge.
Peirce’s triadic logic, or trichotomy, is also the basis for his views on semiosis (or the nature of signs). The three constituents of Peirce’s trichotomy, what he called simply the Three Categories, were in his view the most primitive or reduced manner by which to understand and categorize things, concepts and ideas. Peirce’s Three Categories can be roughly summarized as:
Understanding, inquiry and knowledge require this irreducible structure; connections, meaning and communication depend on all three components, standing in relation to one another and subject to interpretation by multiple agents. (Traditional classification schemes have a dyadic or dichotomous nature, which does not support the richer views of context and interpretation inherent in the Peircean view.)
Peirce argues persuasively that how we perceive and communicate things requires this irreducible triadic structure. The symbolic nature of Thirdness means that communication and understanding is a continuous process of refinement, getting us closer to the truth, but never fully achieving it. Thirdness is a social and imprecise mode of communication and discovery, conducted by us and other agents separate from the things and phenomena being observed. Though it is a fallibilistic process, it is one that also lends itself to rigor and methods. The scientific method is a premier example of Thirdness in action.
What constitutes the potentials, realized particulars, and generalizations that may be drawn from a query or investigation is contextual in nature. That is why the mindset of Peirce’s triadic logic is a powerful guide to how to think about and organize the things and ideas in our world (that is, knowledge representation). Peirce’s triadic logic and views on categorization are fractal in nature. We can apply this triadic logic to any level of information granularity.
Thus, KKO applies this mindset to organizing its knowledge graph. At each level in the KKO upper structure, we strive to organize each category according to the ideas of Firstness (1ns), Secondness (2ns) and Thirdness (3ns), as shown in the upper KKO structure below.
Basic Structural Considerations
Now armed with a basic conceptual and logical grounding, what are the main kinds of distinctions we want to capture in our knowledge structure? Since our purpose is to provide a means for integrating knowledge bases (KBs) of use to artificial intelligence (AI), or KBAI, the answer to this question resides in: 1) what conceptual distinctions are captured by the constituent KBs; and 2) what kinds of work (AI) we want to do with the structure.
The answers to these questions help us to define the basic vocabulary of our knowledge base, what Peirce called its speculative grammar. This base vocabulary of KKO is thus:
We will be talking about specific vocabulary items above in subsequent articles. One important distinction to draw for now, however, is the split between attributes and relations. In standard RDF and OWL ontologies these are lumped together as properties. In OWL, there is the further distinction of datatype and object properties, but these do not quite capture the difference we desire. In KBpedia, attributes are the descriptions or characteristics of a given entity (or its type); relations are the roles, connections or subsumptions between objects.
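One way to see why the OWL datatype/object split does not line up with the attribute/relation split is to model the two distinctions separately. The Python sketch below is illustrative only: the property names and the `Property` record are hypothetical, while the case of an attribute drawn from a controlled vocabulary (such as a color) being defined as an object property follows the notes to this article.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The distinction drawn in KBpedia's vocabulary."""
    ATTRIBUTE = "attribute"  # describes or characterizes an entity
    RELATION = "relation"    # connects objects to one another


class OwlType(Enum):
    """The distinction OWL draws between kinds of property."""
    DATATYPE = "owl:DatatypeProperty"
    OBJECT = "owl:ObjectProperty"


@dataclass(frozen=True)
class Property:
    name: str
    kind: Kind
    owl_type: OwlType


# Hypothetical properties.
properties = [
    Property("height", Kind.ATTRIBUTE, OwlType.DATATYPE),
    # Value comes from a controlled list of colors, so it is an object property.
    Property("color", Kind.ATTRIBUTE, OwlType.OBJECT),
    Property("partOf", Kind.RELATION, OwlType.OBJECT),
]


def names_by_kind(props, kind):
    """Property names grouped by the attribute/relation distinction."""
    return [p.name for p in props if p.kind is kind]


def names_by_owl_type(props, owl_type):
    """Property names grouped by OWL property type."""
    return [p.name for p in props if p.owl_type is owl_type]


print(names_by_kind(properties, Kind.ATTRIBUTE))
print(names_by_owl_type(properties, OwlType.OBJECT))
```

Grouping by OWL type puts the color attribute alongside the part-of relation, whereas grouping by kind keeps it with the other descriptive characteristics.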
How these vocabulary terms relate to one another and the overall KBpedia knowledge structure is shown by this diagram:
Note that the three columns of this figure correspond to the three categories of potentials (1ns, left column), particulars (2ns, middle column) and generals (3ns, right column) described above. In terms of KBpedia, all of the instances of the knowledge structure, now numbering over 20 million in the standard version, are affiliated with the middle column (particulars). The classification aspects of KBpedia reside in the right column (generals). The reasoning aspects of KBpedia largely reside in the left and right columns (though reasoning work is also done on selecting and aggregating instances in the middle column).
Below the Upper Structure are Typologies
More than 85% of the classification structure of KBpedia resides in the generals, or types, in the rightmost column. These, in turn, are organized according to a set of typologies, or natural classification structures. Unlike the KKO upper structure, each typology is not necessarily organized according to Peirce’s triadic logic. That is because once we come to organize and classify the real things in the world, we are dealing with objects of a more-or-less uniform character (such as animals or products or atomic elements).
There are about 80 such typologies in the KBpedia structure, about 30 of which are deemed “core”, meaning they capture the bulk of the classificatory system. Another document presents these 30 “core” typologies in more detail.
I have written elsewhere about the basis for “natural” classification systems. These approaches, too, are drawn from Peirce’s writings. Natural classifications may apply to truly “natural” things, like organisms and matter, but also to man-made objects and social movements and ideas. The key argument is that shared attributes, including a defining kind of “essence” (Aristotle) or “final cause” (Peirce), help define the specific class or type to which an object may belong. For Peirce, what science has to tell us, or what social consensus settles upon, holds sway.
If accomplished well, natural classification systems lend themselves to hierarchical structures that may be reasoned over. Further, if the splits between typologies are also done well, then it is also possible to establish non-overlapping (“disjoint”) relationships between typologies that provide powerful restriction and selection capabilities across the knowledge structure. We believe KBpedia already achieves these objectives, though we continue to refine the structure based on our mappings to other external systems and other logical tests.
The KKO Upper Structure
We now have the pieces in hand to construct the full KKO upper structure. Here is the upper structure of KKO with its 144 concepts:
Predica [1ns] Ontics [1ns] Qualities [1ns] Physical [2ns] Being [1ns] One == Haeccity [1ns] True [2ns] Good [3ns] Form [2ns] Structure [3ns] Conceptual [3ns] Absolute [1ns] SimpleRelative [2ns] Conjugative [3ns] Attribuo [2ns] Identity [1ns] Haeccity == One [1ns] Nature [2ns] Beingness [1ns] Real [2ns] Matter [1ns] SubstantialForm [2ns] AccidentalForm [3ns] Fictional [3ns] Quiddity [3ns] Intensional [2ns] Conjunctive [3ns] Quantity [1ns] Values [1ns] Numbers [1ns] Multitudes [2ns] Magnitudes [3ns] Discrete [2ns] Continuous [3ns] Roles [2ns] Typical [3ns] Relatio [3ns] Subsumption [1ns] Similar [2ns] LogicalConnection [3ns] Unary [1ns] Binary [2ns] Conditional [3ns]
Particulars [2ns] Entities [1ns] SingleEntities [1ns] Objects [1ns] States [2ns] Events [3ns] PartOfEntities [2ns] Members [1ns] Parts [2ns] FunctionalComponents [3ns] ComplexEntities [3ns] CollectiveStuff [1ns] MixedStuff [2ns] CompoundEntities [3ns] Indices [2ns] Indicators [1ns] Associations [2ns] Annotations [3ns] Selectional [1ns] Referential [2ns] Directional [3ns] Continua [3ns] Space [1ns] Points [1ns] Areas [2ns] 2D Dimensions SpaceRegions [3ns] 3D Dimensions Time [2ns] Instants [1ns] Intervals [2ns] Events [3ns] Duratives [3ns] Situations [1ns] Activities [2ns] Processes [3ns]
Generals [3ns] (== SuperTypes) SignElements [1ns] AttributeTypes [1ns] RelationTypes [2ns] SituationTypes Symbols [3ns] Primitives [1ns] Structures [2ns] Conventions [3ns] Constituents [2ns] NaturalPhenomena [1ns] SpaceTypes [2ns] Shapes [1ns] Places [2ns] LocationPlace AreaRegion Forms [3ns] TimeTypes [3ns] Times [1ns] EventTypes [2ns] ActivityTypes [3ns] Manifestations [3ns] NaturalMatter [1ns] AtomsElements [1ns] NaturalSubstances [2ns] Chemistry [3ns] OrganicMatter [2ns] OrganicChemistry [1ns] BiologicalProcesses LivingThings [2ns] Prokaryotes [1ns] Eukaryotes [2ns] ProtistsFungus [1ns] Plants [2ns] Animals [3ns] Diseases [3ns] Agents [3ns] Persons [1ns] Organizations [2ns] Geopolitical [3ns] Symbolic [3ns] Information [1ns] AVInfo [1ns] VisualInfo AudioInfo WrittenInfo [2ns] StructuredInfo [3ns] Artifacts [2ns] FoodDrink Drugs Products Facilities Systems [3ns] MentalProcesses [1ns] Concepts [1ns] TopicsCategories [2ns] LearningProcesses [3ns] SocialProcesses [2ns] FinanceEconomy Society Methodeutic [3ns] InquiryMethods [1ns] KnowledgeDomains [2ns] EmergentKnowledge [3ns]
Where appropriate, each entry is also labeled with one of the Three Categories (1ns, 2ns, 3ns). Note that all of the typologies are shown under the main Generals (3ns) category. The main “core” typologies are shown in orange. (Note that TopicsCategories and KnowledgeDomains are big typologies, but are not disjoint in any way. Shapes is also a big typology, but about half of all entities have that type.)
A Graph View of the Structure
We can also view this structure as a graph, again showing the “core” typologies in orange. As might be expected, the KKO “core” typologies tend to occur at the periphery of this graph:
Of course, the purpose of all of this design is to provide a coherent, consistent, logical structure over which Cognonto may reason and link external data and schema. We will be discussing elsewhere the specific use cases and applications of this structure. For now, we wanted to set the stage for the design basis for KKO. This article will be a common reference to those subsequent discussions.
The KKO upper structure may be downloaded and inspected in greater detail.
For an introduction to Cognonto and KBpedia, its knowledge structure, see M.K. Bergman, 2016. “Cognonto is on the Hunt for Big AI Game”, AI3:::Adaptive Information blog, September 20, 2016.
See M.K. Bergman, 2016. “A Speculative Grammar for Knowledge Bases”, AI3:::Adaptive Information blog, June 20, 2016.
Attributes, Relations and Annotations comprise OWL properties. In general, Attributes correspond to the OWL datatype property; Relations to the OWL object property; and Annotations to the OWL annotation property. These specific OWL terms are not used in our speculative grammar, however, because some attributes may be drawn from controlled vocabularies, such as colors or shapes, that can be represented as one of a list of attribute choices. In these cases, such attributes are defined as object properties. Nonetheless, the mappings of our speculative grammar to existing OWL properties are quite close.
As I earlier wrote, “Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.” In the diagram above, the middle column represents particulars, or the ABox components. The definition of items and the first and third columns represent TBox components.
See M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web”, AI3:::Adaptive Information blog, July 13, 2015.
What does the idea of knowledge mean to you?
The entry for knowledge on Wikipedia says:
“Knowledge is a familiarity, awareness or understanding of someone or something, such as facts, information, descriptions, or skills, which is acquired through experience or education by perceiving, discovering, or learning. Knowledge can refer to a theoretical or practical understanding of a subject. It can be implicit (as with practical skill or expertise) or explicit (as with the theoretical understanding of a subject); it can be more or less formal or systematic. “
OK, that’s a lot of words. Rather than parse its specifics, let’s look at the basis of “knowledge” using a variety of simple examples.
Perplexing Perspectives and Confusing Contexts
Let’s take for an example the statement, the sky is blue. We can accept this as a factual statement (thus, assumed knowledge). But, we also know that the sky might be dark or black, if it is night. Or the sky may be gray if it is cloudy. Indeed, when we hear the statement that the sky is blue, if we believe the source or can see the sky for ourselves, then we can readily infer that the observation is occurring during daylight, under a clear sky. Our acceptance of an assertion as factual or being true carries with it the implications of its related contexts. On the other hand, were the simple statement to be le ciel est bleu, and if we did not know French, we would not know what to make of the statement, true or false, with context or not, even if all of the assertions are still correct.
This simple example carries with it two profound observations. First, context helps to determine whether or not we believe a given statement and, if we believe it, what the related context implied by the statement might be. Second, this information is conveyed to us via symbols (in this case, the English language, but equally any human, artificial or formal notation, such as mathematics), which we may or may not be able to interpret correctly. If I am monolingual in English and I see French statements, I do not know what the symbols mean.
Knowledge is that which is “true” in a given context or perspective. Is it a duck, or is it a rabbit? Knowledge may reside solely in our own minds, and not be part of “common knowledge”. But, ultimately, even personal beliefs not held by others only become “knowledge” that we can rely upon in our discourse once others have “acknowledged” the truth. Forward-looking thinkers like Copernicus or Galileo or Einstein may have understood something in their own minds not yet shared by others, but we do not “acknowledge” those understandings as knowledge until we can share and discuss the insight. (That is, what scientists would call independent verification.) In this manner, knowledge, like language and symbol-creation, is inherently a social phenomenon. If I coin a new word, but no one else understands what I am saying, that is not part of knowledge; that is gibberish.
OK, let’s take another example. This time we’ll take the simple case of a flower. What this panel of images shows is how a composite flower may be seen by first humans, then bees and then butterflies:
How Humans, Bees and Butterflies See A Daisy
In this example, we are highlighting the fact that different insects see objects with different wavelengths of light than we (humans) do. Bees see much more in the ultraviolet spectrum. The daisy flower knows how to attract this most important pollinator.
In a different example focused on human perception alone, look at these two panels on the left:
Color Blind, on left; What is ‘bank’?, on right
The middle picture shows us how the color-blind person “sees”, with reds and yellows washed out.
Or, let’s take another example, in this case the black-and-white word ‘bank’. We can see this word, and if we speak English, even recognize it, but what does this symbol mean? A financial institution? The shore of a river? Turning an airplane? A kind of pool shot? Tending a fire for the evening?
In all of these examples, there is an actual object that is the focus of attention. But what we “know” about this object depends on what we perceive or understand and who or what is doing the perceiving and the understanding. We can never fully “know” the object because we can never encompass all perspectives and interpretations.
KR Models Need to Represent Knowledge via Context and Perspective
Every knowledge structure used for knowledge representation (KR) or knowledge-based artificial intelligence (KBAI) needs to be governed by some form of conceptual schema. In the semantic Web space, such schema are known as ontologies, since they attempt to capture the nature or being (Greek ὄντως, or ontós) of the knowledge domain at hand. Because the word ‘ontology’ is a bit intimidating, a better variant has proven to be the knowledge graph (because all semantic ontologies take the structural form of a graph). In Cognonto’s KBAI efforts, we tend to use the terms ontology and knowledge graph interchangeably.
In general knowledge domains, such schema are also known as upper ontologies. However, one of the first things we see with existing ontologies is that they tend to be organized around a single, dyadic dimension, even though guided by a diversity of conceptual approaches. In the venerable Cyc knowledge structure, one of the major divisions is between what is tangible and what is intangible. In BFO, the Basic Formal Ontology, the split is between a “snapshot” view of the world (continuant) and its entities versus a “spanning” view that is explicit about changes in things over time (occurrent). Other upper ontologies have different dyadic splits, such as abstract v. physical, perdurant v. endurant, dependent v. independent, particulars v. universals, or determinate v. indeterminate. I’m sure there are others.
Ontologies are designed for specific purposes, and the bases for these splits in other ontologies have their rationales and uses. But in Cognonto’s case of needing to design an ontology whose specific purpose is knowledge representation, we need to explicitly model the nature of knowledge. Knowledge is not black and white, nor is it shades of gray along a single dimension. Knowledge is an incredibly rich construct intimately related to context and perspective. The minimum cardinality that can provide such perspective is three.
If we return to the examples that began this article, we begin to see the interaction of three separate things. We have the actual thing itself, be it an object or a phenomenon. It is what it is. Then, we have a way that that thing is conveyed or represented. It might be an image, a sound, a perception, a finger pointing at it, or a symbol (or combination of symbols such as a description) of it. Then we have how that representation is perceived. It is in the interplay of these three separate things that something is “understood” or becomes “knowledge” (that is, a sign). This triadic view of the world was first articulated by Charles Sanders Peirce (1839-1914) (pronounced “purse”), the great American logician, philosopher and polymath.
Peirce’s logic of signs in fact is a taxonomy of sign relations, in which signs get reified and expanded via still further signs, ultimately leading to communication, understanding and an approximation of “canonical” truth. Peirce saw the scientific method as itself an example of this process.
A given sign is a representation amongst the triad of the sign itself (which Peirce called a representamen, the actual signifying item that stands in a well-defined kind of relation to the two other things), its object and its interpretant. The object is the actual thing itself. The interpretant is how the agent or the perceiver of the sign understands and interprets the sign. Depending on the context and use, a sign (or representamen) may be either an icon (a likeness), an indicator or index (a pointer or physical linkage to the object) or a symbol (understood convention that represents the object, such as a word or other meaningful signifier).
An interpretant in its barest form is a sign’s meaning, implication, or ramification. For a sign to be effective, it must represent an object in such a way that it is understood and used again. This makes the assignment and use of signs a community process of understanding and acceptance, as well as a truth-verifying exercise of testing and confirming accepted associations (such as the meanings of words or symbols).
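To make the triad concrete, here is a minimal Python sketch of a sign as a small data structure, using the ‘bank’ example from earlier. The class names, the `SENSES` table and the context labels are hypothetical and chosen only for illustration; they are not part of KKO or of any Cognonto code.

```python
from dataclasses import dataclass
from enum import Enum


class SignKind(Enum):
    ICON = "icon"      # a likeness of the object
    INDEX = "index"    # a pointer or physical linkage to the object
    SYMBOL = "symbol"  # an understood convention, such as a word


@dataclass(frozen=True)
class Sign:
    representamen: str  # the signifying item itself
    kind: SignKind
    obj: str            # the actual thing referred to
    interpretant: str   # how the perceiver understands the sign


# Hypothetical lookup: (representamen, context) -> (object, interpretant).
SENSES = {
    ("bank", "finance"): ("financial institution", "a place that holds money"),
    ("bank", "river"): ("river shore", "the land alongside a river"),
    ("bank", "aviation"): ("aircraft turn", "tilting an airplane to turn"),
}


def interpret(representamen: str, context: str) -> Sign:
    """Resolve a written word to a full sign triad for one context."""
    obj, interpretant = SENSES[(representamen, context)]
    return Sign(representamen, SignKind.SYMBOL, obj, interpretant)


river_sign = interpret("bank", "river")
finance_sign = interpret("bank", "finance")
```

The same representamen yields different objects and interpretants depending on context, which is the point of the ‘bank’ example.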
The key aspect of signs for Peirce, though, is the ongoing process of interpretation and reference to further signs, a process he called semiosis. A sign of an object leads to interpretants, which, as signs, then lead to further interpretants.
Relation to Cognonto’s KBpedia Knowledge Ontology
The essence of knowledge is that it is ever-growing and expandable. New insights bring new relations and new truths. The structures we use to represent this knowledge must themselves adapt and reflect the best of our current, testable understandings. Peirce saw the trichotomous parts of his sign logic as the fewest “indecomposable” parts needed to model the real world; we would call these “primitives” in modern terminology. Robert Burch has called Peirce’s ideas of “indecomposability” the ‘Reduction Thesis’. The basic thesis is that ternary relations suffice to construct any and all arbitrary relations, but that not all relations can be constructed from unary and binary relations alone. Threes are irreducible to capture the basis of knowledge.
With its express purpose to provide a sound basis for modeling knowledge, essential to knowledge-based artificial intelligence, Cognonto’s governing schema, the KBpedia Knowledge Ontology (KKO), is the first knowledge graph to explicitly embrace this triadic logic. Later articles will discuss KKO in much greater detail. Peirce’s logic of semiosis and his three universal categories provide the missing perspective of classing and categorizing the world around us. The irreducible truth of ‘threes’ is the essential foundation for representing knowledge and language.
Similar senses are conveyed by the Wiktionary definition of knowledge.
From Photography of the Invisible World (http://photographyoftheinvisibleworld.blogspot.com/2013/11/today-more-about-my-last-surviving.html).
See http://allday.com/post/3919-this-is-what-its-actually-like-to-be-colorblind/ color_blind.png.
See, for example, Ludger Jansen, 2008. “Categories: The Top-level Ontology,” Applied ontology: An introduction (2008): 173-196, and Nicola Guarino, 1997. “Some Organizing Principles For A Unified Top-Level Ontology,” National Research Council, LADSEB-CNR Int. Report, V3.0, August 1997.
M.K. Bergman, 2012. “Give Me a Sign: What Do Things Mean on the Semantic Web?”, AI3:::Adaptive Information blog, January 24, 2012.
See further Catherine Legg, 2010. “Pragmaticism on the Semantic Web,” in Bergman, M., Paavola, S., Pietarinen, A.-V., & Rydenfelt, H. eds., Ideas in Action: Proceedings of the Applying Peirce Conference, pp. 173–188. Nordic Studies in Pragmatism 1. Helsinki: Nordic Pragmatism Network.
See Robert Burch, 1991. A Peircean Reduction Thesis: The Foundations of Topological Logic, Texas Tech University Press, Lubbock, TX. Peirce’s reduction thesis is never stated explicitly by Peirce, but is alluded to in numerous snippets.
Fred Giasson and I today announced the unveiling of a new venture, Cognonto. We have been working on this venture very hard for at least the past two years. But, frankly, Cognonto represents bringing into focus ideas and latent opportunities that we have been seeing for much, much longer.
The fundamental vision for Cognonto is to organize the information in large-scale knowledge bases so as to efficiently support knowledge-based artificial intelligence (KBAI), a topic I have been writing about much over the past year. Once such a vision is articulated, the threads necessary to bring it to fruition come into view quickly. First, of course, the maximum amount of information possible in the source knowledge bases needs to be made digital and represented with semantic Web technologies such as RDF and OWL. Second, since no source alone is adequate, the contributing knowledge bases need to be connected and made to work with one another in a logical and consistent manner. And, third, an overall schema needs to be put in place that is coherent and geared specifically to knowledge representation and machine learning.
The result from achieving these aims is to greatly lower the time and cost to prepare inputs to, and improve the accuracy in, machine learning. This result applies particularly to supervised machine learning for knowledge-related applications. But, if achieved, the resulting rich structure and extensive features also lend themselves to unsupervised and deep learning, as well as to provide a powerful substrate for schema mapping and data interoperability.
Today, we’ve now made sufficient progress on this vision to enable us to release Cognonto, and the KBpedia knowledge structure at its core. Combined with local data and schema, there is much we can do with the system. But another exciting part is that the sky is the limit in terms of honing the structure, growing it, and layering more AI applications upon it. Today, with Cognonto’s release, we begin that process.
You can begin to see the power and the structure yourself via Cognonto’s online demo, as shown above, which showcases a portion of the system’s functionality.
Problem and Opportunity
Artificial intelligence (AI) and machine learning are revolutionizing knowledge systems. Improved algorithms and faster graphics chips have been contributors. But the most important factor in knowledge-based AI’s renaissance, in our opinion, has been the availability of massive digital datasets for the training of machine learners.
Wikipedia and data from search engines are central to recent breakthroughs. Wikipedia is at the heart of Siri, Cortana, the former Freebase, DBpedia, Google’s Knowledge Graph and IBM’s Watson, to name just a prominent few AI question answering systems. Natural language understanding is showing impressive gains across a range of applications. To date, all of these examples have been the result of bespoke efforts. It is very expensive for standard enterprises to leverage these knowledge resources on their own.
Today’s practices pose significant upfront and testing effort. Much latent knowledge remains unexpressed and not easily available to learners; it must be exposed, cleaned and vetted. Further upfront effort needs to be spent on selecting the features (variables) used and then to accurately label the positive and negative training sets. Without “gold standards” — at still more cost — it is difficult to tune and refine the learners. The cost to develop tailored extractors, taggers, categorizers, and natural language processors is simply too high.
So recent breakthroughs demonstrate the promise; now it is time to systematize the process and lower the costs. The insight behind Cognonto is that existing knowledge bases can be staged to automate much of the tedium and reduce the costs now required to set up and train machine learners for knowledge purposes. Cognonto’s mission is to make knowledge-based artificial intelligence (KBAI) cheaper, repeatable, and applicable to enterprise needs.
Cognonto (a portmanteau of ‘cognition’ and ‘ontology’) exploits large-scale knowledge bases and semantic technologies for machine learning, data interoperability and mapping, and fact and entity extraction and tagging. Cognonto puts its insight into practice through a knowledge structure, KBpedia, designed to support AI, and a management framework, the Cognonto Platform, for integrating external data to gain the advantage of KBpedia’s structure. We automate away much of the tedium and reduce costs in many areas, but three of the most important are:
KBpedia is a computable knowledge structure resulting from the combined mapping of six, large-scale, public knowledge bases — Wikipedia, Wikidata, OpenCyc, GeoNames, DBpedia and UMBEL. The KBpedia structure separately captures entities, attributes, relations and topics. These are classed into a natural and rich diversity of types, with their meaning and relationships logically and coherently organized. This diagram, one example from the online demo, shows the topics captured for the main Cognonto page in relation to the major typologies within KBpedia:
Each of the six knowledge bases has been mapped and re-expressed into the KBpedia Knowledge Ontology. KKO follows the universal categories and logic of the 19th century American mathematician and philosopher, Charles Sanders Peirce, the subject of my last article. KKO is a computable knowledge graph that supports inference, reasoning, aggregations, restrictions, intersections, and other logical operations. KKO’s logic basis provides a powerful way to represent individual things, classes of things, and how those things may combine or emerge as new knowledge. You can inspect the upper portions of the KKO structure on the Cognonto Web site. Better still, if you have an ontology editor, you can download and inspect the open source KKO directly.
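As a rough illustration of what “supports inference” can mean in practice, the following Python sketch derives inferred class memberships from direct assertions by walking a subclass hierarchy. The class names, the single-parent `SUBCLASS_OF` table and the `DIRECT_TYPES` assertions are invented for this example; KKO itself is expressed in OWL and would normally be processed with an OWL reasoner rather than code like this.

```python
# Hypothetical single-parent class hierarchy (child -> parent).
SUBCLASS_OF = {
    "Astronaut": "Person",
    "Person": "Agent",
}

# Hypothetical direct type assertions (entity -> directly asserted classes).
DIRECT_TYPES = {
    "SallyRide": {"Astronaut"},
}


def superclasses(cls: str) -> list[str]:
    """Return all ancestors of a class, nearest first."""
    ancestors: list[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in ancestors:  # guard against accidental cycles
            break
        ancestors.append(cls)
    return ancestors


def inferred_types(entity: str) -> set[str]:
    """Combine direct assertions with memberships implied by subclassing."""
    result: set[str] = set()
    for cls in DIRECT_TYPES.get(entity, ()):
        result.add(cls)
        result.update(superclasses(cls))
    return result


sally_types = inferred_types("SallyRide")
```

One direct assertion yields three class memberships here, which hints at why the inferred counts in a large structure can far exceed the direct ones.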
KBpedia contains nearly 40,000 reference concepts (RCs) and about 20 million entities. The combination of these and KBpedia’s structure results in nearly 7 billion logical connections across the system, as these KBpedia statistics (current as of today’s version 1.02 release) show:
Measure                                        Value
No. KBpedia reference concepts (RCs)           38,930
No. mapped vocabularies                        27
    Core knowledge bases                       6
    Extended vocabularies                      21
No. mapped classes                             138,868
    Core knowledge bases                       137,203
    Extended vocabularies                      1,665
No. typologies (SuperTypes)                    63
    Core entity types                          33
    Other core types                           5
    Extended                                   25
Typology assignments                           545,377
No. aspects                                    80
Direct entity assignments                      88,869,780
Inferred entity aspects                        222,455,858
No. unique entities                            19,643,718
Inferred no. of entity mappings                2,772,703,619
Total no. of “triples”                         3,689,849,726
Total no. of inferred and direct assertions    6,482,197,063
First Release KBpedia Statistics
About 85% of the RCs are themselves entity types — that is, 33,000 natural classes of similar entities such as ‘astronauts’ or ‘breakfast cereals’ — which are organized into about 30 “core” typologies that are mostly disjoint (non-overlapping) with one another. KBpedia has extended mappings to a further 20 other vocabularies, including schema.org, Dublin Core, and others; client vocabularies are typical additions. The typologies provide a flexible means for slicing-and-dicing the knowledge structure; the entity types provide the tie-in points to KBpedia’s millions of individual instances (and for your own records). KBpedia is expressed in the semantic Web languages of OWL and RDF. Thus, most W3C standards may be applied against the KBpedia structure, including for linked data, a standard option.
KBpedia is purposefully designed to enable meaningful splits across any of its structural dimensions — concepts, entities, relations, attributes, or events. Any of these splits — or other portions of KBpedia’s rich structure — may be the computable basis for training taggers, extractors or classifiers. Standard NLP and machine learning reference standards and statistics are applied during the parameter-tuning and learning phases. Multiple learners and recognizers may also be combined as different signals to an ensemble approach to overall scoring. Alternatively, KBpedia’s slicing-and-dicing capabilities may drive export routines to use local or third-party ML services under your own control.
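The idea of using structural splits to build training sets can be sketched in a few lines of Python. Everything below is a toy assumption: the typology names, entity types, instances and the `DISJOINT` table are made up, and real KBpedia slices would come from the knowledge graph itself. The sketch only shows the principle that instances under a target typology can serve as positive examples while instances under typologies declared disjoint with it can serve as negatives.

```python
# Hypothetical miniature structure: typology -> entity type -> instances.
TYPOLOGIES = {
    "Animals": {"Mammal": ["dog", "whale"], "Bird": ["robin"]},
    "Products": {"BreakfastCereal": ["muesli", "corn flakes"]},
    "Persons": {"Astronaut": ["Sally Ride", "Yuri Gagarin"]},
}

# Assumed disjointness assertions between typologies. "Mostly disjoint" means
# some pairs (here Animals and Persons) are deliberately left undeclared.
DISJOINT = {
    "Animals": {"Products"},
    "Products": {"Animals", "Persons"},
    "Persons": {"Products"},
}


def instances_of(typology: str) -> set[str]:
    """Collect every instance under all entity types of a typology."""
    return {
        instance
        for instances in TYPOLOGIES[typology].values()
        for instance in instances
    }


def training_sets(target: str) -> tuple[set[str], set[str]]:
    """Positives come from the target; negatives from disjoint typologies."""
    positives = instances_of(target)
    negatives: set[str] = set()
    for other in DISJOINT.get(target, ()):
        negatives |= instances_of(other)
    return positives, negatives


positives, negatives = training_sets("Products")
```

Because negatives are drawn only from typologies asserted to be disjoint, an instance cannot land in both sets, so long as the disjointness assertions themselves are correct.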
Though usable in a standalone mode, only slices of KBpedia may be applicable to a given problem or domain, which then most often need to be extended with local data and schema. Cognonto has services to incorporate your own domain and business data, critical to fulfill domain purposes and to respond to your specific needs. We transform your external and domain data into KBpedia’s canonical forms for interacting with the overall structure. Such data may include other public databases, but also internal, customer, product, partner, industry, or research information. Data may range from unstructured text in documents to semi-structured tags or metadata to spreadsheets or fully structured databases. The formats of the data may span hundreds of document types to all flavors of spreadsheets and databases.
Platform and Technology
Cognonto’s modular technology is based on Web-oriented architectures. All functionality is exposed via Web services and programmatically in a microservice design. The technology for Cognonto resides in three inter-related areas:
The Cognonto Web services may be manipulated directly from the command line or via cURL calls, or by simple HTML interfaces, by SPARQL, or programmatically. The Web services are written in Clojure and follow literate programming practices.
There is a lot going on with many results panels and with links throughout the structure. There is a ‘How to’? for the knowledge graph if you really want to get your hands dirty.
These platform, technology, and knowledge structure capabilities combine to enable us to offer services across the full spectrum of KBAI applications, including:
Cognonto is a foundation for doing serious knowledge-based artificial intelligence.
Today and Tomorrow
Despite the years we have been working on this, it very much feels like we are at the beginning. There is so much more that can be done.
First, we need to continue to wring out errors and mis-assignments in the structure. We estimate an error rate of 1-2% currently, but that still represents millions of potential errors. The objective is not to be more accurate than alternatives, which we already are, but to be the most effective foundation possible for training machine learners. Further cleaning will result in still better standards and mappings. Throughout the interactive knowledge graph we have a button for submitting errors; please do submit if you see any problems!
Second, we are seeing the value of exposing structure, and the need to keep doing so. Each iteration of structure gets easier, because prior ones may be applied to automate much of the testing and vetting effort for the subsequent ones. Structure provides the raw feature (variable) grist used by machine learners. We have a very long punch list of where we can effectively add more structure to KBpedia.
And, last, we need to extend the mappings to more knowledge bases, more vocabularies, and more schema. This kind of integration is really what smooths the way to data integration and interoperability. Virtually every problem and circumstance requires including local and external information.
We know there are many important uses, and much upside potential, for codifying knowledge bases for AI and machine learning purposes. Drop me a line if you’d like to discuss how we can help you leverage your own domain and business data using knowledge-based AI.
Many of us involved in semantic technologies or information science grapple with the question of categorization. How do we provide a coherent organization of the world that makes sense? Better still, how might we represent this coherent structure in a manner that informs how we can extend or grow our knowledge domains? Most problems of a practical nature require being able to combine information together so as to inform new knowledge. Categories that bring together (generalize) similar things are a key way to aid that.
Embracing semantic technologies means, among standards and other things, that the natural structural representation of domains is the graph. These are formally specified using either RDF or OWL. These ontologies have objects as nodes, and properties between those nodes as edges. I believe in this model, and have worked for at least a decade to promote its use. It is the model used by Google’s knowledge graph, for example.
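For readers who have not worked with this model, here is a minimal sketch of a graph built from subject-predicate-object statements, written in plain Python rather than with an RDF library. The identifiers and the `TRIPLES` data are illustrative assumptions; real RDF uses IRIs and a proper triple store.

```python
from collections import defaultdict

# Illustrative (subject, predicate, object) statements.
TRIPLES = [
    ("Astronaut", "subClassOf", "Person"),
    ("SallyRide", "type", "Astronaut"),
    ("SallyRide", "bornIn", "LosAngeles"),
    ("LosAngeles", "type", "City"),
]


def build_graph(triples):
    """Index triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def neighbors(graph, node, predicate=None):
    """Follow outgoing edges from a node, optionally filtered by edge label."""
    return [
        target
        for label, target in graph.get(node, [])
        if predicate is None or label == predicate
    ]


graph = build_graph(TRIPLES)
sally_classes = neighbors(graph, "SallyRide", "type")
sally_all = neighbors(graph, "SallyRide")
```

Objects and classes are the nodes and each property is a labeled edge, so asking a question of the graph amounts to following edges from a starting node.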
Knowledge graphs that are upper ontologies typically have 80% to 85% of their nodes acting to group similar objects, mostly what could be axiomatized as ‘classes’ or ‘types’. This realization naturally shifts the focus to how these groups are formed. What are the bases to place multiple instances into a given class? Are types the same things as classes?
Knowledge, inherently open and dynamic, can only be used for artificial intelligence when it is represented by structures readable by machines. Digitally readable structures of knowledge and features are essential for machine learning, natural language understanding, or other AI functions. Indeed, were such structures able to be expressed in a mostly automatic way, the costs and efforts to perform AI and natural language processing and understanding functions (NLP and NLU) would be greatly lessened.
Open and dynamic also means that keeping the knowledge base current requires simple principles to educate and train those charged with keeping the structure up to date. Nothing is perfect, humans or AI. Discovery and truth only result from questioning and inspection. The entire knowledge graph is fallible and subject to growth and revision. Human editors, trained and capable, are essential to maintain the integrity of such structures, automation or AI notwithstanding. Fundamentally, then, the challenge becomes how to think simply about grouping things and forming categories. Discovery of simplicity is hard without generalization and deep thought.
A Peircean View in Thirdness
Scholars of Charles Sanders Peirce (“purse”) (1839 – 1914) all acknowledge how infused his writings on logic, semiosis, philosophy, and knowledge are with the idea of “threes”. His insights are perhaps most studied with respect to his semiosis of signs, with the triad formed by object, representation, and interpretation. But Peirce recognized many prior philosophers, particularly Kant and Hegel, had also made “threes” a cornerstone of their views. Peirce studied and wrote on what makes “threes” essential and irreducible. His generalization, or abstraction if you will, he called simply the Three Categories, and to reflect their fundamental nature, he named them Firstness, Secondness and Thirdness. In his writings over decades, he related or described this trichotomy in dozens of contexts.
Across his voluminous writings, which unfortunately are not all available since they are still being transcribed from tens of thousands of original handwritten notes, I glean from the available materials this understanding of his three categories from a knowledge representation standpoint:
Understanding, inquiry and knowledge require this irreducible structure; connections, meaning and communication depend on all three components, standing in relation to one another and subject to interpretation by multiple agents (Peirce’s semiosis of signs). Contrast this Peircean view with traditional classification schemes, which have a dyadic or dichotomous nature and do not support such rich views of context and interpretation.
Peirce’s “surprising fact” is new knowledge that emerges from anomalies observed when attempting to generalize or to form habits. Abductive reasoning, a major contribution by Peirce, attempts to probe why the anomaly occurs. The possible hypotheses so formed constitute the Firstness or potentials of a new categorization (identification of particulars and generalization of the phenomena). The scientific method is grounded in this process and reflects the ideal of this approach (what Peirce called the “methodeutic”).
Peirce at a High Altitude
Significant terms we associate with knowledge and its discovery include open, dynamic, process, representation, signification, interpretation, logic, coherence, context, reality, and truth. These were all topics of Peirce’s deep inquiry and explained by him via his triadic world view. For example, Peirce believed in the real as having existence apart from the mind (a refutation of Descartes’ view). He believed there is truth, that it can be increasingly revealed by the scientific method and social consensus (agreement of signs), but current belief as to what is “truth” is fallible and can never be realized in the absolute (it is a limit function). There is always distance and different interpretation between the object, its representation, and its interpretation. But this same logic provides the explanation for the process of categorization, also grounded in Firstness, Secondness and Thirdness.
Of course, some Peircean scholars may rightfully see these explanations as a bit of a cartoon, and a possible injustice to his lifetime of work. For more than 100 years philosophers and logicians have tried to plumb Peirce’s insights and writings. This summary by no means captures many subtleties. But, if we ourselves generalize across Peirce’s writings and his application of the Three Categories, we can gain a mindset that, I submit, is both easily grasped and applied, the result of which is a logical, coherent approach to categorization and knowledge representation.
First, we decide the focus of the categorization effort. That may arise from one of three sources. We either are trying to organize a knowledge domain anew; we are splitting an existing category that has become too crowded and difficult to reason over; or we have found a “surprising fact” or are trying to plumb an anomaly. Any of these can trigger the categorization process (and, notice, they are in 1ns, 2ns and 3ns splits). The breadth or scope of the category is based on the domain and the basis of the categorization effort.
How to think about the new category and decide its structure comes from the triad:
What constitutes the potentials, realized particulars, and generalizations that may be drawn from a query or investigation is contextual in nature. I outlined more of the categorization process in an earlier article.
Peirce’s triadic logic is a powerful mindset for how to think about and organize the things and ideas in our world. Peirce’s triadic logic and views on categorization are fractal in nature. We can apply this triadic logic to any level of information granularity. The graph structure arises from the connections amongst all of these 1ns, 2ns and 3ns factors.
We will be talking further about how this 40,000 ft view of the Peircean mindset helps create practical knowledge graphs and ontological structures. We will also be showing an example suitable for knowledge-based artificial intelligence (KBAI). The exciting point is that we have found a simple grounding of three aspects that is logically sound and can be readily trained. We also will be showing how we can do so much more work against this kind of natural KBAI structure.
Stay tuned.
A tremendous starting point for information on Peirce is the category about him on Wikipedia, starting with his eponymous page.
M.K. Bergman, 2016. “A Foundational Mindset: Firstness, Secondness, Thirdness,” AI3:::Adaptive Information blog, March 21, 2016.