Feed aggregator

New, Major Upgrade of UMBEL Released

AI3:::Adaptive Information (Mike Bergman) - Wed, 05/11/2016 - 13:55
Version 1.50 Fully Embraces a Typology Design, Gets Other Computability Improvements

The year since the last major release of UMBEL (Upper Mapping and Binding Exchange Layer) has been spent in a significant re-think of how the system is organized. Four years ago, in version 1.05, we began to split UMBEL into a core and a series of swappable modules. The first module adopted was in geographical information; the second was in attributes. This design served us well, but it was becoming apparent that we were on a path of multiple modules. Each of UMBEL’s major so-called ‘SuperTypes’ (that is, major cleavages of the overall UMBEL structure that are largely disjoint from one another, such as between Animals and Facilities) was amenable to the module design. This across-the-board potential cleavage of the UMBEL system caused us to stand back and question whether a module design alone was the best approach. Ultimately, after much thought and testing, we adopted instead a typology design that brought additional benefits beyond simple modularity.

Today, we are pleased to announce the release of these efforts in UMBEL version 1.50. Besides standard release notes, this article discusses this new typology design, and explains its uses and benefits.

Basic UMBEL Background

The Web and enterprises in general are characterized by growing, diverse and distributed information sources and data. Some of this information resides in structured databases; some resides in schema, standards, metadata, specifications and semi-structured sources; and some resides in general text or media where the content meaning is buried in unstructured form. Given these huge amounts of information, how can one bring together what subsets are relevant? And, then for candidate material that does appear relevant, how can it be usefully combined or related given its diversity? In short, how does one go about actually combining diverse information to make it interoperable and coherent?

UMBEL thus has two broad purposes. UMBEL’s first purpose is to provide a general vocabulary of classes and predicates for describing and mapping domain ontologies, with the specific aim of promoting interoperability with external datasets and domains. UMBEL’s second purpose is to provide a coherent framework of reference subjects and topics for grounding relevant Web-accessible content. UMBEL presently has about 34,000 of these reference concepts drawn from the Cyc knowledge base, organized into 31 mostly disjoint SuperTypes.

The grounding of information mapped by UMBEL occurs by common reference to the permanent URIs (identifiers) for UMBEL’s concepts. The connections within the UMBEL upper ontology enable concepts from sources at different levels of abstraction or specificity to be logically related. Since UMBEL is an open source extract of the OpenCyc knowledge base, it can also take advantage of the reasoning capabilities within Cyc.

Diagram showing linked data datasets. UMBEL is near the hub, below and to the right of the central DBpedia.

UMBEL’s vocabulary is designed to recognize that different sources of information have different contexts and different structures, and meaningful connections between sources are not always exact. UMBEL’s 34,000 reference concepts form a knowledge graph of subject nodes that may be related to external classes and individuals (instances and entities). Via this coherent structure, we gain some important benefits:

  • Mapping to other ontologies — disparate and heterogeneous datasets and ontologies may be related to one another by mapping to the UMBEL structure
  • A scaffolding for domain ontologies — more specific domain ontologies can be made interoperable by using the UMBEL vocabulary and tying their more general concepts into the UMBEL structure
  • Inferencing — the UMBEL reference concept structure is coherent and designed for inferencing, which supports better semantic search and look-ups
  • Semantic tagging — UMBEL, and ontologies mapped to it, can be used as input bases to ontology-based information extraction (OBIE) for tagging text or documents; UMBEL’s “semsets” broaden these matches and can be used across languages
  • Linked data mining — via the reference ontology, direct and related concepts may be retrieved and mined and then related to one another
  • Creating computable knowledge bases — with complete mappings to key portions of a knowledge base, say, for Wikipedia articles, it is possible to use the UMBEL graph structure to create a computable knowledge source, with follow-on benefits in artificial intelligence and KB testing and improvements, and
  • Categorizing instances and named entities — UMBEL can bring a consistent framework for typing entities and relating their descriptive attributes to one another.

UMBEL is written in the semantic Web languages of SKOS and OWL 2. It is a class structure used in linked data, along with other reference ontologies. Besides data integration, UMBEL has been used to aid concept search, concept definitions, query ranking, ontology integration, and ontology consistency checking. It has also been used to build large ontologies and for online question answering systems [1].

Including OpenCyc, UMBEL has about 65,000 formal mappings to DBpedia, PROTON, GeoNames, and schema.org, and provides linkages to more than 2 million Wikipedia pages (English version). All of its reference concepts and mappings are organized under a hierarchy of 31 different SuperTypes, which are mostly disjoint from one another. Development of UMBEL began in 2007. UMBEL was first released in July 2008. Version 1.00 was released in February 2011.

Summary of Version 1.50 Changes

These are the principal changes between the last public release, version 1.20, and this version 1.50. In summary, these changes include:

  • Removed all instance or individual listings from UMBEL; this change does NOT affect the punning used in UMBEL’s design (see Metamodeling in Domain Ontologies)
  • Re-aligned the SuperTypes to better support computability of the UMBEL graph and its resulting disjointedness
  • These SuperTypes were eliminated with concepts re-assigned: Earthscape, Extraterrestrial, Notations and Numbers
  • These new SuperTypes were introduced: AreaRegion, AtomsElements, BiologicalProcesses, Forms, LocationPlaces, and OrganicChemistry, with logically reasoned assignments of RefConcepts
  • The Shapes SuperType is a new ST that is inherently non-disjoint because it is shared with about half of the RefConcepts
  • Situations is an important ST, overlooked in prior efforts, that helps better establish context for Activities and Events
  • Made re-alignments in UMBEL’s upper structure and introduced additional upper-level categories to better accommodate these refinements in SuperTypes
  • A typology was created for each of the resulting 31 disjoint STs, which enabled missing concepts to be identified and added, and the concepts within each given ST to be better organized
  • The broad adoption of the typology design for all of the (disjoint) SuperTypes also meant that prior module efforts, specifically Geo and Attributes, could now be made general to all of UMBEL. This re-integration also enabled us to retire these older modules without affecting functionality
  • The tests and refinements necessary to derive this design caused us to create flexible build and testing scripts, documented via literate programming (using Clojure)
  • Updated all mappings to DBpedia, Wikipedia, and schema.org
  • Incorporated donated mappings to five additional LOV vocabularies [2]
  • Tested the UMBEL structure for consistency and coherence
  • Updated all prior UMBEL documentation
  • Expanded and updated the UMBEL.org Web site, with access and demos of UMBEL.

UMBEL’s SuperTypes

The re-organizations noted above have resulted in some minor changes to the SuperTypes and how they are organized. These changes have made UMBEL more computable with a higher degree of disjointedness between SuperTypes. (Note, there are also organizational SuperTypes that work largely to aid the top levels of the knowledge graph, but are explicitly designed to NOT be disjoint. Important SuperTypes in this category include Abstractions, Attributes, Topics, Concepts, etc. These SuperTypes are not listed below.)

UMBEL thus now has 31 largely disjoint SuperTypes, organized into 10 or so clusters or “dimensions”:

  • Constituents: Natural Phenomena, Area or Region, Location or Place, Shapes, Forms, Situations
  • Time-related: Activities, Events, Times
  • Natural Matter: Atoms and Elements, Natural Substances, Chemistry
  • Organic Matter: Organic Chemistry, Biochemical Processes
  • Living Things: Prokaryotes, Protists & Fungus, Plants, Animals, Diseases
  • Agents: Persons, Organizations, Geopolitical
  • Artifacts: Products, Food or Drink, Drugs, Facilities
  • Information: Audio Info, Visual Info, Written Info, Structured Info
  • Social: Finance & Economy, Society

These disjoint SuperTypes provide the basis for the typology design described next.
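
To make the idea of disjointness concrete, here is a minimal sketch of a pairwise disjointness check. It is not UMBEL's actual build or testing code (those scripts are written in Clojure); the concept names and the `overlapping_pairs` helper are assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical, hand-picked assignments of reference concepts to SuperTypes;
# the concept names are illustrative and not taken from the UMBEL files.
supertypes = {
    "Animals": {"Mammal", "Bird", "Fish"},
    "Facilities": {"Airport", "Hospital", "Bridge"},
    "Plants": {"Tree", "Fern"},
}


def overlapping_pairs(assignments):
    """Return (name_a, name_b, shared_concepts) for every non-disjoint pair."""
    overlaps = []
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(assignments.items()), 2):
        shared = set_a & set_b
        if shared:
            overlaps.append((name_a, name_b, shared))
    return overlaps


# An empty list means every pair of SuperTypes in this toy data is disjoint.
print(overlapping_pairs(supertypes))
```

A SuperType that is deliberately non-disjoint, such as Shapes, would show up in the returned list, which is one simple way to tell it apart from the disjoint ones.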

The Typology Design

After a few years of working with SuperTypes it became apparent each SuperType could become its own “module”, with its own boundaries and hierarchical structure. Since across the UMBEL structure nearly 90% of the reference concepts are themselves entity classes, if these are properly organized, we can achieve a maximum of disjointness, modularity, and reasoning efficiency. Our early experience with modules pointed the way to a design for each SuperType that was as distinct and disjoint from other STs as possible. And, through a logical design of natural classes [3] for the entities in that ST, we could achieve a flexible, ‘accordion-like’ design that provides entity tie-in points from the general to the specific for each given SuperType. The design is effective for being able to interoperate across both fine-grained and coarse-grained datasets. For specific domains, the same design approach allows even finer-grained domain concepts to be effectively integrated.

All entity classes within a given SuperType are thus organized under the SuperType itself as the root. The classes within that ST are then organized hierarchically, with children classes having a subClassOf relation to their parent. Each class within the typology can become a tie-in point for external information, providing a collapsible or expandable scaffolding (the ‘accordion’ design). Via inferencing, multiple external sources may be related to the same typology, even though at different levels of specificity. Further, very detailed class structures can also be accommodated in this design for domain-specific purposes. Moreover, because of the single tie-in point for each typology at its root, it is also possible to swap out entire typology structures at once, should design needs require this flexibility.
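
The ‘accordion’ behavior can be sketched in a few lines. The following is a toy illustration under stated assumptions, not UMBEL's implementation: the class names, the external dataset identifiers, and the `ancestors` and `is_a` helpers are all hypothetical. It shows two external sources tied in at different levels of one typology and related through its subClassOf chain.

```python
# Toy typology: child class -> parent class (a subClassOf relation).
# Class names are illustrative assumptions, not actual UMBEL reference concepts.
SUBCLASS_OF = {
    "Mammal": "Animals",
    "Bird": "Animals",
    "Dog": "Mammal",
    "Eagle": "Bird",
}

# External sources tie in at whatever level of specificity they support.
TIE_INS = {
    "coarse_dataset:animal_record": "Animals",
    "fine_dataset:beagle_entry": "Dog",
}


def ancestors(cls, subclass_of):
    """Walk subClassOf links upward and return the chain ending at the root."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def is_a(external_id, target_cls):
    """True if the external item's tie-in class is target_cls or a subclass of it."""
    tie_in = TIE_INS[external_id]
    return tie_in == target_cls or target_cls in ancestors(tie_in, SUBCLASS_OF)


print(ancestors("Dog", SUBCLASS_OF))                 # ['Mammal', 'Animals']
print(is_a("fine_dataset:beagle_entry", "Animals"))  # True
print(is_a("coarse_dataset:animal_record", "Dog"))   # False
```

Because every chain ends at the SuperType root, the fine-grained entry is inferred to belong to the same typology as the coarse-grained record, while the reverse inference (from general to specific) does not hold.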

We have thus generalized the earlier module design to where every (mostly) disjoint SuperType now has its own separate typology structure. The typologies provide the flexible lattice for tying external content together at various levels of specificity. Further, the STs and their typologies may be removed or swapped out at will to deal with specific domain needs. The design also dovetails nicely with UMBEL’s build and testing scripts. Indeed, the evolution of these scripts via literate programming has also been a reinforcing driver for being able to test and refine the complete ST and typologies structure.

Still a Work in Progress

Though UMBEL retains its same mission as when the system was first formulated nearly a decade ago, we also see its role expanding. The two key areas of expansion are in UMBEL’s use to model and map instance data attributes and in acting as a computable overlay for Wikipedia (and other knowledge bases). These two areas of expansion are still a work in progress.

The mapping to Wikipedia is now about 85% complete. While we are testing automated mapping mechanisms, because of its central role we also need to vet all UMBEL-Wikipedia mapping assignments. This effort is pointing out areas of UMBEL that are over-specified, under-specified, and sometimes duplicative or in error. Our goal is to get to a 100% coverage point with Wikipedia, and then to exercise the structure for machine learning and other tests against the KB. These efforts will enable us to enhance the semsets in UMBEL as well as to move toward multilingual versions. This effort, too, is still a work in progress.

Despite these desired enhancements, we are using all aspects of UMBEL and its mappings to both aid these expansions and to test the existing mappings and structure. These efforts are proving the virtuous circle of improvements that is at the heart of UMBEL’s purposes.

Where to Get UMBEL and Learn More

The UMBEL Web site provides various online tools and Web services for exploring and using UMBEL. The UMBEL GitHub site is where you can download the UMBEL Vocabulary or the UMBEL Reference Concept ontology, both under a Creative Commons Attribution 3.0 license. Other documents and backup are also available from that location.

Technical specifications for UMBEL and its various annexes are available from the UMBEL wiki site. You can also download a PDF version of the specifications from there. You are also welcome to participate on the UMBEL mailing list or LinkedIn group.

[1] See further https://en.wikipedia.org/wiki/UMBEL.
[2] Courtesy of Jana Vataščinová (University of Economics, Prague) and Ondřej Zamazal (University of Economics, Prague, COSOL project).
[3] See, for example, M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web,” AI3:::Adaptive Information blog, July 13, 2015.

‘Deep Graphs’: A New Framework for Network Analysis

AI3:::Adaptive Information (Mike Bergman) - Tue, 04/05/2016 - 21:28
New Method Appears Promising for Machine Learning, Feature Generation

An exciting new network analysis framework was published today. The paper, Deep Graphs – A General Framework to Represent and Analyze Heterogeneous Complex Systems Across Scales, presents the background information and derivation of methods applied to this new approach for analyzing networks [1]. The authors of the paper, Dominik Traxl, Niklas Boers and Jürgen Kurths, also released the open source DeepGraph network analysis package, written in Python, for undertaking and conducting the analysis. Detailed online documentation accompanies the entire package.

The basic idea behind Deep Graphs is to segregate graph nodes and edges into types, which form supernodes and superedges, respectively. These grouped types then allow the graph to be partitioned into lattices, which can be intersected (as combinations of nodes and edges) to represent deeper graph structures embedded in the initial graph. The method can be applied to a graph representation of anything, since the approach is grounded in the graph primitives of nodes and edges using a multi-layer network (MLN) representation.

These deeper graph structures can themselves be used as new features for machine learning or other applications. A deep graph, which the authors formally define as a geometric partition lattice of the source graph, conserves the original information in the graph and allows it to be redistributed to the supernodes and superedges. Intersections of these may surface potentially interesting partitions of the graph that deserve their own analysis.
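
The grouping step can be illustrated with plain Python. This is only a sketch of the general idea of partitioning by type, not the DeepGraph package's API and not the authors' formal construction; the toy graph, the `type` feature and the `partition` helper are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Toy undirected graph; node names and the "type" feature are made up.
node_type = {"a": "station", "b": "station", "c": "sensor", "d": "sensor", "e": "sensor"}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def partition(node_type, edges):
    """Group nodes into supernodes by type and edges into superedges by type pair."""
    supernodes = defaultdict(set)
    for node, kind in node_type.items():
        supernodes[kind].add(node)
    superedges = Counter()
    for source, target in edges:
        # Undirected graph: order the type pair so (x, y) and (y, x) coincide.
        key = tuple(sorted((node_type[source], node_type[target])))
        superedges[key] += 1
    return dict(supernodes), dict(superedges)


supernodes, superedges = partition(node_type, edges)
print(supernodes)
print(superedges)
```

Every node lands in exactly one supernode and every edge is counted in exactly one superedge, so the counts on the superedges add up to the number of edges in the source graph. In that limited sense the original information is redistributed rather than lost.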

The examples the authors present show the suitability of the method for time-series data, using precipitation patterns in South America. However, as noted, the method applies to virtually any data that can be represented as a graph.

Though weighted graphs and other techniques have been used, in part, for portions of this kind of analysis in the past, this appears to be the first generalized method applicable to the broadest ways to aggregate and represent graph information. The properties associated with a given node may similarly be represented and aggregated. The aggregation of attributes may provide an additional means for mapping and relating external datasets to one another.

There are many aspects of this approach that intrigue us here at Structured Dynamics. First, we are always interested in network and graph analytical techniques, since all of our source schema are represented as knowledge graphs. Second, our specific approach to knowledge-based artificial intelligence places a strong emphasis on types and typologies for organizing entities (nodes and event relations) and we also separately segregate attribute property information [2]. And, last, finding embedded superstructures within the source graphs should also work to enhance the feature sets available for supervised machine learning.

We will later post our experiences in working with this promising framework.

[1] Dominik Traxl, Niklas Boers and Jürgen Kurths, “Deep Graphs – A General Framework to Represent and Analyze Heterogeneous Complex Systems Across Scales”, arXiv:1604.00971, April 5, 2016. To be published in Chaos: An Interdisciplinary Journal of Nonlinear Science.
[2] See M. K. Bergman, 2014. “Knowledge-based Artificial Intelligence,” from AI3:::Adaptive Information blog, November 17, 2014.

Withstanding the Test of Time

AI3:::Adaptive Information (Mike Bergman) - Mon, 03/28/2016 - 16:50
Long-lost Global Warming Paper is Still Pretty Good

My first professional job was being assistant director and then project director for a fifty-year look at the future of coal use by the US Environmental Protection Agency. The effort, called the Coal Technology Assessment (CTA), was started under the Carter Administration in the late 1970s, and then completed after Reagan took office in 1981. That era also spawned the Congressional Office of Technology Assessment. Trying to understand and forecast technological change was a big deal at that time.

We produced many, many reports from the CTA program, some of which were never published because of politics and whether or not they were at odds with the official policies of one or the other administration. Nonetheless, we did publish quite a few reports. Perhaps it is the sweetness of memory, but I also recollect we did a pretty good job. Now that more than 35 years have passed, it is possible to see whether we did a good job or not in our half-century forecasts.

The CTA program was the first to publish an official position of EPA on global warming [1], which we also backed up with a more formal academic paper [2]. I have often thought of that paper over the years, but I had only my memory of it, not a hard copy.

Last week, however, I was contacted by a post-doctoral researcher in Europe trying to track down early findings and recollections of some of the earliest efforts on global climate change. She had a copy of our early paper and was kind enough to send me a copy. I have since been able to find other copies online [2].

In reading over the paper again, I am struck by two things. First, the paper is pretty good, and still captures (IMO) the uncertainty of the science and how to conduct meaningful policy in the face of that uncertainty. And, second, but less positive, is the sense of how little truly has gotten done in the intervening decades. This same sense of déjà vu all over again applies to many of the advanced energy technologies — such as fuel cells, photovoltaics, and passive solar construction — we were touting at that time.

Of course, my own career has moved substantially from energy technologies and policy to a different one of knowledge representation and artificial intelligence. But, it is kind of cool to look back on the passions of youth, and to see that my efforts were not totally silly. It is also kind of depressing to see how little has really changed in nearly four decades.

[1] M.K. Bergman, 1980. “Atmospheric Pollution: Carbon Dioxide,” Environmental Outlook — 1980, Strategic Analysis Group, U.S. Environmental Protection Agency, EPA 600/8 80 003, July 1980, pp. 225-261.
[2] Kan Chen, Richard C. Winter, and Michael K. Bergman, 1980. “Carbon dioxide from fossil fuels: Adapting to uncertainty.” Energy Policy 8, no. 4 (1980): 318-330.

A Foundational Mindset: Firstness, Secondness, Thirdness

AI3:::Adaptive Information (Mike Bergman) - Mon, 03/21/2016 - 16:26

Unlocking Some Insights into Charles Sanders Peirce’s Writings

I first encountered Charles Sanders Peirce from the writings of John Sowa about a decade ago. I was transitioning my research interests from search and the deep Web to the semantic Web. Sowa’s writings are an excellent starting point for learning about logic and ontologies [1]. I was particularly taken by Sowa’s presentation on the role of signs in our understanding of language and concepts [2]. Early on it was clear to me that knowledge modeling needed to focus on the inherent meaning of things and concepts, not their surface forms and labels. Sowa helped pique my interest that Peirce’s theory of semiotics was perhaps a foundational basis for getting at these ideas.

In the decade since that first encounter, I have based my own writings on Peirce’s insights on a number of occasions [3]. I have also developed a fascination with his life, teachings and thoughts across many topics. I have become convinced that Peirce was the greatest American combination of philosopher, logician, scientist and mathematician, and quite possibly one of the greatest thinkers ever. While the current renaissance in artificial intelligence can certainly point to the seminal contributions of George Boole, Claude Shannon, and John von Neumann in computing and information theory (of course among many others), my own view, in which I am not alone, is that C.S. Peirce belongs in those ranks from the perspective of knowledge representation and the meaning of information.

“The primary task of ontology, as it was practiced by its founder Aristotle, is to bridge the gap between what exists and the languages, both natural and artificial, for talking and reasoning about what exists.” John Sowa [4]

Peirce is hard to decipher, for some of the reasons outlined below. Yet I have continued to try to crack the nut of Peirce’s insights because his focus is so clearly on the organization and categorization of information, essential to the knowledge foundations and ontologies at the center of Structured Dynamics’ client activities and my own intellectual passions. Most recently, I had one of those epiphanies from my study of Peirce that scientists live for, causing me to change perspective from specifics and terminology to one of mindset and a way to think. I found a key to unlock the meaning basis of information, or at least one that works for me. I try to capture a sense of those realizations in this article.

A Starting Point: Peirce’s Triadic Semiosis

Since it was the idea of sign-forming and the nature of signs in Peirce’s theory of semiosis that first caught my attention, it makes sense to start there. The figure to the right shows Peirce’s understanding of the basic, triadic nature of the sign. Triangles and threes pervade virtually all aspects of Peirce’s theories and metaphysics.

For Peirce, the appearance of a sign starts with the representamen, which is the trigger for a mental image (by the interpretant) of the object [20]. The object is the referent of the representamen sign. None of the possible bilateral (or dyadic) relations of these three elements, even combined, can produce this unique triadic perspective. A sign cannot be decomposed into something more primitive while retaining its meaning.

A sign is an understanding of an “object” as represented through some form of icon, index or symbol, from environmental to visual to aural or written. Complete truth is the limit where the understanding of the object by the interpretant via the sign is precise and accurate. Since this limit is rarely (ever?!) achieved, sign-making and understanding is a continuous endeavor. The overall process of testing and refining signs so as to arrive at a more accurate understanding is what Peirce called semiosis [5].

In Peirce’s world view — at least as I now understand it — signs are the basis for information and life (yes, you read that right) [6]. Basic signs can be building blocks for still more complex signs. This insight points to the importance of the ways these components of signs relate to one another, now adding the perspective of connections and relations and continuity to the mix.

Because the interpretant is an integral component of the sign, the understanding of the sign is subject to context and capabilities. Two different interpretants can derive different meanings from the same representation, and a given object may be represented by different tokens. When the interpretant is a human and the signs are language, shared understandings arise from the meanings given to language by the community, which can then test and add to the truth statements regarding the object and its signs, including the usefulness of those signs. Again, these are drivers to Peirce’s semiotic process.

Thinking in Threes: Context for Peirce’s Firstness, Secondness, Thirdness

As Peirce’s writings and research evolved over the years, he came to understand more fundamental aspects of this sign triad. Trichotomies and triads permeate his theories and writings in logic, realism, categories, cosmology and metaphysics. He termed this tendency and its application in the general as Firstness, Secondness and Thirdness. In Peirce’s own words [7]:

“The first is that whose being is simply in itself, not referring to anything nor lying behind anything. The second is that which is what it is by force of something to which it is second. The third is that which is what it is owing to things between which it mediates and which it brings into relation to each other.” (CP 2.356)

Peirce’s fascination with threes is not unique. In my early career designing search engines, we often used threes as quick heuristics for setting weights and tuning parameters. We note that threes are at the heart of the Resource Description Framework data model, with its subject–predicate–object ‘triples’ that are its basic statements and assertions. The logic gates of transistors are based on threes. From an historical perspective prior to Peirce, scholastic philosophers, ranging from Duns Scotus and the Modists from medieval times to John Locke and Immanuel Kant with his three formulations, expressed much of their thinking in threes [8]. As Locke wrote in 1690 [9]:

“The ideas that make up our complex ones of corporeal substances are of three sorts. First, the ideas of the primary qualities of things, which are discovered by our senses, and are in them even when we perceive them not; such are the bulk, figure, number, situation, and motion of the parts of bodies which are really in them, whether we take notice of them or no. Secondly, the sensible secondary qualities which, depending on these, are nothing but the powers these substances have to produce several ideas in us by our senses; which ideas are not in the things themselves otherwise than as anything is in its cause. Thirdly, the aptness we consider in any substance to give or receive such alteration of primary qualities, as that the substance, so altered should produce in us different ideas from what it did before.”

More recently, one of the pioneers of artificial intelligence, Marv Minsky, who passed away in late January, noted his penchant for threes [10]:

[Video: Marv Minsky on the Philosophy of Thinking in Threes]

But in knowledge representation, as practiced today in foundational or upper ontologies, the organizational view of the world is mostly binary. Upper ontologies often reflect one or more of these kinds of di-chotomies [11,12] (to pick up on Minsky’s joke):

  • abstract-physical — a split between what is fictional or conceptual and what is tangibly real
  • occurrent-continuant — a split between a “snapshot” view of the world and its entities versus a “spanning” view that is explicit about changes in things over time
  • perdurant-endurant — a split for how to regard the identity of individuals, either as a sequence of individuals distinguished by temporal parts (for example, childhood or adulthood) or as the individual enduring over time
  • dependent-independent — a split between accidents (which depend on some other entity) and substances (which are independent)
  • particulars-universals — a split between individuals in space and time that cannot be attributed to other entities versus abstract universals such as properties that may be assigned to anything, or
  • determinate-indeterminate.

While it is true that most of these distinctions are important ones in a foundational ontology, that does not mean that the entire ontology space should be dichotomized between them. Further, with the exception of Sowa’s ontology [4], none of the more common upper ontologies embraces any semblance of Peirce’s triadic perspective. Even Sowa’s ontology only partially applies Peircean principles, and it has been criticized on other grounds as well [11].

The triadic model of signs was built and argued by Peirce as the most primitive basis for applying logic suitable for the real world, with conditionals, continua and context. Truthfulness and verifiability of assertions are by nature variable. The ability of the primitive logic to further categorize the knowledge space led Peirce to elaborate a 10-sign system, followed by a 28-sign and then a 66-sign one [13]. Neither of the two larger systems was sufficiently described by Peirce before his death. Though Peirce notes in multiple places the broad applicability of the logic of semiosis to things like crystal formation, the emergence of life, animal communications, and automation, his primary focus appears to have been human language and signs used to convey concepts and thoughts. But we are still mining Peirce’s insights, with only about 25% of his writings yet published [14].

The basis needed to be the sign, because that is how information is conveyed, and the three parts of the trichotomy were the fewest “indecomposable” elements needed to model the real world; we would call these “primitives” in modern terminology. Here are some of Peirce’s thoughts as to what makes something “indecomposable” (in keeping with his jawbreaking terminology) [7]:

“It is a priori impossible that there should be an indecomposable element which is what it is relatively to a second, a third, and a fourth. The obvious reason is that that which combines two will by repetition combine any number. Nothing could be simpler; nothing in philosophy is more important.” (CP 1.298)

“We find then a priori that there are three categories of undecomposable elements to be expected in the phaneron: those which are simply positive totals, those which involve dependence but not combination, those which involve combination.” (CP 1.299)

“I will sketch a proof that the idea of meaning is irreducible to those of quality and reaction. It depends on two main premisses. The first is that every genuine triadic relation involves meaning, as meaning is obviously a triadic relation. The second is that a triadic relation is inexpressible by means of dyadic relations alone. . . . every triadic relation involves meaning.” (CP 1.345)

“And analysis will show that every relation which is tetradic, pentadic, or of any greater number of correlates is nothing but a compound of triadic relations. It is therefore not surprising to find that beyond the three elements of Firstness, Secondness, and Thirdness, there is nothing else to be found in the phenomenon.” (CP 1.347)

Robert Burch has called Peirce’s ideas of “indecomposability” the ‘Reduction Thesis’ [15]. Peirce was able to prove these points with his form of predicate calculus (first-order logic) and via the logics of his existential graphs.

Once the basic structure of the trichotomy and the nature of its primitives were in place, it was logical for Peirce to generalize the design across many other areas of investigation and research. Because of the signs’ groundings in logic, Peirce’s three main forms of deductive, inductive and abductive logic also flow from the same approach and mindset. Using his broader terminology of the general triad, Peirce writes that when the First and Second [7]:

“. . . are found inadequate, the third is the conception which is then called for. The third is that which bridges over the chasm between the absolute first and last, and brings them into relationship. We are told that every science has its qualitative and its quantitative stage; now its qualitative stage is when dual distinctions — whether a given subject has a given predicate or not — suffice; the quantitative stage comes when, no longer content with such rough distinctions, we require to insert a possible halfway between every two possible conditions of the subject in regard to its possession of the quality indicated by the predicate. Ancient mechanics recognized forces as causes which produced motions as their immediate effects, looking no further than the essentially dual relation of cause and effect. That was why it could make no progress with dynamics. The work of Galileo and his successors lay in showing that forces are accelerations by which [a] state of velocity is gradually brought about. The words “cause” and “effect” still linger, but the old conceptions have been dropped from mechanical philosophy; for the fact now known is that in certain relative positions bodies undergo certain accelerations. Now an acceleration, instead of being like a velocity a relation between two successive positions, is a relation between three. . . . we may go so far as to say that all the great steps in the method of science in every department have consisted in bringing into relation cases previously discrete.” (CP 1.359)

My intuition of the importance of the third part of the triad comes from such terms as perspective, gradation and probability, concepts impossible to capture in a binary world.
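Peirce's remark that a velocity relates two successive positions while an acceleration relates three can be made concrete with finite differences. The following is a minimal sketch; the function names, the unit time step and the sample motion are assumptions chosen for illustration.

```python
def velocity(x0, x1, dt):
    """Average velocity: a relation between two successive positions."""
    return (x1 - x0) / dt

def acceleration(x0, x1, x2, dt):
    """Second difference: a relation between three successive positions."""
    return (x2 - 2 * x1 + x0) / dt ** 2

# Hypothetical motion under constant acceleration a = 2, sampled at t = 0, 1, 2.
positions = [0.5 * 2.0 * t ** 2 for t in (0.0, 1.0, 2.0)]

v = velocity(positions[0], positions[1], 1.0)
a = acceleration(positions[0], positions[1], positions[2], 1.0)
```

No pair of positions alone determines `a`; the third position is what brings the first two into relation.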

Some Observations on the Knowledge Of and Use of the Peircean Triad

C.S. Peirce embraced a realistic philosophy, but also embedded it in a belief that our understanding of the world is fallible and that we needed to test our perceptions via logic. Better approximations of truth arise from questioning using the scientific method (via a triad of logics) and from refining consensus within the community about how (via language signs) we communicate that truth. Peirce termed this overall approach pragmatism; it is firmly grounded in Peirce’s views of logic and his theory of signs. While there is absolute truth, in Peirce’s semiotic process it acts more as a limit, to which our seeking of additional knowledge and clarity of communication with language continuously approximates. Through the scientific method and questioning we get closer and closer to the truth and to an ability to communicate it to one another. But new knowledge may change those understandings, which in any case will always remain proximate [16].

Peirce greatly admired the natural classification systems of Louis Agassiz and used animal lineages in many of his examples. He was a strong proponent of natural classification. Though the morphological basis for classifying organisms in Peirce’s day has been replaced with genetic means, Peirce would surely support this new knowledge, since his philosophy is grounded on a triad of primitive unary, binary and tertiary relations, bound together in a logical sign process seeking truth. Again, Peirce called these Firstness, Secondness, and Thirdness.

Like many of Peirce’s concepts, his ideas of Firstness, Secondness and Thirdness (which I shall hereafter just give the shorthand of ‘Thirdness’) have proven difficult to grasp, let alone articulate. After a decade of reading and studying Peirce, I think I can point to these factors as making Peirce a difficult nut to crack:

  • First, though most papers that Peirce published during his lifetime are available, perhaps as many as three-quarters of his writings still wait to be transcribed [14];
  • Second, Peirce is a terminology junky, coining and revising terms with infuriating frequency. I don’t think he did this just to be obtuse. Rather, in his focus on language and communications (as signs) he wanted to avoid imprecise or easily confused terms. He often tried to ground his terminology in Greek language roots, and tried to be painfully precise in his use of suffixes and combinations. Witness his use of semeiosis over semiosis, or the replacement of pragmatism with pragmaticism to avoid the misuse he perceived from its appropriation by William James. That Peirce settled on his terminology of Thirdness for his triadic relations signifies its generality and universal applicability;
  • Third, Peirce wrote and refined his thinking over a written historical record of nearly fifty years, which was also a period of the most significant technological changes in human history. Terms and ideas evolved much over this time. His views of categories and signs evolved in a similar manner. In general, revisions in terminology or concepts in his later writings should hold precedence over earlier ones;
  • Fourth, he was active in elaborating his theory of signs to be more inclusive and refined, a work of some 66 putative signs that remained very much incomplete at the time of his death. There has been a bit of a cottage industry in trying to rationalize and elucidate what this more complex sign schema might have meant [17], though frankly much of this learned inspection feels terminology-bound and more like speculation than practical guidance; and
  • Fifth, and possibly most importantly, most Peircean scholarship appears to me to be more literal with an attempt to discern original intent. Many arguments seem fixated on nuance or terminology interpretation as opposed to its underlying meaning or mindset. To put it in Peircean terms, most scholarship of Peirce’s triadic signs seems to be focused on Firstness and Secondness, rather than Thirdness.

The connections of Peirce’s sign theory, his three-fold logic of deduction-induction-abduction, the role he saw for the scientific method as the proper way to understand and adjudicate “truth”, and his really neat ideas about a community of inquiry have all fed my intuition that Peirce was on to some very basic insights. My Aha! moment, if I can elevate it as such, was when I realized that trying to cram these insights into Peirce’s elaborate sign terminology and other literal aspects of his writing was self-defeating. The Aha! arose when I chose rather to try to understand the mindset underlying Peirce’s thinking and the triadic nature of his semiosis. The very generalizations Peirce made himself around the rather amorphous designations of Firstness, Secondness, Thirdness seemed to affirm that what he was truly getting at was a way of thinking, a way of “decomposing” the world, that had universal applicability irrespective of domain or problem.

Thus, in order to make this insight operational, it first was necessary to understand the essence of what lies behind Peirce’s notions of Firstness, Secondness and Thirdness.

An Expanded View of Firstness, Secondness and Thirdness

Peirce’s notions of Thirdness are expressed in many different ways in many different contexts. These notions have been further interpreted by the students of Peirce. In order to get at the purpose of the triadic Thirdness concepts, I thought it useful to research the question in the same way that Peirce recommends. After all, Firstness, Secondness and Thirdness should themselves be prototypes for what Peirce called the “natural classes” [7]:

“The descriptive definition of a natural class, according to what I have been saying, is not the essence of it. It is only an enumeration of tests by which the class may be recognized in any one of its members. A description of a natural class must be founded upon samples of it or typical examples.” (CP 1.223)

The other interesting aspect of Peirce’s Thirdness is how relations between Firstness, Secondness and Thirdness are treated. Because of the sort of building block nature inherent in a sign, not all potential dyadic relations between the three elements are treated equally. According to the ‘qualification rule’, “a First can be qualified only by a first; a Second can be qualified by a First and a Second; and a Third can be qualified by a First, Second, and a Third” [18]. Note that a Third cannot be involved in either a First or Second.
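The qualification rule amounts to a simple ordering constraint, which can be sketched as follows. The integer encoding and the name `can_qualify` are assumptions made for illustration, not part of Savan's or Peirce's formulation.

```python
FIRST, SECOND, THIRD = 1, 2, 3
CATEGORIES = (FIRST, SECOND, THIRD)

def can_qualify(qualifier: int, qualified: int) -> bool:
    """Return True if `qualifier` may qualify `qualified` under the quoted rule."""
    if qualifier not in CATEGORIES or qualified not in CATEGORIES:
        raise ValueError("categories must be FIRST, SECOND or THIRD")
    # A category may be qualified only by itself or by a lower category.
    return qualifier <= qualified

# Every (qualifier, qualified) pair the rule permits.
allowed = [(q, t) for t in CATEGORIES for q in CATEGORIES if can_qualify(q, t)]
```

Of the nine possible pairings only six are permitted, and none of them has a Third qualifying a First or a Second.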

Keeping these dynamics in mind, here is my personal library of Thirdness relationships as expressed by Peirce in his own writings, or in the writings of his students. Generally, references to Thirdness are scattered, and to my knowledge nowhere can one see more than two or three examples side-by-side. The table below is thus “an enumeration of tests by which the class may be recognized in any one of its members” [19]:

Firstness | Secondness | Thirdness
first | second | third
monad | dyad | triad
point | line | triangle
being | existence | external
qualia | particularity | generality
chaos | order | structure
“past” | “present” | “future”
sign | object | interpretant
inheres | adheres | coheres
attribute | individual | type
icon | index | symbol
quality | “fact” | thought
sensation | reaction | convergence
independent | relative | mediating
intension | extension | information
internal | external | conceptual
spontaneity | dependence | meaning
possibility | fact | law
feeling | effort | habit
chance | law | habit-taking
qualities of phenomena | actual facts | laws (and thoughts)
feeling | consciousness | thought
thought-sign | connected | interpreted
possible modality | actual modality | necessary modality
possibles | occurrences | collections
abstractives | concretives | collectives
descriptives | denominatives | distributives
conscious (feeling) | self-conscious | mind
words | propositions | arguments
terms | propositions | inferences/syllogisms
singular characters | dual characters | plural characters
absolute chance | mechanical necessity | law of love
symbols | generality | interpreter
simples | recurrences | comprehensions
idea (of) | kind of existence | continuity
ideas | determination of ideas by previous ideas | determination of ideas by previous process
what is possible | what is actual | what is necessary
hypothetical | categorical | relative
deductions | inductions | abductions
clearness of conceptions | clearness of distinctions | clearness of practical implications
speculative grammar | logic and classified arguments | methods of truth-seeking
phenomenology | normative science | metaphysics
tychasticism | anancasticism | agapasticism
primitives and essences | characterizing the objects | transformations and reflections
what may be | what characterizes it | what it means
complete in itself, freedom, measureless variety, freshness, multiplicity, manifold of sense, peculiar, idiosyncratic, suchness | idea of otherness, comparison, dichotomies, reaction, mutual action, will, volition, involuntary attention, shock, sense of change | idea of composition, continuity, moderation, comparative, reason, sympathy, intelligence, structure, regularities, representation

Examples from Research and the Literature of Firstness, Secondness, Thirdness

The best way to glean meaning from this table is through some study and contemplation.

Because these examples are taken from many contexts, it is important to review this table on a row-by-row basis when investigating the nature of ‘Thirdness’. Review of the columns helps elucidate the “natural classes” of Firstness, Secondness and Thirdness. Some items appear in more than one column, reflecting the natural process of semiosis wherein more basic concepts cascade to the next focus of semiotic attention. The last row is a kind of catch-all trying to capture other mentions of Thirdness in Peirce’s phenomenology.
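The row-wise and column-wise readings described above can be sketched by holding each row as a triple. The rows below are a small sample copied from the table, not the full listing, and the helper names `column` and `shared_terms` are hypothetical.

```python
from collections import defaultdict

COLUMNS = ("Firstness", "Secondness", "Thirdness")

# A handful of rows copied from the table (not the full table).
ROWS = [
    ("sign", "object", "interpretant"),
    ("possibility", "fact", "law"),
    ("chance", "law", "habit-taking"),
    ("qualia", "particularity", "generality"),
    ("symbols", "generality", "interpreter"),
    ("being", "existence", "external"),
    ("internal", "external", "conceptual"),
]

def column(name):
    """Column-wise view: every sampled entry filed under one category."""
    return [row[COLUMNS.index(name)] for row in ROWS]

def shared_terms():
    """Terms that appear under more than one category across the sampled rows."""
    seen = defaultdict(set)
    for row in ROWS:
        for category, term in zip(COLUMNS, row):
            seen[term].add(category)
    return {term: sorted(cats) for term, cats in seen.items() if len(cats) > 1}

overlaps = shared_terms()
```

In this sample, "law", "generality" and "external" each appear as a Thirdness in one row and as a Secondness in another, which is the cascading effect noted above.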

The table spans from the fully potential or abstract, such as “first” or “third”, to entire realms of science or logic. This spanning of scope reflects the genius of Peirce’s insight wherein semiosis can begin literally at the cusp of Nothingness [20] and then proceed to capture the process of signmaking, language, logic, the scientific method and thought abstraction to embrace the broadest and most complex of topics. This process is itself mediated by truth-testing and community use and consensus, with constant refinement as new insights and knowledge arise.

Reviewing these trichotomies affirms the fulsomeness of Peirce’s semiotic model. Further, as Peirce repeatedly noted, there are no hard and fast boundaries between these categories [21]. Forces of history or culture or science are complex and interconnected in the extreme; trying to decompose complicated concepts into their Thirdness is a matter of judgment and perspective. Peirce, however, was serene about this, since the premises and assignments resulting from such categorizations are (ultimately) subject to logical testing and conformance with the observable, real world.

The ‘Thirdness’ Mindset Applied to Categorization

Our excursion into Peirce’s foundational, triadic view was driven by pragmatic needs. Structured Dynamics’ expertise in knowledge-based artificial intelligence (KBAI) benefits from efficient and coherent means to represent knowledge. The data models and organizational schema underlying knowledge representation (KR) should be as close as possible to the logical ways the world is structured and perceived. A key aspect of that challenge is how to define a grammar and establish a logical structure for representing knowledge. Peirce’s triadic approach and mindset have come to be, in my view, essential foundations to that challenge.

As before, we will again let Peirce’s own words guide us in how to approach the categorization of our knowledge domains. Let’s first address the question of where we should direct attention. How do we set priorities for where our categorization attention should focus? [7]:

“Taking any class in whose essential idea the predominant element is Thirdness, or Representation, the self development of that essential idea — which development, let me say, is not to be compassed by any amount of mere “hard thinking,” but only by an elaborate process founded upon experience and reason combined — results in a trichotomy giving rise to three sub-classes, or genera, involving respectively a relatively genuine thirdness, a relatively reactional thirdness or thirdness of the lesser degree of degeneracy, and a relatively qualitative thirdness or thirdness of the last degeneracy. This last may subdivide, and its species may even be governed by the three categories, but it will not subdivide, in the manner which we are considering, by the essential determinations of its conception. The genus corresponding to the lesser degree of degeneracy, the reactionally degenerate genus, will subdivide after the manner of the Second category, forming a catena; while the genus of relatively genuine Thirdness will subdivide by Trichotomy just like that from which it resulted. Only as the division proceeds, the subdivisions become harder and harder to discern.” (CP 5.72)

The way I interpret this (in part) is that categories in which new ideas or insights have arisen — themselves elements of Thirdness for that category — are targets for new categorization. That new category should focus on the idea or insight gained, such that each new category has a character and scope different from the one that spawned it. Of course, based on the purpose of the KBAI effort, some ideas or insights have larger potential effect on the domain, and those should get priority attention. As a practical matter this means that categories of more potential importance to the sponsor of the KBAI effort receive the most focus.

Once a categorization target has been chosen, Peirce also put forward some general execution steps [7]:

“. . . introduce the monadic idea of »first« at the very outset. To get at the idea of a monad, and especially to make it an accurate and clear conception, it is necessary to begin with the idea of a triad and find the monad-idea involved in it. But this is only a scaffolding necessary during the process of constructing the conception. When the conception has been constructed, the scaffolding may be removed, and the monad-idea will be there in all its abstract perfection. According to the path here pursued from monad to triad, from monadic triads to triadic triads, etc., we do not progress by logical involution — we do not say the monad involves a dyad — but we pursue a path of evolution. That is to say, we say that to carry out and perfect the monad, we need next a dyad. This seems to be a vague method when stated in general terms; but in each case, it turns out that deep study of each conception in all its features brings a clear perception that precisely a given next conception is called for.” (CP 1.490)

We are basing this process of categorization upon the same triadic design noted above. However, now that our context is categorization, the nature of the triad is different than that for the basic sign, as the similar figure to the right attests.

The area of the Secondness is where we surface and describe the particular objects or elements that define this category. Peirce described it thus [7]:

“So far Hegel is quite right. But he formulates the general procedure in too narrow a way, making it use no higher method than dilemma, instead of giving it an observational essence. The real formula is this: a conception is framed according to a certain precept, [then] having so obtained it, we proceed to notice features of it which, though necessarily involved in the precept, did not need to be taken into account in order to construct the conception. These features we perceive take radically different shapes; and these shapes, we find, must be particularized, or decided between, before we can gain a more perfect grasp of the original conception. It is thus that thought is urged on in a predestined path. This is the true evolution of thought, of which Hegel’s dilemmatic method is only a special character which the evolution is sometimes found to assume.” (CP 1.491)

In Thirdness we are contemplating the category, thinking about it, analyzing it, using and gaining experience with it, such that we can begin to see patterns or laws or “habits” (as Peirce so famously put it) or new connections and relationships with it. The ideas and insights (and laws or standardizations) that derive from this process are themselves elements of the category’s Thirdness. This is where new knowledge arises or purposes are fulfilled, and then subsequently split and codified as new signs useful to the knowledge space.

As domains are investigated to deeper levels or new insights expand the branches of the knowledge graph, each new layer is best tackled via this three-fold investigation. Of course, context requires its own perspectives and slices; the listing of Thirdness options provided above can help stimulate these thoughts.

Using Peirce’s labels, but my own diagram, we can show the categorization process as having some sequential development:

But, of course, interrelationships adhere to the Peircean Thirdness and there continues to be growth and additions. Categories thus tend to fill themselves up with more insights and ideas until such time as the scope and diversity compel another categorization. In these ways categorization is not linear, but accretive and dynamic.

Like our investigations of the broad idea of Thirdness above, there are some Firstness, Secondness, and Thirdness aspects of how to think about the idea of categorization. I use this kind of mental checklist when it comes time to split a concept or category into a new categorization:

 | Firstness | Secondness | Thirdness
Symbols | idea of; nature of; milieu; “category potentials” | reference concepts | standards
Generality | cross-products of Firstness | language (incl. domain); computational analysis | representation; continua
Interpreters (human or machine) | What are the ingredients, ideas, essences of the category? | What are the new things or relationships of the category? | What are the laws, practices, outputs arising from the category?

General Thoughts on Using ‘Thirdness’ for Categorization

The essential point is to break free from Peirce’s often stultifying terminology and embrace the mindset behind Thirdness. Categorization, or any other knowledge representation task for that matter, can be approached logically and, yes, systematically.
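As one way of making the checklist systematic, the three interpreter questions from the table above can be attached to a small record. This is only a sketch: the class name `CategoryReview`, its fields and the sample category are assumptions for illustration, not part of UMBEL or of any published method.

```python
from dataclasses import dataclass, field
from typing import List

# The three interpreter questions from the checklist, keyed by facet.
PROMPTS = {
    "firstness": "What are the ingredients, ideas, essences of the category?",
    "secondness": "What are the new things or relationships of the category?",
    "thirdness": "What are the laws, practices, outputs arising from the category?",
}

@dataclass
class CategoryReview:
    """Hypothetical working record for one candidate category."""
    name: str
    firstness: List[str] = field(default_factory=list)
    secondness: List[str] = field(default_factory=list)
    thirdness: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Return the checklist questions that still have no answers recorded."""
        return [prompt for facet, prompt in PROMPTS.items() if not getattr(self, facet)]

# Hypothetical example: only the Firstness facet has been filled in so far.
review = CategoryReview("Facilities", firstness=["built structure", "intended use"])
remaining = review.open_questions()
```

Here `remaining` holds the Secondness and Thirdness questions, signalling that the category has been characterized but not yet populated or analyzed.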

The Perspective of Thirdness

Just as perspective does not occur without Thirdness, I think we will see Peirce’s contributions make a notable difference in how knowledge representation efforts move forward. A driver of this change is knowledge-based artificial intelligence. I feel like problems and questions that have stymied me for decades are lifting like so much fog as I embrace the Peircean Thirdness mindset. I think that it is possible to codify and train others to use this mindset, which is really but a specialized application of Peirce’s overall conception of semiosis [22].

Twenty-five years ago Nathan Houser opined that “. . . a sound and detailed extension of Peirce’s analysis of signs to his full set of ten divisions and sixty-six classes is perhaps the most pressing problem for Peircean semioticians” [23]. I agree with the sense of this opinion, but the ten divisions and sixty-six classes are a sign classification; the greater primitive for Peirce’s thinking is the triad and his application of it across all domains of discourse. This is the better grounding for understanding Peirce.

John Sowa, mentioned in the intro, also put forward a knowledge representation, which he partially attributed to Peirce [2,4], and included the three basic elements of the sign triad. But Sowa did not infuse his design with the Peircean triad, with the amalgam criticized for its lack of coherency [11]. Peircean ideas have also informed computational approaches [24] and language parsing [25]. Nonetheless, despite important Peircean ideas and contributions across the knowledge representation spectrum, I have been unable to find any upper ontology or vocabulary based on Thirdness. Terminology can get in the way.

In the intro, I mentioned my epiphany from specifics to mindset in Peirce’s teachings. This insight has not caused me to suddenly understand everything Peirce was trying to say, nor to come to some new level of consciousness. However, what it has done is to open a door to a new way of thinking and looking at the world. I am now finding that prior, knotty problems of categorization and knowledge representation are becoming (more) tractable. I am excited and eager to look at some problems that have stymied me for years. Many of these problems, such as how to model events, situations, identity, representation, and continuity or characterization through time, may sound like philosophers’ millstones, but they often lie at the heart of the most difficult problems in knowledge modeling and representation. Even the tiniest break in the mental and conceptual logjams around such issues feels like major progress. For that, I thank Peirce’s triads.

[1] See Sowa’s Web site, especially the sections on ontology, knowledge representation, and publications.
[2] See, for example, John F. Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS 2000 in Darmstadt, Germany, on August 14, 2000; published in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81. May be found at http://www.jfsowa.com/ontology/ontometa.htm. Also see John F. Sowa, 2006. “Peirce’s Contributions to the 21st Century,” presented at International Conference on Conceptual Structures, Aalborg, Denmark, July 17, 2006; and [4] below.
[3] I have written a number of pieces based primarily around Peirce’s insights; see, for example, http://www.mkbergman.com/category/peircean-principles/.
[4] John F. Sowa, 2001. “Signs, Processes, and Language Games: Foundations for Ontology,” in Proceedings of the 9th International Conference On Conceptual Structures, ICCS’01. 2001.
[5] Peirce actually spelled his approach as semeiosis, but I use the simpler version here. See also separate discussion of pragmaticism.
[6] For example, Peirce said [7]: “Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there.” (CP 4.551). At first this seems rather strange. However, “thought” for Peirce in this context is the notion of the process by which the sign is recognized and interpreted. See also [20].
[7] See the electronic edition of The Collected Papers of Charles Sanders Peirce, reproducing Vols. I-VI, Charles Hartshorne and Paul Weiss, eds., 1931-1935, Harvard University Press, Cambridge, Mass., and Arthur W. Burks, ed., 1958, Vols. VII-VIII, Harvard University Press, Cambridge, Mass. The citation scheme is volume number using Arabic numerals followed by section number from the collected papers, shown as, for example, CP 1.208.
[8] Also see, for example, the use of trichotomies in philosophy or some of the nature of three in mathematics or religion.
[9] J. Locke, 1690. “An Essay Concerning Human Understanding”, Book II, Chapter XXXIII. Reprinted, 1964: 249. John Y. Yolton, Ed. Dutton. New York, NY.
[10] See http://www.webofstories.com/play/marvin.minsky/111.
[11] Ludger Jansen, 2008. “Categories: The Top-level Ontology,” Applied ontology: An introduction (2008): 173-196.
[12] Nicola Guarino, 1997. “Some Organizing Principles For A Unified Top-Level Ontology,” National Research Council, LADSEB-CNR Int. Report, V3.0, August 1997.
[13] P. Farias and J. Queiroz, 2003. “On Diagrams for Peirce’s 10, 28, and 66 Classes of Signs”, Semiotica 147(1/4), pp. 165-184.
[14] Spencer Case, 2014. “The Man with a Kink in His Brain,” from online National Review, July 21, 2014. “Over the course of Peirce’s life, that kinky brain produced a total of about 12,000 printed pages and 80,000 handwritten pages. The Peirce Edition Project, founded in 1976, is still organizing and editing the massive Peirce corpus. So far, Indiana University Press has published seven volumes of his writings — of an expected thirty.”
[15] Robert Burch has called Peirce’s ideas of “indecomposability” the ‘Reduction Thesis’; see Robert Burch, 1991. A Peircean Reduction Thesis: The Foundations of Topological Logic, Texas Tech University Press, Lubbock, TX. Peirce’s reduction thesis is never stated explicitly by Peirce, but is alluded to in numerous snippets. The basic thesis is that ternary relations suffice to construct arbitrary relations, but that not all relations can be constructed from unary and binary relations alone.
[16] M.K. Bergman, 2016. “Re-thinking Knowledge Representation,” AI3:::Adaptive Information blog, March 14, 2016.
[17] Amongst many, see, for example, Janos J. Sarbo and József I. Farkas, 2013. “Towards Meaningful Information Processing: A Unifying Representation for Peirce’s Sign Types,” Signs-International Journal of Semiotics 7 (2013): 1-44. In that article, the authors state: “. . . our model has the potential of representing three types of relation, consisting of 10, 28, and 66 elements, that are analogous to Peirce’s three classifications of signs. This implies the possibility of a common representation for Peirce’s different classifications. . . . By virtue of the above relation with Peircean semiotics, and because of the fundamental nature of signs, our approach has the potential for a uniform modeling of information processing in any domain, theoretically.” Two other researchers of Peircean signs are, for example, P. Farias and J. Queiroz, 2003. “On Diagrams for Peirce’s 10, 28, and 66 Classes of Signs”, Semiotica 147(1/4), pp. 165-184. Also, the Web site Minute Semiotic is dedicated to one interpretation of Peirce’s signs, including interactive descriptions (from the author’s perspective) of the 66 Peircean signs.
[18] David Savan, 1987-1988. “An Introduction to C.S. Peirce’s Full System of Semeiotic,” Monograph Series of the Toronto Semiotic Circle. Vol. 1.
[19] Table sources and the order of presentation very roughly move from the primitive to the more complex and elaborative.
[20] The idea of Firstness may range from something like an energetic input that causes chemicals to combine into a new structured form or ordered state to something like a new recognition in the mind occasioned by a flick of the eye or a shifting thought. The representamen is merely a potential sign until it is energized or intrudes on consciousness, wherein the object is now made apparent as interpreted. The process of reifying the sign itself produces a new reality, its Thirdness, which can then become a subject of the sign-recognizing process in its own right. In this regard, Peirce was formulating a theory of signs that could describe how more order may occur in the world, including the formation and evolution of the cosmos and the initial origins of life.
[21] As one example, Peirce states [7]: “. . . it may be quite impossible to draw a sharp line of demarcation between two classes, although they are real and natural classes in strictest truth. Namely, this will happen when the form about which the individuals of one class cluster is not so unlike the form about which individuals of another class cluster but that variations from each middling form may precisely agree.” (Peirce CP 1.208)
[22] Semiosis has been viewed by many as applicable to a wide variety of domains such as animal calls and language, the chemical and energetic origin of life, evolution, and language analysis and parsing. The linkage of these ideas to Peirce results from statements of his such as [7]: “Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there. . . . Not only is thought in the organic world, but it develops there.” (Peirce CP 4.551)
[23] Nathan Houser, 1992. “On Peirce’s Theory of Propositions: A Response to Hilpinen.” Transactions of the Charles S. Peirce Society 28, no. 3 (1992): 489-504.
[24] See, for example, Gary Richmond’s trikonic approach: Gary Richmond, 2005. “Outline of trikonic Diagrammatic Trichotomic,” in: F. Dau, M.L. Mugnier, and G. Stumme, eds., Conceptual Structures: Common Semantics for Sharing Knowledge: 13th International Conference on Conceptual Structures, ICCS 2005, Kassel, Germany, 17–22 July 2005. Springer-Verlag GmbH, pp. 453-466.
[25] See, for example, one of the earlier examples, John F. Sowa, 1991. “Toward the Expressive Power of Natural Language.” Principles of Semantic Networks (1991): 157-189.

Re-thinking Knowledge Representation

AI3:::Adaptive Information (Mike Bergman) - Mon, 03/14/2016 - 17:15
A New Era in Artificial Intelligence Will Open Pandora’s Box

Here’s a prediction: the new emphasis on artificial intelligence and robotics will occasion some new looks at knowledge representation. Prior to the past few years many knowledge representation (KR) projects have been more in the way of prototypes or games. But, now that we are seeing real robotics and knowledge-based AI activities take off, some of the prior warts and problems of leading KR approaches are starting to become evident.

For example, for years major upper-level ontologies have tended to emphasize dichotomous splits in how to “model” the world, including:

  • abstract-physical — a split between what is fictional or conceptual and what is tangibly real
  • occurrent-continuant — a split between a “snapshot” view of the world and its entities versus a “spanning” view that is explicit about changes in things over time
  • perdurant-endurant — a split for how to regard the identity of individuals, either as a sequence of individuals distinguished by temporal parts (for example, childhood or adulthood) or as the individual enduring over time
  • dependent-independent — a split between accidents (which depend on some other entity) and substances (which are independent)
  • particulars-universals — a split between individuals in space and time that cannot be attributed to other entities versus abstract universals such as properties that may be assigned to anything
  • determinate-indeterminate.

Since the mid-1980s, description logics have also tended to govern most KR languages, and are the basis of the semantic Web data model and languages of RDF and OWL. (However, common logic and its dialects are also used as a more complete representation of first-order logic.) The trade-off in KR language design is one of expressiveness versus complexity.

Cyc was developed as one means to address a gap in standard KR approaches: how to capture and model common sense. Conceptual graphs, formally a part of common logic, were developed to handle n-ary relationships and the questions of sign processes (semiosis), fallibility and processes of pragmatic learning.

Zhou offers a new take on an old strategy to KR, which is to use set theory as the underlying formalism [1]. This first paper deals with the representation itself; a later paper is planned on reasoning.

We do not live in a dichotomous world. And, I personally find Charles Peirce’s semeiosis to be a more compelling grounding for what a KR design should look like. But as Zhou points out, and is evident in current AI advances, robotics and the need for efficient, effective reasoning are testing today’s standards in knowledge representation as never before. I suspect we are in for a period of ferment and innovation as we work to get our KR languages up to task.

[1] Yi Zhou, 2016. “A Set Theoretic Approach for Knowledge Representation: the Representation Part,” arXiv:1603.03511, 14 Mar 2016.

How Fine Grained Can Entity Types Get?

AI3:::Adaptive Information (Mike Bergman) - Tue, 03/08/2016 - 21:00


A Typology Design Aids Continuous, Logical Typing

Entity recognition or extraction is a key task in natural language processing and one of the most common uses for knowledge bases. Entities are the unique, individual things in the world, and are also sometimes used to characterize some concepts [1]. Context plays an essential role in entity recognition. In general terms we may refer to a thing such as a camera; but a photographer may want more fine-grained distinctions such as SLR cameras or further sub-types like digital SLR cameras or even specific models like the Canon EOS 7D Mark II or even the name of the photographer’s favorite camera, such as ‘Shutter Sue’. Capitalized names (the reference source for named entity recognition) often signal that we are dealing with a true individual entity, but again, depending on context, a named automobile such as Chevy Malibu may refer to a specific car or to the entire class of Malibu cars.

The “official” practice of named entity recognition began with the Message Understanding Conferences, especially MUC-6 and MUC-7, in 1995 and 1997. These conferences began competitions for finding “named entities” as well as the practice of in-line tagging [2]. Some of these accepted ‘named entities’ are also written in lower case, with examples such as rocks (‘gneiss’) or common animals or plants (‘daisy’) or chemicals (‘ozone’) or minerals (‘mica’) or drugs (‘aspirin’) or foods (‘sushi’) or whatever. Some deference was given to the idea of Kripke’s “rigid designators” as providing guidance for how to identify entities; rigid designators include proper names as well as certain natural kind terms like biological species and substances. Because of these blurrings, the nomenclature of “named entities” began to fade away. Some practitioners still use the term named entities, though for some of the reasons outlined in this paper, Structured Dynamics prefers simply to use entity.

Much has changed in the twenty years since the seminal MUC conferences regarding entity recognition and characterization. We are learning to adopt a very fine-grained approach to entity types and a typology design suited to interoperating (“bridging”) over a broad range of viewpoints and contexts. Most broadly, the idea of fine-grained entity types has led us to a logically grounded typology design.

The Growing Trend to Fine-Grained Entity Types

Beginning with the original MUC conferences, the initial entity types tested and recognized were for person, organization, and location names [3]. However, it did not take long for various groups and researchers to want more entity types, more distinctions. BBN categories, proposed in 2002, were used for question answering and consisted of 29 types and 64 subtypes [4]. Sekine put forward and refined over many years his Extended Entity Types, which grew to about 200 types [5], as shown in this figure:

Sekine Extended Entity Types

These ideas of extended entity types helped inform a variety of tagging services over the past decade, notably including OpenCalais, Zemanta, AlchemyAPI, and OpenAmplify, among others. Moreover the research community also expanded its efforts into more and more entity types, or what came to be known as fine-grained entities [6].

Some of these efforts produced more formal organizations of entity type classifications. This one, from Ling and Weld, proposed 112 entity types in 2012 [7]:

Ling 112 Entity Types

Another one, from Gillick et al. in 2014, proposed 86 entity types [8], organized, in part, according to the same person, organization, and location types from the earliest MUC conferences:

Gillick 86 Entity Types

These efforts are also notable because machine learners have been trained to recognize the types shown. What entity types are covered, the different conceptions of the world, and how to organize entity types varies broadly across these references.

The complement to entity extraction for unstructured text is to label the text in the first place. For this, a number of schema presently exist that provide vocabularies of entity types and standard means for tagging text. These include:

  • DBpedia Ontology: 738 types [9]
  • schema.org: 636 types [10]
  • YAGO: 505 types; see also HYENA [11]
  • GeoNames: 654 “feature codes” [12]

In Structured Dynamics’ own work, we have mapped the UMBEL knowledge graph against Wikipedia content and found that 25,000 nodes, or more than 70 percent of its 35,000 reference concepts, correspond to entity types [13]. These mappings provide typing connections for millions of Wikipedia articles. The typing and organization of entity types thus appears to be of enormous importance in modeling and leveraging the use of knowledge bases.

When we track the coverage of entity types over the past two decades we see logarithmic growth [13]:

Growth in Recognition of Entity Types

This growth in entity types comes from wanting to describe and organize things with more precision. Tagging and extracting structured information from text are obviously a key driver. Yet, for a given enterprise, what is of interest — and at what depth — for a particular task varies widely.

The fact that knowledge bases, such as Wikipedia (but, the lesson applies to domain-specific ones as well), can be supported by entity-level information for literally thousands of entity types means that rich information is available for driving the finest of fine-grained entity extractors. To leverage this raw, informational horsepower it is essential to have a grounded understanding of what an entity is, how to organize them into logical types, and an intensional understanding of the attributes and characteristics that allow inferencing to be conducted over these types. These understandings, in turn, point to the features that are useful to machine learners for artificial intelligence. These understandings also can inform a flexible design for accommodating entity types from coarse- to fine-grained, with variable depth depending on the domain of interest.

Natural Classes and Typologies

We take a realistic view of the world. That is, we believe that what we perceive in the world is real — it is not just a consequence of what we perceive and can be aware of in our minds [14] — and that there are forces and relationships in the world independent of us as selves. Realism is a longstanding tradition in philosophy that extends back to Aristotle and embraces, for example, the natural classification systems of living things as espoused by taxonomists such as Agassiz and Linnaeus.

Charles Sanders Peirce, an American logician and scientist of the late 19th and early 20th centuries, embraced this realistic philosophy but also embedded it in a belief that our understanding of the world is fallible and that we need to test our perceptions via logic (the scientific method) and shared consensus within the community. His overall approach is known as pragmatism and is firmly grounded in his views of logic and his theory of signs (called semiotics or semeiotics). While there is absolute truth, it actually acts more as a limit, to which our seeking of additional knowledge and clarity of communication with language continuously approximates. Through the scientific method and questioning we get closer and closer to the truth and to an ability to communicate it to one another. But new knowledge may change those understandings, which in any case will always remain proximate.

Peirce’s own words can better illustrate his perspective [15], some of which I have discussed elsewhere under his idea of “natural classes” [16]:

“Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there.” (Peirce CP 4.551)

“What if we try taking the term “natural,” or “real, class” to mean a class of which all the members owe their existence as members of the class to a common final cause? This is somewhat vague; but it is better to allow a term like this to remain vague, until we see our way to rational precision.” (Peirce CP 1.204)

“. . . it may be quite impossible to draw a sharp line of demarcation between two classes, although they are real and natural classes in strictest truth. Namely, this will happen when the form about which the individuals of one class cluster is not so unlike the form about which individuals of another class cluster but that variations from each middling form may precisely agree.” (Peirce CP 1.208)

“When one can lay one’s finger upon the purpose to which a class of things owes its origin, then indeed abstract definition may formulate that purpose. But when one cannot do that, but one can trace the genesis of a class and ascertain how several have been derived by different lines of descent from one less specialized form, this is the best route toward an understanding of what the natural classes are.” (Peirce CP 1.208)

“The descriptive definition of a natural class, according to what I have been saying, is not the essence of it. It is only an enumeration of tests by which the class may be recognized in any one of its members. A description of a natural class must be founded upon samples of it or typical examples.” (Peirce CP 1.223)

“Natural classes” thus are a testable means to organize the real objects in the world, the individual particulars of what we call “entities”. In Structured Dynamics’ usage, we define an entity as something that is an individual object, either real or mental such as an idea, either a part or a whole, and that has:

  • identity, which can be referred to via symbolic names
  • context in relation to other objects, and
  • characteristic attributes, with some expressing the essence of what type of object it is.

The key to classification of entities into categories (or “types” as we use herein) is based on this intensional understanding of attributes. Further, Peirce was expansive in his recognition of what kinds of objects could be classified, specifically including ideas, with application to areas such as social classes, man-made objects, the sciences, chemical elements and living organisms [17]. Again, here are some of Peirce’s own words on the classification of entities [15]:

“All classification, whether artificial or natural, is the arrangement of objects according to ideas. A natural classification is the arrangement of them according to those ideas from which their existence results.” (Peirce CP 1.231)

“The natural classification of science must be based on the study of the history of science; and it is upon this same foundation that the alcove-classification of a library must be based.” (Peirce CP 1.268)

“All natural classification is then essentially, we may almost say, an attempt to find out the true genesis of the objects classified. But by genesis must be understood, not the efficient action which produces the whole by producing the parts, but the final action which produces the parts because they are needed to make the whole. Genesis is production from ideas. It may be difficult to understand how this is true in the biological world, though there is proof enough that it is so. But in regard to science it is a proposition easily enough intelligible. A science is defined by its problem; and its problem is clearly formulated on the basis of abstracter science.” (Peirce CP 1.227)

A natural classification system is one, then, that logically organizes entities with shared attributes into a hierarchy of types, with each type inheriting attributes from its parents and being distinguished by what Peirce calls its final cause, or purpose. This hierarchy of types is thus naturally termed a typology.

An individual that is a member of a natural class has the same kinds of attributes as other members, all of which share this essence of the final cause or purpose. We look to Peirce for the guidance in this area because his method of classification is testable, based on discernable attributes, and grounded in logic. Further, that logic is itself grounded in his theory of signs, which ties these understandings ultimately to natural language.

Logic and the Typology Design

Unlike more interconnected knowledge graphs (which can have many network linkages), typologies are organized strictly along these lines of shared attributes, which is simpler and also provides an orthogonal means for investigating type class membership. Further, because the essential attributes or characteristics across entities in an entire domain can differ broadly — such as living vs. inanimate things, natural things vs. man-made things, ideas vs. physical objects, etc. — it is possible to make disjointness assertions between entire groupings of natural entity classes. Disjointness assertions combined with logical organization and inference mean a typology design that lends itself to reasoning and tractability.
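
As a minimal sketch of how branch-level disjointness can support reasoning, the following Python represents a typology as a child-to-parent map and asserts disjointness once, between whole branches. The type names, the PARENT map and the DISJOINT pairs are hypothetical illustrations; they are not UMBEL's actual SuperTypes or any UMBEL API.

```python
# Hypothetical toy typology: each type points to its single parent (None = root).
PARENT = {
    "Thing": None,
    "LivingThing": "Thing",
    "Artifact": "Thing",
    "Animal": "LivingThing",
    "Dog": "Animal",
    "Camera": "Artifact",
    "SLRCamera": "Camera",
}

# Disjointness is asserted once, between entire branches (illustrative only).
DISJOINT = {frozenset({"LivingThing", "Artifact"})}


def ancestors(type_name):
    """Return the type itself followed by all of its ancestors, nearest first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def are_disjoint(type_a, type_b):
    """True if any ancestor of type_a is declared disjoint with any ancestor of type_b."""
    return any(
        frozenset({a, b}) in DISJOINT
        for a in ancestors(type_a)
        for b in ancestors(type_b)
    )


def check_assignment(types):
    """Return the pairs of asserted types that cannot both hold for one entity."""
    types = list(types)
    return [
        (a, b)
        for i, a in enumerate(types)
        for b in types[i + 1:]
        if are_disjoint(a, b)
    ]
```

Because the single assertion sits high in the hierarchy, every descendant pair (here, Dog and SLRCamera) is covered without listing each pair separately, which is the tractability benefit a typology design aims for.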

The idea of nested, hierarchical types organized into broad branches of different entity typologies also provides a very flexible design for interoperating with a diversity of world views and degrees of specificity. The photographer, as I discussed above, is interested in different camera types and even how specific cameras can relate to a detailed entity typing structure. Another party more interested in products across the board may have a view to greater breadth, but lesser depth, about cameras and related equipment. A typology design, logically organized and placed into a consistent grounding of attributes, can readily interoperate with these different world views.

A typology design for organizing entities can thus be visualized as a kind of accordion or squeezebox, expandable when detail requires, or collapsed to more coarse-grained when relating to broader views. The organization of entity types also has a different structure than the more graph-like organization of higher-level conceptual schema, or knowledge graphs. In the cases of broad knowledge bases, such as UMBEL or Wikipedia, where 70 percent or more of the overall schema is related to entity types, more attention can now be devoted to aspects of concepts or relations.
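
The accordion idea can be sketched in a few lines of Python: a fine-grained type is rolled up to whatever depth a coarser world view needs. The camera hierarchy and the function names below are assumptions made for illustration only, and the sketch treats a specific camera model as a type.

```python
# Hypothetical child-to-parent typology for cameras (illustrative names only).
PARENT = {
    "Product": None,
    "Camera": "Product",
    "SLRCamera": "Camera",
    "DigitalSLRCamera": "SLRCamera",
    "CanonEOS7DMarkII": "DigitalSLRCamera",
}


def path_from_root(type_name):
    """Return the chain of types from the root down to type_name."""
    path = []
    current = type_name
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path[::-1]


def collapse(type_name, max_depth):
    """Roll a fine-grained type up to at most max_depth levels below the root."""
    path = path_from_root(type_name)
    return path[min(max_depth, len(path) - 1)]
```

A photographer might keep the full depth, while a general product catalog could call collapse(type_name, 1) and see every camera simply as a Camera.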

The idea that knowledge bases can be purposefully crafted to support knowledge-based artificial intelligence, or KBAI, flows from these kinds of realizations. We begin to see that we can tease out different aspects of a knowledge base, each with its own logic and relation to the other aspects. Concepts, entities, attributes and relations — including the natural classes or types that can logically organize them — all deserve discrete attention and treatment.

Peirce’s consistent belief that the real world can be logically conceived and organized provides guidance for how we can continue to structure our knowledge bases into computable form. We now have a coherent base for treating entities and their natural classes as an essential component to that thinking. We can continue to be more fine-grained so long as there are unique essences to things that enable them to be grouped into natural classes.

[1] The role for the label “entity” can also refer to what is known as the root node in some systems such as SUMO (see also http://virtual.cvut.cz/kifb/en/toc/229.html). In the OWL language and RDF data model we use, the root node is known as “thing”. Clearly, our use of the term “entity” is much different than SUMO and resides at a subsidiary place in the overall TBox hierarchy. In this case, and frankly for most semantic matches, equivalences should be judged with care, with context the crucial deciding factor.
[2] N. Chinchor, 1997. “Overview of MUC-7,” MUC-7 Proceedings, 1997.
[3] While all of these are indeed entity types, the early MUCs also tested dates, times, percentages, and monetary amounts.
[4] Ada Brunstein, 2002. “Annotation Guidelines for Answer Types”. LDC Catalog, Linguistic Data Consortium. Aug 3, 2002.
[5] See the Sekine Extended Entity Types; the listing also includes attributes info at bottom of source page.
[6] For example, try this query, https://scholar.google.com/scholar?q=”fine-grained+entity”, also without quotes.
[7] Xiao Ling and Daniel S. Weld, 2012. “Fine-Grained Entity Recognition,” in AAAI. 2012.
[8] Dan Gillick, Nevena Lazic, Kuzman Ganchev, Jesse Kirchner, and David Huynh, 2014. “Context-Dependent Fine-Grained Entity Type Tagging,” arXiv preprint arXiv:1412.1820 (2014).
[9] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann, 2009. “DBpedia-A Crystallization Point for the Web of Data.” Web Semantics: science, services and agents on the world wide web 7, no. 3 (2009): 154-165; 170 classes in this paper. That has grown to more than 700; see http://mappings.dbpedia.org/server/ontology/classes/ and http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-04/dataset-2015-04-statistics.
[10] The listing is under some dynamic growth. This is the official count as of September 8, 2015, from http://schema.org/docs/full.html. Current updates are available from Github.
[11] Joanna Biega, Erdal Kuzey, and Fabian M. Suchanek, 2013. “Inside YAGO2: A Transparent Information Extraction Architecture,” in Proceedings of the 22nd international conference on World Wide Web, pp. 325-328. International World Wide Web Conferences Steering Committee, 2013. Also see Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum, 2012. “HYENA: Hierarchical Type Classification for Entity Names,” in Proceedings of the 24th International Conference on Computational Linguistics, Coling 2012, Mumbai, India, 2012.
[12] See https://en.wikipedia.org/wiki/GeoNames.
[13] This figure and some of the accompanying text come from a prior article, M.K. Bergman, “Creating a Platform for Machine-based Artificial Intelligence”, AI3:::Adaptive Information blog, September 21, 2015.
[14] Realism is often contrasted to idealism, nominalism or conceptualism, wherein how the world exists is a function of how we think about or name things. Descartes, for example, summarized his conceptualist view with his aphorism “I think, therefore I am.”
[15] See the electronic edition of The Collected Papers of Charles Sanders Peirce, reproducing Vols. I-VI, Charles Hartshorne and Paul Weiss, eds., 1931-1935, Harvard University Press, Cambridge, Mass., and Arthur W. Burks, ed., 1958, Vols. VII-VIII, Harvard University Press, Cambridge, Mass. The citation scheme is volume number using Arabic numerals followed by section number from the collected papers, shown as, for example, CP 1.208.
[16] M.K. Bergman, 2015. “‘Natural’ Classes in the Knowledge Web”, AI3:::Adaptive Information blog, July 13, 2015.
[17] See, for example, Menno Hulswit, 2000. “Natural Classes and Causation”, in the online Digital Encyclopedia of Charles S. Peirce.

Cooling the Heated Rhetoric on AI

AI3:::Adaptive Information (Mike Bergman) - Tue, 02/23/2016 - 19:21
Article Offers a Balanced View on AI and the Singularity

Possibly because we are sentient, intelligent beings, discussions about artificial intelligence often occupy extremes of alarm, potential or hyperbole. What makes us unique as humans, at least in our degree of intelligence, can be threatened when we start granting machines similar capabilities. Be it Skynet, Lt. Commander Data, military robots, or the singularity, it is pretty easy to grab attention by touting AI as the greatest threat to civilization, or the dawning of a new age of super intelligence.

To be sure, we are seeing remarkable advances in things like intelligent personal assistants that answer our spoken questions, or services that can automatically recognize and tag our images, or many, many other applications. It is also appropriate to raise questions about autonomous intelligence and its possible role in warfare [1] or other areas of risk or harm. AI is undoubtedly an area of technology innovation on the rise. It will also be a constant in human affairs into the future.

That is why a recent article by Toby Walsh on The Singularity May Never Be Near [2] is worth a read. Though only four pages long, it presents a nice historical backdrop on AI and why artificial intelligence may not unfold as many suspect. As he summarizes the article:

“There is both much optimism and pessimism around artificial intelligence (AI) today. The optimists are investing millions of dollars, and even in some cases billions of dollars into AI. The pessimists, on the other hand, predict that AI will end many things: jobs, warfare, and even the human race. Both the optimists and the pessimists often appeal to the idea of a technological singularity, a point in time where machine intelligence starts to run away, and a new, more intelligent species starts to inhabit the earth. If the optimists are right, this will be a moment that fundamentally changes our economy and our society. If the pessimists are right, this will be a moment that also fundamentally changes our economy and our society. It is therefore very worthwhile spending some time deciding if either of them might be right.”

[1] Samuel Gibbs, 2015. “Musk, Wozniak and Hawking Urge Ban on Warfare AI and Autonomous Weapons,” The Guardian, 27 July 2015.
[2] Toby Walsh, 2016. “The Singularity May Never Be Near,” arXiv:1602.06462, 20 Feb 2016.

Hidden Expenses Underneath Machine Learning

AI3:::Adaptive Information (Mike Bergman) - Tue, 02/16/2016 - 10:19
Technical Debts Accrue from Dependencies, Adapting to Change, and Maintenance

Machine learning has entered a golden age of open source toolkits and much electronic and labeled data upon which to train them. The proliferation of applications and relative ease of standing up a working instance — what one might call “first twitch” — have made machine learning a strong seductress.

But embedding machine learning into production environments that can be sustained as needs and knowledge change is another matter. The first part of the process means that data must be found (and labeled if using supervised learning) and then tested against one or more machine learners. Knowing how to use and select features plus systematic ways to leverage knowledge bases are essential at this stage. Reference (or “gold”) standards are also essential as parameters and feature sets are tuned for the applicable learners. Only then can one produce enterprise-ready results.

Those set-up efforts are the visible part of the iceberg. What lies underneath the surface, as a group of experienced Google researchers warns us in a recent paper, Hidden Technical Debt in Machine Learning Systems [1], dwarfs the initial development of production-grade results. Maintaining these systems over time is “difficult and expensive”, exposing ongoing requirements as technical debt. Like any kind of debt, these requirements must be serviced, with delays or lack of a systematic way to deal with the debt adding to the accrued cost.

ML code (small black box in middle) is but a fraction of total infrastructure required for machine learning; from [1]

The authors argue that ML installations incur larger than normal technical debt: machine learning has to be deployed and maintained just like traditional code, and the nature of ML imposes additional and unique costs on top of that. Some of these sources of hidden cost include:

  • Complex models with indeterminate boundaries — ML learners are entangled with multiple feature sets; changing anything changes everything (CACE) say the authors
  • Costly data dependencies — learning is attuned to the input data; as that data changes, learners may need to be re-trained with generation anew of input feature sets; existing features may cease to be significant
  • Feedback loops and interactions — the best performing systems may depend on multiple learners or less than obvious feedback loops, again leading to CACE
  • Sub-optimal systems — piecing together multiple open source pieces with “glue code” or using multi-purpose toolkits can lead to code and architectures that are not performant
  • Configuration debt — set-up and workflows need to work as a system and consistently, but tuning and optimization are generally elusive to understand and measure
  • Infrastructure debt — efforts in creating standards, testing options, logging and monitoring, managing multiple models, etc., are likely all more demanding than traditional systems, and
  • A constantly changing world — knowledge is by nature in constant flux. We learn more, facts and data change, and new perspectives need to be incorporated, all of which need to percolate through the learning process and then be supported by the infrastructure.

The authors of the paper do not really offer any solutions or guidelines to these challenges. However, highlighting the nature of these challenges — as this paper does well — should forewarn any enterprise considering its own machine learning initiative. These costs can only be managed by anticipating and planning for them, preferably supported by systematic and repeatable utilities and workflows.

I recommend a close read of this paper before budgeting your own efforts.

(Hat tip to Mohan Bavirisetty for posting the paper link on LinkedIn.)

[1] Sculley, D., Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, and Dan Dennison, 2015. “Hidden Technical Debt in Machine Learning Systems.” In Advances in Neural Information Processing Systems, pp. 2494-2502.

Pulse: The Starting Point for Feature Selection

AI3:::Adaptive Information (Mike Bergman) - Mon, 02/08/2016 - 14:48
A Needed Focus on the Inputs to Machine Learners

Features are the inputs to machine learners. The outputs of machine learners are predictions of outcomes, based on an inferred (or “learned”) model or function. In image recognition, as an example, the inputs are the characteristics of pixels and those adjacent to them; the output may be a prediction there is an image representation of “cat”. In NLP, as another case, the input might be the text, title and URL of emails; the output may be a prediction of “spam”. If we treat all ML learners as black boxes, features are what is fed to the box, and predicted labels or structures are what comes out.
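
To make the black-box picture concrete, here is a minimal Python sketch of the email case: raw text, title and URL are turned into features, and a stand-in "learner" maps those features to a predicted label. The particular features, the hand-set weights and the function names are all assumptions for illustration; a real learner would infer its model from labeled training data.

```python
import re


def extract_features(text, title, url):
    """Turn a raw email into a small dict of numeric features (hypothetical choices)."""
    words = re.findall(r"[a-z']+", (title + " " + text).lower())
    return {
        "num_words": len(words),
        "mentions_free": float("free" in words),
        "exclamations": float((title + text).count("!")),
        "url_has_digits": float(any(ch.isdigit() for ch in url)),
    }


# Hand-set weights stand in for what a trained learner would infer from labeled data.
WEIGHTS = {
    "num_words": -0.01,
    "mentions_free": 1.5,
    "exclamations": 0.5,
    "url_has_digits": 0.75,
}
BIAS = -1.0


def predict(features):
    """Black-box stand-in: a linear score thresholded into a predicted label."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return "spam" if score > 0 else "not spam"
```

Everything the box can know about an email is whatever extract_features chooses to expose, which is why the choice of features deserves as much attention as the choice of learner.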

As I recently argued, the importance of features has been overlooked in comparison to the choice of machine learners or how to lower the costs and efforts of labeling and creating training sets and standards. The complete picture needs to include feature extraction, feature selection, and feature engineering.

A recent review paper helps redress this imbalance. Feature Selection: A Data Perspective [1], surveys and provides a comprehensive and well-organized overview of recent advances in feature selection research. According to the authors, Li et al., “the objectives of feature selection include: building simpler and more comprehensible models, improving data mining performance, and helping prepare, clean, and understand data.” The practical 73-page review is accompanied by an open-source feature selection library that consists of most of the popular feature selection algorithms covered in the review, and a comprehensive performance analysis of the methods and their results.

The first nine pages of the review are devoted to a broad, accessible overview. The intro provides a clear explanation of features and their role in feature selection. It also explains why the high-dimensionality of features is a challenge in its own right.
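
As a rough illustration of what feature selection does, the sketch below applies one very simple filter: drop any feature whose values barely vary across the data, since such a feature can contribute little to distinguishing outcomes. This variance filter is a generic technique chosen for brevity; it is an assumption of this sketch and is not taken from the review or its accompanying library, and the function name is hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, feature_names, threshold=0.0):
    """Keep features whose variance across rows exceeds threshold (a simple filter method)."""
    kept = []
    for index, name in enumerate(feature_names):
        # Gather this feature's value from every row, then measure its spread.
        column = [row[index] for row in rows]
        if pvariance(column) > threshold:
            kept.append(name)
    return kept
```

Because this filter never looks at labels, it applies to unsupervised as well as supervised settings; with thousands of candidate features, even a crude filter like this can shrink the space before a learner is trained.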

The bulk of the document is devoted to a discussion of the various methods used in feature selection, organized according to:

  • generic data
  • structure features
  • heterogeneous data, and
  • streaming data.

Each of the methods is characterized as to whether it is applicable to supervised or unsupervised learning. While I have used a different classification of the feature space, that does not affect the usefulness of Li et al.’s [1] approach. Also, in keeping with a review article, there are more than 11 pages of references containing nearly 150 citations.

The combined review nature of the paper also means that various methods have been reduced to a common symbol set, which is a handy way to relate available features to multiple learners. This common treatment enables the authors to create the open source repository, scikit-feast, written in Python and available from Github, that provides a library of 25 of the methods covered. A separate Web site presents some test datasets and performance results. Here is one example of many of the available results:

This paper deserves a permanent place on anyone’s resource shelf who has a serious interest in machine learning. I would not be surprised to see the authors’ organizational structure of feature selection methods become a standard. It is always a pleasure to encounter papers that are well-written, understandable and comprehensive. Great job!

[1] Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu, 2016. “Feature Selection: A Data Perspective,” arXiv:1601.07996, 29 Jan 2016.

Pulse: The Biggest of the Big Pictures on Machine Learning

AI3:::Adaptive Information (Mike Bergman) - Mon, 02/01/2016 - 17:53
A Great Introduction to ML and Its Roots

I have to admit that the first time I heard the title of Pedro Domingos’ recent book, The Master Algorithm, I was put off, similar to the way I react negatively to the singularity made famous by Ray Kurzweil. I don’t tend to buy into single answers or theories of everything.

But as a recent talk by Domingos at Google shows, he has much more insight to share about the roots and “tribes” associated with machine learning. If you are new to ML and want to learn more about the big picture underlying its main approaches and tenets, the hour spent watching this video will prove valuable:

The strength of the talk is to describe what Domingos calls the five “tribes” underlying machine learning and the lead researchers, premises and approaches underlying each:

  • Symbolists — based in logic, this approach attempts to model the composition of knowledge by inverting the deductive process
  • Connectionists — also known as neural networks or deep learning, this mindset is grounded most in trying to mimic how the brain actually works
  • Evolutionists — the biological evolution of life of mixing genes through reproduction as altered by mutations and cross-overs guides these genetic algorithms
  • Bayesians — since the world is uncertain, likely outcomes are guided by statistical probabilities, which also change as new evidence is constantly brought to bear
  • Analogizers — this tribe attempts to reason by analogy by looking for similarities to examples or closely related factors.
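
To make one of these mindsets concrete, the Bayesian idea of revising probabilities as new evidence arrives can be sketched in a few lines of Python. This is only a minimal illustration and is not taken from Domingos' talk; the function name `bayes_update` and the spam-style numbers are hypothetical.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical starting belief that a message is spam
belief = 0.2

# Each observation: (P(word | spam), P(word | not spam)); made-up values
observations = [(0.6, 0.1), (0.7, 0.2)]

for p_if_spam, p_if_not_spam in observations:
    # The posterior from one piece of evidence becomes the next prior
    belief = bayes_update(belief, p_if_spam, p_if_not_spam)
    print(round(belief, 3))
```

Each pass through the loop treats the previous posterior as the new prior, which is the sense in which likely outcomes change as evidence is brought to bear.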

You can also see the slides to Domingos’ talk here.

As Domingos emphasizes, each of these approaches has its applications, strengths and weaknesses. He posits there are shared aspects and generalities underlying all of these methods that can help point the way to perhaps more universal approaches, the master algorithm.

I have argued elsewhere that knowledge bases have mattered more than algorithms to recent AI breakthroughs, but ultimately, of course, specific calculation methods need to underpin any learning approach. Though I’m not convinced there is a “master” algorithm, there is great value in understanding the premises and mindsets behind these main approaches to machine learning.


If Big Data is One Answer to AI, What is the Question?

AI3:::Adaptive Information (Mike Bergman) - Wed, 01/27/2016 - 15:51
Part II in Our Series on the Resurgence of Artificial Intelligence

In Part I of this series we pointed to the importance of large electronic knowledge bases (Big Data) in the recent advances in AI for knowledge- and text-based systems. Amongst other factors such as speedy GPUs and algorithm advances, we noted that electronic knowledge bases are perhaps the most important factor in the resurgence of artificial intelligence.

But the real question is not what the most important factor in the resurgence of AI may be — though the answer to that points to vetted, reference training sources. The real question is: How can we systematize this understanding by improving the usefulness of knowledge bases to support AI machine learning? Knowing that knowledge bases are important is not enough. If we can better understand what promotes — and what hinders — KBs for machine learning, perhaps we can design KB approaches that are even quicker and more effective for AI. In short, what should we be coding to promote knowledge-based artificial intelligence (KBAI)?

Why is There Not a Systematic Approach to Using KBs in AI?

To probe this question, let’s take the case of Wikipedia, the most important of the knowledge bases for AI machine learning purposes. According to Google Scholar, there have been more than 15,000 research articles relating Wikipedia to machine learning or artificial intelligence [1]. Growth in articles noting the use of Wikipedia for AI has been particularly strong in the past five years [1].

But two things are remarkable about this use of Wikipedia. First, virtually every one of these papers is a one-off. Each project stages and uses Wikipedia in its own way and with its own methodology and focus. Second, Wikipedia is able to make its contributions despite the fact there are many weaknesses and gaps within the knowledge base itself. Clearly, despite weaknesses, the availability of such large-scale knowledge in electronic form still provides significant advantages to AI and machine learning in the realm of natural language applications and text understanding. How much better might our learners be if we fed them more complete and coherent information?

Readers of this blog will be familiar with my periodic criticisms of Wikipedia as a structured knowledge resource [2]. These criticisms are not related to the general scope and coverage of Wikipedia, which, overall, is remarkable and unprecedented in human endeavor. Rather, the criticisms relate to the use of Wikipedia as is for knowledge representation purposes. To recap, here are some of the weaknesses of Wikipedia as a knowledge resource for AI:

  • Incoherency — the category structure of Wikipedia is particularly problematic. More than 60% of existing Wikipedia categories are not true (or “natural”) categories at all, but represent groupings more of convenience or compound attributes (such as Films directed by Pedro Almodóvar or Ambassadors of the United States to Mexico) [3]
  • Incomplete structure — attributes, as presented in Wikipedia’s infoboxes, are incomplete within and across entities, and links also have gaps. Wikidata offers promise to help bring greater consistency, but much will need to be achieved with bots and provenance remains an issue
  • Incomplete coverage — the coverage and scope of Wikipedia are spotty, especially across language versions, and in any case the entities and concepts covered need to meet Wikipedia’s notability guidelines. For much domain analysis, Wikipedia’s domain coverage is inadequate. It would also be helpful if there were ways to extend the KB’s coverage for local or enterprise purposes
  • Inaccuracies — actually, given its crowdsourced nature, popular portions of Wikipedia are fairly vetted for accuracy. Peripheral or stub aspects of Wikipedia, however, may retain inaccuracies of coverage, tone or representation.

As the tremendous use of Wikipedia for research shows, none of these weaknesses is fatal, and none alone has prevented meaningful use of the knowledge base. Further, there is much active research in areas such as knowledge base population [4] that promises to aid solutions to some of these weaknesses. The recognition of the success of knowledge bases to train AI machine learners is also now increasing awareness that KB design for AI purposes is a worthwhile research topic in its own right. Much is happening leveraging AI in bot designs for both Wikipedia and Wikidata. A better understanding of how to test and ensure coherency, matched with a knowledge graph for inferencing and logic, should help promote better and faster AI learners. The same techniques for testing consistency and coherence may be applied to mapping external KBs into such a reference structure.

Thus, the real question again is: How can we systematize the usefulness of knowledge bases to support AI machine learning? Simply by asking this question we can alter our mindset to discover readily available ways to improve knowledge bases for KBAI purposes.

Working Backwards from the Needs of Machine Learners

The best perspective to take on how to optimize knowledge bases for artificial intelligence derives from the needs of the machine learners. Not all individual learners have all of these needs, but from the perspective of a “platform” or “factory” for machine learners, the knowledge base and supporting structures should consider all of these factors:

  • features — are the raw input to machine learners, and may be evident, such as attributes, syntax or semantics, or may be hidden or “latent” [5]. A knowledge base such as Wikipedia can expose literally hundreds of different feature sets [5]. Of course, only a few of these feature types are useful for a given learning task, and many duplicate ones provide nearly similar “signals”. But across multiple learning tasks, many different feature types are desirable and can be made available to learners
  • knowledge graph — is the schematic representation of the KB domain, and is the basis for setting the coherency and logic and inference structure for the represented knowledge. The knowledge graph, provided in the form of an ontology, is the means by which logical slices can be identified and cleaved for such areas as entity type selection or the segregation of training sets. In situations like Wikipedia where the existing category structure is often incoherent, re-expressing the existing knowledge into an existing and proven schema is one viable approach
  • positive and negative training sets — for supervised learning, positive training sets provide a group of labeled, desired outputs, while negative training sets are similar in most respects but do not meet the desired conditions. The training sets provide the labeled outputs to which the machine learner is trained. Provision of both negative and positive sets is helpful, and the accuracy of the learner is in part a function of how “correct” the labeled training sets are
  • reference (“gold”) standards — vetted reference results, which are really validated training sets and therefore more rigorous to produce, are important to test the precision, recall and accuracy of machine learners [6]. This verification is essential during the process of testing the usefulness of various input features as well as model parameters. Without known standards, it is hard to converge many learners for effective predictions
  • keeping the KBs current — the nature of knowledge is that it is constantly changing and growing. As a result, knowledge bases used in KBAI are constantly in flux. The restructuring and feature set generation from the knowledge base must be updated on a periodic basis. Keeping KBs current means that the overall process of staging knowledge bases for KBAI purposes must be made systematic through the use of scripts, build routines and validation tests. General administrative and management capabilities are also critical, and
  • repeatability — all of these steps must be repeatable, since new iterations of the knowledge base must retain coherency and consistency.
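
As a rough illustration of how a reference standard is used, the sketch below scores a learner's predicted positives against vetted positive and negative sets. It is a minimal example under stated assumptions: the function name `score_against_gold` and the tiny entity sets are hypothetical, and real evaluations would use far larger reference sets.

```python
def score_against_gold(predicted, gold_positive, gold_negative):
    """Score predicted positives against a vetted reference standard.

    Predictions that appear in neither gold set are ignored, since the
    reference standard says nothing about them.
    """
    tp = len(predicted & gold_positive)   # correctly predicted positives
    fp = len(predicted & gold_negative)   # negatives wrongly predicted
    fn = len(gold_positive - predicted)   # positives that were missed
    tn = len(gold_negative - predicted)   # negatives correctly left out
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical gold standard for an "is this an animal?" learner
gold_positive = {"lion", "otter", "heron", "trout"}
gold_negative = {"airport", "library", "stadium", "bridge"}

# Hypothetical learner output
predicted = {"lion", "otter", "stadium"}

precision, recall, accuracy = score_against_gold(
    predicted, gold_positive, gold_negative
)
print(precision, recall, accuracy)
```

Because the scores depend entirely on the labels in the reference sets, any errors in those sets flow straight into the measured precision, recall and accuracy.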

Thus, an effective knowledge base to support KBAI should have a number of desirable aspects. It should have maximum structure and exposed features. It should be organized by a coherent knowledge graph, which can be effectively sliced and reasoned over. It must be testable via logic and consistency and performance tests, such that training and reference sets may be vetted and refined. And it must have repeatable and verifiable scripts for updating the underlying KB as it changes and to generate new, working feature sets. Moreover, means for generally managing and manipulating the knowledge base and knowledge graph are important. These desirable aspects constitute a triad of required functionality.
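
The idea of slicing a coherent knowledge graph to obtain training sets can be sketched with a toy type hierarchy. Everything here is an assumption made for illustration: the type names, the entities, and the helper functions `descendants` and `training_sets` are hypothetical and do not reflect any actual KB schema.

```python
from collections import deque

# Hypothetical toy knowledge graph: parent type -> child types
SUBTYPES = {
    "Thing": ["Animal", "Facility"],
    "Animal": ["Mammal", "Bird"],
    "Facility": ["Airport", "Library"],
}

# Hypothetical instance data: entity -> asserted type
ENTITY_TYPES = {
    "lion": "Mammal",
    "otter": "Mammal",
    "heron": "Bird",
    "Heathrow": "Airport",
    "Bodleian": "Library",
}


def descendants(root):
    """Return root plus every type reachable below it (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in SUBTYPES.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def training_sets(positive_root, negative_root):
    """Slice entities into positive and negative sets by type subtree."""
    pos_types = descendants(positive_root)
    neg_types = descendants(negative_root)
    positive = sorted(e for e, t in ENTITY_TYPES.items() if t in pos_types)
    negative = sorted(e for e, t in ENTITY_TYPES.items() if t in neg_types)
    return positive, negative


# Sibling types give negatives that are similar but fail the condition
positive, negative = training_sets("Mammal", "Bird")
print(positive)
print(negative)
```

Choosing a sibling type as the negative root is one simple way to get negatives that resemble the positives; this only works if the graph is coherent enough that sibling types really are disjoint.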

Guidance for a Research Agenda

Achieving a purposeful knowledge base for KBAI uses is neither a stretch nor technically risky. We see the broad outlines of the requirements in the discussion above.

Two-thirds of the triad are relatively straightforward. First, creating a platform for managing the KBs is a fairly standard requirement for knowledge and semantics purposes; many platforms presently exist that can access and manage knowledge graphs, knowledge bases, and instance data at the necessary scales. Second, the build and testing scripts do require systematic attention, but these are also not difficult tasks and are quite common in many settings. It is true that build and testing scripts can often prove brittle, so care needs to go into their design to facilitate maintainability. Fortunately, these are matters mostly of proper focus and good practice, and not conceptually difficult.

The major challenges reside in the third leg of the triad, namely in the twin needs to map the knowledge base into a coherent knowledge graph and into an underlying speculative grammar that is logically computable to support the expression of both feature and training sets. A variety of upper ontologies and lexical frameworks such as WordNet have been tried as the guiding graph structures for knowledge bases [7]. To my knowledge, none of these options has been scrutinized with the specific requirements of KBAI support in mind. With respect to the other twin need, that of a speculative grammar, our research to date [8] points to the importance of segregating the KB information into topics (concepts), relations and relation types, entities and entity types, and attributes and attribute types. Possible further distinctions, however, such as roles, annotations (metadata), events and expressions of mereology, still require further research. The role, placement and use of rules also remain to be determined.

You can see Part I of this series here.

[1] See, for example, this Google Scholar query: https://scholar.google.com/scholar?q=wikipedia+%22machine+learning%22+OR+%22artificial+intelligence%22. Growth data may be obtained by annual date range searches.

[2] Over the years I have addressed this topic in many articles. A recent sampling from my AI3:::Adaptive Information blog is “Shaping Wikipedia into a Computable Knowledge Base,” (March 31, 2015); and “Creating a Platform for Knowledge-based Machine Intelligence” (September 21, 2015).

[3] See, for example, M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web,” AI3:::Adaptive Information blog, July 13, 2015.

[4] Knowledge base population, or KBP, first became a topic of research with the track by the same name starting at the Text Analysis Conference sponsored by NIST in 2009. The workshop track has continued annually ever since with greater prominence, and has been the initiative of many projects mining open sources for facts and assertions.

[5] See, for example, M.K. Bergman, 2015. “A (Partial) Taxonomy of Machine Learning Features,” AI3:::Adaptive Information blog, November 23, 2015.

[6] See, for example, M.K. Bergman, 2015. “A Primer on Knowledge Statistics,” AI3:::Adaptive Information blog, March 18, 2015.

[7] See, for example, M.K. Bergman, 2011. “In Search of ‘Gold Standards’ for the Semantic Web,” AI3:::Adaptive Information blog, February 28, 2011.

[8] Two primary articles that I have written on my AI3:::Adaptive Information blog on the information structure of knowledge bases bear on this question; see “Creating a Platform for Knowledge-based Machine Intelligence” (September 21, 2015), and “Conceptual and Practical Distinctions in the Attributes Ontology” (March 3, 2015).

Why the Resurgence in AI?

AI3:::Adaptive Information (Mike Bergman) - Mon, 01/25/2016 - 16:48
Artificial Intelligence is in Bloom; But it Was Not Always So

Anyone beyond a certain age may recall the waning and waxing of the idea of AI, artificial intelligence. In fact, the periodic dismal prospects and poor reputation of artificial intelligence have been severe enough at times so as to warrant its own label: the “AI winters.” Clearly, today, we are in a resurgence of AI. But why is this so? Is the newly re-found popularity of AI merely a change in fashion, or is it due to more fundamental factors? And if it is more fundamental, what might those factors be that have led to this resurgence?

We only need to look at the world around us to see that the resurgence in AI is due to real developments, not a mere change in fashion. From virtual assistants that we can instruct or question by voice command to self-driving cars and face recognition, many mundane or tedious tasks of the past are being innovated away. The breakthroughs are real and seemingly at an increasing pace.

As to the reasons behind this resurgence, more than a year ago, the technology futurist Kevin Kelly got it mostly right when he posited these three breakthroughs [1]:

1. Cheap parallel computation

Thinking is an inherently parallel process, billions of neurons firing simultaneously to create synchronous waves of cortical computation. To build a neural network—the primary architecture of AI software—also requires many different processes to take place simultaneously. Each node of a neural network loosely imitates a neuron in the brain—mutually interacting with its neighbors to make sense of the signals it receives. To recognize a spoken word, a program must be able to hear all the phonemes in relation to one another; to identify an image, it needs to see every pixel in the context of the pixels around it—both deeply parallel tasks. But until recently, the typical computer processor could only ping one thing at a time. . . . That began to change more than a decade ago, when a new kind of chip, called a graphics processing unit, or GPU, was devised for the intensely visual—and parallel—demands of videogames . . . .

2. Big Data

Every intelligence has to be taught. A human brain, which is genetically primed to categorize things, still needs to see a dozen examples before it can distinguish between cats and dogs. That’s even more true for artificial minds. Even the best-programmed computer has to play at least a thousand games of chess before it gets good. Part of the AI breakthrough lies in the incredible avalanche of collected data about our world, which provides the schooling that AIs need. Massive databases, self-tracking, web cookies, online footprints, terabytes of storage, decades of search results, Wikipedia, and the entire digital universe became the teachers making AI smart.

3. Better algorithms

Digital neural nets were invented in the 1950s, but it took decades for computer scientists to learn how to tame the astronomically huge combinatorial relationships between a million—or 100 million—neurons. The key was to organize neural nets into stacked layers. Take the relatively simple task of recognizing that a face is a face. When a group of bits in a neural net are found to trigger a pattern—the image of an eye, for instance—that result is moved up to another level in the neural net for further parsing. The next level might group two eyes together and pass that meaningful chunk onto another level of hierarchical structure that associates it with the pattern of a nose. It can take many millions of these nodes (each one producing a calculation feeding others around it), stacked up to 15 levels high, to recognize a human face. . . .

To these factors I would add a fourth: 4. Distributed architectures (beginning with MapReduce) and new performant datastores (NoSQL, graph DBs, and triplestores). These new technologies, plus some rediscovered ones, gave us the confidence to tackle larger and larger reference datasets, while also helping us innovate high-performance data representation structures, such as graphs, lists, key-value pairs, feature vectors and finite state transducers. In any case, Kelly also notes the interconnection amongst these factors in the cloud, itself a more general enabling factor. I suppose, too, one could add open source to the mix as another factor.

Still, even though these factors have all contributed, I have argued in my series on knowledge-based artificial intelligence (KBAI) that electronic data sets (Big Data) are the most important enabling factor [2]. These reference datasets may range from images for image recognition (such as ImageNet) to statistical compilations from text (such as N-grams or co-occurrences) to more formal representations (such as ontologies or knowledge bases). Knowledge graphs and knowledge bases are the key enablers for AI in the realm of knowledge management and representation.

Some also tout algorithms as the most important source of AI innovation, but Alexander Wissner-Gross in the Edge online magazine comes down squarely on the side of data in AI as the most interesting news in recent science [3]:

. . . perhaps many major AI breakthroughs have actually been constrained by the availability of high-quality training datasets, and not by algorithmic advances. For example, in 1994 the achievement of human-level spontaneous speech recognition relied on a variant of a hidden Markov model algorithm initially published ten years earlier, but used a dataset of spoken Wall Street Journal articles and other texts made available only three years earlier. In 1997, when IBM’s Deep Blue defeated Garry Kasparov to become the world’s top chess player, its core NegaScout planning algorithm was fourteen years old, whereas its key dataset of 700,000 Grandmaster chess games (known as the “The Extended Book”) was only six years old. In 2005, Google software achieved breakthrough performance at Arabic- and Chinese-to-English translation based on a variant of a statistical machine translation algorithm published seventeen years earlier, but used a dataset with more than 1.8 trillion tokens from Google Web and News pages gathered the same year. In 2011, IBM’s Watson became the world Jeopardy! champion using a variant of the mixture-of-experts algorithm published twenty years earlier, but utilized a dataset of 8.6 million documents from Wikipedia, Wiktionary, Wikiquote, and Project Gutenberg updated one year prior. In 2014, Google’s GoogLeNet software achieved near-human performance at object classification using a variant of the convolutional neural network algorithm proposed twenty-five years earlier, but was trained on the ImageNet corpus of approximately 1.5 million labeled images and 1,000 object categories first made available only four years earlier. 
Finally, in 2015, Google DeepMind announced its software had achieved human parity in playing twenty-nine Atari games by learning general control from video using a variant of the Q-learning algorithm published twenty-three years earlier, but the variant was trained on the Arcade Learning Environment dataset of over fifty Atari games made available only two years earlier. Examining these advances collectively, the average elapsed time between key algorithm proposals and corresponding advances was about eighteen years, whereas the average elapsed time between key dataset availabilities and corresponding advances was less than three years, or about six times faster, suggesting that datasets might have been limiting factors in the advances.
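
The averages in that passage can be checked directly from the elapsed times it states for each advance. The short sketch below only re-does that arithmetic; the variable names are mine, and the dataset "gathered the same year" is entered as a lag of zero, which is an assumption about how the original averages were computed.

```python
# (year of advance, years since algorithm, years since dataset),
# taken from the figures stated in the quoted passage
advances = [
    (1994, 10, 3),   # speech recognition
    (1997, 14, 6),   # Deep Blue
    (2005, 17, 0),   # Google translation; dataset gathered the same year
    (2011, 20, 1),   # Watson
    (2014, 25, 4),   # GoogLeNet
    (2015, 23, 2),   # DeepMind Atari
]

algorithm_lag = sum(a for _, a, _ in advances) / len(advances)
dataset_lag = sum(d for _, _, d in advances) / len(advances)
ratio = algorithm_lag / dataset_lag

print(f"{algorithm_lag:.1f} {dataset_lag:.1f} {ratio:.1f}")  # prints "18.2 2.7 6.8"
```

With these inputs the algorithm lag averages about eighteen years and the dataset lag under three, in line with the quoted figures; the ratio works out a little under seven, consistent with the passage's "about six times faster" as a rounded claim.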

Seeing these correlations only affirms the importance of looking at knowledge bases through the specific lens of how they may best support training AI machine learners. We see the correlation; it is now time to optimize the expression of these KB potentials. We need to organize the KBs via coherent knowledge graphs and express the KBs in types, entities, attributes and relations representing their inherent, latent knowledge structure. Properly expressed KBs can support creating positive and negative training sets, promote feature set generation and expression, and create reference standards for testing AI learners and model parameters.

Past AI winters arose from lofty claims that were not then realized. Perhaps today’s claims may meet a similar fate.

Yet somehow I don’t think so. The truth is, today, we are seeing rapid progress in AI tasks of increasing usefulness and value all around us. The benefits from what will continue to be seen as ubiquitous AI should now ensure an economic and innovation engine behind AI for many years to come. One way that the AI engine will continue to be fueled is through a systematic understanding of how knowledge bases and their features can work hand in hand with machine learning to more effectively automate and meet our needs.

You can see Part II of this series here.

[1] Kevin Kelly, 2014. “The Three Breakthroughs That Have Finally Unleashed AI on the World,” in Wired.com, October 27, 2014.

[2] For example, from the perspective of hardware, see Jen-Hsun Huang, 2016. “Accelerating AI with GPUs: A New Computing Model,” Nvidia blog, January 12, 2016.

[3] Alexander Wissner-Gross, 2016. “2016: What Do You Consider the Most Interesting Recent (Scientific) News? What Makes It Important?: Datasets Over Algorithms,” Edge.org, January 2, 2016.