AI3:::Adaptive Information (Mike Bergman)

Articles on semantic technologies and KBAI (knowledge-based artificial intelligence)

KBpedia v 2.10 Brings 98% Coverage to Wikidata

Mon, 04/15/2019 - 06:27
New Release Includes Manually Vetted Wikidata Mapping

One of the reasons for releasing KBpedia as open source last October was the emerging usefulness of one of its main constituent knowledge bases, Wikidata. Wikidata now contains about 45 million useful entities and concepts (so-called Q identifiers) and more than a quarter billion data assertions across scores of languages [1]. Many of the efforts undertaken for KBpedia’s open-source release and others since then have been to increase coverage of Wikidata in KBpedia [2]. With the release of KBpedia v 2.10, we have extended our coverage of Wikidata instances to more than 98%. We have also increased coverage of other aspects of structure and properties within Wikidata to very high percentages. In this version 2.10 release we also manually inspected all 45,000 mappings of KBpedia reference concepts to Wikidata instances, resulting in many changes and improvements. The quality of mappings in KBpedia has never been higher.

KBpedia, as you recall, is a computable knowledge graph that sits astride Wikipedia and Wikidata and other leading knowledge bases. Its baseline 55,000 reference concepts provide a flexible and expandable means for relating your own data records to a common basis for reasoning and inferring logical relations and for mapping to virtually any external data source or schema. The framework is a clean starting basis for doing knowledge-based artificial intelligence (KBAI) and to train and use virtual agents. KBpedia combines seven major public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL. KBpedia supplements these core KBs with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for KBAI tasks.

KBpedia is also a comprehensive knowledge structure for promoting data interoperability. KBpedia’s upper structure, the KBpedia Knowledge Ontology (KKO), is based on the universal categories and knowledge representation theories of the great 19th century American logician, philosopher, polymath and scientist, Charles Sanders Peirce. This design provides a logical and coherent underpinning to the entire KBpedia structure. The design is also modular and fairly straightforward to adapt to enterprise or domain purposes. KBpedia provides a powerful reference scaffolding for bringing together your own internal data stovepipes into a comprehensive whole. KBpedia, and extensions specific to your own domain needs, can be deployed incrementally, gaining benefits each step of the way, until you have a computable overlay tying together all of your valuable information assets.

Major Activities for Version 2.10

Almost all efforts related to KBpedia v 2.10 were focused on Wikidata though, given the close alliance of the two, many changes were also reflected in the Wikipedia mappings. As noted with the v 2.00 release, our first effort was to map Q items (IDs) that have substantial instance coverage but were lacking in prior mappings. This attention resulted in adding a net 973 Q IDs to KBpedia. This number is a bit misleading, however, since in the manual inspection phases many duplicates were removed from the system (approx. 2100) and earlier mappings to category Q IDs (approx. 2700) were upgraded to their more specific Q ID instance. Thus, nearly 6,000 Q IDs are now different in this version compared to the prior version 2.00. Since many of the Q IDs also have a direct mapping to a Wikipedia counterpart, these mappings were updated as well. Besides incidental improvements to definitions, linkages and labels that arise when doing such inspections, which were also attended to whenever encountered, no further major changes were made to this newest release.

We are now in very good shape with respect to our mapping and coverage of Wikidata (with a similar profile for Wikipedia). Across a breadth of measures, here is now where we stand with respect to Wikidata coverage [3], with implementation notes provided in the endnotes section:

Wikidata Item    No. Items      No. Mapped Items    Coverage    Notes [3]
Q IDs            45,306,576     45,882              0.1%        [4]
Q instances      45,306,576     44,458,015          98.1%       [4]
Q classes        2,493,795      2,312,116           92.7%       [5]
Properties       5,910          3,970               67.2%       [6]
P Statements     256,298,963    246,055,199         96.0%       [7]
P Qualifiers     38,866,255     31,756,937          81.7%       [7,8]
P References     24,582,259     20,121,794          81.9%       [7,9]

One of the first observations that jumps out of the table is how relatively few mappings (~ 45 K, or 0.1%) are sufficient to capture nearly all (98%) of the instances contained in Wikidata. This is because a Q ID may be an individual instance or a parent to multiple instances. The KBpedia mappings focus on the parents, through which the individual instances may be obtained. By virtue of the additions and Q mapping improvements in this version, KBpedia has expanded its instance reach from about 30 million entities to now 45 million entities.
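For readers who want to check the arithmetic, the coverage column is simply the mapped count divided by the total count for each row. The short Python sketch below recomputes those percentages from the figures in the table above; the dictionary and function names are illustrative choices for this example and are not part of any KBpedia tooling.

```python
# Figures copied from the coverage table: item -> (total items, mapped items).
# The names used here are illustrative only.
WIKIDATA_COUNTS = {
    "Q IDs": (45_306_576, 45_882),
    "Q instances": (45_306_576, 44_458_015),
    "Q classes": (2_493_795, 2_312_116),
    "Properties": (5_910, 3_970),
    "P Statements": (256_298_963, 246_055_199),
    "P Qualifiers": (38_866_255, 31_756_937),
    "P References": (24_582_259, 20_121_794),
}


def coverage_percent(total: int, mapped: int) -> float:
    """Return mapped/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * mapped / total, 1)


def coverage_table(counts: dict) -> dict:
    """Compute the coverage percentage for every row of the table."""
    return {
        item: coverage_percent(total, mapped)
        for item, (total, mapped) in counts.items()
    }


if __name__ == "__main__":
    for item, pct in coverage_table(WIKIDATA_COUNTS).items():
        print(f"{item:<14}{pct:>6.1f}%")
```

The first two rows share the same denominator, which makes the point of the paragraph above concrete: roughly 45 thousand mapped Q IDs stand in for more than 44 million reachable instances.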

Another observation is that we are also capturing a significant portion of the structure of Wikidata (93%) as provided by the mappings to Q IDs with significant subClassOf connections (P279), which is where the taxonomy of the knowledge base is defined. A third summary observation is that we have similarly high levels of coverage to Wikidata properties. However, at present, this is the least developed area of KBpedia with respect to use cases or cross-knowledge base mappings.
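To make the role of the instance (P31) and subClassOf (P279) properties more concrete, here is a minimal Python sketch that builds a SPARQL query counting the instances reachable from a single parent Q ID. It is an illustration under stated assumptions, not the query actually used to produce the figures above: the function names are hypothetical, the example Q ID (Q146, house cat) is chosen only for illustration, and the sketch only constructs the query text and request URL without contacting the Wikidata query service.

```python
from urllib.parse import urlencode

# Public Wikidata query service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def instance_count_query(class_qid: str) -> str:
    """Build a SPARQL query counting items that are instances (P31) of the
    given class or of any of its subclasses (P279, followed transitively)."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata Q ID: {class_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item wdt:P31/wdt:P279* wd:{class_qid} . "
        "}"
    )


def query_url(class_qid: str) -> str:
    """Return a GET URL that would submit the query and ask for JSON results."""
    params = {"query": instance_count_query(class_qid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(instance_count_query("Q146"))
    print(query_url("Q146"))
```

Queries of this shape can time out on the public endpoint for very large classes, so a production retrieval would likely need batching or a local dump; that part is deliberately left out here.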

A minor change, but useful to the KBpedia Web site, has been our downgrading of the OpenCyc and UMBEL mapped items. They are still mapped in the knowledge structure, but the Web site removes their links in order to highlight the most popular knowledge bases.

Despite these upgrades and enhancements, the coverage of KBpedia in my new book, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (Springer), remains current. The book emphasizes theory, architecture and design, which remains unchanged in this current new release of KBpedia. Also note that future areas of improvement were listed in the KBpedia v 2.00 release notice.

Getting the System

The KBpedia Web site provides a working KBpedia explorer and a demo of how the system may be applied to local content for tagging or analysis. KBpedia distinguishes entities from concepts, on the one hand, and splits predicates into attributes, external relations, and pointers or indexes, on the other, all informed by Charles Peirce’s prescient theories of knowledge representation.

Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record). Such reach-throughs are straightforward to construct.

Here are the various KBpedia resources that you may download or use for free with attribution:

  • The complete KBpedia v 2.10 knowledge graph (8.5 MB, zipped). This download is likely your most useful starting point
  • KBpedia’s upper ontology, KKO (332 KB), which is easily inspected and navigated in an editor
  • The annotated KKO (321 KB). This is NOT an active ontology, but it has the upper concepts annotated to more clearly show the Peircean categories of Firstness (1ns), Secondness (2ns), and Thirdness (3ns)
  • The 68 individual KBpedia typologies in N3 format
  • The KBpedia mappings to the seven core knowledge bases and the additional extended knowledge bases in N3 format
  • A version of the full KBpedia knowledge graph extended with linkages to the external resources (10.5 MB, zipped), and
  • A version of the full KBpedia knowledge graph extended with inferences and linkages (14.7 MB, zipped).

The last two resources require time and sufficient memory to load. We invite and welcome contributions or commentary on any of these resources.

All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. KBpedia’s development to date has been sponsored by Cognonto Corporation. We welcome suggestions for further enhancements or tackling your own improvements. Please let me know what ideas you may have.

Notes
[1] Useful mappings exclude mappings to internal Wikimedia sources (such as templates, categories, or infoboxes on Wikipedia and Wikidata) and scholarly articles (linked in other manners). There are about 45 million ‘useful’ records in the current Wikidata based on these filters.
[2] ‘Coverage’ is understood to be the percentage of useful instances in a source knowledge base to KBpedia that are actually mapped to a specific KBpedia reference concept or property. These source instances are not included in the KBpedia distribution. They are accessed from the source knowledge base directly. Manipulation of the KBpedia knowledge graph results in the identification of this external source data.
[3] The number of items shown for Wikidata does not reflect the total items on the service, but only those that are useful and relevant after administrative categories and such are removed.
[4] See the text where we describe how choosing to map appropriate structural nodes in Wikidata, which themselves have many child instances, leads to large percentage coverage of all available instances. Instance relationships are obtained from the P31 Wikidata property. The Q IDs were obtained from a Feb 19, 2019 Wikidata retrieval.
[5] Like the instance (P31) retrievals, the subClassOf (P279) data was obtained by a SPARQL query to the Wikidata query endpoint. Try it!
[6] The properties data was obtained from the SQID Wikidata service on April 4, 2019. Note, if you try this link, be patient for all of the data to load.
[7] A Wikidata statement pairs a property with a value for a given entity. It is equivalent to an assertion. It is the most basic factual statement in Wikidata.
[8] Wikidata qualifiers allow statements to be expanded on, annotated, or contextualized beyond what can be expressed in just a simple property-value pair.
[9] Wikidata references are used to point to specific sources that back up the data provided in a statement.

A Knowledge Representation Practionary Now in Paperback

Tue, 04/09/2019 - 20:33
$25 to SpringerLink Subscribers

After resolving some technical glitches, Springer has finally made available my new book, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce, in paperback form. The 464 pp book is available for $24.99 under Springer’s MyCopy program.

MyCopy is available to SpringerLink subscribers who have access to the computer science collection. Most universities and larger tech firms and knowledge organizations are current subscribers. You should be able to log in to your local library, go to SpringerLink, and then search for A Knowledge Representation Practionary. If you are a qualified subscriber, you will see the image to the right on the results page. (If you are not a subscriber, you should be able to find a friend or colleague who is and repeat this process using their account.) After choosing to buy, you will be guided through the standard transaction screens.

As Springer states, “MyCopy books are only offered to patrons of a library with access to at least one Springer Nature eBook subject collection and are strictly for individual use only.” 

Though only printed in monochrome, my figures render well and the quality is quite high. The cover is in color and the paper quality is high. I waited to get a copy myself before I could recommend it. I think the overall quality is quite good.

To learn more about my AKRP book, including a listing of the table of contents, see the initial announcement. Also, of course, Springer subscribers have access to the free eBook, and hardcopy and other versions are available. Unfortunately, I think prices are unreasonably high for non-Springer subscribers. Please let me know if you need to find a cheaper alternative.

BTW, if you read and like my book (or even otherwise!), I encourage you to provide your rating to Goodreads.

A Walk Around the Block with KBpedia

Mon, 03/25/2019 - 03:37
After First Twitch, How to Learn More About KBpedia

My last installment in this introductory series to KBpedia discussed loading the main knowledge graph into an editor for inspection and navigation. I characterized this look as a ‘first twitch.’ Now, I’d like to accompany that view with a walk around KBpedia to gain more perspective on its use and purpose.

The standard introduction to KBpedia on its Web site states:

KBpedia is a comprehensive knowledge structure for promoting data interoperability and knowledge-based artificial intelligence, or KBAI. The KBpedia knowledge structure combines seven ‘core’ public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL — into an integrated whole. KBpedia’s upper structure, or knowledge graph, is the KBpedia Knowledge Ontology. We base KKO on the universal categories and knowledge representation theories of the great 19th century American logician, polymath and scientist, Charles Sanders Peirce.

KBpedia, written primarily in OWL 2, includes 55,000 reference concepts, mapped linkages to about 32 million entities (most from Wikidata), and 5,000 relations and properties, all organized according to about 70 modular typologies that can be readily substituted or expanded. We test candidates added to KBpedia using a rigorous (but still fallible) suite of logic and consistency tests — and best practices — before acceptance. The result is a flexible and computable knowledge graph that can be sliced-and-diced and configured for all sorts of machine learning tasks, including supervised, unsupervised and deep learning.

So, short of loading and implementing the system locally, let me outline the online resources that can help explain KBpedia more.

One Kind of Demo

One of the uses — among potentially dozens — of a controlled vocabulary is to tag content for consistent characterization and categorization. In the example below, found off of the Demo link on KBpedia.org, I have submitted my last introductory blog post on KBpedia to the online demo system:

Figure 1: KBpedia Demo Screen

The tagger uses the 55,000 concepts in KBpedia to tag content. One can also see a listing of entities and various analyses of the content. A general tutorial covers the use and interpretation of the online demo function.

Explore the Knowledge Graph

The main online resource for KBpedia is the knowledge graph itself. The knowledge graph may be navigated and searched, including with advanced search functions. The basic knowledge graph is organized under the upper-level KBpedia Knowledge Ontology, or KKO. KKO is itself organized according to about 80 typologies of similar concepts and entities, mostly organized as being distinct from one another. The high-level view of this knowledge graph is shown by:

Figure 2: Upper-level of the KBpedia Knowledge Graph

There are two detailed use guides and tutorials governing how to navigate and search the graph. The first is How to Use the Knowledge Graph and the second is Knowledge Graph Search. These provide insight into the various capabilities of the online knowledge graph and its search functions.

Check Out the Use Cases

Of course, tagging is not the only function available for leveraging the KBpedia knowledge graph. Remember, the three principal purposes for KBpedia are to support general knowledge management; to aid data interoperability; and to be a foundational basis for knowledge-based artificial intelligence (KBAI). These purposes are described more fully in a series of use cases.

Knowledge Graph

These are the KBpedia use cases related to the use and browsing of the knowledge graph:

Machine Learning Use Cases

KBpedia, combined with your own schema and data, can provide a nearly automated foundation for creating training corpuses and training sets for deep learning, unsupervised learning, and supervised learning. Further, these same selection capabilities, combined with the logical basis of the KBpedia knowledge graph, also aid the creation of reference standards. Reference standards are essential for tuning analysis parameters to obtain the best results for your tagging or categorization efforts. Tuning parameters are integral to most forms of natural language processing and for supervised learning.

Mapping Use Cases

Mapping is an essential consideration for two reasons. The first rationale for mapping is to create a consistent and coherent knowledge structure over which to reason and conduct machine learning. The second rationale is to consolidate local information resources, what is known as data integration. The easiest way to meet both of these rationales is to leverage off an existing knowledge structure, which itself is already proven to be logically consistent. This is the role that KBpedia plays. When extended with your local concepts and terminology, your enterprise extension of KBpedia now itself becomes a consistent structure for learning, tagging and categorization. Here are some of the current use cases published for KBpedia relevant to mapping:

Learn More

The Resources tab on the KBpedia Web site provides additional documentation on the system. Specifically, under Additional Documentation are these useful sources:

Upon review of the materials above, you should be well-armed to understand more about KBpedia and its purposes. Beyond these online resources, the next steps are to download and install the system and begin work with it!

First Twitch with KBpedia

Wed, 02/27/2019 - 06:03
Here’s the Best Starting Point to Learn About the Knowledge Graph

I’ve always favored a way to describe the first successful operation of a computer program as ‘first twitch.’ It brings to mind the stirrings of Dr. Frankenstein’s monster when first zapped with electricity. It is the moment when we first see how something works, and that it might continue to work into the future.

The ‘monster’ on the table for today’s exercise is KBpedia, which I have been talking about much recently since we released it as open source in late 2018. KBpedia is a computable knowledge graph that sits astride Wikipedia and Wikidata and other leading knowledge bases. Its baseline 55,000 reference concepts provide a flexible and expandable means for relating your own data records to a common basis for reasoning and inferring logical relations and for mapping to virtually any external data source or schema. The framework is an increasingly clean starting basis for doing knowledge-based artificial intelligence (KBAI) and to train and use virtual agents. KBpedia combines seven major public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL.

It is mighty hard to describe knowledge graphs or ontologies in the abstract. This difficulty is especially acute when trying to convey a knowledge graph to someone who is not familiar with semantic technologies and tools. Thus, while in subsequent posts over the coming weeks I will dive into more details, my purpose today is to help those with little background to put in place a sufficient basis to gain their own ‘first twitch’ for KBpedia. We will do so with only a small portion of the KBpedia distribution package and an open-source tool called Protégé. Hopefully, with about 15-30 min of effort, you can set up your own local environment to get KBpedia to twitch. (Five min if you already have Protégé installed.)

It’s alive!

Getting Set Up

We will begin our familiarization using only two files from the open source KBpedia distribution package. First, go to the KBpedia GitHub repository and download two files, kko.n3 and kbpedia_reference_concepts.zip. In the case of kko.n3, which is the small upper ontology for KBpedia, you will copy-and-paste the code to a local file and name it the same. In the case of kbpedia_reference_concepts.zip, which contains the main substance of KBpedia, you should download the file and then unzip it in a directory you can find on your local machine. The unzipped file is called kbpedia_reference_concepts.n3. For simplicity, put this file and kko.n3 into the same directory. (In our production settings we use multiple sub-directories.)
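If you prefer to script this step, the following Python sketch unzips the archive and reports whether both files ended up side by side. It is only a sketch: the function name and the target directory are hypothetical, it assumes the .n3 file sits at the top level of the archive, and it does not download anything, so kko.n3 still has to be saved into the same directory by hand.

```python
import zipfile
from pathlib import Path

# The two file names come from the text; everything else is illustrative.
NEEDED_FILES = ("kko.n3", "kbpedia_reference_concepts.n3")


def prepare_kbpedia_dir(zip_path, target_dir) -> dict:
    """Unzip the reference concepts archive into target_dir and report
    which of the two files needed for the walkthrough are present there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return {name: (target / name).is_file() for name in NEEDED_FILES}


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever you saved the download.
    status = prepare_kbpedia_dir("kbpedia_reference_concepts.zip", "kbpedia")
    for name, present in status.items():
        print(f"{name}: {'found' if present else 'missing'}")
```

A "missing" result for kko.n3 simply means the upper ontology has not yet been saved next to the unzipped reference concepts file.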

Second, you will need to download and install Protégé, an open-source ontology development framework with more than 300,000 users. (There are other ontology viewers or development frameworks that can readily run KBpedia, but Protégé is the most widely used free one.) Go to the Protégé download page and follow the instructions for your particular operating system. You should fill out the new user registration (though you can claim you are already registered and still download it directly). The version I installed for this example is version 5.5 beta (though any version from 5.2 forward should be fine as well). The Protégé distribution comes as a zip file, so you should unzip it into a directory of your choice. To complete the set-up you will also need the most recent version of Java installed on your machine; if you do not have it, here are installation instructions.

Next, to start up Protégé, invoke the executable in your Protégé directory. It will take a few seconds for the program to load. Once the main screen appears, go to File and then Open, and then navigate to the directory where you stored kbpedia_reference_concepts.n3. Pick that file and click the Open button.

The first time you load KBpedia you are likely to get the following error message:

Figure 1: Possible Error Message Upon Loading

Follow the instructions on the screen to find the second needed file, kko.n3, which I just suggested you store in the same directory. (Once you save your current session, the next time you start up this error will not appear.) Also, the next time you work with the system, you can open KBpedia by using the File → Open Recent option. Lastly, you may encounter some performance or display issues. I conclude this article with a couple of use tips for Protégé.

Taking the Tour

Upon successful start-up, you will see the Protégé main screen as shown in Figure 2. Let me briefly cover some of the main conventions of the program. The three key structural aspects of the Protégé program are its main menu, its tab structure, and the views (or panes) shown for each tab where it appears on the standard interface ( ⓹ ). At start-up we always begin at the Active ontology tab, for which I highlight some of its key panes and functionality:

Figure 2: Main KBpedia Screen on Protégé

The ontology header section ( ⓵ ) is where all of the metadata for the knowledge graph resides. Such material includes title, creators, version notes and so forth. The metrics for the ontology reside in the second view ( ⓶ ). We see, for example, that this expression of KBpedia has more than 54,000 classes (reference concepts) and more than 5,000 properties. We also see in the third view ( ⓷ ) that KBpedia requires the SKOS and KKO ontology imports. Also note the search button ( ⓸ ), which we will use frequently, and the tab structure and order ( ⓹ ). We will modify that structure in our next set of steps.

Because Protégé, like many integrated development environments (IDEs), is highly configurable, let’s detour for a short step to see how we can modify how our program looks. I am going to delete and add tabs to make the tab structure conform to the remaining screen shots.

To change tabs in Protégé, let’s refer to Figure 3:

Figure 3: Adding Tab Views to Protégé

We effect the general layout of the system using the Window → Tabs option from the main menu. You delete a tab by clicking on the arrow shown for each tab as presented in the standard interface. You add tabs by selecting one of the options in the Tabs menu ( ⓶ ). Note that active tabs are indicated by the checkmark ( ✓ ). New tabs are added to the right of the tab sequence ( ⓷ ). Thus, to change the ordering of tabs, one must delete and then add tabs in the order desired. You can follow these steps if you want the tab ordering to reflect the screen shots below.

[This same main menu Window option is where you can change the views (panes) for each tab. However, we don’t discuss that customization further in this article.]

Discovering and Inspecting Reference Concepts

When your tabs are to your liking, let’s begin inspecting KBpedia itself. Let’s first move to the Classes tab screen, the most important to understanding the hierarchy and structure of KBpedia. Note when we change tabs that the border colors also change. Each tab in Protégé is demarked with its own color.  The actual class structure is shown in the left-hand pane ( ⓵ ) in Figure 4. The tree structure may be expanded or collapsed by clicking on the triangles shown for a given item (items without the triangle are terminal nodes).  The direction the triangle points indicates the expand or collapse mode. Depending on your Protégé settings, the default opening for this tree may be expanded (by levels) or collapsed. What we are showing in Figure 4 is the highest structure of KBpedia, which can also be separately inspected with the kko.n3 file alone. (Peirce scholars and ontologists may prefer to start there.)

Because KBpedia is an organized, computable structure of types (classes), the majority of the items in KBpedia may be found under the SuperTypes branch ( ⓵ ). This is where you will spend most of your time inspecting the existing 55 K reference concepts (RCs).

Another thing to note is the multi-paned structure of the layout ( ⓶ ), which I noted before. These panes are configurable, and may be moved and resized at any location across the tab. Figure 4 is close to the default Protégé settings.

Figure 4: Initial View from the Class Tab

Search ( ⓷ ) is one of the most important functions in the system, since it is the primary way to find specific RCs when there are thousands. Search is also useful for all other information in the system. Given this importance, let’s take another short detour to the search screen. Click search.

That brings up the search screen, as shown in the next Figure 5. There is some interesting functionality here, worth calling out individually. Let’s begin a search for ‘mammal’:

Figure 5: Class View After Doing A ‘Mammal’ Search

As we enter the search term, only ‘mamma’ so far in the case shown, there is a lookahead (auto-complete) function to match the entered text ( ⓵ ), beginning with three characters. It is also important to note there are some pretty powerful search options ( ⓶ ); I often use the Show all results choice, though sometimes lists can grow to be huge! (Using few search characters for common letter combinations, for example).

The search screen organizes its results into multiple categories ( ⓷ ) (scroll down), including descriptions and annotations. The most important matches, namely to preferred labels and IRIs, appear at the top of the listing. It is also possible to highlight results on these lists and create copies ( ⓸ ) for posting to the clipboard. I use this functionality frequently.

Once we have selected ‘Mammal’ from the search results list, the search screen remains open (useful for testing many putative matches), and the tree in the Class view updates and more RC results are automatically displayed, as Figure 6 shows (in this case, I have closed the search screen so as to not obscure the main screen):

Figure 6: Class View After Doing A ‘Mammal’ Search

We now see a much-expanded tree in the left Class hierarchy pane ( ⓵ ). We can again click the triangles to collapse or expand that portion of the tree.

For the selected item in the tree, again ‘Mammal’ in this case, we can see its annotations and linkage relationships ( ⓶ ), including labels, descriptions, notes and links. The Descriptions pane ( ⓷ ) shows us the formal relationships and other assertions for this RC in the knowledge graph. (Since we are not working with all KBpedia files, this portion may not be as complete as when all files are included.)

This general process can be repeated over and over to gain an understanding. You can navigate the tree via scrolling and expanding and collapsing nodes, or searching for terms or stems as you encounter them. Of course, both navigation and searching are done concurrently during discovery mode. It is this process, in my view, that best leads to first twitch for KBpedia by better understanding the structure, scope and relationships for the graph’s 55 K reference concepts.

Discovering and Inspecting Properties

These same conventions and approaches may also be used for understanding the properties (relations) in KBpedia, as I show in Figure 7. First, note ( ⓵ ) we have split our properties into three groups: object properties, data properties, and annotation properties:

Figure 7: Initial View from the Object Property Tab

These are the standard splits in the OWL language. How we use these splits and their relation to the guidance of Charles Sanders Peirce is described in another article. In essence, object properties are those that connect to an item (with a URI or IRI) already in the system; data properties are literal strings and descriptions connected to the subject item; and annotation properties are those that describe or point to the item. We’ll just use an object property example here, though the use and navigation applies to the other two property categories as well.

The Object properties tab in Figure 7 also has a search function ( ⓶ ), which works the same way as described for classes. We also see a tree structure at the left that works the same as for classes ( ⓷ ). However, besides the relations splits due to Peirce, there are two other major property differences for KBpedia compared to most knowledge graphs or ontologies. The first difference is the sheer number of properties, more than 5 K in the case of KBpedia. The second is the logical organization of those properties, beginning with the three splits due to Peirce, but extending down to an emerging, logical hierarchy of property types.

To see some of this, let’s do a search for the property ‘color’ [( ⓶ ) in Figure 7]. The result, which again works much like what we saw for classes, is shown in Figure 8:

Figure 8: Object Property View for ‘color’

Like before, we now see an expanded tree highlighting the ‘color’ property ( ⓵ ), again accompanied by metadata and other structural aspects of the Object properties ( ⓶ ).

As before, you can use a combination of scrolling, tree expansions and searching to discover the properties in KBpedia. Do make sure and check out the Data properties and Annotation properties tabs as well.

Performance and Preferences

You may well experience some performance issues with Protégé as it comes out of the box. One likely cause is the memory settings in the run.bat file, which you can find in the main directory where you installed Protégé. As a quick fix, try updating these settings in that file to these values before the next time you start the application:

-Xmx2500M -Xms2000M
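For reference, -Xmx sets the maximum Java heap size and -Xms sets the initial heap size. If you would rather not hand-edit the file, the Python sketch below shows one way to rewrite those two flags in the text of a launch script. The function name and the sample command line are hypothetical (the actual contents of run.bat are not reproduced here), and the sketch leaves the text untouched if the flags are not present.

```python
import re


def set_heap_flags(script_text: str,
                   max_heap: str = "2500M",
                   initial_heap: str = "2000M") -> str:
    """Replace any existing -Xmx / -Xms values in a launch script's text.

    -Xmx is the JVM maximum heap size and -Xms the initial heap size.
    Text without these flags is returned unchanged.
    """
    updated = re.sub(r"-Xmx\d+[kKmMgG]?", f"-Xmx{max_heap}", script_text)
    updated = re.sub(r"-Xms\d+[kKmMgG]?", f"-Xms{initial_heap}", updated)
    return updated


if __name__ == "__main__":
    # Hypothetical launch line, used only to demonstrate the substitution.
    sample = "java -Xmx500M -Xms200M -jar app.jar"
    print(set_heap_flags(sample))
```

Keep a backup copy of the original script before overwriting it, and choose values that fit within the physical memory of your machine.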

Also note there are many customization options in Protégé. If you get captivated with the tool, I encourage you to explore the plugins available and the ways to modify the application interface. See especially File → Preferences, with the Renderer and Plugin tabs good places to look.

The Framework is a Beginning

These two files are at the core of KBpedia, but do not constitute its entirety. These two files are likely the simplest, adequate representation for entering the KBpedia construct.

I will be talking in coming weeks about additional aspects of discovering what is in KBpedia and how it may be used. We’ll be talking about working with the system, use cases, and how to discover more about the system using the KBpedia Web site. I do think, however, that the basic inspections of the system outlined here are among the better ways to get familiar with the system and get a feel for it.

There are many moving parts in KBpedia and much interconnection. We are constantly finding and fixing errors in addition to improving the scope of the system. Should you encounter questionable assignments or missing relationships, please do let us know. We welcome all suggestions for improvements and commit to continued quality improvements and releases.

KBpedia v 200 Now Available

Thu, 02/07/2019 - 03:30
Release Constitutes What We Consider As First, Complete Open-Source Baseline

We first released KBpedia as open source in October 2018 with version 1.60. We needed to release it then because of the pending release of my new book, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (Springer), which has liberal ties to KBpedia. We were pleased with that first open-source release of KBpedia, but did not have time to complete our full list of what we considered to be a proper baseline for the initial release. We have spent the past few months completing that list and are now pleased to announce version 2.00 of KBpedia, what we consider to be the first complete, open-source baseline of this knowledge artifact.

KBpedia is a computable knowledge graph that sits astride Wikipedia and Wikidata and other leading knowledge bases. Its baseline 55,000 reference concepts provide a flexible and expandable means for relating your own data records to a common basis for reasoning and inferring logical relations and for mapping to virtually any external data source or schema. The framework is a clean starting basis for doing knowledge-based artificial intelligence (KBAI) and to train and use virtual agents. KBpedia combines seven major public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL. KBpedia supplements these core KBs with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks.

KBpedia is also a comprehensive knowledge structure for promoting data interoperability. KBpedia’s upper structure, the KBpedia Knowledge Ontology (KKO), is based on the universal categories and knowledge representation theories of the great 19th century American logician, philosopher, polymath and scientist, Charles Sanders Peirce. This design provides a logical and coherent underpinning to the entire KBpedia structure. The design is also modular and fairly straightforward to adapt to enterprise or domain purposes. KBpedia provides a powerful reference scaffolding for bringing together your own internal data stovepipes into a comprehensive whole. KBpedia, and extensions specific to your own domain needs, can be deployed incrementally, gaining benefits each step of the way, until you have a computable overlay tying together all of your valuable information assets.

Major Activities to Complete the Baseline

Some areas received major attention and some were largely ignored in completing this open-source baseline of KBpedia. For example, no changes (other than minor cleanup often related to other changes) were made to the property scope of KBpedia or their mappings to Wikidata or schema.org. The typologies were also not adjusted or expanded (except for minor cleanup related to other changes). The general scope of KBpedia remained virtually unchanged. However, a number of areas were targeted for specific attention and improvement. Notably:

  • Definitions were completed for 100% of the 55,000 reference concepts. Since the decision to open source KBpedia, the number of concepts with definitions grew by nearly 40%, or new definitions for about 15,000 entries;
  • Mappings to instances and classes in Wikidata were greatly expanded. Mappings now exist to 32 million entities in Wikidata, representing over 80% of the useful data in that system [1]. Over 80% of KBpedia’s 55 K reference concepts are also now mapped to specific Wikidata entries;
  • Mappings to Wikipedia also grew and kept pace with this Wikidata mapping. Total mappings to Wikipedia only grew 10% because of the larger number of prior mappings. Still, coverage of Wikipedia is also now about 80% based on either mapped RCs or coverage of Wikipedia articles;
  • Due to early mapping choices, KBpedia was not consistent in the use of plural vs. singular terms. We inspected and converted about 4,500 plural concepts into singular expressions, consistent with what we see as best naming practices;
  • Because of this mixed naming, and some other synonym issues, we had a pool of reference concept (RC) duplicates in the system that totaled nearly 1400 items, which were consolidated and then removed. The overall size of KBpedia, however, did not change much, since all of these inspections also resulted in the addition of about 1200 new concepts, often at intermediate layers that improved the overall graph connectivity; and
  • Since the initiation of KBpedia, about 21,000 new concepts have been added over the starting OpenCyc RC structure. Each of these 21,000 RCs was reviewed, with about 5,000 flagged for detailed scrutiny. All of these flagged items were further reviewed, frequently resulting in new definitions, new parental assignments, new altLabels, or the addition of other property relations.

Despite these massive efforts, we are certainly not claiming an error-free structure. Logic and consistency tests are a constant activity, and the addition or deletion of concepts also requires testing and sometimes changes. Nonetheless, we are proud of this version 2.00 and believe KBpedia to be the cleanest it has ever been.

Statistics on the KBpedia v 200 Release

I show in the following table the statistics and changes compared to the first open-source release of KBpedia (v. 160) and the prior and last proprietary release (v. 151). The comparison to v 151 represents the total changes in the move to open source. Please note in the table that we measure coverage as the larger of either: a) percent of external concepts mapped; or b) percent of KBpedia RCs mapped to the external source (predominantly unique).

                                         From 1.60   From 1.51
Structure                        Value    % Change    % Change   Coverage
No. of RCs                      54,713       -0.3%        2.4%
  KKO                              173        0.0%       -0.6%
  Standard RCs                  54,540       -0.3%        2.4%
  Std RCs w/ definitions        54,537       33.2%       38.4%       100%
No. of mapped vocabularies          23        0.0%      -14.8%
  Core KBs                           7        0.0%       16.7%
  Extended vocabs                   16        0.0%      -23.8%
No. of typologies                   68        0.0%        7.9%
  Core entity types                 33        0.0%        0.0%
  Other core types                   5        0.0%        0.0%
  Extended types                    30        0.0%       20.0%
No. of properties                4,847        0.0%       92.4%
RC Mappings                    158,789       14.0%       38.0%
  Wikipedia                     44,342        5.3%        9.9%        81%
  Wikidata                      44,909       63.8%      794.4%        82%
  schema.org                       845        0.0%       15.1%        99%
  DBpedia ontology                 764        0.0%        0.0%        99%
  GeoNames                         918        0.0%        0.0%        99%
  OpenCyc                       33,372       -0.5%       -0.5%        61%
  UMBEL                         33,390       -0.3%       -0.3%        61%
  Extended vocabs                  249        0.0%       -4.2%
Property Mappings                4,847        0.0%       92.4%
  Wikidata                       3,970        0.0%       57.6%        86%
  schema.org                       877        0.0%         N/A        92%

The table shows the significant improvements made to KBpedia since the decision to release it as open source. The property mappings nearly doubled, now with significant mappings to both Wikidata and schema.org properties. The number of mappings to Wikidata entities (Q items) increased nearly eight-fold (8 x), with coverage now more than 80 percent to both Wikidata and Wikipedia. The structure is fairly clean and consistent, with all reference concepts now including a definition, and most with a slew of alternative labels to improve matching and retrieval. Through its mapped sources, KBpedia links to more than 30 million entities, almost all with data attributes (Wikidata) and complete articles (Wikipedia). The system is inherently designed for expansion into multiple languages.
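The coverage measure used in the table (the larger of the two mapping percentages) can be sketched in a few lines of Python. The function names are my own, and the choice of the 54,540 standard RCs as the denominator for measure (b) is an assumption on my part, though it does reproduce the 82% and 81% shown for Wikidata and Wikipedia. For schema.org, I assume the 99% in the table is measure (a), since only 845 RCs are mapped.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100.0 * part / whole


def coverage(pct_external_mapped: float, pct_rcs_mapped: float) -> float:
    """Coverage is the larger of the two mapping percentages."""
    return max(pct_external_mapped, pct_rcs_mapped)


STANDARD_RCS = 54_540  # assumed denominator for measure (b)

# Measure (b): percent of KBpedia RCs mapped to the external source.
wikidata_b = percent(44_909, STANDARD_RCS)
wikipedia_b = percent(44_342, STANDARD_RCS)
print(f"Wikidata:  {wikidata_b:.0f}%")   # Wikidata:  82%
print(f"Wikipedia: {wikipedia_b:.0f}%")  # Wikipedia: 81%

# schema.org: few RCs are mapped, so measure (a) dominates.
schema_b = percent(845, STANDARD_RCS)
print(f"schema.org: {coverage(99.0, schema_b):.0f}%")  # schema.org: 99%
```

Taking the maximum keeps a small external vocabulary such as schema.org from looking poorly covered merely because it touches only a small fraction of KBpedia's reference concepts.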

Moving Beyond the Baseline

Of course, a knowledge artifact like KBpedia can be bounded in many ways. It is somewhat arbitrary what we define as a proper baseline. Our general image was a clean and computable framework adhering to best practices that maps to at least 80% of both Wikipedia and Wikidata. We have accomplished this baseline in the current release.

But our ambitions for KBpedia do not end there. Here are some of the major areas we will be working on for future versions:

  • Still better definitions for many concepts, particularly those with short or limited definitions. A few thousand candidates exist for this attention;
  • Adding another 1,000 or so new Wikidata Q items will increase instance coverage to more than 97% and raise class coverage to over 80%;
  • Complete the products and services mappings to the UNSPSC (United Nations Standard Products and Services Code) classification scheme, plus the likely split of the Products typology into three distinct branches;
  • Improved automatic tests for errors and oversights. We will be documenting our mapping experiences, among other topics, in a new ‘In the Trenches’ blog series I will begin early this year;
  • Test marginal overlaps between SuperTypes (typologies) for various reference concepts in order to improve assignments and increase disjointness assertions even further;
  • Cross-check existing mappings from external sources to Wikidata against KBpedia assignments (GeoNames features, for example) and reconcile differences;
  • Create various vector files for the KBpedia reference nodes using techniques such as explicit semantic analysis (ESA), word2vec, GloVe, and perhaps others; and
  • Open source the build code for KBpedia.

Quite a while back I estimated that KBpedia might eventually grow to 85 K reference concepts or so in order to provide an equivalent, complete baseline coverage of topics across human knowledge domains. After this most recent detailed review, I think those prior numbers were an overestimate. After detailed inspection and comparison with Wikipedia and Wikidata, I now suspect a ‘complete’ structure may require only 60 K to 65 K reference concepts. (Of course, the depth and breadth of KBpedia can be expanded to capture virtually any knowledge domain.) This reduced estimate also reflects that the present KBpedia has perhaps 1,000 to 2,000 unduly specific items (lists of individual species, for example) that probably should be culled to bring the overall structure into balance.

In any case, we welcome suggestions for further enhancements or tackling your own improvements. Please let me know what ideas you may have.

To Get the Goodies

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and splits in predicates based on attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s prescient theories of knowledge representation.

Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide only the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record). Such reachthroughs are straightforward to construct.
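As a rough illustration of a reachthrough, the sketch below turns a Wikidata Q identifier into the address of its public Wikidata page. It goes to Wikidata directly rather than through the KBpedia explorer, whose own URL scheme is not described here; the function name is hypothetical, and Q42 is used only as a sample identifier.

```python
import re

# A Wikidata item identifier is the letter Q followed by a positive integer.
_QID = re.compile(r"Q[1-9]\d*")


def wikidata_entity_url(qid: str) -> str:
    """Build the public Wikidata page URL for a Q identifier."""
    if not _QID.fullmatch(qid):
        raise ValueError(f"not a Wikidata Q identifier: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


print(wikidata_entity_url("Q42"))  # https://www.wikidata.org/wiki/Q42
```

Given a mapping file that pairs reference concepts with Q identifiers, a loop over those pairs would yield a link for each mapped entity.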

Here are the various KBpedia resources that you may download or use for free with attribution:

  • The complete KBpedia v 200 knowledge graph (8.5 MB, zipped). This download is likely your most useful starting point;
  • KBpedia’s upper ontology, KKO (332 KB), which is easily inspected and navigated in an editor;
  • The annotated KKO (321 KB). This is NOT an active ontology, but it has the upper concepts annotated to more clearly show the Peircean categories of Firstness (1ns), Secondness (2ns), and Thirdness (3ns);
  • The 68 individual KBpedia typologies in N3 format;
  • The KBpedia mappings to the seven core knowledge bases and the additional extended knowledge bases in N3 format;
  • A version of the full KBpedia knowledge graph extended with linkages to the external resources (10.5 MB, zipped); and
  • A version of the full KBpedia knowledge graph extended with inferences and linkages (14.7 MB, zipped).

The last two resources require time and sufficient memory to load. We invite and welcome contributions or commentary on any of these resources.

All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. KBpedia’s development to date has been sponsored by Cognonto Corporation.

Notes

[1] Useful mappings exclude mappings to internal Wikimedia sources (such as templates, categories, or infoboxes on Wikipedia and Wikidata) and scholarly articles (linked in other manners). There are about 44 million ‘useful’ records in the current Wikidata based on these filters.

Knowledge Representation Practionary Book Now Released

Wed, 01/02/2019 - 18:04

Available in Print or E-book Forms

I’m pleased that shortly before Christmas my new book, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (Springer), became available in hardcopy form. The e-book had been available for about two weeks prior to that.

The 464 pp book is available from Springer or Amazon (or others). See my earlier announcement for book details and the table of contents.

Individuals with a Springer subscription may get a softcover copy of the e-book for $24.99 under Springer’s MyCopy program. The standard e-book is available for $129 and hardcover copies are available for $169; see the standard Springer order site. Students or individuals without Springer subscriptions who cannot afford these prices should contact me directly for possible alternatives. I will do what I can to provide affordable choices.