Feed aggregator

Statistics on the KBpedia v 1.60 Release

AI3:::Adaptive Information (Mike Bergman) - Tue, 11/13/2018 - 06:42
Better Mappings, More Properties

When we released KBpedia v 1.60 as open source a couple of weeks back, I noted that I would follow up the announcement with more details on the changes made in preparation for the release. This post provides that update.

KBpedia is a computable knowledge structure that combines seven major public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL. KBpedia supplements these core KBs with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks.

KBpedia is a comprehensive knowledge structure for promoting data interoperability and KBAI. KBpedia’s upper structure, the KBpedia Knowledge Ontology (KKO), is based on the universal categories and knowledge representation theories of the great 19th century American logician, philosopher, polymath and scientist, Charles Sanders Peirce. This design provides a logical and coherent underpinning to the entire KBpedia structure. The design is also modular and fairly straightforward to adapt to enterprise or domain purposes. KBpedia was first released in October 2016. My initial announcement provides further details on KBpedia and how to download it.

Besides prepping the KBpedia knowledge artifact for open-source release, we also made these improvements to the base structure in comparison to the prior v 1.51, the last proprietary version:

  • The major effort was to increase the mapping to Wikidata, with most mappings represented as owl:equivalentClass. Coverage of KBpedia to Wikidata is now 50%, with 27,423 of KBpedia’s reference concepts now mapped to Wikidata. Version 1.60 has 4.5x more coverage than the previous v. 1.51
  • We also continued to increase coverage to Wikipedia, with coverage now at 77%
  • We now have essentially complete coverage to DBpedia ontology, schema.org and GeoNames
  • We doubled the number of mapped properties to nearly 5 K and added schema.org property mappings
  • We organized the properties into attributes, indexes/indices, and external relations.

Please note we measure coverage as the larger of percent of external concepts mapped or percent of KBpedia mapped to the external source. The % Change figures represent the changes from v 1.51 to the new open source v 1.60.

Besides the property organization, we made few changes in this latest v 1.60 release to the overall structure or scope of KBpedia. The emphasis was on mapping to existing sources and clean up for public release. Here are the major statistics for v 1.60:

Structure                        Value    % Change   Coverage
No. of RCs                      54,867        2.7%
  KKO                              173       -0.6%
  Standard RCs                  54,694        2.7%
No. of mapped vocabularies          23      -14.8%
  Core KBs                           7       16.7%
  Extended vocabs                   16      -23.8%
No. of typologies                   68        7.9%
  Core entity types                 33        0.0%
  Other core types                   5        0.0%
  Extended types                    30       20.0%
No. of properties                4,847       92.4%
RC Mappings                    139,311       21.1%
  Wikipedia                     42,108        4.3%        77%
  Wikidata                      27,423      446.2%        50%
  schema.org                       845       15.1%        99%
  DBpedia ontology                 764        0.0%        99%
  GeoNames                         918        0.0%        99%
  OpenCyc                       33,526        0.0%        61%
  UMBEL                         33,478        0.0%        99%
  Extended vocabs                  249       -4.2%
Property Mappings                4,847       92.4%
  Wikidata                       3,970       57.6%
  schema.org                       877         N/A
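To make the coverage measure concrete, here is a minimal Python sketch of the "larger of the two percentages" rule described above. The function name `coverage` and its parameters are my own, and the external totals passed in are placeholders for illustration, not published figures. Only the KBpedia counts (54,867 RCs, 27,423 Wikidata mappings) come from the statistics.

```python
KBPEDIA_RCS = 54_867  # No. of RCs in v 1.60, taken from the statistics above


def coverage(mapped, external_total, kbpedia_total=KBPEDIA_RCS):
    """Return the larger of the two mapping shares, as a fraction.

    mapped         -- number of reference concepts mapped to the source
    external_total -- number of concepts in the external source (assumed input)
    kbpedia_total  -- number of KBpedia reference concepts
    """
    share_of_external = mapped / external_total
    share_of_kbpedia = mapped / kbpedia_total
    return max(share_of_external, share_of_kbpedia)


# For a very large source, the KBpedia-side share dominates.
# The external total below is a placeholder, not a real Wikidata count.
wikidata = coverage(mapped=27_423, external_total=10_000_000)
print(f"Wikidata coverage: {wikidata:.0%}")

# For a small vocabulary, the external-side share dominates.
# Both numbers here are hypothetical.
small_vocab = coverage(mapped=90, external_total=100)
print(f"Small vocabulary coverage: {small_vocab:.0%}")
```

Taking the larger share keeps a small vocabulary that is almost fully mapped from looking poorly covered just because it is tiny relative to KBpedia.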

Through its mapped sources, KBpedia links to more than 30 million entities, the largest percentage coming from Wikidata. The mappings to these external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide only the links to them. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record).

Please know that KBpedia remains under active development, with new updates anticipated in the near future. We are incorporating feedback gained from the initial open source release, and are also committed to increasing the mapping coverage for the artifact and other baseline improvements. Our plan is to complete this baseline before new external sources are added to the system.

KBpedia is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. KBpedia’s development to date has been sponsored by Cognonto Corporation.

Woohoo! KBpedia is Now Open Source

AI3:::Adaptive Information (Mike Bergman) - Tue, 10/23/2018 - 17:50
A Major Milestone in Semantic Technologies and AI After a Decade of Effort

Fred Giasson and I are very (no, make that supremely!) pleased to announce the availability of KBpedia as open source. Woohoo! The complete open source KBpedia includes its upper ontology (KKO), full knowledge graph, mappings to major leading knowledge bases, and 68 logical concept groupings called typologies. We are also today announcing version 1.60 of KBpedia, with greatly expanded mappings.

For those who have been following our work, it should be clear that this release represents the culmination of more than ten years of steady development. KBpedia is the second-generation knowledge graph successor to UMBEL, which we will now begin to retire. KBpedia, when first released in 2016, only provided its upper portion, the KBpedia Knowledge Ontology (KKO), as open source. While we had some proprietary needs in the first years of the structure, we’re really pleased to return to our roots in open source semantic technologies and software. Open source brings greater contributions and greater scrutiny, both important to growth and improvements.

KBpedia is a computable knowledge structure that combines seven major public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL. KBpedia supplements these core KBs with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks.

KBpedia is a comprehensive knowledge structure for promoting data interoperability and KBAI. KBpedia’s upper structure, KKO, is based on the universal categories and knowledge representation theories of the great 19th century American logician, polymath and scientist, Charles Sanders Peirce. This design provides a logical and coherent underpinning to the entire structure. The design is also modular and fairly straightforward to adapt to enterprise or domain purposes. KBpedia was first released in October 2016.

“We began KBpedia with machine learning and AI as the driving factors,” said Fred, also the technical lead on the project. “Those remain challenging, but we are also seeing huge demands to bring a workable structure that can leverage Wikidata and Wikipedia,” he said. “We are seeing the convergence of massive public data with open semantic technologies and the ideas of knowledge graphs to show the way,” he stated. Here are some of the leading purposes and use cases for KBpedia:

    • A coherent and computable overlay to both Wikipedia and Wikidata
    • Integrating domain data
    • Fine-grained entity identification, extraction and tagging
    • Faceted, semantic search and retrieval
    • Mapping and integration of external datasets
    • Natural language processing and computational linguistics
    • Knowledge graph creation, extension and maintenance
    • Tailored filtering, slicing-and-dicing, and extraction of domain knowledge structures
    • Data harvesting, transformation and ingest
    • Data interoperability, re-use of existing content and data assets, and knowledge discovery
    • Supervised, semi-supervised and distant supervised machine learning for:
      • Typing, classification, extraction, and tagging of entities, attributes and relations
    • Unsupervised and deep learning.
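One of the use cases above, slicing a domain out of the graph to produce a training set, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept names, entity names, and the two link lists are toy data I made up, and the functions `descendants` and `training_slice` are hypothetical helpers, not part of any KBpedia tooling.

```python
from collections import defaultdict

# Hypothetical toy data: (child, parent) subclass links and (entity, concept) type links.
SUBCLASS_OF = [("DigitalCamera", "Camera"), ("Camera", "Device"), ("Cake", "Food")]
INSTANCE_OF = [("camera_a", "DigitalCamera"), ("camera_b", "Camera"), ("cake_a", "Cake")]


def descendants(root, subclass_of):
    """Return root plus every concept reachable below it via subclass links."""
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def training_slice(root, subclass_of, instance_of):
    """Split entities into positives (typed under root) and negatives (the rest)."""
    concepts = descendants(root, subclass_of)
    positives = sorted(e for e, c in instance_of if c in concepts)
    negatives = sorted(e for e, c in instance_of if c not in concepts)
    return positives, negatives


positives, negatives = training_slice("Camera", SUBCLASS_OF, INSTANCE_OF)
print(positives)  # entities under Camera or any of its subclasses
print(negatives)  # everything else
```

The point of the sketch is that a consistent subclass hierarchy lets positive and negative examples fall out of a simple graph traversal rather than manual labeling.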

    The KBpedia Web site provides a working KBpedia explorer and a demo of how the system may be applied to local content for tagging or analysis. KBpedia distinguishes entities from concepts, on the one hand, and, on the other, divides predicates into attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s prescient theories of knowledge representation. I will have much further to say about the project and its relation to Peirce in the coming weeks.

    The new v 1.60 release of KBpedia has 55,000 reference concepts in its guiding knowledge graph, which ties into an estimated 30 million entities, mostly from Wikidata. The system is inherently multi-lingual, though the current release is in English only. We hope to see multiple language versions emerge, which should be straightforward given the dominance of links from Wikipedia and Wikidata. As it stands, the core structure of KBpedia provides direct links to millions of external reference sources. A subsequent post will document the changes in version 1.60 in detail.

    With this open source release, we will next shift our attention to expanding the coverage of links to external sources. By moving to open source, we hope both to see problems with the structure surface and to receive contributions from others. When you pull back the curtain with open source, a premium gets placed on having clean assignments and structure that can stand up to inspection. Fortunately, Fred has designed a build system that starts with clean ‘triples’ input files. We make changes, re-run the structure against logic and consistency tests, fix the issues, and run again. We conducted tens of builds of the complete KBpedia structure in the transition from the prior versions to the current release. While we have a top-down design based on Peirce, we build the entire structure from the bottom up from these simple input specifications. The next phase in our KBpedia release plan is also to release these build routines as open source.
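The build routines themselves are not yet public, so the following Python sketch is only a guess at the general shape of the change, test, fix, and re-run loop described above. The one-triple-per-line input format, the predicate names `subClassOf` and `disjointWith`, and the two checks (subclass cycles and concepts that fall under two disjoint parents) are all assumptions chosen for illustration, not the actual KBpedia tests.

```python
def parse_triples(text):
    """Parse one whitespace-separated 'subject predicate object' triple per line."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            s, p, o = line.split()
            triples.append((s, p, o))
    return triples


def ancestors(concept, parents):
    """Return every concept reachable upward from concept via subclass links."""
    seen, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def check(triples):
    """Return a list of human-readable consistency issues (empty means clean)."""
    parents = {}
    for s, p, o in triples:
        if p == "subClassOf":
            parents.setdefault(s, set()).add(o)
    disjoint = [(s, o) for s, p, o in triples if p == "disjointWith"]
    issues = []
    for concept in sorted(parents):
        ups = ancestors(concept, parents)
        if concept in ups:
            issues.append(f"cycle through {concept}")
        for a, b in disjoint:
            if {a, b} <= ups | {concept}:
                issues.append(f"{concept} falls under disjoint {a} and {b}")
    return issues


# Hypothetical input file contents; the last line is a deliberate mistake.
FIRST_DRAFT = """
Camera subClassOf Device
Device subClassOf Artifact
Artifact disjointWith Organism
Spider subClassOf Organism
Spider subClassOf Device
"""

first_run = check(parse_triples(FIRST_DRAFT))
print(first_run)

# Fix the input (drop the bad assignment) and run the checks again.
FIXED = FIRST_DRAFT.replace("Spider subClassOf Device\n", "")
second_run = check(parse_triples(FIXED))
print(second_run)
```

Keeping the inputs as flat triples means every fix is a one-line edit to a text file, and the whole structure can be rebuilt and re-tested from scratch each time.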

    Though tremendous strides have been made in the past decade in leveraging knowledge bases for artificial intelligence, we are butting up against two limitations. Our first problem is that we are relying on knowledge sources like Wikipedia that were never designed for AI or data integration purposes. The second problem is that we do not have repeatable building blocks that can be extended to any domain or any enterprise. AI is sexy and attractive, but way too expensive. We hope the current open source release of KBpedia moves us closer to overcoming these problems.

    Downloads

    Here are the various KBpedia resources that you may download or use with attribution:

    • The complete KBpedia knowledge graph (7 MB, zipped). This download is likely your most useful starting point
    • KBpedia’s upper ontology, KKO (304 KB), which is easily inspected and navigated in an editor
    • The annotated KKO (291 KB). This is NOT an active ontology, but it has the upper concepts annotated to more clearly show the Peircean categories of Firstness (1ns), Secondness (2ns), and Thirdness (3ns)
    • The 68 individual KBpedia typologies in N3 format
    • The KBpedia mappings to the seven core knowledge bases and the additional extended knowledge bases in N3 format
    • A version of the full KBpedia knowledge graph extended with linkages to the external resources (8.7 MB, zipped), and
    • A version of the full KBpedia knowledge graph extended with inferences and linkages (11.6 MB, zipped).

    The last two resources require time and sufficient memory to load. We invite and welcome contributions or commentary on any of these resources.

    All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. KBpedia’s development to date has been sponsored by Cognonto Corporation.
