A secret of the semantic Web community is that energy, innovation and participation have slipped over, say, the past three or four years. This has been obvious for some time. I began collecting statistics on such things as prevalence in Google searches, attendance at SemTech or xSWC meetings, postings to user groups, blog postings, heck, even stupid and lengthy controversies on the mailing lists, or the sale and then sale and then sale of SemTech itself.
Fortunately, I realized that my observation of a decline did not depend on having documentary backup: the trend was obvious. So, I could stop collecting time-sucking statistics. I’m sure many of the participants in the formation of the semWeb know exactly the decline in energy and focus of which I speak.
Other endeavors have kept me from worrying too much about such matters, but recent griping in public forums about the state of the semantic Web got me again thinking about premises and the state of semantic technologies. Such re-thinks are useful because they help put current circumstances into context, and because they help guide how to spot emerging opportunities.
While I am not feeling overwhelmingly passionate about such matters, there does appear to be a villain in this story, what I might term the FYN crowd. But, like all good villains and stories, villainy is mostly a matter of context, with the winners being the ones writing the history. So, accept my thoughts as arising as much from my own worldview as from anything else . . . .

Galileo’s Balls
Once one embraces an intellectual domain with the premise of semantics, then meaning and context a priori become first-class citizens. Depending on viewpoint, what the semantic Web means to one individual can differ substantially from what it means to another. Moreover, the space becomes a sort of cipher for expressing any worldview, legitimately. For example, one tension at the heart of the semantic Web enterprise has been bottom-up v. top-down; another has been anything goes v. more structure and formalism. Hot buttons arise when worldviews differ, as they always surely do. The semantic Web is no exception.
Yet the stated bases for these semantic Web hot buttons, I would claim, are simplistic. What really occurs in the semantic technology space is something more akin to the Galileo thermometer, multiple viewpoints finding multiple resting points. Only in the semantic Web case, the natural resting points don’t just simply occur along a single dimension of, say, formalism, but other viewpoints as well. So, what we end up with is something more akin to a 3D- or multi-dimensional column. There are an infinite number of resting points in reasoned discourse.
Why should this be strange or threatening? Of course, upon inspection, it is not. The understanding that needs to arise is that semantics is truly about differences at all levels of human experience, perceptions and language. A pragmatic semantics must reflect this reality.
I don’t think that these sentiments will ever translate into precision or algorithms. But they can be modeled approximately with algorithms and refined with judgment. Much of their essence can also be captured by ontologies. These are viewpoints that can be captured in silico and used to help humans make better decisions. Semantics are essential to these prospects. At the heart of any pragmatic semantics must be an accommodation of viewpoints and terminology.
The real point in all of this — actually, also the major reason for semantic technologies in the first place — is that for any topic of normal human discourse there is a variety of viewpoints. Only a system expressly designed to respect these differences can be an effective digital means of interoperability.

Tribal Diversities
There are many tribes within the semantic technology space. Academic researchers are the most visible tribe. Because of funding nuances and general interest and tradition (though there are real differences between the US, Queen’s countries, EU or Asia), academics have — and sometimes continue to — set the tone for the semantic Web community. This has been useful to establish a coherent and (generally) logical basis to the underpinnings of the semantic Web. But most in the community would also acknowledge this basis is not sufficient to achieve commercial breakthrough.
In the US, there is a strange mix, with many semantic researchers flying below the radar, because they work for the three-letter intelligence agencies. Also, there is a very strong biomedical community, often funded from the National Library of Medicine. The biomedical community has been an exemplar innovator. Because of this community’s efforts, we now can see how an entire domain — biomedical — can develop and leverage ontologies, establish common vocabularies or standards, or cooperate on tools development. There is no public community more advanced in semantic technology developments than the biomedical one.
Another tribe in this space is the successful hunter, able to use semantic technology capabilities to attract and secure paying customers. Most of the activities of these tribe members are hidden from view, because their paying efforts are by nature infrastructural and concentrated on enterprise and commercial customers. But, also, many individuals within this tribe actively contribute to public efforts and conferences. Many of the more visible semantic technology companies, including my own, occupy this space.
But the most enriched tribe of the semantic Web has been the background semantic orchestrator, generally through infrastructure-based initiatives like broadscale knowledge representation, statistical analysis of massive text corpora, well-considered ontologies, or knowledge structures. The semantic efforts of the search engine vendors, including Bing and Google’s knowledge graph, are members of this tribe, as is Siri, now part of Apple.
These differences in market focus and visibility have tended to play out in expected ways. Academic researchers, Web enthusiasts and those committed to open data have been most vocal about “linked data”. They tend to be the more visible participants in semantic Web mailing lists and forums. Casual followers of the semantic technology space, or those new to it, mostly hear these same voices. By default, the apparent health and status of the semantic Web is more-or-less defined by these voices.
When I said in the intro that the semantic Web has slipped over the past few years, that perception is mostly the result of the lowered volume and fewer messages coming from the vocal tribe. But there are two problems with the accuracy of that perception. The first, as argued above, is that the vocal and visible linked data advocates are not the only representatives of the community. And, the second, which I’ll get to in a moment, is that the vocal community’s prescriptions for the semantic Web, in my opinion, are no longer the most meaningful ones.

Branding, Terminology and Marketing Messages
Many early proponents of the semantic Web, I think it fair to observe, would say that two positioning mistakes (from their perspective) have kept the paradigm from grabbing greater hold. The first reason often cited is the use of XML as the initial syntax of RDF. At first blush, I agree with this observation, given that when I was first entering into the dark chambers of the semantic Web it was at times difficult to separate XML from RDF. Today, though, most semWeb practitioners prefer the use of alternative serializations. I personally don’t think that any difficulties that semantic Web understanding and adoption may pose today are any longer influenced by a decade-old XML confusion. In Web years, these are eons.
The second reason seems to have been the flat-out retreat from “semantic Web” terminology. The conscious decision to switch to the “linked data” branding began in earnest about 2008. I find this shift interesting. I think it relates to looking to the wrong measures of success. What seemed like a clever re-branding at that time has both set the focus in the wrong direction and consequently set the wrong targets for measuring success.
In the areas of standards and movements, moral authority, suasion and prominence often become the bases for who is viewed as “owning” a new concept. There has been much of this posturing around the “semantic Web” and “linked data”, with parry thrusts from “Web 3.0” and “big data” and “open this or that”. So, I’m not surprised that branding many of the concepts of the semantic Web with a new term — “linked data” — was pushed and took hold. But why original semantic Web advocates adopted this term and its shift in focus from an ecosystem to data representation and exchange does surprise me.
The strange thing, in my opinion, is the monadic emphasis on “linked data” that acts to partially kill the semantic Web minding. Whether by design or fallout, “linked data” inexorably shifts the focus to how data is represented and transmitted. It is a royal pain in the ass for publishers to publish “linked data” and then, when done, there is surprisingly little consumption of it. The MusicBrainz announcement last week that it was dropping RDFa is telling. We are seeing the representation of structured Web data being driven on other bases, as evidenced by the success of JSON, something that linked data enthusiasts have only lately come to embrace, and the schema.org initiative of the major search engines.
Once linked data was raised as the lead banner, other branding messages followed. The first add-on message was “follow-your-nose”. FYN represents clicking from link to link following data references of interest on the Web. In order for that to be facilitated, but also as a means to clear up some confusions about linked data, the quality standard of “5-star linked data” was also put forth. To achieve all five stars, linked data should conform to open standards such as RDF and link to other data for context.
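To make the FYN idea concrete, here is a minimal sketch of following links from one data reference to the next. It is only an illustration under stated assumptions: the `FAKE_WEB` dictionary stands in for the real Web, every URI in it is made up, and `dereference` and `follow_your_nose` are hypothetical helper names, not part of any linked data library.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Web: each URI "dereferences" to a
# list of (predicate, object) statements. All URIs and data are made up.
FAKE_WEB = {
    "http://example.org/artist/1": [
        ("name", "Example Artist"),
        ("made", "http://example.org/album/7"),
    ],
    "http://example.org/album/7": [
        ("title", "Side B"),
        ("label", "http://example.org/label/3"),
    ],
    "http://example.org/label/3": [("name", "Example Records")],
}


def dereference(uri):
    """Return the statements published at a URI (empty if nothing is there)."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start_uri, max_hops=2):
    """Breadth-first FYN: fetch a URI, then fetch the URIs it mentions."""
    seen = {start_uri}
    queue = deque([(start_uri, 0)])
    collected = []
    while queue:
        uri, hops = queue.popleft()
        for predicate, obj in dereference(uri):
            collected.append((uri, predicate, obj))
            # Assumption: any object that looks like an http URI is a link.
            is_link = isinstance(obj, str) and obj.startswith("http://")
            if is_link and obj not in seen and hops < max_hops:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return collected


triples = follow_your_nose("http://example.org/artist/1", max_hops=1)
```

In this toy setting each extra hop pulls in whatever the linked source chooses to publish, which is exactly why the trustworthiness and quality of each linked source matters so much to a consumer.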
Today, on virtually all “official” semantic Web forums you will see mention of the brands of linked data, FYN, 5-star linked data, and open data. Publishing of data according to best practices that enables global links from datum to datum across the Giant Global Graph has become the sort of gold standard associated with this new branding.

What is the Measure of Success?
Success is always measured against our premises and values. In the case of the vocal tribe, the premises and values relate to linked and open data. By these measures, the semantic Web is a mixed bag. On the positive front, many laudable sources of quality data — most recently the Getty Museum, but also the Library of Congress and arts and humanities publishers across Europe, but also including many science realms beyond biology, and of course hundreds of others made famous by the LOD cloud — are published as linked data, or are in the process of being so. Open data sets are coming from government at all levels.
On the negative front, the growth of published linked data has fallen behind the pace of publishing structured data in general, and notable evidence for where the consumption of linked data has made a difference is pretty hard to find. Linked data advocates only rarely discuss integration with “closed”, proprietary data or enterprise use, integration and realities. Shitty sameAs assertions abound everywhere. Markets find it hard to get excited when the arguments and reference frameworks don’t relate well to their actual problems and pain points. DBpedia can only go so far, and a mountain of links to it without relevance, context or quality is just so much more noise.
The point here is not to mount a screed against linked data, but to caution: Be careful how you brand yourself. By the measures of growth and penetration and uptake of linked data, moreover linked open data, the semantic Web space is generally not attracting developer interest, media attention or venture dollars. I hope the release of meaningful linked data continues, but setting that goal as the measure of the semantic Web’s success is selling the wrong product.
Rather than setting a FYN objective as to whether our semantic technology efforts to date have been a success, I suggest we adopt a “follow the money” (FT$) premise. Who is investing or making money off of this stuff, and how and why? Herein lies a different measure of success.
If we look to the approaches taken by those making money in this market, we find that the:
are where the bucks are being made. These activities are all at the heart of the knowledge worker’s job responsibilities. Even the earliest advocates of the semantic Web must have had aspirations that the semantic Web had the promise to address these meaningful challenges.
Another secret to systems like Freebase, Google knowledge graph, Bing, Watson, Siri, or similar innovations is their use and reliance on Wikipedia, at least in their formative stages. Though often DBpedia was the structural form of ingest, the core basis of these systems’ capabilities comes from content — Wikipedia — the access to which was only made easier via DBpedia.
The sentiment to follow the money is not a sell out or a political statement. It is a recognition that work worth doing is work others appreciate and are willing to pay for. It is the best signal amidst the noise of what is valuable to work on.

It’s Time for the Side B Hit
I’ve been a fairly active participant in the semantic Web for nearly 10 years. I sometimes have the image of an aspiring music artist from the ’50s or ’60s arguing with the record execs which song should be the favored Side A cut on the 45. The visible voices of the semantic Web want to push FYN and linked data as Side A, but it really isn’t selling, according to the advocates’ own success measures.
The Side B of interoperability, RDF and OWL is not just “filler” to the main promotion, but where I clearly think the hit resides. Some have heard that track, bought it, and are enthused about it. It would be nice if the record execs could see what is right before their face and begin promoting it as well.
FYN and its vocal proponents risk the perception of failure of the semantic Web enterprise from the simple fact of putting linked data front and center. Sure, it is a good approach with potentially rich information so long as you can trust the source both for the content itself and the quality of its RDF expression. No one is arguing with that.
But SGML and ASN.1, one could argue, in similar veins, amongst actually dozens of others, were great and useful notations, yet are now mostly historical footnotes. If a trusted source is going to serve me up 5-star linked data, I will take it. Yet the truth is I would take structured data in any form from a trusted source, but take no linked data from an unknown source or one with poor linkages. We spend much time looking at these issues for our clients, and it is the rare linked data set that becomes part of our solution. Even then, we carefully scrutinize all assumed connections.
The Side B semantic Web of vetted and interlinked, interoperable data organized by competent graphs is the winning side. It is the only location where true economic transactions are taking place around the semantic Web. To understand where the semantic Web makes sense, follow the Side B money to your answers.
The insight gained from a FT$ approach clearly points to the failure of FYN. I say, do linked data if you can, it is the best ingest format around. But don’t get too hung up on that. Spend your time figuring out how to bridge meaningful gaps in semantics or data across any enterprise, global or local. Information is not truffles, and following your nose is not the primary argument for the semantic Web. FT$ trumps FYN.

FYN, or Follow Your Nose, is the general practice of performing web retrieval on URIs in a knowledge base to obtain more knowledge. Two W3C articles provide additional commentary. In the linked data context, FYN represents clicking from link to link following data references of interest. FYN is a specific pattern of linked data. Ed Summers provided one of the better overviews of the use of FYN in the context of linked data and the Web of Data.
See the MusicBrainz blog from February 18, 2014.
Tim Berners-Lee describes 5-star linked open data in this article.
The Getty Museum recently made a portion of its Art & Architecture Thesaurus (AAT) open source using linked data; see http://blogs.getty.edu/iris/art-architecture-thesaurus-now-available-as-linked-open-data/.
The linked open data (LOD) cloud diagram and supporting information is maintained at http://lod-cloud.net/.
I have often written on the problems with linked and open data as presently practiced. See Practical P-P-P-Problems with Linked Data (October 4, 2010) and The Nature of Connectedness on the Web (November 22, 2010) as two examples.
Specific commentary on open data in government is provided in When Linked Data Rules Fail (November 16, 2009).
For another assessment of the state of the semantic Web, see Brian Sletten’s recent Keep On Keeping On article on semanticweb.com (January 13, 2014).
Structured Dynamics (SD) announced yesterday that, in association with its partner Buzzr, it was spinning off a new software company, Civic Dynamics Inc., headquartered in Québec City, Canada. Included in the launch was the introduction of the new company’s Civic Dynamics Platform. CDP is open-source software and supporting systems to assist municipalities to publish dynamic open government data, and to provide citizens a set of tools for viewing, searching, filtering and analyzing that data.
The announcements of those releases stand on their own. My purpose is not to duplicate them. Rather, now that the efforts needed for the new launch are behind us, I wanted to reflect on why and how such a spin-off occurred in the first place. I think these reflections offer some insight into imperatives that face new software ventures, especially those geared to enterprise IT.

A Bit of History
It was just about five years ago that Fred Giasson and I began Structured Dynamics. (This was also after a year working together at Zitgist under the sponsorship of OpenLink Software.) Our mission at SD’s inception was to create a workable platform for bringing semantic technology capabilities to enterprises. Our specific interest was in using semantic technologies and RDF to solve the decades-old challenge of information interoperability in larger organizations. By serendipity, we were able to secure an enterprise client on virtually the first day we started SD. That forced us to grapple immediately with the then-current woeful state of semantic technologies for enterprises.
We observed a number of problems at that time. Here is a short list of some of those problems from five years ago, and brief statements of what we initiated to address them:
What we did have five years ago was a growing list of (often) unproven open standards (principally those from the W3C) and a large roster of prototype and research tools, most from the academic community. Still, there were some proven engines suitable to a semantic stack (most adopted as core to the Open Semantic Framework), so there were building blocks upon which a complete framework could be based. With the right design and architecture, and appropriate “glue” to tie it all together, it appeared quite feasible to create a working semantic stack suitable for enterprise use. Multi-component, open-source packages — ranging from Alfresco to Talend or Pentaho — were showing the path to such next-generation platforms.
With the development model of an integrated semantic technology stack based on open source components and consistent “glue” in mind, we could then turn our attention to the business model and strategy behind the nascent Open Semantic Framework.

The Business Philosophy
I don’t speak much about my prior ventures because, well, they are in the past. But I have financed ventures via angel funding, venture capital, grants and client revenues. I also have background in ventures ranging across many aspects of enterprise (mostly) and consumer (less so) software.
Our funding prejudice in starting SD was to be self-financed via clients. A customer focus keeps one from getting too abstract or falling in love with innovations for which there may not be a real market. Revenue financing also means that we need not alter business strategy or approach based on a financier’s perspective. Customers call the shots; not the money interests. This funding prejudice has kept us market focused and, as a consequence, profitable since day one.
Our staffing prejudice was to not hire, at least during the framework development phase. Setting the vision for a framework is not a democratic activity, and every hire means less development productivity. To fulfill our work, we have partnered and employed consultants and sub-contractors, but have not diluted our own efforts by managing employees. We could stay focused by feeding only our own mouths and our vision.
Such narrow bandwidth also carries other implications. We could not take on too many clients at a given time. We needed to be extremely productive and leveraged, finding opportunities wherever we could to re-purpose prior writings or to reuse or generalize code. We also needed to be quite selective in what projects and what clients we chose. When attempting to make progress on a new platform, it is important to not become simply a contract fulfillment shop. Customers have many options for IT contracting or outsourcing; platform development and growth requires a certain self-selection by clients.
Our standard contract emphasizes that (most) efforts are intended to be open source, and our intellectual property clauses make that explicit. At first we did not know how the market might react to this insistence. For prospects serious enough to commit monies to us, however, we have found a good appreciation that open source leads to lower current project costs because the client is leveraging what has already been developed before. It seems only fair that new developments should also be made available to later customers, as well. Some of our prior clients are now seeing the lower costs and benefits by leveraging intermediate work in upgrading to latest versions and functionality.
Our fulfillment prejudice has been to complete work on time and under budget, document and train the customer in the work, and move on. Though we know they are profitable and a bread-and-butter for most enterprise vendors, we have not sought recurring annuities from our clients in maintenance fees. By keeping our eye squarely on successful tech transfer, we are disciplining ourselves to document as we go, provide tooling and support infrastructure as well as application software, and to find efficiencies in fulfillment. Meanwhile, we are able to progress rapidly on our overall development roadmap without getting bogged down in handholding. We would rather teach the customer how to fish than do the fishing for them.
Of course, not all enterprises understand or embrace these philosophies. That is fine under our development approach where market understanding and refinements are the drivers of decisions, not maximizing revenues for an increasingly growing staff count. We have been blessed to have new clients arise whenever they are needed, and to be real partners with us in furthering the vision. We have actively rejected some customer prospects because the philosophical fit was not good. We have also actively weaned ourselves from some engagements by insisting on sunsets for our support and encouraging more tech transfer and training.
These prejudices may change as we see the underlying Open Semantic Framework nearing fulfillment of its development vision. But, for an open source platform in a hurry (even considering it has been five years!), we believe these philosophies have served us and our clients well.

An Emphasis on the Open Semantic Framework
The net outcome for the Open Semantic Framework has been to emphasize a generic, enterprise-ready design that can be rapidly embraced and adopted by multiple markets. We have called OSF a platform of ontology-driven applications. ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. ODapps fulfill specific generic tasks. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation, user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.
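A toy sketch may help show what "ontology-driven" means in practice: one generic piece of code whose behavior changes only because the specification fed to it changes. This is not OSF code or its API. The `ONTOLOGY` dictionary is a hypothetical, heavily simplified stand-in for a real ontology, and `WIDGETS` and `render_record` are illustrative names only.

```python
# Hypothetical, highly simplified "ontology": plain dicts standing in for the
# class and property specifications that a real ontology would carry.
ONTOLOGY = {
    "Neighbourhood": {
        "label": "Neighbourhood",
        "properties": [
            {"name": "population", "label": "Population", "widget": "number"},
            {"name": "area_km2", "label": "Area (km2)", "widget": "number"},
        ],
    },
    "Park": {
        "label": "Park",
        "properties": [
            {"name": "has_playground", "label": "Playground", "widget": "flag"},
        ],
    },
}

# Generic display components; the ontology decides which one a property uses.
WIDGETS = {
    "number": lambda value: f"{value:,}",
    "flag": lambda value: "yes" if value else "no",
}


def render_record(record, ontology=ONTOLOGY):
    """Render any record using only what the ontology says about its type."""
    spec = ontology[record["type"]]
    lines = [f"{spec['label']}: {record['name']}"]
    for prop in spec["properties"]:
        if prop["name"] in record:
            formatted = WIDGETS[prop["widget"]](record[prop["name"]])
            lines.append(f"  {prop['label']}: {formatted}")
    return lines


park_view = render_record(
    {"type": "Park", "name": "Central", "has_playground": True}
)
hood_view = render_record(
    {"type": "Neighbourhood", "name": "Riverside", "population": 12500}
)
```

Supporting a new kind of record here means adding an entry to the specification, not writing a new application, which is the design choice the ODapp idea rests on.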
The ODapp vision underlying the design of OSF means we can leverage an architecture of generic tools to respond to virtually any knowledge application or any enterprise domain. The basic idea is shown by this diagram, which we first published about three years ago:
[Diagram: the ODapp architecture of generic tools driven by ontologies]
In the five years of development of OSF, now at version 3.x (recently announced), we have had the good fortune to have clients and uses in publishing, tech transfer of R&D, group collaboration, health, automotive, air traffic control, sustainability, community indicators and local government. Demand in the latter two areas has been particularly strong. The strength of that market interest was the source of the dilemma for Structured Dynamics.

Unique Demands of Municipal Markets
The idea of rapid and nimble development of a new platform — especially one expressly designed to be generic across multiple domains — does not readily square with focusing on a specific market segment. This disconnect is particularly true for quite unique markets, as is the case for local governments.
In a past life I spent nearly ten years working for a trade association that represents municipally-owned electric utilities. APPA has members ranging from huge municipalities such as Los Angeles, Toronto, Seattle and San Antonio, to the smallest towns and burgs of the plains of North America. In my former role running the R&D and technical programs for this association, I personally interacted with hundreds of these wide-ranging individual communities.
In the larger communities, the electric utilities were separate departments from the local government per se, and were directed by professional utility managers. But for mid-size and smaller communities, there was often close interaction with all municipal departments.
Though sales lead times are long for all enterprise markets, they are particularly long (and often political) in government. Budgets are perennially tight. Budgets need to be proposed, argued with councils and management, and approved before work can begin. Staff are stretched across multiple functions, so use and maintenance are key factors as are concerns about longer-term support contracts. Portals and Web sites must serve all constituencies and content and tone need to be suitable to taxpayer-supported venues. Yet, because of the number and diversity of communities, across the entire market there is surprising innovation and experimentation. Finding better ways to do more with less is a key motivator in the local government market.
Specifically, in our own use of OSF in this market, we also observed some other unique aspects related to open data and Web sites. What constitutes open data and whether and how to make it “open” varies widely by community. Capturing local needs and perspectives often leads to comparatively high costs in theming and customizing the Web sites. The lack of dedicated and trained staff to care for and feed a new Web site is always a challenge.
Structured Dynamics, with its generic platform interests and avoidance of staffing, is clearly not the right vehicle to pursue this market. Specific focus on the unique aspects of the local government market is required, plus modifications and specializations of the platform to address government needs. Possible integration or incorporation of standard local government Web site(s) may also be required. Though we were seeing keen interest from this market, in order to address it properly a different vehicle with different venture imperatives was necessary.

Doing Justice to the Local Government Market
Early on, our good colleague and friend, Steve Ardire, helped point out some gaps in our business development. We saw that three things were missing within Structured Dynamics itself to do the local government and open government data markets justice. First, we needed a dedicated company to focus solely on this market. Second, we needed an executive familiar with the OSF platform and municipal government to head the effort. And, third, we somehow needed a way to overcome the time and costs associated with tailoring the portal for local community needs.
It was actually the last of these things that showed the first solution. We were approached about eighteen months ago by Ed Sussman, the CEO of Buzzr, about possibilities of partnering for the local government market. Buzzr has a one-click solution to theming and customizing individual Web portals, built around the Drupal content management system (CMS). Buzzr, a NYC-based company, has impeccable Drupal chops, having been co-founded by one of the leading Drupal shops, Lullabot. Buzzr has proven the applicability of their approach to specific verticals, including retail and education. The fact that Buzzr found us and saw a good fit for the municipal market was a formative discussion. We welcomed Buzzr’s outreach because their approach squarely addresses one of the cost and effort sticking points we were observing.
When Ed first contacted us, the OSF platform was still not sufficiently mature to be a market foundation. We needed more time to refine the platform, as well as to gain more market insight from use and use cases. Fortunately, Ed and Buzzr kept their interest strong while we refined things in the background. By the time we were able to address the other missing items, Buzzr was there to partner with us on the new venture.
Our second requirement was met by hiring Kelly Goldstrand, formerly the project manager for the NOW (Neighbourhoods of Winnipeg) portal, to head up the venture’s business development. NOW is one of the flagship installations of OSF. Her career focus has been on service planning, delivery and evaluation in the area of community health, protection and development. Kelly has significant management experience in local government and clearly understood OSF; her guidance had been pivotal in much of the system’s functionality. Kelly also has a proven track record in mentoring projects through local approvals and training city staff in use and maintenance of new technologies. After early retirement Kelly was ready to consider our opportunity and then graciously agreed to join us.
The last piece of the puzzle was forming the new venture. We had been working with the Civic Dynamics name for some time, and had also played around a bit with logo and Web site. Once the other things fell into place, we incorporated Civic Dynamics, Inc. in Québec (where it is also known as Dynamique Civique), given the strong market interest shown in Canada to date, and began preparing for the formal launch of the venture. We also needed to await the completion of OSF v 3.0.

A Report Card on SD’s Multi-year Plan
It now appears likely that the five-year plan we set for ourselves at the founding of Structured Dynamics may actually take six to seven years to achieve. This time extension derives from the realities of our client work over this time frame. One reality is that client-specific needs have caused us to necessarily divert from our own internal development path. Not all development can contribute to fulfilling a generic platform. Every client has unique needs and circumstances that are not generalizable to others. A second reality is that only through real client engagements can market requirements be truly discerned. Customer-centric development is absolutely essential to keep software grounded.

Meanwhile, Back at Civic Dynamics
We are as curious as the next person to see whether a dedicated spin-off is the right way to handle a specific vertical market. It will also be interesting to see how coordination and support can best be provided between the dynamics duo (Structured and Civics).
Nonetheless, we are excited about finally getting postured to pursue the growing market for open, local government data. We’d like to thank Kelly and Ed and all of our original sponsors for helping to gestate the venture to this point. Now that it has been birthed, we hope to nurture it and get it on its own two feet as soon as possible. Before we know it, and assuming we’ve raised it properly, Civic Dynamics will be celebrating its own life events!
See our Sweet Tools listing of about 1000 semantic technology tools.
There is a total of about 24,000 municipal governments across the United States and Canada.
It would be an understatement to say that open data has been transforming how government does business. Over the past five years — ranging from national governments such as the United States and the United Kingdom to hundreds of local governments and municipalities and all forms of government in between — a veritable revolution in opening up data to the public has been underway. The open data in government (OGD) movement has spawned an entirely new cottage industry in open data advocacy and tools. Literally hundreds of government organizations are committed to open data, supported by an ecosystem of advocacy, technology and consulting groups.
Open data, of course, is not limited to governments. Open data in science and from the Web and for-profit entities are legitimate focal points in their own right. But, because data generated by governments are both sanctioned and developed using taxpayer monies, open data in government (OGD) occupies a special place in the conversation. Now, with experience and practice, we are beginning to see a generational shift in how open data is being handled by governments. The first generation, still mostly the current practice, was built around the idea of just making the data public and open. This current generation of open data is characterized by the publishing of datasets via catalogs. The datasets are static, unconnected and dumb. Mostly, too, the data within those datasets are poorly described and documented, often lacking standard metadata. What is now exciting, however, is the emergence of what can best be called dynamic open data. What this is and how it offers advantages is the focus of this article.
The 8 Initial Principles of Open Government Data
In October 2007, 30 open government advocates met in Sebastopol, California to discuss how government could open up electronically-stored government data for public use. Up until that point, the federal and state governments had made some data available to the public, usually inconsistently and incompletely, which had whetted the advocates’ appetites for more and better data. The conference, led by Carl Malamud and Tim O’Reilly and funded by a grant from the Sunlight Foundation, resulted in eight principles that, if implemented, would empower the public’s use of government-held data. These principles, no longer online, were summarized by Joshua Tauberer in his Open Government Data book as:
These basic principles were then updated and re-phrased by the Sunlight Foundation in August 2010 to now number 10 principles, including the use of open standards, making data permanent, and keeping usage costs to an absolute minimum. All of these are laudable points. Each may or may not be provided in a fully open way by any given governmental entity.
This first step in the open data process has led to systems that are oriented to posting and publishing downloadable datasets. Existing open government data platforms, for example, such as Socrata or DKAN, can best be described as catalog systems. Listings of datasets with associated descriptions and metadata are presented. Users or the public may then choose among one or more machine-readable formats to download the entire dataset.
The 5 Added Principles of Dynamic Open Data
Of course, simply throwing data over the fence does not make it useful. Once we can get past the first threshold of making data publicly accessible, we next face the challenge of making that data meaningful and relevant. Since relevance is in the eye of the user, we no longer can think about information solely in terms of static, dumb datasets. We now need to expose the underlying data dynamically, such that users may request and filter and correlate what they need and only what they need.
Thus, there are five principles — or dimensions — by which we need to judge next-generation dynamic open data:
There is no set order to the principles above. They are presented in the order shown so as to help remember them through the FACED mnemonic.
Parallels with Linked Data
Though the principles above do not call out linked data as a requirement, they do share many parallels with the early growth and maturation of linked data. A number of years back Fred Giasson and I commented on When Linked Data Rules Fail. Two of the points made in that article are the absence of suitable data descriptions and missing or wrong connections in data.gov and the NY Times datasets. I subsequently expanded on these types of problems in Practical P-P-P-Problems with Linked Data.
Official data from governments can avoid many of the provenance issues associated with general linked data, but in other areas there are important parallels. Like any emerging new practice, it takes a while to learn and formalize best practices. It is not surprising that we are seeing open data in government needing to transition from dumb datasets to actionable information. Making data actionable is when government information assets will finally become effective for the broader public.
Also, as with linked data, it is likely that platforms built around semantic technologies and knowledge graphs (schema) will come to the fore. Our own Open Semantic Framework is one such example, but there are a few now emerging in the linked data and semantic technology space. It will be through different practices and these newer platforms that we will see the next generation of open government data truly emerge.
After nearly five years of concentrated development — including the past 20 months of quiet, background efforts — Structured Dynamics is proud to announce version 3.0 of its open-source Open Semantic Framework. OSF is a turnkey platform targeted to enterprises to bring interoperability to their information assets, achieved via a layered architecture of semantic technologies. OSF can integrate information from documents to Web pages and standard databases. Its broad functions range from information ingest and tagging to search and data management to publishing.
Until today, the version available for download was OSF version 1.x. While capable as an enterprise platform (indeed, it has been in use by a number of leading global enterprises since development first began), the capability of the platform was spotty and required consulting expertise to configure and set up. SD was hired by Healthdirect Australia (HDA) nearly two years ago to enhance OSF’s capabilities and integrate it more closely with the Drupal open-source content management system, among other modern enterprise requirements. The OSF from those developments (the non-public version 2.0 specific to HDA) has now been generalized for broader public use with today’s public announcement of version 3.0.
A More Complete Enterprise Platform
Not unlike many large organizations, HDA had specific enterprise requirements when it began its recent initiative. Included in these were stringent security, broad use of proven open-source applications, governance and workflow procedures, and strict content authoring and management guidelines. These requirements further needed to express themselves via a sequence of deployment and testing environments, all conducted by a multi-vendor support group following agile development practices.
These requirements placed a premium on performance, scalability and interoperability, all subject to repeatable release procedures and scripts. OSF’s initial development as a more-or-less standalone platform needed to accommodate an enterprise-wide management model involving many players, environments and applications. Prior decisions based on OSF alone now needed to consider and bridge modern enterprise development and deployment practices.
Tighter integration with Drupal was one of these requirements (see next section), but other OSF changes necessary to accommodate this environment included:
When Fred Giasson and I first designed and architected the Open Semantic Framework in 2009, we made the conscious decision to loosely couple OSF with the initial user interface and content management system, Drupal. We did so thinking that perhaps other CMS frameworks would be cloned onto OSF over time.
Time has not proven this assumption correct. Client experience and HDA’s interests suggested the wisdom of a tighter coupling to Drupal. This shift arose because of the great flexibility of Drupal with its tens of thousands of add-on modules and its ecosystem of capable developers and designers. Our early decision to keep Drupal at arm’s length was making it more difficult to manage an OSF instance. Existing Drupal developers were not able to employ their Drupal expertise to manage OSF portals.
We pivoted on this error by tightening the coupling to Drupal, which involved a number of discrete activities:
Some of the extended capabilities in OSF v 3.0 are noted above, including the expanded roster of Web services. However, the OSF Search Web service, which is by far the most used OSF endpoint, received massive improvements in this latest release.
First, OSF Search now uses a new query parser, which provides the capability to change the ranking of search results by boosting how specific query components get scored. Types, attributes, datasets or counts may be used to vary any given search result, including different occurrences on the same page. It is also now possible to add restrictions to the search queries, including restricting results to a specified set of attributes.
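To make the idea of boosting concrete, here is a minimal sketch in Python. It is not the OSF Search API: the Record class, the score and search functions, and every parameter name are hypothetical, chosen only to illustrate how boosts on attributes, types and datasets, plus an attribute restriction, can re-rank the same set of matches.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A toy indexed record (hypothetical; not an OSF structure)."""
    uri: str
    type: str
    dataset: str
    attributes: dict  # attribute name -> text value


def score(record, term, attribute_boosts=None, type_boosts=None,
          dataset_boosts=None, restrict_attributes=None):
    """Score one record for a single search term; 0.0 means no match."""
    attribute_boosts = attribute_boosts or {}
    type_boosts = type_boosts or {}
    dataset_boosts = dataset_boosts or {}
    total = 0.0
    for name, value in record.attributes.items():
        # Skip attributes outside the restriction set, if one was given.
        if restrict_attributes is not None and name not in restrict_attributes:
            continue
        if term.lower() in value.lower():
            # Each matching attribute contributes 1.0 times its boost.
            total += attribute_boosts.get(name, 1.0)
    # Type and dataset boosts scale the whole record score.
    total *= type_boosts.get(record.type, 1.0)
    total *= dataset_boosts.get(record.dataset, 1.0)
    return total


def search(records, term, **options):
    """Return (uri, score) pairs for matching records, best first."""
    scored = [(r.uri, score(r, term, **options)) for r in records]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


records = [
    Record("ex:1", "Event", "calendar",
           {"title": "Cancer screening day", "description": "Free clinic"}),
    Record("ex:2", "Topic", "library",
           {"title": "Nutrition", "description": "Diet and cancer risk"}),
]

plain = search(records, "cancer")
topics_first = search(records, "cancer", type_boosts={"Topic": 3.0})
titles_only = search(records, "cancer", restrict_attributes={"title"})
```

With no boosts, both toy records score equally. Boosting the Topic type moves the second record to the top, and restricting the search to the title attribute drops it from the results altogether.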
This flexibility is highly useful where certain structured pages contain blocks or sections with patterned search results. This structuring leads to the ability to create generic page templates, in which search queries and results vary within the layout. An “events” block may score differently than, say, a “related topics” block, all of which in turn can respond to a given context (say, “cancer” versus “automobiles”) for a given page (and its template).
These repeated patterns lend themselves to the use of reusable “search profiles,” which are predefined queries that may include context variables. These profiles, in turn, can be named and placed on page layouts. Existing profiles may be recalled or invoked to become patterns for still further profiles. The flexibility of these search profiles is immense, and the parameters used in constructing them can be quite extensive.
Thus, OSF version 3.0 includes the new Query Builder module. Via an intuitive selection interface, users may construct search queries of any complexity, and then save and reuse them later as search profiles.
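The notion of a search profile can be sketched in a few lines of Python. This is an illustration only, not how OSF or the Query Builder module stores profiles: the PROFILES registry, the register_profile and resolve_profile functions, and the parameter names (query, types, boost, items) are all assumptions. The sketch shows the two properties described above: a profile is a named, predefined query with context variables, and an existing profile can serve as the pattern for a further one.

```python
from string import Template

# Hypothetical in-memory registry: profile name -> parameter templates.
PROFILES = {}


def register_profile(name, params, base=None):
    """Save a named profile, optionally starting from an existing one."""
    merged = dict(PROFILES[base]) if base else {}
    merged.update(params)  # new parameters override inherited ones
    PROFILES[name] = merged


def resolve_profile(name, context):
    """Fill a profile's $variables from the page context."""
    return {key: Template(value).substitute(context)
            for key, value in PROFILES[name].items()}


# An "events" block whose query follows the page's topic.
register_profile("events-block",
                 {"query": "$topic", "types": "Event", "boost": "title^2"})

# A "related topics" block patterned on the events block.
register_profile("related-topics-block",
                 {"types": "Topic", "items": "5"},
                 base="events-block")

cancer_page = resolve_profile("related-topics-block", {"topic": "cancer"})
auto_page = resolve_profile("events-block", {"topic": "automobiles"})
```

The same two profiles can be placed on any page template; only the context passed in at render time changes.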
Lastly, registering, configuring and managing OSF instances and datasets in Drupal has never been easier. The new OSF Configure module centralizes all the features and options required for these purposes, which are then managed by a new suite of tools (see next).
Automated Installation and Management Tools
Standard enterprise deployments that proceed from development to production require constant updates and versions, both in application code and content. Keeping track of and managing these changes, let alone deploying them quickly and without error, requires separate management capabilities in their own right. The new OSF thus has a number of utilities and command-line tools to aid these requirements:
These advances were all made within the context of state-of-the-art enterprise IT management. Experience with supporting infrastructure tools (such as Jira, Confluence, Puppet, etc.) and agile development methods is part of the ongoing documentation of OSF (see next). This experience also bolsters Structured Dynamics’ ability to work with other third-party applications at the middleware layer or in support of enterprise deployments.
Comprehensive and Completely Updated Documentation
The Open Semantic Framework has evolved considerably since its conception now five years ago. In its early development, components and pieces were sometimes developed in isolation and then brought into the framework. This jagged development path led to a cacophony of names and terms to characterize portions of the OSF stack. This terminology confusion has made it more difficult than it needed to be to understand the vision of OSF, the layers of its architecture, or the interactions between its components and parts.
In making the substantial efforts to update documentation from OSF version 1.x to the current version 3.0, terminology was made consistent and code references were cleaned up to reflect the simpler OSF branding. This cleanup has led to necessary updates across multiple Web sites maintained by Structured Dynamics with some relationship to OSF.
The Web site with the most changes required has been the OSF Wiki. In its prior incarnation, called TechWiki, there were nearly 400 technical articles on OSF. That site has now been completely rewritten and re-organized. Nearly two hundred new articles have been written in support of OSF v 3.0. Terminology related to the older cacophony (see correspondence table here) has (hopefully) been updated and corrected. Most architectural and technical diagrams have been updated. Additional documentation is being posted daily, catching up with the experience of the past twenty months.
Moving Beyond the Established Foundation
SD is pleased that enterprise sponsors want to continue beyond the Open Semantic Framework’s present solid foundations. While we are not at liberty to discuss specific client initiatives, a number of ongoing developments can be described broadly. First, in terms of the key engines that provide the core of OSF’s data management capabilities, initiatives are underway in the areas of visualization, business analytics and workflow orchestration and management. There are also efforts underway in more automated means for direct ingest of quality Web-based information, both based on linked data and from Web APIs. We are pleased, too, that efforts to further extend OSF’s tight integration with Drupal are of interest, even while the integration efforts of the past months have not yet been fully exploited.
To Learn More
To learn more, be sure to check out the re-organized OSF wiki. See specifically the complete OSF overview, the list of all the OSF 3.0 features, and the list of all the new features to OSF 3.0. Also, for a complete soup-to-nuts view of what it takes to put up a new OSF installation, see the Users Guide. Lastly, for a broad overview of OSF, see its reference architecture and the overviews on its dedicated OSF Web site.
As a final note, Structured Dynamics would like to thank its corporate sponsors of the past five years for providing the development funds for OSF, and for agreeing with the open source purposes of the Open Semantic Framework.
The end of the year is always the silly season for technology pundits. To gain attention, it is often the “end of this”, the “death of that” or new paradigms or revolutions. Granted, it is hard to get attention when everyone is pontificating on this or that. But hype and hyperbole do not help users understand fundamental changes in the marketplace.
This year’s silliness award goes to Brian Proffitt of ReadWrite Web, who declared 2014 the death of the distinction between consumer and enterprise software, stating, “legacy enterprise vendors need to serve business and consumers alike, or risk becoming roadkill.” Balderdash. (And bunkum, BS and brimborion if one wants to be alliterative.)
PCs thirty years ago, local networks twenty years ago, the Web ten years ago, or cloud computing or smartphones today did not, and will not, “kill” enterprise software. Consumer applications and technology will continue to point the way to important new trends. But the fundamental distinctions of enterprise software will also live on. Let’s look at five of these distinctions.
1. The Buying Process
Consumer software is an individual purchase; enterprise software purchases are on behalf of a group. That means enterprise sales need to involve many more decisionmakers, some or all of whom may not be the actual users, as when IT is the de facto purchasing agent. Multiple perspectives need to be brought to bear on the enterprise acquisition. Often a single negative voice is sufficient to scuttle a sale. On the other hand, consumer software may be free, notably lower cost, or acquired on a whim.
Traditionally this has led to longer decision cycles and the need to employ dedicated reps for enterprise sales. Though SaaS (software as a service) or PaaS (platforms as a service) can lower initial acquisition costs and improve the fundamental business model, adoption of enterprise software still is a group decision in the enterprise. Enterprises well know that initial adoption carries longer-term costs in integration, interoperability, training and documentation. Software may be “legacy” in the enterprise because of these lifecycle realities and costs.
2. Enterprise “-ilities”
Enterprises, then, in representing the interests of groups or organizations, also have requirements that extend beyond what an individual consumer requires. Many of these correspond to the well-known “-ilities” — reliability, scalability, operability, interoperability, maintainability, and availability. An individual consumer is inconvenienced when there may be failures along any of these dimensions. The enterprise experiences costs, risks or lost opportunities when they occur. In other words: money.
These “-ilities” place a premium on testing and documentation, and often require a longer-term relationship with the software vendor or its representatives. Because of the financial impacts from failures in “-ilities” it is often necessary to have support agreements or contracts in place to insure against risks. The “-ilities” also place additional code and testing requirements upon the software.
3. Security
Though often lumped in with the “-ilities”, security is an additional enterprise requirement that warrants its own distinction. Whether profit or non-profit, all enterprises are unique, with potential proprietary information both internally and externally (with the public or possible competitors). Though individual consumers also have requirements for privacy and confidentiality, these information flows are strictly between the individual and outside entities. In an enterprise, access may occur among many internal individuals and all of their external contacts.
The nature of individual consumer security is more like a ring or protective shell. In enterprises, security must be built fully “into the cake”, capturing distinctions between applications, databases and datasets, and granting or withholding access and modification rights at every level. Like the other “-ilities,” the enterprise security requirement leads to a much different development and coding model than consumer software. And, frankly, it leads to higher development costs.
Initially, these hurdles were some of the causes for slower adoption of open source within enterprises. We are also learning better architectural designs and a reliance on APIs that are helping to fulfill these enterprise requirements at lower costs with greater sustainability. But the importance of security to the enterprise remains.
4. Governance
Security, the “-ilities”, and ongoing reliance on legacy enterprise systems also mean that repeatable workflows and governance need to be at the core of enterprise software use. Are things working well? Where are they breaking down? Need improvement? How can we incorporate a constant influx of new users? How can we manage actual costs and effectiveness?
Any enterprise that needs to maintain a competitive or sustainability edge must be able to address these questions. For software, this means versioning, documentation and training of same, and means to track use and misuse. (Not to mention the additional workflow software to manage these processes.) Every effective enterprise understands that what is not measured cannot be managed.
These training, versioning and logging requirements are essential to effective governance of software and the information upon which it operates. These requirements, too, are different than what an individual needs or wants. They, too, add to costs and requirements above normal consumer software demands.
5. Business Model
These enterprise distinctions help bound the kind of business models that may be applied to enterprise software. Enterprise software requirements are higher and more demanding (and take longer to bring to fruition) than consumer software requirements. Support, longevity, reputation and quality are important factors for software vendors to fulfill in order to overcome legitimate risk questions enterprises ask when contemplating a new (potential legacy) commitment to a new enterprise software adoption.
Fortunately, as systems have become more open with a new architectural model based on the distributed Web, many older enterprise hurdles can now be more readily overcome. These advances are unalloyed goodness. But enterprise imperatives still remain.
We can hope with less risky SaaS or PaaS that much can be done to reduce initial acquisition costs and risks. Open source software is also lowering the cost of initial enterprise software development by orders of magnitude. Nonetheless, higher costs with support commitments distinguish enterprise software business models from any of the consumer kind. I expect that fundamental distinction to remain.
Consumer Trends DO Affect the Enterprise
These five factors, or other splits that could be reasonably made, are not meant to deny the importance of consumer software. Merely, the point is that enterprise software has its own set of imperatives. Enterprise software is certainly more conservative and slower-paced for the exact reasons of its distinction from consumer software. Talk of convergence or the “consumerization” of enterprise software misses these distinctions and what will continue to be the fundamental differences between the two software categories.
Because of its lesser requirements, meaning in economic terms “lower barriers to entry,” we will also see that consumer software and its devices will be the lodestar for innovation. In my own thirty years in this space, we have seen consumer leadership in device form factors (PCs to smartphones), architecture (Web, APIs and distributed networks), user interfaces (browsers and HTML), data and data models (RDF, XML, JSON), programming languages (scripting, Ruby, Python), business models (open source, cloud computing), software models (apps, SaaS, PaaS), etc. Enterprise software is, by and large, a sink for consumer innovations, not a source.
But to be successful in the enterprise, those innovations must also meet more stringent requirements. And, some of those requirements, such as interoperability, are clearly driven more from the enterprise side of things.
Thus, silly talk about consumer versus enterprise markets, framed as either “death” or “convergence,” really misses the point. Ultimately, they are different markets with different imperatives. Yes, there is a synergy and natural relationship (after all, code and devices may be shared in either realm), but the roles and contributions of each differ. Though I don’t deny that some innovations may work equally well in either the consumer or enterprise markets, most innovation will occur in the consumer sector, while higher revenues and income are to be derived from the enterprise sector.
Today’s Enterprise Picture
Despite the silly punditry noted above, major industry analysts and the venture capital community are signalling a shift from the consumer to the enterprise market. Gartner, for example, sees a doubling in enterprise software growth to 5.8% in 2014 over other IT expenditures. CB Insights points to a dramatic shift in venture capital support for enterprise software versus consumer over the past two years:
In 2013, about 70% of VC software funding went to startups building tech for businesses. (Actually, the shift was much greater in that $450 million of the consumer total went to just two consumer companies, Uber and Pinterest.) VC funding for enterprise software has risen 65% in the past two years; meanwhile, funding of consumer software by VCs has dropped 60%.
Besides the crowded consumer space and perhaps steam being lost behind social networking, these trends suggest that consumer innovations of the past few years are now ripe for “enterprising” within the enterprise market. What can be taken from the consumer side now must be looked at for incorporation and adoption on the enterprise side. This is not the “consumerization” of the enterprise, but the “enterprising” of consumer innovations.
This distinction is important. Adoption of prior consumer innovations will not occur via osmosis (“consumerization”), but by purposeful re-packaging and modification of those innovations to meet enterprise requirements (“enterprising”). That is, those successful in leveraging consumer innovations into the enterprise will do so purposefully by adapting to enterprise imperatives. The target-rich environment of the next couple of years will be adopting prior consumer innovations to the enterprise.
For some recent further reading on this topic, see 5 Best Kept Secrets About Enterprise Software, The Difference Between Consumer and Enterprise Software, The Difference Between Enterprise and Consumer Software, Enterprise Web vs Consumer Web [2.0]: Top Six Differences, and Forget Virality, Selling Enterprise Software Is Still Old School.
My first enterprise software company from the early 1980s required more than $1 million in start-up software development funds; more recent experience has been on the order of $50,000 to $100,000.
Other recent commentaries on the shift to enterprise software include Enterprise Software Spending Will Set Pace of IT Spending in 2014, VCs Are Tripping Over Themselves To Fund Enterprise Startups, Research Shows, and As the Pendulum Swings—Back to Enterprise Software?
Opening image courtesy of Willow and Stone, UK.
The Peg project has just been moved from beta to public status by its two sponsors, United Way of Winnipeg and the International Institute for Sustainable Development (IISD). Peg is an innovative Web portal for community indicators of well-being for the city of Winnipeg, Manitoba. Peg helps identify and, on an ongoing basis, track indicators that relate to the economic, environmental, cultural and social well-being of the people of Winnipeg. Here is the main screen:
The Peg website (www.mypeg.ca) is a fully integrated, interactive and dynamic information portal. Peg is a robust knowledge system that includes a graphic user interface to display a range of community well-being and sustainability themes and the indicators used to track progress over time. Structured Dynamics was the lead technical contractor on the project, basing the site on our Open Semantic Framework (OSF).
We originally completed and delivered the site to the sponsors in beta with a single indicator cluster around the concept of poverty. Based on our training and the tools packaged with OSF, the sponsors were then able to gather and load further data in broader indicator areas including Basic Needs, the Built Environment, the Economy, Education & Learning, Governance, Health, Natural Environment and Social Vitality. The Web site design contractor, Tactica Interactive, was also able to extend the baseline visualization tools provided by OSF using the existing APIs and documentation. Here, for example, is a choropleth map created by Tactica for the site:
Datasets may be selected and compared with a variety of charting and mapping and visualization tools at the level of the entire city, neighborhoods or communities. All told, there are 54 datasets now within Peg representing more than 4,000 different entities. The sponsors collected, organized and inputted these data themselves according to specifications and tools provided with OSF.
Peg is a great example of how the basic OSF can be extended and maintained by site users. After initial delivery and training, Structured Dynamics played no role in the completion and publication of the site.
The Peg model shows how the combination of open source software, documentation and training enables any organization to deploy and manage their own semantic publishing system. Congratulations to all associated with Peg for this newest release!
Two and one-half years ago the triumvirate of Google, Bing and Yahoo! (soon to be joined by Yandex, the major Russian search engine) released schema.org. The purpose of schema.org is to bring a simple means for Web site owners and authors to tag their sites with a vocabulary, designed to be understandable by search engines, to describe common things and activities on the Web. Though informed and led by innovators with impeccable backgrounds in the early semantic Web and knowledge representation, the founders of schema.org also understood that the Web is a messy place with often wrong syntax and usage. Their stated commitment to simplicity and practicality caused me to state on the day of release that schema.org was “perhaps the most important event for the structured Web since RDF was released a dozen years ago.”
Just a week ago schema.org version 1.0e was released. That event, plus much else in recent months, is suggesting a real maturity and uptake of schema.org. It looks like the promise of schema.org is being fulfilled.
Growth and Impact of the schema
When first released, schema.org provided nearly 300 structured record types that may be used to tag information in Web pages. Via various collaborative processes since, and with an active discussion group, the schema.org vocabulary has about doubled in size. Some key areas of expansion have been in describing various actions, adding basic medical terms, product and transaction expansion via linkages to GoodRelations, civic services, and most recently, accessibility. Many other additions are in progress.
In his keynote address at ISWC 2013 in Sydney on October 23, Ramanathan Guha reported that 15 percent of crawled pages and 5 million sites have some schema.org markup. We can also see that some of the most widely used content management systems on the Web, notably including WordPress, Joomla and Drupal, have or plan to have native schema.org support. These tooling trends are important because, though designed for simple manual markup, it does require a bit of attention and skill to get schema.org markup right. Having markup added to pages automatically in the background is the next threshold for even broader adoption.
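As a small illustration of what automatic markup generation might look like, here is a hedged Python sketch that builds schema.org markup for an event. The function names event_jsonld and as_script_tag and the sample values are hypothetical, and the choice of the JSON-LD serialization is mine for brevity; the same Event, Place, name, startDate and location terms from the schema.org vocabulary could equally be expressed inline as microdata or RDFa.

```python
import json


def event_jsonld(name, start_date, place_name, url=None):
    """Describe an event using schema.org terms (Event, Place)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date or date-time string
        "location": {"@type": "Place", "name": place_name},
    }
    if url:
        data["url"] = url
    return data


def as_script_tag(data):
    """Wrap the markup in a script element for embedding in a page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


# Sample values are invented for the example.
fair = event_jsonld("Community Health Fair", "2014-05-10", "City Hall")
tag = as_script_tag(fair)
```

A content management system plugin could call something like this from a page template, so that authors never touch the markup by hand.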
The ability of the schema.org vocabulary to capture essential domain facts as structured data is reflected in the growing list of prominent sites tagging with schema.org. According to Guha, these are some of the prominent sites now using schema.org:
Category | Prominent Sites
News | Nytimes, guardian.com, bbc.co.uk
Movies | imdb, rottentomatoes, movies.com
Jobs / careers | careerjet.com, monster.com, indeed.com
People | linkedin.com
Products | ebay.com, alibaba.com, sears.com, cafepress.com, sulit.com, fotolia.com
Videos | youtube, dailymotion, frequency.com, vinebox.com
Medical | cvs.com, drugs.com
Local | yelp.com, allmenus.com, urbanspoon.com
Events | wherevent.com, meetup.com, zillow.com, eventful
Music | last.fm, myspace.com, soundcloud.com
Key Applications | pinterest.com, opentable.com
Examples like Pinterest show how schema.org can also provide a central organizing point for new ventures and applications. There are also key relationships between schema.org and new search initiatives such as Google’s Now or its knowledge graph.
From day one schema.org was released with a mechanism for other parties to extend its vocabulary. However, more recently, there has been a significant increase of attention on questions of interoperability and relation to other existing vocabularies. To wit:
To be clear, it was never the intent for schema.org to become a single, governing vocabulary for the Web. Nonetheless, these broader means to enable others to tie in effectively with it are an indicator that schema.org’s sponsors are serious about finding effective common grounds.
Aside from certain areas such as recipes or claiming site or blog ownership, it has been unclear whether and how the search engines are actually using schema.org markup. The sponsors have oft stated a go-slow attitude to see whether the marketplace indeed embraces the vocabulary. I’m also sure that the sponsors, as familiar as they are with spam and erroneous markup, have wanted to put in place effective ingest procedures that do not reduce the quality of their search indexes.
Getting Dan Brickley, one of the better-known individuals in RDF and the semantic Web, to act as schema.org’s liaison to the broader community, and beginning to open up about actual usage and uptake of schema.org are great signs of the sponsors’ commitment to the vocabulary. We should expect to see a much quickened pace and more visibility for schema.org within the search services themselves within the coming months.
W3C’s Complementary Efforts
Meanwhile, back at the ranch, a number of other interesting efforts are occurring within the World Wide Web Consortium (W3C) that are complementary to these trends. As readers of this blog well know, I have argued for some time that RDF makes for a fantastic data model for interoperating disparate content, which our company Structured Dynamics centrally relies upon, but that RDF is not an essential for metadata specification or exchange. Understood serializations based on understood vocabularies — in other words, exactly the design of schema.org — should be sufficient to describe the various types of things and their attributes as may be found on the Web. This idea of structured data in a variety of forms puts control into the hands of content authors. Various markets will determine what makes best sense for them as to how they actually express that structured data.
Last week the W3C announced its retirement of the Semantic Web group, subsuming it instead into the activities of the new W3C Data Activity. The W3C also announced a new group in CSV (comma-separated values) data exchange to go along with recent efforts in JSON-LD (linked data).
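As a rough illustration of how the CSV and JSON-LD efforts mentioned above can meet an understood vocabulary, the sketch below turns CSV rows into JSON-LD records typed against schema.org. It assumes, purely for illustration, that the CSV column headers are already schema.org property names. The function name `csv_to_jsonld` and the sample row are hypothetical.

```python
import csv
import io
import json


def csv_to_jsonld(csv_text: str, schema_type: str) -> list:
    """Turn each CSV row into a JSON-LD object of the given schema.org type.

    Assumption: column headers are schema.org property names
    (e.g. "name", "url"), so each row maps over without translation.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"@context": "http://schema.org", "@type": schema_type, **row}
        for row in rows
    ]


# Placeholder data for illustration only.
sample = "name,url\nExample Cafe,http://example.com/cafe\n"
records = csv_to_jsonld(sample, "LocalBusiness")
print(json.dumps(records, indent=2))
```

The same facts live comfortably in either form; which one a content author publishes is a matter of convenience, not of data model.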
These are great trends that reflect a prejudice toward adoption. Along with the advances taking place with schema.org, the Web now appears to be entering into a golden age of structured data.
Ramanathan V. Guha, mentioned above, is a Google Fellow instrumental in founding schema.org, with a background extending back to Cyc and on through Apple and Netscape to what came to be RDF. Guha was also the lead executive behind Google’s Knowledge Graph, which has some key relations with schema.org.