
About this Blog

As enterprise supply chains and consumer demand chains have become globalized, they continue to inefficiently share information “one-up/one-down”. Profound "bullwhip effects" in the chains leave managers scrambling with inventory shortages and consumers struggling to understand product recalls, especially food safety recalls. Add to this the increasing usage of personal mobile devices by managers and consumers seeking real-time information about products, materials and ingredient sources. The popularity of mobile devices with consumers is inexorably tugging at enterprise IT departments to shift to apps and services. But both consumer and enterprise data are proprietary assets that must be selectively shared to be efficiently shared.

About Steve Holcombe

Unless otherwise noted, all content on this company blog site is authored by Steve Holcombe as President & CEO of Pardalis, Inc. More profile information is available on Steve Holcombe's LinkedIn profile.

Follow @WholeChainCom™ at each of its online locations:

Entries in Informational Objects (16)

Tuesday
Jul 15, 2008

Cloud Computing: Billowing Toward Data Ownership - Part II

[Return to Part I]

Cloud Computing's Achilles Heel

[Image: Death of Achilles, Peter Paul Rubens, 1630-1635]
The boom in the data center industry is building the Cloud where the conventional wisdom is that the software services of the Semantic Web will thrive. The expansion of the Cloud is believed to augur well for distributed data within the Cloud coming to substitute to some extent - perhaps substantially so - for data currently distributed outside of the Cloud. But the boom is being built upon a privacy paradigm that allows online companies to use Web cookies to collect a wide variety of information about individual usage of the Internet. This paradigm is the Cloud’s Achilles’ heel. It threatens to keep the Cloud from fully inflating beyond publicly available information sources.

I'm mulling over a more in-depth discussion of Web cookies for a final Part III to this multi-part series. In the meantime, the focus of today's blog is this: as people and businesses consider moving their computer storage and services into the Cloud, direct technological control of information becomes more and more of a competitive driver. As blogged in Part I, the online company that figures out ways of building privacy mechanisms into its compliance systems will be putting itself at a tremendous competitive advantage for attracting services to operate in the Cloud. But puzzlement reigns as to how to connect the Cloud with new pools of data (mostly non-artistic) that are private, confidential and classified.

Semantic Web: What’s Right About the Vision

What’s right about the Semantic Web is that its most highly funded visionaries have envisioned beyond a Web of documents to a ‘Data Web’. Here are two examples: a Web of scalably integrated data employing object registries, envisioned by Metaweb Technologies’ Danny Hillis; and a Web of granularly linked, ontologically defined data objects, envisioned by Radar Networks’ Nova Spivack.

Click on the thumbnail image to the left and you will see in more detail what Hillis envisions. That is, a database represented as a labeled graph, where data objects are connected by labeled links to each other and to concept nodes. For example, a concept node for a particular category contains two subcategories that are linked via labeled links "belongs-to" and "related-to" with text and picture. An entity comprises another concept that is linked via labeled links "refers-to," "picture-of," "associated-with," and "describes" with Web page, picture, audio clip, and data. For further information, see the blogged entry US Patent App 20050086188: Knowledge Web (Hillis, Daniel W. et al).
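A labeled graph of this kind can be sketched in a few lines of code. The node names and link labels below are illustrative only, not drawn from the patent application:

```python
# Minimal sketch of a labeled graph: data objects and concept nodes
# connected by labeled links, queried by node and optional link label.

class KnowledgeGraph:
    def __init__(self):
        self.links = []  # (source, label, target) triples

    def link(self, source, label, target):
        self.links.append((source, label, target))

    def neighbors(self, node, label=None):
        """All targets linked from `node`, optionally filtered by link label."""
        return [t for (s, l, t) in self.links
                if s == node and (label is None or l == label)]

g = KnowledgeGraph()
g.link("Category: Painting", "belongs-to", "Concept: Art")
g.link("Category: Painting", "related-to", "Picture: Death of Achilles")
g.link("Entity: Rubens", "picture-of", "Picture: Death of Achilles")

print(g.neighbors("Category: Painting"))
```

The point of the labeled links is that a query can follow meaning ("picture-of") rather than mere adjacency.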

Click on the thumbnail image to the right and you will see in more detail what Spivack envisions. That is, a picture of a Data Record with an ID and fields connected in one direction to ontological definitions, and in another direction to other similarly constructed data records with their own fields connected to their own ontological definitions, and so on. These data records - or semantic web data - are nothing less than self-describing, structured data objects that are atomically (i.e., granularly) connected by URIs. For more information, see the blogged entry, US Patent App 20040158455: Methods and systems for managing entities in a computing device using semantic objects (Radar Networks).
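A self-describing data record of this sort can be sketched as follows; all URIs and field names here are hypothetical placeholders, not Radar Networks' actual schema:

```python
from dataclasses import dataclass, field

# Sketch of a self-describing data record: an ID (URI), fields, pointers to
# ontological definitions, and links to other records. URIs are hypothetical.

@dataclass
class DataRecord:
    uri: str                                      # unique ID of this record
    fields: dict = field(default_factory=dict)    # field name -> value
    ontology: dict = field(default_factory=dict)  # field name -> definition URI
    links: list = field(default_factory=list)     # URIs of related records

author = DataRecord(
    uri="urn:example:record:42",
    fields={"name": "Ada"},
    ontology={"name": "urn:example:ontology:personName"},
    links=["urn:example:record:43"],  # another, similarly constructed record
)
print(author.ontology["name"])
```

Because each field carries a pointer to its own definition, the record explains itself to any system that dereferences the URIs.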

Furthermore, Hillis and Spivack have studied the weaknesses of relational database architecture when applied to globally diverse users who are authoring, storing and sharing massive amounts of data, and they have correctly staked the future of their companies on object-oriented architecture. See, e.g., the blogged entries, Efficient monitoring of objects in object-oriented database system which interacts cooperatively with client programs and Advantages of object oriented databases over relational databases. They both define the Semantic Web as empowering people across the globe to collaborate toward the building of bigger, and more statistically reliable, observations about things, concepts and relationships.

Click on the thumbnail to the left for a screen shot of a visualization and interaction experiment produced by Moritz Stefaner for his 2007 master's thesis, Visual tools for the socio-semantic web. See the blogged entry, Elastic Tag Mapping and Data Ownership. Stefaner posits what Hillis and Spivack would no doubt agree with - that the explosive growth of possibilities for information access and publishing fundamentally changes our way of interacting with data, information and knowledge. There is a recognized acceleration of information diffusion, and an increasing granularization of information into micro-content. There is a shift towards larger and larger populations of people producing and sharing information, along with an increasing specialization of topics, interests and the corresponding social niches. All of this appears to be leading to a massive growth of space within the Cloud for action, expression and attention available to every single individual.

Semantic Web: What’s Missing from the Vision

Continuing to use Hillis and Spivack as proxies, these two visionaries of the Semantic Web assume that data - all data - will be made openly available. Neither of them has a ready answer for the very simple question that Steve Inskeep asks above (in Part I of this multi-part blog entry).

Inskeep: "Is somebody who runs a business, who used to have a filing cabinet in a filing room, and then had computer files and computer databases, really going to be able or want to take the risk of shipping all their files out to some random computer they don't even know where it is and paying to rent storage that way?"

Sir Tim Berners-Lee, the widely recognized inventor of the Web, and Director of the W3C, is every bit as perplexed about data ownership. In Data Portability, Traceability and Data Ownership - Part IV I referenced a recent interview excerpt from March, 2008, initiated by interviewer Paul Miller of ZDNet, in which Berners-Lee does acknowledge data ownership fear factors.

Miller: “You talked a little bit about people's concerns … with loss of control or loss of credibility, or loss of visibility. Are those concerns justified or is it simply an outmoded way of looking at how you appear on the Web?”

Berners-Lee: “I think that both are true. In a way it is reasonable to worry in an organization … You own that data, you are worried that if it is exposed, people will start criticizing [you] ….

So, there are some organizations where if you do just sort of naively expose data, society doesn't work very well and you have to be careful to watch your backside. But, on the other hand, if that is the case, there is a problem. [T]he Semantic Web is about integration, it is like getting power when you use the data, it is giving people in the company the ability to do queries across the huge amounts of data the company has.

And if a company doesn't do that, then, it will be seriously disadvantaged competitively. If a company has got this feeling where people don't want other people in the company to know what is going on, then, it has already got a problem ….

(emphasis added)

In other words, 'do the right thing', collegially share your data and everything will be OK. If only the real world worked that way, then Berners-Lee would be spot on. In the meantime, there is a ready answer.

Ownership Web

The ready answer is an Ownership Web concurrently rising alongside, and complementary to, the emerging Semantic Web.

For the Semantic Web to reach its full potential in the Cloud, it must have access to more than just publicly available data sources. It must find a gateway into the closely-held, confidential and classified information that people consider to be their identity, that participants to complex supply chains consider to be confidential, and that governments classify as secret. Only with the empowerment of technological ‘data ownership’ in the hands of people, businesses, and governments will the Semantic Cloud make contact with a horizon of new, ‘blue ocean’ data.

The Ownership Web would be separate from the Semantic Web, though semantically connected as a layer of distributed, enterprise-class web platforms residing in the Cloud.

[Image: Ownership Web]

The Ownership Web would contain diverse registries of uniquely identified data elements for the direct authoring, and further registration, of uniquely identified data objects. Using these platforms people, businesses and governments would directly host the authoring, publication, sharing, control and tracking of the movement of their data objects.

The technological construct best suited for the dynamic of networked efficiency, scalability, granularity and trustworthy ownership is the data object in the form of an immutable, granularly identified, ‘informational’ object.

A marketing construct well suited to relying upon the trustworthiness of immutable, informational objects would be the 'data bank'.

Data Banking

Traditional monetary banks meet the expectations of real people and real businesses in the real world.

As blogged in Part I ... 

People are comfortable and familiar with monetary banks. That’s a good thing because without people willingly depositing their money into banks, there would be no banking system as we know it. By comparison, we live in a world that is at once awash in on-demand information courtesy of the Internet, and at the same time the Internet is strangely impotent when it comes to information ownership.

In many respects the Internet is like the Wild West because there is no information web similar to our monetary banking system. No similar integrated system exists for precisely and efficiently delivering our medical records to a new physician, or for providing access to a health history of the specific animal slaughtered for that purchased steak. Nothing out there compares with how the banking system facilitates gasoline purchases.

If an analogy to the Wild West is apropos, then it is interesting to reflect upon the history of a bank like Wells Fargo, formed in 1852 in response to the California gold rush. Wells Fargo wasn’t just a monetary bank, it was also an express delivery company of its time for transporting gold, mail and valuables across the Wild West. While we are now accustomed to next morning, overnight delivery between the coasts, Wells Fargo captured the imagination of the nation by connecting San Francisco and the East coast with its Pony Express. As further described in Banking on Granular Information Ownership, today’s Web needs data banks that do for the on-going gold rush on information what Wells Fargo did for the Forty-niners.

Banks meet the expectations of their customers by providing them with security, yes, but also credibility, compensation, control, convenience, integration and verification. It is the dynamic, transactional combination of these that instills in customers the confidence that they continue to own their money even while it is in the hands of a third-party bank.

A data bank must do no less.

Ownership Web: What's Philosophically Needed

Where exactly is the sweet spot of data ownership?

In truth, it will probably vary depending upon what kind of data bank we are talking about. Data ownership will be one thing for personal health records, another for product supply chains, and yet another for government classified information. And that's just for starters because there will no doubt be niches within niches, each with their own interpretation of data ownership. But the philosophical essence of the Ownership Web that will cut across all of these data banks will be this:

  • That information must be treated as a tangible, commercial product, as banked, traceable money, or as both.

The trustworthiness of information is crucial. Users will not be drawn to data banks if the information they author, store, publish and access can be modified. That means that even the authors themselves must be proscribed from modifying their information once registered with the data bank. Their information must take on the immutable characteristic of tangible, traceable property. While the Semantic Web is about the statistical reliability of data, the Ownership Web is about the reliability of data, period.
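One generic way to enforce that kind of immutability is content-addressing: derive each registered object's identifier from a cryptographic hash of its contents, so that any modification necessarily produces a different object. This is a sketch of the general technique, not a description of any particular data bank's implementation:

```python
import hashlib
import json

def register(record: dict) -> str:
    """Derive an object identifier from the record's contents. Any change to
    the record yields a different identifier, so a registered object can be
    verified and shared but never silently modified - not even by its author."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

record = {"element": "lot-number", "value": "A-1234"}
object_id = register(record)
assert register(record) == object_id                      # stable identity
assert register({**record, "value": "B-9"}) != object_id  # edits change identity
```

A corrected record becomes a new registered object rather than an overwrite of the old one, which preserves the audit trail that tangible property demands.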

Ownership Web: What's Technologically Needed

What is technologically required is a flexible, integrated architectural framework for information object authoring and distribution. One that easily adjusts to the definition of data ownership as it is variously defined by the data banks serving each social network, information supply chain, and product supply chain. Users will interface with one or more ‘data banks’ employing this architectural framework. But the lowest common denominator will be the trusted, immutable informational objects that are authored and, where the definition of data ownership permits, controllable and traceable by each data owner one-step, two-steps, three-steps, etc. after the initial share.
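The one-step, two-steps, three-steps traceability just described can be sketched as a share log: the data bank records each hand-off of an object identifier, and the original author can then walk the chain. All party names and identifiers here are illustrative:

```python
# Hypothetical sketch: a data bank logs every share of an object ID so the
# original author can trace its movement one, two, three steps downstream.

share_log = []  # (object_id, from_party, to_party) tuples

def share(object_id, from_party, to_party):
    share_log.append((object_id, from_party, to_party))

def trace(object_id, owner):
    """Breadth-first walk of every party the object reached after `owner` shared it."""
    reached, frontier = [], [owner]
    while frontier:
        nxt = [t for (oid, f, t) in share_log
               if oid == object_id and f in frontier and t not in reached]
        reached.extend(nxt)
        frontier = nxt
    return reached

share("obj:5001", "producer", "processor")
share("obj:5001", "processor", "retailer")
share("obj:5001", "retailer", "consumer")
print(trace("obj:5001", "producer"))
```

Whether a given owner may run such a trace, and how many steps deep, is exactly what each data bank's definition of data ownership would govern.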

Click on the thumbnail to the left for the key architectural features for such a data bank. They include a common registry of standardized data elements, a registry of immutable informational objects, a tracking/billing database and, of course, a membership database. This is the architecture for what may be called a Common Point Authoring™ system. Again, where the definition of data ownership permits, users will host their own 'accounts' within a data bank, and serve as their own 'network administrators'. What is made possible by this architectural design is a distributed Cloud of systems (i.e., data banks). The overall implementation would be based upon a massive number of user interfaces (via APIs, web browsers, etc.) interacting via the Internet with a large number of data banks overseeing their respective enterprise-class, object-oriented database systems.

Click on the thumbnail to the right for an example of an informational object and its contents as authored, registered, distributed and maintained with data bank services. Each object comprises a unique identifier that designates the informational object, as well as one or more data elements (including personal identification), each of which is itself identified by a corresponding unique identifier. The informational object will also contain other data, such as ontological formatting data, permissions data, and metadata. The actual data elements that are associated with a registered (and therefore immutable) informational object would typically be stored in the Registered Data Element Database (look back at 124 in the preceding thumbnail). That is, the actual data elements are linked via the use of pointers, which comprise the data element unique identifiers or URIs. Granular portability is built in. For more information see the blogged entry US Patent 6,671,696: Informational object authoring and distribution system (Pardalis Inc.).
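The registry-plus-pointers layout described above can be sketched as follows. Data elements live once in a registry, and each informational object holds only pointers (element identifiers) plus permissions and metadata; every name and identifier below is illustrative, not taken from the patent:

```python
# Sketch of the data-bank layout: a Registered Data Element Database holds
# the actual values, and immutable informational objects reference them by
# unique identifier rather than copying them. All identifiers are made up.

element_registry = {   # element ID -> registered data element
    "elem:1001": "Producer: Acme Farms",
    "elem:1002": "Lot: A-1234",
}

informational_object = {
    "object_id": "obj:5001",
    "element_ids": ["elem:1001", "elem:1002"],   # pointers, not copies
    "permissions": {"share": ["buyer", "regulator"]},
    "metadata": {"registered": "2008-07-15"},
}

def resolve(obj, registry):
    """Dereference an object's element pointers into the registered values."""
    return [registry[eid] for eid in obj["element_ids"]]

print(resolve(informational_object, element_registry))
```

Because an object is just a bundle of element pointers, a single element can be shared granularly, reused across many objects, and traced wherever those objects travel.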

Ownership Web: Where Will It Begin?

[Image: Aristotle]

Metaweb Technologies is a pre-revenue, San Francisco start-up developing and patenting technology for a semantic ‘Knowledge Web’ marketed as Freebase™. Philosophically, Freebase is a substitute for a great tutor, like Aristotle was for Alexander. Freebase users do not modify existing web documents but instead annotate them. Amazon.com's annotations are the closest example, but Freebase goes further by linking the annotations so that the documents become more understandable and more findable. Annotations are also modifiable by their authors as better information becomes available to them. Metaweb characterizes its service as an open, collaboratively-edited database (like Wikipedia, the free encyclopedia) of cross-linked data, but it is really very much a next-generation competitor to Google.

Not that Hillis hasn't thought about data ownership. He has. You can see it in an interview conducted by his patent attorney and filed on December 21, 2001 in the provisional USPTO Patent Application 60/343,273:

Danny Hillis: "Here's another idea that's super simple. I've never seen it done. Maybe it's too simple. Let's go back to the terrorist version [of Knowledge Web]. There's a particular problem in the terrorist version that the information is, of course, highly classified .... Different people have some different needs to know about it and so on. What would be nice is if you ... asked for a piece of information. That you [want access to an] annotation that you know exists .... Let's say I've got a summary [of the annotation] that said,  'Osama bin Laden is traveling to Italy.' I'd like to know how do you know that. That's classified. Maybe I really have legitimate reasons for that. So what I'd like to do, is if I follow a link that I know exists to a classified thing, I'd like the same system that does that to automatically help me with the process of getting the clearance to access that material." [emphasis added]

What Hillis was tapping into just a few months after 9/11 is just as relevant to today's information sharing needs.

In the War on Terror the world is still wrestling with classified information exchange between governments, between agencies within governments, and even between the individuals making up the agencies themselves. Fear factors revolving around data ownership – not legal ownership, but technological ownership – create significant frictions to information sharing throughout these Byzantine information supply chains.

Something similar is happening within the global healthcare system. It's a complex supply chain in which the essential product is the health of the patients themselves. People want to share their entire personal health records with a personal physician but only share granular parts of them with an impersonal insurance company. ‘Fear factors’ are keeping people from becoming comfortable with posting their personal health information into online accounts despite the advent of Microsoft HealthVault and Google Health.

And then, in this era of both de facto and de jure deregulation, there are the international product supply chains providing dangerous toys and potential ‘mad cow’ meat products to unsuspecting consumers. Unscrupulous supply chain participants will always hide in the ‘fog’ of their supply chains. The manufacturers of safe products want to differentiate themselves from the manufacturers of unsafe products. But, again, fear factors keep the good manufacturers from posting information online that may put them at a competitive disadvantage to downstream competitors.

I'm painting a large picture here, but what Hillis is talking about is not limited to the bureaucratic ownership of data; it extends to matching up his Knowledge Web with another system - like the Ownership Web - for automatically working out the data ownership issues.

But bouncing around ideas about how we need data ownership is not the same as developing methods or designs to solve it. What Hillis non-provisionally filed, subsequent to his provisional application, was the Knowledge Web (aka Freebase) application. Because of its emphasis upon the statistical reliability of annotations, Knowledge Web's IP is tailor-made for the Semantic Web. See the blogged entry US Patent App 20050086188: Knowledge Web (Hillis, Daniel W. et al). And because the conventional wisdom within Silicon Valley is that the Semantic Web is about to emerge, Metaweb is being funded like it is “the next big thing”. Metaweb’s Series B raised $42.4M more in January, 2008. What Hillis well recognizes is that as Freebase strives to become the premier knowledge source for the Web, it will need access to new, blue oceans of data residing within the Ownership Web.

Radar Networks may be the “next, next big thing”. Also a pre-revenue San Francisco start-up, its bankable founder, Nova Spivack, has gone out of his way to state that his product Twine™ is more like a semantic Facebook while Metaweb’s Freebase is more like a semantic Wikipedia. Twine employs W3C standards in a community-driven, bottom-up process, from which mappings are created to infer a higher resolution (see thumbnail to the right) of semantic equivalences or connections among and between the data inputted by social networkers. Again, this data is modifiable by the authors as better information becomes available to them. Twine holds four pending U.S. patent applications. See the blogged entry US Patent App 20040158455: Methods and systems for managing entities in a computing device using semantic objects (Radar Networks). Twine’s Series B raised $15M-$20M in February, 2008, following on the heels of Metaweb's latest round. Twine’s approach, in both its systems and its IP, emphasizes a perhaps higher-resolution Web than Metaweb's. Twine and the Ownership Web should be especially complementary to each other in regard to object granularity. You can see this, back above, in the comparative resemblance between the thumbnail image of Spivack's Data Record ID object and the thumbnail image of Pardalis' Informational Object. Nonetheless, the IP supportive of Twine, like that of Hillis' Knowledge Web, places a strong emphasis upon the statistical reliability of information. Twine's IP is tailor-made for the Semantic Web.

Dossia is a private consortium pursuing the development of a national, personally controlled health record (PCHR) system. Dossia is also governed by very large organizations like AT&T, BP America, Cardinal Health, Intel, Pitney Bowes and Wal-Mart. In September, 2007, Dossia outsourced development to the IndivoHealth™ PCHR system. IndivoHealth, funded from public and private health grants, shares Pardalis' philosophy that "consumers are managing bank accounts, investments, and purchases online, and … they will expect this level of control to be extended to online medical portfolios." IndivoHealth empowers patients with direct access to their centralized electronic medical records via the Web.

But given the current industry needs for a generic storage model, the IndivoHealth medical records, though wrapped in an XML structure (see the next paragraph), are essentially still just paper documents in electronic format. IndivoHealth falls far short of empowering patients with the kind of control that people intuitively recognize as ‘ownership’. See US Patent Application 20040199765 entitled System and method for providing personal control of access to confidential records over a public network in which access privileges include "reading, creating, modifying, annotating, and deleting." And it reasonably follows that this is one reason why personal health record initiatives like those of not just Dossia, but also Microsoft’s HealthVault™ and GoogleHealth™, are not tipping the balance. For Microsoft and Google another reason is that they so far have not been able to think themselves out of the silos of the current privacy paradigm. The Ownership Web is highly disruptive of the prevailing privacy paradigm because it empowers individuals with direct control over their radically standardized, immutable data.

World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. W3C is headed by Sir Tim Berners-Lee, creator of the first web browser and the primary originator of the Web specifications for URL, HTTP and HTML. These are the principal technologies that form the basis of the World Wide Web. W3C has further developed standards for XML, SPARQL, RDF, URI, OWL, and GRDDL with the intention of facilitating the Semantic Web. While Berners-Lee has described in his own words (above) his perplexity about data ownership, the data object standards created by the W3C should nonetheless be more than friendly to an Ownership Web employing object-oriented architecture. Surely, in Common Point Authoring™ will be found many of the ‘best of breed’ standards for an Ownership Web that is most complementary to the emerging Semantic Web.

EPCglobal is a private, standards-setting consortium governed by very large organizations like Cisco Systems, Wal-Mart, Hewlett-Packard, DHL, Dow Chemical Company, Lockheed Martin, Novartis Pharma AG, Johnson & Johnson, Sony Corporation and Procter & Gamble. EPCglobal is architecting essential, core services (see EPCglobal's Architectural Framework in the thumbnail to the right) for tracking physical products identified by unique electronic product codes (including RFID tags) across and within enterprise-scale, relational database systems controlled by large organizations.

Though it would be a natural extension to do so, EPCglobal has yet to envision providing its large organizations (and small businesses, individual supply chain participants and even consumers) with the ability to independently author, track, control and discover granularly identified informational products. See the blogged entry EPCglobal & Prescription Drug Tracking. It is not difficult to imagine that the Semantic Web, without a complementary Ownership Web, would frankly be abhorrent to EPCglobal and its member organizations. For the Semantic Web to have any reasonable chance of connecting itself into global product and service supply chains, it must work through the Ownership Web.

Ownership Web: Where It Will Begin

The Ownership Web will begin along complex product and service supply chains where information must be trustworthy, period. Statistical reliability is not enough. And, in fact, the Ownership Web is beginning to form along the most dysfunctional of information supply chains. But that's for discussion in later blogs, as the planks of the Ownership Web are nailed into place, one by one.


[This concludes Part II of a three-part series. On to Part III.]

Monday
May 12, 2008

Data Portability, Traceability and Data Ownership - Part IV

[return to Part III]

Connecting Portability to Traceability

Let’s begin this final part with a nicely presented video interview of Tim Berners-Lee, the widely acclaimed inventor of the World Wide Web, by Technology Review.

Video: Tim Berners-Lee on the Semantic Web
Technology Review (March, 2007)
Clicking on this link opens the video in a separate window for an 8 min 24 sec video.
Close that window when the video is complete and you'll be returned here.

 
Berners-Lee has a degree in physics from The Queen’s College, Oxford. He well expresses in the video the insight of an academic technologist preaching the benefits of the emerging Semantic Web as, essentially, one big, connected database.

For instance, Berners-Lee discusses life sciences not once but twice during this interview in the context of making more and better semantically connected information available to doctors, emergency responders and other healthcare workers. He sees this, and rightly so, as being particularly important to fight both (a) epidemics and pandemics, and (b) more persistent diseases like cancer and Alzheimer’s. Presumably that means access to personal health records. However, there is no mention in this interview about concerns over the ownership of information.

Here’s a more recent interview excerpt in March, 2008, initiated by interviewer Paul Miller of ZDNet, in which Berners-Lee does acknowledge data ownership fear factors.

Miller (03:21): “You talked a little bit about people's concerns … with loss of control or loss of credibility, or loss of visibility. Are those concerns justified or is it simply an outmoded way of looking at how you appear on the Web?”

Berners-Lee: “I think that both are true. In a way it is reasonable to worry in an organization … You own that data, you are worried that if it is exposed, people will start criticizing [you] ….

So, there are some organizations where if you do just sort of naively expose data, society doesn't work very well and you have to be careful to watch your backside. But, on the other hand, if that is the case, there is a problem. [T]he Semantic Web is about integration, it is like getting power when you use the data, it is giving people in the company the ability to do queries across the huge amounts of data the company has.

And if a company doesn't do that, then, it will be seriously disadvantaged competitively. If a company has got this feeling where people don't want other people in the company to know what is going on, then, it has already got a problem ….

Well actually, it would expose... all these inconsistencies. Well, in a way, you (sic) got the inconsistencies already, if it exposes them then actually it helps you. So, I think, it is important for the leadership in the company … to give kudos to the people who provided the data upon which a decision was made, even though they weren't the people who made the decision.” (emphasis added)

Elsewhere in this ZDNet interview, Berners-Lee announces that the core pieces for development of the Semantic Web are now in place (i.e., SPARQL, RDF, URI, XML, OWL, and GRDDL). But, again, what I find lacking is that these core pieces do not by themselves provide a mechanism for addressing data ownership issues.

I wish I could introduce Berners-Lee to Marshall Van Alstyne.

Actually, they may already know each other. Like Berners-Lee, Van Alstyne is a professor at the Massachusetts Institute of Technology. Van Alstyne is an information economist whose work in the area of data ownership I have greatly admired for some time (though I have yet to have had the pleasure of making his acquaintance).

There are other noteworthy recent papers by Van Alstyne but, since I first came across it several years ago, I have continued to be enamored with the prescience of a 1994 publication he co-authored entitled, Why Not One Big Database? Ownership Principles for Database Design. Here’s my favorite quote from that paper.

“The fundamental point of this research is that ownership matters. Any group that provides data to other parts of an organization requires compensation for being the source of that data. When it is impossible to provide an explicit contract that rewards those who create and maintain data, "ownership" will be the best way to provide incentives. Otherwise, and despite the best available technology, an organization has not chosen its best incentives and the subtle intangible costs of low effort will appear as distorted, missing, or unusable data.” (emphasis added)

Whether they know each other or not, the reason I would want to see them introduced is that I don’t hear Van Alstyne’s socio-economic themes in the voice of Berners-Lee. In fact I have checked out the online biographies provided by the World Wide Web Consortium (W3C) of the very fine team that Berners-Lee, as the head of W3C, has brought together. I find no references to academic degrees or experiential backgrounds in either sociology or economics. The W3C team is heavily laden with technologists.

And, why not? After all, the mission of the W3C is one of setting standards for the technological marvel that is the World Wide Web. One must set boundaries and bring focus to any enterprise or endeavor, and Berners-Lee has reasonably done so by directing the W3C team to connect the data that society is already providing free of data ownership concerns (i.e., the information already available in massively populated government databases, academic databases, or other publicly accessible sources).

It’s just that I wish there were some cross-pollination going on between the W3C and the likes of Van Alstyne, resulting, for instance, in something like the author-controlled XML (A-XML) illustrated in Parts II and III, above (and, again, below).

That the W3C is not focusing on data ownership is an opportunity for the likes of Dataportability.org. Similarly, as mentioned in Part III, above, in the world of supply chains a likely candidate for a central ‘any product data bank’ would be EPCglobal, the non-profit supply chain consortium. But EPCglobal is a long way from focusing on the kind of data ownership proposed in this writing, or perhaps even from envisioning, as an organization, that it might want to do so.

Like EPCglobal within the ecology of supply chains, Dataportability.org has seated at its table some very powerful members of the social networking ecology (i.e., Google, Plaxo, Facebook, LinkedIn, Twitter, Flickr, SixApart and Microsoft). There is a critical mass in those members that provides an opportunity for an organization like Dataportability.org to become a neutral, central data bank for portable information among its members for the benefit of social networking subscribers.

For instance, for e-mail addresses desired by a Facebook subscriber to be portable to other social networking websites, Facebook would add tools to the subscriber's interface for seamless registration of the e-mail addresses with a central, portability database branded with Facebook's trademark (but in fact separately administered by Dataportability.org).  The subscriber would merely enter the chosen e-mail addresses into his or her interface, click on the 'register' button, and automatically author the following draft XML object ...

<?xml version="1.0" encoding="UTF-8" ?>
<PortabilityDictionary_DraftElements>
<emailaddr>noname01@pardalis.com</emailaddr>
<emailaddr>noname02@pardalis.com</emailaddr>
<emailaddr>noname03@pardalis.com</emailaddr>
</PortabilityDictionary_DraftElements>

... which would come to be registered in the central portability 'bank' (again, administered by Dataportability.org) as the following XML object.

<?xml version="1.0" encoding="UTF-8" ?>
<PortabilityDictionary_RegisteredElements>

<emailaddr UniquePointer =
" http://www.centralportabilitybank.org/email_IDs/21263 "/>

<emailaddr UniquePointer =
" http://www.centralportabilitybank.org/email_IDs/21264 "/>

<emailaddr UniquePointer =
" http://www.centralportabilitybank.org/email_IDs/21265 "/>

</PortabilityDictionary_RegisteredElements>
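This registration step can be sketched in a few lines of Python. To be clear, the CentralPortabilityBank class, its starting ID of 21263, and the register_draft helper are my own illustrative inventions for this post, not part of any actual Dataportability.org design.

```python
# Illustrative sketch only: a simple in-memory 'central portability bank'
# that swaps each plaintext element for a unique pointer URI, turning the
# draft XML object above into the registered XML object.
import xml.etree.ElementTree as ET

class CentralPortabilityBank:
    """Hypothetical central bank that assigns a unique pointer URI
    to each deposited data element and keeps the value in its vault."""
    def __init__(self, base_url="http://www.centralportabilitybank.org/email_IDs/"):
        self.base_url = base_url
        self.next_id = 21263     # arbitrary starting ID, matching the example
        self.vault = {}          # pointer URI -> original value

    def register(self, value):
        pointer = f"{self.base_url}{self.next_id}"
        self.vault[pointer] = value
        self.next_id += 1
        return pointer

def register_draft(draft_xml, bank):
    """Replace each plaintext element with an empty element carrying
    a UniquePointer attribute, as in the registered object above."""
    draft = ET.fromstring(draft_xml)
    registered = ET.Element("PortabilityDictionary_RegisteredElements")
    for elem in draft:
        ET.SubElement(registered, elem.tag,
                      {"UniquePointer": bank.register(elem.text)})
    return ET.tostring(registered, encoding="unicode")

bank = CentralPortabilityBank()
draft = """<PortabilityDictionary_DraftElements>
<emailaddr>noname01@pardalis.com</emailaddr>
<emailaddr>noname02@pardalis.com</emailaddr>
<emailaddr>noname03@pardalis.com</emailaddr>
</PortabilityDictionary_DraftElements>"""
print(register_draft(draft, bank))
```

The important design point is that the plaintext e-mail addresses never appear in the registered object itself; only the bank's vault can dereference a pointer back to its value.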

Again, as illustrated in Part III, above, this would set the stage for a viable model for Dataportability.org, as a non-profit consortium managed by the likes of Facebook, Flickr, etc., to provide more than just portability services. Now, with a centralized registry service for A-XML objects (i.e., author-controlled, informational objects) the portability service could easily be stretched into a non-collaborative data authoring and sharing service.

IP Comment: Compare and contrast the collaborative data authoring and sharing systems illustrated by Xerox's US Patent 5,220,657, Updating local copy of shared data in a collaborative system and eiSolutions' US Patent 6,240,414, Method of resolving data conflicts in a shared data environment.

And, again, the 'data ownership' service would presumably be branded by each of the distributed ‘bank members’ (like Facebook, Flickr, etc.) as their own service.

What might this data ownership service entail? To instill confidence in subscribers that they ‘own’ their portable data, what could be provided to members by Facebook, Flickr, etc. as part of the data ownership service made possible by the central Dataportability.org?

For instance: 

  • Each time an administrative action is taken by Dataportability.org affecting the registered data object - or a granular data element within a registered object - the subscriber could choose to be automatically notified with a fine-grained report.
  • Each time the registered data object is shared - or data elements within the object are granularly shared - according to the permissions established by the subscriber, he or she could choose to be immediately, electronically notified with a fine-grained report.
  • Online, on-demand granular information traceability reports (i.e., fine-grained reports mapping out who accesses or uses a subscriber’s shared information)
  • Catastrophe data back-up services
  • etc. 
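The notification and traceability services listed above could be backed by something as simple as an access ledger. The following is a minimal sketch under my own assumptions; every class and field name here is hypothetical.

```python
# Illustrative sketch: a ledger that records every access to a registered
# data element, so the element's author can pull a fine-grained
# traceability report (who accessed or used the shared information).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AccessEvent:
    pointer: str        # unique pointer URI of the element accessed
    accessor: str       # who accessed it
    action: str         # e.g. "read", "shared", "admin-change"
    when: datetime

@dataclass
class DataBankLedger:
    events: list = field(default_factory=list)

    def record(self, pointer, accessor, action):
        self.events.append(
            AccessEvent(pointer, accessor, action, datetime.now(timezone.utc)))

    def traceability_report(self, pointer):
        """Fine-grained report for one shared element."""
        return [(e.accessor, e.action) for e in self.events
                if e.pointer == pointer]

ledger = DataBankLedger()
ledger.record("http://www.centralportabilitybank.org/email_IDs/21263", "flickr.com", "read")
ledger.record("http://www.centralportabilitybank.org/email_IDs/21263", "linkedin.com", "shared")
ledger.record("http://www.centralportabilitybank.org/email_IDs/21264", "facebook.com", "read")
print(ledger.traceability_report("http://www.centralportabilitybank.org/email_IDs/21263"))
```

In a real deployment the ledger would also drive the automatic notifications described in the first two bullets, but the core idea is the same: every administrative action or share leaves a record the author can see.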

Thus could Dataportability.org light a data ownership pathway for both the W3C and EPCglobal. 

Concluding Remarks 

The fundamental point of this multi-entry blog is that data ownership matters. With it, the Semantic Web stands the best chance for reaching its full potential for the porting of records between and among social networking sites, and for the tracking and discovering of information along both information and product supply chains.

And holding that positive thought in mind, it’s time to end this writing with a little portability rock n’ roll. It's courtesy of Danny Ayers. Enjoy!

Friday
Apr182008

Dataportability, Traceability and Data Ownership - Part III

[Return to Part II]

The Value Proposition of Data Ownership

Thanks to Henry Story for stopping by to comment on the XML object examples offered in Part II.

"Yes, unique identifiers are very helpful. But numbers rarely uniquely identify anything. Replace your numbers above with URIs (Universal Resource Identifiers) and you have not only a proven system of unique IDs, you also have (especially with http URIs) a well understood way of dereferencing the information. Then you no longer need a specialised name server. This is what the web part of the semantic web is about [which I wrote about in the Sun Bablefish blog entitled hyperdata posted September 20, 2007]. You then move out of supply chains, into supply networks, which I wrote up in another blog [entitled Supply Networks posted April 19, 2007]." (emphasis added)

The end-game goal of the emerging Semantic Web is to interconnect data so that it becomes a ‘hyperdata’ machine. Nonetheless, as Story has previously propounded, there is more to it than technology. There is also the need for policies or other non-technological means that address “who should see what data, who should be able to copy that data, and what they should be able to do with it.”

For some people the Semantic Web will be a technological wonder to behold. Others will be scared stiff by it. Many will feel both awe and trepidation. But not to be forgotten is that people matter more than the Web, itself. A Semantic Web that people view as outside of their control will be a machine that can only become a shadow of its full potential because people, businesses and, yes, even governments will not fully participate.

Previously, in Banking on Granular Information Ownership I offered this.

"People are comfortable and familiar with monetary banks. That’s a good thing because without people willingly depositing their money into banks, there would be no banking system as we know it. Banks need access to people’s money into order to make profits. Without a healthy monetary banking system our economies would be comparatively dysfunctional, and our personal lives would be critically deficient in opportunities."

The same thing can be said about the emerging Semantic Web. People will need to be made comfortable and familiar with the Semantic Web. Without people willingly depositing their information to this new Web, it will fall far short of its inherent capacity for growth.

Moreover, the Semantic Web will need access to people’s information in order to make profits, no matter what the business model is. The opportunities for the Semantic Web to enrich our economies and our personal lives will be diminished without ‘buy in’ by the people whom it is envisioned to serve. The value proposition of data ownership is that it provides the most acceptable technological and socio-political pathway for adoption by ordinary people of the emerging Semantic Web.

It is because people matter more than the Web that ‘specialized name servers’ will play a large role. Using the hypothetical domain name ‘www.toydatabank.org’ I have added the following A-XML example to the continuum of examples begun in Part II. I have wrapped some of the following lines of code, and inserted spacing, for easier reading.

<?xml version="1.0" encoding="UTF-8" ?>
<Pedigree>

<PedigreeID UniquePointer =
" http://www.toydatabank.org/toymfg/Object_IDs/99087 "/>

<ManufacturerID UniquePointer =
" http://www.toydatabank.org/toymfg/mfg_IDs/00372 "/>

<ProductSerialNumber UniquePointer =
" http://www.toydatabank.org/toymfg/element_IDs/43229 "/>

<ProductDescription UniquePointer =
" http://www.toydatabank.org/toymfg/element_IDs/23444 "/>

<ProductInfoToSupplyChain UniquePointer =
" http://www.toydatabank.org/toymfg/element_IDs/66221 "/>

<ProductInfoToGovtRegulator UniquePointer =
" http://www.toydatabank.org/toymfg/element_IDs/66333 "/>

<Permissions UniquePointer =
" http://www.toydatabank.org/toymfg/Permissions_IDs/37911 "/>

<!-- Manufacturer information sharing permissions -->
<OtherData>Document Type Definitions</OtherData>
</Pedigree>

Combine a specialized name server with a centralized dictionary of uniquely identified (and standardized) data elements, a centralized registry of A-XML informational objects, an author-controlled permissions database, a distributed A-XML editor/reader and you have the essential components of what I call a supply chain ‘data bank’.
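As a rough sketch of how those components might fit together, consider the following. All of these class names are hypothetical stand-ins of my own, and the distributed A-XML editor/reader is omitted since it lives on the client side.

```python
# Illustrative composition of the essential 'data bank' components named
# above: a centralized element dictionary, a registry of A-XML objects,
# an author-controlled permissions database, and a specialized name
# server that dereferences unique pointer URIs against the first two.
class ElementDictionary:
    """Centralized dictionary of uniquely identified, standardized elements."""
    def __init__(self):
        self.elements = {}   # pointer URI -> element value

class ObjectRegistry:
    """Centralized registry of A-XML informational objects."""
    def __init__(self):
        self.objects = {}    # pointer URI -> registered object

class PermissionsDB:
    """Author-controlled information sharing permissions."""
    def __init__(self):
        self.rules = {}      # pointer URI -> permitted roles

class NameServer:
    """Specialized name server: resolves a unique pointer URI."""
    def __init__(self, registry, dictionary):
        self.registry = registry
        self.dictionary = dictionary

    def dereference(self, pointer):
        return (self.registry.objects.get(pointer)
                or self.dictionary.elements.get(pointer))

class SupplyChainDataBank:
    """The combination the paragraph above describes."""
    def __init__(self):
        self.dictionary = ElementDictionary()
        self.registry = ObjectRegistry()
        self.permissions = PermissionsDB()
        self.name_server = NameServer(self.registry, self.dictionary)

bank = SupplyChainDataBank()
bank.dictionary.elements["http://www.toydatabank.org/toymfg/element_IDs/66221"] = "0% lead in paint"
print(bank.name_server.dereference("http://www.toydatabank.org/toymfg/element_IDs/66221"))
```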

What does a data bank do? It depends on the supply chain, the social network or, as Henry Story has very neatly coined, the ‘supply network’. The white paper, Banking on Granular Information Ownership, covers much of this territory in a less technological manner with examples applicable to personal health records, food safety, product tracking, people tracking, and transactional tracking.

However, I want to add that - conceptually - the connatural, non-collaborative disposition of technological data ownership is a perfect complement to the approach that Wikipedia has taken in fostering the collaborative authoring of encyclopedic entries. I say ‘conceptually’ because Wikipedia’s entries are collaborative though non-structured. But what if Wikipedia’s collaborative processes and methods for approving unstructured information were applied to structured information?

That is, what if the information account holders of a toy data bank were empowered to collaboratively add to their data bank’s dictionary of structured data elements so that all account holders may then draw upon them non-collaboratively for the A-XML objects each account holder authors and controls?

Consider that a supply chain member of the toy data bank wishes to add to our toy product pedigree example in Part II the final ‘Product Child Labor’ line shown below.

Product Pedigree Document
Manufacturer ID = Safe Toy Company
Product Serial Number = STOY991
Product Description = Painted Toy
Product Info To Supply Chain = 0% lead in paint
Product Info To Govt Regulator = Less than 600ppm of lead in paint by weight
Product Child Labor = No child labor used

The supply chain participant, using the toy data bank’s XML editor, authors a draft of the following XML data object  …

<?xml version="1.0" encoding="UTF-8" ?>
<ToyDictionary_DraftElements>
<ToyProductChildLabor>No child labor used</ToyProductChildLabor>
</ToyDictionary_DraftElements>

… that - if adopted by the toy data bank – will be deposited into a standardized toy data bank ‘dictionary’ of XML structured data elements. These would then be available for A-XML authoring by any toy supply chain participant who is a member of the toy data bank. Again, I have wrapped some of the following lines of code, etc., for easier reading.

<?xml version="1.0" encoding="UTF-8" ?>
<ToyDictionary_RegisteredElements>

<ToyProductChildLabor UniquePointer =
" http://www.toydatabank.org/toymfg/element_IDs/12637 "/>

</ToyDictionary_RegisteredElements>

Taking the ‘data bank’ analogy one step further, let’s say that the adoption of the ‘Product Child Labor’ data element by the toy data bank involves the alternative approval of a central ‘product data bank’ overseeing a larger standardized ‘dictionary’ applicable to products of all kinds (e.g., toys, pharmaceuticals, livestock, food, etc.).

<?xml version="1.0" encoding="UTF-8" ?>
<AnyProductDictionary_RegisteredElements>

<AnyProductChildLabor UniquePointer =
" http://www.anyproductdatabank.org/prodmfg/element_IDs/73621 "/>

</AnyProductDictionary_RegisteredElements>

In the world of supply chains, a likely candidate for such a central ‘any product data bank’ would be EPCglobal, the private, standards-setting consortium governed by very large organizations like Cisco Systems, Wal-Mart, Hewlett-Packard, DHL, Dow Chemical Company, Lockheed Martin, Novartis Pharma AG, Johnson & Johnson, Sony Corporation and Procter & Gamble. EPCglobal is architecting essential, core services for tracking physical products identified by unique electronic product codes (including RFID tags) across and within enterprise systems controlled by large organizations.

The crux of this multi-entry blog is that data ownership – that is, technological data ownership – paradoxically provides a non-technological ‘something more’ that will be a necessary ingredient to the emerging Semantic Web. It will do so by empowering supply chain participants with non-collaborative authoring of granular, structured informational objects that may remain within the visibility and control of the author even as they are shared within a complex supply chain.

And with that, I think I have pretty much all the pieces I need for a final Part IV.

[continued in Part IV]

Thursday
Apr102008

Portability, Traceability and Data Ownership - Part II

[return to Part I]

The Dilemma of Missing Information

Here is a four minute video interview of Chris Saad, Co-founder and CEO of Faraday Media. If you are pressed for time, just catch the first minute and a half. Chris is also Co-founder and Chairperson of Dataportability.org of which Faraday Media is a sponsor. In Part I of this multi-entry blog I began with the video clip called Data Portability – Video that is a promo for Dataportability.org.



Learning from the Future at the Next Web with Chris Saad from Maarten on Vimeo.

Right after the Facebook/Scoble incident, Dataportability.org gained momentum and membership from individuals associated with the likes of Google, Plaxo, Facebook, LinkedIn, Twitter, Flickr, SixApart and Microsoft. At Chris' suggestion I, too, have just recently joined their DataPortability Policy Action Group.

Henry Story, a staff engineer for Sun Microsystems, made the following interesting comments on the Sun Babelfish blog about Chris Saad’s Data Portability group and the Data Portability – Video.

“Will the Data Portability group [at Dataportability.org] get the best solution together? …. [O]ne wonders whether XML is not the solution to their problem. Won't XML make data portability possible, if everyone agrees on what they want to port? Of course getting that agreement on all the topics in the world is a never ending process....

But the question is also whether portability is the right issue. Well in some ways it is. Currently each web site has information locked up in html formats … [which makes] it difficult to export the data, which each service wants to hold onto as if it was theirs to own.

Another way of looking at this is that the Data Portability group cannot so much be about technology as policy. The general questions it has to address are question of who should see what data, who should be able to copy that data, and what they should be able to do with it. As a result the policy issue of Data Portability does require one to solve the technical problem of distributed identity: how can people maintain the minimum number of identities on the web? (ie not one per site) Another issue that follows right upon the first is that if one wants information to only be visible to a select group of people - the "who sees what" part of the question - then one also needs a distributed way to be able to specify group membership, be it friendship based or other. The [Data Portability – Video] … makes that point very clearly why having to recreate one's social network on every site is impractical.

Story’s comments are a good setup for what I want to address: how to make a connection between data portability and what I call the ‘frayed ends and laterals’ of complex product supply chains.

Along the way I want to pay attention to those readers (i.e., the vast majority of the regular, non-techie folks in the world) who are hanging back wondering what an XML object is. Let’s weave in a little history with a simple example, shall we?

The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. W3C is headed by Sir Tim Berners-Lee, creator of the first web browser and the primary author of the original Uniform Resource Locator (URL), HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML) specifications. These are the principal technologies that form the basis of the World Wide Web.

For example, consider this product pedigree written in natural language.

Product Pedigree Document
Manufacturer ID = Safe Toy Company
Product Serial Number = STOY991
Product Description = Painted Toy
Product Info To Supply Chain = 0% lead in paint
Product Info To Govt Regulator = Less than 600ppm of lead in paint by weight

A beneficial characteristic of the World Wide Web is that you can read language like the foregoing example in a natural way but ‘behind the scenes’ (i.e., behind the web browser interface) this natural language representation can be constructed in different ways for different purposes.

The same natural language representation written as an HTML information object using an HTML authoring software application (also called an HTML editor) would read behind the scenes as follows.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body><p>
Product Pedigree Document<br>
Manufacturer ID = Safe Toy Company<br>
Product Serial Number = STOY991<br>
Product Description = Painted Toy<br>
Product Info To Supply Chain = 0% lead in paint<br>
Product Info To Govt Regulator = Less than 600ppm of lead in paint by weight
</p></body></html>

Because HTML objects are designed primarily for creating static web pages, and not for dynamic information sharing, the W3C has further developed standards for structured electronic sharing in the form of Extensible Markup Language (XML) objects to facilitate the emerging Semantic Web.

With gracious assistance from my good friend and collaborator, Dr. Marvin Stone, here’s an example of a granular XML information object created in an XML editor that would be naturally represented through a web browser as above.

<?xml version="1.0" encoding="UTF-8" ?>
<Pedigree>
<ManufacturerID>Safe Toy Company</ManufacturerID>
<ProductSerialNumber>STOY991</ProductSerialNumber>
<ProductDescription>Painted Toy</ProductDescription>
<ProductInfoToSupplyChain>0% lead in paint</ProductInfoToSupplyChain>
<ProductInfoToGovtRegulator>Less than 600ppm of lead in paint by weight</ProductInfoToGovtRegulator>
<OtherData>Document Type Definitions</OtherData>
</Pedigree>
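The 'behind the scenes' construction can itself be sketched in a few lines of Python. The tag names below match the example above, while the to_xml helper is my own illustration, not part of any toolchain discussed in this post.

```python
# Illustrative sketch: building the granular XML pedigree object above
# from its natural-language field/value pairs.
import xml.etree.ElementTree as ET

PEDIGREE = {
    "ManufacturerID": "Safe Toy Company",
    "ProductSerialNumber": "STOY991",
    "ProductDescription": "Painted Toy",
    "ProductInfoToSupplyChain": "0% lead in paint",
    "ProductInfoToGovtRegulator": "Less than 600ppm of lead in paint by weight",
}

def to_xml(fields):
    """Wrap each field in its own granular element under <Pedigree>."""
    root = ET.Element("Pedigree")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

print(to_xml(PEDIGREE))
```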

This type of granular XML object works fine for short, vertically integrated supply chains covered by one or two enterprise systems where a small number of supply chain participants agree on what they want to port. But due to prevalent fear factors (and other policies) that prevent or otherwise affect information sharing along lengthy, complex information supply chains, there is a critical need for a more refined XML tool.

Here’s an example of a hypothetical, author-controlled XML object that would be created/authored/constructed using an extension to the foregoing XML editor that we could call an A-XML editor extension (i.e., author-controlled XML editor extension).

<?xml version="1.0" encoding="UTF-8" ?>
<Pedigree>
<PedigreeID UniquePointer =" 99087 "/>
<ManufacturerID UniquePointer =" 00372 "/>
<ProductSerialNumber UniquePointer =" 43229 "/>
<ProductDescription UniquePointer =" 23444 "/>
<ProductInfoToSupplyChain UniquePointer =" 66221 "/>
<ProductInfoToGovtRegulator UniquePointer =" 66333 "/>
<Permissions UniquePointer =" 37911 "/>
     <!-- Manufacturer information sharing permissions -->
<OtherData>Document Type Definitions</OtherData>
</Pedigree>

In the process of being authored by the toy manufacturer, this A-XML object would be constructed to point to a central repository of uniquely identified data containing the toy manufacturer's unique ID, the unique identifiers of the painted toy’s pedigree, and a unique identifier of the toy manufacturer's information sharing permissions.

Once distributed by the manufacturer/author to a lengthy supply chain, this A-XML object would provide greater control, visibility and traceability one-share, two-shares, three-shares, etc. away from the author. As other supply chain participants access the A-XML object (using a compatible XML editor) to confirm the toy’s pedigree, the toy manufacturer would be provided with supply chain visibility never before experienced.

For instance, the data element "0% lead in paint" uniquely identified as 66221 would be accessible by any supply chain participant registered with the central repository and using a compatible XML editor. The data element "Less than 600ppm of lead in paint by weight" uniquely identified as 66333 would only be accessible by permitted government regulators also registered with the central repository. (For those of you concerned with the ethics of representing one thing to consumers while reporting something else to the government, check out Are Food Labels Reliable?)
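That granular permission check can be sketched as follows, assuming a simple in-memory central repository. The roles and the repository shape are my own illustrative assumptions, not a description of any actual system.

```python
# Illustrative sketch: element 66221 resolves for any registered supply
# chain participant, while element 66333 resolves only for permitted
# government regulators, per the author's sharing permissions.
REPOSITORY = {
    "66221": ("0% lead in paint", {"supply_chain", "regulator"}),
    "66333": ("Less than 600ppm of lead in paint by weight", {"regulator"}),
}

def resolve(element_id, requester_role):
    """Dereference a unique identifier, honoring the author's permissions."""
    value, allowed_roles = REPOSITORY[element_id]
    if requester_role not in allowed_roles:
        return None     # the element stays invisible to this requester
    return value

print(resolve("66221", "supply_chain"))
print(resolve("66333", "supply_chain"))
```

The key point is that permission enforcement happens at the central repository where the element lives, not in the copies of the A-XML object circulating through the supply chain, which carry only pointers.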

In my first journal entry to this blog I offered this:

“Unscrupulous supply chain participants will always try to hide in the ‘fog’ of their supply chains. The manufacturers of safe products want to differentiate themselves from the manufacturers of unsafe products. But, again, fear factors keep the good manufacturers from posting information online that may put them at a competitive disadvantage to downstream competitors.”

There’s a chicken and egg effect here, isn’t there? That is, which comes first, policy or technology?

Here’s one answer.

Don’t throw the baby out with the bath water. Don’t get rid of the supply chain enterprise and legacy systems that are already providing useful information sharing without the data ownership characteristics of a tool like A-XML. But in the context of an emerging Semantic Web that will lean heavily upon software-as-a-service, consider the missing and incomplete information that is not being shared from the frayed ends and laterals of complex product supply chains.

And, ask yourself, could there be both a technological and socio-political connection made between data portability and supply chain traceability?

[continued in Part III]

Monday
Mar312008

EPCglobal & Prescription Drug Tracking

Andrew Pollack authored an article in the New York Times on March 26, 2008 entitled California Delays Plan to Track Prescription Drugs.

"In a reprieve for the pharmaceutical industry, California regulators agreed on Tuesday to delay by two years a requirement that all prescription drugs be electronically tracked as a means of thwarting counterfeiting.....

The California plan would require that drugs be tracked electronically from the manufacturer through the wholesaler to the pharmacy. Each bottle of pills sold to a pharmacy would have to have a unique serial number, encoded in a bar code or a radio-frequency identification tag.....

Pharmaceutical manufacturers [said that] putting a unique serial number on each container would require changing their packaging lines, which would cost millions of dollars and take years. […] Pharmacies and wholesalers, meanwhile, said they could not install the software and the equipment needed to read the serial numbers until they knew what systems the drug manufacturers would use."

Though not directly identified in Pollack's article, EPCglobal is a leader in establishing standards in the area of drug tracking. EPCglobal is a private, standards-setting consortium governed by very large organizations like Cisco Systems, Wal-Mart, Hewlett-Packard, DHL, Dow Chemical Company, Lockheed Martin, Novartis Pharma AG, Johnson & Johnson, Sony Corporation and Procter & Gamble. EPCglobal is architecting essential, core services for tracking physical products identified by unique electronic product codes (including RFID tags) across and within enterprise systems controlled by large organizations.

I submitted a comment to EPCglobal on January 22, 2008 about EPCglobal's Architecture Framework. You will see that the comment is addressed to Mark Frey who courteously and immediately responded that he had forwarded it to EPCglobal's Architectural Review Committee.

This is a 10-page comment (including exhibits) about broader data ownership issues than just those relating to electronic pedigree documentation for use by the pharmaceutical supply chain. But see the first full paragraph on page 5 where I said:

“[W]hile EPCglobal has begun establishing forward-looking standards relative to electronic pedigree documentation for use by pharmaceutical supply chain participants [see EPCglobal Pedigree Ratified Standard Version 1.0 as of January 5, 2007], it has yet to include these standards within the EPCglobal Architecture Diagram.

With this comment I am proposing, by way of an illustrative example, that the methods developed by Pardalis within its IP may be used to derive essential specifications for connecting the current EPCglobal (EPCIS) Architecture with its ePedigree standards for the pharmaceutical industry."

The illustrative example referred to above is a mock Common Point Authoring (CPA) informational object. This illustrative example has a reference point beginning with a granular EPCglobal ePedigree document. The represented CPA informational object is the EPCglobal ePedigree document that has been further granularized with mock XML tagging containing unique identifiers pointing to a CPA registered data element database.

My point is that EPCglobal has yet to develop standards for ePedigree document exchange that may be efficiently, flexibly and cost-effectively applied to the pharmaceutical supply chains for helping to reduce counterfeiting. Given the players who comprise EPCglobal, it is reasonable to presume that California regulators have essentially backed off enforcing their anti-counterfeiting regulations because EPCglobal has yet to catch up to the California plan. The plan was to take effect January 1, 2009. Now it has been pushed back to 2011.