About this Blog

As enterprise supply chains and consumer demand chains have become globalized, they continue to inefficiently share information “one-up/one-down”. Profound "bullwhip effects" in these chains leave managers scrambling with inventory shortages and consumers struggling to understand product recalls, especially food safety recalls. Add to this the increasing use of personal mobile devices by managers and consumers seeking real-time information about products, materials and ingredient sources. The popularity of mobile devices with consumers is inexorably tugging enterprise IT departments toward a shift to apps and services. But both consumer and enterprise data are proprietary assets that must be selectively shared in order to be efficiently shared.

About Steve Holcombe

Unless otherwise noted, all content on this company blog site is authored by Steve Holcombe as President & CEO of Pardalis, Inc. More profile information is available on Steve Holcombe's LinkedIn profile.


Entries in Semantic Trust (22)

Thursday, September 4, 2008

Freebase Parallax and the Ownership Web

What's Right About the Semantic Web

What’s right about the Semantic Web is that its most highly funded visionaries have envisioned beyond a Web of documents to a ‘Data Web’. Here's an example: a Web of scalably integrated data employing object registries, envisioned by Metaweb Technologies’ Danny Hillis and manifested in Freebase Parallax™, a platform and application that competes with both Google and Wikipedia.

[Image: Aristotle]

Metaweb Technologies is a San Francisco start-up developing and patenting technology for a semantic ‘Knowledge Web’ marketed as Freebase Parallax. Philosophically, Freebase Parallax is a substitute for a great tutor, as Aristotle was for Alexander. Using Freebase Parallax, users do not modify existing web documents but instead annotate them. Amazon.com's annotations are the closest analogue, but Freebase Parallax goes further by linking the annotations so that the documents become more understandable and more findable. Annotations are also modifiable by their authors as better information becomes available to them. Metaweb characterizes its service as an open, collaboratively-edited database of cross-linked data (like Wikipedia, the free encyclopedia) but, as you will see in the video below, it is really very much a next-generation competitor to both Google and Wikipedia.

The Intellectual Property Behind Freebase Parallax

Click on the thumbnail image to the left and you will see in more detail what Hillis envisions: a database represented as a labeled graph, where data objects are connected by labeled links to each other and to concept nodes. For example, a concept node for a particular category contains two subcategories that are linked, via the labeled links "belongs-to" and "related-to," with text and a picture. An entity comprises another concept that is linked, via the labeled links "refers-to," "picture-of," "associated-with," and "describes," with a Web page, a picture, an audio clip, and data. For further information about this intellectual property - entitled Knowledge Web - see the blogged entry US Patent App 20050086188: Knowledge Web (Hillis, Daniel W. et al).
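
As a rough sketch of that labeled-graph structure, here is a minimal Python example using networkx; the library choice, node names and edge labels are mine for illustration, not anything specified in the patent application.

```python
# A minimal sketch of the labeled graph described above, using networkx
# (my choice of library, not something the patent specifies). Node names
# and edge labels are illustrative only.
import networkx as nx

kw = nx.MultiDiGraph()

# A concept node for a category, with two subcategories attached via labeled links.
kw.add_edge("Subcategory A", "Category", label="belongs-to")
kw.add_edge("Subcategory B", "Category", label="related-to")

# An entity concept linked to media and data nodes via labeled links.
kw.add_edge("Entity", "Web page", label="refers-to")
kw.add_edge("Entity", "Picture", label="picture-of")
kw.add_edge("Entity", "Audio clip", label="associated-with")
kw.add_edge("Entity", "Data", label="describes")

# Walk the graph: list every labeled link leaving the entity node.
for _, target, attrs in kw.out_edges("Entity", data=True):
    print(f"Entity --{attrs['label']}--> {target}")
```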

Freebase Parallax Incarnate

In the following video let's look at how this intellectual property for Knowledge Web is actually being engineered and applied by Metaweb Technologies in the form of Freebase Parallax.


Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.

The Semantic Web's Achilles Heel

You can hear it in the video. What Hillis and Metaweb Technologies well recognize is that as Freebase Parallax strives to become the premier knowledge source for the Web, it will need access to new, blue oceans of data. It must find a gateway into the closely-held, confidential and classified information that people consider to be their identity, that participants in complex supply chains consider to be confidential, and that governments classify as secret. That means that data ownership must be factored into the equation for the success of Freebase Parallax and the emerging Semantic Web in general.

Not that Hillis hasn't thought about data ownership. He has. You can see it in an interview conducted by his patent attorney and filed on December 21, 2001 in the provisional USPTO Patent Application 60/343,273:

Danny Hillis: "Here's another idea that's super simple. I've never seen it done. Maybe it's too simple. Let's go back to the terrorist version [of Knowledge Web]. There's a particular problem in the terrorist version that the information is, of course, highly classified .... Different people have some different needs to know about it and so on. What would be nice is if you ... asked for a piece of information. That you [want access to an] annotation that you know exists .... Let's say I've got a summary [of the annotation] that said,  'Osama bin Laden is traveling to Italy.' I'd like to know how do you know that. That's classified. Maybe I really have legitimate reasons for that. So what I'd like to do, is if I follow a link that I know exists to a classified thing, I'd like the same system that does that to automatically help me with the process of getting the clearance to access that material." [emphasis added]

What Hillis was tapping into just a few months after 9/11 is just as relevant to today's information sharing needs.

But bouncing around ideas about how we need data ownership is not the same as developing methods or designs to provide it. What Hillis filed non-provisionally, subsequent to his provisional application, was the Knowledge Web application. Because of its emphasis upon the statistical reliability of annotations, Knowledge Web's IP is tailor-made for the Semantic Web. But it is not designed for data ownership.

The Ownership Web

For the Semantic Web to reach its full potential, it must have access to more than just publicly available data sources. Only with the empowerment of technological data ownership in the hands of people, businesses, and governments will the Semantic Web make contact with a horizon of new, ‘blue ocean’ data.

Conceptually, the Ownership Web would be separate from the Semantic Web, though semantically connected as a layer of distributed, enterprise-class web platforms residing in the Cloud.

[Diagram: Ownership Web]

The Ownership Web would contain diverse registries of uniquely identified data elements for the direct authoring, and further registration, of uniquely identified data objects. Using these platforms people, businesses and governments would directly host the authoring, publication, sharing, control and tracking of the movement of their data objects.

The technological construct best suited for the dynamic of networked efficiency, scalability, granularity and trustworthy ownership is the data object in the form of an immutable, granularly identified, ‘informational’ object.

A marketing construct well suited to relying upon the trustworthiness of immutable, informational objects would be the 'data bank'.

Data Banking

Traditional monetary banks meet the expectations of real people and real businesses in the real world.

People are comfortable and familiar with monetary banks. That’s a good thing because without people willingly depositing their money into banks, there would be no banking system as we know it. By comparison, we live in a world that is awash in on-demand information courtesy of the Internet, yet the Internet is strangely impotent when it comes to information ownership.

In many respects the Internet is like the Wild West because there is no information web similar to our monetary banking system. No similar integrated system exists for precisely and efficiently delivering our medical records to a new physician, or for providing access to a health history of the specific animal slaughtered for that purchased steak. Nothing out there compares with how the banking system facilitates gasoline purchases.

If an analogy to the Wild West is apropos, then it is interesting to reflect upon the history of a bank like Wells Fargo, formed in 1852 in response to the California gold rush. Wells Fargo wasn’t just a monetary bank, it was also an express delivery company of its time for transporting gold, mail and valuables across the Wild West. While we are now accustomed to next morning, overnight delivery between the coasts, Wells Fargo captured the imagination of the nation by connecting San Francisco and the East coast with its Pony Express. As further described in Banking on Granular Information Ownership, today’s Web needs data banks that do for the on-going gold rush on information what Wells Fargo did for the Forty-niners.

Banks meet the expectations of their customers by providing them with security, yes, but also credibility, compensation, control, convenience, integration and verification. It is the dynamic, transactional combination of these that instills in customers the confidence that they continue to own their money even while it is in the hands of a third-party bank.

A data bank must do no less.

Ownership Web: What's Philosophically Needed

Where exactly is the sweet spot of data ownership?

In truth, it will probably vary depending upon what kind of data bank we are talking about. Data ownership will be one thing for personal health records, another for product supply chains, and yet another for government classified information. And that's just for starters because there will no doubt be niches within niches, each with their own interpretation of data ownership. But the philosophical essence of the Ownership Web that will cut across all of these data banks will be this:

  • That information must be treated as a tangible, commercial product, as banked, traceable money, or as both.

The trustworthiness of information is crucial. Users will not be drawn to data banks if the information they author, store, publish and access can be modified. That means that even the authors themselves must be proscribed from modifying their information once registered with the data bank. Their information must take on the immutable characteristic of tangible, traceable property. While the Semantic Web is about the statistical reliability of data, the Ownership Web is about the reliability of data, period.

Ownership Web: What's Technologically Needed

What is technologically required is a flexible, integrated architectural framework for information object authoring and distribution - one that easily adjusts to the definition of data ownership as it is variously defined by the data banks serving each social network, information supply chain, and product supply chain. Users will interface with one or more ‘data banks’ employing this architectural framework. But the lowest common denominator will be the trusted, immutable informational objects that are authored and, where the definition of data ownership permits, controllable and traceable by each data owner one, two, three or more steps after the initial share.

Click on the thumbnail to the left for the key architectural features for such a data bank. They include a common registry of standardized data elements, a registry of immutable informational objects, a tracking/billing database and, of course, a membership database. This is the architecture for what may be called a Common Point Authoring™ system. Again, where the definition of data ownership permits, users will host their own 'accounts' within a data bank, and serve as their own 'network administrators'. What is made possible by this architectural design is a distributed Cloud of systems (i.e., data banks). The overall implementation would be based upon a massive number of user interfaces (via APIs, web browsers, etc.) interacting via the Internet with a large number of data banks overseeing their respective enterprise-class, object-oriented database systems.
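
To make those components a little more concrete, here is a back-of-the-envelope sketch in Python. The class and field names are hypothetical, and a real deployment would of course sit on enterprise-class, object-oriented databases as described above, not on in-memory dictionaries.

```python
# A hypothetical sketch of the four data bank components named above,
# held as in-memory structures purely for illustration.
import uuid
from dataclasses import dataclass, field


@dataclass
class DataBank:
    data_elements: dict = field(default_factory=dict)          # common registry of standardized data elements
    informational_objects: dict = field(default_factory=dict)  # registry of immutable informational objects
    tracking_log: list = field(default_factory=list)           # tracking/billing database
    members: dict = field(default_factory=dict)                # membership database

    def register_member(self, name: str) -> str:
        member_id = str(uuid.uuid4())
        self.members[member_id] = {"name": name}
        return member_id

    def register_element(self, value: str) -> str:
        element_id = str(uuid.uuid4())   # each standardized data element gets its own unique identifier
        self.data_elements[element_id] = value
        return element_id

    def record_share(self, object_id: str, from_member: str, to_member: str) -> None:
        # The tracking/billing database logs each movement of a registered object.
        self.tracking_log.append((object_id, from_member, to_member))
```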

Click on the thumbnail to the right for an example of an informational object and its contents as authored, registered, distributed and maintained with data bank services. Each informational object comprises a unique identifier that designates the object, as well as one or more data elements (including personal identification), each of which is itself identified by a corresponding unique identifier. The informational object will also contain other data, such as ontological formatting data, permissions data, and metadata. The actual data elements associated with a registered (and therefore immutable) informational object would typically be stored in the Registered Data Element Database (look back at 124 in the preceding thumbnail). That is, the actual data elements are linked via the use of pointers, which comprise the data element unique identifiers or URIs. Granular portability is built in. For more information see the blogged entry US Patent 6,671,696: Informational object authoring and distribution system (Pardalis Inc.).
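
Continuing the same hypothetical Python sketch, here is the shape of an informational object itself: an immutable record whose data elements are carried by reference (their unique identifiers) rather than by value. The field names are mine; the patent's actual structure may differ.

```python
# A hypothetical, simplified informational object: immutable once registered,
# with its data elements referenced by unique identifiers (pointers into the
# Registered Data Element Database) rather than embedded.
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: not even the author can modify it after registration
class InformationalObject:
    object_id: str        # unique identifier designating the object itself
    element_ids: tuple    # unique identifiers (or URIs) of the registered data elements
    permissions: tuple    # who may receive the object, and how far it may travel
    metadata: tuple       # ontological formatting data, timestamps, and so on


obj = InformationalObject(
    object_id=str(uuid.uuid4()),
    element_ids=(str(uuid.uuid4()), str(uuid.uuid4())),
    permissions=("one-step-share",),
    metadata=(("registered", "2008-09-04"),),
)
```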

The Beginning of the Ownership Web

Common Point Authoring is going live this fall in the form of a data bank for cattle producers in the upper plains. Why the livestock industry? Because well-followed commentators like Dr. Kris Ringwall, Director of the Dickinson Research Extension Center for North Dakota State University, recognize that there are now two distinct products being produced along our nation's complex agricultural supply chains: (1) a traditional product, and (2) an informational product describing the pedigree of the traditional product.

The following excerpt is from a BeefTalk article, Do We Exist Only If Someone Else Knows We Exist?, recently authored by Dr. Ringwall.

"The concept of data collection is knocking on the door of the beef industry, but the concept is not registering. In fact, there actually is a fairly large disconnect.

This is ironic because most, if not all, beef producers pride themselves on their understanding of the skills needed to master the production of beef. Today, there is another player simply called “data.”

The information associated with individual cattle is critical. Producers need to understand how livestock production is viewed ....

That distinction is not being made and the ramifications are lost revenue in the actual value of the calf and lost future opportunity. This is critical for the future of the beef business ...."

Ownership Web: Where It Will Begin

The Ownership Web will begin along complex product and service supply chains where information must be trustworthy, period. Statistical reliability is not enough. And, as I mentioned above, the Ownership Web will begin this fall along an agricultural supply chain which is among the most challenging of supply chains when it comes to information ownership. Stay tuned as the planks of the Ownership Web are nailed into place, one by one.

Sunday, August 17, 2008

Scientific American: Can You Spare Some DNA?

In an aside to Traces of a Distant Past, published in the July 2008 issue of Scientific American, author and Senior Editor Gary Stix wrote:

"No matter what assurances are given, some groups will be reluctant to yield a cheek swab or blood sample. Investigators in this field may never achieve their goal of obtaining a set of samples that fully reflects every subtle gradation of human genetic diversity."

See specifically the side comment entitled Can You Spare Some DNA? at the top of page 8, below.

Read this document on Scribd: Traces of the Distant Past - SCIAM - June 08


I've e-mailed the following to Gary Stix.


From: "Steve Holcombe"
Sent: Sunday, August 17, 2008 12:02 PM
To: editors@sciam.com
Subject: For Gary Stix re "Can You Spare Some DNA"


Gary,

You emphasize a very important point in this sub-article.

Would you have interest in an article exploring the movement from a documents web (the current web) to that of a data web (Web 3.0, semantic web)?

With the movement toward a data web there will be greater opportunities for 'data ownership' as defined by the actual information producers. The emergence of a data web should provide opportunities for ameliorating resistance to the sharing of genetic bio-material by empowering those who provide their genetic heritage with more direct, technological oversight and control over how the derived information is used, who uses it, when they use it, etc.

I'm not saying that all American indigenous tribes would jump on the bandwagon in providing their genetic material and information. That is, most people put their money in banks but there will always be a few who only put their money under their mattress, right? But there are technological means arising within the context of a data web that are specifically designed to address the personal and societal fear factors that you point out so well.

Hope to hear back from you.

Best regards,

Steve

It will be interesting to see if I receive a response from Gary Stix.

Thursday, July 31, 2008

DataPortability In-Motion Podcast: Episode 12 (Drummond Reed)

This is a transcription, for filing in this blog's Reference Library, of the substantive dialogue found in Episode 12 of the DataPortability: In-Motion Podcast. The series is co-hosted by Trent Adams (TA), founder of Matchmine, and Steve Greenberg (SG), CEO of Ciabe.

The guest in Episode 12 is Drummond Reed (DR). Drummond is a founding Board Member at the Information Card Foundation, Vice-President of Infrastructure at Parity Communications, a member of the Standards Committee at Project VRM, a founding Board Member at OpenID Foundation, Co-Chair of the XRI Technical Committee at OASIS, the XDI Technical Committee at OASIS, the Secretary at XDI.org, and a pioneering inventor from as early as 1996 of certain ‘data web’ patents when, no doubt, it was difficult for Drummond to find anybody who actually knew what he was talking about. Well, now he has an audience and deservedly so.

The transcription picks up at the 6:50 minute mark (after announcements and banter between Trent and Steve) and goes straight through to the 50:01 minute mark when Drummond leaves the stage.

6:50 TA “… another great step that was announced this week is the launch of the Information Card Foundation.”

6:55 DR “You bet … I’m here actually at the Burton Group Catalyst Conference [held in San Diego, June 23 - 27, 2008] where [the announcement was made and we] got a really good reception. This is [a] very enterprise facing audience but I think actually that’s part of the message of what information cards are going to do [which will be to] bridge enterprise, identity and access management requirements with ... web scale requirements. So, it’s something that is, I think, important from the standpoint of data portability but also hooking it into …. enterprise systems and a lot of places where data resides and people are going to want to share it but very carefully.”

7:38 SG “So can you give us an overview of what info cards actually are?”

7:45 DR “Information cards are primarily first and foremost a visual metaphor – a consistent visual metaphor – for identity interactions. And this is intended to be across any device, and potentially – the way the Higgins Project is approaching this – any protocol. So they can be protocol independent meaning that I … can use and information card on my PC to log into a website … that will behind the scenes use the WS star stack protocol that Microsoft, and IBM and others have been driving. But I could ... use an iCard on an iPhone and be sharing a card directly with a friend and in the background that form of a card might use XDI as a protocol …. There’s even work [going on] to use iCards with OpenID. So it is first and foremost a visual metaphor and it’s based on a very simple concept which is the same thing we use for identity and sort of a basic form of data sharing - we might call claim sharing - in the real world which is you’ve got a wallet with a set of cards in it, and when you need to go someplace and prove your identity or give them some basic information you take out your wallet, pick the card that will work for that transaction, and ... give a copy to the other party, and they accept it or not, and that’s all there is to it.”

9:29 SG “So the big idea here is that it’s a standardized set of tools and APIs that developers can build to that supports various forms of identity, and that rather than having to rebuild ... a custom connector for every site you just write to this one API and then whatever the user has you can figure out a way to work with [it].”

9:54 DR “Yes, that’s very true and as you know, Steve … I’m working a lot now with the Higgins Project and what you just described is almost a perfect definition of what Higgins is about which is ... an identity framework but it’s primarily an abstraction layer to hook into almost any kind of system with what they call context providers - you know, a form of adapter into that system - to expose the data in the way it can be shared and the most common … interface that you’ll use to access a Higgins repository - that has a Higgins interface - to either pull out or put in some data is going to be information cards …. The whole purpose of the Higgins API is to give developers ... one API they can use to access identity data and potentially ... take it from one place and move it to the other – which is where it intersects with data portability. But to give one API to do that any place you want … that’s why it’s a big project.”

11:25 SG “So this is sort of an essential component of the OASIS data web, right?”

11:32 DR “It’s definitely a key implementation of [the OASIS data web]. It’s probably the text book implementation, right.”

11:39 SG “So can you talk to us for a bit about what [the] ‘data web’ actually is?”

11:43 DR “Sure, sure. Exactly …. The concept is pretty simple. We all know what the web has done for content, right? It’s turned our world upside down in the last fifteen, twenty years. And there’s an increasing need … to say that the web works great for content, for human readable … information, but can we apply the same concepts of interlinking and … this tremendous, unprecedented interoperability across domains? Can we apply that to ‘data’? You know, can we get it down to the point where machines can navigate and exchange information under the proper controls the way humans can today by …. If I want to browse over a whole set of documents … I am as a human selecting for this document, reading links, deciding what is relevant, and then choosing the next link I want and then activating that link and going to the next site and so on. That’s tremendously powerful but can … we get to the point where the data is understandable by a machine so a machine can follow the data – follow the links – in order to carry out something that a human wants to do without a human having to guide every decision. And to do that you have several requirements. To take the web and apply it to data … it’s a hard problem. It’s a very attractive solution. I mean that’s what got a bunch of us involved with this but it turns out there’s some real challenges and piece by piece that’s what we’ve been working on in two [indecipherable], in OASIS, and also other pieces of Higgins, and folks all over. There are a bunch of related efforts that now fall under the rubric of the data web.”

13:54 SG “And one of those, which you are also deeply involved in, is the XDI project. And that’s an enabling technology for the data web …. Why don’t you tell us about how XDI fits into all of this and how link contracts [work] …? How does one bit of data actually connect to another bit of data?”

14:21 DR ”Yeah, exactly. I love the way you put that because in the context of data portability – I mean, ever since I first heard that term, data portability ... wearing my XDI TC [XDI Technical Committee at OASIS] hat I said ‘oh, my god’ that’s like the mission statement for the XDI technical committee in two words. What did we never just say, ‘hey, look’ its just data portability? It’s going to enable data portability. Well I can think of a couple thinks. Again, that’s why we call it data web because there are other things that the data web will enable but if there is one thing that is a headline feature it’s data portability. And so it’s very simple at a high level to explain what is XDI. XDI is a protocol for sharing data just like HTTP has become a protocol for sharing contents … And I have to clarify .… The web really is comprised of three pieces: [HTTP, HTML and URIs]. It’s those three things that created the web, and those of use working on the data web have long said that we needed … to take each of those three things and basically add more structure. The data web is increasingly being called the ‘structured web’ because you need to … add more structure to the documents so instead of HTML documents you need a more structured document format … It’s really a database format is what you’re putting on the wire with XDI as a protocol. And then you need more structure to the links. The links have to be able to granularly address the data … anywhere on the data web. So, it’s not enough to just say ‘here’s a document’ and, then, oh, there’s a hash line if you actually want to point to some anchor, some bookmark, inside the document. That’s what web addresses can do today. They can get you to a document and then if you want they can get you to one point inside that document but if you want to go any further … if you wanted to pass around an hCard or a vCard … in the classic data portability sets, well how do you address the … ‘zip code’ inside that? And if you have a phone number [that is structured] how do you address the area code? Database technologies have a way of doing that. Even XML and XPath provide you with a way to do that. With the data web you have to essentially be able to address data at any level from the smallest atomic field … to the largest collection … which have been called ‘data web sites’ …. So if you theoretically took … an existing data base and put it on the data web it would be very much like hooking up existing content repositories. An example. Take newspaper classifieds. Put them on the web and suddenly you can display them all as HTML pages, right?”
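
[Aside: As a small illustration of the granular addressing Reed refers to - reaching a single field such as a zip code or an area code inside a contact record - here is what XPath-style addressing over an XML record looks like in Python. The record layout is invented purely for the example.]

```python
# Granular addressing of fields inside a contact record, using the XPath
# support already available for XML. The record layout below is invented
# purely for illustration.
import xml.etree.ElementTree as ET

contact = ET.fromstring("""
<contact>
  <name>Jane Doe</name>
  <address><zip>73101</zip></address>
  <phone><area>405</area><number>555-0100</number></phone>
</contact>
""")

zip_code = contact.find("./address/zip").text   # address the zip code by itself
area_code = contact.find("./phone/area").text   # address the area code by itself
print(zip_code, area_code)
```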

17:54 SG “So the way I sort of try to explain this to people is if you think about what SQL is …. A SQL query is the programmer … describing to the system what the data he wants back looks like, right? So based on attributes of the data, based on … samples or fields, you say to the system [to] go out, look at all your data and give me the stuff that looks like this. And … specifying … the what rather than the how…. And this is sort of the way of applying that same metaphor to all the data that lives on all the various sites that may not even know they’re connected to each other.”

18:41 DR “Yes, exactly. It’s literally … some of the folks on the XDI TC do look at [it] as SQL taken to the web wide level, so whether it’s SQL or whether it’s … LDAP, when you’re talking to a directory system then you’re usually using … a directory access protocol like LDAP.”

19:05 SG “But LDAP is hierarchical, right? So that to me is always the difference is that … you have to know the structure of the initial data store to work with LDAP whereas if you look at something like SPARQL or SQL you just say, if one of these things is connected to one of those things and its got these attributes and there’s five of them, tell me about it.”

19:34 DR “Yeah, exactly.”

19:36 SG “Describe the sort of thing that you want to get back ….”

19:40 DR “Yep, and one of the deliverables – the XDI TC just updated its charter a few months ago – and we …. have now separated out one of the specs that we’ll be coming out with is a data web query language and it’s … to be able to query any repository on the data web in the same way. And … whether it’s an LDAP directory that was exposed or a SQL database that was exposed or an XML document or an object or any database in place [they all] could be mapped into the data web, and all of them could respond to a query … in XDI format.”

20:27 SG “So that’s interesting. So would you extend XSL to do that? Or how would you actually do that?”

20:34 DR “You mean the actual format? The XDI documents? And queries and all that?”

20:38 SG “Well, I’m just wondering …. If you don’t know in advance what the document is … would you specify the query in terms of an XPath plus an XSL transform or how would you tell it …?”

20:57 DR “That gets right to the heart of … XDI as an architecture for the data web. And quite honestly it took us … TC [i.e., XRI Technical Committee at OASIS] is almost five years old …and our first specs won’t be coming out until this fall, and I know a lot of people who have said … so what have you even been working on? What’s taken so long? Well the [XRI Technical Committee at OASIS] knew that what it needed to do to create the data web …. Just as without HTML you never would have had the web. What HTML did for the web is it said, here’s the universal characteristics of …. content documents. And you look at HTML and … a lot of people, when HTML was invented it was a very, very simple variant of SGML … that a lot of people had never even heard of …. And it was like ten years before the web there was this huge language [i.e., SGML] that was being used by big companies [like Boeing] to exchange extraordinarily complex data …. HTML was this [little, tiny, very simple language that was, like,] one percent of the capacity of SGML that Tim Berners-Lee and his associates came up with for the web but what they did is that they just extracted […] OK, what are the universal characteristics of documents, you know? They have headings, they have paragraphs, they have bulleted lists and numbered lists and … it ended up being about thirty, forty elements and surprise, surprise with the addition of a few things as the web started to grow, it’s worked through all contents, right? I mean … so the power they tapped into was by saying, OK what are the … universal characteristics? Even though it’s not incredibly rich, the power comes because everybody can use it. Every browser can talk to every web server, right?”

23:10 SG “Well, the key insight was to apply standardization to the data so that you could write any program you wanted … that could talk to the data which is the exact opposite of the way things had been done before.”

23:23 DR “Exactly. There you go.”

23:24: SG “Which is … one of those forehead slappingly (sic) obvious things that only the most brilliant person in the universe can point out. It seems so obvious after the fact but for those of us working … in the industry in the late 80s and early 90s … when that first came out we all went, oh, my god how could no one have seen that … give that guy a Nobel prize.”

23:54 DR ”Exactly … so what the XDI TC came together around was saying, OK, we need to do the same thing for data. And what we really need to do is we need to get down to what’s often been called a common schema which sounds like an impossible journey but it’s like … the whole point of data in all these repositories is that it’s in all these different schemas that are in some cases designed for the repositories – you know, to optimize like LDAP schemas and directories – or in some cases they’re designed to be very generalized but still to, you know, like SQL and relational databases – to … have wonderful search characteristics … and everything else and other repositories [and] object oriented [repositories] and other specialized [repositories] and all these different schemas and here we’re saying, well, we need a common schema so that we can access the data, deal with the data, wherever it is in whatever schema it may be … in its native repository.”

24:57 SG “So does Dublin Core help with that?”

25:00 DR “No, actually Dublin Core is a common ontology …. It’s a set of characteristics of content objects …. You could say books, web documents, basically anything that is a document or a content object of some kind. What Dublin Core was trying to do was say, what are the universal characteristics [of a content object]. It’s got an author, it’s got a creation date … [and it’s got] basically sort of universal bibliographic characteristics ….”

25:45 SG “So how do you contrast [XDI] against the characteristics of what you were talking about a second ago?”

25:50 DR “What we’re talking about [is] a common schema, a way of describing the data itself in whatever repository it may be in such [a way] that you can …. The ultimate goal is to be able to take the bits – the very exact bits – that sit in repository A – whatever kind it may be – extract them in such a way that they can be understood and ultimately moved to repository B which may be the same thing – you might be going from one SQL database in one system to another SQL database in some other system – but you might be going from one SQL database to an LDAP directory someplace – or you might be going from an XML document into an object oriented repository.”

26:35 SG “So whereas the Dublin Core stuff is concerned with the semantic elements of a document, you’re actually concerned with transforming the data from a usable representation in one system to a usable representation in [another] system.”

26:55 DR “Yes”

26:56 SG “So how is that not XSchema?”

26:58 DR “Yes … fundamentally, what you [have] a problem of is schema translation or schema transformation. In other words, XML and XML schema tackles the problem of taking the data from its native format and putting it into XML with a schema that either reflects the schema that it came out of or puts it in whatever that XML schema is. So you have now a schema description language. And a common representation format [which is] a grammar in which you can access and move the data around. So if you can take the data, for instance, again, out of an LDAP directory and put it into an XML format, and you have a schema designed to do that – in fact OASIS produced one of those … the SML - now you can take that LDAP data and put it in an XML document and systems that understand that schema can then move that data around. So that’s what the SML was created for …. [It was created] for interchange of data between LDAP directory systems. OK, so we’ve gotten that far with XML. We’ve got a schema description language which now you can put an XML document itself (sic) so you can even exchange the schemas, but what you haven’t solved is the problem of moving data from … a schema in one system to a different schema in another system. That’s the problem … you actually have to solve … if you want to achieve data portability that’s not tied to any one schema format. And that’s a classic [problem]. Ever since I heard of the Data Portability Initiative and [wondered] how XDI would be … relevant … I have been waiting and watching to see how folks would start to key in on the schema problem.”

29:11 SG “Well, this is why I was asking before about using XSL because it seems to me like if you can get the data into XML as a transitional format – where you use XSchema to get it into XML – you could then use XSL transforms to make it look like whatever you want, and then use another XML schema to turn it into the target anything (sic).”

29:39 DR “What you just described is a perfect setup for … the [ultimate] problem that the XDI common schema format actually solves. So, take the different systems … the tens of millions of systems that exist out there … now, most of those systems are not one-off systems so … many of them are using some vocabulary whether its all the LDAP systems are using (sic) could put there's into DSML so you could understand the LDAP schema but when you move to relational databases [and] object databases they have … thousands and thousands of schemas designed to hold their data so now imagine … the end to end problem you have of taking data out of one system – its in one schema now in XML – and now you’ve got to come up with the XSLT transform to get it into ten thousand other systems, right? And each of those ten thousand other systems has to be able to talk to ten thousand other systems. And just imagine the number of transforms you’re going to have to come up with. It’s pair-wise mapping, and it just goes exponential.”
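
[Aside: A quick count puts numbers on Reed's point about pairwise mapping - every ordered pair of systems needs its own transform under the direct approach, versus one mapping per system when everyone maps to a single common schema. The arithmetic below is simply that count.]

```python
# Counting transforms: one per ordered pair of systems under the direct
# approach, versus one mapping per system against a common schema.
def pairwise_transforms(n_systems: int) -> int:
    return n_systems * (n_systems - 1)

def common_schema_mappings(n_systems: int) -> int:
    return n_systems

for n in (10, 100, 10_000):
    print(f"{n} systems: {pairwise_transforms(n):,} pairwise transforms "
          f"vs {common_schema_mappings(n):,} common-schema mappings")
```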

30:53 SG “So I guess what I’m not getting is if it’s not a one to one map how does this actually work? Can you describe to me the process of how … the transform actually takes place?”

31:07 DR “Well … what I was just describing was if you take the transform approach but its end to end and therefore goes exponential between all the systems, to contrast that the XDI approach is to say, OK, we’re coming up …. What we’ve designed is a common schema …. It’s an incredibly simple schema because it really goes down below the layer of conventional schemas and says there’s an atomic set of relationships between data … there’s a way of describing those relationships that is universal ….”

31:42 SG “And what would some of this be? Would it be like an RDF bag or an RDF seek?”

31:46 DR “Yes … now you are getting down to the stuff … where it’s really hard to put this stuff in words that are going to make sense to a lot of the audience. It’s easier to talk in analogies …. The schema that we have come up with – which we call XDI RDF – is in fact an RDF vocabulary. However, it’s very different. We’ve got a number of RDF folks on the [XRI Technical Committee at OASIS] and I’ve been talking with other RDF experts outside the [XRI Technical Committee at OASIS] about XDI RDF and as sort of what you would expect, they’re looking at it and going, mmmm, that’s not a conventional RDF [because] no one’s been thinking about this approach to it. An RDF vocabulary … in which there is a primitive set of relationships – very, very simple relations that are described – predicates, is the RDF term for it – and, but those predicates actually …. every identifier that you use in XDI RDF is itself built up from these predicates …. so … in conventional RDF you have subject, predicate, object, triples, right? Everything is described in triples and if you can use those triples to describe all the data … wherever it might [be] whether its in a web document or an LDAP repository or all these things that might be on the data web, the “Semantic Web” approach is describe them all in triples and then everyone is going to be able to understand everyone else’s, your going to have this Semantic Web of information because it’s now all described in a common way [because] everything is in triples. And … that’s one approach you might say to the common schema format which is to say, well, if you put it all in triples where you have a way for every system to expose [these] triples now everyone can talk to everyone and they can talk right down at the level of the data. You can make it as granular as you need, right? And everything is in triples … everything is in RDF documents, right? And that is a solution … the Semantic Web is one solution, I think a very powerful one, to the common schema … problem. The twist on it that XDI RDF introduces is to go a step further and say, OK, with … RDF documents you still have the problem of vocabulary mapping which is, OK, if you take an RDF document … in an LDAP repository then its going to be using a set of RDF statements – a vocabulary is what they call it – to describe data in LDAP terms and if you went over to a SQL system and they designed … an HR [i.e., human resources] repository then they are going to have RDF documents now, everything’s in triples so its in a form you can understand but its still [has] its own vocabulary, and how does the system in the LDAP repository understand the HR vocabulary and how does the HR … system understand the LDAP vocabulary. So you have this vocabulary mapping problem. What the XDI RDF approach [does] is it says, OK, we are going to take another step which is all of the …. [Let me back up], in conventional Semantic Web, these triples are all formed from URI’s [which are] web addresses [and therefore unique] …. Every subject is a web address. Every predicate [is a web address]. An example [would be] an hCard [where there is a URI that describes the hCard]. And then for each part of the hCard, each predicate, it would say, OK, this is an hCard and this is a person’s name and then the object is the person’s name … and then here’s their phone number [identified by a URI] and then [here is the] value of the phone number, right? 
So that’s the vocabulary and the set of URI’s you end up with that describes the data is called the RDF vocabulary, right? So, in XDI RDF we don’t use flat identifiers instead … the identifiers themselves are built up, they’re structured identifiers, and that’s why XDI is based on XRI which is a structured identifier format and those identifiers themselves define … the data, and they define the relationships between the data, and so if you took an hCard and you put it in the XDI format then you are still going to end up with the subject, predicate, object but the identifiers … those predicates that say, this is a name, this is … a telephone number, this is a zip code … you can turn around and every one of those identifiers you can say a piece of software can look at it and say, mmmm, I don’t know what that is. I’ve never seen one of those before. You can take that identifier and you can go look up the definition and the definition will also be another XDI document and you can work your way back all the way through and the … definitions break this down into this low level grammar which is this common schema for describing everything out there. So with the dictionary approach because you have been able to break things down to these atomic relationships the LDAP system can now describe the relationships its making … between ultimately these atomic fields of data and the relationships [of the] higher level objects that it has, it can describe them in a way that literally in XDI documents – those descriptions [and] those definitions – can be described in such a way that an XDI processor over at the HR system can literally look over at the dictionary, look through the dictionary, understand the relationships such that … a human programmer does not have to sit down and work out the XSLT transformation between data that is in the LDAP system to the HR system.”
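
[Aside: For readers unfamiliar with the triple model Reed is describing, here is an ordinary RDF rendering of his hCard example using Python's rdflib. Real XDI RDF builds its identifiers up from structured XRIs; the flat example.org URIs below are invented, and the sketch only illustrates the subject/predicate/object form.]

```python
# A plain-RDF illustration of the subject/predicate/object triples Reed
# describes, applied to an hCard-like record. The vocabulary URIs are
# invented; actual XDI RDF uses structured, XRI-based identifiers instead.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")            # hypothetical vocabulary
card = URIRef("http://example.org/people/jane#hcard")  # the subject: one person's hCard

g = Graph()
g.add((card, EX.fullName, Literal("Jane Doe")))
g.add((card, EX.phoneNumber, Literal("+1-405-555-0100")))
g.add((card, EX.zipCode, Literal("73101")))

# Each triple is separately addressable, which is what makes the data
# granular enough for machines to follow links field by field.
print(g.serialize(format="turtle"))
```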

39:22: SG “Well when you say ‘dictionary’ its almost more like it’s a periodic table, right? That you’ve got these molecules of data and now that I know what it is I can say, oh, I now know the characteristics of this thing because I have its atomic number.”

39:39 DR “Yeah … I really appreciate that, Steve. I think you just … the dictionary uses a sort of human semantics but we’re actually much more down to physics, here, and its true that … the ultimate semantic thing that you are doing is that you are going to take the bits that comprise a phone number in the LDAP system and you are going to map them to the bits that ultimately need to reside in the HR system that describe a phone number but happens to … literally, the schema of a phone number is different in the HR system than it is in the LDAP system. But we’re down to the level where clearly as long as you provide adequate descriptions at that atomic level table that this is what machines were designed to do, right? Oh, it’s described this way here, and it’s described that way there. They [i.e., the machines] figure out the transform. The human doesn’t have to do the transform.”

40:32: SG “Right, so I know if I get a molecule that has the atomic weight of [a] ‘phone number’ then this is what I do with it. I know what to do with the phone number.”

40:41: DR “Exactly, exactly so really … in fact … the atomic table, I think I am going to march forward and start using that because literally now you can start to see that any system that now wants to say, OK, I really want to support data portability. I really wanna make data available in a way that other systems can understand it. They sit down once and provide their mapping to the atomic table. And … again, their mapping, they may have things that … schema elements and combinations and things that exist only in their system. They don’t exist anywhere else in the world. And that’s cool. That’s there value add but if they want to make that data digestible in other places …. If they provide that mapping then XDI software and XDI agents can go access it and figure out what the transforms are … figure out what … they can understand, what they can use. Now we can have real data portability.”

41:37 SG “So the XDI would say that a US postal code is five … numbers with a possible dash and then four more numbers. A Canadian postal code might have some letters in there, right? So … you’ll know how to handle them. That’s neat.”

42:00 DR “Exactly, because you have taken them all the way down to the A, B and F (sic) of the respective systems. And that’s the level of which, again, when I talk about XDI dictionaries I talk about software that is designed to express the data, but it’s more than that [because] there’s going to be dictionary agents that handle the job of semantic mapping across systems.”

42:22 SG “OK, so now that we’ve talked about how to transfer things between [systems], how do you control who can see what, and who can’t?”

42:32: DR “Exactly. So here’s the batta-ba-boom (sic) of all of this. The part about XDI that has gained the most interest since it was TC was first announced was this capability we call ‘link contracts’. And, ultimately, what link contracts [may be described as is a] Web 2.0 or Web 3.0 term for portable authorization. How systems can control … access … and usage of the data in the system … but also how that control could be portable to another system, and will make this immediately relevant to … issues that are right at the core of data portability, for instance. If an individual starts using a service provider to store a bunch of their personal data, what VRM calls a personal data store, and you start storing a bunch of personal data, and you start sharing it using protocols … whatever the protocol is but imagine using XDI and start to share it with a bunch of systems like a bunch of vendors who all want to subscribe to your phone number, right? The biggest challenge after you solve the problem of how you actually would … move the data – once you solve that semantic problem – is control. How do you say, what … vendors can have the information and what they can do with the information? What is the associated privacy policy or [associated] usage policy? In a way that not only vendors can understand but you could turn around and move to another service provider – another provider of a personal data store – so that you’re not locked into one provider in that way that one provider controls access to the data on their system. How can you have a portable way of handling authorization?”
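
[Aside: To make the idea of a link contract a little more tangible, here is a purely hypothetical sketch of the kind of portable authorization record Reed is describing - who grants access, to what, to whom, and under which usage policy. This is not the actual XDI link contract format; the identifiers and fields are invented.]

```python
# A hypothetical portable-authorization record in the spirit of an XDI
# link contract. The identifiers, field names and policy vocabulary are
# all invented for illustration; they are not the real XDI format.
link_contract = {
    "grantor": "=jane",                    # the data owner granting access
    "grantee": "=vendor.example",          # the party being granted access
    "target": "=jane+phone.number",        # the granularly addressed data
    "permissions": ["read"],               # what the grantee may do
    "policy": {"purpose": "delivery updates", "expires": "2009-01-01"},
}

def is_allowed(contract: dict, requester: str, operation: str) -> bool:
    # Because the contract is just data, it can move with the phone number
    # to another personal data store and be enforced there as well.
    return requester == contract["grantee"] and operation in contract["permissions"]

print(is_allowed(link_contract, "=vendor.example", "read"))    # True
print(is_allowed(link_contract, "=someone.else", "read"))      # False
```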

44:25 SG “So talk about … what that would look like. How it would actually work?”

44:29 DR “Well, so, the secret really is [about] these things [like] access control …. Applying those forms of policy to the data are very, very well understood in directory systems and identity management systems …. I’m here at the Burton [Group] Catalyst Conference and there’s just session after session about how this is done in enterprises all over the world. The challenge is in order to make it portable …. No one’s ever made it portable. The LDAP community is the oldest …. Goes back … to X.500 … [which was the] biggest directory effort in history and they ultimately punted on the problem. They could not come up with a solution for how one LDAP system could describe access control to another LDAP system. So you could literally move not just entries from one LDAP directory to another but control over those entries. Who could access what? It’s never been solved in LDAP. Well I believe the answer is that it’s never been solved because … even in the same system you had to come down to a common … graph of information because … the only way you can describe control over information is to have pointers to that information that don’t change when you move from one system to another.”

45:50 SG “But it’s a common graph of not just the information but also the identity[, right?] The ability to say that … both systems can know that the actor who is setting or requesting information is actually the same person[, right?].”

46:05 DR “Exactly. Actually you have to have three things in common, OK? You have to describe the data in a way that both systems can understand it [so that they] can be pointing at the same thing …. This is Steve’s phone number, OK? So you have to be able to identify [that] this is Steve in a way that both systems can understand. You have to be able to identify that this is Steve’s phone number in a way [that] both systems can understand, and then you have to be able to describe the control that Steve is providing which is to say Trent can have access to this phone number on both systems. And, so, yes, you have to solve the data problem, you have to solve the identity problem, and you have to solve the control problem, and XDI link contracts solve that …. First of all, XDI is a protocol …. A big step beyond the common format is that in XDI …. there’s a sender and receiver of every message as there is in e-mail, so identity is built into the protocols …. There is anonymous access to the system but the identifier is for an anonymous accessor. [In the] vast majority of XDI transactions …. it’s not an IP address that is addressing the system, it is an identity. You know … take MySpace …. If MySpace starts exposing data in XDI then … they can identify, oh, this is Steve accessing his own account on MySpace. Oh, this is Trent accessing Steve’s account and you authenticate as you’re Steve or you’re Trent and they then know because they know they have a way of describing the permissions that apply to that data what they give to Steve because its his account [and] what they give to Trent because Trent is a friend of Steve’s.”

47:55 TA “Only if in fact I was a friend of Steve’s.”

47:58 DR [Laughs]

47:59 TA “So there needs to be an enemy control in there somewhere as well.”

48:02 SG “Hey, do you want me to come back next week?”

48:04 TA “I keep pushing my luck, and speaking of pushing our luck, I think we’re pushing our collective luck here by locking Drummond to the phone when he’s got to skedaddle over to the Catalyst sessions.”

48:15 DR “At least we got to the heart of … link contracts [and] portable authorization which I do want to say just this one final thing …. So the two hardest problems [as I have been watching the Data Portability initiative] are the common schema problem and the authorization problem – portable authorization problem. And … so I just really want to thank both of you for this opportunity …. As you can tell, I am very passionate about this topic. To talk about … well, there is this effort at OASIS … to solve both [of] those problems with the XDI protocol and the XDI link contracts and … I just wanna let you know, hey, we’ve made a lot of progress …. We’re no longer in the, OK, figure out the problem stage. There are actually two implementations of XDI. One of the first module (sic) we worked out [is] called [the] ATI model which is available from a company call ooTao. They actually have a commercial XDI server that does that and then the XDI RDF model has been implemented at Higgins in a module called XDI for Java and if you just Google XDI 4J or XDI for Java … Higgins and XDI, you’ll get to the pages there, and that module is [a] stand-alone XDI module that you can already start to play with. You can validate XDI documents. You can send and receive XDI messages. You can set up your own end points. This is real stuff now. So we are very excited to … get the specs out and actually start to show folks here’s how you could have data portability that runs everywhere.”

50:01 SG “That’s great. One ring to rule them all.”

[Drummond leaves the stage. Finishing banter]

Tuesday, July 15, 2008

Cloud Computing: Billowing Toward Data Ownership - Part II

[Return to Part I]

Cloud Computing's Achilles Heel

[Image: The Death of Achilles, Peter Paul Rubens, 1630-1635]
The boom in the data center industry is building the Cloud where, according to conventional wisdom, the software services of the Semantic Web will thrive. The expansion of the Cloud is believed to augur that data within the Cloud will come to substitute to some extent - perhaps substantially so - for data currently held outside of the Cloud. But the boom is being built upon a privacy paradigm employed by online companies that allows them to use Web cookies to collect a wide variety of information about individual usage of the Internet. That privacy paradigm is the Cloud’s Achilles’ heel - an assumption that threatens to keep the Cloud from fully inflating beyond publicly available information sources.

I'm mulling over a more in-depth discussion of Web cookies for a final Part III to this multi-part series. In the meantime, the focus of today's blog is a more likely consequence of the Cloud: as people and businesses consider moving their computer storage and services into the Cloud, their direct technological control of information becomes more and more of a competitive driver. As blogged in Part I, the online company that figures out ways of building privacy mechanisms into its compliance systems will be putting itself at a tremendous competitive advantage for attracting services to operate in the Cloud. But puzzlement reigns as to how to connect the Cloud with new pools of data (mostly non-artistic) that are private, confidential and classified.

Semantic Web: What’s Right About the Vision

What’s right about the Semantic Web is that its most highly funded visionaries have envisioned beyond a Web of documents to a ‘Data Web’. Here are two examples: a Web of scalably integrated data employing object registries, envisioned by Metaweb Technologies’ Danny Hillis, and a Web of granularly linked, ontologically defined data objects, envisioned by Radar Networks’ Nova Spivack.

Click on the thumbnail image to the left and you will see in more detail what Hillis envisions: a database represented as a labeled graph, where data objects are connected by labeled links to each other and to concept nodes. For example, a concept node for a particular category contains two subcategories that are linked, via the labeled links "belongs-to" and "related-to," with text and a picture. An entity comprises another concept that is linked, via the labeled links "refers-to," "picture-of," "associated-with," and "describes," with a Web page, a picture, an audio clip, and data. For further information, see the blogged entry US Patent App 20050086188: Knowledge Web (Hillis, Daniel W. et al).

Click on the thumbnail image to the right and you will see in more detail what Spivack envisions: a picture of a Data Record with an ID and fields connected in one direction to ontological definitions, and in another direction to other similarly constructed data records with their own fields connected to ontological definitions, and so on. These data records - or semantic web data - are nothing less than self-describing, structured data objects that are atomically (i.e., granularly) connected by URIs. For more information, see the blogged entry, US Patent App 20040158455: Methods and systems for managing entities in a computing device using semantic objects (Radar Networks).
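
As a loose illustration of such a self-describing data record, here is a small Python structure in which each field points to an ontological definition by URI and records can point to one another. Every URI and field name below is invented for the example.

```python
# A loose sketch of a self-describing, URI-addressable data record in the
# spirit of the Radar Networks application. Every URI and field name here
# is invented for the illustration.
data_record = {
    "uri": "http://example.org/records/steak-42",
    "fields": {
        "http://example.org/ontology/animalId": "US-840-000-123-456",
        "http://example.org/ontology/processedAt": {
            "uri": "http://example.org/records/plant-7",   # link to another data record
        },
    },
}

# Because every field is identified by a URI, a machine can ask for exactly
# one attribute of the record instead of fetching the whole document.
print(data_record["fields"]["http://example.org/ontology/animalId"])
```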

Furthermore, Hillis and Spivack have studied the weaknesses of relational database architecture when applied to globally diverse users who are authoring, storing and sharing massive amounts of data, and they have correctly staked the future of their companies on object-oriented architecture. See, e.g., the blogged entries, Efficient monitoring of objects in object-oriented database system which interacts cooperatively with client programs and Advantages of object oriented databases over relational databases. They both define the Semantic Web as empowering people across the globe to collaborate toward the building of bigger, and more statistically reliable, observations about things, concepts and relationships.

Click on the thumbnail to the left for a screen shot of a visualization and interaction experiment produced by Moritz Stefaner for his 2007 master's thesis, Visual tools for the socio–semantic web. See the blogged entry, Elastic Tag Mapping and Data Ownership. Stefaner posits what Hillis and Spivack would no doubt agree with - that the explosive growth of possibilities for information access and publishing fundamentally changes our way of interacting with data, information and knowledge. There is a recognized acceleration of information diffusion, and an increasing process of granularizing information into micro–content. There is a shift towards larger and larger populations of people producing and sharing information, along with an increasing specialization of topics, interests and the corresponding social niches. All of this appears to be leading to a massive growth of space within the Cloud for action, expression and attention available to every single individual.

Semantic Web: What’s Missing from the Vision

Clouds_Missing%20From.PNG Continuing to use Hillis and Spivack as proxies, these two visionaries of the Semantic Web assume that data - all data - will be made available as an open source. Neither of them has a ready answer for the very simple question that Steve Inskeep asks above (in Part I of this two-part blog entry).

Inskeep: "Is somebody who runs a business, who used to have a filing cabinet in a filing room, and then had computer files and computer databases, really going to be able or want to take the risk of shipping all their files out to some random computer they don't even know where it is and paying to rent storage that way?"

Sir Tim Berners-Lee, the widely recognized inventor of the Web, and Director of the W3C, is every bit as perplexed about data ownership. In Data Portability, Traceability and Data Ownership - Part IV I referenced an excerpt from a March 2008 interview conducted by Paul Miller of ZDNet, in which Berners-Lee acknowledges data ownership fear factors.

Miller: “You talked a little bit about people's concerns … with loss of control or loss of credibility, or loss of visibility. Are those concerns justified or is it simply an outmoded way of looking at how you appear on the Web?”

Berners-Lee: “I think that both are true. In a way it is reasonable to worry in an organization … You own that data, you are worried that if it is exposed, people will start criticizing [you] ….

So, there are some organizations where if you do just sort of naively expose data, society doesn't work very well and you have to be careful to watch your backside. But, on the other hand, if that is the case, there is a problem. [T]he Semantic Web is about integration, it is like getting power when you use the data, it is giving people in the company the ability to do queries across the huge amounts of data the company has.

And if a company doesn't do that, then, it will be seriously disadvantaged competitively. If a company has got this feeling where people don't want other people in the company to know what is going on, then, it has already got a problem ….”

(emphasis added)

In other words, 'do the right thing', collegially share your data and everything will be OK. If only the real world worked that way, then Berners-Lee would be spot on. In the meantime, there is a ready answer.

Ownership Web

Cloud%20Over%20Ocean.PNGThe ready answer is an Ownership Web concurrently rising alongside, and complementary to, the emerging Semantic Web.

For the Semantic Web to reach its full potential in the Cloud, it must have access to more than just publicly available data sources. It must find a gateway into the closely-held, confidential and classified information that people consider to be their identity, that participants to complex supply chains consider to be confidential, and that governments classify as secret. Only with the empowerment of technological ‘data ownership’ in the hands of people, businesses, and governments will the Semantic Cloud make contact with a horizon of new, ‘blue ocean’ data.

The Ownership Web would be separate from the Semantic Web, though semantically connected as a layer of distributed, enterprise-class web platforms residing in the Cloud.

Ownership%20Web.PNG

The Ownership Web would contain diverse registries of uniquely identified data elements for the direct authoring, and further registration, of uniquely identified data objects. Using these platforms people, businesses and governments would directly host the authoring, publication, sharing, control and tracking of the movement of their data objects.

The technological construct best suited for the dynamic of networked efficiency, scalability, granularity and trustworthy ownership is the data object in the form of an immutable, granularly identified, ‘informational’ object.

A marketing construct well suited to relying upon the trustworthiness of immutable, informational objects would be the 'data bank'.

Data Banking

Bank_Man%20and%20Money%20Supporting.PNG Traditional monetary banks meet the expectations of real people and real businesses in the real world.

As blogged in Part I ... 

People are comfortable and familiar with monetary banks. That’s a good thing because without people willingly depositing their money into banks, there would be no banking system as we know it. By comparison, we live in a world that is at once awash in on-demand information courtesy of the Internet, and at the same time the Internet is strangely impotent when it comes to information ownership.

In many respects the Internet is like the Wild West because there is no information web similar to our monetary banking system. No similar integrated system exists for precisely and efficiently delivering our medical records to a new physician, or for providing access to a health history of the specific animal slaughtered for that purchased steak. Nothing out there compares with how the banking system facilitates gasoline purchases.

If an analogy to the Wild West is apropos, then it is interesting to reflect upon the history of a bank like Wells Fargo, formed in 1852 in response to the California gold rush. Wells Fargo wasn’t just a monetary bank, it was also an express delivery company of its time for transporting gold, mail and valuables across the Wild West. While we are now accustomed to next morning, overnight delivery between the coasts, Wells Fargo captured the imagination of the nation by connecting San Francisco and the East coast with its Pony Express. As further described in Banking on Granular Information Ownership, today’s Web needs data banks that do for the on-going gold rush on information what Wells Fargo did for the Forty-niners.

Banks meet the expectations of their customers by providing them with security, yes, but also credibility, compensation, control, convenience, integration and verification. It is the dynamic, transactional combination of these that instills in customers the confidence that they continue to own their money even while it is in the hands of a third-party bank.

A data bank must do no less.

Ownership Web: What's Philosophically Needed

Money_Brazilian.PNG Where exactly is the sweet spot of data ownership?

In truth, it will probably vary depending upon what kind of data bank we are talking about. Data ownership will be one thing for personal health records, another for product supply chains, and yet another for government classified information. And that's just for starters because there will no doubt be niches within niches, each with their own interpretation of data ownership. But the philosophical essence of the Ownership Web that will cut across all of these data banks will be this:

  • That information must be treated as a tangible, commercial product, as banked, traceable money, or as both.

The trustworthiness of information is crucial. Users will not be drawn to data banks if the information they author, store, publish and access can be modified. That means that even the authors themselves must be prevented from modifying their information once it is registered with the data bank. Their information must take on the immutable characteristic of tangible, traceable property. While the Semantic Web is about the statistical reliability of data, the Ownership Web is about the reliability of data, period.

Ownership Web: What's Technologically Needed

What is technologically required is a flexible, integrated architectural framework for information object authoring and distribution. One that easily adjusts to the definition of data ownership as it is variously defined by the data banks serving each social network, information supply chain, and product supply chain. Users will interface with one or more ‘data banks’ employing this architectural framework. But the lowest common denominator will be the trusted, immutable informational objects that are authored and, where the definition of data ownership permits, controllable and traceable by each data owner one step, two steps, three steps, etc. after the initial share.

2093760-1700737-thumbnail.jpgClick on the thumbnail to the left for the key architectural features for such a data bank. They include a common registry of standardized data elements, a registry of immutable informational objects, a tracking/billing database and, of course, a membership database. This is the architecture for what may be called a Common Point Authoring™ system. Again, where the definition of data ownership permits, users will host their own 'accounts' within a data bank, and serve as their own 'network administrators'. What is made possible by this architectural design is a distributed Cloud of systems (i.e., data banks). The overall implementation would be based upon a massive number of user interfaces (via APIs, web browsers, etc.) interacting via the Internet with a large number of data banks overseeing their respective enterprise-class, object-oriented database systems.
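
As a rough illustration only, the four components named above might be sketched as follows. The class and method names are my own shorthand for the idea, not the Common Point Authoring™ implementation itself.

```python
# A rough sketch of the four data bank components named above: a
# registry of standardized data elements, a registry of immutable
# informational objects, a tracking/billing log, and a membership
# database. Structures and names are illustrative assumptions.

class DataBank:
    def __init__(self):
        self.members = {}                 # member_id -> profile
        self.data_elements = {}           # element_id -> standardized element
        self.informational_objects = {}   # object_id -> registered object
        self.tracking_log = []            # (member_id, action, object_id) events

    def register_member(self, member_id, profile):
        self.members[member_id] = profile

    def register_data_element(self, element_id, definition):
        """Standardized data elements are registered once and reused."""
        self.data_elements.setdefault(element_id, definition)

    def register_object(self, object_id, informational_object):
        """Once registered, an informational object is never overwritten."""
        self.informational_objects.setdefault(object_id, informational_object)

    def track(self, member_id, action, object_id):
        """Every authoring, share or access event is logged for tracing
        (and, in a commercial deployment, for billing)."""
        self.tracking_log.append((member_id, action, object_id))
```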

2093760-1666391-thumbnail.jpgClick on the thumbnail to the right for an example of an informational object and its contents as authored, registered, distributed and maintained with data bank services. Each comprises a unique identifier that designates the informational object, as well as one or more data elements (including personal identification), each of which is itself identified by a corresponding unique identifier. The informational object will also contain other data, such as ontological formatting data, permissions data, and metadata. The actual data elements that are associated with a registered (and therefore immutable) informational object would typically be stored in the Registered Data Element Database (look back at 124 in the preceding thumbnail). That is, the actual data elements are linked via pointers, which comprise the data element unique identifiers or URIs. Granular portability is built in. For more information see the blogged entry US Patent 6,671,696: Informational object authoring and distribution system (Pardalis Inc.).
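
Here, purely as a sketch of the idea and not of the patented system, is what such an informational object might look like: a unique identifier, pointers to registered data elements rather than embedded values, plus formatting, permissions and metadata. A frozen dataclass stands in for the immutability that registration confers; every identifier shown is hypothetical.

```python
# A minimal sketch of an informational object as described above: a
# unique identifier, pointers (URIs) into the registered data element
# database, plus ontological formatting, permissions and metadata.
# All identifiers are hypothetical; frozen=True models immutability.

from dataclasses import dataclass

@dataclass(frozen=True)
class InformationalObject:
    object_id: str         # unique identifier of the registered object
    element_uris: tuple    # pointers into the data element registry
    ontology_format: str   # reference to the ontological formatting used
    permissions: tuple     # who may access which granular parts
    metadata: tuple        # author, registration time, etc.

obj = InformationalObject(
    object_id="urn:databank:obj:0001",
    element_uris=("urn:databank:element:124-007",
                  "urn:databank:element:124-008"),
    ontology_format="urn:databank:ontology:cattle-health:v1",
    permissions=(("processor-A", "read"), ("consumer", "read-summary")),
    metadata=(("author", "producer-42"), ("registered", "2008-08-01")),
)

# Attempting to change a registered object raises an error, mirroring
# the requirement that even authors cannot modify registered objects:
# obj.object_id = "urn:databank:obj:0002"  # would raise FrozenInstanceError
```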

Ownership Web: Where Will It Begin?

2093760-1729103-thumbnail.jpg
Aristotle
Metaweb Technologies
is a pre-revenue, San Francisco start-up developing and patenting technology for a semantic ‘Knowledge Web’ marketed as Freebase™. Philosophically, Freebase is a substitute for a great tutor, like Aristotle was for Alexander. Using Freebase users do not modify existing web documents but instead annotate them. The annotations of Amazon.com are the closest example but Freebase further links the annotations so that the documents are more understandable and more findable. Annotations are also modifiable by their authors as better information becomes available to them. Metaweb characterizes its service as an open, collaboratively-edited database (like Wikipedia, the free encyclopedia) of cross-linked data but it is really very much a next generation competitor to Google.

Not that Hillis hasn't thought about data ownership. He has. You can see it in an interview conducted by his patent attorney and filed on December 21, 2001 in the provisional USPTO Patent Application 60/343,273:

Danny Hillis: "Here's another idea that's super simple. I've never seen it done. Maybe it's too simple. Let's go back to the terrorist version [of Knowledge Web]. There's a particular problem in the terrorist version that the information is, of course, highly classified .... Different people have some different needs to know about it and so on. What would be nice is if you ... asked for a piece of information. That you [want access to an] annotation that you know exists .... Let's say I've got a summary [of the annotation] that said,  'Osama bin Laden is traveling to Italy.' I'd like to know how do you know that. That's classified. Maybe I really have legitimate reasons for that. So what I'd like to do, is if I follow a link that I know exists to a classified thing, I'd like the same system that does that to automatically help me with the process of getting the clearance to access that material." [emphasis added]

What Hillis was tapping into just a few months after 9/11 is just as relevant to today's information sharing needs.

In the War on Terror the world is still wrestling with classified information exchange between governments, between agencies within governments, and even between the individuals making up the agencies themselves. Fear factors revolving around data ownership – not legal ownership, but technological ownership – create significant frictions to information sharing throughout these Byzantine information supply chains.

Fear%20Factors_Woman%20Fretting.PNGSomething similar is happening within the global healthcare system. It's a complex supply chain in which the essential product is the health of the patients themselves. People want to share their entire personal health records with a personal physician but only share granular parts of it with an impersonal insurance company. ‘Fear factors’ are keeping people from becoming comfortable with posting their personal health information into online accounts despite the advent of Microsoft HealthVault and Google Health.

And then, in this era of both de facto and de jure deregulation, there are the international product supply chains providing dangerous toys and potential ‘mad cow’ meat products to unsuspecting consumers. Unscrupulous supply chain participants will always hide in the ‘fog’ of their supply chains. The manufacturers of safe products want to differentiate themselves from the manufacturers of unsafe products. But, again, fear factors keep the good manufacturers from posting information online that may put them at a competitive disadvantage to downstream competitors.

I'm painting a large picture here, but what Hillis is talking about is not limited to the bureaucratic ownership of data; it extends to matching up his Knowledge Web with another system - like the Ownership Web - for automatically working out the data ownership issues.

But bouncing around ideas about how we need data ownership is not the same as developing methods or designs to solve it. What Hillis non-provisionally filed, subsequent to his provisional application, was the Knowledge Web (aka Freebase) application. Because of its emphasis upon the statistical reliability of annotations, Knowledge Web's IP is tailor-made for the Semantic Web. See the blogged entry US Patent App 20050086188: Knowledge Web (Hillis, Daniel W. et al). And because the conventional wisdom within Silicon Valley is that the Semantic Web is about to emerge, Metaweb is being funded like it is “the next big thing”. Metaweb’s Series B raised a further $42.4M in January, 2008. What Hillis well recognizes is that as Freebase strives to become the premier knowledge source for the Web, it will need access to new, blue oceans of data residing within the Ownership Web.

2093760-1660740-thumbnail.jpgRadar Networks may be the “next, next big thing”. Also a pre-revenue San Francisco start-up, its bankable founder, Nova Spivack, has gone out of his way to state that his product Twine™ is more like a semantic Facebook while Metaweb’s Freebase is more like a semantic Wikipedia. Twine employs W3C standards in a community-driven, bottom-up process, from which mappings are created to infer a higher resolution (see the thumbnail to the right) of semantic equivalences or connections among and between the data inputted by social networkers. Again, this data is modifiable by the authors as better information becomes available to them. Twine holds four pending U.S. patent applications. See the blogged entry US Patent App 20040158455: Methods and systems for managing entities in a computing device using semantic objects (Radar Networks). Twine’s Series B raised $15M-$20M in February, 2008, following on the heels of Metaweb's latest round. Twine’s approach, in both its systems and its IP, is to emphasize perhaps a higher-resolution Web than that of Metaweb. Twine and the Ownership Web should be especially complementary to each other in regard to object granularity. You can see this, back above, in the comparative resemblance between the thumbnail image of Spivack's Data Record ID object and the thumbnail image of Pardalis' Informational Object. Nonetheless, the IP supporting Twine, like that of Hillis' Knowledge Web, places a strong emphasis upon the statistical reliability of information. Twine's IP is tailor-made for the Semantic Web.

Dossia is a private consortium pursuing the development of a national, personally controlled health record (PCHR) system. Dossia is also governed by very large organizations like AT&T, BP America, Cardinal Health, Intel, Pitney Bowes and Wal-Mart. In September, 2007, Dossia outsourced development to the IndivoHealth™ PCHR system. IndivoHealth, funded from public and private health grants, shares Pardalis' philosophy that "consumers are managing bank accounts, investments, and purchases online, and … they will expect this level of control to be extended to online medical portfolios." IndivoHealth empowers patients with direct access to their centralized electronic medical records via the Web.

But given the current industry needs for a generic storage model, the IndivoHealth medical records, though wrapped in an XML structure (see the next paragraph), are essentially still just paper documents in electronic format. IndivoHealth falls far short of empowering patients with the kind of control that people intuitively recognize as ‘ownership’. See US Patent Application 20040199765 entitled System and method for providing personal control of access to confidential records over a public network in which access privileges include "reading, creating, modifying, annotating, and deleting." And it reasonably follows that this is one reason why personal health record initiatives like those of not just Dossia, but also Microsoft’s HealthVault™ and GoogleHealth™, are not tipping the balance. For Microsoft and Google another reason is that they so far have not been able to think themselves out of the silos of the current privacy paradigm. The Ownership Web is highly disruptive of the prevailing privacy paradigm because it empowers individuals with direct control over their radically standardized, immutable data.

World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. W3C is headed by Sir Tim Berners-Lee, creator of the first web browser and the primary originator of the Web specifications for URL, HTTP and HTML. These are the principal technologies that form the basis of the World Wide Web. W3C has further developed standards for XML, SPARQL, RDF, URI, OWL, and GRDDL with the intention of facilitating the Semantic Web. While Berners-Lee has described in his own words (above) his perplexity about data ownership, the data object standards created by the W3C should nonetheless be more than friendly to an Ownership Web employing object-oriented architecture. Surely, many of the ‘best of breed’ standards for an Ownership Web that is most complementary to the emerging Semantic Web will be found in Common Point Authoring™.
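
As one small, purely illustrative suggestion of how the two webs could meet on the W3C's own terms, the shareable face of a registered informational object might be published as RDF, so that Semantic Web agents can link to it. The namespace, identifiers and predicates below are invented for the example; only the rdflib library calls themselves are standard, and rdflib is assumed to be installed.

```python
# A small sketch of a data bank exposing the *shareable* face of a
# registered informational object as RDF/Turtle. Namespace, subject
# and predicates are invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef

OWN = Namespace("http://example.org/ownership-web/")

g = Graph()
obj = URIRef("http://example.org/ownership-web/object/0001")
g.add((obj, OWN.registeredBy, Literal("producer-42")))
g.add((obj, OWN.immutable, Literal(True)))
g.add((obj, OWN.hasDataElement,
       URIRef("http://example.org/ownership-web/element/124-007")))

# Prints a Turtle rendering of the graph for Semantic Web consumers.
print(g.serialize(format="turtle"))
```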

2093760-1723853-thumbnail.jpgEPCglobal is a private, standards-setting consortium governed by very large organizations like Cisco Systems, Wal-Mart, Hewlett-Packard, DHL, Dow Chemical Company, Lockheed Martin, Novartis Pharma AG, Johnson & Johnson, Sony Corporation and Procter & Gamble. EPCglobal is architecting essential, core services (see EPCglobal's Architectural Framework in the thumbnail to the right) for tracking physical products identified by unique electronic product codes (including RFID tags) across and within enterprise-scale, relational database systems controlled by large organizations.

Though it would be a natural extension to do so, EPCglobal has yet to envision providing its large organizations (and small businesses, individual supply chain participants and even consumers) with the ability to independently author, track, control and discover granularly identified informational products. See the blogged entry EPCglobal & Prescription Drug Tracking. It is not difficult to imagine that the Semantic Web, without a complementary Ownership Web, would frankly be abhorrent to EPCglobal and its member organizations. For the Semantic Web to have any reasonable chance of connecting itself into global product and service supply chains, it must work through the Ownership Web.
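
If EPCglobal ever did make that extension, the pairing might look something like the sketch below: a physical product's electronic product code indexed against the granularly identified informational objects that supply chain participants register about it. This is a hypothetical illustration of the idea, not EPCglobal's architecture or any part of its standards.

```python
# A hypothetical sketch: indexing a physical product's electronic
# product code (EPC) against the informational objects that supply
# chain participants register about it. Identifiers are invented.

from collections import defaultdict

class InformationalProductIndex:
    def __init__(self):
        # EPC of a physical product -> list of (participant, object_id)
        self.by_epc = defaultdict(list)

    def attach(self, epc, participant, object_id):
        """A participant registers an informational object against an EPC."""
        self.by_epc[epc].append((participant, object_id))

    def discover(self, epc):
        """Anyone downstream - even a consumer - can discover which
        registered, immutable objects describe this physical product."""
        return list(self.by_epc[epc])

index = InformationalProductIndex()
index.attach("urn:epc:id:sgtin:0614141.107346.2017",
             "rancher-17", "urn:databank:obj:0001")
index.attach("urn:epc:id:sgtin:0614141.107346.2017",
             "processor-A", "urn:databank:obj:0002")
print(index.discover("urn:epc:id:sgtin:0614141.107346.2017"))
```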

Ownership Web: Where It Will Begin

The Ownership Web will begin along complex product and service supply chains where information must be trustworthy, period. Statistical reliability is not enough. And, in fact, the Ownership Web is beginning to form along the most dysfunctional of information supply chains. But that's for discussion in later blogs, as the planks of the Ownership Web are nailed into place, one by one.


[This concludes Part II of a three part series. On to Part III.]

Thursday
May222008

Personal Health Records, Data Portability and the Continuing Privacy Paradigm

Google began offering online personal health records (PHRs) to the public this last Monday. So earlier this week I clicked over to Google Health and signed up for an account.

I was offered the choice of conveniently entering in my medical profile (e.g., age, sex, blood type, allergies, test results, immunizations, etc.). Or, though it was a moot point, I could also upload any medical information of mine that might already be held by Beth Israel Deaconess Medical Center, Cleveland Clinic, Longs Drug Stores, Medco, CVS Caremark, Quest Diagnostics, RxAmerica, or Walgreens Pharmacy.

And then I did not fill out any of my medical profile information.

I did not because while it looks like a beautiful information garden that Google is offering, it’s nonetheless a garden that has been charted within a continuing privacy paradigm that unnecessarily allocates more power into the hands of the garden's gatekeeper (i.e., Google) than to the actual gardeners themselves (like you and me).

The privacy paradigm in this instance is not perpetuated by the Congressional Health Insurance Portability and Accountability Act, but by Google’s Health Terms of Service …

“If you create, transmit, or display health or other information while using Google Health, you may provide only information that you own or have the right to use. When you provide your information through Google Health, you give Google a license to use and distribute it in connection with Google Health and other Google services. However, Google may only use health information you provide as permitted by the Google Health Privacy Policy, your Sharing Authorization, and applicable law. Google is not a "covered entity" under the Health Insurance Portability and Accountability Act of 1996 and the regulations promulgated thereunder ("HIPAA"). As a result, HIPAA does not apply to the transmission of health information by Google to any third party.”

Now, don’t get me wrong. Google Health is a worthy service by the standards of the prevailing privacy paradigm. There are going to be a number of people who will choose to enter in their medical information.

Notwithstanding, Google Health indeed represents the predominant privacy paradigm of data possession by online companies, at a time when the momentum is beginning to shift toward a 'data ownership' paradigm for truly empowering information owners and producers with technological possession of their own information - see Dataportability, Traceability and Data Ownership.

Furthermore, it is difficult for me to imagine Google convincingly claiming the moral high ground for PHRs when I read about U.S. Senators, in a Congressional hearing held the day after Google Health was made public, pressing executives from Yahoo, Google, and Cisco Systems “to justify their business practices in China and other Internet-censoring countries”. See Senators weigh new laws over China online censorship.

An article in The New England Journal of Medicine was recently published entitled Personally Controlled Online Health Data — The Next Big Thing in Medical Care?

“Most physicians in the United States have paper medical records — the sort that doctors have kept for generations. A minority have electronic records that provide, at a minimum, tools for writing progress notes and prescriptions, ordering laboratory and imaging tests, and viewing test results .... Yet electronic health data are poised for an online transformation that is being catalyzed by Dossia (a nonprofit consortium of major employers), Google Health, Microsoft HealthVault, and other Web services that are seeking expanded roles in the $2.1 trillion U.S. health care system.” [emphasis added]

If you choose not to gain access to the full text of this article by subscribing to The New England Journal of Medicine, see Internet Health Records: Convenience at a Cost? by Joanne Silberner on the National Public Radio website (which is available in both text and a 4m 36sec audio).

Silberner does a professional job but it seems like once you have read (or listened) to one of these articles about PHRs, you have pretty much read them all. What she familiarly relates is that despite the involvement of Dossia, Google Health, and Microsoft HealthVault, creating and maintaining a full health record may be a job for the compulsive and, on top of that, medical records experts are worried about privacy.

Holding that thought ...

... let’s pause for a moment to jump from the world of PHRs over to current events vis-à-vis data portability within social networking. Michael Arrington blogs a very neat summary in Data Portability – It’s The New Walled Garden at TechCrunch.

"A huge battle is underway between Google, MySpace and Facebook around control of user profiles and, therefore, users themselves …. Internet giants know that the days of getting you to spend all of your time inside their walled gardens are over. So the next best thing is to at least maintain as much data about the user as possible, and make sure they identify with your brand while they are out there not being on your site …. The companies with the profiles (mostly MySpace and Facebook) know this. And they know that to keep users happy, and to stop them from entering in all that friend data into other sites, they need to make their data at least somewhat portable. Not too portable, mind you. That means they’d lose control. But just portable enough …. [emphasis added]

 Arrington further states ...

Google is a little different. They don’t have a social networking presence in the U.S., so they are trying to get in the middle between the guys with the profiles (like Facebook) and the sites that want the data. Their Friend Connect product does just that, and makes them an important data middle man. That position can later be leveraged intensely. In fact, in many ways Google can become the most important social network without actually having a social network." [emphasis added]

In other words, Google's Friend Connect provides it with an opportunity to place MySpace and Facebook within a Google ‘picture frame’ from the perspective of internet users. And that picture frame is the opening to a walled garden of data – yours and mine. Whoever controls the entrance to the garden controls ... well, you get the picture, right?

And, coming back to the world of PHRs, it takes no imagination to conclude that Google might do the same with Google Health. That is, Google Health as 'a picture frame' for Microsoft HealthVault, Dossia, etc.

But for all these machinations inside of Silicon Valley the question still goes begging ...

  • Is there to be a critical mass of internet users who will actually put their medical profile online under the current privacy paradigm?

Speaking only for myself, the answer is ‘no’.