DataPortability In-Motion Podcast: Episode 10 (Mark Adams)

Saturday, November 22, 2008 at 8:33PM

This is a transcription for filing in the Reference Library to this blog of the substantive dialogue found in Episode 10 of the DataPortability: In-Motion Podcast. The series is co-hosted by Trent Adams, founder of Matchmine, and Steve Greenberg, CEO of Ciabe.

The guest in Episode 10 is Trent’s brother, R. Mark Adams. He is a genetic engineer who earned his Ph.D. in cell biology and was a pioneer in the field of bioinformatics. He is currently a Senior Associate at Booz Allen Hamilton and runs their bioinformatics group. Here's an interesting excerpt from the transcription:

"... these issues that have been eating away at us in the informatics field for twenty years are suddenly sort of appearing on the public stage in a very profound way, and it's going to be interesting to see how, as the parallel development of the sort of data portability and social networking and these things move in parallel, how that's going to enhance, extend and enable, uh, people to actually take more control of the electronic health record like we talked about [regarding] Google or HealthVault with their own genetic data and their own exploitation of that data in a secure way. Very, very interesting and profound. And I think very challenging for data portability in the future. You know, it matters."

The transcription picks up at the 1:47 minute mark (after announcements and banter) and goes straight through to the 48:03 minute mark when Mark Adams leaves the stage.

1:47 [Mark Adams] Uh, sure. Uh, again, as you said, I run the bioinformatics group here at Booz Allen. One of our particular interests is in providing, uh, bioinformatic strategy to people, uh, working on problems within the healthcare and research space. And as a result, obviously, this ... involves and touches a lot of the issues that the [DataPortability In Motion] here talks about, uh, most importantly around areas of data security, uh, data safety .... Uh, you know, where does that data go and how is it managed? How do we know who's who? Obviously, within biomedical informatics and in healthcare in general these issues are particularly important, ah, and we're spending a lot of our time and effort, uh, working on coming up with ways to manage and deal with that. That's obviously why we are very interested to ... to talk to you guys about, you know, where some of the natural synergies are between the work that we’re doing kind of in the medical/scientific space and the work that you guys are doing on the social space because we think increasingly, uh, these two spaces are coming together, both because scientists and physicians and healthcare folks themselves are innately social and are beginning to leverage the social networking tools increasingly, uh, but also obviously because, you know, some ... of the tools, functions and activities that you guys have been doing in this space are maybe directly applicable to the things we are doing and vice versa. So I'm really excited to be here. I think it is really interesting. And one of the things we have been doing a lot of is providing informatics strategy - bioinformatic strategy - for a number of different clients, uh, both government as well, as well as private clients, primarily pharmaceutical companies [and] biotechs. But our, you know, kind of the main thing we've been doing lately with the government is [that] we have been leading government's cancer Biomedical Informatics Grid project, or caBIG.
Uh, it is an open source initiative - large scale - for providing standardized infrastructure for, uh, cancer research data. For providing a mechanism for cancer researchers across the country and across the globe to share data. Not surprisingly to you guys the cornerstone of that effort is standardized data representation, common data elements, standardized vocabularies, messaging systems, things that you'd be familiar with. But, you know, there's a flip side to that and these things are not necessarily intrinsically technical - or not completely technical. Uh, they have an important social component as well, and, you know, one of the reasons that, that some of these efforts are successful, and that some of them aren't, often comes down to how the socialization aspects, or the community aspects, of these projects work. Again, because it's an open source project, uh, you know, it isn't being driven by a particular company, uh, but rather it's driven by the community views.

4:32 [Steve Greenberg] So then could I say then that the question of data portability is actually not a technology question? It is actually a business model and [an] operational model question?

4:44 [M. Adams] It is and in fact I would go even one layer further and say that it's a scientific/technical question, right? I mean from a, from a data portability standpoint .... Or let's talk data standards .... Again, you know, data standards exist. Of course, you've probably heard a million times that the great thing about standards is that there are so many of them. That's true in biomedicine as well as anything. There are several ... standards for, uh, managing reporting and sharing around clinical information - clinical trials information or medical/clinical information - messaging standards - standardized vocabularies. You've perhaps heard of things like HL7 - Health Level Seven. Uh, HL7 is a, is a data standard, a messaging standard, that’s been around for a long time. Uh, they recently (well, maybe not so recently now) went to version 3, HL7 v3 that’s called ... which is an XML based [version]. They essentially migrated what they were doing to XML. So, uh, in doing that they've created some new opportunities but also some new complications, obviously. Uh, that overlaps in a significant way with another XML based effort called CDISC [Clinical Data Interchange Standards Consortium]. Uh, this is another data standards consortium looking at sharing clinical data primarily like a pharmaceutical company might collect and share with a ... uh, the FDA [the U.S. Food & Drug Administration] or ... with a trial organization. The FDA does not accept CDISC [protocols] right now, although they are working on it ....

6:11 [Greenberg] So ... so is the work of these organizations to codify the data transfer standards or is it to, uh, codify, uh, the access policies for the data?

6:24 [M. Adams] Well, you know, that's a really, really important question but it's very complicated, and it’s the former not the latter.

6:31 [Greenberg] And if it was easy they wouldn't pay you so much.

6:33 [M. Adams] Well, you know, the data ... the access ... you know, it's fascinating because, you know, think about the criticality of health data, right? Uh, if you're at a pharmaceutical company and you're transmitting or sharing clinical trials data like the basic data behind, uh, you know, what you would need to approve a drug ... considering that they may have spent literally hundreds of millions of dollars doing that activity, the, the data itself is very valuable and its proprietary nature [makes it] doubly so. Right, uh, if data's got to be transmitted, it's got to be transmitted correctly. It's got to be secured. It's got to be safe. Uh, data privacy ... uh, this is as true for health records .... I mean all this stuff, talking about news ... although this isn't news this week, it was news last week ... uh, and that's Google, uh, starting up, you know, their own EHR [electronic health record], right? Uh, you know ....

7:22 [Greenberg] So, actually, I will, I will repeat my, uh, prediction for this podcast. I don’t know if I ever made this prediction, Trent. I believe that in ten years Google will make more money off EHR than off search.

7:38 [Trent Adams] Yeah, you said, you said that last week .... Yeah.

7:40 [Mark Adams] That, uh, that I don't think is an unreasonable prediction. Uh, I think one of the fascinating ... I don't think it's a coincidence, uh, that Google Health - their EHR - is getting a lot of sort of very interesting traction in ... interesting communities, right? In patient communities. You know, obviously, [Booz Allen Hamilton has] been looking very closely at it. Uh, I've got a Google Health account. Signed up for it [the] first day. Why, partly because I spent a lot of time recommending to clients, agencies, executive leadership, how these things are and work. I figure I might as well at least make sure I eat that particular peanut butter. But it's, uh, not a coincidence that Microsoft, for example, doesn't brand HealthVault [as a Microsoft product, or at least] not in a very obvious way. Uh, largely because, uh, you know, the public, you know, isn't necessarily likely to trust a Microsoft with their health data. Now maybe they will. I mean, I would be curious to see how your [prediction] plays out. [That is,] will a significant amount of Microsoft's revenue be off of EHR. But it's funny if you say that ... if you say, well, we think ten years from now Microsoft will be making more money off its EHR than it’s making off of its desktop products, or something ... that doesn't ring as true and I would argue [that] it doesn't ring as true ....

9:02 [Steve Greenberg] Well, Microsoft is ... actually pursuing a lot of very interesting strategies, uh, one of which is they're slowly trying to move the market toward an authentication scheme called CardSpace. And what that will let them do is actually have real privacy and real traceability of access on the Web.

9:26 [Mark Adams] Well, sure, again though, I mean, and this gets back to your original point, and that's - is it technology or is it sociality (sic) ...? There are many technologies for providing security and safety, right? There are many of them. All kinds of different ways of approaching that. Uh, whether or not they get taken up is a question more of socialization. It's a question of market uptake. It's a question of what people do and whether, you know, you can have the best algorithms in the world and if people don't trust you, they are not going to trust you. They won't use it. And so for that reason ... I think it is very interesting and I bring it up to say, clearly the smart guys down there in Redmond [i.e., Microsoft] are thinking about this and how ... they're branding their product, right? Not necessarily what they are doing technologically - which is of course also interesting as you point out with CardSpace and so on - but how they are branding their product probably will have more to do with whether or not it actually gets used than the particular technology that they end up using.

10:29 [Steve Greenberg] Well it actually ... I mean it raises an interesting question which is to say that, uh, for organizations ... for [a] research organization or [a] government organization ... that has substantial resources to bring to bear ... they have their own operations group ... they can ... and they tend to be transferring data in a relatively small number of very large chunks ... then encrypting it and signing it and getting it all in the right format and things like that ... is something that they can reasonably be expected to do. But when we ... when we think about sort of bringing this out to the masses for EHR and for banking records and for things like you know ... [while] you were talking about the [caBIG] project, I was thinking about Folding@home, and ... there are all [of] these interesting questions about how, you know, if I send a bit of computation off to your computer and then you do it, how do I know when it comes back that you weren't trying to game it, or that you actually did the computation that I asked you to [do]. So there are a whole bunch of really interesting trust issues that I assert have yet to really be worked out in a serious way before these technologies can go large scale.

11:48 [M. Adams] It's interesting, you know, I mean, you know, in a sense it’s always ... a question of one's vision, alright? Or one's level of focus. Because on one hand that's certainly true, especially in clustered and distributed computing ... and, again, you know, we're running a large scale distributed effort, right? In this case it's distributed data, it's federated data, not distributed computing, but the same things are true and probably the most important data that's being collected, you know, and managed is provenance or quality, right? It's data about data. It's metadata. You know, who did this experiment? You know, how did they do it? You know, these things are at least as, and maybe more important in many ways than the actual underlying data itself. But the methods and means to which we manage that ... and I don't necessarily mean even technologically speaking because there are several good metadata standards ... it’s how they're used for search, for analysis, [and] for presentation. I've got a slide in one of my decks I like a lot and it shows ... a scan of the first edition, first volume of the Journal of Transactions of Royal Society of England, right? This is, uh, the first scientific journal ... 1665 ... Newton and Hooke and a bunch of other guys you probably have heard of get together and say, hey, we're gonna publish our scientific work. Before that, you know, science was kind of like alchemy, right? You did it in secret. You didn't tell anybody what you were doing. It was all a big solitary thing. These guys got together and said, we're going to have regular meetings under the auspices of a royal society ... kind of a club ... and we're going to publish it and not only are we going to publish it but we're going to send it to anybody who asks for it. [These papers] got sent all over the world.
In doing that these guys essentially created the enlightenment, and the biggest boom in scientific thought ever, and we're still being propelled forward by that, right? So, here's this incredible thing that they got by coming up with a way that they could share their experiences and get credit for it, OK? And get credit for it. Don't forget that.

13:47 [Greenberg] So, so is that the very first data portability for scientific data?

13:50 [M. Adams] I think it was .... There was a standard way to represent it. There was an article format, right? You'd send it in .... It was a letter and they would publish it. There was a letter and it would say, Dr. Hooke did present to us, uh, some experiments most fortuitous on springs and here's what he found, right? It was a way that these guys could share data, and it was interoperable in the sense that any scientist who could read the language - English - could understand it. Uh, and they could then write back their responses which might get included in another transaction. So that's point one, right? So you have this picture. Now, then I turn and say, well great, how do we share scientific data today? Guess what? We share it in journals. The only difference ... Newton would recognize these journal articles. They're not that different. They look the same. They have more or less the same format. Uh, you know, everybody kind of evolved more or less the same. We learned this in elementary school about what you do for an experiment. Have an hypothesis and you do some stuff. Newton could read and recognize the scientific journals of today. They would not be dissimilar. And in fact when we think about how we communicate scientific information, we publish in journals, we have meetings ... I just came back from a ... scientific meeting in Ohio .... We present it to one another .... We discuss it .... I mean the royal society model - 400 years old - is still the prevalent model. However, you know, the only real difference is that now the journals have colored covers, you know [rather than leather-bound covers]. However, you know, where we are with data - and by this I mean scientific data, think [of] the human genome or, you know, things like gene expression profiling or proteomics or nanotechnology or [physics and the use of super colliders which may generate gigabytes per second or] any number of fields generating vast amounts of data ....
So all of this data, all of a sudden, just presenting your summary results isn't accomplishing the scientific purpose, right? And the scientific purpose being [that] we want other scientists to understand what we did well enough that they could go do it themselves and verify it. Or not. Because what propelled the enlightenment forward wasn't just the fact that these guys were sharing data, it was that they were sharing enough about what they were doing that, uh, they could, uh, get other people to be able to verify it and so it wasn't just Mr. Hooke's opinion, it was a scientific, verifiable fact.

16:08 [Greenberg] That, that raises a very interesting, uh, side point which is that, you know, I as a layman tend to sort of when I see something interesting I go and try and read, read the studies when I see something in the news, and so much of what I see is, uh, meta-analysis that [for instance] some [medical doctor] went into a statistical evaluation of fifty studies but we don't know the provenance of those particular studies and we don't know the statistics that those particular people used to get their original conclusions.

16:39 [M. Adams] And, most importantly, you can't, and I will say in no case could you go back and get the original data and do it yourself.

16:47 [Greenberg] But now what we are talking about ... so if the essence of science is reproducibility then making [available] this original data is really good science.

16:59 [M. Adams] Well, that's the point. Right now we're saying [that] beyond these journal articles where [the author summarizes] and you just have to take their word for it ... that's [bad science]. 'Take my word for it' is bad science, across the board. All scientists recognize that. Now, I'll sidebar it and hit a point in a second which I think is important and bears mentioning but the point is that when you can get the original data ... and in fact the [human] genome project ... probably the largest aggregation of annotated scientific information that you can have access to, is all open. You could, you could sit in your basement, write code to analyze that, come up with conclusions, uh, potentially publish them, uh, without any authorization on the part of some agency in power, right? It's all there. Uh, if you go do a meta-analysis, as you say, of the human genome project data, or in fact in some case many other data sets that are publicly available, because the data is made available via the web, uh, then you can at least have a reasonable shot at doing the work. And in fact [many] of the tools ... are open source and freely available. Many of them are exposed to [grid] services, uh, that if you have the right experience and the right tooling you can call the same grid services and carry out the same analysis. And in fact the guys ... one of the groups on our project up there in Boston ... the GenePattern guys at the Broad Institute at MIT and Harvard ... the whole point of the GenePattern tool set is to provide a set of tools which intrinsically by their design capture the workflow and record what happened in a way that it can be shared with colleagues later in a fully reproducible way. So this now becomes a way of sharing results and the process for getting there in a way that if I were to give you that, you know, that data set and that GenePattern workflow you could go reproduce it exactly the way they reproduced it.
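
[To illustrate the idea of a tool that records its own workflow so a colleague can reproduce a result exactly, here is a minimal sketch in Python. It is not the API of the tool set mentioned above; the function names (fingerprint, run_workflow, replay) and the two toy analysis steps are hypothetical, chosen only for illustration. The sketch assumes the data can be serialized as JSON and that every step is a deterministic function.]

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a SHA-256 hex digest of the JSON form of obj (hypothetical helper)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


# Two toy analysis steps, standing in for real analysis modules.
def drop_below(values, threshold):
    return [v for v in values if v >= threshold]


def scale(values, factor):
    return [v * factor for v in values]


# Registry so a recorded step name can be mapped back to runnable code.
STEPS = {"drop_below": drop_below, "scale": scale}


def run_workflow(data, steps):
    """Apply each (name, params) step in order and record what was done."""
    record = {"input_sha256": fingerprint(data), "steps": []}
    result = data
    for name, params in steps:
        result = STEPS[name](result, **params)
        record["steps"].append({"name": name, "params": params})
    record["output_sha256"] = fingerprint(result)
    return result, record


def replay(data, record) -> bool:
    """Re-run a recorded workflow; True only if input and output both match."""
    if fingerprint(data) != record["input_sha256"]:
        return False  # not the same data set the original analyst used
    steps = [(s["name"], s["params"]) for s in record["steps"]]
    _, new_record = run_workflow(data, steps)
    return new_record["output_sha256"] == record["output_sha256"]


raw_data = [1, 5, 10]
result, record = run_workflow(
    raw_data,
    [("drop_below", {"threshold": 5}), ("scale", {"factor": 2})],
)
```

[The record is plain data, so it can be shared alongside the raw data set; anyone holding both can call replay to check that the published result follows from the published inputs.]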

18:53 [Greenberg] Do you know what I have always felt [to be] fascinating as a follow-up point to that, is that ... in the lay community, the 99% of us who are not scientists, what we hear is that, uh, a study is released [in which] they proved 'X', this is good for you, this is bad for you, this is important, this is not important, and then six months later somebody comes out and says, uh, well, that’s, that's not true, right? And, so, I'm always very interested in these sorts of stories, and try to follow up on them, and the reason in the ones I've looked at is almost always the case of bad statistics. That the data was OK, and even for things that are peer reviewed, if ... the statistical analysis had an error it's not caught and that by making this data available you could have as part of the peer review process, they could double-check the statistics.

19:53 [M. Adams] Well, sure, I mean, you know, this is actually, I said I was going to sidebar a point, but you made it for me, Steve, and one of the reasons that this is the case, right? I mean, one of the reasons that people don't like to for example share the raw data behind these large longitudinal works that you describe, they will make the claim that well if I share it people don’t really understand that data set, how it's collected, and all the metadata about it, that they won't analyze it right and they'll make a mistake. Now, I always warn people to say that, you know, there's a world of difference between a mistake and 'don't agree with me'. And sometimes that distinction isn't all so clear in our community, right? The [position taken of] 'you must have made a mistake by definition because you disagreed with me'. Uh, I think there is a little bit of a concern that, you know, that if some lay person, or some person who doesn't understand the statistics as well as the original investigator, gets their hands on the data then he or she will, uh, interpret the data in a mistaken way and in all caps CAUSE PROBLEMS, right? That's not by the way a belief that I share. Uh, you know, the whole point of science is to ensure that when these mistaken things happen that there's enough people out there looking at them who can then publish a correction [and say] no, they did the statistics wrong, or, they made an error in their analysis. Unfortunately, there's a lot tied up in here, right? And particularly in biomedicine it’s not just, uh, somebody's ego, sometimes it can be, you know, national policy [that says] should we stop using this product? Uh ... it could be matters of economics [which says] if whole milk is bad for you, what does that do to the dairy industry?

21:43 [Greenberg] Look at the policy argument that came up over the statistical sampling for the census seven or eight years ago, right? So there's a whole argument to be made here but I tend to take the opinion that, uh ... by publishing the statistical methods you can, uh, make sure that your work is double-checked as part of the peer review process. If and only if to make sure that before it gets published it’s double-checked.

22:14 [M. Adams] And if and only if they've got access to the underlying data because the truth is that having the statistical methods, I mean, and this just comes down to practice, more true in epidemiology, uh, and so on where more data comes from survey or, you know, instruments that are not necessarily ... technical ... but, you know, where I've surveyed the community, I've asked questions, and anybody who does this can tell you that the creation of those surveys, questionnaires, and so on is a key science and art in and of itself, right? Without having access to the raw data, the way that the surveys were constructed and everything to complement the statistical analysis, those papers will detail the statistical analysis, that actually is usually fairly thoroughly detailed. What's often missing, or at least often missing to the degree that it needs to be in order to reproduce the results, is access to the underlying, raw data. If I wanna go back and verify independently, access to that raw data is very important. And, again ....

23:10 [T. Adams] Well, isn't it true though that up until, you know, relatively recently [that] it was, uh, technically infeasible to push the data sets around so really the only thing that could make its way into the broader distribution is the summary.

23:26 [M. Adams] You know what, Trent, that's it in a nutshell. That's kind of the point I made about journals, right? You’re still printing the same thing, right? And I mean, and to be honest, these issues over access to data didn't start now, they started - guess again - [in] the seventeenth century, right? Remember Flamsteed and Tycho Brahe had been collecting data, in this case, you know, observations of stars ... I mean Newton goes and gets all this data from Huygens or Brahe or whomever, uh, and then comes up with his theories. Well, OK, was it Flamsteed the astronomer or Huygens or was it Newton [who should be given credit]? And it wasn't easy to get that data. Now, partly, as you point out, Trent, because the way you would get it is as a giant handwritten book, right, full of observations, he really didn't want to go copy it for him. But also ... not so different from our scientists today, the astronomers who collected those observations, that was their intellectual property. That was their work, right? So if you have a statistician who creates a big survey and goes to the trouble of getting it out there [to] all these people and collects it, that's their work and they would like to continue to mine that. They don't want some graduate student in Finland to go do the next big thing on it. And, worse yet, god help him if he goes and does the next big thing and contradicts their work. So, you know, the issues here are complex, right? It's always nice if there is a sort of simple scientific truth and that's what we'll go with, but there are issues of objectivity, safety, data privacy, and so on that get in there as well.

23:58 [T. Adams] Well, in fact to that exact point we've been talking a lot about, you know, the free form or free flow of data and the need for it, but touching more closely on the security and controls and protocols around that might be interesting. Uh, we kind of ... touched upon the CardSpace metaphor, and we've talked in the past about the Higgins Project and, you know, kind of their approach to that .... How do you come down or what's your approach to answering the question of you know (a) privacy, confidentiality and then (b) also security and veracity of the data?

25:35 [M. Adams] Well, you know, and again as I think I said before, I think there are many, many security approaches and frankly ... most of them are probably going to work. You know, we, number one you have to remember with regards to .... privacy of health data, you're bound by law, right? There are regulatory issues. You may have heard these things before but, you know, there's issues like HIPAA. Uh, you know, people talk and throw around HIPAA a lot but I mean in essence HIPAA is pretty simple. It says that a covered entity like a hospital can share that health data internally to that covered entity to the folks who need to see it, uh, but the covered entity's liable when that data gets out, right? So if that data ends up in the hands of a ... you know, outside of the covered entity, and is the result of the covered entity's, you know, not doing what they were supposed to do, uh, that covered entity is liable, OK? So HIPAA, and you know this is why when you go to the doctor, you know, these days, you go to the doctor and the first thing they give you is a HIPAA form, and you are authorizing that doctor to act on your behalf, right, in moving the data around. Very interestingly, I mean, not to put too fine a point on it, when you use something like, uh, Google Health, you essentially are waiving your HIPAA rights. [I don't want to say waiving them], you're taking control. You're taking ownership of that data.

26:57 [Greenberg] But isn't, isn’t Google then the covered entity?

27:00 [M. Adams] You know, and I've always wondered exactly how that's going to work. I ... I won't even say. I just don't know enough to have an answer to it. But I'll say this, it's complicated, OK? Because when you, you own your health data, if I go to Google and I tell Google go give my health data to this other entity then, you know, you did it, right? Nobody's acting on your behalf. Now, what I don't know is that if Google were to make an error, or to get hacked, or something bad were to happen, and unauthorized entities were to get access to that data, what the limits of their liability would be. I don't know. I don't know if it would be like a hospital, for example. I assume it would be but, you know, I mean here's why it matters. Let's back up a step, and we're talking a bit about security, right? We, we from a security standpoint use a standard security model, right? Uh, we are using a security model that is derived from the Globus framework, which is what we use for our grid tooling. It's called the common security model and it’s been developed in a number of places but it ... at NCI [National Cancer Institute] as well. But it makes use of all kinds of stuff you would find familiar, certificate granting authorities, uh, you know access ... controls that are based on possession of standard X.500 certificates, uh, uses a tool called GridGrouper which is another derivative of the Globus project which allows you to, you know, essentially create groups, access control groups, to your data, information or resources, based on groups or individuals whose authentication may have occurred at another institution, one who could do a solid authentication but then the authorization to the local tools is made using these [unintelligible]. We work quite a bit with the folks who did the Shibboleth Project, right? GridShib, uh, in creating the context. And all of this stuff is on the website and there are folks who know vastly more technically than I do.
The point is, take a standard security framework that’s based on well-accepted, well-understood tools. Use standard tooling, like X.500 certificates provide that, and then think very hard about what the context of access to that information is. What the restrictions should be. The data sharing and intellectual capital group within the project, uh, which just purely spends its time thinking about these issues and coming up with ways to evaluate what kinds of things, uh, you know you should provide access to and under what context, and working with people like John Wilbanks at the Science Commons Foundation is a way for us to think about creating license models or flow charts or work flows that will allow people to sort of make determinations about what kind of security does any kind of data need. And it's our goal at the end of the day, uh, to provide, starting at [NCI caGrid Level 2 Certificate Authority], and then working our way upwards for, you know, the strength of the security, you know, again, just going under standard federal models for this stuff, uh, and then helping people think about, well what kind of strength of security do you really need? Isn't click-thru EULA enough? Uh, is it something that can be open-access? Uh, is this something that needs a bi-directional materials transfer agreement between entities and strong authorization before they can get it? Maybe identifiable health data might fall under that category.
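
[The questions posed above (is open access enough, is a click-through EULA enough, or is a materials transfer agreement with strong authorization needed?) can be pictured as a small decision procedure. The following Python sketch is only an illustration under stated assumptions, not the project's actual security tooling: the class and function names are hypothetical, and the rule that proprietary data needs a click-through license is an assumption added for the example. Only the suggestion that identifiable health data might need the strongest tier comes from the discussion. The sketch also mirrors the split described earlier, where a user is authenticated by a home institution but authorized locally through group membership.]

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, Optional


class AccessTier(IntEnum):
    OPEN = 0                # anyone may access the data
    CLICK_THROUGH = 1       # user must accept a click-through license (EULA)
    TRANSFER_AGREEMENT = 2  # transfer agreement plus strong authorization


@dataclass(frozen=True)
class DataSet:
    name: str
    identifiable_health_data: bool
    proprietary: bool


@dataclass(frozen=True)
class Requester:
    authenticated_by: Optional[str]  # home institution that vouched for identity
    groups: FrozenSet[str]           # locally managed access control groups
    accepted_license: bool
    has_transfer_agreement: bool


def required_tier(ds: DataSet) -> AccessTier:
    """Pick the weakest tier that is still adequate for this data set."""
    if ds.identifiable_health_data:
        return AccessTier.TRANSFER_AGREEMENT
    if ds.proprietary:  # assumption made for this example only
        return AccessTier.CLICK_THROUGH
    return AccessTier.OPEN


def may_access(req: Requester, ds: DataSet, allowed_group: str) -> bool:
    """Decide access: identity is checked elsewhere, authorization is local."""
    tier = required_tier(ds)
    if tier == AccessTier.OPEN:
        return True
    if tier == AccessTier.CLICK_THROUGH:
        return req.accepted_license
    return (
        req.authenticated_by is not None
        and allowed_group in req.groups
        and req.has_transfer_agreement
    )


GROUP = "trial-investigators"  # hypothetical group name
genome = DataSet("public genome annotations", False, False)
records = DataSet("patient records", True, False)
visitor = Requester(None, frozenset(), False, False)
clinician = Requester("university-hospital", frozenset({GROUP}), True, True)
```

[Under these assumptions the anonymous visitor can read the public annotations but not the patient records, while the clinician, vouched for by a home institution and placed in the right local group, can read both.]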

30:14 [T. Adams] So to follow that along the, you know, part one is do you guys find that right now it's, there's been a lot of talk, a lot of work in the various standards, competing ones exist, rather than continuing to go down the research road of figuring out which is 'the best', figure out which ones meet the existing criteria and begin implementing them and work iteratively towards a solution.

30:44 [M. Adams] Well, that's what we're doing. I mean, and this is why we haven’t tried to create these things. We've just taken the ones that exist. We've taken an existing federal framework, right? In this case it's a [NCI caGrid certification] framework, right? Uh, you know, in saying how strong the security needs to be, not what is it or what tools does it need, but just, what's the strength? Um, and then essentially re-using these pre-existing tools. I mean like I said, X.500 certificates. These are things that are already widely used, right? Go to VeriSign. So using kind of the components and tooling that are widely available and then implementing them, now again, you know, our project is open. It's an open source project and the tooling that we use is that that's available [as] open source. Um, so we're not burdening our, you know, our users on an open source project with non-open source tooling. But that's really the only restriction. Other than that, you know, our goal is to find something that works, that somebody else did. Make sure it aligns with the appropriate guidelines, whether those be regulatory in nature like HIPAA or whether they be technical in nature or whether they be, you know, standards in nature and then move forward. We're more interested, frankly in kind of what the framework is for, uh, the utilization of the tools. You know, making sure people are appropriately utilizing the tools, than we are in, you know, what the specific tools are.

32:07 [T. Adams] So you know in fact exactly along those lines, you know, as far as, I don't know how close you have been following it, but in the DataPortability Project itself, um, there's an interesting discussion going on, around a couple of different camps. You know, one camp is, uh, you know, a really deep technical stack is required as an output of the organization. And then another camp is, and you know there is a Venn diagram of overlap, it's not like warring factions, you know, something along the lines of a, uh, suggested guidance framework, uh, guidelines, best practices, model that people are able to kind of look at and pick and choose, you know, the things underneath the covers that, you know, would fit the bill.

32:56 [Greenberg] The two camps I would more accurately describe as specify the methods versus specify the outcomes. And each, you know, when you specify the methods you can verify compliance much more easily, and when you specify the outcomes you recognize that there is a, there's still controversy in the methods and, uh, and that you say, alright, you know, we'll let the industry converge on what it wants.

33:25 [M. Adams] How important is it to specify the methods and to what degree of specification? In other words is that a necessary outcome? The way you are making it sound, Steve, is that you kinda have to end up there but there are several routes to it. Or is ... an outcomes specification, uh, a reasonable end-point? Like, hey, you know, as long as these things connect we don't care what happens behind it.

33:45 [Greenberg] Well that is a, uh, a point of active discussion within the organization. I can ... my personal take is that, uh, is that the technology standards themselves are not settled and that, uh, and that what we really need to do is get agreement on what the actual problem is we're trying to solve and then let industry, or researchers, or whoever solve it in the best way that fits their particular needs. Uh, because there's a fair amount of controversy in a lot of these technical conversations and I personally think it's too early to pick a winner.

34:27 [M. Adams] I always wonder whether it's ever time to pick a winner, uh, when it's a particular technology stack, right? I mean, but, you know it's why I asked the question about outcomes. It always seems to me, and again I'm an outsider in this particular discussion, but having the outcome, having a mechanism and saying, look, here's the point at which the exchange is going to take place. We talk a lot about touch points, right? Identifying the touch points and saying, that has to be typed and specified, but when you ride up away from the touch points in these things seems like you can ease back in the specificity requirement and hopefully not have to require a particular technology stack.
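
[Illustration] Mark's point about touch points, that the exchange point is typed and specified while everything behind it stays unconstrained, resembles programming against an interface. The sketch below is only an analogy under that assumption: `RecordExchange`, `FileBackedStore`, `DatabaseBackedStore` and `fetch` are hypothetical names and do not describe any caBIG or DataPortability component.

```python
from typing import Protocol


class RecordExchange(Protocol):
    """The touch point: the only part that is typed and specified."""

    def export_record(self, record_id: str) -> dict: ...


class FileBackedStore:
    """One hypothetical implementation; its internals are unconstrained."""

    def export_record(self, record_id: str) -> dict:
        return {"id": record_id, "backend": "file"}


class DatabaseBackedStore:
    """A second hypothetical implementation on a different stack."""

    def export_record(self, record_id: str) -> dict:
        return {"id": record_id, "backend": "database"}


def fetch(exchange: RecordExchange, record_id: str) -> dict:
    # The caller depends only on the touch point, never on the stack behind it.
    return exchange.export_record(record_id)


for store in (FileBackedStore(), DatabaseBackedStore()):
    print(fetch(store, "sample-001"))
```

Either store satisfies the touch point without sharing a base class, which is the sense in which the specification can ease off away from the exchange itself.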

35:07 [T. Adams] The reason I was asking the question, uh, was if when you guys, Mark, were starting this up and looking at the solutions you needed to pull together, if, if an entity existed or a set of guidelines existed, would it have made your life simpler and would it have been of value or would you have said, well, ok, that suggested set of guidelines is not necessarily in the line with what we are looking at so we're going to go under the covers and look at the technology and redo our own guidelines anyway.

35:45 [M. Adams] Well, look, number one, we did look at the standards. We went with NIST [National Institute of Standards and Technology] standards for security, for example. Uh, we made use of existing grid standards, for example, for, you know, uh, data security infrastructure.

36:00 [Greenberg] But I would argue that the reason you were able to use those NIST standards is that through the '70s and '80s and early '90s there were long running arguments about Kerberos versus you know the other ones whose names I no longer even remember, and that these things were hashed out, and that these were, these were .... What NIST did was ratify de facto standards rather than attempt to assert de jure standards.

36:26 [M. Adams] Almost certainly. I mean, you know how things work in NIST. I mean it's, again what ....

36:34 [Greenberg] But that's the right answer. My point is that's the right answer ....

36:37 [M. Adams] I don't think there's another ... answer. I mean, I think there's a tendency on the part of industry - perhaps our industry although we don't really have one, uh, but there's a tendency on the part of industry - broadly - to try to skip to a technology stack as a means of adopting standards quickly, right? And we see this all the time. I mean, we see this regularly. Like, OK, you've got to use this technology stack in order to influence standards. Now, of course, the reasons for this are simple, right? There's a lot of money tied up in those technology stacks. A lot of money to be made on owning the standard for anything. So when you go sit on these standards committees whichever they be, not the government ones incidentally, at least not as much, but, you know, certainly the industry standards committees, they're loaded with guys who, you know, want to push their technology standard, uh, partly if not wholly because they have a business advantage in doing so. So, you know, one has to be careful in how one creates the standards but one needs them, right? But this is why again I sort of say trying to divorce standards as cleanly as possible from their underlying technology implementation is an important thing to do, but the reason being, it allows you in a kind of an unconfined environment to determine standards that can then be widely adopted and used without the complexity or the risk of creating a lock-in, right? And, you know, I guess especially because we've all been through this sort of tech wars in the 70's, 80's and even the late 90's. Uh, we've all been through that and we know generally that's a mess, and it’s a mess we then have to reverse engineer our way out of.

38:14 [Greenberg] Even so ... so I would say, Mark, that even aside from the commercial, uh, desire to get your own particular proprietary standard adopted there's also a real sort of human impatience at work that people say, oh my god, it's 2008, can't we just skip to the end, we all know what the answer is. We all know what the right answer is. This technology or that technology, can we get there? And the problem is we don't all know that this particular technology is the right answer, and people sort of .... They want to skip that step, but you really can't, and you have to wait and let things play out and then only after, only after there's been real convergence on a standard can you really ratify it. But that's my opinion anyway.

38:59 [M. Adams] Yeah, and I'd agree with that a hundred percent. You know, and that’s, that I think is why this standards process is an important one, and that's one .... You know, not only standards process [unintelligible] but just the fact that people endorse from the beginning that they are going to use the standards.

39:12 [Greenberg] Well, that's why it's a process, right? So agreeing on a standard is a fairly long-term process. It is not simply waking up one day and saying, you know what, this one looks pretty good. Ding! It is so decreed.

39:25 [M. Adams] Yeah, I agree with that.

39:26 [T. Adams] Well, you know, to kind of close out the discussion. Mark was telling me the other day, uh, about entering into some meetings, um, where he was bringing up actually some data portability and Web 2.0 stuff in a kind of interesting context. But I don't remember the details.

39:42 [M. Adams] Yeah, I mean .... I'll give you the anecdote. The quick anecdote. You know, here's what's driving this, right, and it's scientists, uh, and others increasingly, uh, you know, are making use of the electronic technologies as a means to exchange data. You know, they're using our project. They're using others. Using just the web itself as a means to exchange data. It's becoming an important, and increasingly important part of the work. And obviously, just to be honest, an increasing influx of younger scientists who sort of grew up using the 'net or the web as part of their lives is driving that as well. That said I was at the ... a meeting in Ohio last week on the role of bioinformatics in overcoming barriers in cancer research and, you know, I raised sort of, you know, let's call it social networking - Web 2.0 - uh, as a means to create the kinds of communities that, uh, can support and ultimately engender this kind of close cooperation and coordination between scientists the way that we see it in other fields, right? So Web 2.0 is a means to sort of create and nurture communities and also providing an avenue for data exchange is [a] natural fit for scientists who are .... And particularly bioinformatics like I am who are spending a lot of their time exchanging and managing data online. Um, and we see some examples of this by the way. Myexperiment.org from the UK. Great example of a scientific community, uh, which is around, you know, the exchange of work flow, scientific work flows, for analyzing data. People have a .... They have a blog, and they have a profile, and all the stuff you'd be familiar with. But what they're exchanging for the most part are workflows or recipes for, uh, analyzing scientific data and creating a community around that. So, you know, this is being done - SciLink is the other ... kind of LinkedIn for scientists which leverages the natural social networking that occurs with regards to authorship of papers, for example, to create a community. So scientists are starting to do this increasingly and sort of the plea I made at the meeting was, hey, we should [be] kind of thoughtfully and proactively thinking about how to integrate this in our strategies going forward ....

41:46 [Greenberg] I like the idea ... so on Facebook they have a 'hooked-up with' so you should have a 'published with'.

41:53 [M. Adams] Well, SciLink does that. As a matter of fact you can go in there and say, hey, who is this? It's like a ... if you've ever used LinkedIn, right? Hey, how many connections am I linked to, to other people in the same field? Actually, it's even more useful probably than just, you know, hey, I know this guy because he knows this guy that's in LinkedIn. You're saying a field Erdos number. You know about that, Steve?

42:22 [T. Adams] Mark, what's your Erdos number?

42:23 [M. Adams] My Erdos number is three. So, which means .... Steve when we talk about an Erdos number, there's a mathematician, Paul Erdos, who, you know, was an itinerant, basically. The latter part of his life he didn't own a house or anything. He just kind of lived with his mathematician friends and worked on papers. This gave rise to a very broad range of collaborations, as you can imagine. Uh, so, what did that mean? Well that meant that lots of different people collaborated with Paul Erdos on a range of very interesting mathematical concepts. The idea that was given rise to ... it's kind of like six degrees of Kevin Bacon, right? Well, if I published with Paul Erdos then I have Erdos number one, and if I publish with somebody else who published with Paul Erdos then I have an Erdos number of two. So my Erdos number is three. I published with Temple Smith who published with Stanislaw Ulam who published with Paul Erdos. And, again, the reason I raise the point is just to say, these social networking things actually have an existential reality for scientists because it actually matters who you publish with. Uh, because you share fields, you share interests, you share commonalities. It's actually a pretty good way to find potential collaborators. So, you know, and, again, [these are] scientists, right? Let me roll this back and sort of make the point. Well, where else, I mean, we've been talking about, you know, all of these means and technologies to share and exchange data, and we've talked about it in a very abstract sense. Well, there are these guys, scientists, they wear lab coats, they hack on computers in their basements or whatever. Well, you can go to a website today called 23andme [and] for about 900 bucks you can do a half a million nucleotide polymorphism genome scan of your genome. You spit in a tube, send it to them, they send it back with an analysis of your genome. And, you know, for those of you of a certain lay community ... polymorphisms in your genetics can be determinative of all kinds of things. Risk of disease. Reaction to certain medications. Uh, you know, character traits even. I mean, all kind of things. Uh, so, this is a pretty neat thing and the fact that anybody with a thousand bucks .... It's like Gattaca. You know, you can go get a scan and then they have a whole bunch of tools at 23andme for kind of analyzing it. But the fact is that more and more data is going to come in. And better and better refinement around these, [the] predictive ability of these kind of point mutations within the genome in terms of what implications they have, are going to become more and more significant, right? Over time that's going to become more meaningful. So, you're a regular guy. You decide you want to do this, and there's some tools you find, and you can find out things that have import in all different aspects of your life. Now, I say there's a big data set for which there are all kinds of incentives to find things out about, and potentially to, uh, exist in a community of people who are thinking about this, some professionally, and some as amateurs, and some people as pure consumers. How's that going to play out long term with regards to security, a means to safely exchange that data, safely and anonymously exchange that data. To exploit it, both from a positive, and then potentially from a negative standpoint. I mean, this is pretty interesting stuff. And this isn't science fiction. This isn't the realm of Gattaca anymore. You can go do it today. So, you know, this ....
Because of that, you know, these issues that have been eating away at us in the informatics field for twenty years are suddenly sort of appearing on the public stage in a very profound way, and it's going to be interesting to see how, as the parallel development of the sort of data portability and social networking and these things move in parallel, how that's going to enhance, extend and enable, uh, people to actually take more control of the electronic health record like we talked about [regarding] Google or HealthVault with their own genetic data and their own exploitation of that data in a secure way. Very, very interesting and profound. And I think very challenging for data portability in the future. You know, it matters.
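
[Illustration] The Erdos number Mark describes is the shortest path through a co-authorship graph, which a breadth-first search finds directly. The following is a minimal sketch, assuming a tiny hand-built graph: only the chain Adams, Smith, Ulam, Erdos comes from the transcript, and `COAUTHORS` and `collaboration_distance` are hypothetical names.

```python
from collections import deque

# Hypothetical co-authorship graph; only the chain named in the transcript
# (R. Mark Adams, Temple Smith, Stanislaw Ulam, Paul Erdos) is from the source.
COAUTHORS = {
    "Paul Erdos": {"Stanislaw Ulam"},
    "Stanislaw Ulam": {"Paul Erdos", "Temple Smith"},
    "Temple Smith": {"Stanislaw Ulam", "R. Mark Adams"},
    "R. Mark Adams": {"Temple Smith"},
}


def collaboration_distance(graph, start, target):
    """Return the fewest co-authorship hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor == target:
                return hops + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, hops + 1))
    return None  # no chain of co-authors connects the two people


print(collaboration_distance(COAUTHORS, "R. Mark Adams", "Paul Erdos"))  # 3
```

The same search with any other scientist as the target gives the field-specific "published with" distance that the SciLink discussion points toward.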

46:14 [T. Adams] Well ... along those lines then, uh, you know, we started off the show, uh, kind of broadening the scope a little bit and, uh, you know, kind of getting outside our little bubble of social networking sites, talking to each other about who is a friend of who (sic), and we have ended the conversation incredibly well in, in why the work we are doing in the DataPortability project has ... could have a much broader impact and, you know, wider implications beyond, you know, the Silicon Valley, and so, you know, I guess I would just leave, you know, leave the podcast, uh, with this question out to you, Mark, and that is: Do you see value in the community that you interact with as, you know, folks we should reach out to and bring into the discussion at the DataPortability project at this stage?

47:10 [M. Adams] Yeah, I do. I mean, and you know, again, because we're starting to think and talk about, you know, social networking and we're starting to think and talk about data portability beyond our own narrowly construed portability, although, you know, I think Steve makes the point, it's like sometimes you don't know if there ought to be a standard there. You’ve got to play around with it a bit. You know, I'm beginning to wonder whether it might not be a bad idea to maybe invite a couple of DataPortability people to come to our meeting. We're going to have an annual meeting in June [2008] here in D.C. It's free and open to the public. Uh, where people are going to talk about the tooling and the things we are doing. Wondering if we ought to find a couple of people and invite them to attend.

47:45 [T. Adams] So then it sounds as if we are in violent agreement, uh, that (a) uh, we need to send some emissaries your direction, and (b) uh, we need to try to extract some folk from your camp and bring you into the discussions that we're having as well.

48:03 [M. Adams] Yeah, agreed.

[Finishing banter, etc.]

References (29)

References allow you to track sources for this article, as well as articles that were written in response to this article.

Source: DataPortability: In-Motion Podcast - Episode 10

by Host Trent Adams, Co-host Steve Greenberg at Blank (Media) Slate on June 2, 2008

"Trent’s brother R. Mark Adams joins the data portability discussion. He is a genetic engineer who earned his Ph.D. in cell biology and was a pioneer in the field of bioinformatics. He is currently a Senior Associate at Booz Allen Hamilton and runs their bioinformatics group. Of specific interest related to data portability is his work for the open CaBIG (Cancer Biomedical Informatics Grid) project, a National Cancer Institute initiative to link cancer researchers and their data."
Related: CaBIG - The cancer Biomedical Informatics Grid

at Wikipedia

"The cancer Biomedical Informatics Grid®, or caBIG® is an initiative of the National Cancer Institute, part of the National Institutes of Health. caBIG®, is a voluntary virtual informatics infrastructure that connects data, research tools, scientists, and organizations to leverage their combined strengths and expertise in an open federated environment with widely accepted standards and shared tools."
Related: Health Level 7

at Wikipedia

"Health Level Seven (HL7), is an all-volunteer, not-for-profit organization involved in development of international healthcare standards. “HL7” is also used to refer to some of the specific standards created by the organization (i.e. HL7 v2.x, v3.0, HL7 RIM etc.)."
Related: XML

at Wikipedia

"The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.[1] It is classified as an extensible language, because it allows the user to define the mark-up elements. XML's purpose is to aid information systems in sharing structured data, especially via the Internet, [2] to encode documents, and to serialize data."
Related: Clinical Data Interchange Standards Consortium

at Wikipedia

"Clinical Data Interchange Standards Consortium (CDISC) is a non-profit organization, whose mission is "to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare". Their main project, the described data standard, bears the same name."
Related: Google Health

at Wikipedia

"Google Health is a personal health information centralization service (sometimes known as personal health record services) by Google. The service allows Google users to volunteer their health records -- either manually or by logging into their accounts at partnered health services providers -- into the Google Health system, thereby merging potentially separate health records into one centralized Google Health profile."
Related: Booz Allen Hamilton

at Wikipedia

"Booz Allen Hamilton is a private consulting firm and government contractor with headquarters in McLean, Virginia, USA and 63 offices throughout the nation."
Related: Microsoft HealthVault

at Wikipedia

"Microsoft HealthVault is a platform from Microsoft to store and maintain health and fitness information. Started in October 2007, the website is accessible at www.healthvault and addresses both laymen and healthcare professionals."
Related: Windows CardSpace

at Wikipedia

"Windows CardSpace (codenamed InfoCard), is Microsoft's client software for the Identity Metasystem. CardSpace is an instance of a class of identity client software called an Identity Selector. CardSpace stores references to users' digital identities for them, presenting them to users as visual Information Cards. CardSpace provides a consistent UI that enables people to easily use these identities in applications and web sites where they are accepted."
Related: Royal Society

at Wikipedia

"The Royal Society started publishing in 1665, very soon after it was founded, and currently publishes seven, high quality peer-reviewed journals covering: biological and physical sciences; history and philosophy of science; and cross-disciplinary research at the interface between the physical and life sciences. The list includes the world's longest running scientific journal, Philosophical Transactions of the Royal Society."
Related: Broad Institute

at Wikipedia

"The Broad Institute of MIT and Harvard is an American research institute dedicated to the study of genomics for the biomedical sciences."
Related: Tycho Brahe

at Wikipedia

"Tycho Brahe, born Tyge Ottesen Brahe (December 14, 1546 – October 24, 1601), was a Danish nobleman known for his accurate and comprehensive astronomical and planetary observations."
Related: Isaac Newton

at Wikipedia

"Sir Isaac Newton, FRS (4 January 1643 – 31 March 1727 [OS: 25 December 1642 – 20 March 1726]) was an English physicist, mathematician, astronomer, natural philosopher, alchemist, theologian and one of the most influential men in human history."
Related: John Flamsteed

at Wikipedia

"John Flamsteed FRS (19 August 1646 – 31 December 1719) was an English astronomer and the first Astronomer Royal."
Related: Christiaan Huygens

at Wikipedia

"Christiaan Huygens (April 14, 1629 – July 8, 1695) was a Dutch mathematician, astronomer and physicist; born in The Hague as the son of Constantijn Huygens, a friend of René Descartes."
Related: Globus Alliance

at Wikipedia

"The Globus Alliance is an international association dedicated to developing fundamental technologies needed to build grid computing infrastructures. The Globus Alliance was officially established in September 2003, however it was created out of the previous Globus Project that had been established in 1995."
Related: National Cancer Institute

at Wikipedia

"The National Cancer Institute (NCI) is part of the United States Federal government's National Institutes of Health. The NCI is a federally funded research and development center, one of eight agencies that compose the Public Health Service in the United States Department of Health and Human Services."
Related: Shibboleth Project

at Shibboleth Project

"The Shibboleth System is a standards based, open source software package for web single sign-on across or within organizational boundaries. It allows sites to make informed authorization decisions for individual access of protected online resources in a privacy-preserving manner."
Related: X.500

at Wikipedia

"X.500 is a series of computer networking standards covering electronic directory services. The X.500 series was developed by ITU-T, formerly known as CCITT. The directory services were developed in order to support the requirements of X.400 electronic mail exchange and name lookup. ISO was a partner in developing the standards, incorporating them into the Open Systems Interconnection suite of protocols. ISO/IEC 9594 is the corresponding ISO identification."
Related: John Wilbanks

at Wikipedia

"John Wilbanks is an American entrepreneur, scientist, and engineer. He is a vice president of Creative Commons and the executive director of Science Commons as of 2008."
Related: Science Commons

at Wikipedia

"Science Commons (SC) is a Creative Commons project for designing strategies and tools for faster, more efficient web-enabled scientific research. The organization identifies unnecessary barriers to research, crafts policy guidelines and legal agreements to lower those barriers, and develops technology to make research data and materials easier to find and use. Its goal is to speed the translation of data into discovery and thereby the value of research."
Related: Kerberos (protocol)

at Wikipedia

"Kerberos is a computer network authentication protocol, which allows individuals communicating over a non-secure network to prove their identity to one another in a secure manner."
Related: Network Information Service

at Wikipedia

"The Network Information Service or NIS (originally called Yellow Pages or YP) consists of a client-server directory service protocol for distributing system configuration data such as user and host names between computers on a computer network. Sun Microsystems developed the NIS and licenses this technology to virtually all other Unix vendors."
Related: Myexperiment.org

at Myexperiment.org

"myExperiment makes it really easy to find, use and share scientific workflows and other files, and to build communities."
Related: Paul Erdos

at Wikipedia

"Paul Erdős (Hungarian: March 26, 1913 – September 20, 1996) was an immensely prolific (and famously eccentric) Hungarian mathematician. With hundreds of collaborators, he worked on problems in combinatorics, graph theory, number theory, classical analysis, approximation theory, set theory, and probability theory."
Related: 23andMe.com

at 23andMe

"23andMe's mission is to be the world's trusted source of personal genetic information. "
Related: CaGrid

at Wikipedia

"The cancer Biomedical Informatics Grid®, or caBIG® is an initiative of the National Cancer Institute, part of the National Institutes of Health."
Related: Venn diagram

at Wikipedia

"Venn diagrams or set diagrams are diagrams that show all hypothetically possible logical relations between a finite collection of sets (groups of things). Venn diagrams were invented around 1880 by John Venn. They are used in many fields, including set theory, probability, logic, statistics, and computer science."
Related: R. Mark Adams

at Booz Allen Hamilton
