The explosive demand for providing global data access and integrated business solutions
The following quoted text (indented at the bullet) is taken from the statement of prior art in US Patent 6,988,109, entitled "System, method, software architecture, and business model for an intelligent object based information technology platform", filed in 2001 by IO Informatics, Inc.
- As demand for Information Technology (IT) software and hardware to provide global data access and integrated business solutions has exploded, significant challenges have become evident. A central problem is posed by the access, integration, and utilization of large amounts of new and valuable information generated in each of the major industries. Lack of unified, global, real-time data access and analysis is detrimental to crucial business processes, which include new product discovery, product development, decision-making, product testing and validation, and product time-to-market.
With the completion of the sequence of the human genome and the continued effort to understand protein expression in the life sciences, a wealth of new genes with potential as targets for therapeutic intervention is being discovered. As a result of this new information, however, biotech and pharmaceutical companies are drowning in a flood of data. In the life sciences alone, approximately 1 terabyte of data is generated per company per day, and for several reasons the vast majority of it currently goes unused.
First, data are held in diverse system environments that use different formats and heterogeneous databases, and they have been analyzed with different applications, each of which may apply different processing to those data. Competing software based on proprietary platforms for network and application analysis has relied on data platform technologies such as SQL with Open Database Connectivity (ODBC), the Component Object Model (COM), Object Linking and Embedding (OLE) and/or proprietary analysis applications, as evidenced in data management and analysis patents from companies such as Sybase, Kodak, IBM, and Cellomics in U.S. Pat. Nos. 6,161,148, 6,132,969, 5,989,835 and 5,784,294, each of which is hereby incorporated by reference. Because of this diversity, and although the seamless integration of public, legacy and new data is crucial to efficient research (particularly in the life sciences), product discovery (for example, of drugs or treatment regimes) and distribution, current data mining tools cannot handle or validate all of these diverse data simultaneously. There is a significant lack of data handling methods that can use these data in a secure, manageable way. The shortcomings of these technologies are evident in heterogeneous software and hardware environments with global data resources.
Second, with the rapid growth in the volume of dense data in a global environment, user queries often require costly massively parallel or other supercomputer-oriented processing on mainframe computers and/or cluster servers, with various types of network integration software (e.g. Java, CORBA, "wrapping", XML) pieced together for translation and access functionality, as evidenced by companies such as NetGenics, IBM and ChannelPoint in U.S. Pat. Nos. 6,125,383, 6,078,924, 6,141,660 and 6,148,298, each of which is herein incorporated by reference. Such processing also depends on networked supercomputing hardware, as evidenced by companies such as IBM, Compaq and others in patents such as U.S. Pat. Nos. 6,041,398 and 5,842,031, each of which is hereby incorporated by reference. Even with these expensive software and hardware infrastructures, significant delays in generating results remain the norm.
Third, partly because of the flood of data and for other reasons as well, there is significant redundancy within the data, which makes queries more time-consuming and less efficient.
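The redundancy problem can be illustrated with a minimal sketch. It is not taken from the patent: the `Record` fields, the source names, and the rule that two records are redundant when they share a case-insensitive gene identifier and value are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    # Hypothetical record fields, chosen for illustration only.
    gene_id: str
    source: str
    value: float

def normalized_key(rec: Record) -> tuple[str, float]:
    # Assumed redundancy rule: same gene (ignoring case and surrounding
    # whitespace) with the same value, regardless of the supplying source.
    return (rec.gene_id.strip().upper(), rec.value)

def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first occurrence of each normalized key, preserving order."""
    seen: set[tuple[str, float]] = set()
    unique: list[Record] = []
    for rec in records:
        key = normalized_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    Record("BRCA1", "public_db", 2.5),
    Record("brca1 ", "legacy_lims", 2.5),  # same measurement from a legacy system
    Record("TP53", "new_assay", 0.8),
]
print(len(deduplicate(records)))  # 2
```

Every query over the deduplicated list touches fewer records; a real system would need a far more careful definition of "redundant" than this exact-match key.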
Fourth, cost is a further obstacle to moving toward a more homogeneous infrastructure. Bringing legacy systems up to date, retooling a company's intranet-based software systems, carrying out analysis with existing tools, or even adding new applications can be very expensive. Conventional practices require retooling and/or translating at the application and hardware layers, as evidenced by companies such as Unisys and IBM in U.S. Pat. Nos. 6,038,393 and 5,634,015.
Because of the constraints outlined above, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, an architecture that overcomes these obstacles is needed.
These are not the only limitations. With the increasing differentiation of the fields of genomics, proteomics and bioinformatics, and the need for informed decision making in the life sciences, the state of object data is crucial to the data's overall validation and weight in complex, multi-disciplinary queries. This is even more important given the inter-dependencies among a variety of data in different states. Furthermore, because biological data describe a "snapshot" of complex processes at a defined state of the organism, data obtained at any given time refer to that unique phase of metabolism. For a meaningful comparison, therefore, only data in similar states can be used. There is thus a growing need for an object data state processing engine that can continuously monitor, govern, validate and update the data state in real time, based on any activities of intelligent molecular objects.
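The requirement that only data in similar states be compared can be made concrete with a small sketch. This is not the patent's state processing engine; the `Measurement` type, the state labels "fasted" and "fed", and the choice to reject cross-state comparisons with an exception are hypothetical assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    # Hypothetical fields; the organism's state at sampling time is recorded explicitly.
    sample_id: str
    state: str
    expression: float

def fold_change(a: Measurement, b: Measurement) -> float:
    """Compare two measurements, refusing to mix data from different states."""
    if a.state != b.state:
        raise ValueError(
            f"{a.sample_id} ({a.state}) and {b.sample_id} ({b.state}) are not comparable"
        )
    return b.expression / a.expression

def mean_by_state(measurements: list[Measurement]) -> dict[str, float]:
    """Summarize each state separately instead of pooling across states."""
    groups: dict[str, list[float]] = defaultdict(list)
    for m in measurements:
        groups[m.state].append(m.expression)
    return {state: sum(values) / len(values) for state, values in groups.items()}

s1 = Measurement("s1", "fasted", 1.0)
s2 = Measurement("s2", "fasted", 3.0)
s3 = Measurement("s3", "fed", 5.0)
print(fold_change(s1, s2))          # 3.0
print(mean_by_state([s1, s2, s3]))  # {'fasted': 2.0, 'fed': 5.0}
```

Calling `fold_change(s1, s3)` raises `ValueError`, because the two samples were taken in different states.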
Despite advances in information technology, translating data between different data types is time-consuming and requires information about data structure and dependencies. These processes, although available and in use, have a number of shortcomings. Data held in diverse system environments may use different formats, heterogeneous databases and different applications, each of which may apply different processing to those data. As a result, although the seamless integration of public, legacy and new data is crucial to efficient drug discovery and life science research, several different applications and/or components have to be designed to translate each of those data sets correctly. These require significant effort and resources in both software development and data processing. With the increasing differentiation of the fields of genomics, proteomics and bioinformatics, and the need for informed decision making in the life sciences, access to all data is crucial for overall validation and weight in complex, multi-disciplinary queries. This is even more important given the inter-dependencies among a variety of data in different states. The current approach of translating each data set individually does not support these needs. Because biological data describe a "snapshot" of complex processes at a defined state of the organism, data obtained at any given time refer to that unique phase of metabolism. For a meaningful comparison, therefore, only data in similar states can be used. This requires real-time processing and automated, instant translation of data from different sources. There is thus a growing need for an object data translation engine that allows bi-directional, real-time translation of multidimensional data from various sources into intelligent molecular objects.
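One common way to avoid writing a separate translator for every pair of formats is to translate each source to and from a single shared object, in hub-and-spoke fashion. The sketch below illustrates that general idea and is not the patent's translation engine. `DataObject`, `TsvTranslator`, `DictTranslator`, and both source formats are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class DataObject:
    # Hypothetical common representation shared by all sources.
    gene: str
    value: float
    state: str

class Translator(Protocol):
    def to_object(self, raw) -> DataObject: ...
    def from_object(self, obj: DataObject): ...

class TsvTranslator:
    """Hypothetical source A: tab-separated 'gene<TAB>value<TAB>state' lines."""
    def to_object(self, raw: str) -> DataObject:
        gene, value, state = raw.rstrip("\n").split("\t")
        return DataObject(gene, float(value), state)

    def from_object(self, obj: DataObject) -> str:
        return f"{obj.gene}\t{obj.value}\t{obj.state}"

class DictTranslator:
    """Hypothetical source B: records keyed by capitalized field names."""
    def to_object(self, raw: dict) -> DataObject:
        return DataObject(raw["Gene"], float(raw["Value"]), raw["State"])

    def from_object(self, obj: DataObject) -> dict:
        return {"Gene": obj.gene, "Value": obj.value, "State": obj.state}

def translate(raw, source: Translator, target: Translator):
    """Convert a record between any two sources via the common object."""
    return target.from_object(source.to_object(raw))

line = "TP53\t0.8\tfasted"
as_dict = translate(line, TsvTranslator(), DictTranslator())
print(as_dict)  # {'Gene': 'TP53', 'Value': 0.8, 'State': 'fasted'}
print(translate(as_dict, DictTranslator(), TsvTranslator()) == line)  # True
```

With a shared object, n sources need n translators, each working in both directions, rather than a dedicated converter for every ordered pair of sources, which grows as n(n-1).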
The flood of new and legacy data results in significant redundancy within the data, which makes queries more time-consuming and less efficient. There is a lack of defined user interaction and environment definition protocols, which are needed to support intelligent data mining and to optimize result validation toward real solutions and answers. A further obstacle to moving toward a more homogeneous infrastructure is the absence of object representation definition protocols to prepare and present data objects for interaction within heterogeneous environments. Lastly, users currently interact with data through diverse user interfaces with dedicated, unique features and protocols, which prevents universal, unified user access. Thus, a homogeneous, unified presentation, such as a web-enabled graphical user interface that integrates components from diverse applications and laboratory system environments, is highly desirable but does not yet exist for objects in real time.
Because of these constraints, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, an architecture and a unifying user interface that overcome these obstacles are needed.
References (13)
- US Patent 6,988,109 (IO Informatics, Inc., filed 6 Dec 2001)
- "A computer method and apparatus enable object-linking-and-embedding controls to directly communicate with each other and share resources." US Patent 6,161,148 (Kodak Limited, filed 27 Sep 1996)
- "The present invention provides methods and systems for testing and confirming how well a network model represents a biological pathway in a biological system. The network model comprises a network of logical operators relating input cellular constituents (e.g., mRNA or protein abundances) to output classes of cellular constituents, which are affected by the pathway in the biological system." US Patent 6,132,969 (Rosetta Inpharmatics, filed 19 Jun 1998)
- System for cell-based screening: "The invention involves providing cells containing fluorescent reporter molecules in an array of locations and scanning numerous cells in each location with a fluorescent microscope, converting the optical information into digital data, and utilizing the digital data to determine the distribution, environment or activity of the fluorescently labeled reporter molecules in the cells." US Patent 5,989,835 (Cellomics, Inc., filed 27 Feb 1997)
- "A computer-based method and system describes molecules in a most fundamental and compact way using a set of attributes of the molecule derived from data representing the atomic structure and atomic charge of the molecule." US Patent 5,784,294 (IBM Corporation, filed 9 Jun 1995)
- "The drug discovery research system provides for at least one of the plurality of computers to run a multi-platform object oriented programming language, and at least one of the plurality of computers to store drug discovery related data. The system has a network architecture interconnecting the plurality of computers. The network architecture allows objects to transparently communicate with each other." (NetGenics Corp., filed 11 Jun 1997)
- "An information platform automates the collection of data, provides a method for organizing the library of information and provides analysis using multiple content-types, thereby providing a user with a market understanding necessary to execute rapid and knowledgeable decision making." (Aeneid Corporation, filed 30 Jan 1998)
- "A method, apparatus, and article of manufacture for generating class specifications for an object-oriented application that accesses a hierarchical database." US Patent 6,141,660 (IBM Corporation, filed 16 Jul 1998)
- "A method for aggregating distributed information from a plurality of data sources each having an address. A plurality of user criteria are received and site specific information describing idiosyncrasies of each data source are stored." US Patent 6,148,298 (ChannelPoint, Inc., filed 23 Dec 1998)
- "A massively parallel diagonal-fold mesh array processor provides a triangular diagonally folded mesh computer with the same functionality as a square mesh computer but with half the number of connection wires." US Patent 6,041,398 (IBM Corporation, filed 26 Jun 1992)
- "A computer system having a plurality of processors and memory including a plurality of scalable nodes having multiple like processor memory elements." US Patent 5,842,031 (IBM Corporation, filed 6 Jun 1995)
- "A programmed computer system transforms a distinctive representation of a business model into a generic representation format, such as the Unified Modeling Language ("UML") object model." US Patent 6,038,393 (Unisys Corp., filed 22 Sep 1997)
- "A generic high bandwidth adapter providing a unified architecture for data communications between buses, channels, processors, switch fabrics and/or communication networks." US Patent 5,634,015 (IBM Corporation, filed 4 Oct 1994)

