WO2013036677A1

WO2013036677A1 - Medical informatics compute cluster

Info

Publication number: WO2013036677A1
Application number: PCT/US2012/054010
Authority: WO
Inventors: Rob WYNDEN; Andrew V. NGUYEN; Michael Bobak
Original assignee: The Regents Of The University Of California
Priority date: 2011-09-06
Filing date: 2012-09-06
Publication date: 2013-03-14

Abstract

Provided are systems and methods for aggregating, managing, and/or analyzing data, such as medical data, via a Medical Informatics Compute Cluster (MICC). In certain aspects, the methods include obtaining signal data for a subject from one or more body sensors; storing data associated with the subject; associating respective ones of the signal data with respective ones of the data associated with the subject; and storing at least a portion of the signal data or associations between the signal data and the data associated with the subject. Further, in certain aspects the methods include associating patterns within the signal data with patterns associated with the subject. The methods may facilitate creating a retrospective analysis of an event (e.g., an adverse event such as a heart attack, seizure, or hypoglycemic event), detecting the occurrence of an event (e.g., a hypoglycemic event), and/or predicting the future occurrence of an event (e.g., a hypoglycemic event). Systems for use in practicing methods of the invention are also provided. The systems and methods have utility in a variety of clinical and non-clinical applications.

Description

MEDICAL INFORMATICS COMPUTE CLUSTER

CROSS-REFERENCE TO RELATED APPLICATION

[0001] Pursuant to 35 U.S.C. § 119 (e), this application claims priority to the filing date of the United States Provisional Patent Application Serial No. 61/531,572, filed September 6, 2011; the disclosure of which is incorporated herein by reference.

INTRODUCTION

[0002] Complex systems such as commercial aircraft often contain data recorders that can enable retrospective analysis in the event of failure or other major event. For example, airplanes often contain a cockpit voice recorder that collects a continuous recording of cockpit conversations and other ambient noises, and a flight data recorder that measures continuous data on many parameters that affect a plane's performance. Combining this information after a major event, such as a crash or near-crash, can provide a valuable starting point to investigators looking to determine the cause(s) of the event.

[0003] However, such retrospective analysis is often impossible in other settings because data that is analogous to that collected by such aircraft "black boxes" is impossible or difficult to collect, store, combine, and/or mine. For instance, in a clinical setting most information that hospitals and other health care providers generate for each patient often must be discarded because there is no adequate way to collect and store such information. Further, even if such information is stored, there is often no adequate way to combine or mine that information to as to use it effectively, such as to enable a retrospective analysis of an adverse event, detect the occurrence of an event, and/or to facilitate predictions of the occurrence of a future event.

SUMMARY

[0004] Provided are systems and methods for aggregating, managing, and/or analyzing data, such as medical data, via a Medical Informatics Compute Cluster (MICC). In certain aspects, the methods include obtaining signal data for a subject from one or more body sensors; storing data associated with the subject; associating respective ones of the signal data with respective ones of the data associated with the subject; and storing at least a portion of the signal data or associations between the signal data and the data associated with the subject. Further, in certain aspects the methods include associating patterns within the signal data with patterns associated with the subject. The methods may facilitate creating a retrospective analysis of an event (e.g., an adverse event such as a heart attack, seizure, or hypoglycemic event), detecting the occurrence of an event (e.g., a hypoglycemic event), and/or predicting the future occurrence of an event (e.g., a hypoglycemic event). Systems for use in practicing methods of the invention are also provided. The systems and methods have utility in a variety of clinical and non-clinical applications.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The invention may be best understood from the following detailed description when read in conjunction with the accompanying drawings. Included in the drawings are the following figures:

[0006] FIG. 1 is a graphical illustration of an example medical informatics system.

[0007] FIG. 2 is a graphical illustration of one implementation of a SACS.

[0008] FIGs. 3-4 are logical diagrams of example MICC systems.

[0009] FIG. 5 is a network diagram of one example of a MICC implementation

[0010] FIG. 6 is an example workflow of how MICC can be used when integrating with an existing infrastructure.

[0011] FIGs. 7-8 are graphical illustrations showing the processes of feature extraction, change detection, and machine learning applied to data obtained from one or more biomedical sensor(s).

[0012] FIGs. 9-14 are graphical illustrations of example information displays that can be rendered by entities associated with a medical informatics system

[0013] FIG. 15 is a block flow diagram of a process of managing user data.

[0014] FIG. 16 shows results of feature extraction on arterial blood pressure waveform data, depicting variants of the Sliding Window PSD algorithm with varying parameters. The light gray plot has a window size of 1024 points (roughly 8 seconds) and a skip size of 500 points (4 seconds); the blue plot has a window size of 1024 points and a skip size of 125 points

(1 second); the brown plot has a window size of 512 points (roughly 4 seconds) and a skip size of 128 points (roughly 1 second); all have a period of 0.008 sec (125 Hz) which is the period of the recorded samples.

[0015] FIG. 17 shows the impact of initial values on single-pass algorithms. In the top panel, the plot is zoomed out on the Y-axis (measured in volts), where the signal ranges from 0 to 50 V. The bottom panel is moved forward in time 20 seconds, where the EKG then reads between -300 and 300 mV.

[0016] FIG. 18 shows the application of several feature extraction algorithms on the arterial blood pressure waveform data immediately before and after the onset of ventricular tachycardia. The top chart shows the application of the Sliding Window PSD algorithm with several different parameters. The second chart shows the application of the P2PDelta- Amplitude algorithm to a previously extracted feature (beat-to-beat systolic pressure). The third chart shows the application of the P2PDelta-Time algorithm to a previously extracted feature (beat-to-beat systolic pressure).

[0017] FIG. 19 shows the application of EM clustering as applied to MIMIC-II data.

[0018] FIG. 20 shows instance information for cluster 0 of FIG. 19.

DETAILED DESCRIPTION

[0019] Provided are systems and methods for aggregating, managing, and/or analyzing data, such as medical data, via a Medical Informatics Compute Cluster (MICC). In certain aspects, the methods include obtaining signal data for a subject from one or more body sensors; storing data associated with the subject; associating respective ones of the signal data with respective ones of the data associated with the subject; and storing at least a portion of the signal data or associations between the signal data and the data associated with the subject. In certain aspects, the methods facilitate creating a retrospective analysis of an event (e.g., an adverse event such as a heart attack, seizure, or hypoglycemic event), detecting the occurrence of an event (e.g., a hypoglycemic event), and/or predicting the future occurrence of an event (e.g., a hypoglycemic event). Systems for use in practicing methods of the invention are also provided. The systems and methods have utility in a variety of clinical and non-clinical applications.

[0020] Before the present invention is described in greater detail, it is to be understood that this invention is not limited to particular embodiments described and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

[0021] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

[0022] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and exemplary methods and materials may now be described. Any and all publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.

[0023] It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a sensor" includes a plurality of such sensors and reference to "the terminology server" includes reference to one or more terminology servers, and so forth. Further, a reference to "datum" may include "data," and vice versa, unless the context clearly dictates otherwise.

[0024] It is further noted that the claims may be drafted to exclude any element which may be optional. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely", "only" and the like in connection with the recitation of claim elements, or the use of a "negative" limitation.

[0025] The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed. To the extent such publications may set out definitions of a term that conflict with the explicit or implicit definition of the present disclosure, the definition of the present disclosure controls.

[0026] As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. [0027] As summarized above, aspects of the invention include aggregating, managing, and/or analyzing data, such as medical data, through the use of a MICC. Various examples, subsystems, components (e.g., in hardware and/or software) and applications are described herein; however, the invention is not intended to be limited by these examples.

[0028] Referring first to FIG. 1, an example MICC system 10 is presented that facilitates aggregating, managing, and/or analyzing data. This system 10 includes a Signal Archiving and Computation System (SACS) 12, an Integrated Data Warehouse (IDW) 14, an analysis engine 16 and a data store 18. Briefly, in this embodiment the SACS 12 is configured to collect and/or store signal data from one or more sensors 20 (e.g., a physiological sensor, such as a blood pressure monitor, heart rate monitor, EEG, EKG, etc.). The IDW 14 is configured to store data associated with a subject (e.g. data from a patient's electronic health record, laboratory or pharmacy databases, etc.). The analysis engine 16 is communicatively coupled to the SACS 12 and the IDW 14 and configured to associate signal data collected by the SACS 12 with data associated with the subject obtained via the IDW 14. The data store 18 is communicatively coupled to the analysis engine 18 (and optionally the SACS 12) and configured to store at least a portion of the signal data or associations between the signal data and the data associated with the subject.

[0029] As is apparent from FIG. 1, MICC facilitates the collection, storage, and analysis of a broad range of data. One such type of data is referred to herein as "signal data" and "sensor data." The terms "signal data" and "sensor data" may be used interchangeably herein, and are used to refer to the raw as the raw and/or derived data that is output by one or more biological sensors. Examples of signal data include the raw source voltage as analog to digital values of arterial blood pressure, intracranial pressure, EKG, and EEG, or their physical unit converted equivalents, as well as any other physiological parameter for which automated data collection exists within the clinic such as instantaneous heart rate, respiration rate, or systolic blood pressure. The term is meant to encompass both raw and/or derived data that is output by one biological sensor, as well as data that is output by two or more biological sensors. Where signal data includes data that is output by two or more biological sensors, such data may be combined using approaches described more fully herein, and shall still be considered "signal data." Signal data may include time series notations, as described more fully herein, and such time series information shall also be considered to be part of the signal data. Signal data is collected from a subject.

[0030] In addition to signal data, MICC also facilitates the collection, storage, and analysis of a "clinical data." As used herein, "clinical data" is used broadly and generically to refer to all non-signal data that is known or obtained about a user. Clinical data thus may include data not generated in a clinic or other healthcare setting. Examples of clinical data thus include any data that is stored in nursing, laboratory, and/or pharmacy databases, as well as data that is contained in electronic health records. Further examples of clinical data include image data, such as radiological images, as well as food/diet diaries, calendars, user responses to questions, and the like. Clinical data are often stored via nursing, laboratory, and pharmacy databases as well as electronic medical records, which are sometimes consolidated into an IDR (integrated data repository) or CDW (clinical data warehouse), with images often being stored in a radiology picture archival and communication system (PACS). The term "Existing Databases" may be used herein to describe such existing databases that contain data or information on the subject. This can include, but is not limited to, electronic health records, pharmacy databases, insurance claims databases, calendars, activity logs, etc.

[0031] The terms "Integrated Data Warehouse" or "IDW" may be used interchangeably herein to refer to a data warehouse that is responsible for storing clinical data, such as clinical histories, lab values, pharmacy data, food/diet diaries, etc. Generally, these data are used to provide the context for the sensor data. An IDW may also store instances of the computable phenotype, and provide these instances to the various engines within a MICC system (e.g., signal processing, feature extraction, statistical/machine learning, and rules/CEP).

[0032] Aspects of the systems and methods provided herein relate to data that is generated in a clinical environment, such as a hospital or other healthcare provider. For example, a modern clinical environment may generate a wealth of data for a patient, particularly if the patient is undergoing an invasive procedure or is in an intensive care unit (ICU).

[0033] Aspects of the systems and methods provided herein relate to data that is not generated in a clinical environment. For example, in certain aspects the sensor data may comprise data obtained from sensors within a user's phone (e.g., smartphone). Such sensor data may be obtained for purposes unrelated to clinical monitoring or diagnosis (e.g., drowsiness monitoring, measuring the user's alertness while on the job, and the like).

[0034] In addition to the utility of a MICC as a signal archive, the association of signal data with clinical data can also be used to establish new and/or more accurate clinical phenotypes, including computable phenotypes. As used herein, a "clinical phenotype" is any observable characteristic or trait of a patient. The term is meant to include, for example, morphology, development, behavior, biochemical, and/or physiological properties. The phrase "computable phenotype" is used herein to refer to a clinical phenotype that is formally defined as the combination of attributes, weights, and rules, derived from signal data and/or clinical data, that is expressed in a computable standard format.

[0035] The availability of the full signal data can enable more accurate associations between the signal data and the clinical phenotype. As an example of some current challenges, signal data collection for 45 signals (e.g. 14-channel EEG, 12-lead EKG, arterial blood pressure, and other waveforms) at 125Hz, with two-byte resolution, for one year, requires approximately 350 GB of raw disk storage per bed; for 200 beds, with data replication, at least 70 TB of capacity would be necessary per year. As an example of a more realistic case, approximately 40 GB to 100 GB of raw disk storage may be used per bed per year. Using existing methodologies, associations between the signal data and a clinical phenotype are difficult or impossible.

[0036] In certain aspects, a MICC may facilitate the identification of associations between signal data and clinical phenotypes. Moreover, a MICC may facilitate the creation and/or identification of computable phenotypes, using, for example, high-throughput data storage according to any suitable data storage mechanism(s) available and generally known in the art (e.g., storage provided by the HDF5 technology suite and associated tools such as those used on the BioHDF project, "NoSQL" databases, PhysioNet tools, etc.). The signal data would be associated with clinical data that has been extracted from an integrated data warehouse system, as shown in FIG. 1. Examples of methodologies used to draw associations include biostatistics, machine/statistical learning, and hybrid systems. The data may be temporarily or permanently stored in the IDW. To compute on the signal data and extracted clinical data, a compute cluster may be deployed using various frameworks that facilitate parallelizing computation, such as MapReduce, PVM (parallel virtual machine), MPI (message passing interface), etc. in certain aspects, a MICC may not include parallel computation. The MICC may also accept plug-ins implemented in the above frameworks to support a suite of potential data mining services for informatics or analytics.

[0037] The services provided on the MICC can be utilized by researchers and can leverage the scale of data housed within the MICC signal stores and integrated data warehouses. These MICC services include algorithms for pattern detection and classification within a warehouse of clinical signal data and data extracted from the integrated data repository. Any combination of machine learning, biostatistics and/or data exploration tools may be used to provide insight into challenging clinical questions in a variety of clinical and non-clinical domains. Data mining services provided on the MICC can include, for example, the pattern based interactive diagnosis of disorders, pattern classification applied to electrocardiography, and pattern recognition analysis of coronary artery disease. Data mining services may also include, for example, determining of the cost effectiveness of interventions (e.g., hypertension interventions, health education counseling, and the like), determining of the effectiveness of interventions (e.g. sleep apnea interventions, exercise regiments, and the like), and determining the effectiveness of healthcare providers (e.g., a clinic, physician group, and the like).

[0038] Further, implementation of MICC can focus on collecting large amounts of clinical signal data and establishing new, better, and/or more accurate clinical phenotypes, including computable phenotypes. Once these improved phenotypes have been established, retrospective studies can be conducted that compare the clinical phenotypes present in a patient's electronic medical record against phenotypes generated by MICC using the patient's clinical signal data. The MICC-generated phenotypes can then be evaluated for their accuracy and ability to provide clinical decision support. The services provided on the MICC may enable simple retrospective reviews of clinical signal data by combining them with semantically harmonized data from the IDW via a unified visualization layer.

[0039] In another aspect, once these improved phenotypes have been established, prospective studies can be conducted that compare the clinical phenotypes present in a patient's electronic medical record against phenotypes generated by MICC using the patient's clinical signal data. Such studies may involve predicting the occurrence of a future event (e.g., a hypoglycemic event) or the detection of an event in progress (e.g., arrhythmia). Detection of an event in progress or the likelihood of a future event may trigger one or more notifications (e.g., a text message, display on a dashboard, phone call) so as to alert a user and/or to prevent the occurrence of the event.

[0040] In another aspect, once these improved phenotypes have been established, they may be used by a MICC (e.g., as a plug-in or other module) to predict the occurrence of a future event (e.g., a hypoglycemic event) or the detection of an event in progress (e.g., arrhythmia). Detection of an event in progress or the likelihood of a future event may trigger one or more notifications (e.g., a text message, display on a dashboard, phone call) so as to alert a user and/or to prevent the occurrence of the event.

[0041] Various embodiments of a MICC are provided in further detail below. While some examples of the MICC are provided below in a clinical context, the disclosure is not intended to be limited to such example use cases, and other use cases are possible. For example, a MICC as described herein can be utilized to facilitate various services and can be implemented in office environments, manufacturing environments, health clubs, gyms, weight loss programs, etc., in addition to medical centers, clinics, hospitals, etc. Further, while certain various embodiments below relate to collecting and storing data from medical sensors, any suitable body or biometric sensor that provides output relating to a user and/or health of the user can also be used. Examples of sensors that may be utilized herein include, but are not limited to, multipurpose sensors such as, e.g., the BodyBugg® sensor distributed by Apex Fitness, Inc., pedometers, personal satellite positioning system (SPS) location devices, etc.

SYSTEMS

[0042] The present disclosure provides systems for aggregating, managing, and/or analyzing data, such as medical data. An example of a system architecture that can be employed in connection with a MICC as described herein is now presented; however, other architectures can also be used. Referring again to FIG. 1, this system 10 includes a Signal Archiving and Computation System (SACS) 12, an Integrated Data Warehouse (IDW) 14, an analysis engine 16 and a data store 18. The SACS 12 is configured to collect and/or store signal data from one or more sensors 20. The IDW 14 is configured to store data associated with a user (e.g. clinical data). The analysis engine 16 is communicatively coupled to the SACS 12 and the IDW 14 and configured to associate signal data collected by the SACS 12 with data associated with the user obtained via the IDW 14. The data store 18 is communicatively coupled to the analysis engine 18 (and optionally the SACS 12) and configured to store at least a portion of the signal data or associations between the signal data and the data associated with the user. Each of these various components will now be described in greater detail.

Sensors

[0043] Signal data may be obtained from one or more sensors. Sensors may be referred to herein as "physiological sensors", "body sensors," "biometric sensors," and "biological sensors," with such terms used interchangeably to broadly and generically refer to any sensor that provides output relating to a user and/or health of the user (e.g., a subject, patient, etc.).

[0044] Any of a broad range of sensors may be used in the systems and methods of the present disclosure. As appreciated by one of skill in the art, the number, type, and/or placement of sensors utilized to collect data from a user may vary depending upon at least the condition(s) for which the user is being monitored, the purpose(s) of monitoring the user, and other factors. Sensors of interest include single-purpose sensors, and multi-purpose sensors such as, e.g., the BodyBugg® sensor distributed by Apex Fitness, Inc.

[0045] Examples of sensors that may be utilized herein include, but are not limited to, sensors such as, pedometers, personal satellite positioning system (SPS) location devices, global positioning system (GPS) location devices, EEG (e.g., wet- and/or dry-EEG sensors), MEG, ECG sensors, SSEP sensors, EMG sensors, EKG sensors, ECoG sensors, temperature sensors, spirometers (such as for measuring FVC, FEV1, FEV1 , PEF, FET, MVV, MEF75*, and the like), accelerometers, pulse oximeters, blood glucose sensors (e.g., continuous or non- continuous blood glucose monitoring sensors), thermometer, intracranial pressure sensors, blood pressure sensors (e.g., continuous or non-continuous), and the like.

[0046] Suitable sensors also include, but are not limited to, sensors that measure one or more physiological parameters, such as temperature, blood pressure, pulse, respiratory rate, oxygen saturation, end tidal C0₂, FVC, FEV1, FEV1 , PEF, FET, MVV, MEF75*, location, blood sugar, and the like. In certain aspects, a sensor may measure just one physiological parameter, such as temperature. In other aspects, a single sensor may measure two or more parameters, including 3 or more, e.g. 5 or more, 10 or more, 20 or more, or 30 or more.

[0047] Sensors may be configured to transmit signal data by any convenient means. In certain aspects, a sensor may include a wired connection such that the signal data is transmitted via a physical connection to, e.g., a MICC system. In other aspects, a sensor may be configured to transmit signal data via a wireless connection, such as Wi-Fi, Bluetooth, cellular (e.g., 3G, 4G, LTE and the like) or other wireless protocol.

[0048] In certain aspects, a sensor may be configured to transmit signal data directly to a

MICC. In such embodiments, the sensor may establish a direct connection to the MICC (e.g., physical connection and/or wireless connection). In certain aspects, connection to a MICC may include an authentication and/or security protocol which may identify the sensor and/or user for which the signal data is being collected.

[0049] A sensor may transmit its signal data either directly or indirectly. In certain aspects, a sensor transmits signal data to a MICC indirectly. An example of an indirect connection would be a sensor that includes a Bluetooth module that may be paired with a user' s device (e.g., computer, tablet, phone, smartphone, and the like), which device is used to connect with a MICC. A user's device may be used to aggregate signal data from two or more sensors, which it may transmit to a MICC.

[0050] FIG. 3 presents one example of an indirect connection. In this example, signal data from one or more sensors is transmitted through, or to, a cloud. In certain aspects, data may be transmitted through a cloud (e.g., via a cloud). In other aspects, data may be transmitted to a cloud (e.g., ginger.io, SaaS, etc.). For example, data may be transmitted to a cloud, from where MICC , MICC pulls data into the SACS and/or IDW.

[0051] The cloud may be public or private. Additionally, user-generated data may be transmitted to the cloud. User generated data may take a variety of different forms, such as survey information (e.g., "how severe is your pain today?" or "what did you eat for lunch?), free form text, and the like. The survey information may be entered by a user using any convenient means, such as a phone (e.g., smartphone), computer, tablet and the like. The user generated data and the sensor data is then transmitted from the cloud to a MICC system comprising an IDW (described below). The MICC system may then output (e.g., via display, e-mail, or other type of notification, as described more fully below) information to a user. In an example such as FIG. 3, the sensor(s) and user generated data need not ever connect directly with a MICC. In certain aspects, a sensor may transmit data directly and/or indirectly.

[0052] In certain aspects, signal data may include continuous (e.g., waveform) data.

Also of interest is time-series data. A sensor may be configured to provide signal data as time- series data. Time-series data may involve describing a physiological signal as a set of key- value pairs, where the key is a timestamp and the value is the measurement (e.g., a voltage, concentration, or any other recording by a sensor). Any of a variety of timestamps may be used in describing such time-series data (e.g., a timestamp encoded as the number of milliseconds since the epoch (January 1, 1970), as a timestamp encoded as the number of days from the epoch, and the number of picoseconds of the current day that have elapsed, etc.)

[0053] A sensor may be configured to provide a timestamp, such as by including an internal record of time. In certain aspects, the sensor's timekeeping means may be updated and/or synchronized with one or more other sensors. For example, in certain aspects the timekeeping means of the sensor is calibrated and/or synchronized upon connection of the sensor to a MICC (e.g., each time a sensor connects, upon first connection, upon regular connections, and the like). MICC may provide a sensor with instructions to update its internal timekeeping means to a particular value upon such a connection between the MICC and a sensor. In other aspects, an intermediary device (e.g., a smartphone) may be used to update and/or synchronize one or more other sensors.

[0054] In certain aspects, two or more sensors are used to record signal data from a user.

This includes 3 or more, such as about 4 or more, 5 or more, e.g., about 5 to about 10 sensors, about 10 to about 15 sensors, about 15 to about 20 sensors, about 20 to about 25 sensors, about 25 to about 30 sensors, about 30 to about 35 sensors, about 35 to about 40 sensors, about 40 to about 45 sensors, about 45 to about 50 sensors, about 50 to about 60 sensors, about 60 to about 70 sensors, about 70 to about 80 sensors, about 80 to about 90 sensors, about 90 to about 100 sensors, about 100 to about 125 sensors, about 125 to about 150 sensors, about 150 to about 175 sensors, about 175 to about 200 sensors, about 200 to 250 sensors, about 250 to about 300 sensors, about 300 to about 350 sensors, about 350 to about 400 sensors, about 400 to about 450 sensors, about 450 to about 500 sensors, about 500 to about 750 sensors, about 750 to about 1000 sensors, or about 1000 sensors or more.

[0055] Such sensors may be identical (e.g., exactly the same, but placed at a different position on the user' s body) or different. Identical sensors may be used to provide data redundancy and reduce the likelihood of data loss if, for example, a sensor fails, falls off, or the like. In other aspects, a heterogeneous mix of sensors may be used. In such aspects, the sensors may record the same sensor value (e.g., location, environmental temperature, etc.) or different values.

[0056] A sensor may be used to collect information from an individual subject (e.g., a single patient). A population of subjects (e.g., such as 3 or more, 4 or more, 5 or more, and the like) may each be provided with an identical sensor or a plurality of sensors, with data collected by the MICC for each of the sensors. In certain aspects, one or more sensors may be used to collect information simultaneously from a population of subjects (e.g., 2 or more subjects, such as 3 or more, 4 or more, 5 or more, and the like). As an example, a sensor may be used to measure the ambient temperature and/or chemical composition of an area (e.g., a room) in which one or more subject is present. In such embodiments, the one or more sensors may thus record signal information for a plurality of subjects simultaneously.

Signal Archiving and Computation System (SACS)

[0057] In some implementations of a MICC, computation and archival tasks may be bundled into a single entity. As a result, the MICC may utilize a SACS that is a bundled archival and computation system.

[0058] A non-limiting example of a SACS 12 is provided in FIG. 2, and is described in

Example 1. FIG. 2 presents an overview of the primary components of this implementation of a SACS. In this embodiment, the primary components involve: 1) data storage, 2) metadata storage, 3) MapReduce, and 4) data visualization. These are implemented in this particular, non- limiting example of a SACS using: 1) HBase, 2) a metadata relational database management system (a metadata RDBMS) (e.g., MySQL), 3) Hadoop/MapReduce, and 4) data visualization (e.g., Chronoscope with the Google Web Toolkit). A user may interface with a SACS via the data visualization.

[0059] A SACS can be configured to store (e.g., archive) and/or process signal data.

While the SACS may also provide an environment for the analysis and machine/statistical learning of data when combined with the IDW, this functionality can be implemented separately and interfaced with the SACS. [0060] A SACS may be configured to collect signal data from one or more sensors.

Signal data may be collected and/or stored by the SACS as time-series data. This abstraction may be employed for the design and implementation of the SACS because it provides a common abstraction allowing for the storage of data at all granularities, from hours to minutes to seconds to microseconds. As a result, the SACS may be able to absorb all clinical signal data whether it is waveform data that is sampled at 1 kHz or numeric data sampled once an hour.

[0061] A SACS can store these data in any suitable storage system. Within a non- limiting example storage system, the SACS 12 stores the data in HBase, a distributed column store, such that each row represents a slice in time, identified by a timestamp encoded as the number of milliseconds since the epoch (January 1, 1970). Each column contains the data of a particular signal for a particular patient. For example, one column might store the arterial pressure of patient A, another stores the temperature of patient A, and another might store the arterial pressure of patient B. As a distributed column-store, these data are stored sparsely such that gaps in the data do not take up any storage space. All of the data within HBase are stored as binary data in order to maximize storage efficiency. As a result, the column identifiers are stored as 4-byte integer values in their binary form. The mapping between these identifiers and their human-understandable counterpart is stored in a separate database.

[0062] Stored data can be configured to be easily and quickly recallable, minimizing access latency. For archival purposes, data that do not require low-latency access can be moved to a more efficient storage system. By offloading the long-term storage of data, a primary storage system is kept lean allowing it to continue to serve low-latency requests.

[0063] While the clinical signal data itself is stored within a database (e.g., HBase with the MapReduce environment, etc.), the signal metadata may be stored in a separate database instance. In the SACS 12, this may be a database that stores metadata, such as MongoDB, a relational database, a non-relational database, and the like). In an embodiment, the metadata resides in a document-oriented database. However, this could also be implemented in a traditional relational database.

[0064] The SACS may integrate a signal processing system allowing for in-situ processing of the data. This is desirable considering the large anticipated amounts of data (e.g., upwards of 50 TB per year of clinical signal data for a 200-bed medical center). With increasing amounts of data, an in-situ processing environment negates the need to marshal data between the storage platform and the processing/analysis platform.

[0065] The processing environment may be implemented within the Hadoop platform and/or other suitable platforms and can leverage the MapReduce programming paradigm. A SACS could also, or instead, integrate other processing platforms such as R, Matlab, or other home-grown systems, though such integration may prevent in-situ processing of the data. The SACS may process the data and store the results of the processing back into the storage system. By storing the output of processing in the same storage system, the derived data is immediately available to subsequent processing tasks. This facilitates the chaining of multiple processing steps easily and efficiently.

[0066] As processing algorithms are run on input data, the results may be stored back to the storage layer (e.g., HBase) with the corresponding metadata stored back to the Metadata RDBMS (e.g., Mongo and/or another suitable database platform). This metadata may be used to ensure that the HBase setup is as efficient as possible. Nearly all of the data in HBase may be stored in a binary format so it is not easily consumed by humans. The metadata stored in the RDBMS may provide the bridge between the binary storage and the human-consumable data. Currently, the metadata consists of the name and description of the algorithm, the numeric identifier that is used as the column header in HBase, as well as any other parameters of the algorithm (e.g. sliding window size, thresholds, etc.).

Integrated Data Warehouse (IDW)

[0067] An IDW stores some or all of the clinical data that is associated with a patient.

Examples of this data include data from the electronic health record (EHR), data from laboratory and pharmacy databases, or data from the nursing documentation system. While these data can be collected in a centralized manner in any sort of relational database, the IDW stores the data in such a way that it is easily computable (e.g. in a standard format, and/or as semantically harmonized data). The IDW can provide a means for bringing in data in a semantically interoperable manner and providing access to said data via a SOAP- or REST-based interface.

[0068] The IDW is utilized to analyze the clinical signal data in the context of the clinical data. For instance, the IDW can enable larger scale studies to take advantage of the data already available in the patient's clinical record. As with the in-situ data processing, the IDW can make the clinical data available and semantically meaningful.

[0069] IDW information may come from many different sources. Accordingly, semantic harmonization and/or normalization of the data may be useful or required to make the data useful. Combining two or more such types of information may be useful for establishing a more comprehensive picture of the overall health of patients than may be accomplished using only one source of information. For example, a term such as "MS," when used within a clinical finding, may have many different meanings. If the finding were a cardiology finding, one may conclude that MS stands for "mitral stenosis." If the finding were a finding related to

Anesthesia, one may assume that MS stands for "morphine sulfate." There are likely hundreds or more of such domain specific interpretations of clinical data that exist. Additionally, multiple possible biological pathways or forms of environmental stress may cause the exact same "clinical phenotype." For example, an ITP patient may have low platelet counts due to Graves' Disease, or due to exposure to Helicobacter pylori bacteria and can sometimes be originally detected following abnormal serum liver tests and Graves patients often have liver disease. If the original clinical findings are on the topic of Graves disease or a bacterial infectious disease then within which domain should the term "ΓΓΡ" be later interpreted? The term "ITP" does not necessarily mean the same thing within these clinical domains as it sometimes refers to "Inosine triphosphate" which is associated with gene defects leading to SAE's after liver transplants. Accordingly, by aggregating multiple sources of information about the patient, in a semantically harmonized manner, the nature of clinical information may be made easier to interpret.

[0070] A "Data Processing Engine" is used herein to describe an engine that is responsible for traditional signal processing of the sensor data as well as normalization of the data (with and/or without context from the IDW). In certain aspects, data from existing databases may be normalized and/or semantically harmonized by a Data Processing Engine before loading, or after loading, into a IDW by applying biomedical terminologies and/or ontologies. Such biomedical terminologies or ontologies may be obtained from a terminology server. A "terminology server," as used herein, is used broadly and generically to describe a computer-accessible resource which provides standardized, machine-readable terminologies and ontologies. In accord with this definition, a terminology server may, for example, be a computer physically connected to a database, a computer connected to a local network, a computer connected to a proprietary network, or a computer to which one may interface via a web portal. Any convenient means of accessing the information on the terminology server may be employed. As appreciated by one of skill in the art, the particular means of connecting to a terminology server (e.g. through an application programming interface (API)) will be dictated by the particular terminology server employed. For example, if the terminology server employed is NCBO BioPortal, the means of connecting to the terminology server may include using an API based on BioPortal REST services.

[0071] The data may be normalized using the terminology server via any convenient means known in the art. Such means may include methods described by, for example, AL Rector, et al, Methods Inf Med. 1995 Mar; 34(l-2):147-57; CG Chute, et al, Proc AMIA Symp. 1999:42-6; AM Hogarth, et al, AMIA Annu Symp Proc. 2003; 2003: 861; and PL Whetzel, et al, Nucleic Acids Res. 2011 Jul;39(Web Server issue):W541-5. Epub 2011 Jun 14; the disclosures of which are incorporated herein by reference.

[0072] The normalized data may then be translated by processing with an ontology mapping engine, using the biomedical terminologies and ontologies obtained from the terminology server. A general description of ontology mapping is contained in, for example, Y. Kalfoglou and M. Schorlemmer. The Knowledge Engineering Review Journal (KER), (18(1)): 1- 31, 2003; and Wynden R, et al. Ontology Mapping and Data Discovery for the Translational Investigator. AMIA CRI Summit 2010; the disclosures of which are incorporated herein by reference. Any convenient ontology mapping engine may be employed (e.g. Snoggle, Health Ontology Mapper (HOM), X-SOM, and the like). For example, clinical notes (pathology and radiology findings) must first be translated into clean lexicons from all possible clinical domains of interest. The same note is translated into clean lexicons specific to radiology, pathology, ob/gyn, orthopedics, and the like, then later in the mapping process the billing and clinical encounter data is examined to determine which specific areas of interest are relevant. In this example, maps may normalize lab data, medications, procedures, etc. and by building on each other these mapped results can be used to map data into a computable description of the patient encounter and may facilitate development of a computable state of the patient's health at that time. An Example implementation of an ontology mapping engine in a MICC system is provided in Example 3.

Analysis Engine

[0073] As described above and in FIG. 1, an analysis engine 16 may be

communicatively coupled to the SACS 12 and the IDW 14 and configured to associate signal data collected by the SACS 12 with data associated with the user obtained via the IDW 14. As used herein, an "engine" is one or more software modules responsible for carrying out a task. An analysis engine may include a dedicated computer or cluster of computers, distinct from the SACS, IDW, or other hardware of a MICC. In certain aspects, an analysis engine uses hardware that is also used in whole or in part for SACS, IDW, or other components of a MICC. For example, in certain aspects an analysis engine may utilize all or part of the nodes of a SACS cluster to perform one or more analyses.

[0074] An analysis engine may perform feature extraction on signal and/or clinical data.

That is, an analysis engine may comprise a feature extraction engine. A "Feature Extraction Engine" is used herein to describe a processor that produces feature series using algorithms applied to the sensor data. The algorithms may be specific to a particular use or may be more general algorithms (such as mathematical algorithms). When necessary, data from the IDW is fed into the Feature Extraction Engine to provide contextual information for the feature extraction algorithms. This includes both non-sensor as well as phenotype data and information. Formally, this engine reduces the input data down to only non-redundant and relevant content. Sometimes, this can also be referred to as automated annotation.

[0075] The term "feature extraction" is meant to broadly encompass the application of one or more algorithms that operate on the input signal and return a derived set of data or a derived signal. The term "feature series" is used to refer to the derived set of data or derived signal that is obtained from feature extraction. Examples of feature extraction algorithms include, but are not limited to, physiologic algorithms such as systolic peak detection from the arterial pressure waveform or RR-interval extraction from the ECG, and mathematical algorithms such as the rolling mean or permutation entropy. However, as taken from computer science and machine learning, feature extraction is a specific type of dimensionality reduction. Generally speaking, the raw data itself is difficult to process and use from a computational standpoint - there are increasingly large amounts of data and most of it is not directly useful. But, through feature extraction (or dimensionality reduction), it is possible to transform and distill the data into a smaller set that is more pertinent and useful to the question being asked.

[0076] An analysis engine may perform change detection on a feature series. The term

"change detection" is used broadly to refer to the application of one or more algorithms to detect one or more changes in a feature series, and is an example of a specific class of feature extraction. Examples of change detection algorithms of interest include, for example, change- point detection algorithms (e.g., Difference-of-Means algorithm and the like). As a subset of feature extraction algorithms, the change detection algorithms return derived sets of data. The term "change series" is used to refer to the derived set of data that is obtained from change detection. The change series may include the same size or a smaller set of data when compared to the input data. Change series may contain additional parameters quantifying the change.

[0077] An analysis engine may be used to perform learning (e.g., supervised and/or unsupervised machine learning, and/or statistical learning) on data. An analysis engine in a MICC system supports several approaches to the analysis of signal data, using both supervised and unsupervised machine and statistical learning in order to derive relationships between the clinical signal data and the status of the patient. That is, an analysis engine may comprise a statistical/machine learning engine. A "Statistical/Machine Learning Engine" is an engine that combines the extracted features from the time series data with data from the IDW to develop computable phenotypes or models of conditions of interest. The term "machine learning system" is used broadly and generically to refer to a system that learns to recognize complex patterns in empirical input data and then makes intelligent decisions based future empirical input data. Accordingly, a "machine learning algorithm" is used to encompass an algorithm, such as a supervised and/or unsupervised machine learning, and/or statistical learning algorithm, that may be used to recognize complex patterns in empirical data input. An analysis may comprise an inference engine. An "Inference engine" is used broadly and generically herein to refer to an artificial intelligence (AI) computer program that attempts to derive answers from a knowledge base and a set of input parameters. This may also be referred to herein as "Expert System," with the terms used interchangeably.

[0078] Several approaches can be used to analyze the signal data relative to the clinical data. Examples of machine learning algorithms of interest include, but are not limited to, AODE; artificial neural network; backpropagation; Bayesian statistics; Naive Bayes classifier; Bayesian network; Bayesian knowledge base; Case -based reasoning; Decision trees; Inductive logic programming; Gaussian process regression; Learning Vector Quantization; Instance-based learning; Nearest Neighbor Algorithm; Analogical modeling; Probably approximately correct learning (PAC) learning; Symbolic machine learning algorithms; Subsymbolic machine learning algorithms; Support vector machines; Random Forests; Ensembles of classifiers; Regression analysis; Information fuzzy networks (IFN); Linear classifiers; Fisher's linear discriminant; Logistic regression; Quadratic classifiers; k-nearest neighbor; C4.5; Hidden Markov models; Data clustering; Expectation-maximization algorithm; Self-organizing maps; Radial basis function network; Vector Quantization; Generative topographic map; A priori algorithm; Eclat algorithm; FP-growth algorithm; Hierarchical clustering; Single-linkage clustering; Conceptual clustering; Partitional clustering; K-means algorithm; Fuzzy clustering, dynamic Bayesian networks; and the like.

[0079] In certain aspects, one or more machine learning algorithms may be obtained from a machine learning library, such as Apache Mahout. The machine learning algorithm(s) may be modified or optimized so as to be run on multiple nodes simultaneously, such as on a cluster (e.g., a compute cluster or a cluster of nodes in a SACS implementation). In certain aspects, the machine learning algorithm(s) are optimized to be implemented on top of an Apache Hadoop implementation. In other aspects, the machine learning algorithm(s) may be optimized to be implemented in a non-parallel environment, such as those provided by Weka, R, and the like. [0080] A MICC system is not limited to just these approaches. In an embodiment the

MICC system is implemented in Java and designed to be extensible, allowing advanced users the ability to create the functionality they may require.

[0081] In certain aspects, one machine learning algorithm is applied to data of a SACS and/or IDW. In other aspects, 2 or more different machine learning algorithms may be applied, e.g., about 2 or more, including 3 or more, e.g., 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, about 10 or more, such as about 10 to 20 or more, about 20 to 30 or more. Where multiple machine learning algorithms are applied, the output(s) or prediction(s) from the machine learning algorithms may themselves be fed in to one or more machine learning algorithms. Further, the outputs or predictions from a plurality of machine learning algorithms may be combined. In certain aspects, simple voting procedures may be used (e.g., consensus), while in other aspects individual predictions are giving varying weights.

[0082] FIGs. 7-8 show the processes of feature extraction, change detection, and machine learning by an analysis engine. Here, information from one or more sensors is collected as signal data by a SACS (not shown). The analysis engine performs feature extraction to produce a feature series. The analysis engine then performs change detection on the feature series to produce a change series. In FIG. 7, this change series is used in one or more machine learning approaches by the analysis engine, combining data obtained from the IDW, to validate whether an alarm is valid. In FIG. 8, an analysis engine is shown where the IDW contains a computable phenotype. In this instance, the computable phenotype from the IDW is applied to the change series to detect a seizure in the subject.

Data Store

[0083] As depicted in FIG. 1 , a data store may be communicatively coupled to the analysis engine 18 (and optionally the SACS 12) and configured to store at least a portion of the signal data or associations between the signal data and the data associated with the user.

[0084] In certain aspects, a data store may consist of one database (e.g., a relational database, distributed column-store, etc.). In other aspects, a data store may instead refer to two or more databases. Any convenient database and/or data structure may be used to store at least a portion of the signal data or associations between the signal data and the data associated with the user. Additional Components

[0085] In addition to the components, subsystems, and engines described above, a MICC may further include one or more additional components, such as various input/output devices, and the like.

[0086] For example, a MICC may include a variety of network switches, firewalls, and servers. Such servers may be, for instance, outbound messaging servers (e.g., running on Tomcat, GlassFish, etc.), web servers (e.g., dashboard web servers running on Tomcat); data ingestion/buffer servers (e.g., sFTP, servers running on Tomcat); job, analytic, and/or workflow management servers (e.g., Oozie, Hue, servers running on Tomcat); and the like.

[0087] Specific components of interest include, but are not limited to, the following:

Support Server Cluster (SSC)

[0088] In order to provide additional services and access to the MICC system, a cluster of physical or virtual servers are managed as part of each subsystem. These servers provide the infrastructure for operating and managing MICC. This includes providing basic network services (e.g., FTP/web servers, firewalls and intrusion detection systems for enhancing network security, domain name servers, and routers) as well as providing more technical services. For example, web-based data visualization and interaction tools or connectors to the electronic medical record may be hosted within the SSC.

Dashboard

[0089] As biomedical engineers create new biomedical sensors, the amount of data being generated in the clinical environment has rapidly increased. For example, in the typical ICU, a patient is connected to numerous sensors; in order to effectively process all of this information, a clinician must spend valuable time sorting through all of the data, manually seeking the events of interest. As the amount of data increases, this manual process is error-prone and needlessly time consuming.

[0090] Accordingly, a "clinical dashboard" may be utilized by MICC, where information derived from clinical signal data can be combined with information derived from other clinical data to provide the clinician with a high-level view of the patient's clinical status. Details can be immediately available to the clinician through a "zoom" feature, but only if desired by the clinician. This allows the clinician to focus on the areas of interest without being distracted by large amounts of unnecessary or unrelated data. The dashboard combines information and data from the various subsystems described above. Signal data are pulled from the SACS and visually combined with data from the IDW. Clinical data from the IDW are overlaid on top of the signal data as annotations that are coded by color and shape. This allows the user to easily interpret the signal data in the context of the patient's other data. The dashboard can also be rearranged and group the signals such that the display can be customized to a particular patient, disease, or user.

[0091] In certain aspects, the dashboard may be deployed as a server-based application with a thin client. This centralizes the security of the system and the only data actually sent to the client machines are a series of images. None of the actual patient data is ever sent directly to the client. This enables the use of mobile devices such as smart phones and tablets with negligibly increased security risk. The dashboard can be implemented as a web application (e.g., Java and JavaScript-based, etc.) that is deployed within the Support Server Cluster. The dashboard can be deployed as a tiered set of applications tailored to classes of client devices. For instance, high-end, dedicated workstations may have a rich feature set while more limited devices such as smartphones may have a more limited feature set.

[0092] Sample dashboard screens that may be rendered by the MICC as described herein are illustrated by FIGs. 9-14, respectively.

Communications Module

[0093] In certain aspects, a MICC system may include a communications module so as to "push" information to a user. As used herein, a "user" is anyone or anything that uses the data/information generated by the MICC system, and/or from whom sensor data is collected (e.g., a subject, patient, etc.). This includes, but is not limited to, other software programs, systems, clinicians, patients, caregivers, family members, employers, payers, and hospital administrators. In certain aspects, information may be sent to a user using any number of communication protocols such as SMS/text message, email, Facebook message, automated phone call, alpha(numeric) page, and the like. Any of a variety of communications modules known in the art may be employed herein.

Implementation Architecture

[0094] FIG 4. presents a non-limiting example a MICC implementation. In this particular example, a plurality of sensors may collect information for each of a plurality of subjects. The sensors may include, for example, those collected by a bedside monitor, by mobile sensors, and/or by home-based sensors. These sensors may be the same for all subjects, or may be different for different subjects. Additionally, certain or all subjects may also generate user generated data (e.g., in response to surveys and the like).

[0095] The sensor data and user generated data may be transmitted by wired and/or wireless transmission to a public and/or private cloud. The cloud may then transmit the information to the particular MICC implementation. User generated data is stored in the IDW, which also includes information obtained from existing databases (e.g., EHR, labs, and the like). The IDW also stores computable phenotypes, and non-sensor data for the subjects. The cloud transmits the sensor data to a database (e.g., as part of a SACS implementation).

[0096] Data from the sensor data and non-sensor data may be processed by a Data

Processing Engine. The information may also be processed by a Feature Extraction Engine, which produces a feature series. The feature series, sensor data, and other data from the IDW may be processed by a Rule/CEP engine. A "Rule/CEP Engine" is an engine responsible for executing sets of "rules" that are part of the computable phenotype to trigger actions. These actions may include, but are not limited to, updating data in the IDW, updating the computable phenotype, or pushing information to a user through a dashboard or other communication medium. Accordingly, the Rule/CEP Engine may in certain aspects push an output (e.g., notification) to a user. The phrase "Complex Event Processing," also referred to herein as "CEP," is used to refer to the processing of many events happening across all the layers of an organization or a plurality of organizations or entities, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.

[0097] FIG. 5 presents a network diagram of a non- limiting example of a MICC implementation. In this implementation, SACS includes a plurality of slave nodes ("Hadoop Slave N"), as well as a JobTracker and NameNode. The SACS also includes a network switch. The IDW in this implementation includes several databases (e.g., MongoDB, MonetDB).

Search is provided using Solr. Ontology mapping is provided with a HOM instance. A network switch is used to facilitate communication among the components of the IDW.

[0098] The network switches of the SACS and of the IDW are in communication with a firewall and/or router. The firewall/router is further in communication with a rule engine responsible for executing sets of rules to trigger actions (e.g., update the dashboard, update data in the IDW). The firewall is also in communication with an outbound messaging server, and a dashboard, which comprises a dashboard web server, job/analy tic/workflow management server, and a data ingestion/buffer server. The components of the MICC are further housed behind a firewall, which connects to the internet. Here, the internet may comprise a cloud (e.g., a public and/or private cloud), or the internet generally. [0099] Further implementations of MICC are possible, including implementations where there is an existing infrastructure. For instance, FIG. 6 presents an example workflow of how a MICC may be integrated and/or used within an existing infrastructure.

[00100] In certain embodiments, one MICC system may be implemented to serve a population (e.g., a physician practice, insurance group, and the like). In certain settings, a plurality of MICC systems may be implemented. These systems may be identical or substantially identical.

[00101] For example, in certain aspects, the systems provided herein are designed to facilitate the discovery, clinical trial, and deployment of decision support technology based on clinical signal data. This may be achieved by, for instance, deploying multiple MICC systems. In such instances, each MICC system may contain data that is identical to the other system(s). For instance, services developed within the Retrospective Research Cohort System can be promoted to the Clinical Study Subsystem for clinical trials; after successful trials, the same service can be promoted to the Decision Support Advisor Subsystem for deployment in the clinical setting. These various implementations are described below.

Retrospective Research Cohort System (RRCS)

[00102] The RRCS is a MICC implementation designed to conduct research and retrospective studies on the potential efficacy of MICC services for improved clinical decision support.

[00103] The RRCS is not a clinical operational system, but instead can be used to conduct informatics or clinical research on decision support; it provides a platform for the initial discovery of associations between signal data and the clinical phenotype. Retrospective studies conducted on the RRCS on MICC services that show promise for clinical improvement can then be promoted to other MICC subsystems as appropriate, following adequate funding and IRB approval. The RRCS does not need to be housed on the same subnet as the other MICC subsystems.

Clinical Study System ( CSS)

[00104] The CSS is a clinical operational subsystem that enables the conduct of clinical trials on the safety and efficacy of MICC services. Upon published proof of safety and efficacy, MICC services can be introduced into regular clinical decision support. Decision Support Advisor System (DSAS)

[00105] The DSAS is a clinical operation subsystem. It is used to house robust MICC services on which proof of safety and efficacy have been provided following clinical trials. The DSAS is used to alert users (e.g., a physician, caregiver, etc.) about potential problems detected by MICC regarding patient health. The alerts are also appended to the patient's chart and made available to researchers to be considered at patient treatment review boards. The DSAS is not configured to govern any clinical treatment; in contrast, the DSAS is a "copiloting" system that alerts trained physicians about potential problems in patient health that may otherwise be very difficult to detect. The final decision about the proper treatment of the patient will remain the responsibility of the physician.

INFORMATION MODEL/INFORMATION ARCHITECTURE

[00106] An example information flow that can be employed in the context of a MICC as described herein provided as follows:

1. Clinical signal data from biomedical sensors connected to the patient (or animals).

2. Storage of clinical signal data from the entire institution without losing any data. Clinical signal data is defined as raw waveform data as well as data from feature extraction algorithms performed at the point of data collection. An example is the SEDline Patient Sleep Index, which is derived from the associated EEG waveforms.

3. Feature extraction on a parallelized architecture to support real-time and/or scalable processing.

4. Extracted features are stored as annotations.

5. Consolidation of clinical signals, annotations, and detected phenotype on a single, realtime dashboard deployed to the clinical environment.

6. Machine learning on a parallelized architecture to support computable phenotype generation.

7. Complex event processing (CEP) system that facilitates the deployment of the computable phenotype to provide real-time clinical decision support.

8. Association of signal data with clinical data (e.g. the electronic health record, pharmacy and laboratory databases, nursing databases, etc.).

9. Implementation of the computable phenotype for decision support as a set of rules as well as the output of the machine learning. 10. Plug-ins that implement specific clinical use cases. Examples of such plug-ins include automated neonatal seizure detection, false arrhythmia detection, neuromonitoring for intraoperative nerve compression, sleep/sedation index, etc.

Clinical Plugins

[00107] MICC is designed to be a platform on which plugins can be developed, tested, and deployed. Plugins may be use-case centric. For example, a plugin may be developed for specific use cases on the RRCS, then moved through the CSS and DSAS for clinical trials and production deployment, respectively.

[00108] As a development platform, MICC provides an environment to facilitate clinical research rooted in the clinical signal data. The end product of such research would be a plugin that can be deployed as a clinical decision support tool. As a result, the MICC system is also designed to facilitate the clinical trial of a developed algorithm without any retooling of the plugin. Once the plugin has been found to be safe and effective, it can then be deployed on a production instance of the MICC platform (the Decision Support Advisor System). The components of the plugin include the algorithms with which to process the data, parameters of the algorithms, training data from machine or statistical learning algorithms, as well as rules to manage the application of the various algorithms or to provide expert system functionality. The plugin also contains the components of the dashboard that provide the visualization layer for the plugin. The plugin architecture allows for the deployment of a centralized MICC system and subsequent deployment of plugins in a modular manner enabling different sites the ability to purchase and deploy only those plugins that benefit their local patient population.

Computable Phenotype

[00109] Once the data have been processed, the next step may be to conduct informatics- driven clinical research to generate improved clinical phenotypes. By focusing on clinical signal data, clinicians can take a data-driven approach to classifying patient clinical states or conditions. An example of an improved clinical phenotype can be found when looking at blood pressure. Currently, a patient's blood pressure can be hypertensive, hypotensive, or

normotensive, simple thresholds that are relative to established standards. However, studies have shown that the "dosage" of blood pressure can be an important indicator of clinical status and future outcome. In the typical clinical environment, however, the blood pressure monitors are not currently designed to measure the "dosage" of a patient's blood pressure. [00110] By combining the clinical signal data in MICC with the clinical data in the IDW, improved techniques of describing a patient's clinical status are generated using combinations of machine and statistical learning algorithms, signal processing, and statistics.

Agent-based Modeling

[00111] MICC can be used to identify associations between patient physiology data and a complex assortment of patient data including all contents of the IDW with a full record of clinical EMR data, clinical lab data, biomarker data, procedure and risk adjusted diagnosis as well as information regarding the discharge disposition and patient encounter. As such, a computable phenotype is defined herein as a patient level concept instead of only a description of biochemical states. While complex associations of that sort are useful to researchers, tools that can assist researchers with the interpretation of those associations are desirable. To assist with that task, MICC includes an embedded agent-based modeling environment. The agent based modeling subsystem is used to discover and characterize multiple hypotheses that potentially can be used to explain the basis for associations found with the MICC in patient data. After development, using agent-based models that can be verified to predict future patient phenotypes based on a current patient data state, the MICC can predict the likely future phenotype of a patient. These agent-based predictions are communicated back by the system as an additional means of predicting or determining the patient phenotype.

[00112] Referring next to FIG. 15, a process 100 of managing user data includes the stages shown. The process 100 is, however, an example only and not limiting. The process 100 can be altered, e.g., by having stages added, removed, rearranged, combined, and/or performed concurrently. Still other alterations to the process 100 as shown and described are possible.

[00113] At stage 102, signal data are collected from one or more sensors. At stage 104, data associated with a user are stored. At stage 106, respective ones of the signal data are associated with respective ones of the data associated with the user. At stage 108, at least a portion of the signal data or associations between the signal data and the data associated with the user are stored at a data store.

[00114] It is noted that at least some implementations have been described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram.

Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, a program, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.

[00115] Moreover, embodiments may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium such as a storage medium or other storage(s). A processor may perform the necessary tasks. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

[00116] Systems of the present disclosure may include a plurality of SACS, IDWs, processors, and the like. Any convenient number of SACS, IDWs, processors, and the like are contemplated. For example, a system may include 2, 3, 4, 5, 6, 7, 8, 9, 10 or more IDWs.

Likewise, a system may include 2, 3, 4, 5, 6, 7, 8, 9, 10 or more processors. In some instances, the system may be a distributed grid system.

[00117] The methods or algorithms described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executable by a processor or a plurality of processors, or in a combination of both, in the form of processing unit, programming instructions, or other directions, and may be contained in a single device or distributed across multiple devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

Suitable Subjects

[00118] The subject methods and systems may be used to monitor, analyze, and/or treat a variety of subjects. In many embodiments the subjects are "mammals" or "mammalian", where these terms are used broadly to describe organisms which are within the class mammalia, including the orders carnivore (e.g., dogs and cats), rodentia (e.g., mice, guinea pigs, and rats), and primates (e.g., humans, chimpanzees, and monkeys). In many embodiments, the subjects are humans. The subject methods may be applied to human subjects of both genders and at any stage of development (i.e., fetal, neonates, infant, juvenile, adolescent, adult), where in certain embodiments the human subject is a juvenile, adolescent or adult. While the present invention may be applied to a human subject, it is to be understood that the subject methods may also be carried-out on other animal subjects (that is, in "non-human subjects") such as, but not limited to, birds, mice, rats, dogs, cats, livestock and horses.

[00119] Moreover, suitable subjects of this invention include those who have and those who have not previously been afflicted with a condition, those that have previously been determined to be at risk of suffering from a condition, and those who have been initially diagnosed or identified as being afflicted with or experiencing a condition.

[00120] Systems and methods of the instant disclosure may be used on one or more subjects simultaneously. For example, in certain aspects a MICC may collect and/or analyze data from 2 or more subjects, including 10 or more, e.g., 3 or more, 5 or more, 10 or more, such as about 10 to 20 subjects, about 20 to 30 subjects, about 30 to 40 subjects, about 40 to 50 subjects, about 50 to 60 subjects, about 60 to 70 subjects, about 80 to 90 subjects, about 90 to 100 subjects, about 100 to 125 subjects, about 125 to 150 subjects, about 150 to 175 subjects, about 175 to 200 subjects, about 200 to 250 subjects, about 250 to 300 subjects, about 300 subjects, about 300 to 350 subjects, about 350 to 400 subjects, about 450 to 500 subjects, about 500 to 600 subjects, about 600 to 700 subjects, about 700 to 800 subjects, about 800 to 900 subjects, about 900 to 1000 subjects, about 1000 to 2000 subjects, about 2000 to 3000 subjects, about 3000 to 4000 subjects, about 4000 to 5000 subjects, about 5000 to 7500 subjects, about 7500 to 10,000 subjects, about 10,000 to 20,000 subjects, about 20,000 to 30,000 subjects, about 30,000 to 40,000 subjects, about 40,000 to 50,000 subjects, about 50,000 to 75,000 subjects, about 75,000 to 100,000 subjects, about 100,000 to 250,000 subjects, about 250,000 subjects, about 250,000 to 500,000 subjects, about 500,000 to 750,00 subjects, about 750,000 to

1,000,000 subjects, or about 1,000,000 subjects or more.

[00121] Subjects may be of the same general type (e.g., all rodents) or heterogeneous

(e.g., rodents and humans). In certain aspects, all or a substantial fraction of subjects have one or more characteristics in common, such as age, geographic location, sex, common experience (e.g., former or current military service), and the like.

[00122] Further, the subject systems and methods may find use in the treatment, monitoring, analysis, or prevention of a variety of different conditions including, but not limited to, cardiovascular conditions including cardiovascular disease, e.g., atherosclerosis, coronary artery disease, hypertension, hyperlipidemia, eclampsia, pre-eclampsia, cardiomyopathy, volume retention, congestive heart failure, QT interval prolongation, aortic dissection, aortic aneurysm, arterial aneurysm, arterial vasospasm, myocardial infarction, reperfusion syndrome, ischemia, sudden adult death syndrome, arrhythmia, fatal arrythmias, coronary syndromes, coronary vasospasm, sick sinus syndrome, bradycardia, tachycardia, thromboembolic disease, deep vein thrombosis, coagulopathy, disseminated intravascular coagulation ("DIC"), mesenteric ischemia, syncope, venous thrombosis, arterial thrombosis, malignant hypertension, secondary

hypertension, primary pulmonary hypertension, secondary pulmonary hypertension, Raynaud's, paroxysmal supraventricular tachycardia, atrial fibrilation, and the like; neurodegenerative conditions including neurodegenerative diseases, e.g., Alzheimer's Disease, Pick's Disease, Parkinson's Disease, dementia, delirium, amyotrophic lateral sclerosis, and the like;

neuroinflammatory conditions including neuroinflammatory diseases, e.g., viral meningitis, viral encephalitis, fungal meningitis, fungal encephalitis, multiple sclerosis, Charcot joint, schizophrenia, myasthenia gravis, and the like; orthopedic inflammatory conditions including orthopedic inflammatory diseases, e.g., osteoarthritis, inflammatory arthritis, regional idiopathic osteoporosis, reflex sympathetic dystrophy, Paget' s disease, osteoporosis, antigen- induced arthritis, juvenile chronic arthritis, and the like; lymphoproliferative conditions including lymphoproliferative diseases, e.g., lymphoma, lymphoproliferative disease, Hodgkin's disease, inflammatory pseudomotor of the liver, and the like; autoimmune conditions including automimmune diseases, e.g., Graves disease, Raynaud's, Hashimoto's, Takayasu's disease, Kawasaki's diseases, arteritis, scleroderma, CREST syndrome, allergies, dermatitis, Henoch- schlonlein purpura, goodpasture syndrome, autoimmune thyroiditis, myasthenia gravis, Reiter's disease, lupus, and the like; inflammatory conditions, e.g., acute respiratory distress syndrome ("ARDS"), multiple sclerosis, rheumatoid arthritis, juvenile rheumatoid arthritis, juvenile chronic arthritis, migraines, chronic headaches, and the like; infectious diseases, e.g., sepsis, viral and fungal infections, diseases of wound healing, wound healing, tuberculosis, infection, AIDS, human immunodeficiency virus, and the like; pulmonary conditions including pulmonary diseases, e.g., tachypnea, fibrotic lung diseases such as cystic fibrosis and the like, interstitial lung disease, desquamative interstitial pneumonitis, non-specific interstitial pneumonitis, lymphocytic interstitial pneumonitis, usual interstitial pneumonitis, idiopathic pulmonary fibrosis, pulmonary edema, aspiration, asphyxiation, pneumothorax, right-to-left shunts, left-to- right shunts, respiratory failure, and the like; transplant-related conditions such as transplant related side effects such as transplant rejection, transplant-related tachycardia, transplant related renal failure, transplant related bowel dysmotility, transplant-related hyperreninemia, and the like; gastrointestinal conditions including gastrointestinal diseases, e.g., hepatitis, xerostomia, bowel mobility, peptic ulcer disease, constipation, ileus, irritable bowel syndrome, postoperative bowel dysmotility, inflammatory bowel disease, typhilitis, cholelethiasis, cholestasis, fecal incontinence, cyclic vomiting syndrome, and the like; endocrine conditions including endocrine diseases, e.g., hypothyroidism, hyperglycemia, diabetes, obesity, syndrome X, insulin resistance, polycycstic ovarian syndrome ("PCOS"), and the like; genitourinary conditions including genitourinary diseases, e.g., bladder dysfunction, renal failure, hyperreninemia, hepatorenal syndrome, pulmonary renal syndrome, incontinence, arousal disorder, menopausal mood disorder, premenstrual mood disorder, renal tubular acidosis, pulmonary renal syndrome, and the like; skin conditions including skin diseases, e.g., wrinkles, cutaneous vasculitis, psoriasis, and the like; aging associated conditions including aging associated diseases, e.g., Shy Dragers, multi-system atrophy, age related inflammation conditions, cancer, aging, and the like; neurologic conditions including neurologic diseases such as epilepsy, depression, schizophrenia, seizures, stroke, insomnia, cerebral vascular accident, transient ischemic attacks, stress, bipolar disorder, concussions, post-concussive syndrome, cerebral vascular vasospasm, central sleep apnea, obstructive sleep apnea, sleep disorders, headaches incuding chronic headaches, migraines, acute disseminated encephalomyelitis ("ADEM"), and the like; Th-2 dominant conditions including Th-2 dominant diseases, e.g., typhilitis, osteoporosis, lymphoma, myasthenia gravis, lupus, and the like; conditions, including diseases, that cause hypoxia, hypercarbia, hypercapnia, acidosis, acidemia, e.g., ventilation/perfusion (V/Q) mismatch, Chronic Obstructive Pulmonary Disease ("COPD"), emphysema, any chronic lung disease that causes acidosis, acute pulmonary embolism, sudden adult death syndrome ("SADS"), chronic pulmonary embolism, pleural effusion, cardiogenic pulmonary edema, non-cardiogenic pulmonary edema, acute respiratory distress syndrome (ARDS), neurogenic edema,

hypercapnia, acidemia, asthma, renal tubular acidosis, asthma, acidosis, chronic lung diseases that cause hypoxia, hypercarbia or hypercapnia, and the like; OB-GYN conditions including OB-GYN diseases, e.g., amniotic fluid embolism, menopausal mood disorders, premenstrual mood disorders, pregnancy-related arrhythmias, fetal stress syndrome, fetal hypoxia, amniotic fluid embolism, gestational diabetes, pre-term labor, cervical incompetence, fetal distress, peri- partum maternal mortality, peripartum cardiomyopathy, labor complications, premenstrual syndrome, dysmenorrhea, endometriosis, fertility and subfertility conditions such as infertility, early pregnancy loss, spontaneous abortion, failure of implantation, amenorrhea, luteal insufficiency, and the like; sudden death syndromes, e.g., sudden adult death syndrome, and the like; menstrual related disorders, e.g., pelvic pain, dysmenorrhea, gastrointestinal disease, nausea, and the like; peripartum and pregnancy related conditions, e.g., peripartum

cardiomyopathy, and the like; fibrosis; post-operative recovery conditions such as post-operative pain, post operative ileus, post-operative fever, post-operative nausea, and the like; post- procedural recovery conditions such as post- procedural pain, post procedural ileus, post- procedural fever, post-procedural nausea, and the like; chronic pain; trauma; hospitalization; glaucoma; disorders of thermoregulation; fibromyalgia; and the like.

[00123] Non-limiting exemplary embodiments of the present disclosure are provided as follows:

1. A medical informatics system that implements a medical informatics compute cluster (MICC), the system comprising:

a Signal Archiving and Computation System (SACS) comprising a processor programmed to collect signal data from one or more body sensors;

an Integrated Data Warehouse (IDW) comprising memory comprising data associated with a user;

an analysis engine communicatively coupled to the SACS and the IDW and comprising a processor programmed to associate respective ones of the signal data with respective ones of the data associated with the user; and

a data store communicatively coupled to the analysis engine and configured to store at least a portion of the signal data, the data associated with the user, or associations between the signal data and the data associated with the user.

2. The system of 1 wherein the SACS comprises a database, wherein the processor is

programmed to store data from the one or more body sensors in the database.

3. The system of 1 or 2 wherein the SACS is further configured to process the signal data via at least one of feature extraction or signal processing.

4. The system of any of 1-3, further comprising a Support Server Cluster (SSC) comprising a processor programmed to provide network services for at least one of the SACS, the IDW, the analysis engine or the data store.

5. The system of 4 wherein the SSC is further configured to secure at least a portion of the data associated with the user, the signal data or the data stored by the data store. The system of any of 1-5, further comprising a dashboard module communicatively coupled to at least one of the data store, the SACS, or the IDW and configured to render a visual representation of at least a portion of data stored in the MICC for display. The system of 6 wherein the dashboard module is further configured to obtain the visual representation from a remote entity such that no data stored by the MICC are sent to the dashboard module. The system of 6 further comprising a module configured to display alerts to the dashboard module indicative of detection of a computable phenotype. The system of any of 1-8, further comprising a phenotype generation module communicatively coupled to the data store and comprising instructions that, when executed by a processor, generate at least one phenotype based on data stored by the MICC. The system of 9 wherein the phenotype generation module is further configured to generate the at least one phenotype via machine learning. The system of 9 further comprising a complex event processing module

communicatively coupled to the MICC and the phenotype generation module and configured to generate decision support information based at least in part on a phenotype generated by the phenotype generation module and the data stored by the MICC. The system of 9 wherein the phenotype comprises one or more rules and the system further comprises a plugin communicatively coupled to the phenotype generation module and configured to initiate at least one action according to the one or more rules. The system of 12 wherein the system is configured to continuously process and analyze incoming signal data by applying components of the plugin to detect rich phenotypes. The system of any of 1-13, wherein the SACS collects signal data from two or more body sensors. The system of any of 1-14, wherein the SACS collects signal data from four or more body sensors. The system of any of 1-15, wherein the SACS collects signal data from ten or more body sensors. The system of any of 1-16, wherein the SACS collects signal data from 50 or more body sensors. The system of any of 1-17, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. The system of any of 1-18, wherein the one or more body sensors measure at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. The system of any of 1-19, wherein at least one body sensor comprises an EEG or MEG sensor. The system of 20, wherein the EEG sensor is contained in an EEG array. The system of 20 or 21, wherein at least one body sensor comprises a plurality of EEG sensors The system of any of 1-22, wherein at least one body sensor is an accelerometer. The system of any of 1-23, wherein at least one body sensor is a pulse oximeter. The system of any of 1-24, wherein at least one body sensor measures the user's blood glucose. The system of 25, wherein the user's blood glucose is measured continuously. The system of any of 1-26, wherein the signal data comprises one or more spirometry measures. The system of any of 1-27, wherein the signal data comprises location data. The system any of 1-28, wherein the SACS collects at least a portion of the signal data wirelessly. The system of 29, wherein the SACS collects at least a portion of the signal data via Bluetooth. The system of 29 or 30, wherein the SACS collects at least a portion of the signal data via Wi-Fi. The system of 29, wherein the SACS collects at least a portion of the signal data via a cellular network. The system of any of 1-32, wherein the SACS collects at least a portion of the signal data from the user' s smartphone. The system of any of 1-33, wherein the SACS comprises 10 to 1000 compute nodes or more. The system of any of 1-34, wherein the data associated with a user comprises at least one of calendar data, survey data, or EHR data. The system of 10, wherein the generation of the phenotype comprises application of a supervised machine learning algorithm. The system of 36, wherein the generation of the phenotype comprises application of at least one of AODE; artificial neural network; backpropagation; Bayesian statistics; Naive Bayes classifier; Bayesian network; Bayesian knowledge base; Case-based reasoning; Decision trees; Inductive logic programming; Gaussian process regression; Learning Vector Quantization; Instance-based learning; Nearest Neighbor Algorithm; Analogical modeling; Probably approximately correct learning (PAC) learning;

Symbolic machine learning algorithms; Subsymbolic machine learning algorithms; Support vector machines; Random Forests; Ensembles of classifiers; Regression analysis; Information fuzzy networks (IFN); Linear classifiers; Fisher's linear discriminant; Logistic regression; Quadratic classifiers; k-nearest neighbor; C4.5; Hidden Markov models; Data clustering; Expectation- maximization algorithm; Self-organizing maps; Radial basis function network; Vector Quantization; Generative topographic map; A priori algorithm; Eclat algorithm; FP-growth algorithm; Hierarchical clustering;

Single-linkage clustering; Conceptual clustering; Partitional clustering; K-means algorithm; or Fuzzy clustering. The system of any of 1-37, wherein the system is configured to collect signal data from 10 users or more. The system of any of 1-38, wherein the system is configured to collect signal data from 1,000,000 users or more. The system of any of 1-39, further comprising a module communicatively coupled to the data store and configured to render a notification to an end-user. The system of 40, wherein the notification comprises e-mail, SMS, social message, or phone call. A system comprising:

a dashboard module comprising a processor programmed to display data associated with an informatics system; and

a display module communicatively coupled to the dashboard module, the display module configured to display alerts to the dashboard module indicating detection of a computable phenotype associated with the informatics system. The system of 42, wherein the informatics system comprises a Signal Archiving and Computation System (SACS) comprising a processor programmed to collect signal data from one or more body sensors; and an Integrated Data Warehouse (IDW) comprising memory comprising data associated with a user. The system of 43, wherein the informatics system comprises a Support Server Cluster (SSC) comprising a processor programmed to provide network services for at least one of the SACS or the IDW. The system of any of 42-44, wherein the dashboard module is further configured to obtain the visual representation from a remote entity such that no data stored by the informatics system are sent to the dashboard module. A system comprising:

a data store module comprising a database and a processor programmed to collect and store incoming signal data in the database;

a plugin operably coupled to the data store module comprising instructions for processing signal data, wherein the instructions, when executed by the processor of the data store module, cause the processor to process at least part of the incoming signal data; and

a module configured to the data store module and the plugin and configured to continuously process and analyze the incoming signal data by applying components of the plugin to detect computable phenotypes. The system of 46, wherein the incoming signal data comprises data from two or more body sensors. The system of 46 or 47, wherein the incoming signal data comprises data from four or more body sensors. The system of any of 46-48, wherein the incoming signal data comprises data from ten or more body sensors. The system of any of 46-49, wherein at least one body sensor measures at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. The system of any of 46-50, wherein at least one body sensor measures at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. A computer system, the system comprising:

a processor; and

memory operably coupled to the processor, wherein the memory includes instructions stored therein for generating a clinical phenotype for a subject, wherein the instructions, when executed by the processor, cause the processor to:

collect signal data from one or more body sensors;

store data associated with a user;

associate respective ones of the signal data with respective ones of the data associated with the user; and

generate at least one clinical phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user. The system of 52, wherein signal data is collected from two or more body sensors. The system of 52 or 53, wherein signal data is collected from four or more body sensors. The system of any of 52-54, wherein signal data is collected from ten or more body sensors. The system of any of 52-55, wherein signal data is collected from 50 or more body sensors. The system of any of 52-56, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. A computer-readable medium having computer-executable instructions stored thereon to generate a computable phenotype for a subject, wherein the instructions, when executed by one or more processors of a computer, causes the one or more processors to:

collect signal data from one or more body sensors;

store data associated with a user;

generate at least one computable phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user. A system for managing medical data, the system comprising:

means for collecting signal data from one or more body sensors;

means for storing the signal data;

means for storing data associated with a user;

means for associating respective ones of the signal data with respective ones of the data associated with the user; and

means for storing at least a portion of the signal data or associations between the signal data and the data associated with the user at a data store. A system for managing medical data, the system comprising:

means for collecting signal data from one or more body sensors;

means for storing the signal data;

means for storing data associated with a user;

means for associating patterns within the signal data with patterns within the data associated with a user; and

means for storing at least a portion of the signal data or associations between patterns within the signal data with patterns within the data associated with a user at a data store. A method of managing data via a medical informatics compute cluster (MICC), the method comprising:

collecting and storing signal data from one or more sensors;

storing data associated with a user; associating respective ones of the signal data with respective ones of the data associated with the user; and

storing at least a portion of the signal data or associations between the signal data and the data associated with the user at a data store. The method of 61 further comprising securing at least a portion of the data associated with the user, the signal data or the data stored by the data store. The method of 62 further comprising rendering a visual representation of at least a portion of data stored by the data store for display. The method of 63 further comprising obtaining the visual representation from a remote entity such that no data stored by the data store are processed at an entity that performs the rendering. The method of 61 further comprising generating at least one phenotype based on data stored by the MICC. The method of 61 further comprising generating decision support information based at least in part on the at least one phenotype and the data stored by the data store. The method of any of 61-66, wherein the signal data is collected and stored from two or more body sensors. The method of any of 61-67, wherein the signal data is collected and stored from four or more body sensors. The method of any of 61-68, wherein the signal data is collected and stored from ten or more body sensors. The method of any of 61-69, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. The method of any of 61-70, wherein the one or more body sensors measure at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂. A method of generating a computable phenotype via a medical informatics compute cluster (MICC), the method comprising:

collecting, with a processor, signal data from one or more body sensors;

storing, with the processor, data associated with a user;

associating, with the processor, respective ones of the signal data with respective ones of the data associated with the user; and

generating, with a processor, at least one computable phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user. A method of producing a computable phenotype, the method comprising:

generating a data set by, for each of a plurality of users having clinical presentations of a condition:

obtaining signal data from one or more body sensors;

obtaining non-signal data associated with user;

associating respective ones of the signal data with respective ones of the data associated with the user; and

storing at least a portion of the signal data or associations between the signal data and the data associated with the user; and

applying at least one machine learning algorithm to the data set, thereby producing a computable phenotype. A method of detecting the occurrence of an event in a user, the method comprising: collecting, with a processor, signal data from one or more body sensors;

storing, with the processor, data associated with a user;

associating, with the processor, patterns within the signal data with patterns within the data associated with the user; and comparing, with the processor, the associated patterns of the user a computable phenotype to detect the presence or absence of an event in the user.

EXAMPLES

[00124] As can be appreciated from the disclosure provided above, the present disclosure has a wide variety of applications. Accordingly, the following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Those of skill in the art will readily recognize a variety of noncritical parameters that could be changed or modified to yield essentially similar results. Thus, the following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used but some experimental errors and deviations should be accounted for.

EXAMPLE 1: DESIGN AND IMPLEMENTATION OF A SIGNAL ARCHIVING AND COMPUTATION SYSTEM

[00125] Described herein is a non- limiting example implementation of a Signal Archiving and Computation System. Components of this particular implementation of a SACS are first described, followed by a description of the system. The SACS described here is used in the MICC implementations used in the Examples that follow.

Dataset

[00126] A subset of the MIMICII dataset was utilized as the dataset to test this implementation of a SACS. The MIMICII dataset is described in Saeed, et al. 2002. Computers in Cardiology 29 (September): 641-644; the disclosure of which is incorporated herein by reference. The subset that was utilized consists of waveform data with a recorded sampling frequency of 125 Hz, and numeric time-series data sampled at 1 minute intervals. This data was collected from nearly 500 patient-records, totaling approximately 45,000 hours of waveform data. Other clinical data, such as the laboratory, pharmacy, and nursing databases, are also available and time-correlated with the signal data. All of the data have been de-identified. MapReduce

[00127] Map and Reduce may be utilized for parallelizing computation. One benefit of the MapReduce approach is the ability to focus solely on the computation, and not the shuffling of data between processors. A second benefit of MapReduce is in data-locality. With the MapReduce paradigm, most of the computation is done on the slave node that contains a copy of the input data. This results in a minimal amount of data being sent over the network, increasing overall efficiency.

[00128] Generally, each "job" that needs to be run is split into two separate tasks - a map task and a reduce task. The map task is a user-defined function and handles the initial "mapping" of the input data, taking a <keyl, value 1> pair as input and outputting an intermediate <key2, value2> pair. Oftentimes, the map task maps multiple different values to the same key.

[00129] The reduce task is also user-defined and takes the intermediate <key2, value2> pairs and merges all of the values that correspond to the same key. One example of a reduce task is to take the running sum or mean of values with the same key.

Hadoop

[00130] Hadoop is an open-source implementation of the MapReduce parallel programming paradigm. Hadoop provides both a distributed file system (called HDFS) as well as the MapReduce parallel computation framework. Data are stored in the HDFS and made available to the various slave nodes for computation. Hadoop is an Apache Foundation project and is written in Java though hooks are in place to facilitate the deployment of code written in other languages such as C or Python. Hadoop is a master-slave architecture where a single master node coordinates many slave machines which carry out the actual computation.

[00131] To enable data-local processing, each slave machine prioritizes computations on data of which it has a copy. This minimizes the shuffling of data over the network, decreasing the necessary network IO bandwidth. Additional nodes can be added to the cluster to increase storage capacity or computational power as necessary, during which the data are automatically rebalanced to the new nodes. Hadoop has been used in clusters ranging from a handful of nodes to thousands of nodes (e.g., 4,000 or more), demonstrating its ability to scale as needed.

HBase

[00132] HBase is a distributed column-store that runs on top of the Hadoop system. As a result, it depends on the distributed file system as well as the MapReduce framework. HBase provides a storage system where the full power of the MapReduce paradigm is available while also providing convenient, lower latency, random-access to the data. There are other possible alternatives within the Hadoop project that may serve a similar purpose in a SACS, such as Pig and Hive.

[00133] As a "NoSQL" database, HBase's approach to tables, rows, and columns is different than in traditional relational databases. Within HBase, each row is identified by a sortable row key. Each row can contain an arbitrary number of columns resulting in the sparse storage of tables. The columns are identified by a combination of "column family" and "label." The column family is important during schema design because data are stored in a per-column family basis. When a value is written to the database, it is stored with its row key, column family and label identifiers along with other internal data. This results in substantial storage overhead if the identifiers are large in size.

MongoDB

[00134] MongoDB is a document-store database and also has an integrated MapReduce computational framework. It is also a "NoSQL" database and is designed to be deployed in a clustered environment.

System Description

[00135] FIG. 2 presents an overview of the primary components of this implementation of a SACS. In this embodiment, the primary components involve: 1) data storage, 2) metadata storage, 3) MapReduce, and 4) data visualization. These are implemented in this particular, non- limiting example of a SACS using: 1) HBase, 2) MongoDB, 3) Hadoop, and 4) Chronoscope with the Google Web Toolkit, respectively.

[00136] In this example, the signal data are stored in HBase, while the signal metadata and other clinical data are stored in MongoDB. Data in HBase and MongoDB are accessible from the Hadoop/MapReduce environment for processing as well as from the data visualization layer. The hardware comprises 1 master node, 6 slave nodes, and several supporting servers. In alternative examples, hardware was provided via a cloud environment, namely the Amazon EC2 cloud environment.

Data Storage with HBase and MongoDB

[00137] The data were stored within HBase as time-series data, such that each row key was the timestamp of a signal's value at a particular point in time. This timestamp was recorded as an 8-byte value of the number of milliseconds from the epoch (January 1, 1970 00:00:00.000). Within each row, each column contained the value of a particular signal for a particular patient corresponding to the row key timestamp. For example, each of the following would be stored in a separate column in a hypothetical use case: (i) arterial blood pressure values for patient A, (ii) arterial blood pressure values for patient B, and (iii) ECG Lead I values for patient A.

[00138] The columns can also contain the values resulting from different feature extraction algorithms. For example, one particular feature extraction algorithm extracts the beat- to-beat systolic pressures from an arterial blood pressure waveform. In this case, the column contains the systolic pressure value at a particular timestamp for a particular patient.

[00139] Each value is stored along with its row key and column identifier. As a result, there is a noticeable increase in storage overhead. To mitigate this, the identifiers were stored in binary form. Because the row key identifiers are numeric values representing the number of milliseconds from the epoch, they are stored as standard 8-byte binary data. The column identifiers, on the other hand, are a combination of a patient identifier as well as a signal identifier. Currently, the number of patients has been limited to a 2-byte value and the signal identifier to a 4-byte value. These values, however, are easily changed should the underlying dataset require it.

[00140] Because the patient and signal identifiers are not intuitive when stored as numeric values, a database was set up to store the metadata. In this implementation, an instance of MongoDB served to store the metadata; MongoDB may be replaced with any convenient database (e.g., any standard relational database) while preserving functionality. This database contained all of the patient demographic information, patient clinical data (e.g. data from other clinical databases) as well as the mappings to the column identifiers used in HBase.

MapReduce with Hadoop

[00141] Because HBase runs within the Hadoop environment, the MapReduce framework can be directly leveraged for data stored within HBase. When writing a MapReduce job for processing the signal data, the user only needs to provide the code for the actual computation. While Hadoop handles the low level management of the data, several Java classes have also been provided here to handle the selection of the appropriate row keys and column identifiers. The user only needs to write the actual computational code and select the input columns, dramatically simplifying the process. [00142] Given the nature of the MapReduce approach and the Hadoop implementation, one potential issue must be addressed when working with feature extraction algorithms that are not stateless. Because each slave machine only processes data that it contains, there is no default mechanism to communicate the status or results of computation occurring on another slave. So, for an algorithm such as the rolling mean, each slave machine is computing its own rolling mean of the subset of data it has a copy of, independent of any of the other slave machines. One general solution is to force each map task to process an entire patient record. This, however, negates the data locality benefit since a single slave machine will be requesting data stored on other slave machines.

[00143] For problems that only require some overlap of data (e.g. only the previous 5 data points are needed), there are hooks built-in to the Hadoop system that allow for custom

"splitting" of the data. The splits can be made such that there is adequate overlap between one map task and the next. While this results in some additional shuffling of data over the network, the majority of the computation is still data-local.

Data Visualization

[00144] In order to visualize both the raw signal data as well as the extracted features, a web-based visualization tool was integrated into the system. This provides the ability to easily visualize the data without having to extract it into a separate plotting tool.

[00145] In this particular implementation, the visualization layer was built using the

Google Web Toolkit (GWT) along with Chronoscope (a charting tool) from Timepedia. A user may specify which signals they want to plot, both raw signals as well as derived signals from feature extraction algorithms, along with the time period of interest. The visualization tool has direct access to the MongoDB instance in order to properly map the binary identifiers to their human-readable counterparts. This data visualization system has the ability to overlay annotations on top of the signal data, allowing for the display of pertinent events from the patient's clinical history over the signal data time (FIGs. 11-16).

Calculation of Sliding Window Central Tendency Measure

[00146] Sliding Window Central Tendency Measure (CTM) calculations were first run within the Matlab environment to take advantage of both its processing and visualization capabilities. However, the Sliding Window CTM was very slow when run via Matlab. To take advantage of the Hadoop MapReduce environment, the data was imported into Hadoop and processed via MapReduce. In order to visualize the data, it first needed to be exported from Hadoop and imported into Matlab for plotting. Because many different window sizes were tested, both of these approaches were tedious and time-consuming.

[00147] The SACS implementation addressed both of these issues by providing an environment that can efficiently process the data while making it immediately available for viewing. This allowed for experimenting with various parameters of the Sliding Window CTM in a fraction of the time otherwise.

EXAMPLE 2: AUTOMATED ARRHYTHMIA DETECTION USING A MEDICAL INFORMATICS COMPUTE CLUSTER

[00148] Clinical alarms represent one of the primary means that clinicians use to monitor the status of their patients. These alarms are critical to ensuring a workflow that allows clinicians to care for more than one patient. These alarms are based on physiological sensors such as an electrocardiogram (ECG/EKG), blood pressure, or intracranial pressure. These physiological sensors offer one of few truly objective windows into a patient's clinical status or condition.

[00149] In order to simplify the problem of data mining physiological signal data, one useful abstraction is to treat any physiological signal as time-series data. Whether looking at daily blood glucose measurements or an EKG waveform, any physiological signal can be described as a set of key-value pairs where the key is a timestamp and the value is the measurement. This measurement can be a voltage, concentration, or any other recording by a sensor. By treating all of these data as time-series data, in certain aspects a system (MICC) has been designed that is able to absorb data from nearly any type of sensor and process and analyze it accordingly.

[00150] An implementation of such a MICC system is described herein. The system utilizes the SACS described above in Example 1.

Physiological Signal Data

[00151] Historically, there have been several major issues when dealing with

physiological signal data, and especially the data mining of such data. These problems are particularly acute for clinical data, and include: (i) difficulty synchronizing sensors; (ii) gaps in the data; and (iii) artifacts or noise in the data stream. MICC was designed to enable any workflow, including as a three-stage computational pipeline addressing these and other issues associated with storing, analyzing, and/or mining physiological signal data. The primary stages of this pipeline involve (i) feature extraction; (ii) change detection; and (iii) machine learning. The result is a computational framework in which any clinical condition can be described as a set of changes of extracted features of a set of physiological signals.

Feature Extraction

[00152] Feature extraction is typically used to refer to a specific form of dimensionality reduction: the transformation of data into a smaller set of features that can accurately describe the original data. When used in this context, feature extraction is a data processing algorithm that is being applied to highlight a particular characteristic of the underlying data.

[00153] Sliding-window algorithms were implemented, such as the Sliding Window

Central Tendency Measure algorithm. The Central Tendency Measure calculates the chaos or variability within a system by first calculating the second-order difference, then determining the percentage of points within a certain radius of the origin. This radius is user-defined and is typically dependent on the characteristics of the dataset and/or question being studied. Given a set of data consisting of points at where t=l,2,3. . . , the CTM can be computed as:

I 0 ^■ o herwise

On its own, the CTM algorithm only provides a single piece of information from an entire input dataset: the percentage of points within a certain radius. While this has been used with some success in medicine, it would not work in MICC since a change series is required, in which specific change points can be detected. Thus, the CTM algorithm was adapted to use a sliding window, and the output was changed to instead give the radius which contains N-percentage of the points, where N is a user-defined parameter.

[00154] A Sliding- Window Power Spectral Density algorithm was also implemented.

There are several different possibilities for incorporating some type of frequency domain analysis with the most common being the Fast Fourier Transform, or FFT. In order to combine some sort of temporal aspect with the frequency domain analysis, the Fourier transform was applied on top of a sliding window, as in short- time Fourier transforms. Additionally, in order to use the Fourier transform within MICC, the Fourier transform was further distilled down to produce a time series in which change points can be detected. This is achieved by calculating the power spectral density and taking the root-sum-square of the PSD, giving a one-dimensional time series that is representative of the underlying frequency components of the signal. Change (Point) Detection

[00155] Each of the feature extraction algorithms effectively creates a new time-series data stream. For each of these streams, change detection algorithms are applied to detect any change points. Like most algorithms, change detection algorithms perform differently depending on characteristics of the underlying data, parameter choices, thresholds, etc. A difference-of- means algorithm was incorporated into the computational pipeline describe here. The

Difference-of-Means algorithm can be described mathematically as follows:

ML(t)

N MR t)

N

\MR (t) - ML t) \

ML(t)

μ ί) : if μ ί) > τ

8{t)

. 0 : if μ(ί)≤ τ

Two values of τ were used: 0.1 and 0.9.

Machine Learning

[00156] MICC may describe changes in a patient' s clinical status or condition in terms of a set of detected changes in physiological signals. In order to address the problem is synchronization of the plethora of sensors, sliding windows are again used to aid in the machine learning phase. After the application of a sliding window to the change points, a combination of supervised and unsupervised methods was used to examine the data.

[00157] Because we had a gold standard, the primary machine learning approach was super- vised learning. The unsupervised learning was used as an exploration tool. In order to setup the learning instances, we took each change point and created a window around it spanning N seconds.

Application of MICC to Arrvthmia Detection

MLMLCLL Dataset

[00158] The dataset used was the MIMICII dataset. The data were collected at the Beth

Israel Deaconess Medical Center from Philips bedside monitors. The exact type and number of sensors varies from patient to patient due to differences in the underlying clinical condition and treatment protocol. However, most of the patient records contain at least one EKG signal and the arterial blood pressure.

[00159] For this work, a subset of the MIMICII dataset (approximately 400 patient records) was used, where every patient record has at least one EKG signal and the arterial blood pressure. Additionally, this subset also has the corresponding alarms that were generated by the Philips EKG monitors and reviewed by expert cardiologists from Harvard. Each alarm was reviewed by two different cardiologists with disagreements being arbitrated by a third. As per the IEEE standard, each valid alarm occurs within 10 seconds of the onset of the arrhythmia. Only those alarms involving ventricular fibrillation were kept in this subset.

In the ventricular fibrillation subset, there are 143 patient records that have been imported into MICC. Each of these contains at least one V-Fib alarm (as determined by the Philips EKG built- in algorithms). This dataset was reduced to 138 patient records to include those that have lead II of the EKG, so as to simplify the application of the computational pipeline to the EKG waveform data. The data were recorded at 125 Hz despite higher sampling rates in some cases such as the EKG.

Feature Extraction: Sliding Window Size

[00160] The effect of sliding window size on temporal specificity and resolution was analyzed. Smaller sliding windows improved the sensitivity of the feature extraction algorithm. In FIG. 16, the bottommost plot shows three variants of the Sliding Window PSD algorithm with varying parameters - the red plot has a window size of 1024 points (roughly 8 seconds) and a skip size of 500 points (4 seconds); the blue plot has a window size of 1024 points and a skip size of 125 points (1 second); the brown plot has a window size of 512 points (roughly 4 seconds) and a skip size of 128 points (roughly 1 second); all have a period of 0.008 sec (125 Hz) which is period of the recorded samples.

[00161] The skip size had a slight smoothing effect but was otherwise insignificant.

However, changing the skip size has a very large impact on the computational efficiency of the algorithm. By increasing the skip size from 125 points to 500 points, the number of computations was decreased by a factor of 4 with almost no noticeable effect on the

performance of the algorithm. Results from feature extraction are shown in FIG. 18. Feature Extraction: Initial Values

[00162] By focusing in single -pass algorithms and using rolling parameters where necessary, the weight or impact of the initial values in the source time series is effectively increased. In FIG. 17, for example, a behavior was seen that is characteristic of a new connection in the MIMICII dataset. The sensor dithers between two values then increases and dithers again before settling to actual readings and is present in most waveform signals in this dataset.

[00163] The top plot is zoomed out on the Y-axes (measured in volts) where the signal ranges from 0 to 50 V. However, if we move forward in time 20 seconds, we see that the EKG typically reads between -300 and 300 mV. The R-wave detection algorithm depends on the maximum value seen and is unable to detect any R-waves due to the abnormally high initial values. This issue is somewhat mitigated by the splitting mechanism inherent in the parallel mapreduce paradigm.

[00164] For example, when analyzing a person's intracranial pressure, it is possible to have values upwards of 300 mmHg during the drainage of cerebral spinal fluid. Prior to draining CSF, the nurse will engage a stopper valve which causes the pressure transducer to read the pressure of the attached fluid reservoir (which is typically 300 mmHg). Typical values for ICP range from about 7-15 mmHg; as a result, algorithms that happen to have a map split that coincides with a drainage event may suffer from the same problem.

[00165] One possible solution is to filter the data using some sort of rule-based approach.

This would work very well in the two situations described above since the erroneous values are well beyond any physiological possibility. However, it would not work for those signals that have a naturally wide range of possible values. Depending on the algorithm, another possibility is to give more weight to more recent values.

Change Detection

[00166] The two parameters for the Difference-of-Means algorithm are the window size

(in number of points) and the threshold (as a percentage of the mean). Initially, a very large window size (1000) was chosen, which resulted in virtually no changes being detected, even with a threshold as low as 10%. The size of the window was decreased by an order of magnitude, but the number of change points being detected remained small. Experimentation resulted in 11 points being determined as the ideal window size for this data set. The second parameter is a threshold: the percent difference of the mean before a change point is detected. Two thresholds were used: 10% and 60%. Machine Learning

[00167] Expectation-maximization clustering was applied to the processed sensor data

(FIGs. 19-20). A cluster was identified (cluster 0, FIG. 19) as warranting further interest. The identified cluster is subsequently reviewed with a clinician to help identify condition(s) that were present at the time of the event.

EXAMPLE 3: PROCESSING DATA FROM EXISTING DATABASES FOR INCLUSION IN MICC

[00168] FIG. 4 depicts a flowchart of an example MICC system. In this particular implementation, data contained in existing databases (e.g., non-signal data) is incorporated into the IDW, where it may be used by the Data Processing Engine during the processing of sensor data.

[00169] Data contained in existing databases may be in a variety of forms. Accordingly, this data may need to be converted into a normalized format prior to aggregation in MICC, so that data from multiple sources may be combined and analyzed. Such normalization may be achieved using an ontology mapping engine.

[00170] In this particular implementation of MICC, the particular ontology mapping engine is the Health Ontology Mapper (HOM). The HOM leverages a single terminology server to allow multiple hospitals or other health care providers to translate clinical information by leveraging the same definitions of clinical terminology, the same data dictionaries representing source clinical software environments and the same set of instance maps used to translate clinical information into standard clinical terminology. Each instance of HOM connects to the terminology server using an API (Application Program Interface) based on BioPortal REST services. These REST services have been extended to support HOM queries for clinical instance data maps. HOM can query these services in a dynamic fashion allowing the application of instance maps to clinical data to occur after the data has already been loaded into a warehouse. The implementation of HOM as incorporated into MICC is now described.

Data Loading

[00171] HOM facilitates the loading of clinical data into a warehouse to enable further analysis. By enabling the translation of instance data after information has been loaded into a warehouse, HOM alleviates the need to translate clinical information statically and no longer requires the employment of IT development staff to translate clinical data during the warehouse loading process. Traditional warehouse data loading is called ETL (Extract Transform Load) processing whereas HOM uses ELT (Extract Load Transform) processing using a set of tag handler components called HOM UETL (Universal ETL).

[00172] The HOM UETL process generates two sets of files. First it generates bulk import files for loading the warehouse. These bulk import files are a native database format supported by all database vendors and UETL currently supports the bulk import file formats of Oracle, Sybase IQ, MonetDB, InfiniDB and SQL Server. Bulk import files are the fastest possible means of importing data into a warehouse. Data loaded into the warehouse in this manner is unmodified and is stored within the warehouse in the same format as it was read from the data source.

[00173] The second set of files generated by UETL is the concept dimension files. These concept dimension files encode the location of the data within the data source as a simple hierarchical list of parent child terms relating the name of the source, the name of the table and the name of the source column from which the data was read.

DataSourceName \ TableName \ ColumnName

[00174] Using this simple representation for the location from which the source data was loaded, both a concept path for the data warehouse and an URI (universal resource indicator) can be constructed for the NCBO BioPortal. Concept dimension files are generated in bulk import format and can therefore be directly loaded into the data warehouse to provide a complete concept ID for each of the facts stored. The concept dimension files are then also loaded into Protege Mapping Master, a program used to load information into NCBO BioPortal and to translate hierarchical terms into OWL (Object Web Language) format. The resulting OWL based representation of the data source is then also loaded into NCBO BioPortal.

[00175] Once the concept dimension files and raw source data have all been loaded in both the data warehouse and the BioPortal reference, the same concept identifiers for source data based on their common set of concept paths.

System access using traditional IT technology

[00176] HOM also includes a feature called HOM ViewCreator that provides the capability to make the results of mapped data more easily accessible by researchers. The ViewCreator can allow access to mapped data, using JDBC, from within Microsoft Excel or biostatistics packages such as SAS, ST ATA, SPSS and R. ViewCreator based views can also be used to build downstream databases or to load information into data mining tools such as Cognos or Business Objects.

Personal Health Information Data Handling for HIPAA Compliance

[00177] The UETL data loading process also includes methods for the removal of PHI

(personal health information) from the incoming source data. That feature is referred to as the ProxyGen Service that replaces PHI with proxy identifiers.

[00178] Specifically when the UETL process loads data from clinical sources it replaces

HIPAA protected patient identifiers with proxy IDs. For example, proxy identifiers replace the patient' s name, the name of his physician, and the patient' s social security number during the data loading process. Further, if the same HIPAA protected source information is encountered from any subsequent data source the same proxy identifiers are returned. This allows the data warehouse to link patient data without any need to store the PHI within the warehouse. By using this technique it is possible to build a warehouse that is a HIPAA Limited Data set that contains only limited dates of service and which has a HIPAA de-identified user interface. By supporting the ProxyGen feature HOM can greatly lower the potential legal liability of using patient data for research or other purposes such as quality improvement.

[00179] The ProxyGen service also provides the ability to de-identify any downstream database that is connected to the data warehouse. This is provided via a set of REST services (a web based application program interface) that can be called by any database that extracts information from the warehouse. The ProxyGen REST services allow PHI to be submitted from downstream databases as well as from UETL loaded source databases. Downstream databases can send PHI to the ProxyGen service and if the same PHI data are submitted the same proxy values will again be returned as those used previously during the UETL loading process. By providing this service ProxyGen not only scrubs PHI from any incoming clinical data source but it can also remove PHI from any downstream database that is connected to the data warehouse as well. The ProxyGen REST services eliminate the need to retain PHI within any warehouse or warehouse-connected database, as PHI is no longer required for record linkage.

[00180] Additionally by using the ProxyDB a report can be created that allows investigators to contact patients. Investigators may supply of a list of proxy ID's for patients that they are interested in. If IRB (Institutional Review Board) approval to contact those patients has been provided then by accessing the ProxyDB a listing of patient contact information for those patients can be provided. This is possible because the ProxyDB contains an association between the proxy ID's and each patient's PHI. Unstructured text handling during load

[00181] The HOM UETL component also optionally contains an embedded copy of the

NCBO Annotator service for annotating unstructured text. By using Annotator clinical findings extracted from source clinical environments can be annotated with BioPortal medical terminologies such as SNOMED/CT. The annotator feature supports named entity recognition and negation. Annotator is not a fully featured NLP (natural language processing) environment but instead is packaged as an automated annotation component used internally by HOM and only during the data loading process. When HOM runs Annotator on incoming full-text (unstructured) data it first identifies a set of BioPortal URI' s for portions of medical

terminologies stored on BioPortal. HOM selects multiple URI' s to be annotated for topic areas of interest so that the same unstructured data can be interpreted within multiple contexts. For example if HOM uses Annotator to select terms of interest in Cardiology, Orthopedic Surgery, and Pediatrics then annotations would be subsequently generated on the same unstructured text multiple times, once for each of those 3 domains. In this manner HOM UETL can select specific types of unstructured clinical findings and annotate those findings for usage within multiple domains of interest.

Instance Mapping

[00182] After the data is loaded HOM and the BioPortal can then be used to dynamically translate warehoused information by traversing maps defined on BioPortal. These maps translate information from source data format into standard medical terminologies. For example, the local hospital discharge data stored within both GE UCare as well as within EPIC can be translated into the same HL7 Discharge Disposition format. Subsequent maps that utilize discharge disposition can then reference the standard HL7 Discharge format. After a map has run the same data exists within the warehouse in both its raw untranslated form and in one or more translated standard medical terminologies. Additional mappings of the same source data can then be added at any time in the future without any need to reload the source data.

[00183] The HOM Interpreter dynamically translates local clinical instance data by communicating with the BioPortal REST services API (application program interface). This translation into standard ontologies happens when requested by the researcher and after the data has already been loaded into the warehouse.

[00184] The instance maps stored on BioPortal can define three difference classes of clinical instance data maps, including 1-to-l maps; many-to-1 maps; and automatic maps (many- to-many). The HOM 1-to-l maps will translate a single term within the value set of the source data system into a single term for the value set of the target medical terminology. The HOM many-to- 1 maps will look for the presence of multiple value set terms from the source data and translate that information into a single target terminology term. These 1-to-l and many-to- 1 maps are defined using Protege and the BioPortal web interface.

[00185] Automatic maps allow a terminologist to check-in algorithms that execute on source data to determine the target terms. Examples of these "auto maps" include the normalization of clinical lab data into bins of "Low", "Low-Normal", "Normal", "High-Normal" and "High". Automatic maps can also include calls to third party terminology servers such as RxNav and may contain biostatistical programs or calls to machine learning libraries.

EXAMPLE 4: DETECTING DROWSINESS OR INTOXICATION

[00186] To monitor drowsiness and/or intoxication in positions requiring the user to be alert (e.g., train conductor, pilot, doctor), a subject is provided with a series of sensors for continuous recording of physiological parameters. The sensors include (i) an accelerometer, (ii) pulse oximeter, (iii) a head-mounted sensor array comprising a plurality of dry EEG sensors, and (iv) an EKG sensor.

[00187] Each sensor transmits the data it receives from the user via a built-in Bluetooth module to the user's smartphone. The user's smartphone runs a custom software application that receives this sensor data, establishes a connection to a MICC, and uploads the sensor data to the MICC. To handle proper authentication, the custom software application presents a login screen to the user where they can login to their account over a secure (e.g. SSL) connection. Upon successful authentication, an SSH public/private key pair is created on the server and the private key is sent from the server to the user' s smartphone via a secure connection. The public key is copied to the data ingestion server and is used by the SSH authentication protocol to authenticate the user's smartphone. The user can remove this public key from their account to disable access by the smartphone in situations such as a lost smartphone. This public/private key pair is used by the SSH protocol to establish secure access to the data ingestion server via the sFTP protocol. Once the sFTP connection is established, the custom software application can send buffer files of the sensor data and any tags or annotations to the server. The transmission can occur at any interval, which, in this case, is every minute. The custom software application uses the sFTP protocol to ensure successful transmission of the file(s). In situations where there is no or poor wireless (e.g. cellular, wifi, etc.) connection quality, the custom software application will automatically reattempt transmission when a suitable wireless connection is reestablished. Once the file(s) are successfully transmitted to the data ingestion server, the file(s) are parsed by a server-side daemon that monitors for new files. This daemon will extract the sensor and non-sensor data as well as associated timestamps and store them within the MICC system.

[00188] The MICC records that sensor data for the user and aggregates the sensor data with the non-sensor data it has stored for that user. Non-sensor data that is stored for the user includes the user' s calendar, location data, and annotations that are entered by the user in response to questions posed to him by the application (e.g., "What did you eat for dinner?" or "How many guests did you have dinner with?") The readings are used to compare the user to a model of a tired state, or an intoxicated state.

[00189] In a prior step, these models are produced as follows. First, a group of healthy volunteers is recruited for participation in a drowsiness study. Each volunteer is provided with (i) an accelerometer; (ii) pulse oximeter, (iii) a head-mounted sensor array comprising a plurality of dry EEG sensors, and (iv) a EKG sensor. The user is then asked to perform a motor task. The user is kept awake, and is subsequently asked to perform that identical motor task each hour for 18 consecutive hours. A drowsy state is determined as a performance decrease of about 10% over the initial control. The sensor data is collected and aggregated with the available non- sensor data in a MICC. Feature extraction algorithms are run on the data for all patients. A plurality of machine learning algorithms is run on the data set. The resulting models are used as definitions as drowsy a state. These models are stored within MICC and are continually applied to new, incoming data.

[00190] Subsequently, when in use if the algorithm flags the user as potentially drowsy and/or intoxicated, an alarm is automatically triggered to notify one or more user-defined persons. The alarm is sent automatically by MICC, via SMS, e-mail, to a display, or other user- defined output device.

EXAMPLE 5: MONITORING DIABETES IN A JUVENILE POPULATION

[00191] To monitor diabetes and patient compliance in a juvenile population, individual juvenile subjects are provided with a networked (e.g., Bluetooth) blood glucose monitor. The monitor is configured to communicate with a MICC upon the completion of a blood glucose reading.

[00192] For each subject, MICC keeps track of the time elapsed since the last reading was recorded. An alarm is triggered if either of two conditions is present: (i) the time since the last reading was received exceeds a user-defined threshold, or (ii) the time since the last reading was received exceeds a patient- specific threshold modeled based off of the patient's glucose reading history. Upon triggering the alarm, a message is sent via MICC to the user(s) specified in that subject's MICC record. The manner in which the message is sent is user-defined, and selectable from SMS, e-mail, and other communication means. A user-configurable option is to also, or instead, be notified each time a reading has been taken, as well as the value of that reading.

[00193] MICC also stores the value of each reading taken by the subject. Recorded values are compared with the subject's prior readings to determine activity level trending. If the value change exceeds a MICC-learned threshold, an alarm is sent via MICC to the user(s) specified in that subject's MICC record. In such instances, the default setting is to notify the parent(s) of the juvenile, as well as the health care provider at his or her school, if the reading was taken during school hours.

[00194] In addition to the sensor data described above, MICC also aggregates the sensor data with non-sensor data, such as the subject's recorded diet and activity information. Non- sensor information includes the subject's calendar information. The calendar information is annotated by a parent to identify potentially hazardous events (e.g., a soccer practice), and MICC can trigger an alarm based on the non-existence of sensor data within a given time of that event. Further, potentially dangerous activity is automatically detected by the system based upon the aggregate of the information. For example, MICC detects in a user that in prior cases where the subject had a blood glucose range comparable to his most recent recording, and failed to have a meal within 90m of his scheduled soccer practice, he suffered a hypoglycemic episode. Alerts are automatically sent to one or more individual(s) where potentially dangerous activity is detected a priori.

EXAMPLE 6: MONITORING MOBILITY AND ACTIVITY IN AGED POPULATIONS

[00195] To monitor mobility and activity in aged populations, individual subjects are provided with at least one of the following networked sensors: (i) an accelerometer, (ii) pedometer, (iii) GPS sensor, or (iv) smartphone containing any one or more of the foregoing sensors.

[00196] For each subject, MICC keeps track of the subject's movement over a given period. The period is user-definable, with a default being a 24 hour period. The amount of movement is compared with the subject's prior recordings, and decreases in a day's recorded movement are reported via an alarm to a user-defined caregiver.

[00197] Further, MICC automatically triggers an alarm when movement is not detected for a given interval at a time of day in which the user is normally active. For example, if the user's prior history indicates that she does not spend greater than 2 hours sedentary, an alarm may be triggered automatically by MICC when that time without movement is exceeded. In this aspect, MICC may automatically serve to alert others of an adverse event, such as a fall that has prevented the subject from moving or calling for help.

[00198] Further, MICC compares the mobility of that user to those of other users of comparable age and/or overall health. Decreases in a user' s mobility relative to this population are flagged and an alarm is triggered. The precise alarm that is triggered depends upon the severity of the change. The thresholds for what constitutes a severe or minor change are user- definable, as are the person(s) who receive the alarm for each classification.

EXAMPLE 7: SLEEP APNEA DIAGNOSIS

[00199] To diagnose sleep apnea and/or measure the effectiveness of a sleep apnea intervention, a patient is provided with a pulse oximeter for home-based use by the physician. The sensor is configured to communicate via wireless protocol (e.g., Bluetooth) with the user's smartphone. The smartphone uploads the sensor data from the pulse oximeter to the physician' s MICC. A custom application on the smartphone also asks the patient to rate the quality of the previous night's sleep and sends this information to the MICC.

[00200] The MICC aggregates the sensor data obtained for a given patient with all non- sensor data available for that patient. The resulting data is then compared to a local model of sleep apnea for diagnosis.

EXAMPLE 8: SLEEP APNEA INTERVENTION OUTCOMES MEASUREMENT IN A PHYSICIAN PRACTICE

[00201] To measure the effectiveness of different sleep apnea interventions in a physician practice, each patient is provided with a pulse oximeter for home-based use. The sensor is configured to communicate via Bluetooth with the user's smartphone. The smartphone uploads the sensor data from the pulse oximeter to the physician practice's MICC. A custom application on the smartphone also asks the patient to rate the quality of the previous night' s sleep and sends this information to the MICC.

[00202] The MICC aggregates the sensor data obtained for a given patient with all non- sensor data available for that patient. This procedure is simultaneously completed for all other patients being treated in the practice for the same condition. The resulting data is then queried and processed by managers of the practice to gain insights into which treatment(s) are effective. EXAMPLE 9: DETERMINING THE COST EFFECTIVENESS OF A HYPERTENSION EDUCATION PROGRAMS

[00203] To determine the cost effectiveness of various hypertension interventions, an insurance payer provides its members with a home blood pressure measurement device. The measurement device is networked, such that measurements are uploaded to the insurance company's MICC.

[00204] The MICC aggregates the sensor data obtained for a given patient with all other non-sensor data available for the patient, specifically including the amount spent in aggregate for the subject's hypertension control over the period. The insurance carrier offers education programs to a subset of its members. The effectiveness of the programs at reducing

hypertension is measured by comparing the blood pressure measurements of those patients with those of a control group.

EXAMPLE 10: DIABETES OUTCOMES MEASUREMENT TO EVALUATE CLINIC EFFECTIVENESS

[00205] To determine which health-care providers are providing the most cost-effective management of a diabetic condition, a self-insured employer provides its employees with a continuous glucose monitoring device. The device is networked, and communicates directly with the self-insured employer's MICC. The sensor data is aggregated with the non-sensor data known for that patient, including previous treatments, and the costs that the employer has incurred for the treatment of that patient's condition. By querying and processing the data for all diabetic employees in the MICC, benefits managers of the employer can develop insights into which provider(s) are most effective at controlling diabetes from both the cost and glucose level management standpoints.

[00206] Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this disclosure that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

[00207] Accordingly, the preceding merely illustrates the principles of the invention. It will be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of present invention is embodied by the appended claims.

Claims

CLAIMS What Is Claimed Is:

1. A medical informatics system that implements a medical informatics compute cluster

(MICC), the system comprising:

2. The system of claim 1 wherein the SACS comprises a database, wherein the processor is programmed to store data from the one or more body sensors in the database.

3. The system of claim 1 or 2 wherein the SACS is further configured to process the signal data via at least one of feature extraction or signal processing.

4. The system of any preceding claim, further comprising a Support Server Cluster (SSC) comprising a processor programmed to provide network services for at least one of the SACS, the IDW, the analysis engine or the data store.

5. The system of claim 4 wherein the SSC is further configured to secure at least a portion of the data associated with the user, the signal data or the data stored by the data store.

6. The system of any preceding claim, further comprising a dashboard module

communicatively coupled to at least one of the data store, the SACS, or the IDW and configured to render a visual representation of at least a portion of data stored in the MICC for display.

7. The system of claim 6 wherein the dashboard module is further configured to obtain the visual representation from a remote entity such that no data stored by the MICC are sent to the dashboard module.

8. The system of claim 6 further comprising a module configured to display alerts to the

dashboard module indicative of detection of a computable phenotype.

9. The system of any preceding claim, further comprising a phenotype generation module communicatively coupled to the data store and comprising instructions that, when executed by a processor, generate at least one phenotype based on data stored by the MICC.

10. The system of claim 9 wherein the phenotype generation module is further configured to generate the at least one phenotype via machine learning.

11. The system of claim 9 further comprising a complex event processing module

communicatively coupled to the MICC and the phenotype generation module and configured to generate decision support information based at least in part on a phenotype generated by the phenotype generation module and the data stored by the MICC.

12. The system of claim 9 wherein the phenotype comprises one or more rules and the system further comprises a plugin communicatively coupled to the phenotype generation module and configured to initiate at least one action according to the one or more rules.

13. The system of claim 12 wherein the system is configured to continuously process and

analyze incoming signal data by applying components of the plugin to detect rich phenotypes.

14. The system of any preceding claim, wherein the SACS collects signal data from two or more body sensors.

15. The system of any preceding claim, wherein the SACS collects signal data from four or more body sensors.

16. The system of any preceding claim, wherein the SACS collects signal data from ten or more body sensors.

17. The system of any preceding claim, wherein the SACS collects signal data from 50 or more body sensors.

18. The system of any preceding claim, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

19. The system of any preceding claim, wherein the one or more body sensors measure at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

20. The system of any preceding claim, wherein the one or more body sensors measure

temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

21. The system of any preceding claim, wherein at least one body sensor comprises an EEG or MEG sensor.

22. The system of claim 21, wherein the EEG sensor is contained in an EEG array.

23. The system of claim 21 or 22, wherein at least one body sensor comprises a plurality of EEG sensors

24. The system of any of claims 21-23, wherein the EEG sensor is a dry EEG sensor.

25. The system of any preceding claim, wherein at least one body sensor is an accelerometer.

26. The system of any preceding claim, wherein at least one body sensor is a pulse oximeter.

27. The system of any preceding claim, wherein at least one body sensor measures the user's blood glucose.

28. The system of claim 27, wherein the user's blood glucose is measured continuously.

29. The system of any preceding claim, wherein the signal data comprises one or more

spirometry measures.

30. The system of claim 29, wherein the spirometry measures are selected from FVC, FEV1, FEV1 , PEF, FET, MVV, and MEF75*.

31. The system of any preceding claim, wherein the signal data comprises location data.

32. The system of any preceding claim, wherein the SACS collects at least a portion of the signal data wirelessly.

33. The system of claim 32, wherein the SACS collects at least a portion of the signal data via Bluetooth.

34. The system of claim 32 or 33, wherein the SACS collects at least a portion of the signal data via Wi-Fi.

35. The system of claim 32, wherein the SACS collects at least a portion of the signal data via a cellular network.

36. The system of any preceding claim, wherein the SACS collects at least a portion of the signal data from the user's smartphone.

37. The system of any preceding claim, wherein the SACS comprises 10 compute nodes or more.

38. The system of any preceding claim, wherein the SACS comprises 100 compute nodes or more.

39. The system of any preceding claim, wherein the SACS comprises 1000 compute nodes or more.

40. The system of any preceding claim, wherein the data associated with a user comprises at least one of calendar data, survey data, or EHR data.

41. The system of any preceding claim, wherein the user is mammalian.

42. The system of any preceding claim, wherein the user is human.

43. The system of any preceding claim, wherein the user is a juvenile.

44. The system of any preceding claim, wherein the user is a senior citizen.

45. The system of any preceding claim, wherein the user has been diagnosed as having a

condition.

46. The system of any preceding claim, wherein the user has diabetes.

47. The system of any preceding claim, wherein the user has hypertension.

48. The system of any preceding claim, wherein the user has an arrhythmia.

49. The system of claim 10, wherein the generation of the phenotype comprises application of a supervised machine learning algorithm.

50. The system of claim 10 or 49, wherein the generation of the phenotype comprises

application of an unsupervised machine learning algorithm.

51. The system of any of claims 10 or 49-50, wherein the generation of the phenotype comprises application of a statistical classifier.

52. The system of any of claims 10 or 49-51, wherein the generation of the phenotype comprises application of at least one of AODE; artificial neural network; backpropagation; Bayesian statistics; Naive Bayes classifier; Bayesian network; Bayesian knowledge base; Case-based reasoning; Decision trees; Inductive logic programming; Gaussian process regression;

Learning Vector Quantization; Instance-based learning; Nearest Neighbor Algorithm; Analogical modeling; Probably approximately correct learning (PAC) learning; Symbolic machine learning algorithms; Subsymbolic machine learning algorithms; Support vector machines; Random Forests; Ensembles of classifiers; Regression analysis; Information fuzzy networks (IFN); Linear classifiers; Fisher's linear discriminant; Logistic regression;

Quadratic classifiers; k-nearest neighbor; C4.5; Hidden Markov models; Data clustering; Expectation-maximization algorithm; Self-organizing maps; Radial basis function network; Vector Quantization; Generative topographic map; A priori algorithm; Eclat algorithm; FP- growth algorithm; Hierarchical clustering; Single-linkage clustering; Conceptual clustering; Partitional clustering; K-means algorithm; or Fuzzy clustering.

53. The system of any of claims 10 or 48-52, wherein the generation of the phenotype comprises application of at least five of AODE; artificial neural network; backpropagation; Bayesian statistics; Naive Bayes classifier; Bayesian network; Bayesian knowledge base; Case-based reasoning; Decision trees; Inductive logic programming; Gaussian process regression;

Learning Vector Quantization; Instance-based learning; Nearest Neighbor Algorithm;

Analogical modeling; Probably approximately correct learning (PAC) learning; Symbolic machine learning algorithms; Subsymbolic machine learning algorithms; Support vector machines; Random Forests; Ensembles of classifiers; Regression analysis; Information fuzzy networks (IFN); Linear classifiers; Fisher's linear discriminant; Logistic regression;

54. The system of any preceding claim, wherein the system is configured to collect signal data from 10 users or more.

55. The system of any preceding claim, wherein the system is configured to collect signal data from 100 users or more.

56. The system of any preceding claim, wherein the system is configured to collect signal data from 1000 users or more.

57. The system of any preceding claim, wherein the system is configured to collect signal data from 10,000 users or more.

58. The system of any preceding claim, wherein the system is configured to collect signal data from 100,000 users or more.

59. The system of any preceding claim, further comprising a module communicatively coupled to the data store and configured to render a notification to an end-user.

60. The system of claim 59, wherein the notification comprises e-mail, SMS, social message, or phone call.

61. A system comprising:

a display module communicatively coupled to the dashboard module, the display module configured to display alerts to the dashboard module indicating detection of a computable phenotype associated with the informatics system.

62. The system of claim 61, wherein the informatics system comprises a Signal Archiving and Computation System (SACS) comprising a processor programmed to collect signal data from one or more body sensors; and an Integrated Data Warehouse (IDW) comprising memory comprising data associated with a user.

63. The system of claim 62, wherein the informatics system comprises a Support Server Cluster (SSC) comprising a processor programmed to provide network services for at least one of the SACS or the IDW.

64. The system of any of claims 61-63, wherein the dashboard module is further configured to obtain the visual representation from a remote entity such that no data stored by the informatics system are sent to the dashboard module.

65. The system of any of claims 61-64, wherein the display module displays alerts for a plurality of users.

66. The system of any of claims 61-64, wherein the display module displays alerts for 100 users or more.

67. A system comprising:

a plugin operably coupled to the data store module comprising instructions for processing signal data, wherein the instructions, when executed by the processor of the data store module, cause the processor to process at least part of the incoming signal data; and a module configured to the data store module and the plugin and configured to continuously process and analyze the incoming signal data by applying components of the plugin to detect computable phenotypes.

68. The system of claim 67, wherein the incoming signal data comprises data from two or more body sensors.

69. The system of claim 67 or 68, wherein the incoming signal data comprises data from four or more body sensors.

70. The system of any of claims 67-69, wherein the incoming signal data comprises data from ten or more body sensors.

71. The system of any of claims 67-70, wherein the incoming signal data comprises data from 50 or more body sensors.

72. The system of any of claims 68-71, wherein at least one body sensor measures at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

73. The system of any of claims 68-72, wherein at least one body sensor measures at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

74. The system of any of claims 68-73, wherein the body sensors measure temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

75. The system of any of claims 68-74, wherein at least one body sensor comprises an EEG or MEG sensor.

76. The system of claim 75, wherein the EEG sensor is contained in an EEG array.

77. The system of claim 75 or 76, wherein at least one body sensor comprises a plurality of EEG sensors

78. The system of any of claims 75-77, wherein the EEG sensor is a dry EEG sensor.

79. The system of any of claims 68-78, wherein at least one body sensor is an accelerometer.

80. The system of any of claims 68-79, wherein at least one body sensor is a pulse oximeter.

81. The system of any of claims 68-80, wherein at least one body sensor measures the user's blood glucose.

82. The system of claim 81, wherein the user's blood glucose is measured continuously.

83. The system of any of claims 67-82, wherein the incoming signal data comprises one or more spirometry measures.

84. The system of claim 83, wherein the spirometry measures are selected from FVC, FEV1, FEV1 , PEF, FET, MVV, and MEF75*.

85. The system of any of claims 67-84, wherein the incoming signal data comprises location data.

86. The system of any of claims 67-85, wherein at least a portion of the incoming signal data is collected wirelessly.

87. The system of claim 86, wherein at least a portion of the incoming signal data is collected via Bluetooth.

88. The system of any of claims 86-87, wherein at least a portion of the incoming signal data is collected via Wi-Fi.

89. The system of any of claims 86-88, wherein at least a portion of the incoming signal data is collected via a cellular network.

90. The system of any of claims 86-89, wherein at least a portion of the incoming signal data is collected from the user' s smartphone.

91. The system of any of claims 67-90, wherein processing and analyzing the incoming signal comprises applying at least one of feature extraction or signal processing.

92. A computer system, the system comprising:

a processor; and

collect signal data from one or more body sensors;

store data associated with a user;

generate at least one clinical phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user.

93. The system of claim 92, wherein signal data is collected from two or more body sensors.

94. The system of claim 92 or 93, wherein signal data is collected from four or more body

sensors.

95. The system of any of claims 92-94, wherein signal data is collected from ten or more body sensors.

96. The system of any of claims 92-95, wherein signal data is collected from 50 or more body sensors.

97. The system of any of claims 92-96, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

98. The system of any of claims 92-97, wherein the one or more body sensors measure at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

99. The system of any of claims 92-98, wherein the one or more body sensors measure

100. The system of any of claims 92-99, wherein at least one body sensor comprises an EEG or MEG sensor.

101. The system of claim 100, wherein the EEG sensor is contained in an EEG array.

102. The system of claim 100 or 101, wherein at least one body sensor comprises a plurality of EEG sensors

103. The system of any of claims 101-102, wherein the EEG sensor is a dry EEG sensor.

104. The system of any of claims 92-103, wherein at least one body sensor is an

accelerometer.

105. The system of any of claims 92-104, wherein at least one body sensor is a pulse

oximeter.

106. The system of any of claims 92-105, wherein at least one body sensor measures the user's blood glucose.

107. The system of claim 106, wherein the user's blood glucose is measured continuously.

108. The system of any of claims 92-107, wherein the signal data comprises one or more spirometry measures.

109. The system of claim 108, wherein the spirometry measures are selected from FVC, FEV1, FEV1 , PEF, FET, MVV, and MEF75*.

110. The system of any of claims 92-109, wherein the signal data comprises location data.

111. The system of any of claims 92-110, wherein at least a portion of the signal data is collected wirelessly.

112. The system of claim 111, wherein at least a portion of the signal data is collected via Bluetooth.

113. The system of claim 111 or 112, wherein at least a portion of the signal data is collected via Wi-Fi.

114. The system of any of claims 111-113, wherein at least a portion of the signal data is collected via a cellular network.

115. The system of any of claims 111-114, wherein at least a portion of the signal data is collected from the user' s smartphone.

116. A computer-readable medium having computer-executable instructions stored thereon to generate a computable phenotype for a subject, wherein the instructions, when executed by one or more processors of a computer, causes the one or more processors to:

collect signal data from one or more body sensors;

store data associated with a user; associate respective ones of the signal data with respective ones of the data associated with the user; and

generate at least one computable phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user.

117. A system for managing medical data, the system comprising:

means for collecting signal data from one or more body sensors;

means for storing the signal data;

means for storing data associated with a user;

means for storing at least a portion of the signal data or associations between the signal data and the data associated with the user at a data store.

118. A system for managing medical data, the system comprising:

means for collecting signal data from one or more body sensors;

means for storing the signal data;

means for storing data associated with a user;

means for storing at least a portion of the signal data or associations between patterns within the signal data with patterns within the data associated with a user at a data store.

119. A method of managing data via a medical informatics compute cluster (MICC), the method comprising:

collecting and storing signal data from one or more sensors;

storing data associated with a user;

storing at least a portion of the signal data or associations between the signal data and the data associated with the user at a data store.

120. The method of claim 119 further comprising securing at least a portion of the data

associated with the user, the signal data or the data stored by the data store.

121. The method of claim 119 further comprising rendering a visual representation of at least a portion of data stored by the data store for display.

122. The method of claim 121 further comprising obtaining the visual representation from a remote entity such that no data stored by the data store are processed at an entity that performs the rendering.

123. The method of claim 119 further comprising generating at least one phenotype based on data stored by the MICC.

124. The method of claim 119 further comprising generating decision support information based at least in part on the at least one phenotype and the data stored by the data store.

125. The method of any preceding claim, wherein the signal data is collected and stored from two or more body sensors.

126. The method of any preceding claim, wherein the signal data is collected and stored from four or more body sensors.

127. The method of any preceding claim, wherein the signal data is collected and stored from ten or more body sensors.

128. The method of any preceding claim, wherein the one or more body sensors measure at least one of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

129. The method of any preceding claim, wherein the one or more body sensors measure at least two of temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

130. The method of any preceding claim, wherein the one or more body sensors measure temperature, blood pressure, pulse, respiratory rate, oxygen saturation, and end tidal C0₂.

131. The method of any preceding claim, wherein at least one body sensor comprises an EEG or MEG sensor.

132. The method of claim 131, wherein the EEG sensor is contained in an EEG array.

133. The method of claim 131 or 132, wherein at least one body sensor comprises a plurality of EEG sensors

134. The method of any of claims 131-133, wherein the EEG sensor is a dry EEG sensor.

135. The method of any preceding claim, wherein at least one body sensor is an

accelerometer.

136. The method of any preceding claim, wherein at least one body sensor is a pulse oximeter.

137. The method of any preceding claim, wherein at least one body sensor measures the

user's blood glucose.

138. The method of claim 137, wherein the user's blood glucose is measured continuously.

139. The method of any preceding claim, wherein the signal data comprises one or more

spirometry measures.

140. The method of claim 139, wherein the spirometry measures are selected from FVC, FEV1, FEV1 , PEF, FET, MVV, and MEF75*.

141. The method of any preceding claim, wherein the signal data comprises location data.

142. The method of any preceding claim, wherein at least a portion of the signal data is

collected wirelessly.

143. The method claim of 142, wherein at least a portion of the signal data is collected via Bluetooth.

144. The method of any preceding claim, wherein at least a portion of the signal data is collected from the user' s smartphone.

145. The method of any preceding claim, wherein the user is mammalian.

146. The method of any preceding claim, wherein the user is human.

147. The method of any preceding claim, wherein the user is a juvenile.

148. The method of any preceding claim, wherein the user is a senior citizen.

149. The method of any preceding claim, wherein the user has been diagnosed as having a condition.

150. The method of any preceding claim, wherein the user has diabetes.

151. The method of any preceding claim, wherein the user has hypertension.

152. The method of any preceding claim, wherein the user has an arrhythmia.

153. The method of claim 123, wherein the generation of the phenotype comprises application of a supervised machine learning algorithm.

154. The method of claim 123 or 153, wherein the generation of the phenotype comprises application of an unsupervised machine learning algorithm.

155. The method of any of claims 123 or 153-154, wherein the generation of the phenotype comprises application of a statistical classifier.

156. The method of any preceding claim, wherein signal data is collected and stored from 10 users or more.

157. The method of any preceding claim, wherein signal data is collected and stored from 100 users or more.

158. The method of any preceding claim, wherein signal data is collected and stored from 1000 users or more.

159. The method of any preceding claim, wherein signal data is collected and stored from 10,000 users or more.

160. The method of any preceding claim, further comprising rendering a notification to an end-user.

161. The method of claim 160, wherein the notification comprises e-mail, SMS, social

message, or phone call.

162. A method of generating a computable phenotype via a medical informatics compute cluster (MICC), the method comprising:

collecting, with a processor, signal data from one or more body sensors; storing, with the processor, data associated with a user;

generating, with a processor, at least one computable phenotype based upon at least a portion of the signal data or associations between the signal data and the data associated with the user.

163. A method of producing a computable phenotype, the method comprising:

obtaining signal data from one or more body sensors;

obtaining non-signal data associated with user;

applying at least one machine learning algorithm to the data set, thereby producing a computable phenotype.

164. A method of detecting the occurrence of an event in a user, the method comprising: collecting, with a processor, signal data from one or more body sensors; storing, with the processor, data associated with a user;

associating, with the processor, patterns within the signal data with patterns within the data associated with the user; and

comparing, with the processor, the associated patterns of the user a computable phenotype to detect the presence or absence of an event in the user.

165. An invention according to the specification.