Download View Slide Presentation - Association for Pathology Informatics

caTIES 2.0
APIII 2006
Rebecca Crowley
Kevin Mitchell
Presentation Overview

caTIES – Goals

Tissue Banking Collaboration

Grid Trust Fabric

Concept coding and recoding

Data stewardship, data sharing and honest brokering

Interoperability within a grid community
University of Pittsburgh
caTIES – Goals
The Cancer Text Information Extraction System (caTIES) pilot project will focus on
two important challenges of bioinformatics: (1) information extraction from free text
and (2) access to tissue. Specifically, caTIES has four primary goals:
1. Extract coded information from free text Surgical Pathology Reports (SPRs) using controlled
   terminologies to populate caBIG-compliant data structures.
2. Provide researchers with the ability to query, browse, and acquire annotated tissue data and
   physical material across a network of federated sources.
3. Provide a collaboration space in which researchers may construct and manage retrospective
   tissue distribution protocols.
4. Pioneer research for distributed text information extraction within the context of caBIG.
caTIES modules will be developed as generalized components available in caBIG, to
encourage reuse by other caBIG projects that require tissue information extraction.
Tissue Banking Collaboration

Administrator initiation of a Research Protocol – The IT system’s
Administrator is responsible for providing support for the electronic capture
of research information. The Administrator works with Researchers, Health
Care Professionals, IRBs and others to establish repositories of electronic
data, often categorized by study.

Researcher case discovery and order generation – In conducting
retrospective research studies based on tissue samples, Researchers examine
free-text descriptions of those tissues or delegate the responsibility of
gathering a tissue collection to Honest Brokers.

Honest Broker order facilitation – Honest Brokers work with Tissue Bank
personnel to acquire tissue and tissue-related materials, and with the courier
system to deliver orders to Researchers. These orders often need to maintain a
degree of atomicity.
Administrator – Create New Study
Administrator – Assign Organization Role
Administrator – Add User to Study as Role
Researcher Perspective
Researcher - Graphical Search Specification
Honest Broker – Verifies Physical Material
Honest Broker – Relays Order Status back to Researcher
Grid Trust Fabric

Electronic Components (4 Pillars of security)
Identity (DN or public key): Isolation, Traceability
Authentication (TLS handshake): Prevent Identity Theft
Authorization (gridmapfile or Globus+OGSA-AuthZ+Services): Access Control, Resource Control
Audit (logfiles): Troubleshooting, Forensics, Accounting
Grid Trust Fabric (cont)

Social Fabric

Narrative De-Identification defined by levels or kind of De-Identification.
Narrative redactors
Concept Coders
Information Extraction to Synoptic Structures
IRB must endorse federated environment
Individuals must maintain a level of integrity
Current caTIES Security
Summary of caTIES’ current security solution
1. User Registration with IMS – GUMS
2. User Registration with caTIES System – CTRM
3. Authentication and Authorization – GUMS + CTRM
4. User Access to caTIES Resources – caTIES Client
User Authentication - GUMS
User Authentication Scenario:

Users log into the caTIES client with their GUMS username and password.

The caTIES client securely connects to GUMS with the user’s GUMS X.509
certificate and retrieves the GUMS user proxy.

The caTIES client uses the user proxy to securely connect to the EVS service
exposed by caTIES. This is essentially a connectivity check, and any caTIES
secured service could be used.
User Authentication
User Authorization - CTRM

CTRM contains user authorization information: it records how users are related
to organizations and classifies each user-organization relationship by one of
the following roles: Researcher, Institution Honest Broker, or Local
Administrator.

The CTRM service is responsible for issuing queries to the CTRM. When a user
is authenticated, the caTIES Client sends the user proxy’s distinguished name
to the CTRM service as a query parameter.

The CTRM service in turn fetches the user’s role from the CTRM and sends the
user’s role information to the client.
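
The lookup described above can be sketched roughly as follows. This is an illustration only, not the caTIES implementation: the class name CtrmRoleLookup, the in-memory map, and the method names are hypothetical. Only the three roles and the idea of querying by the proxy's distinguished name come from the slides.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the CTRM service: maps a user's distinguished
// name (DN) and an organization to the roles the user holds there.
class CtrmRoleLookup {
    enum Role { RESEARCHER, INSTITUTION_HONEST_BROKER, LOCAL_ADMINISTRATOR }

    // Outer key: DN; inner key: organization name. A real CTRM is a database.
    private final Map<String, Map<String, Set<Role>>> rolesByDn = new HashMap<>();

    // Records that the user identified by dn holds role at organization.
    void assign(String dn, String organization, Role role) {
        rolesByDn.computeIfAbsent(dn, k -> new HashMap<>())
                 .computeIfAbsent(organization, k -> EnumSet.noneOf(Role.class))
                 .add(role);
    }

    // Returns a copy of the roles for the DN at the organization;
    // the result is empty when the DN or organization is unknown.
    Set<Role> rolesFor(String dn, String organization) {
        Map<String, Set<Role>> byOrg = rolesByDn.get(dn);
        if (byOrg == null) {
            return EnumSet.noneOf(Role.class);
        }
        Set<Role> roles = byOrg.get(organization);
        return roles == null ? EnumSet.noneOf(Role.class) : EnumSet.copyOf(roles);
    }
}
```

In this sketch the client would call rolesFor with the DN taken from the user proxy and enable only the screens that match the returned roles.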
De-Identification


The caTIES De-Identification service scrubs pathology reports, creates
de-identified identifiers, and loads the ‘De-Identified’ caTIES datastore.
The caTIES De-Identification service wraps the de-ID™ software, so the
underlying de-identifier is easy to switch.
Safe-Harbor method removes HIPAA-mandated identifiers
Creates tokens for names and preserves temporal relationships
De-ID will work with adopters as each site comes on-line
Currently evaluating Harvard Scrubber open-source option
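
As a rough illustration of "creates tokens for names", the sketch below replaces each known name with a stable token, so repeated mentions of the same person stay linked after scrubbing. It is not how de-ID or the Harvard Scrubber works internally: the class NameTokenizer, the token format [NAME-n], and the caller-supplied name list are all assumptions made for the example. Date handling is not shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical name scrubber: the same name always maps to the same token.
class NameTokenizer {
    // Lower-cased name -> token, in order of first appearance.
    private final Map<String, String> tokens = new LinkedHashMap<>();

    // Returns the stable token for a name, creating it on first use.
    String tokenFor(String name) {
        String key = name.toLowerCase();
        String token = tokens.get(key);
        if (token == null) {
            token = "[NAME-" + (tokens.size() + 1) + "]";
            tokens.put(key, token);
        }
        return token;
    }

    // Replaces whole-word, case-insensitive occurrences of each known name.
    String scrub(String text, List<String> knownNames) {
        String result = text;
        for (String name : knownNames) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(result);
            if (m.find()) {
                result = m.replaceAll(Matcher.quoteReplacement(tokenFor(name)));
            }
        }
        return result;
    }
}
```

With the single known name "Smith", the text "Dr. Smith reviewed the slides. Smith confirmed the margin." becomes "Dr. [NAME-1] reviewed the slides. [NAME-1] confirmed the margin."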
Concept Coding and Recoding

Changing dimensions necessitate recoding:
Vocabulary revisions
Algorithmic enhancements and bug fixes
De-Identification redactor errors
What is the necessary level of auditing for recoding?
Tokenization
Sectioning
Concept Mapping with MMTx
Negation and Semantic Type Categorization
RegEx Finding Attribute Value
Concept Coded Structured Data
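
A few of the stages listed above can be sketched in miniature. This is a toy under stated assumptions, not the caTIES pipeline: the class ReportPipelineSketch, the header pattern, the negation trigger words, and the "size in cm" attribute are all hypothetical, and concept mapping with MMTx is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportPipelineSketch {
    // Assumed section header form: an upper-case phrase at line start plus ':'.
    private static final Pattern HEADER = Pattern.compile("(?m)^([A-Z][A-Z ]+):");
    // Assumed negation triggers; a real system would use a richer list.
    private static final Pattern NEGATION =
            Pattern.compile("\\b(no|without|negative for|free of)\\b");
    // Attribute value: a number followed by "cm".
    private static final Pattern SIZE_CM =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*cm\\b", Pattern.CASE_INSENSITIVE);

    // Tokenization: split on whitespace, strip leading/trailing punctuation.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            String cleaned = t.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!cleaned.isEmpty()) tokens.add(cleaned);
        }
        return tokens;
    }

    // Sectioning: map each header to the text up to the next header.
    static Map<String, String> sections(String report) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(report);
        String current = null;
        int bodyStart = 0;
        while (m.find()) {
            if (current != null) out.put(current, report.substring(bodyStart, m.start()).trim());
            current = m.group(1).trim();
            bodyStart = m.end();
        }
        if (current != null) out.put(current, report.substring(bodyStart).trim());
        return out;
    }

    // Negation: true if a trigger word precedes the term in the sentence.
    static boolean isNegated(String sentence, String term) {
        String s = sentence.toLowerCase();
        int at = s.indexOf(term.toLowerCase());
        return at >= 0 && NEGATION.matcher(s.substring(0, at)).find();
    }

    // RegEx attribute value: first size in centimetres, if any.
    static Optional<Double> sizeCm(String text) {
        Matcher m = SIZE_CM.matcher(text);
        return m.find() ? Optional.of(Double.parseDouble(m.group(1))) : Optional.empty();
    }
}
```

The outputs of such stages (section, concept, negation flag, attribute values) are what would be assembled into the concept coded structured data.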
Data stewardship, data sharing and honest brokering
caTIES maintains data in three databases that are schematically equivalent but
differ in their deployment location, security configuration, and the data being held.
Each Role has limited access to the set of data sources:
public datastore – (Researcher)
private datastore – (Honest Broker)
central tissue resource manager datastore – (Administrator, Researcher, Honest Broker)
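
The role-to-datastore mapping above can be written down as a small lookup table. The sketch below is only one way to encode it. The class DatastoreAccess and its enum names are hypothetical, and the deployed system enforces access through its security configuration rather than a table like this.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of which role may reach which caTIES datastore.
class DatastoreAccess {
    enum Role { ADMINISTRATOR, RESEARCHER, HONEST_BROKER }
    enum Datastore { PUBLIC, PRIVATE, CENTRAL_TISSUE_RESOURCE_MANAGER }

    private static final Map<Datastore, Set<Role>> ALLOWED = new EnumMap<>(Datastore.class);
    static {
        // Mirrors the list on the slide.
        ALLOWED.put(Datastore.PUBLIC, EnumSet.of(Role.RESEARCHER));
        ALLOWED.put(Datastore.PRIVATE, EnumSet.of(Role.HONEST_BROKER));
        ALLOWED.put(Datastore.CENTRAL_TISSUE_RESOURCE_MANAGER, EnumSet.allOf(Role.class));
    }

    // True when the role is listed for the datastore.
    static boolean canAccess(Role role, Datastore datastore) {
        return ALLOWED.get(datastore).contains(role);
    }
}
```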
caTIES Model
Three points for Data Access:
Interoperability within a grid community

MDA - caBIG uses Model Driven Architecture to automatically generate Object
Relational Mapping (ORM) middleware.
Following caBIG’s semi-automated guidelines for application development
guarantees grid-compliant data services.
caBIG annotates data and service interfaces with a conceptual ontology. This
provides an environment for intelligent discovery and automatic data
transformation.
caTIES Development Process
1. Design UML Model in Enterprise Architect
2. Metadata annotation using NCIT (public model only)
3. CDEs are registered in the caDSR in the ‘caBIG’ context
4. Run Model through caCORE SDK to generate API and caTIES Silver Application
5. Implement API generated by the SDK for caTIES’ Client’s functions
6. Utilize caGrid SDK to generate Gold front-end to the caTIES Silver Application
caTIES Phase 2 Grid-Enabled [Public] Model (class diagram: CaTIES Reference Model)

domain::Patient
    id: java.lang.Long
    uuid: java.lang.String
    race: java.lang.String
    ethnicity: java.lang.String
    birthDate: java.util.Date
    gender: java.lang.String
    conceptCodeSet: java.lang.String

domain::Application
    id: java.lang.Long
    uuid: java.lang.String
    version: java.lang.String
    name: java.lang.String

domain::PathologyReport
    id: java.lang.Long
    uuid: java.lang.String
    originalId: java.lang.String
    collectionDateTime: java.util.Date
    patientAgeAtCollection: java.lang.Integer
    patientAgeAtCollectionMvr: java.lang.String
    documentText: java.lang.String
    documentXml: java.lang.String
    documentBinary: java.lang.String
    conceptCodeSet: java.lang.String
    classifiedConceptCodeSet: java.lang.String
    isFlaggedForReview: java.lang.Boolean
    isTissueAvailable: java.lang.Boolean
    isQuarantined: java.lang.Boolean
    reviewComment: java.lang.String
    honestBrokerComment: java.lang.String

domain::ConceptClassification
    id: java.lang.Long
    uuid: java.lang.String
    name: java.lang.String

domain::Concept
    id: java.lang.Long
    uuid: java.lang.String
    cui: java.lang.String
    tui: java.lang.String
    name: java.lang.String
    semanticType: java.lang.String

domain::Execution
    id: java.lang.Long
    uuid: java.lang.String
    startTime: java.util.Date
    endTime: java.util.Date

domain::ConceptReferent
    id: java.lang.Long
    uuid: java.lang.String
    documentFragment: java.lang.String
    startOffset: java.lang.Long
    endOffset: java.lang.Long
    isModifier: java.lang.Boolean
    isNegated: java.lang.Boolean

Association ends shown in the diagram: patient, application, pathologyReport,
pathologyReportCollection, concept, conceptClassification,
conceptReferentCollection, executionCollection.
Development Process
Summary
Access to caTIES Public Resources
Dual Access to caTIES
1. Via caTIES Client
2. Via caGrid Gold API.
The caTIES Gold Service provides programmatic access to caTIES’ resources.
The caGrid Browser implements this API to query resources.
Sample Query
Silver Format
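import java.util.List;
import org.hibernate.criterion.DetachedCriteria;
import org.hibernate.criterion.Restrictions;

// Select the PathologyReport with the given uuid.
DetachedCriteria p = DetachedCriteria.forClass(PathologyReport.class);
p.add(Restrictions.like("uuid", "e44ddc0f-c589-11da-bbee-5103a71c2a47"));
// appService is the application service handle, obtained elsewhere.
List resultList = appService.query(p, PathologyReport.class.getName());
for (int i = 0; i < resultList.size(); i++) {
    PathologyReport pr = (PathologyReport) resultList.get(i);
    pr.getDocumentText();
}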
Gold Format
<caBIGXMLQuery name="MyQueryTest3">
<Target name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
<Objects name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
<Property name="uuid" predicate="equal" value="e44ddc0f-c589-11da-bbee-5103a71c2a47"/>
</Objects>
</Target>
</caBIGXMLQuery>
Query run by caTIES Client
Query run through caGrid Browser
Query run through caGrid Browser
Query run through caGrid Browser
Equivalent Results

Both methods return the same Pathology Report
caGrid Browser
caTIES Client
caDSR CDEs CAP Protocols
Shallow Structure Derivation based on conceptual matching.
Deep Structure Inference Based on Discourse Reasoning