Active Projects

- DDI: The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle: DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving. DDI 4.0 will have an abstract data model in UML (Unified Modelling Language) together with implementations in XML Schema, an RDF/OWL ontology, relational database schemas, and other languages. This next-generation DDI, developed within the DDI Moving Forward project, will be easier to use and better able to adapt quickly to changing future needs. A toy sketch of the "one abstract model, many serializations" idea follows.
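
A minimal Python sketch of that design, assuming an invented Variable object and invented element and property names (this is not actual DDI 4.0 markup):

    # Hypothetical illustration of DDI 4.0's design: one abstract model,
    # multiple concrete serializations. All names are invented.
    from dataclasses import dataclass

    @dataclass
    class Variable:          # stand-in for an abstract model object
        name: str
        label: str
        representation: str  # e.g. "numeric" or "code list"

    def to_xml(v: Variable) -> str:
        # Render the abstract object as an XML fragment.
        return (f'<Variable name="{v.name}" representation="{v.representation}">'
                f'<Label>{v.label}</Label></Variable>')

    def to_turtle(v: Variable) -> str:
        # Render the same object as schematic RDF Turtle.
        return (f'ex:{v.name} a ex:Variable ;\n'
                f'    ex:label "{v.label}" ;\n'
                f'    ex:representation "{v.representation}" .')

    v = Variable("age", "Age of respondent", "numeric")
    print(to_xml(v))
    print(to_turtle(v))

Both renderings carry the same information; only the concrete syntax differs, which is the point of keeping the model abstract.
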
- CSPA: The Common Statistical Production Architecture (CSPA) provides a reference architecture for official statistics. It complements and uses pre-existing frameworks (like GSBPM and GSIM) by describing the mechanisms to design, build, and share components with well-defined functionality that can easily be integrated into multiple processes. CSPA focuses on relating the strategic directions of the HLG to shared principles, practices, and guidelines for defining, developing, and deploying Statistical Services in order to produce statistics more efficiently. CSPA will facilitate the sharing and reuse of Statistical Services both across and within statistical organizations, and it provides a starting point for concerted development of statistical infrastructure and shared investment across statistical organizations. CSPA is sometimes referred to as a "plug and play" architecture: the idea is that replacing Statistical Services should be as easy as pulling out a component and plugging another one in, as the sketch below illustrates. CSPA is a project of the High-Level Group for the Modernisation of Statistical Production and Services (HLG).
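
A toy Python sketch of the plug-and-play idea, with an invented ImputationService interface (CSPA specifies services at the architecture level, not as code):

    from abc import ABC, abstractmethod

    class ImputationService(ABC):
        # A component with well-defined functionality that the
        # surrounding process sees only through this interface.
        @abstractmethod
        def impute(self, values: list) -> list: ...

    class MeanImputation(ImputationService):
        def impute(self, values):
            known = [v for v in values if v is not None]
            mean = sum(known) / len(known)
            return [mean if v is None else v for v in values]

    class ZeroImputation(ImputationService):
        def impute(self, values):
            return [0 if v is None else v for v in values]

    def production_step(service: ImputationService, data: list) -> list:
        # Depends only on the interface, so swapping the component
        # is "pulling one out and plugging another one in".
        return service.impute(data)

    print(production_step(MeanImputation(), [1, None, 3]))  # [1, 2.0, 3]
    print(production_step(ZeroImputation(), [1, None, 3]))  # [1, 0, 3]
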
- GSIM: The Generic Statistical Information Model (GSIM) provides a common language to describe information that supports the whole statistical production process, from the identification of user needs through to the dissemination of statistical products. It is designed to bring together statisticians, methodologists, and IT specialists to modernize and streamline the production of official statistics. GSIM is a reference framework of internationally agreed definitions, attributes, and relationships that describe the pieces of information (called “information objects” in GSIM) used in the production of official statistics. By defining objects common to all statistical production, regardless of subject matter, GSIM enables statistical organizations to rethink how their business could be organized more efficiently; the sketch below gives the flavor. GSIM is a project of the High-Level Group for the Modernisation of Statistical Production and Services (HLG).
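
A schematic Python rendering of the information-object idea; the two classes are loose stand-ins, not the normative GSIM model:

    from dataclasses import dataclass

    @dataclass
    class Variable:              # an "information object"
        name: str
        definition: str

    @dataclass
    class DataSet:               # relationship: "is described by" Variables
        name: str
        variables: list

    lfs = DataSet(
        name="Labour Force Survey",
        variables=[Variable("age", "Age in completed years"),
                   Variable("status", "Employment status")],
    )
    # Because the objects are subject-matter neutral, the same vocabulary
    # describes a health survey, a trade register, or any other domain.
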
Past Projects

- Contextual data quality for business intelligence: Business analytics applications require data quality assessment at high levels of abstraction, where subjectivity, usefulness, sense, and interpretation play a central role. From this perspective, the meaning and quality of the data are context dependent. In our framework, the context is given as a system of integrated data and metadata, of which the data source under quality assessment is a particular, distinguished component. In addition to the data under assessment, the context can have an expanded schema, additional data, or even be virtually defined as a system of integrated views. Clean answers to queries posed to the data under assessment are then relative to what is available in the context, as the sketch below illustrates. More details on our contextual framework can be found here. This project is part of the Business Intelligence Network.
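
A minimal sketch under invented data: a source reports measurements, contextual metadata says which instruments are certified, and a clean answer keeps only the tuples the context sanctions:

    source = [  # the data source under quality assessment
        {"patient": "p1", "temp": 38.2, "instrument": "thermA"},
        {"patient": "p2", "temp": 37.1, "instrument": "thermB"},
    ]
    context = {"certified_instruments": {"thermA"}}  # contextual metadata

    def clean_answer(rows, ctx):
        # Answer "temperatures per patient" relative to the context:
        # only measurements taken with certified instruments qualify.
        return [(r["patient"], r["temp"]) for r in rows
                if r["instrument"] in ctx["certified_instruments"]]

    print(clean_answer(source, context))  # [('p1', 38.2)]

The same query against the bare source would return both tuples; the context is what makes the first one "clean" and the second one suspect.
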
- User-centric, model-driven data integration: Data warehouses were envisioned to facilitate reporting and analysis by providing a model for the flow of data from operational systems to decision support environments. Typically, there is an impedance mismatch between the conceptual, high-level view of business intelligence users (and tools) accessing the data warehouse and the physical representation of the multidimensional data, often stored in DBMSs. To bridge these two levels of abstraction we developed the Conceptual Integration Modeling (CIM) Framework. The CIM tool takes the user's high-level visual specification and compiles it into a complex set of mappings and views to be used at runtime by business analytics applications, as described here and illustrated by the sketch below. The CIM models and system architecture are described in CoRR (arXiv:1009.0255). CIM can also be integrated with a business layer by providing data to business processes and key performance indicators via mappings, as described here. This project is part of the Business Intelligence Network.
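
A toy version of the compilation step, with invented table and column names: a high-level specification mapping conceptual attributes to physical columns is compiled into a SQL view for analytics tools (the real CIM compiles visual models, not dictionaries):

    spec = {
        "view_name": "sales_by_region",
        "fact_table": "fact_sales",
        "mappings": {            # conceptual attribute -> physical column
            "region":  "dim_store.region_name",
            "revenue": "SUM(fact_sales.amount)",
        },
        "joins": ["JOIN dim_store ON fact_sales.store_id = dim_store.id"],
    }

    def compile_view(s: dict) -> str:
        select = ", ".join(f"{col} AS {name}"
                           for name, col in s["mappings"].items())
        group = ", ".join(col for col in s["mappings"].values()
                          if not col.startswith("SUM("))
        return (f"CREATE VIEW {s['view_name']} AS\n"
                f"SELECT {select}\nFROM {s['fact_table']}\n"
                + "\n".join(s["joins"])
                + f"\nGROUP BY {group};")

    print(compile_view(spec))
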
- Papyrus: A multinational European project for building a cross-discipline digital library engine that draws content from one domain and makes it available to a community of users who belong to an entirely different discipline. Ontology management research issues in this context include modeling concept evolution and semantic updates, and supporting dynamic attributes (attributes whose domains and ranges are specified declaratively); a sketch of the latter follows.
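
A small sketch of a dynamic attribute, with invented names: its domain, range, and validity are stored declaratively as data and can evolve without changing the schema:

    dynamic_attributes = {
        "publishedIn": {          # constraints live in data, not code
            "domain": "NewsArticle",
            "range": "Newspaper",
            "valid_from": 1990,
        },
    }

    def check(attribute, subject_type, object_type, year):
        # Validate a statement against the declaratively stored spec.
        spec = dynamic_attributes[attribute]
        return (spec["domain"] == subject_type
                and spec["range"] == object_type
                and year >= spec["valid_from"])

    print(check("publishedIn", "NewsArticle", "Newspaper", 2008))  # True
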
- DescribeX: A Framework for Exploring and Querying XML Web Collections. My PhD thesis introduced a framework that supports constructing heterogeneous XML synopses that can be declaratively defined and manipulated by means of regular expressions on XPath axes; the sketch below shows the simplest such summary. The tool implementing the framework is tailored to data-intensive applications in information integration, XQuery/XPath evaluation, XML retrieval, and Web services. The thesis can be downloaded from CoRR (arXiv:0807.2972) and the University of Toronto Libraries TSpace website.
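
A minimal structural summary in that spirit: partition the elements of a document by their incoming label path. Real DescribeX summaries are defined by regular expressions over arbitrary XPath axes; this parent-path partition is just the simplest instance:

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    doc = ET.fromstring(
        "<lib><book><title/><author/></book><book><title/></book></lib>")

    def label_path_summary(root):
        partition = defaultdict(list)
        def walk(elem, path):
            path = path + "/" + elem.tag
            partition[path].append(elem)   # same path => same block
            for child in elem:
                walk(child, path)
        walk(root, "")
        return partition

    for path, nodes in label_path_summary(doc).items():
        print(path, len(nodes))
    # /lib 1   /lib/book 2   /lib/book/title 2   /lib/book/author 1
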
- Temporal XML: A proposal for modeling and querying temporal data in XML. Our implementation validates temporal XML documents against the temporal constraints imposed by the model and summarizes metadata by adding the time dimension to structural path summaries; the sketch below shows one such constraint check.
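
A sketch of one natural constraint such a model can impose: each element's validity interval must be contained in its parent's. The from/to attributes are invented for illustration:

    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <company from="1990" to="2010">
      <employee from="1995" to="2005"/>
      <employee from="1985" to="2000"/>
    </company>""")

    def valid(elem, lo=float("-inf"), hi=float("inf")):
        f, t = float(elem.get("from")), float(elem.get("to"))
        if not (lo <= f <= t <= hi):   # interval must nest in parent's
            return False
        return all(valid(child, f, t) for child in elem)

    print(valid(doc))  # False: the second employee predates the company
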
- ToX (the Toronto XML Server): A repository of XML data and metadata that provides the key functions in document management, including registering documents, indexing document structure (with ToXin), defining logical views of distributed data sources, and querying document content and structure; a toy sketch of this loop follows. My master's thesis introducing ToXin can be downloaded from here and the University of Toronto Libraries TSpace website.
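
A toy rendering of the repository's core loop, with an invented three-function API (ToX itself was a full server, not these few lines):

    import xml.etree.ElementTree as ET

    repository = {}  # document id -> parsed tree

    def register(doc_id: str, xml_text: str):
        repository[doc_id] = ET.fromstring(xml_text)

    def query(doc_id: str, xpath: str):
        # Query content and structure with the XPath subset
        # that ElementTree supports.
        return repository[doc_id].findall(xpath)

    register("d1", "<paper><title>ToXin</title><year>2001</year></paper>")
    print([e.text for e in query("d1", ".//title")])  # ['ToXin']
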