Professor Maria del Pilar Angeles is a postdoctoral scientist at the Engineering Research Centre of the National University of Mexico (UNAM). She has a PhD in Data Quality from Heriot-Watt University and an M.Sc. in Computer Science, focused on quality in software engineering, from UNAM. Her research interests are information quality for heterogeneous databases and software engineering quality. She has 15 years of professional experience in industry as a Technical Support Engineer for databases.
Data Quality on Heterogeneous Systems
Users querying a database system receive a set of data with no indication of its quality, so they have to presume that the data is perfect, original and atomic.
Existing database systems are based on these Presumptions of Perfection, Primary Authorship, and Atomicity. However, a considerable body of existing research shows that these presumptions are invalid. This research therefore seeks to challenge them.
Our research hypothesis was that it is possible to identify usable quality criteria to measure and assess the data quality of data sources at multiple levels of granularity, and of derived data. These measures can be enhanced by the use of provenance, and the qualitative measures can be used to rank data sources according to a context that users specify in terms of these known criteria, all within a heterogeneous multi-database environment. We propose a Data Quality Manager (DQM) composed of a generic Data Quality Reference Model, a Measurement Model, and an Assessment Model.
The Reference Model provides a new general structured classification of
existing data quality properties considering different user perspectives.
The Measurement Model extends existing metrics for the estimation of data quality at database, relation, tuple
and attribute levels of granularity, which is novel.
The assessment of derived data through the use of data provenance is novel.
We identify a new assessment-oriented classification based on the levels of granularity assessed.
The facility to permit users to define query context in terms of quality criteria, quality priorities,
and levels of granularity is also novel.
We implemented the DQM as a proof of concept of our hypothesis and demonstrated that the prototype performs
appropriately according to specific requirements and can provide qualitative information,
which varies according to the context.
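As an illustration only, the idea of measuring a quality criterion at several levels of granularity and then ranking data sources by user-defined priorities can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the DQM implementation: completeness is used as an example criterion, and the function names, the sample sources and the timeliness scores are all hypothetical.

```python
from typing import Dict, List, Optional

# A tuple (row) maps attribute names to values; None marks a missing value.
Row = Dict[str, Optional[str]]


def attribute_completeness(rows: List[Row], attribute: str) -> float:
    """Attribute level: fraction of rows with a non-missing value."""
    if not rows:
        return 0.0
    return sum(row.get(attribute) is not None for row in rows) / len(rows)


def tuple_completeness(row: Row) -> float:
    """Tuple level: fraction of attributes in one row that are not missing."""
    if not row:
        return 0.0
    return sum(value is not None for value in row.values()) / len(row)


def relation_completeness(rows: List[Row]) -> float:
    """Relation level: mean tuple completeness over all rows."""
    if not rows:
        return 0.0
    return sum(tuple_completeness(row) for row in rows) / len(rows)


def rank_sources(scores: Dict[str, Dict[str, float]],
                 priorities: Dict[str, float]) -> List[str]:
    """Order sources by the priority-weighted average of their scores."""
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("priorities must have a positive total weight")

    def weighted(source: str) -> float:
        return sum(weight * scores[source].get(criterion, 0.0)
                   for criterion, weight in priorities.items()) / total

    return sorted(scores, key=weighted, reverse=True)


# Hypothetical sample sources (assumption: not data from the prototype).
source_a: List[Row] = [{"name": "Ann", "city": "Leeds"},
                       {"name": "Bob", "city": None}]
source_b: List[Row] = [{"name": None, "city": None},
                       {"name": "Cy", "city": "York"}]

# Timeliness scores are made-up values, used only to show a second criterion.
scores = {
    "A": {"completeness": relation_completeness(source_a), "timeliness": 0.4},
    "B": {"completeness": relation_completeness(source_b), "timeliness": 0.9},
}

# The same sources rank differently under different user contexts.
completeness_first = rank_sources(scores, {"completeness": 3, "timeliness": 1})
timeliness_first = rank_sources(scores, {"completeness": 1, "timeliness": 3})
```

With these sample values, a user who weights completeness most heavily gets source A first, while a user who weights timeliness most heavily gets source B first, which is the sense in which the ranking depends on the context.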
Some topics for a good start...
Heterogeneous Systems
Federated database systems for managing distributed, heterogeneous, and autonomous databases,
by Amit P. Sheth and James A. Larson, 1990
Schema Integration
A Comparative Analysis of Methodologies for Database Schema Integration,
by C. Batini et al., 1986
Data Inconsistencies using Data Quality
FusionPlex: Resolution of Data Inconsistencies in the Integration of Heterogeneous Information Sources, by Philipp Anokhin and Amihai Motro
Some interesting links...
Data Provenance
Data Provenance Articles, by Peter Buneman et al.
Data Quality
The MIT Total Data Quality Program
Quality on query processing
High Quality Information Querying homepage
Data Quality Criteria Quiz
Quiz for Data Quality Properties
Data Quality Books
This page was last updated in September 2009 by © Maria del Pilar Angeles