Professor Maria del Pilar Angeles is a postdoctoral scientist at the Engineering Research Centre of the National University of Mexico (UNAM). She holds a PhD in Data Quality from Heriot-Watt University and an M.Sc. in Computer Science, focused on quality in software engineering, from UNAM. Her research interests are information quality for heterogeneous databases and quality in software engineering. She has 15 years of professional experience in industry as a Technical Support Engineer for databases.

Data Quality on Heterogeneous Systems

Users who query a database system receive a set of data with no indication of its quality, so they have to presume that the data is perfect, original and atomic.

Existing database systems are based on these presumptions of Perfection, Primary Authorship, and Atomicity. However, a considerable body of existing research shows that these presumptions are invalid. This research therefore seeks to challenge them.

Our research hypothesis is that usable quality criteria can be identified to measure and assess the quality of data sources at multiple levels of granularity, and of derived data. These criteria can be enhanced by the use of provenance, and the resulting qualitative measures can be used to rank data sources according to a context that users specify in terms of these known criteria, all within a heterogeneous multi-database environment. We propose a Data Quality Manager (DQM) composed of a generic Data Quality Reference Model, a Measurement Model, and an Assessment Model.
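The idea of ranking data sources by a user-specified context can be illustrated with a minimal sketch. It is not the DQM's actual algorithm: the function name `rank_sources`, the criteria names, the scores in [0, 1], and the weighted-average aggregation are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def rank_sources(
    scores: Dict[str, Dict[str, float]],
    priorities: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank data sources by a weighted average of their quality scores.

    scores:     hypothetical per-source scores in [0, 1], keyed by criterion.
    priorities: hypothetical user weights per criterion (the "context").
    A criterion missing from a source is scored as 0.0 (an assumption).
    """
    total_weight = sum(priorities.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive priority")

    ranked = []
    for source, criteria in scores.items():
        overall = sum(
            weight * criteria.get(criterion, 0.0)
            for criterion, weight in priorities.items()
        ) / total_weight
        ranked.append((source, overall))

    # Highest overall quality first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Two made-up sources scored on two made-up criteria.
example_scores = {
    "source_a": {"completeness": 0.9, "timeliness": 0.4},
    "source_b": {"completeness": 0.6, "timeliness": 0.9},
}

# The same scores produce a different ranking under a different context.
by_completeness = rank_sources(example_scores, {"completeness": 3, "timeliness": 1})
by_timeliness = rank_sources(example_scores, {"completeness": 1, "timeliness": 3})
```

In this sketch `source_a` ranks first when completeness is the priority and `source_b` ranks first when timeliness is, which is the sense in which the ranking depends on the context the user supplies.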

The Reference Model provides a new, general, structured classification of existing data quality properties that considers different user perspectives. The Measurement Model extends existing metrics to estimate data quality at the database, relation, tuple and attribute levels of granularity, which is novel. The assessment of derived data through the use of data provenance is also novel. We identify a new assessment-oriented classification based on the levels of granularity assessed. The facility that lets users define the query context in terms of quality criteria, quality priorities, and levels of granularity is novel as well.
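To make the four levels of granularity concrete, the following sketch measures a single criterion, completeness (the share of non-null values), at attribute, tuple, relation and database level. The choice of completeness, the function names, and the unweighted-mean aggregation are assumptions for illustration only and are not the metrics defined by the Measurement Model.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]   # one tuple: attribute name -> value
Relation = List[Row]                # one relation: a list of tuples


def attribute_completeness(relation: Relation, attribute: str) -> float:
    """Fraction of tuples in the relation with a non-null value for the attribute."""
    if not relation:
        return 0.0
    non_null = sum(1 for row in relation if row.get(attribute) is not None)
    return non_null / len(relation)


def tuple_completeness(row: Row) -> float:
    """Fraction of attributes in one tuple that are non-null."""
    if not row:
        return 0.0
    return sum(1 for value in row.values() if value is not None) / len(row)


def relation_completeness(relation: Relation) -> float:
    """Unweighted mean of tuple completeness (an assumed aggregation)."""
    if not relation:
        return 0.0
    return sum(tuple_completeness(row) for row in relation) / len(relation)


def database_completeness(database: Dict[str, Relation]) -> float:
    """Unweighted mean of relation completeness (an assumed aggregation)."""
    if not database:
        return 0.0
    return sum(relation_completeness(r) for r in database.values()) / len(database)


# A tiny made-up database with one missing e-mail address.
customers: Relation = [
    {"id": 1, "email": "a@example.org"},
    {"id": 2, "email": None},
]
orders: Relation = [{"id": 10, "total": 5.0}]
database = {"customers": customers, "orders": orders}
```

Here the same missing value is visible at every level: it halves the completeness of the `email` attribute and of the second tuple, and lowers the scores of the `customers` relation and of the database as a whole by progressively smaller amounts.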

We implemented the DQM as a proof of concept of our hypothesis and demonstrated that the prototype performs appropriately against specific requirements and can provide qualitative information that varies according to the context.


Some topics for a good starting...


Some interesting links...


Data Quality Books

  • Data Quality, by Richard Y. Wang, Mostapha Ziad and Yang W. Lee. Kluwer International Series on Advances in Database Systems, 2001. ISBN 0792372158

  • Data Quality for the Information Age, by Thomas C. Redman. Artech House, 1996. ISBN 0890068836

  • Data Quality: The Field Guide, by Thomas C. Redman. Digital Press, 2001. ISBN 1555582516

  • Data Quality: The Accuracy Dimension, by Jack Olson. Elsevier Science, 2003. ISBN 1558608915





This page was last updated in September 2009 by © Maria del Pilar Angeles