Andrea Calì, Giuseppe De Giacomo, Diego Calvanese, Domenico Lembo, Maurizio Lenzerini
Description

The Internet-Based Information System (IBIS) is a tool for the semantic integration of heterogeneous data sources. It was developed in parallel within the D2I project and within a collaboration between the "Dipartimento di Informatica e Sistemistica" (DIS) of the University of Rome "La Sapienza" and CM Sistemi. IBIS adopts innovative solutions to deal with all aspects of a complex data integration environment, including source wrapping, limitations on source access, and query answering under integrity constraints.

With regard to the last two aspects, the attention of CM Sistemi was originally devoted to query answering in the presence of limitations in accessing the sources. Within the D2I project, DIS mainly studied query answering in the presence of integrity constraints on the global schema, as described in the deliverables D1.R5, "Survey on methods for query answering and query rewriting using views", and D1.R11, "Methodology and Tools to Reconcile Data". The relevance of the first problem in data integration applications led us to investigate it in the context of the D2I project as well, and to study techniques and algorithms for processing queries correctly in such a setting. These algorithms are implemented in the IBIS system.

IBIS uses a relational global schema to query the data at the sources, and is able to cope with a variety of heterogeneous data sources, including data sources on the Web, relational databases, and legacy sources. Each non-relational source is wrapped to provide a relational view of it. Each source is also considered sound, that is, the data it returns are assumed to be correct but possibly incomplete. The system allows for the specification of integrity constraints on the global schema; in addition, it takes into account some forms of constraints on the source schema, in order to perform runtime optimization during data extraction.
In particular, key and foreign key constraints can be specified on the global schema, while functional dependencies and full-width inclusion dependencies, i.e., inclusions between entire relations, can be specified on the source schema.

The system has been designed to allow for the specification of either GAV or LAV mappings, and to process queries correctly in both approaches. However, the current implementation of IBIS supports only the definition of GAV mappings, and implements only the query processing techniques for this approach. Furthermore, the framework adopted in IBIS makes it possible to deal with both incomplete and inconsistent data sources, although the techniques developed in D2I to cope with inconsistent data have not yet been implemented in the system.

In more detail, query processing in IBIS is divided into three phases: query expansion, query unfolding, and query execution.
Query unfolding and execution are the standard steps of query processing in GAV data integration systems, while for the expansion phase IBIS makes use of the algorithms presented in the deliverable D1.R11. The expanded query has to be evaluated over the retrieved global database in order to produce the certain answers to the original query. As the construction of the retrieved global database is computationally costly, the IBIS Expander module does not construct it explicitly. Instead, it unfolds the expanded query and evaluates the unfolded query over the retrieved source database. The data of the retrieved source database are extracted by the Extractor module, which retrieves from the sources all the tuples that may be used to answer the original query. It is worth noting that, for LAV mappings, phases 2 and 3 of the query answering process in IBIS can easily be replaced by a query rewriting procedure, such as the one described in the deliverable D1.R11.

Development and runtime environment

Runnable under Windows 2000 Advanced Server.
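
As an illustration of the unfolding and execution phases of GAV query processing described above, the following Python fragment gives a minimal sketch. It is not the IBIS implementation: the data representation, the function names (`unfold`, `evaluate`, `is_var`, `match`), the relations `employee`, `s1`, `s2`, and the convention that variables start with an uppercase letter are all assumptions made for the example. It also assumes that each global relation is defined by a single conjunctive query over the sources, and it leaves out the expansion phase and the handling of access limitations.

```python
from itertools import count

# An atom is (relation_name, tuple_of_terms).
# Assumed convention: a term is a variable if it starts with an uppercase letter.
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def unfold(query_body, gav_mapping):
    """Replace each global atom by the body of its (hypothetical) GAV definition."""
    fresh = count()
    unfolded = []
    for relation, args in query_body:
        head_vars, body = gav_mapping[relation]
        subst = dict(zip(head_vars, args))  # view head variables -> atom arguments
        suffix = next(fresh)
        for src_relation, src_args in body:
            # Variables that occur only in the view body get a fresh name per use.
            renamed = tuple(
                subst.get(t, f"{t}_{suffix}") if is_var(t) else t for t in src_args
            )
            unfolded.append((src_relation, renamed))
    return unfolded

def match(term, value, binding):
    """Unify one term with one value, extending the binding for variables."""
    if is_var(term):
        return binding.setdefault(term, value) == value
    return term == value

def evaluate(head_vars, body, database):
    """Evaluate a conjunctive query over a dict: relation name -> set of tuples."""
    bindings = [{}]
    for relation, args in body:
        next_bindings = []
        for binding in bindings:
            for row in database.get(relation, ()):
                candidate = dict(binding)
                if all(match(t, v, candidate) for t, v in zip(args, row)):
                    next_bindings.append(candidate)
        bindings = next_bindings
    return {tuple(b[v] for v in head_vars) for b in bindings}

# Hypothetical GAV mapping: employee(N, D) :- s1(N, C), s2(C, D)
gav_mapping = {
    "employee": (("N", "D"), [("s1", ("N", "C")), ("s2", ("C", "D"))]),
}
# Query over the global schema: q(X) :- employee(X, "sales")
query_body = [("employee", ("X", "sales"))]
# Stand-in for the retrieved source database.
source_db = {
    "s1": {("ann", 1), ("bob", 2)},
    "s2": {(1, "sales"), (2, "hr")},
}

unfolded = unfold(query_body, gav_mapping)
answers = evaluate(("X",), unfolded, source_db)
```

In this sketch the global relation `employee` is never materialized: the query is rewritten over `s1` and `s2` and evaluated directly on the source tuples, which mirrors the idea of avoiding the explicit construction of the retrieved global database.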