

Filtering the Web to Feed Data Warehouses

Abramowicz, Witold, Kalczynski, Pawel J., Wecel, Krzysztof

2002, XII, 267 p.

Available Formats:

Springer eBook (PDF, digitally watermarked, no DRM): ISBN 978-1-4471-0137-6


Hardcover: ISBN 978-1-85233-579-3


Softcover: ISBN 978-1-4471-1107-8

About this book

Information is a key factor in business today, and data warehousing has become a major activity in the development and management of information systems that support the proper flow of information. Unfortunately, most information systems are based on structured information stored in organizational databases, so a company that concentrates only on its internal data sources is isolated from its business environment. It is therefore vital that organizations take advantage of external business information, which can be retrieved from Internet services and automatically organized within the existing information structures. Such a continuously growing, integrated collection of documents and data could facilitate decision-making in the organization. Filtering the Web to Feed Data Warehouses discusses areas such as:
- how to use a data warehouse for filtering Web content
- how to retrieve relevant information from diverse sources on the Web
- how to handle the time aspect
- how to automatically establish links between data warehouse structures and documents filtered from external sources
- how to use the collected information to increase corporate knowledge
It also gives a comprehensive example illustrating the idea of supplying data warehouses with relevant information filtered from the Web.

Content Level » Research

Keywords » Internet - Web - data warehouse - database - databases - information retrieval - information system - knowledge management - multimedia - organization - warehousing

Related subjects » Database Management & Information Retrieval - Information Systems and Applications - Popular Science - Security and Cryptology

Table of contents 

1 Introduction.- 1.1 Information Systems.- 1.2 Information Filtering Systems.- 1.3 Database Systems.- 1.3.1 Transactional Systems.- 1.3.2 Analytical Systems.- 1.4 Organization of this Book.
2 Data Warehouse: Corporate Knowledge Repository.- 2.1 Introduction.- 2.2 Data Warehouse Definition and Features.- 2.2.1 Definition.- 2.2.2 Metadata.- 2.2.3 Characteristic Features of Data in the Data Warehouse.- 2.3 Data Warehouse System.- 2.3.1 Architecture of the Data Warehouse System.- 2.3.2 Metadata Structures.- 2.3.3 Data Warehouse Products.- 2.4 Deploying Data Warehouse in the Organization.- 2.4.1 Data Warehouse Life Cycle.- 2.4.2 Analysis and Research.- 2.4.3 Identifying Architecture and Demands.- 2.4.4 Design and Development.- 2.4.5 Implementation and On-going Administration.- 2.5 Knowledge Management in Data Warehouses.- 2.5.1 Knowledge Management.- 2.5.2 Knowledge in Terms of Data Warehousing.- 2.5.3 Knowledge Discovery in Data Warehouses.- 2.5.4 Significance of Business Metadata.- 2.6 Evolution of the Data Warehouse.- 2.6.1 Criticism of the Traditional Data Warehouse.- 2.6.2 Virtual Data Warehouse.- 2.6.3 Information Data Superstore.- 2.6.4 Exploration Warehouse.- 2.6.5 Internet/Intranet Data Warehouse.- 2.6.6 Web Farming.- 2.6.7 Enterprise Information Portals.- 2.7 Chapter Summary.- 2.8 References.
3 Knowledge Representation Standards.- 3.1 Introduction.- 3.1.1 Basic Concepts.- 3.1.2 Metadata Representation.- 3.1.3 Metadata Interoperability.- 3.1.4 Theory of Metadata.- 3.2 Markup Languages.- 3.2.1 Background.- 3.2.2 XML Document.- 3.2.3 Document Presentation.- 3.2.4 Document Linking.- 3.2.5 Programming Interfaces.- 3.3 Dublin Core.- 3.3.1 Dublin Core Metadata Elements.- 3.3.2 Dublin Core in HTML.- 3.4 Warwick Framework.- 3.5 Meta Content Framework.- 3.5.1 Origins of MCF.- 3.5.2 Conceptual Building Blocks of MCF.- 3.5.3 XML Syntax.- 3.5.4 Directed Labelled Graph Formalism.- 3.6 Resource Description Framework.- 3.6.1 Background.- 3.6.2 Formal RDF Data Model.- 3.6.3 The RDF Syntax.- 3.6.4 RDF Schema.- 3.7 Common Warehouse Metamodel.- 3.7.1 History of OMG Projects.- 3.7.2 Objectives of the CWM.- 3.7.3 Metadata Architecture.- 3.7.4 CWM Elements.- 3.7.5 Conclusions for CWM.- 3.8 Chapter Summary.- 3.9 References.
4 Information Filtering and Retrieval from Web Sources.- 4.1 Introduction.- 4.1.1 Document, Information, Knowledge.- 4.1.2 Indexing.- 4.1.3 Hypertext.- 4.1.4 Information on the Web.- 4.1.5 Constraints of this Book.- 4.2 Information Retrieval Systems.- 4.2.1 Definitions.- 4.2.2 Information Retrieval System Architectures and Models.- 4.2.3 Sample Information Retrieval Systems.- 4.3 Information Filtering Systems.- 4.3.1 Filtering Versus Retrieval.- 4.3.2 Information Filtering Models and Architectures.- 4.3.3 Sample Filtering Systems.- 4.4 Internet Sources of Business Information.- 4.4.1 Business View on Internet Information Sources.- 4.4.2 General Characteristics of Business Information Sources.- 4.4.3 Information Overflow.- 4.5 Filtering the Web to Feed Business Information Systems.- 4.5.1 Problems with Web Filtering and Retrieval.- 4.5.2 New Information Filtering System Model Proposal.- 4.5.3 Transparent Filtering and Retrieval.- 4.6 Chapter Summary.- 4.7 References.
5 Enhanced Data Warehouse.- 5.1 Introduction.- 5.2 Justification of the Need for Integration.- 5.2.1 Value of Knowledge.- 5.2.2 Attention Economy.- 5.2.3 Content Management and Lifecycle of Content.- 5.2.4 Example of Integration: Metadata and Data.- 5.3 Preliminary Vision of the System.- 5.3.1 Analytical Point of View.- 5.3.2 Trends.- 5.3.3 Goals of the System.- 5.3.4 User Requirements Towards the Information Retrieval Systems.- 5.4 Software Agents.- 5.4.1 Introduction.- 5.4.2 Intelligent Agents or Just Agents?.- 5.4.3 Software Agents or Just Agents?.- 5.4.4 Possible Applications of Agents.- 5.4.5 Definitions of Software Agents.- 5.4.6 Agent Properties.- 5.4.7 Classifications of Software Agents.- 5.4.8 Agent-based Systems and Multi-agent Systems.- 5.5 Proposed Solution: enhanced Data Warehouse.- 5.5.1 Introduction.- 5.5.2 Overview of the eDW System.- 5.5.3 Assumptions for the eDW System.- 5.5.4 Components.- 5.5.5 Agent-based System Architecture.- 5.5.6 Logging Server.- 5.5.7 Profiling Server.- 5.5.8 Source Agent Server.- 5.5.9 Document Server.- 5.5.10 Properties of eDW Agents.- 5.6 Formal Model of eDW.- 5.6.1 CSL: The Extension of the Organizational Metamodel.- 5.6.2 Time Consistency among Documents and Warehouse Data.- 5.6.3 DWL: The Intranet Collection of Relevant Documents for the Data Warehouse.- 5.6.4 enhanced Data Warehouse Report: The Final Product of the eDW System.- 5.6.5 Formal Definitions of eDW Agents.- 5.7 System Implementation.- 5.7.1 Programming Environment.- 5.7.2 System Control Centre.- 5.7.3 Communication.- 5.7.4 Status.- 5.7.5 Configuration File.- 5.7.6 Logging Server.- 5.8 Chapter Summary.- 5.9 References.
6 Profiling.- 6.1 Introduction.- 6.2 Personalization and Data Warehouse Profiles.- 6.2.1 Classification of Information.- 6.2.2 Personalization.- 6.2.3 Personalization in Data Warehouses and its Aspects.- 6.2.4 Overview of Profile Creation.- 6.2.5 Data Warehouse Profiles.- 6.3 Algorithms Specification.- 6.3.1 Algorithm for Creating Warehouse Profiles.- 6.3.2 Computational Complexity.- 6.3.3 Thesauri.- 6.4 Profiling Server.- 6.4.1 Basic Assumptions.- 6.4.2 Profiling Agent.- 6.4.3 User Interface in Profiling Application.- 6.4.4 Sample Results.- 6.5 Chapter Summary.- 6.6 References.
7 Source Exploitation.- 7.1 Introduction.- 7.2 Sample Business Content Providers.- 7.2.1 Sample Business Gateways.- 7.2.2 Sample Business Search Engines.- 7.2.3 Sample Business Portals and Vortals.- 7.2.4 Sample Business Online Databases.- 7.3 Information Ants to Filter Information from Internet Sources.- 7.3.1 Introduction.- 7.3.2 Ant Colony Optimization.- 7.3.3 Environment for Information Ants.- 7.3.4 Information Ants to Filter Information from the Web.- 7.3.5 Experiment with Ant-like Navigation.- 7.3.6 Advantages and Drawbacks of the Proposed Solution.- 7.4 Indexing Parser.- 7.4.1 Parsing Web Documents.- 7.4.2 Indexing Web Documents.- 7.5 Transparent Filtering in the eDW System.- 7.5.1 Building Warehouse Profiles.- 7.5.2 Registering Sources.- 7.5.3 Source Exploration.- 7.5.4 Source Penetration.- 7.6 Chapter Summary.- 7.7 References.
8 Building Data Warehouse Library.- 8.1 Introduction.- 8.1.1 Characteristics of WWW: A Dream of Non-volatile Internet.- 8.1.2 Digital Libraries.- 8.2 Time Indexing.- 8.2.1 Finite State Automaton.- 8.2.2 Time Indexer.- 8.2.3 Trapezoidal Time Indices.- 8.2.4 Simple Overlap Measure for Trapezoidal Time Indices.- 8.3 Experiment with Time Indexing.- 8.3.1 Experiment with Time Indexing Real-World Documents.- 8.3.2 Conclusions for the eDW System.- 8.4 Future Trends: Multimedia Indexing.- 8.4.1 Introduction.- 8.4.2 Filtering Web Documents.- 8.4.3 Neural Nets for Image Categorization.- 8.4.4 The Proposed Solution: Perceptron Categorization Tree.- 8.4.5 Advantages and Drawbacks.- 8.4.6 Application for eDW.- 8.5 Chapter Summary.- 8.6 References.
9 Context Queries and Enhanced Reports.- 9.1 Introduction.- 9.2 Context Queries.- 9.2.1 Definition of Context.- 9.2.2 Justification of Transparent Retrieval.- 9.2.3 Elements of Context.- 9.2.4 Conceptual Similarity Measure.- 9.2.5 Simple Temporal Similarity Measure.- 9.2.6 Parameterized Temporal Similarity Measure.- 9.2.7 Pertinence.- 9.3 enhanced Report.- 9.3.1 User Interface in Accessing the Information.- 9.3.2 How enhanced Report is Created.- 9.4 Reporting Application.- 9.4.1 Basic Assumptions.- 9.4.2 Description of the Algorithms.- 9.4.3 Context Query Agent.- 9.4.4 Computational Complexity.- 9.4.5 User Interface in Reporting Application.- 9.4.6 Results.- 9.5 Histograms: The Helpful Tool for Analysis.- 9.5.1 Non-parameterized Histogram.- 9.5.2 Past-oriented Analysis.- 9.5.3 Future-oriented Analysis.- 9.5.4 General Documents.- 9.5.5 Detailed Documents.- 9.5.6 Compact and Dispersed Histograms.- 9.6 Chapter Summary.- 9.7 References.
10 Conclusions.- 10.1 Concluding Remarks.- 10.2 Improvements.- 10.3 Open Issues and Future Work.
