Study on Key Techniques of Web Mining for Intelligent Information Retrieval

Since the WWW came into the world in 1991, it has developed quickly and is becoming an important information source for human society. With the rapid development and maturing of Internet techniques, the WWW will serve as an important medium from which people obtain information. In the past, it was easy to search for useful information, but with the huge growth in the amount of information on the Internet, people find it more and more difficult to locate what they need. The reason is that traditional information retrieval technology no longer copes well with such massive amounts of information. A more intelligent information retrieval technology for massive information retrieval on the Internet is therefore urgently needed.

This dissertation researches some key techniques of Web mining for intelligent information retrieval. It mainly focuses on data preprocessing, classification and clustering of Web pages or Web users, conceptual retrieval, and personalized services. We propose or improve several Web mining algorithms for intelligent information retrieval, and we also develop a prototype intelligent information retrieval system.

Data preprocessing includes information extraction from PDF documents, Chinese word segmentation, and Web log preprocessing. For information extraction from PDF documents, we propose a rule extraction algorithm based on format infusion and an information extraction algorithm based on a tree model. For Chinese word segmentation, we propose a method based on a gradually enriched dictionary; compared with using dictionary matching alone or a statistical method alone, the new method obtains much better results. For Web log preprocessing, the dissertation mainly discusses path completion and gives a new algorithm for it.

In the research on Web page classification, this dissertation discusses various methods of text classification and concentrates on the k-nearest neighbor (k-NN) method, which achieves relatively high accuracy in text classification. To improve the efficiency of k-NN, we propose a training sample reduction method based on class density, and a gradual classification pattern. The reduction method computes the density of each class in the training set and the average density of the whole training set, and then deletes some samples from the high-density classes. The gradual classification pattern imitates the way a person classifies documents manually and thereby reduces the proportion of documents that have to be analyzed in full.
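The idea of reducing training samples by class density can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract gives neither the density formula nor the deletion rule, so everything specific here is an assumption made for illustration. Density is taken to be the mean cosine similarity of a class's samples to the class centroid, the average density is weighted by class size, and the function names (`class_density`, `reduce_training_set`) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def class_density(vectors):
    """ASSUMED density measure: mean cosine similarity to the class centroid."""
    c = centroid(vectors)
    return sum(cosine(v, c) for v in vectors) / len(vectors)


def reduce_training_set(samples):
    """samples is a list of (vector, label) pairs; returns a reduced list.

    Classes whose density exceeds the size-weighted average density are
    thinned; all other classes are kept unchanged.
    """
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)

    density = {label: class_density(vecs) for label, vecs in by_class.items()}
    avg = sum(density[l] * len(v) for l, v in by_class.items()) / len(samples)

    reduced = []
    for label, vecs in by_class.items():
        keep = len(vecs)
        if density[label] > avg:
            # ASSUMED rule: shrink the class in proportion to how far its
            # density exceeds the average. The ratio is below 1, so at
            # least one sample is always dropped from a dense class.
            keep = max(1, int(len(vecs) * avg / density[label]))
        c = centroid(vecs)
        # ASSUMED choice: drop the most central (most redundant) samples
        # first by keeping those least similar to the centroid.
        ranked = sorted(vecs, key=lambda v: cosine(v, c))
        reduced.extend((v, label) for v in ranked[:keep])
    return reduced
```

A k-NN classifier would then be run against the output of `reduce_training_set` instead of the full training set, so each query compares against fewer stored samples. Whether accuracy is preserved depends on the density measure and the deletion rule, both of which are placeholders here.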

© 2004-2009 Latest-Science-Articles.com - All Rights Reserved Worldwide.