Modern Information Retrieval (English Edition, 2nd Edition)

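Modern Information Retrieval (English Edition, 2nd Edition) is a book published in 2011 by China Machine Press (机械工业出版社), written by Ricardo Baeza-Yates (Spain) and Berthier Ribeiro-Neto (Brazil).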

  • Chinese title: 现代信息检索(英文第2版)
  • Listing date: March 7, 2011
  • Publisher: China Machine Press (机械工业出版社)
  • ISBN: 9787111331742
  • Publication date: March 2011

Synopsis

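  Modern Information Retrieval (English Edition, 2nd Edition) covers in detail all the major concepts and techniques of information retrieval, along with all the recent developments in the field, giving readers both a comprehensive overview of modern information retrieval and detailed knowledge of all its key topics. The main content is written by Baeza-Yates and Ribeiro-Neto, leading figures in the field of information retrieval. For readers who want to study key areas in depth, the book also includes state-of-the-art surveys of specialized topics written by other leading researchers.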

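  Compared with the previous edition, Modern Information Retrieval (English Edition, 2nd Edition) has been substantially reorganized, updated, and expanded in both content and structure, with roughly 60% to 70% new material. The specific updates are as follows:

  ·New chapters on text classification, web crawling, structured text retrieval, and enterprise search, plus an appendix on open source search.

  ·The material on user interfaces, multimedia retrieval, and digital libraries has been completely rewritten.

  ·Several chapters have been expanded to cover important new advances in information retrieval, such as language models, new evaluation methods, query properties, and cluster-based and distributed information retrieval.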

Contents

  1 introduction 1

  1.1 information retrieval 1

  1.1.1 early developments 1

  1.1.2 information retrieval in libraries and digital libraries 3

  1.1.3 ir at the center of the stage 3

  1.2 the ir problem 3

  1.2.1 the user's task 4

  1.2.2 information versus data retrieval 5

  1.3 the ir system 5

  1.3.1 software architecture of the ir system 5

  1.3.2 the retrieval and ranking processes 7

  1.4 the web 8

  1.4.1 a brief history 8

  1.4.2 the e-publishing era 9

  1.4.3 how the web changed search 10

  1.4.4 practical issues on the web 12

  1.5 organization of the book 12

  1.5.1 focus of the book 12

  1.5.2 book contents 13

  1.6 the book web site: a teaching resource 16

  1.7 bibliographic discussion 17

  2 user interfaces for search 21

  by marti hearst

  2.1 introduction 21

  2.2 how people search 21

  preface to the second edition v

  preface to the first edition vii

  authors' acknowledgements to the second edition viii

  authors' acknowledgements to the first edition x

  publishers' acknowledgements xii

  contents xvii

  2.2.1 information lookup versus exploratory search 22

  2.2.2 classic versus dynamic model of information seeking 23

  2.2.3 navigation versus search 24

  2.2.4 observations of the search process 24

  2.3 search interfaces today 25

  2.3.1 getting started 25

  2.3.2 query specification 26

  2.3.3 query specification interfaces 27

  2.3.4 retrieval results display 29

  2.3.5 query reformulation 32

  2.3.6 organizing search results 35

  2.4 visualization in search interfaces 40

  2.4.1 visualizing boolean syntax 42

  2.4.2 visualizing query terms within retrieval results 43

  2.4.3 visualizing relationships among words and documents 47

  2.4.4 visualization for text mining 49

  2.5 design and evaluation of search interfaces 50

  2.6 trends and research issues 54

  2.7 bibliographic discussion 54

  3 modeling 57

  3.1 ir models 57

  3.1.1 modeling and ranking 57

  3.1.2 characterization of an ir model 58

  3.1.3 a taxonomy of ir models 59

  3.2 classic information retrieval 61

  3.2.1 basic concepts 61

  3.2.2 the boolean model 64

  3.2.3 term weighting 66

  3.2.4 tf-idf weights 68

  3.2.5 document length normalization 75

  3.2.6 the vector model 77

  3.2.7 the probabilistic model 79

  3.2.8 brief comparison of classic models 86

  3.3 alternative set theoretic models 87

  3.3.1 set-based model 87

  3.3.2 extended boolean model 92

  3.3.3 fuzzy set model 95

  3.4 alternative algebraic models 98

  3.4.1 generalized vector space model 98

  3.4.2 latent semantic indexing model 101

  3.4.3 neural network model 102

  3.5 alternative probabilistic models 104

  3.5.1 bm25 104

  3.5.2 language models 107

  3.5.3 divergence from randomness 113

  3.5.4 bayesian network models 116

  3.6 other models 124

  3.6.1 the hypertext model 124

  3.6.2 web based models 125

  3.6.3 structured text retrieval 126

  3.6.4 multimedia retrieval 126

  3.6.5 enterprise and vertical search 126

  3.7 trends and research issues 127

  3.8 bibliographic discussion 128

  4 retrieval evaluation 131

  4.1 introduction 131

  4.2 the cranfield paradigm 132

  4.2.1 a brief history 132

  4.2.2 reference collections 134

  4.3 retrieval metrics 134

  4.3.1 precision and recall 135

  4.3.2 single value summaries: p@n, map, mrr, f 139

  4.3.3 user-oriented measures 144

  4.3.4 dcg: discounted cumulated gain 145

  4.3.5 bpref: binary preferences 150

  4.3.6 rank correlation metrics 153

  4.4 reference collections 158

  4.4.1 the trec collections 159

  4.4.2 other reference collections 166

  4.4.3 other small test collections 167

  4.5 user-based evaluation 168

  4.5.1 human experimentation in the lab 168

  4.5.2 side-by-side panels 168

  4.5.3 a/b testing 169

  4.5.4 crowdsourcing 170

  4.5.5 evaluation using clickthrough data 171

  4.6 practical caveats 173

  4.7 trends and research issues 174

  4.8 bibliographic discussion 174

  5 relevance feedback and query expansion 177

  5.1 introduction 177

  5.2 a framework for feedback methods 178

  5.3 explicit relevance feedback 180

  5.3.1 relevance feedback for the vector model: rocchio method 181

  5.3.2 relevance feedback for the probabilistic model 183

  5.3.3 evaluation of relevance feedback 184

  5.4 explicit feedback through clicks 185

  5.4.1 eye tracking and relevance judgements 185

  5.4.2 user behavior 186

  5.4.3 clicks as a metric of user preferences 187

  5.5 implicit feedback through local analysis 190

  5.5.1 implicit feedback through local clustering 190

  5.5.2 implicit feedback through local context analysis 193

  5.6 implicit feedback through global analysis 195

  5.6.1 query expansion based on a similarity thesaurus 195

  5.6.2 query expansion based on a statistical thesaurus 198

  5.7 trends and research issues 200

  5.8 bibliographic discussion 200

  6 documents: languages & properties 203

  with gonzalo navarro and nivio ziviani

  6.1 introduction 203

  6.2 metadata 205

  6.3 document formats 206

  6.3.1 text 206

  6.3.2 multimedia 207

  6.3.3 graphics and virtual reality 208

  6.4 markup languages 208

  6.4.1 sgml 209

  6.4.2 html 211

  6.4.3 xml 214

  6.4.4 rdf: resource description framework 216

  6.4.5 hytime 217

  6.5 text properties 218

  6.5.1 information theory 218

  6.5.2 modeling natural language 219

  6.5.3 text similarity 222

  6.6 document preprocessing 223

  6.6.1 lexical analysis of the text 224

  6.6.2 elimination of stopwords 226

  6.6.3 stemming 226

  6.6.4 keyword selection 227

  6.6.5 thesauri 228

  6.7 organizing documents 231

  6.7.1 taxonomies 231

  6.7.2 folksonomies 232

  6.8 text compression 233

  6.8.1 basic concepts 234

  6.8.2 statistical methods 234

  6.8.3 statistical methods: modeling 235

  6.8.4 statistical methods: coding 238

  6.8.5 dictionary methods 245

  6.8.6 preprocessing for compression 246

  6.8.7 comparing text compression techniques 248

  6.8.8 structured text compression 249

  6.9 trends and research issues 250

  6.10 bibliographical discussion 253

  7 queries: languages & properties 255

  with gonzalo navarro

  7.1 query languages 255

  7.1.1 keyword-based querying 256

  7.1.2 beyond keywords 259

  7.1.3 structural queries 262

  7.1.4 query protocols 265

  7.2 query properties 267

  7.2.1 characterizing web queries 267

  7.2.2 user search behavior 269

  7.2.3 query intent 270

  7.2.4 query topic 272

  7.2.5 query sessions and missions 273

  7.2.6 query difficulty 274

  7.3 trends and research issues 278

  7.4 bibliographical discussion 279

  8 text classification 281

  with marcos gonçalves

  8.1 introduction 281

  8.2 a characterization of text classification 282

  8.2.1 machine learning 282

  8.2.2 the text classification problem 283

  8.2.3 text classification algorithms 284

  8.3 unsupervised algorithms 286

  8.3.1 clustering 286

  8.3.2 naive text classification 290

  8.4 supervised algorithms 291

  8.4.1 decision trees 294

  8.4.2 the k-nn classifier 299

  8.4.3 the rocchio classifier 300

  8.4.4 probabilistic naive bayes document classification 303

  8.4.5 the svm classifier 306

  8.4.6 ensemble classifiers 316

  8.4.7 final remarks on supervised algorithms 319

  8.5 feature selection or dimensionality reduction 320

  8.5.1 term–class incidence table 321

  8.5.2 term document frequency 322

  8.5.3 tf-idf weights 322

  8.5.4 mutual information 323

  8.5.5 information gain 323

  8.5.6 chi square 324

  8.5.7 impact of feature selection 325

  8.6 evaluation metrics 325

  8.6.1 contingency table 325

  8.6.2 accuracy and error 326

  8.6.3 precision and recall 327

  8.6.4 f-measure and f1 327

  8.6.5 cross-validation 329

  8.6.6 standard collections 329

  8.7 organizing the classes – building taxonomies 330

  8.8 trends and research issues 333

  8.9 bibliographic discussion 334

  9 indexing and searching 337

  with gonzalo navarro

  9.1 introduction 337

  9.2 inverted indexes 340

  9.2.1 basic concepts 340

  9.2.2 full inverted indexes 341

  9.2.3 searching 345

  9.2.4 ranking 348

  9.2.5 construction 351

  9.2.6 compressed inverted indexes 354

  9.2.7 structural queries 357

  9.3 signature files 357

  9.4 suffix trees and suffix arrays 360

  9.4.1 structure: tries and suffix trees 361

  9.4.2 searching for simple strings 362

  9.4.3 searching for complex patterns 363

  9.4.4 construction 365

  9.4.5 compressed suffix arrays 367

  9.5 sequential searching 372

  9.5.1 simple strings: horspool 373

  9.5.2 complex patterns: automata and bit-parallelism 375

  9.5.3 faster bit-parallel algorithms 379

  9.5.4 regular expressions 382

  9.5.5 multiple patterns 384

  9.5.6 approximate searching 385

  9.5.7 searching compressed text 389

  9.6 multi-dimensional indexing 391

  9.7 trends and research issues 393

  9.8 bibliographic discussion 394

  10 parallel and distributed ir 399

  with eric brown

  10.1 introduction 399

  10.2 a taxonomy of distributed ir systems 402

  10.3 data partitioning 404

  10.3.1 collection partitioning 405

  10.3.2 collection selection 407

  10.3.3 inverted index partitioning 409

  10.3.4 partitioning other indexes 413

  10.4 parallel ir 414

  10.4.1 introduction 414

  10.4.2 parallel ir on mimd architectures 416

  10.4.3 parallel ir on simd architectures 418

  10.5 cluster-based ir 423

  10.6 distributed ir 424

  10.6.1 introduction 424

  10.6.2 indexing 428

  10.6.3 query processing 431

  10.6.4 web issues 437

  10.7 federated search 438

  10.8 retrieval in peer-to-peer networks 440

  10.9 trends and research issues 444

  10.10 bibliographic discussion 445

  11 web retrieval 447

  with yoelle maarek

  11.1 introduction 447

  11.2 a challenging problem 449

  11.3 the web 451

  11.3.1 characteristics 451

  11.3.2 structure of the web graph 452

  11.3.3 modeling the web 454

  11.3.4 link analysis 456

  11.4 search engine architectures 458

  11.4.1 basic architecture 458

  11.4.2 cluster-based architecture 459

  11.4.3 caching 462

  11.4.4 multiple indexes 464

  11.4.5 distributed architectures 466

  11.5 search engine ranking 468

  11.5.1 ranking signals 469

  11.5.2 link-based ranking 470

  11.5.3 simple ranking functions 473

  11.5.4 learning to rank 473

  11.5.5 learning the ranking function 474

  11.5.6 quality evaluation 475

  11.5.7 web spam 476

  11.6 managing web data 477

  11.6.1 assigning identifiers to documents 477

  11.6.2 metadata 478

  11.6.3 compressing the web graph 478

  11.6.4 handling duplicated data 479

  11.7 search engine user interaction 480

  11.7.1 the search rectangle paradigm 481

  11.7.2 the search engine result page 488

  11.7.3 educating the user 497

  11.8 browsing 498

  11.8.1 flat browsing 499

  11.8.2 structure guided browsing and web directories 499

  11.9 beyond browsing 501

  11.9.1 hypertext and the web 501

  11.9.2 combining searching with browsing 501

  11.9.3 web query languages 503

  11.9.4 dynamic search 503

  11.10 related problems 504

  11.10.1 computational advertising 504

  11.10.2 web mining 506

  11.10.3 metasearch 508

  11.11 trends and research issues 509

  11.11.1 beyond static text data 509

  11.11.2 current challenges 511

  11.12 bibliographical discussion 513

  12 web crawling 515

  with carlos castillo

  12.1 introduction 515

  12.2 applications of a web crawler 517

  12.2.1 general web search 517

  12.2.2 topical crawling 518

  12.2.3 web characterization 518

  12.2.4 mirroring 518

  12.2.5 web site analysis 519

  12.3 a taxonomy of crawlers 519

  12.3.1 types of web pages 520

  12.4 architecture and implementation 521

  12.4.1 crawler architecture 521

  12.4.2 practical issues 523

  12.4.3 parallel crawling 526

  12.5 scheduling algorithms 527

  12.5.1 selection policy 528

  12.5.2 revisit policy 530

  12.5.3 politeness policy 535

  12.5.4 combining policies 538

  12.6 evaluation 539

  12.6.1 evaluating network usage 539

  12.6.2 evaluating long-term scheduling 540

  12.7 trends and research issues 541

  12.7.1 crawling the "hidden" web 541

  12.7.2 crawling with the help of web sites 542

  12.7.3 distributed crawling 543

  12.8 bibliographic discussion 543

  13 structured text retrieval 545

  with mounia lalmas

  13.1 introduction 545

  13.2 structuring power 546

  13.2.1 explicit vs. implicit structure 546

  13.2.2 static vs. dynamic structure 547

  13.2.3 single hierarchy vs. multiple hierarchies 548

  13.3 early text retrieval models 549

  13.3.1 model based on non-overlapping lists 549

  13.3.2 model based on proximal nodes 550

  13.3.3 ranking structured text results 551

  13.4 xml retrieval 551

  13.4.1 challenges in xml retrieval 551

  13.4.2 indexing strategies 553

  13.4.3 ranking strategies 554

  13.4.4 removing overlaps 565

  13.5 xml retrieval evaluation 566

  13.5.1 document collections 566

  13.5.2 topics 567

  13.5.3 retrieval tasks 568

  13.5.4 relevance 569

  13.5.5 measures 571

  13.6 query languages 573

  13.6.1 characteristics 574

  13.6.2 classification of xml query languages 575

  13.6.3 examples of xml query languages 577

  13.7 trends and research issues 582

  13.8 bibliographic discussion 585

  14 multimedia information retrieval 587

  by dulce ponceleón and malcolm slaney

  14.1 introduction 587

  14.1.1 what is multimedia? 587

  14.1.2 multimedia ir 588

  14.1.3 text ir versus multimedia ir 589

  14.2 the challenges 589

  14.2.1 the semantic gap 589

  14.2.2 feature ambiguity 591

  14.2.3 machine-generated data 591

  14.3 content-based image retrieval 592

  14.3.1 color-based retrieval 593

  14.3.2 texture 593

  14.3.3 salient points 596

  14.4 audio and music retrieval 597

  14.4.1 fingerprinting 598

  14.4.2 speech recognition 599

  14.4.3 speaker identification 601

  14.4.4 spoken document retrieval 602

  14.4.5 audio basics 602

  14.5 retrieving and browsing video 606

  14.5.1 video abstracts 606

  14.5.2 static summaries 607

  14.5.3 mosaics and salient stills 608

  14.5.4 dynamic summaries 609

  14.5.5 interactive summaries 611

  14.5.6 visual vs. audio browsing 612

  14.5.7 evaluating summaries 613

  14.6 fusion models: combining it all 614

  14.6.1 naming faces 614

  14.6.2 naming images 615

  14.6.3 naming audio 616

  14.6.4 combining audio and video for avsr 617

  14.6.5 combining audio and video for multimedia 620

  14.7 segmentation 620

  14.7.1 a video segmentation example 620

  14.7.2 segmentation schemes for video 622

  14.7.3 video segmentation with edges 623

  14.7.4 speech segmentation 624

  14.7.5 segmentation evaluation 625

  14.8 compression and mpeg standards 625

  14.8.1 intensity and sampling 626

  14.8.2 color 626

  14.8.3 lossy compression 628

  14.8.4 lossless compression 628

  14.8.5 temporal redundancy 630

  14.8.6 motion prediction 631

  14.8.7 mpeg standards 633

  14.9 trends and research issues 636

  14.10 bibliographic discussion 637

  15 enterprise search 641

  by david hawking

  15.1 introduction 641

  15.1.1 characteristics and applications of enterprise search 642

  15.1.2 enterprise search software 643

  15.1.3 workplace search 644

  15.2 enterprise search tasks 644

  15.2.1 examples of search-supported tasks 644

  15.2.2 search types 647

  15.2.3 studying enterprise search 647

  15.3 architecture of enterprise search systems 648

  15.3.1 gathering 648

  15.3.2 extracting 651

  15.3.3 indexing 652

  15.3.4 indexing textual annotations 653

  15.3.5 query processing 654

  15.3.6 presentation of search results 655

  15.3.7 security models 657

  15.3.8 federation/metasearch 659

  15.4 enterprise search evaluation 662

  15.4.1 published test collections for enterprise search 662

  15.4.2 internal enterprise search evaluations 663

  15.4.3 enterprise search tuning 665

  15.4.4 what is it reasonable to expect? 666

  15.5 potential reasons for dissatisfaction 667

  15.6 context and personalization 668

  15.6.1 controls and levers for contextualization 671

  15.6.2 contextualization: local, enterprise or global? 675

  15.6.3 privacy of profiles 676

  15.6.4 defining, creating and maintaining a profile 677

  15.6.5 user modeling 677

  15.6.6 implicit measures 679

  15.6.7 information filtering 679

  15.6.8 social recommender systems 680

  15.7 trends and research issues 681

  15.8 bibliographic discussion 681

  16 library systems 685

  by edie rasmussen

  16.1 the information environment in the library 685

  16.2 online public access catalogues 687

  16.2.1 opacs and bibliographic records 689

  16.2.2 information retrieval from the ils 691

  16.2.3 integrating the hybrid library 693

  16.2.4 opacs and end users 694

  16.2.5 ils: vendors and products 695

  16.3 ir systems and document databases 697

  16.3.1 bibliographic and full-text databases 698

  16.3.2 content of database records 698

  16.3.3 the online industry: database vendors 701

  16.3.4 information retrieval from document databases 702

  16.4 information retrieval in organizations 706

  16.5 trends and research issues 708

  16.6 bibliographic discussion 709

  17 digital libraries 711

  by marcos gonçalves

  17.1 introduction 711

  17.2 defining digital libraries 712

  17.3 a general architecture 713

  17.4 fundamentals 714

  17.4.1 digital objects and collections 714

  17.4.2 metadata and catalogs 716

  17.4.3 repositories/archives 719

  17.4.4 services 723

  17.5 social-economical issues 725

  17.5.1 social issues 725

  17.5.2 economical issues 726

  17.6 software systems 727

  17.6.1 greenstone 728

  17.6.2 eprints 728

  17.6.3 dspace 728

  17.6.4 fedora 729

  17.6.5 open digital libraries 729

  17.6.6 the 5s suite 730

  17.7 dl case studies 731

  17.7.1 the networked dl of theses and dissertations 731

  17.7.2 the national science digital library 732

  17.7.3 the etana-dl archaeological digital library 732

  17.8 trends and research issues 733

  17.8.1 evaluation 733

  17.8.2 integration 733

  17.8.3 other research challenges 734

  17.9 bibliographic discussion 735

  a open source search engines 737

  with christian middleton

  a.1 introduction 737

  a.2 search engines 738

  a.2.1 preliminary selection of search engines 738

  a.2.2 features 741

  a.2.3 evaluation 742

  a.3 methodology 743

  a.3.1 document collections 743

  a.3.2 evaluation tests 744

  a.3.3 experimental setup 744

  a.4 experimental results 745

  a.4.1 test a – indexing 745

  a.4.2 test b – incremental indexing 749

  a.4.3 test c – search performance 749

  a.4.4 global evaluation 752

  a.5 conclusions 753

  b biographies 755

  references 761

  index 893
