Information Modeling Method and Application

  1. Internet and Web Fundamentals: network exponential laws; emergence and growth of the WWW; W3C; web browser; components of the web; URL; HTML form; HTTP message, request / response; web services; WWW, web server, web client, plugin, web proxy; browser; cookies; web proxy; maintaining cache consistency; web server, server-side cache, server architecture, Apache server, hosting, server cluster; CDN; types of web documents, static, dynamic, active; server-side scripting technologies; JavaScript

  2. Search Engines and Information Retrieval: IR, documents; relevance assessment; precision, recall; user information needs; search engines, indexing, dynamic data, spam

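  3. Architecture of a Search Engine: indexing, querying; text acquisition, crawler, feed, conversion, document store; text transformation, parser, stopping, stemming, link analysis, information extraction, classifier; index creation, document statistics, weighting, inversion, index distribution; user interaction, query input, query transformation, results output; ranking, scoring, performance optimization, distribution; evaluation, logging, ranking analysis, performance analysis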

  4. Crawls and Feeds: web crawler; freshness; age; focused crawling, deep web; sitemap; document feeds, RSS; conversion; encoding, Unicode; document storage; compression, BigTable; duplicate detection; near-duplicate detection; fingerprints; simhash; noise removal, finding content blocks

  5. Processing Text: text processing; text statistics, Zipf's law; vocabulary growth, Heaps' law; estimating the number of search results; estimating result set size, sampling; estimating the size of the whole document collection; tokenization; stopping; stemming, Porter, Krovetz; phrases, POS tagging, n-grams; document structure and markup; PageRank, dangling links; link quality, trackback links; information extraction, NER; HMM; hot topic detection, chi-square test; mutual information; KL divergence; coefficient of variation; Gaussian test; internationalization

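  6. Ranking with Indexes: indexes and ranking; inverted index, proximity match; field, extent; compression; delta encoding; byte-aligned codes, v-byte; skipping, skip pointers; lexicon; distributed indexing; query processing, document-at-a-time, term-at-a-time; optimization, conjunctive processing; threshold methods, MaxScore; structured queries; distributed evaluation, document distribution, term distribution, caching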

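  7. Queries and Interfaces: information needs; interaction; keyword queries; query stemming, co-occurrence, Dice coefficient; spell checking, edit distance, Damerau-Levenshtein distance; Soundex code, run-on; noisy channel model; thesaurus; query expansion; term association measures, Dice, MI, EMIM, chi-square measure, Google similarity; properties of similarity, set similarity; bag similarity, cosine similarity, Pearson correlation coefficient; pseudo-relevance feedback, context vectors, query logs; relevance feedback, pseudo-relevance feedback; context and personalization, user models, query logs, location-based search; snippet generation, sentence selection, significance factor; advertising, sponsored search, contextual advertising; ad search, pseudo-relevance feedback; clustering search results; clustering methods; statistical translation models; Overture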

  8. Retrieval Models: retrieval models; relevance; types of retrieval models; Boolean retrieval; vector space model; cosine similarity; tf-idf; relevance feedback, Rocchio algorithm; TCSR, consistency problem; retrieval and classification problems, Bayes classifier; binary independence model; contingency table; BM25

  9. Retrieval Models 2: language model | unigram, bigram, topic language model; language model for search; query likelihood model; smoothing, Jelinek-Mercer smoothing, tf-idf, Dirichlet smoothing; relevance model | PRF, KL divergence; combining evidence; inference network | belief; web search; search engine optimization | query trap; word proximity; ML and IR | generative, discriminative; ranking SVM; topic model | multinomial distribution, Dirichlet distribution; LDA process

  10. Search Engine Evaluation: relevance judgement; pooling; query log; preference estimation; click filtering; effectiveness | F measure; ranking effectiveness | average precision, MAP, recall-precision graph, interpolation; upper rank documents | rank R precision, reciprocal rank, discounted cumulative gain, NDCG; preference | Kendall’s tau coefficient, BPREF; efficiency measure; significance test | t-test, Wilcoxon signed-rank test, sign test; parameter | cross-validation, SVM optimization; online test

  11. Document Classification: ontology; naive Bayes classifier; multiple Bernoulli event space | smoothing; multinomial event space | smoothing; support vector machine, hinge loss function; kernel trick; OVA, OVO; nearest neighbor classification; generative & discriminative model; feature selection | information gain; classification application | spam; sentiment; online ad classification | semantic hierarchy

  12. Document Clustering: hierarchical clustering | divisive, agglomerative; cost function | single, complete, average, average group, Ward; K-means clustering; KNN clustering; clustering evaluation; K | adaptive K

  13. Recommender systems - Introduction: content aware recommendation; recommender system; basic models of RSs; CF models; memory-based | user-based, item-based; model-based; content-based; knowledge-based RS | constraint-based, case-based; other RSs; evaluation of RS; advanced topics

  14. Recommender systems - Neighborhood-based CF: user-based neighborhood model | Pearson/cosine, mean-centered rating; variants | discounted similarity, z-score, inverse user frequency; item-based neighborhood model | adjusted cosine, complexity; comparing user-based & item-based method; clustering; regression modeling; user-based NN regression, sparsity & bias issue; item-based NN regression, sparsity & bias issue; combine; graph models for NB models; user-item graphs | Katz measure; user-user graph | horting & predictability, rate prediction; item-item graphs

  15. Recommender systems - Model-based CF: latent factor models | Frobenius norm, latent vector, latent factor; unconstrained MF | gradient descent method, stochastic gradient descent; regularization | hold-out method, cross-validation method; alternating least squares; coordinate descent; incorporating user and item biases; incorporating implicit feedback; SVD++; non-negative matrix factorization, Lagrangian relaxation, aspects; ratings with both likes and dislikes; integrating factorization and NB models | non-personalized bias-centric model; neighborhood portion of model, item-item implicit feedback; latent factor portion of model; integrating latent factor models with arbitrary models

  16. Recommender systems - Content-based RS: feature extraction | vector-space representation, active user; supervised feature selection and weighting | Gini index, entropy, chi-square statistic, normalized deviation; learning user profiles and filtering; nearest neighbor classification; Bayes classifier | Bernoulli model; rule-based classifier | support, confidence; regression-based models; content-based vs collaborative recommendations

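Source: index of the course materials for ‘Information Modeling Methods and Applications’ (정보모델링기법과 응용), taught by Prof. Jonghun Park (박종헌), Department of Industrial Engineering, Seoul National University, first semester of 2018.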