diff --git a/信息检索/2018本科生/信息检索.txt b/信息检索/2018本科生/信息检索.txt new file mode 100644 index 0000000..773bbfe --- /dev/null +++ b/信息检索/2018本科生/信息检索.txt @@ -0,0 +1,267 @@ +һ +1.ϢҪ裺кץȡʴ +2.ۣ棬ڣӦٶȣǷûIJѯҪ +3.SEOŻٺõҲȣ泣ȷõϢҪŻ +4.棺һĹԶץȡάϢij߽ű +5.ץȡҳнԾ޴ıʽݿ +6.UNûûҪõϢ +7.ĵĶ󣬿ıҲͼƵȶýĵ +8.ĵϣдĵɵļ +9.ءضȡƶȣؼ뷵ؽ֮ϵȡûжϣһ۸ +10.ԵǶȣ + ϵͳǶȣϵͳûϢĽ + ûǶȣ۲ûԼķӦϵͳûͶ +11.ϢһѯQĵCмÿƪĵDQضȲ +12.ļҳȡҳԤı(ִʣʸȡȥͣô)ѯRankû +13.ҳԤȥҳṹȥtagurlı +14.ıԲѯıеԤ +15.ıԾııнһõıڲʾͨʾ +16.ıвҰѯı +17.򣺶ıijַʽض +18.IRģʽ +19.LuceneApacheֺ֧ṩһԴȫļ湤߰ + ṩIJѯ棬ı档ҪΪģͼģ + org.apache.Lucene.search index analysis Է + queryParser ѯ document 洢ṹ store ײIO洢ṹ util ݽṹ +20.SolrSolrĵײļʹLuceneʵֵ + SolrLuceneı㣺ҵ͹ + Lucene⣬ǶӦó򣬶SolrӦó + LuceneרעײĽ裬SolrרעҵӦ + Lucene֧ĹSolr + SolrLuceneҵӦõչ + +ڶ 漼 +1.涨壺һԶȡҳݵijűҪɲ֡ͨHTMLԴҪĶ +2.̣һɸʼҳurlʼץȡҳĹУurlǷץȡٸһѡµurlУֱϵͳһֹͣעȣץȡ +3.ĹҪ㣺Ȳ(BFS,DFS,)ʱ(һɢУŷʹÿһַ) +4.ֲ1ϣ̫һװ£2ÿ̨ʼǰɺҪάϣ洢ϣͨŻӰЧ + 2ȷÿ̨ķֹurl֪Ӧһ̨ȥִУͨŵĴ +5.ĿԴ棺HeritrixNutchpyspiderscrapy +6.NutchһԴJAVAʵֵ档ṩԼȫߡȫWeb档 + Ҫص㣺̣߳ȣrobots.txtsocketӲgetInputStreamȡ߽(ȡһҳн洢ԭʼҳͽĵ)ҳ + + + ҳ +1.ʽ壺ַһ߼ʽȶõһЩضַϣ"ַ""ַ"ַһֹ߼ +2.ʽã滻ijֹ(ģʽ)ı +3.ʽƥص㣺ƥٶȿ죻ֻķıʾҳҪ󲻸ߵ¿ʹã +4.DOMĵģͣXMLHTMLתΪ״ṹԪأԪؼǵֺԶͨDOM +5.HTML DOMص㣺ڽHTMLʱٶȽ൱޹ķҳԶҪҳȥ봦ʱʹá +6.HTMLʽIJȽ +7.ٳһHTML(Beauteful soup) +8.ԪĶ壺ͨһͳһûûڶѡúʵʵּǶڶּߵȫֿƻ +9.ҳʩ + robotsЭ飺ҳͨrobotsЭ淶ЩҳȡЩ + һվʱӦȥǷrobotsЭ飬ڣļе淶ԼķʷΧ + IPΣͬһIPƵʺᱻ + ӴIPУȡʱ + ƣ½(ṩû)JSȾ̬ҳ(ں̨) + ͨPhantomJSSeleniumģͨHTTPHTTPݵĿ + ֤ + ýű˹ͼƬ(õ12306ͼƬ)ͼƬ(ָ֤ͼ)ؼֽжԱȲ(ٶʶͼAPIʶͼ) + + +Ľ ʵ +1.ַвֳһϵеḶ́ÿһг֮Ϊһ""token +2.ʹַвֺÿһtokentokenɵĴʵеÿһΪ +3.ͣôʵ壺ͣôԼtermĸСΧЧʣЧ߹ؼܶ +4.ͣôʵȱ㣺һЩĴʿܻᱻto be or not to beʿ +5.ͣôʵķ﷨޳(ڴʣʣ)ĵƵ(Ƶʸߵ)ͣôʱ +6.Stemmingʸɻԭָȥ˴׺ʽ̣ܹٻʣǻή׼ȷʡ +7.õĴʸɻԭ㷨Porter(ӢĴ) +8.LemmatizationԹ鲢ôʻʹη۱仯ʽתΪʽ +9.StemmingLemmatization𣺴岻ͬstemmingָȥ˴׺ʽ̣Lemmatizationָôʻʹη۱仯ʽӶشʵԭ͵Ĺ̡StemmingͨὫشʺϲһ𣬶LemmatizationὫͬһԪIJͬʽкϲStemmingLemmatization˲֮ͬIJԡ + + +彲 ķִ +1.ִ㷨ַƥķִʷ(ֵеĴʾͱʶϴʾƥ䣬ʶĴʾиɵ)ķִʷͳƵķִʷ(ڹֵƵܹϺõطӳʵĿŶ) +2.ͳģͣһпܵķֲ +3.n-gramģͣnʵijֻǰn-1أκδʶأĸʾǸʳָʵij˻ + ص㣺Чֻ˴ʵλùϵûпǴ֮ƶȡ﷨ʹ⣬ϡ() +4.ŵ-ͼƵҪ˼룺һʵõƽ㷨 +5.ɷģͣһδ֪ɷ̡Ҫ +6.õĿԴִҶţͷִ + + + ϢģͺͲ +1.IRģ͵Ԫ飺D,Q,R(q,d),Fĵϣѯϣ(ڹĵѯ֮ϵģ) +2.ݵϢģͣģ(ģ͵)ģ(ռģ͵)ģ(ģ͵) +3.Ķ壺һֽھ伯ۺͲϵļ򵥼ģͣʽʽIJѯ +ʴģ͵ĸ +4.ģ͵ŵ㣺ѯ򵥣⣻ͨʹøӵIJʽԷؿƲѯ൱ЧʵַûдʽͨչʹĹ +5.ģ͵ȱ㣺Ϣﲻ㣬ƥȨƣ޷ûҪѧòʽѯһ¼ĵ٣ѽԶطڴģĵļ̫ +6.ĵռռķֻ¼1λ + + +߽ +1.ĸĺݽṹҪʵ͵ż¼ + ʵ䣺ÿһ洢аĵһб + ż¼ÿһĵһкdocIDʾ + + +ڰ˽ ռģ +1.ģͣϵͳĵѯض򷵻ĵеĵΪѯıѯַʽ +2.ַ + JaccardϵJaccard(A,B) = |AB|/|AB| ȱ㣺ûпǴƵʣûпǺʺͳʵȨزͬ +3.ʴģͣǴĵгֵλõһģ +4.ʴģ͵ĸĽԪʣλ +5.tf-idfĸtf-idfһͳƷһļҪ̶ȡҪڸļгִӣôϿгִС + tfǴƵ(tĵdгֵĴ 1+log10tf)idfĵƵ(log10(N/df))dfĵƵ(ִĵĿ) +6.tf-idf壺tf-idfȨؼһ򷽷ֻǴƵ⣬÷˴ĵеƵʽȨغּ㣬ԳʸȨأԺʸȨأʹü׼ȷ +7.ռģ͵ĸÿƪĵʾһtf-idfȨصʵֵṹһ|V|άʵռ(|V|ǴĿ)ռеÿһάӦһĵʾΪռеһһ +8.ռģ͵ŵ㣺ıݵĴΪռеԿռϵƶȱϵƶȣ˼ƥĵҲԱԻcosineֵչʾû +9.ռģ͵ȱ㣺ռģһִʴģͣǴ໥ģʵϺܿܲǡʣͬʿܱΪDzصĴʣͬĴʲᱻӰռάȷdzҷdzϡ裬ŻЧʻܲ +10.ķռңŷʽ롣Ϊռϣѯĵŷʽܴ + + +ھŽ +1.壺ͨŻûѯضߵڷбǰ棬û +2.׼topKļٷټ(DzѯȨأҪԲѯйһ)NѡK(ҳзƶֵĵѡTopK) +3.Ǿ׼TopKԣ + ȥֻDzѯidfֵһֵĵֻٰ3/4ѯĵ + ʤ߱ڴʵеÿǰrȨصĵʤ߱ѯq󣬶qдʤ߱󲢼ɼA AҳTopK + ̬÷֣ΪʹǰĵصҲȨģΪÿĵһѯ޹صȨȡյĵضȺȨȵ + Ӱ򣺶termӦĵͳһģÿʱ÷ֽۼӼ㡣˼·ۼӵ÷ֵĵĿһǰ(ɨrƪ̶Ŀĵߵǰ¼tfѾijֵ)ǴidfС + ؼֽ-ԤѡNƪȵ ĵȵ߲ȵߣ֮Ϊ׷ߡѯʱͨȵ߼ƶҳrȵߣٶÿȵ߼׷߽ƶȼ㣬ֱҵTopK +4.ӷ㷨Webɾ̬HTMLͨӶγɵͼÿҳһ㣬ÿӶӦһߡ + pageRank˼룺voteöľҪġÿһҳһֵһҳpageRankƽ̯õҳ췽ҳ𣬵100κõƽȨҳָҳӣƯơ + Hits˼룺ҳ֮ӶȨؼĹױⲻӼֵҪߣרעڸƷָĽÿҳҪȨֵֵ(AuthorityҳHubҳ) + pageRankHitsıȽϣǻӷ㷨ΪۻݡHits㷨authorityֻڼȨأpageRankȨǶڼġ + + +ʮ Ż +1.SEOŻָ˽ȻƵĻϣվڲⲿĵŻĽվеĹؼȻ +2.webspanͨǷΪ棬ʹһЩҳֵ + Ϊterm spam()link spam()hiding techniques(Сӣ뱳ɫͬ) + + +ʮһ Ϣ +1.׼ʣĵ/(ĵ+IJĵ) +2.ȫ(ٻ)ĵ/(ĵ+δĵ) +3.ȫʵpoolingԶϵͳTopNɵļϽ˹עעĵϿΪĵļϡ +4.ȷʣ(ĵ+δIJĵ)/(IJĵ+δĵ) + ΪβþȷʣΪѯصĵռСһ֣Ծȷһ99%ϣ޲ο +5.FֵٻRͲ׼PļȨƽֵ (1+^2)*P*R/(^2*P+R) +6.R-׼ʣǰRIJ׼ +7.ԶѯΪƽ(Macro)΢ƽ(Micro)ƽָÿѯĽƽ΢ƽǽвѯΪһѯۺһֵ +8.APƽ׼ʣָһβѯÿĵλϵIJ׼ʵƽֵ(һĵ),ĸĵ +9.MAPβѯжÿβѯAPֵĺƽ +10.NDCGnormalizing(ضɸߵ) discounted(ǰļȨرںҪ) cumulative gain(ÿĵͲѯضȽмӺ)׼ۿۼ档һ۲ЧķüмӺ͵˼· +11.IJȷ + ʣʾĵû֪ĵı(ѯû֪/ûԺ֪)ڲȫʣĸû + ӱԣʾĵûδ֪ı(ѯûδ֪/ܵļ) + ԣĵղִͬ֯ʵÿоѡ + + +ʮ طͲѯչ +1.k-gramӢƴƴдУ +2.طڳʼĻϣͨûָЩĵػأȻĽĽٻʡ +3.ѯչͨڲѯмͬصĴ߼һٻʵķ +4.طࣺʽط(ûʽزμӽ)ʽط(ϵͳûΪƲⷵĵԣӶз)αط(ûû룬ϵͳֱӼ践ĵǰkƪصģȻз) +5.ʽطȱ㣺Ҫûʽ룬ûûΪһ̶ϷӳûȤ˾п + ΪнϸߵҪ׼ȷʲһܱ֤ijЩҪӶ豸 +6.αطȱ㣺Ҫûأ򵥣ܶʵҲȡ˲Ч + ûͨûжϣ׼ȷԱ֤еIJѯЧ +7.طͲѯչıȽϣطǾֲûѯоֲļʱķѯչȫַһԵȫַͬ/ʴʵ䣻ߵĸͬٻʣѯչܻ׼ȷ(ijһ) + + +ʮ ռ +1.SVDֵֽ⣬ԴһҪľֽ⡣ + ֵֽԵõֵʾʲôֵʾжôҪ + A^TAõһ󣬸÷ֵAֵAA'λUA'AλV +2.PCAɷַһͳƷͨ任һܴԱתΪһԲصıתıɷ֡ +3.SVD壺ֵֽܹЧؽݵάȣԱ󲿷ֵϢ + ͼѹڽPCA⣬Ƽ㷨NLP㷨 +4.LSAһϢģͣʹͳƼķԴıзӶȡ֮DZڵṹDZڵṹʾʺıﵽ֮ԺͼıʵֽάĿġ +5.LSA壺º͵ʶӳ䵽ͬһռ䣬ĵĵࡣִʵĹϵͬʡʼ⡣ +6.pLSADZǵworddocĹʽpLSAڶʽֲֲĻģֵĸ +7.pLSAʵֹ̣ոѡһƪĵdѡĵֲаոѡһp(z|d)ѡӴʷֲаոp(w|z)ѡһʡܽ˵ѡĵ⣬ѡɴʡݴ֪ĵ-Ϣѵĵ-- +8.pLSAƣ˸ģͣÿӦĸʷֲʷֲȷĽͣLSA˸˹ֲpLSAmulti-nomialֲıԣÿ̻һʶ壬öʽֲƵø׼ȷtopicpLSAŻĿKL-divergenceСС׼ +pLSAȱ㣺doctermӣpLSAģҲӣpLSA޷ĵģͣEM㷨ҪĵҪܴļģͲ걸ıҶ˹ģ +9.LDA׷ֲLDApLSA˼һ£ + Ϊĵгֵĸʷֲ͸ijϳֵĸʷֲȷģ˵ + ʵȫҶ˹ + + + +ʮĽ Լ +1.ŷϾ룺(|xi-yi|^p)^1/p +2.׼ŷϾ룺ŷϾȱ㣬ͬδǸķֲDzͬ + ׼ŷϾŷϾһָĽx=x-m/s mǾֵsDZ׼ +3.ıƶ + String based method + Character based method(LCS)༭(ַIJ롢ɾ滻)չı༭(롢ɾ滻ַĽ)JaroJaro-Winkler(ַַ֮ͬ˳λú͸)(ڳ֮ͬıȽ) + Term based methodƶȣJaccardƶȣDiceϵ(2|XY|/|X|+|Y|) + Corpus based methodLM(Language Model) ͨģֽʡͬʣTM(Topic Model) ͨʵĹӳ֮ +4.shingleѵ㷨˼룺ļתΪ⣬һkĵdһУԶĵdk-shingleΪdkɵУڽĵظ⡣ +5.LSHֲйϣһֳڴάָԺʱһ㷨޷ٽάѯƶȸߵӼضhash㷨άӳ䵽άռ䣬Խϸ߸ʿѰƶȸߵӼùˡ֤ܣڹ˽׶ΰѲܳɹݶ˵˺ݶ󼯺Ͻѡ֤׶ζԺѡϽƶȼ㡣 +6.MinHashLSHһ֣ٹϵƶȡÿѡ϶һϵдɣͨhash԰ÿtermӳΪһǿԵõһϵСϣֵminHashַ֣һʹöhashÿϿԵõkСֵABƶȿԱʾΪAkBk/AkBkڶʹһhashÿϵrСֵArBr/ArBr +7.SimHashLSHһ֣ҪΪ裬ִ(ĵзִʲÿעȨ)hash(ѡʵhashλͨhashhashֵ)Ȩ(hashֵȨˣhashλΪ0-1)ϲ(дļȨۼ)ά(ںϲһhashֵijһλ0Ϊ10ΪƪĵsimHashֵ)ͨĵsimHashļ㣬жĵǷơ +8.ʵֱʾʽone-hot representationÿʱʾΪһܳάΪʱС蹵ϡ裬άѣ޷ʾunseen wordsȱ + distributed representationÿʱʾΪһֵάʵÿһάԿʵϢ +9.word2vecģͣڽһתΪһ + CBOWʴģͣijǰCʻǰCĴijʳֵĸʡ + Skip-GramCBOW෴ijʷֱǰijʵĸ + word2vec壺شбѰҶӦϵڻ룻Ƽϵͳ + + +ʮ彲 ͼƬ +1.ͼķࣺͼͼWebͼıͼݵͼ(CBIR) +2.CBIRĹؼͼȡƥ(ͲӾɫ״ռλùϵ) +3.ɫıʾ + ɫֱͼһɫͳÿһɫռͼı + ŵ㣺ƽơ߶ȡתԣԶָͼ + ȱ㣺δռλϢͬͼͬɫֲ + ɫͼɫھķֲϢ + ŵ㣺ӳضԵĿռԣԼֲطֲطֲ + ɫأɫֱͼĻϼÿɫľعƣЩͳɫķֲʾɫ + ŵ㣺Ҫɫռά + ȱ㣺÷ЧʱȽϵ + ɫһʸһռϢĽֱͼ㷨ͳͼиɫ +4.֪ϣ㷨Сߴ磬ɫʣҶƽֵȽصĻҶȣhashƶȼ + ŵ㣺򵥿٣ͼƬСŵӰ + ȱ㣺ͼƬݲܱͼƬмӼ֣ܾϲ + pHashֵķӵ£ʹɢұ任Ƶʡ +5.; + ڽṹоĻԪǵй + ͳƵѰҿ̻ + ģ͵ͼĹģΪģ͵IJΪ + źŴźŷ֪ʶ +6.ѧ6ϡȡԱȶȡԡ״ԡԡֲڶ +7.Ҷȹ־˼룺ڽػҶֵ״̻ͼ޵ľϵֿԷΪˮƽֱԽߡԽĸ +8.źẈ̂ȶͼźŲƵ߿˲ٶͼз +9.LBPֲֵģʽͼṹͳƹϵÿص㶼ڵ8صգܴΪ1СΪ08תΪʮƵLBP룬ֵӳϢ + ŵ㣺ղԣתԣҶȲԡ +10.״ҪΪ + ״״Ŀ߽ؼ + (ѡһʼ㣬˳ʱ뷽ر߽˳ҳһ߽)ķ(ͼ״߽ӳ䵽ϽǣijԪ״߽ȫ󲿷ָǣֵΪ1ֵ0)ֱͼ(ڱ߽ȡ㣬㵽ĵľ룬ֱͼ)߽(߽㵽ĵľΪһֲֲľ)Ҷ + ״״Ŀصļ + 򵥵(ӣͶӰʣζȣϸ)IJ +11.򷨣֤ڲСͬw1 = n1/nڲ = w0*(u1-u)^2+w1*(u2-u)^2 = w0w1(u1-u2)^2 = + ҪľҵһTΪǰͱֵʹ +12.ͼֲ + HOGݶֱͼ + Ҫ˼룺ֲĿ״ԱݶȷܶȷֲܺõݶȵͳϢݶҪڱԵĵط + Ҫʵֹ̣ҶȻgammaУͼÿصݶȣͼ񻮷ΪСcellsͳÿcellݶֱͼcellݶֱͼͳcellݶϢ(ݶȷݶȴС)͵õcellHOGcellһblockеcell͵õblockHOGblockHOG͵õimageHOG + SIFT߶Ȳת + Ҫ˼룺ڲͬij߶ȿռϲҹؼ㣬ͨһͼе㼰س߶Ⱥͷõ + Ҫʵֹ̣߶ȿռ䣻ڳ߶ȿռм⼫ֵ㣬оȷλɸѡÿλá߶Ⱥͷ򣻼 +13.ͼֲȫַֻ + BOFBag Of Feature + Ҫ˼룺һͼʾbag of wordsγȡIJwordͼĹؼfeature + ŵ㣺ȡʱҪlabelһලѧϰ + ȱ㣺ȫûпǵ֮λùϵ + VLADvector of locally aggressgated descriptors + Ҫ˼룺BoFƣֳkۺֻ࣬Ǿֲľ + ͬǣǼ򵥵İֲ鵽ϣDZ˸ĵľ + ŵ㣺ÿһάֵԾֲ˸Ŀ̻ʧϢ + ȱ㣺 + FVfisher vector + ŵ㣺һʧϢ + ȱ㣺ֲıʾӻ˼Ѷ + Ҫ˼룺BoWĻϣ˾ֲĵľ룬оĵϱʾһֲ + + + + + diff --git a/信息检索/2019本科生/信息检索重点.docx b/信息检索/2019本科生/信息检索重点.docx new file mode 100644 index 0000000..3b2d9f0 Binary files /dev/null and b/信息检索/2019本科生/信息检索重点.docx differ diff --git a/信息检索/2019研究生/不负责任、杂乱无章的信息检索考试内容2.0.pdf b/信息检索/2019研究生/不负责任、杂乱无章的信息检索考试内容2.0.pdf new file mode 100644 index 0000000..3cb78e5 Binary files /dev/null and b/信息检索/2019研究生/不负责任、杂乱无章的信息检索考试内容2.0.pdf differ diff --git a/信息检索/2019研究生/简答题库.docx b/信息检索/2019研究生/简答题库.docx new file mode 100644 index 0000000..bd8a6c4 Binary files /dev/null and b/信息检索/2019研究生/简答题库.docx differ