- 421
- 9,949,592
Victor Lavrenko
Joined 10 Oct 2011
Videos
IR20.8 Learning to rank with an SVM · 10K views · 8 years ago
IR20.10 Learning to rank with click data · 6K views · 8 years ago
IR20.7 Learning to rank for Information Retrieval · 7K views · 8 years ago
IR20.3 Passive-aggressive algorithm (PA) · 10K views · 8 years ago
IR20.4 Convergence of the PA algorithm · 3.2K views · 8 years ago
IR20.6 Sequential minimal optimization (SMO) · 16K views · 8 years ago
LM.8 Interpolation with background model · 2.4K views · 8 years ago
LM.13 Language model ranking formula · 2.2K views · 8 years ago
LM.6 Laplace correction and absolute discounting · 6K views · 8 years ago
LM.12 Smoothing and inverse document frequency · 2.6K views · 8 years ago
BIR.10 Estimation with relevant examples · 2.4K views · 8 years ago
# SUMMARY

A discussion on web search algorithms, focusing on the impact of data quantity and link analysis techniques like PageRank.

# IDEAS:

- Web search engines handle staggering amounts of information, making architecture maintenance a significant challenge.
- Google's architecture processed 20 petabytes of data per day five years ago.
- Large data volumes make computational tasks harder but simplify algorithmic processes.
- A random subset of web pages is used to build search engine indexes.
- Precision at rank 10 measures the accuracy of the top 10 search results.
- Competitors with larger data sets can achieve higher precision in search results.
- Distribution of scores for relevant and non-relevant documents remains unchanged with more data.
- Precision at a fixed rank improves with increased data volume.
- Search engines can improve rankings by increasing the amount of crawled data.
- Larger data sets can outperform better algorithms if the latter have less data.
- The density of relevant documents at the top of rankings affects precision improvements.
- Historical example: Cuil had an index size four times larger than Google's.
- Larger indexes lead to better search results if algorithms are comparable.
- Precision as a function of rank generally decreases, with more relevant documents at the top.
- More data in the index leads to better performance for free.
- Link analysis techniques like PageRank are crucial for ranking web pages.
- PageRank evaluates the importance of web pages based on link structure.
- HITS algorithm identifies hubs and authorities in web content.
- Combining large data sets with effective link analysis improves search engine performance.
- Search engines must balance computational challenges with algorithmic efficiency.

# INSIGHTS:

- Large data volumes simplify algorithmic processes despite increasing computational challenges.
- Precision at a fixed rank improves significantly with increased data volume.
- Larger data sets can outperform better algorithms with less data.
- The density of relevant documents at the top of rankings is crucial for precision improvements.
- Combining large data sets with effective link analysis enhances search engine performance.

# QUOTES:

- "Google's architecture was churning through about 20 petabytes of data per day."
- "Having that much data actually makes some things a lot easier."
- "You can never get the entire web; nobody has the entire web."
- "Precision at rank 10 would be 40%."
- "The overall distribution of scores shouldn't change because you're just getting four times the data."
- "Precision at a fixed rank will actually go up."
- "The accuracy of the top page of your results depends on how much data you've crawled."
- "Cuil's index size was four times as big as Google's."
- "If you have the same algorithms but four times as much data, you'll do better."
- "Precision as a function of rank generally decreases."

# HABITS:

- Regularly update and maintain large-scale data architectures to handle vast information volumes.
- Continuously gather and analyze large random samples of web pages for indexing.
- Focus on improving both algorithmic processes and data collection efforts.

# FACTS:

- Google processed 20 petabytes of data daily five years ago.
- No search engine has access to the entire web.
- Larger data sets lead to higher precision in search results.
- Cuil had an index size four times larger than Google's.

# REFERENCES:

- PageRank
- HITS algorithm
- Cuil search engine

# ONE-SENTENCE TAKEAWAY

Increasing the amount of crawled data significantly improves search engine precision and performance.

# RECOMMENDATIONS:

- Regularly update and maintain large-scale data architectures for handling vast information volumes.
- Continuously gather and analyze large random samples of web pages for indexing.
- Focus on improving both algorithmic processes and data collection efforts.
- Invest in gathering more data to enhance search engine precision and performance.
- Combine large data sets with effective link analysis techniques like PageRank.
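The precision-at-rank metric the summary refers to can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the 4-out-of-10 ranking below is a hypothetical example matching the "precision at rank 10 would be 40%" quote.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results that are relevant.

    relevance: list of 0/1 flags in ranked order (1 = relevant).
    """
    top = relevance[:k]
    return sum(top) / k

# Hypothetical ranking in which 4 of the top 10 results are relevant.
ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(ranking, 10))  # 0.4
```

The lecture's argument is that with the same scoring algorithm but a larger crawl, more relevant documents land in the top 10, so this fixed-rank number goes up "for free".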
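The PageRank idea mentioned under link analysis can be illustrated with a small power-iteration sketch. This is a textbook formulation with made-up example pages, not an implementation from the lecture; the damping factor `d=0.85` is the conventional default.

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # teleportation mass
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# Tiny hypothetical web: two pages link to "hub", which links back to "a".
web = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
r = pagerank(web)
print(max(r, key=r.get))  # "hub" ends up with the highest rank
```

Pages with many incoming links from well-ranked pages accumulate rank, which is exactly the "importance from link structure" notion in the summary.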
u speak like Jordan Belfort...lol
Great collection of videos, thoroughly loved it.
You just cleared every doubt on this topic. It's 10 days before my exam, and watching your video got everything cleared up.
why does the covariance matrix rotate the vectors towards the greatest variance?
great explanation, simple and visualized. Thanks! =)
THANKS, you've answered a lot of questions in my mind with your amazing explanation!!!!
you sound like Gale Boetticher from breaking bad
This video has (by far) the highest knowledge-per-minute of any video on this topic on YouTube. Clear explanation of the math and the iterative method, along with an analogy to the simpler algorithm (k-means). Thanks Victor!
Content is good, but please amplify audio.
When Andrew Tate is explaining math
sir we can't see your cursor omg
How do we know the values of P(b) and P(a)?
Excellent explanation
ty
It's amazing this works at all, because the first step turns a 2D image that makes sense into a 1D stream that has lost ALL spatial information. A 1D stream of pixels is not an image.
Splendid. Example very well portrays the algorithm stepwise!
Thanks for this. Your video helped bring clarity to the problem statement.
But isn't the projection (y·e)e, where y = x - mu?
At 3:08 the variance estimator should be divided by (n-1) as a corrected estimate, not by n. That's what we call Bessel's correction.
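The correction this comment describes can be shown with a short sketch (illustrative numbers, not taken from the video): dividing by n gives the biased estimator, dividing by n-1 gives the Bessel-corrected one.

```python
def variance(xs, bessel=False):
    """Sample variance: biased (divide by n) or Bessel-corrected (n-1)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return ss / (n - 1) if bessel else ss / n

data = [2.0, 4.0, 6.0]                 # mean = 4, sum of squares = 8
print(variance(data))                  # 8/3, the biased estimate
print(variance(data, bessel=True))     # 4.0, the corrected estimate
```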
Great tutorial! But why are the slopes of the two eigenvectors expected to be the same?!
Outstanding!
thanks brother
Great Explanation Sir. I don't know why it motivated me to appreciate and comment on the video.
Andrew Tate of machine learning
This course / lecture series has been staggeringly useful, thank you. It was also explained in a way that I could understand easily, and I have been struggling with the explanations from others. You simplified things superbly. I did get lost when we started talking about mathematical functions et al, but the information I needed was more to do with concepts and ideas, so I could safely let the maths part slip by, though noting different efficiencies of course. Thank you. Sincerely appreciate you sharing your work.
Amazing, thank you!
Enlightening, thank you!
Thanks so much! I will refer my students to your webpage!
This is awesome explanation. Thanks !
Studying at TUM. I admire German students who follow the lecture content from the uni. Taking an ML course atm, but here the lecture just dumps the whole set of concepts, regardless of whether students can understand them or not... Such nice explanations in every video in the ML-related playlist. I fcking regret that I did not choose the UK for my master's.
Wow, you explain very well, thank you! I was having a hard time understanding my professor's explanation in our class.
Thanks for posting these Victor. I'm working on understanding the prior bias of precision and this helped. I hope things are going well!
Great tutorial!
Incredibly insightful! Your teaching style, peppered with examples, made complex topics like inverted index data structure and MapReduce algorithms easy to grasp. The way you broke down the compression techniques was particularly eye-opening, and I gained a newfound appreciation for the mechanics behind large-scale search engines and big data management. Before watching your lectures, I was quite overwhelmed by these concepts. However, your clear and structured approach has removed that uncertainty and replaced it with genuine interest and understanding. Thank you for your dedication to spreading knowledge. Your work has had a significant impact on my learning journey, and I am truly grateful for that. Please continue to share your wisdom; you are making a real difference in the lives of your viewers!
how do we block the japanese hack
andrew tate?
Thank you 😃
Step 1 is centering; should we also scale so variance = 1?
Hi sir, are k-means and k-nearest-neighbors the same algorithm?
Great playlist.
This is the first time that PCA has actually made sense mathematically. Great video
I can't help but notice the middle dotted line looks like a logistic regression curve. I should know this... but is there any relation?
Sounds just like Andrew Tate
Thanks, I don't want to listen any more...
As mentioned by @omidmo7554, I was exactly in a similar situation. you have explained it so lucidly. Thank you so much Victor!
SVD 3:08
Great explanation
ChatGPT sent me here 😳
Amazing video, perfectly explained the concepts without getting bogged down in the math/technical details.
Andrew Tate?