ebook img

Alex Smola PDF

106 Pages·2011·5.67 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Alex Smola

Scaling up Machine Learning Alex Smola Yahoo! Research Santa Clara alex.smola.org Monday, September 19, 11 Thanks Amr Joey Yucheng Qirong Ziad Sergiy Shravan Ahmed Gonzalez Low Ho al Bawab Matyusevich Narayanamurthy Kilian John Vanja Quoc Choon Hui Eric James Weinberger Langford Josifovski Le Teo Xing Petterson Jake Shuang Hong Vishy Zhaohui Markus Alexandros Martin Eisenstein Yang Vishwanathan Zheng Weimer Karatzoglou Zinkevich Monday, September 19, 11 Why Monday, September 19, 11 Data • Webpages (content, graph) • Clicks (ad, page, social) • Users (OpenID, FB Connect) • e-mails (Hotmail, Y!Mail, Gmail) • Photos, Movies (Flickr, YouTube, Vimeo ...) • Cookies / tracking info (see Ghostery) • Installed apps (Android market etc.) • Location (Latitude, Loopt, Foursquared) • User generated content (Wikipedia & co) • Ads (display, text, DoubleClick, Yahoo) • Comments (Disqus, Facebook) • Reviews (Yelp, Y!Local) • Third party features (e.g. Experian) • Social connections (LinkedIn, Facebook) • Purchase decisions (Netflix, Amazon) • Instant Messages (YIM, Skype, Gtalk) • Search terms (Google, Bing) • Timestamp (everything) • News articles (BBC, NYTimes, Y!News) • Blog posts (Tumblr, Wordpress) >10B useful webpages • Microblogs (Twitter, Jaiku, Meme) Monday, September 19, 11 Data - Identity & Graph • Webpages (content, graph) • Clicks (ad, page, social) • Users (OpenID, FB Connect) • e-mails (Hotmail, Y!Mail, Gmail) • Photos, Movies (Flickr, YouTube, Vimeo ...) • Cookies / tracking info (see Ghostery) • Installed apps (Android market etc.) • Location (Latitude, Loopt, Foursquared) • User generated content (Wikipedia & co) • Ads (display, text, DoubleClick, Yahoo) • Comments (Disqus, Facebook) • Reviews (Yelp, Y!Local) • Third party features (e.g. Experian) • Social connections (LinkedIn, Facebook) • Purchase decisions (Netflix, Amazon) • Instant Messages (YIM, Skype, Gtalk) • Search terms (Google, Bing) • Timestamp (everything) • News articles (BBC, NYTimes, Y!News) • Blog posts (Tumblr, Wordpress) • Microblogs (Twitter, Jaiku, Meme) 100M-1B vertices Monday, September 19, 11 Data - User generated content • Webpages (content, graph) • Clicks (ad, page, social) • Users (OpenID, FB Connect) • e-mails (Hotmail, Y!Mail, Gmail) • Photos, Movies (Flickr, YouTube, Vimeo ...) • Cookies / tracking info (see Ghostery) • Installed apps (Android market etc.) • Location (Latitude, Loopt, Foursquared) • User generated content (Wikipedia & co) • Ads (display, text, DoubleClick, Yahoo) • Comments (Disqus, Facebook) • Reviews (Yelp, Y!Local) • Third party features (e.g. Experian) • Social connections (LinkedIn, Facebook) • Purchase decisions (Netflix, Amazon) • Instant Messages (YIM, Skype, Gtalk) • Search terms (Google, Bing) • Timestamp (everything) • News articles (BBC, NYTimes, Y!News) • Blog posts (Tumblr, Wordpress) • Microblogs (Twitter, Jaiku, Meme) >1B images, 40h video/minute Monday, September 19, 11 Data - Messages • Webpages (content, graph) • Clicks (ad, page, social) • Users (OpenID, FB Connect) • e-mails (Hotmail, Y!Mail, Gmail) • Photos, Movies (Flickr, YouTube, Vimeo ...) • Cookies / tracking info (see Ghostery) • Installed apps (Android market etc.) • Location (Latitude, Loopt, Foursquared) • User generated content (Wikipedia & co) • Ads (display, text, DoubleClick, Yahoo) • Comments (Disqus, Facebook) • Reviews (Yelp, Y!Local) • Third party features (e.g. Experian) • Social connections (LinkedIn, Facebook) • Purchase decisions (Netflix, Amazon) • Instant Messages (YIM, Skype, Gtalk) • Search terms (Google, Bing) • Timestamp (everything) • News articles (BBC, NYTimes, Y!News) • Blog posts (Tumblr, Wordpress) >1B texts • Microblogs (Twitter, Jaiku, Meme) Monday, September 19, 11 Data - User Tracking • Webpages (content, graph) • Clicks (ad, page, social) • Users (OpenID, FB Connect) • e-mails (Hotmail, Y!Mail, Gmail) • Photos, Movies (Flickr, YouTube, Vimeo ...) • Cookies / tracking info (see Ghostery) • Installed apps (Android market etc.) • Location (Latitude, Loopt, Foursquared) • User generated content (Wikipedia & co) • Ads (display, text, DoubleClick, Yahoo) • Comments (Disqus, Facebook) • Reviews (Yelp, Y!Local) • Third party features (e.g. Experian) • Social connections (LinkedIn, Facebook) • Purchase decisions (Netflix, Amazon) • Instant Messages (YIM, Skype, Gtalk) • Search terms (Google, Bing) • Timestamp (everything) • News articles (BBC, NYTimes, Y!News) • Blog posts (Tumblr, Wordpress) alex.smola.org • Microblogs (Twitter, Jaiku, Meme) >1B ‘identities’ Monday, September 19, 11 Personalization • 100-1000M users • Spam filtering • Personalized targeting & collaborative filtering • News recommendation • Advertising • Large parameter space (25 parameters = 100GB) • Distributed storage (need it on every server) • Distributed optimization • Model synchronization Monday, September 19, 11 (implicit) Labels no Labels • Ads • Graphs • Click feedback • Document collections • Emails • Email/IM/Discussions • Tags • Query stream • Editorial data is very expensive! Do not use! Monday, September 19, 11

Description:
Supervised learning. • Classification, regression. • CRFs, Max-Margin-Markov networks. • Fully observed graphical models. • Small modifications for aggregate labels, etc. • Works with MapReduce/Hadoop. • Small number of iterations. • Distributed file system. • Simple & theoretical g
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.