
Prof. Ahmet Süerdem Istanbul Bilgi University London PDF

20 Pages·2015·0.62 MB·English


Prof. Ahmet Süerdem
Istanbul Bilgi University, London School of Economics

Media Intelligence
- Business intelligence (BI): uses data mining techniques and tools to transform raw data into meaningful information for business analysis.
- Media intelligence (MI): serves the same purpose but applies text mining techniques to user-generated, unstructured textual data such as online newspapers, social media sites, blogs, comment fields, and wikis.

Media Monitoring
- The activity of monitoring the visibility of issues and topics in print, online, and broadcast media.
- Can be conducted for business, political, and scientific purposes.
- The services that media monitoring companies provide typically include the systematic recording of radio and television broadcasts, the collection of press clippings from print media publications, and the collection of data from online information sources.

Web crawler
- Systematically browses the Internet for the purpose of Web indexing.
- Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search them much more quickly.
- A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit.
- Some common crawlers: Heritrix; Nutch; PHP-Crawler

Issues in crawling
- Selection: which pages to download
- Re-visit: when to check for changes to the pages
- Politeness: avoid overloading Web sites
- Parallelization: coordinate distributed web crawlers

Scraping
- Web scraping focuses more on the transformation of unstructured HTML data on the WARC into structured data that can be stored and analyzed in a central local database or spreadsheet.

Scraping Techniques
- Human copy-and-paste
- Regular expression matching: tagging by detecting regular patterns
- HTML parsers: scrape according to the HTML structure. They need constant updating because the HTML structures change.
  Apache Nutch: provides web crawling and HTML parsing
- Web-scraping software: https://import.io/?utm_source=spro&utm_medium=mpu&utm_term=m260&utm_content=v1&utm_campaign=HomeGA
- Semantic annotation recognizing: the pages being scraped may contain metadata or semantic markup and annotations, which can be used to locate specific data
- XPath cleaning

Full text database (Digital archive)
- Contains the complete text of blogs, magazines, newspapers, or other kinds of textual documents.
- http://www.nexis.com.gate2.library.lse.ac.uk/search/flap.do?flapID=home&random=0.6176078971655325
- Yahoo news
- MongoDB
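
The crawling loop described under "Web crawler" (start from seeds, extract hyperlinks, add them to the list of URLs to visit) can be sketched roughly as follows. This is a minimal illustration, not how Heritrix or Nutch are implemented. The names `crawl`, `LinkExtractor`, `FAKE_WEB`, `max_pages`, and `delay` are hypothetical, and the example.org pages are made up so that the sketch runs without network access. The `delay` parameter is a crude stand-in for politeness, and `max_pages` for selection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import time


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100, delay=0.0):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) is a caller-supplied function returning HTML text,
    or None when the page cannot be retrieved.
    Returns a dict mapping each visited URL to its HTML.
    """
    frontier = deque(seeds)   # the list of URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid repeats
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
        time.sleep(delay)     # politeness: pause between requests
    return pages


# Toy in-memory "web" (assumed data, for illustration only).
FAKE_WEB = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.org/b": '<a href="/missing">gone</a>',
}

pages = crawl(["http://example.org/"], FAKE_WEB.get)
visited = list(pages)
```

Passing `fetch` as a parameter keeps the loop independent of how pages are downloaded; a real crawler would supply an HTTP client there and would also need to honor robots.txt and per-site rate limits.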
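
Two of the scraping techniques listed above, regular expression matching and HTML parsing, can be compared on a small example that turns HTML into structured rows suitable for a spreadsheet. This is only a sketch under assumptions: the HTML snippet, its class names, and the `title`/`date` fields are invented for illustration, and `ArticleParser` is a hypothetical helper built on Python's standard `html.parser` module.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Made-up sample page (assumption, not taken from any real site).
SAMPLE_HTML = """
<div class="article"><h2>Budget vote delayed</h2><span class="date">2015-03-02</span></div>
<div class="article"><h2>New tram line opens</h2><span class="date">2015-03-05</span></div>
"""

# Technique 1: regular expression matching on a recurring pattern.
PATTERN = re.compile(r'<h2>(.*?)</h2><span class="date">(.*?)</span>')
regex_rows = [{"title": t, "date": d} for t, d in PATTERN.findall(SAMPLE_HTML)]


# Technique 2: an HTML parser that follows the tag structure.
class ArticleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "article":
            self.rows.append({"title": "", "date": ""})
        elif tag == "h2":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "date":
            self._field = "date"

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


parser = ArticleParser()
parser.feed(SAMPLE_HTML)
parser_rows = parser.rows

# Store the structured rows as CSV text, as one would for a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "date"])
writer.writeheader()
writer.writerows(parser_rows)
csv_text = buffer.getvalue()
```

Both approaches give the same rows here, but each is tied to the page layout: if the site changes its tags or class names, the pattern and the parser both need updating, which is the maintenance cost noted in the list above.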
