//Create an instance of the crawler and subscribe to the PageCrawlCompleted event
PoliteWebCrawler crawler = new PoliteWebCrawler();
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;

//The event handler method
void crawler_ProcessPageCrawlCompleted(object sender, PageCrawlCompletedArgs e)
{
    CrawledPage crawledPage = e.CrawledPage;

    if (crawledPage.WebException != null || crawledPage.HttpWebResponse.StatusCode != HttpStatusCode.OK)
        Console.WriteLine("Crawl of page failed {0}", crawledPage.Uri.AbsoluteUri);
    else
        Console.WriteLine("Crawl of page succeeded {0}", crawledPage.Uri.AbsoluteUri);


    //crawledPage.Content.Text //raw html
    //crawledPage.HtmlDocument //lazy loaded html agility pack object (HtmlAgilityPack.HtmlDocument)
    //crawledPage.CSDocument   //lazy loaded cs query object (CsQuery.Cq)
}

1 답변

0 투표

Abot is an open source C# web crawler built for speed and flexibility. It takes care of the low level plumbing (multithreading, http requests, scheduling, link parsing, etc..). You just register for events to process the page data. You can also plugin your own implementations of core interfaces to take complete control over the crawl process. Abot targets .NET version 4.0.

https://github.com/sjdirect/abot#quick-start

구로역 맛집 시흥동 맛집
이 포스팅은 쿠팡 파트너스 활동의 일환으로, 이에 따른 일정액의 수수료를 제공받습니다.
add
...