Compare Products
Features
* Fetching and parsing are done separately by default, which reduces the risk of an error corrupting the fetch/parse stage of a Nutch crawl.
* Plugins have been overhauled as a direct result of the removal of the legacy Lucene dependency for indexing and search.
* The set of plugins shipped with Nutch for processing various document types has been refined. Plain text, XML, OpenDocument (OpenOffice.org), Microsoft Office (Word, Excel, PowerPoint), PDF, RTF, and MP3 (ID3 tags) are all now parsed by the Tika plugin. The only parser plugins now shipped with Nutch are Feed (RSS/Atom), HTML, Ext, JavaScript, SWF, Tika, and ZIP.
* Distributed filesystem (via Hadoop)
* Link-graph database
* NTLM authentication
|
Features
* CAPTCHA: ScrapeHero has long since defeated this foe of scraping. Don't worry, we've got this covered. We can handle CAPTCHAs and any other evil technology ;-) thrown at our Hero. And sometimes it is best to just sidestep them, too.
* WEBSITE CHANGES: We can identify changes to websites and keep scraping them for you without missing a beat. Our advanced technology helps us monitor changes, aberrations, and anomalies in the data and react in time.
* COMPLEX WEBSITES: Not a problem. ScrapeHero likes to take on new web crawling challenges and beat them every day (yes, we handle transactional and JavaScript/Ajax-heavy sites). Tell us about your complex project and we can take it on.
* HERO TECH: We have built our own sophisticated technology, learning from all the other web scraping software out there (Scrapy, Scrapinghub, Mozenda, Visual Web Ripper, PHP scrapers, etc.) and removing their shortcomings, giving you some awesome technology.
|
Languages: Other |
Languages: Other |
Source Type: Open
|
Source Type: Closed
|
License Type: Apache |
License Type: Proprietary |
OS Type |
OS Type |
Pricing
|
Pricing
|