Before a new search engine can hope to make a run against Google, it has to crawl.
But indexing the web by “crawling” sites with automated software takes more than scaling up to the web’s vast scope, though that is a big challenge in itself. Individual sites have no obligation to welcome a new search crawler. Some instead post digital no-trespassing signs to discourage automated traffic that might bog down performance.
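The article does not name the mechanism behind those no-trespassing signs, but the usual one is a site's robots.txt file, which a well-behaved crawler checks before fetching a page. The sketch below assumes that is what is meant. It uses Python's standard `urllib.robotparser` module, and the robots.txt contents, the `can_crawl` helper, and the crawler name `NewSearchBot` are all hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one established crawler is welcome,
# every other crawler is told to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    for agent in ("Googlebot", "NewSearchBot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, url))
```

Under these made-up rules the established crawler is allowed in and the newcomer is not, which is the asymmetry the article describes. Compliance is voluntary on the crawler's side, and sites may also block unwanted crawlers by other means.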
“The web has trillions of documents,” says Vivek Raghunathan, cofounder of the ad-free, subscription-based search startup Neeva. “And the web is a lot trickier to crawl than it was a few years ago.”
An October 2020 report on digital competition by the House Judiciary Committee’s Subcommittee on Antitrust aimed a government spotlight at this situation.
“The high cost of maintaining a fresh index, and the decision by many large webpages to block most crawlers, significantly limits new search engine entrants,” the report stated….
Source: fastcompany.com