A CLI utility that scans websites for broken links. Sitemap-aware.
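A minimal sketch of what "sitemap-aware" link checking involves, written independently of the tool itself: read the page URLs from a sitemap's `<loc>` elements (sitemaps.org namespace), fetch each one, and flag error statuses. The function names and the "0 or 4xx/5xx means broken" policy are assumptions for illustration, not the utility's actual behavior.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Sitemaps (sitemaps.org protocol) list each page URL in a <loc> element
# under this XML namespace.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the page URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """HEAD the URL and return its HTTP status, or 0 if it could not be reached."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with an error status
    except urllib.error.URLError:
        return 0  # DNS failure, refused connection, timeout, etc.


def is_broken(status: int) -> bool:
    """Hypothetical policy: unreachable (0) or any 4xx/5xx counts as broken."""
    return status == 0 or status >= 400
```

Usage would be along the lines of `[u for u in sitemap_urls(text) if is_broken(fetch_status(u))]`. Some servers answer HEAD with 405, so a real checker may need to fall back to GET.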
Autonomous (self-hosted) BitTorrent DHT search engine suite. Designed specifically for end-users. Has a back-end daemon and a front-end UI. Uses SQLite as its backing store. Goes from DHT node to node, indexing what it finds.
REST API docs: https://app.swaggerhub.com/apis/boramalper/magneticow-api/v0
(v1 not published yet)
Has no built-in rate limiter yet, so it will consume as much of your bandwidth as it can.
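Since the note says there is no built-in rate limiter, here is a minimal token-bucket sketch showing what one does: each outgoing request spends a token, and tokens refill at a fixed rate up to a burst cap. This is not magnetico code and has no hook into it; the class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow on average `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock      # injectable so the bucket can be tested deterministically
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False (caller should wait) otherwise."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A crawler would call `try_acquire()` before each outgoing query and sleep or drop the query when it returns False; the capacity bounds short bursts while the rate bounds sustained bandwidth.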
Free Software for running your own search engine: an explorer for discovering large document collections, plus media monitoring, text analytics, document analysis and text mining. Built on the open-source enterprise search servers Apache Solr or Elasticsearch, and on open standards for Linked Data, Semantic Web and Linked Open Data integration.
Usage tutorial here: https://www.opensemanticsearch.org/doc/tutorial
GitHub: https://github.com/opensemanticsearch
Of course it has an API: https://www.opensemanticsearch.org/doc/admin/rest-api
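Because the index lives in Apache Solr, one option besides the documented REST API is to query Solr's standard `/select` handler directly. A minimal sketch, assuming a local Solr on its default port 8983 and a core named `opensemanticsearch` (check your installation; both are assumptions here). This does not use the Open Semantic Search REST API linked above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def solr_select_url(query: str, base: str = "http://localhost:8983/solr",
                    core: str = "opensemanticsearch", rows: int = 10) -> str:
    """Build a URL for Solr's /select request handler returning JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


def search(query: str, **kwargs) -> list[dict]:
    """Run the query and return the matching documents (Solr's response.docs)."""
    with urllib.request.urlopen(solr_select_url(query, **kwargs), timeout=10) as resp:
        return json.load(resp)["response"]["docs"]
```

`search("climate change", rows=5)` would return up to five document dicts; which fields they carry depends on the core's schema.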
The homepage of a distributed search engine project. The project involves downloading and running a cross-platform spider (available for Windows, Linux, FreeBSD, Mac OS X, and pretty much any OS that can run Mono) that then crawls the web and uploads what it finds to the project. This can use a lot of bandwidth, so consider carefully before joining in.
Specific documentation for telling YaCy to start crawling a site, the mode in which to index it, where to start, how deeply, et cetera.
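To make "where to start, how deeply" concrete, here is a generic depth-limited breadth-first crawl. It is not YaCy's crawler or its API, and YaCy's exact depth semantics may differ; `links_of` is a hypothetical stand-in for "fetch a page and extract its links", so the sketch runs against any in-memory link graph.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, max_depth: int,
          links_of: Callable[[str], Iterable[str]]) -> dict[str, int]:
    """Breadth-first crawl from `start`, following links up to `max_depth` hops.

    Returns each visited URL mapped to the depth at which it was first reached.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] == max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in links_of(url):
            if link not in depth:  # skip pages already seen (handles cycles)
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth
```

Breadth-first order guarantees each page is recorded at its shortest link distance from the start, which is what a "crawl depth" setting usually means.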