2 results tagged corpora
EleutherAI https://www.eleuther.ai/
Fri 06 Nov 2020 12:29:59 PM PST

EleutherAI is a grassroots AI research group that aims to democratize and open-source AI research. It maintains multiple projects and usable training corpora, including a F/OSS model called GPT-Neo.

Several spinoff projects to investigate.

ai ml foss models projects corpora research exocortex leandra
dariusk/corpora: A collection of small corpuses of interesting data for the creation of bots and similar stuff. https://github.com/dariusk/corpora
Mon 19 Mar 2018 10:43:57 PM PDT

A public domain collection of corpora for training AI/ML bots. It consists of many YAML files containing key/value data on many different subjects. Each category contains multiple documents about related subjects. You can't drop these files into your code as-is; you'll need to write a fairly simple parser tuned to each document's schema. There are several libraries in different programming languages for efficiently using one or more of these files in your own project.

nlp publicdomain nlu ai ml corpora languages yaml keyvalue pd bots foss data corpus schema text
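
A rough sketch of what a "parser tuned to the document's schema" might look like. It assumes the file has already been loaded into a plain dictionary by whatever YAML or JSON loader you prefer. The sample document, the key name `colours`, and the helper `extract_entries` are all hypothetical and are not taken from the repository.

```python
import random
from typing import Any


def extract_entries(document: dict[str, Any], key: str) -> list[str]:
    """Return the list of strings stored under `key`, checking the shape first."""
    if key not in document:
        raise KeyError(f"expected key {key!r}, found {sorted(document)}")
    entries = document[key]
    if not isinstance(entries, list):
        raise TypeError(f"{key!r} should hold a list, got {type(entries).__name__}")
    # Keep only non-empty strings so downstream code can rely on the type.
    return [
        item.strip()
        for item in entries
        if isinstance(item, str) and item.strip()
    ]


# Hypothetical document, shaped like the output of a YAML or JSON loader.
sample = {
    "description": "A few example colours",
    "colours": ["red", " green ", "", "blue", 42],
}

colours = extract_entries(sample, "colours")
print(random.choice(colours))  # one entry picked at random for a bot to use
```

Each file may use a different key for its payload, so the key is passed in rather than guessed; a real project would probably keep a small table mapping file names to the keys they use.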