12 results tagged datasets
bytewax/awesome-public-real-time-datasets https://github.com/bytewax/awesome-public-real-time-datasets
Tue 25 Jul 2023 02:36:01 PM PDT archive.org

This list is inspired by awesome public datasets, but for real-time datasets and sources. Normally accessed via HTTP or Websockets.

The list is separated into Free and Paid and broken into subsections based on loose categories.

awesome datasets directory finance scheduling data rest api
Have I Been Trained? https://haveibeentrained.com/
Fri 09 Dec 2022 02:03:27 PM PST archive.org

HaveIBeenTrained uses clip retrieval to search the Laion-5B and Laion-400M image datasets. These are currently the largest public text-to-image datasets, and they are used to train models such as Stable Diffusion and Imagen, among many others.

When it's time to train a generative AI system, organizations like Stability use those datasets to download the images from their links and present them to the model with their captions.

With HaveIBeenTrained, artists can search these databases for links to their work and flag them for removal. We partner with Laion, who built these datasets, to remove those links. This helps ensure that future models will not be trained with work that has been opted out.

ai ml images search datasets optout
GitHub - ipfs/awesome-ipfs https://github.com/ipfs/awesome-ipfs
Sat 24 Oct 2020 06:16:14 PM PDT archive.org

Useful resources for using IPFS and building things on top of it.

awesome list ipfs applications articles datasets archives tools
GitHub - MassMove/AttackVectors: A repository to monitor attack vectors https://github.com/MassMove/AttackVectors
Tue 10 Mar 2020 01:41:22 PM PDT archive.org

A repository for monitoring attack vectors mentioned in the billion-dollar disinformation campaign to reelect the president in 2020. Includes some Python code for analyzing the data.

lists fakenews directory research domains datasets metrics socialnetworks socialengineering exocortex edison
awesomedata/awesome-public-datasets https://github.com/awesomedata/awesome-public-datasets
Sat 31 Mar 2018 08:15:02 PM PDT archive.org

A topic-centric list of high-quality open datasets in public domains. By everyone, for everyone!

awesome github data datasets research
JSON Generator – Tool for generating random data http://www.json-generator.com/
Tue 20 Mar 2018 12:20:21 AM PDT archive.org

This website generates random JSON documents, suitable for use as test data or learning how to write and interface with various APIs.

development datasets generator javascript utilities random json testing online
AWS Public Datasets https://aws.amazon.com/datasets/
Tue 20 Mar 2018 12:10:32 AM PDT archive.org

Free datasets made available by Amazon. Stuff like an atlas of the Galactic Plane, NASA NEX data, the Human Microbiome Project, the Enron emails, Freebase, and the Marvel Universe's social graph. The Google Books Ngrams corpus is also in here, alongside the Westbury Lab USENET corpus.

information microbiome datasets ai ml ngrams freebase socialgraph free exocortex google usenet nasa corpus marvel data enron
Mecodify | MeCoDEM http://www.mecodem.eu/mecodify/
Tue 20 Mar 2018 12:01:55 AM PDT archive.org

An open-source tool for the visualization of extremely large datasets, like Twitter maps or email databases.

tables datasets maps tools exocortex graphs visualization networkanalysis databases
Welcome - Data Refuge https://www.datarefuge.org/
Mon 19 Mar 2018 11:50:03 PM PDT archive.org

The datarefuge website. Probably as official as it's going to get. Has some useful definitions, at least. Also has a bunch of rescued datasets available for download.

website datasets opendata download definitions datarefuge data
Catalog of friendly, useful, artistic online bots, and resources that can help you make them | botwiki https://botwiki.org/
Mon 19 Mar 2018 11:43:57 PM PDT archive.org

A wiki of resources for people writing bots - actual bots to interact with, tools, tutorials, code, and datasets.

wiki datasets tools twitter howto exocortex code tutorials bots chatbots resources
Home | data.world https://data.world/
Mon 19 Mar 2018 10:39:00 PM PDT archive.org

Social network and archive of public and open data for research and study. Encourages people to upload their own datasets for others to use. I use GitHub to authenticate.

datasets data public socialnetworks open archives
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
Mon 19 Mar 2018 10:38:03 PM PDT archive.org

Vast collections of data suitable for training and teaching AI/ML software.

datasets ai ml free research download data archives
4997 links, including 379 private