page 1 / 2
33 results tagged documents
Torrent: 30 Years of Defcon https://www.btdig.com/f6d980965fe52c8c19d01d7b5ca00e643b8b5584/
Wed 25 Jan 2023 10:01:58 PM PST archive.org

Defcon 1-29. Video, audio, papers, pictures (lots of pictures), filler material, music and programs.

1.8 TB in size. Good luck.

defcon torrents archive audio video music documents pictures
bottomless-archive-project/library-of-alexandria https://github.com/bottomless-archive-project/library-of-alexandria
Fri 16 Dec 2022 05:58:42 PM PST archive.org

Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet.

In our modern age, new text documents are born in the blink of an eye and then (often just as quickly) disappear from the internet. We find it a noble task to save these documents for future generations.

This project aims to support this noble goal in a scalable way. We want to make archiving streamlined and easy, even at huge (terabyte/petabyte) scale. This way we hope that more and more people can start their own collections, helping the archiving effort.

java web archival software web documents
TalEliyahu/Threat_Model_Examples https://github.com/TalEliyahu/Threat_Model_Examples
Thu 03 Nov 2022 05:35:18 PM PDT archive.org

A collection of links to threat models for various pieces of software and protocols.

infosec links threatmodeling documents
FOIA Machine https://www.foiamachine.org/
Wed 19 Oct 2022 10:34:35 AM PDT archive.org

Our simple tool allows anyone to generate a public records request with all the necessary legal boilerplate, all for free. Use your FOIA Machine account to track the progress of your requests, all from one place. Access an extensive database of jurisdictions and government agencies to find out where, and how, to send your request.

Powered by MuckRock.

GitHub: https://github.com/cirlabs/foiamachine

foia government military records documents automation
Shine Ultra Series Affordable Document & Book Scanner https://shop.czur.com/products/czur-shine-ultra-series
Sat 11 Jun 2022 09:41:50 AM PDT archive.org

A flatbed document and book scanner. Will also scan 3D objects that fit under the camera. Minimum of 13 MP image resolution (4160 x 3120); can handle documents up to A3 size. Maximum document thickness: 10 mm. The camera's height above the document is adjustable. As fast as one second per scan. Portable: it folds up for transportation. Can detect when you turn the page or change the document, look for the new page, and automatically take the next image. ABBYY OCR functionality built in. Scans to Word documents, PDF, Excel spreadsheets, or TIFF image files. Software for Windows (back to XP) and OS X.

Shows up as a UVC device under Linux (archived), so any image or video capture software that is UVC enabled can do the work for you.
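Because the scanner presents itself as an ordinary UVC camera, the first step on Linux is finding which `/dev/video*` node it landed on. A minimal sketch, assuming the kernel's usual sysfs layout under `/sys/class/video4linux`; the name string reported depends on the device's firmware, and `list_video_devices` is a hypothetical helper, not part of any CZUR software.

```python
from pathlib import Path

# Assumption: Linux exposes each V4L2 video device (UVC cameras included) here.
SYSFS_V4L = Path("/sys/class/video4linux")

def list_video_devices(sysfs_root: Path = SYSFS_V4L) -> list[tuple[str, str]]:
    """Return (device node, human-readable name) pairs such as ("/dev/video0", "...")."""
    devices: list[tuple[str, str]] = []
    if not sysfs_root.is_dir():
        return devices  # not Linux, or no video devices present
    for entry in sorted(sysfs_root.iterdir()):
        name_file = entry / "name"
        name = name_file.read_text().strip() if name_file.exists() else "unknown"
        devices.append((f"/dev/{entry.name}", name))
    return devices

if __name__ == "__main__":
    for node, name in list_video_devices():
        print(node, name)
```

The node it prints can then be handed to whatever UVC-aware capture program you prefer.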

images scanner documents books archival hardware buy ocr 3d
opentower/populus-viewer https://github.com/opentower/populus-viewer
Sat 04 Jun 2022 11:14:32 AM PDT archive.org

Populus-Viewer is a tool for decentralized social annotation, built on pdfjs, wavesurfer.js and the Matrix protocol. You can use it to read PDFs, listen to audio, or watch videos, and have rich discussions in the margins, with your friends, classmates, or scholarly collaborators.

Each uploaded file is attached to a Matrix space, and each annotation to the file becomes a room within that space. Populus-Viewer has been tested with Synapse and Dendrite, but should be compatible with any spec-compliant Matrix server.
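The file-as-space, annotation-as-room layout maps onto plain Matrix client-server API calls. The sketch below only builds the requests (method, URL, JSON body) and does not send them, and it is not Populus-Viewer's actual code: the homeserver URL and helper names are hypothetical, and real requests also need an access token in an `Authorization: Bearer` header. Creating a space via `createRoom` with `creation_content.type` set to `m.space`, and linking rooms with an `m.space.child` state event, come from the Matrix spec.

```python
import json
from urllib.parse import quote

# Hypothetical homeserver; any spec-compliant server (Synapse, Dendrite, ...) should do.
HOMESERVER = "https://matrix.example.org"
CREATE_ROOM = f"{HOMESERVER}/_matrix/client/v3/createRoom"

def space_for_file(filename: str) -> tuple[str, str, dict]:
    """Request that creates a Matrix space to hold one uploaded file's discussion."""
    return "POST", CREATE_ROOM, {"name": filename, "creation_content": {"type": "m.space"}}

def annotation_room(label: str) -> tuple[str, str, dict]:
    """Request that creates an ordinary room for a single annotation."""
    return "POST", CREATE_ROOM, {"name": label}

def link_child(space_id: str, room_id: str, server_name: str) -> tuple[str, str, dict]:
    """Request that attaches room_id to space_id via an m.space.child state event."""
    # Room IDs contain '!' and ':', so both path segments are percent-encoded.
    path = (f"/_matrix/client/v3/rooms/{quote(space_id, safe='')}"
            f"/state/m.space.child/{quote(room_id, safe='')}")
    return "PUT", HOMESERVER + path, {"via": [server_name]}  # servers to join through

if __name__ == "__main__":
    for method, url, body in (
        space_for_file("paper.pdf"),
        annotation_room("Comment on page 3"),
        link_child("!space:example.org", "!note:example.org", "example.org"),
    ):
        print(method, url, json.dumps(body))
```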

annotation matrix javascript documents distributed federated
Read the Facebook Papers for Yourself https://gizmodo.com/facebook-papers-how-to-read-1848702919
Wed 20 Apr 2022 05:39:32 PM PDT archive.org

Changelog:
April 18, 2022: Twenty-eight documents relating to the 2020 presidential election, Donald Trump, and the Jan. 6 Capitol riot were published.

facebook foia politics documents archive
ckoshka/acrossword https://github.com/ckoshka/acrossword
Sat 26 Feb 2022 08:18:55 PM PST archive.org

Acrossword is a small async wrapper around the SentenceBERT library. It has a convenient object-oriented API with two main purposes:

semantic search

  • create miniature, powerful, cached semantic search engines from organised collections of documents
  • easily serialise and deserialise those documents in a gzipped JSON format
  • create documents from cleaned webpages and text files
  • search using different levels of granularity – from a book, to a chapter, to a single sentence

zero-shot text classification

  • simply provide examples of each class, or something as simple as "This sentence is about X", and it will quite reliably classify it correctly

It's useful if you want to avoid larger bloated libraries with capabilities you don't need, and comes with zero fuss.
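Both the semantic search and the zero-shot classification above come down to comparing vectors by cosine similarity. A minimal sketch of that technique, not Acrossword's API: to stay self-contained it swaps the SentenceBERT encoder for a crude bag-of-words `embed` function (a loud assumption), so it only matches shared words, not meaning. With a real sentence-embedding model plugged into `embed`, the same `search` and `classify` logic applies.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a SentenceBERT encoder: a bag-of-words vector.
    A real setup would call a sentence-embedding model here instead."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

def classify(text: str, examples: dict[str, list[str]]) -> str:
    """Pick the label whose example sentences are, on average, most similar to text."""
    v = embed(text)
    def score(label: str) -> float:
        sims = [cosine(v, embed(e)) for e in examples[label]]
        return sum(sims) / len(sims)
    return max(examples, key=score)
```

Usage: `classify("This sentence is about the stock market", {"finance": ["This sentence is about money and stock prices"], "pets": ["This sentence is about cats and dogs"]})` picks `"finance"`, in the spirit of the "This sentence is about X" examples.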

python search ai ml classification text documents
PACER: Public Access to Court Electronic Records https://pacer.uscourts.gov
Thu 27 Jan 2022 06:40:42 PM PST archive.org

Has JSON and XML APIs: https://pacer.uscourts.gov/file-case/developer-resources

Needs an account.

us court cases documents legal archive api
Recoll https://www.lesbonscomptes.com/recoll/
Tue 21 Sep 2021 12:11:28 PM PDT archive.org

Recoll is a desktop full-text search tool. It finds documents based on their contents as well as their file names. Can search most document formats, even compressed ones, and even Maildir folders and mailboxes. You may need external applications for text extraction. Based on Xapian. Primarily a desktop tool, but it could be run server-side. Indices are backwards-compatible.

Source code: https://framagit.org/medoc92/recoll

Flies on solid state storage!

Can be plugged into Searx: https://searx.github.io/searx/admin/engines/recoll.html
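Under the hood, full-text tools like Recoll (via Xapian) rely on an inverted index: a map from each term to the documents containing it. This sketch shows only that core idea, assuming simple lowercase word tokenization; Xapian adds stemming, positional data and probabilistic ranking, none of which is modeled here. The file name is indexed alongside the text to mirror Recoll matching on both.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    """Maps each term to the set of document IDs (here: file paths) that contain it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Index the file name as well as the contents.
        for term in tokenize(doc_id) + tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """AND query: documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results
```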

exocortex desktop search server personal indexing documents leandra windbringer
GitHub - typesense/typesense https://github.com/typesense/typesense
Thu 01 Oct 2020 01:28:29 PM PDT archive.org

Fast, typo-tolerant search engine for building delightful search experiences. Has a REST API and client libraries for a number of languages. Written in C and C++.

Designed for people who don't want to fuck with Elasticsearch; they just want a document search engine. Lightweight, powerful, scalable. Tries to have smart defaults. Single executable. Uses far less memory than the usual Java-based search systems do. Tries to be flexible so you can build the search engine you need.

Looks like you define a JSON document with the stuff you want to be able to search and throw it over to the engine. Means you'll need to write some front-end tooling to extract the data you want to index, which might not be that big a deal. It could just be some shell scripts.
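A minimal sketch of that workflow, assuming a local Typesense server on its default port 8108, a made-up API key, and a hypothetical `notes` collection that already exists with `title` and `body` string fields. The "front-end tooling" is just a function that turns `.txt` files into JSON documents; the bulk import endpoint takes them as JSONL, one document per line. The helpers only build `urllib` requests, so nothing is sent until `urlopen` is called.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

# Assumptions: local server, default port, hypothetical key and collection name.
TYPESENSE = "http://localhost:8108"
API_KEY = "change-me"
COLLECTION = "notes"

def documents_from_dir(root: Path) -> list[dict]:
    """The front-end tooling: turn every .txt file under root into a JSON document."""
    return [
        {"id": str(i), "title": path.stem, "body": path.read_text(errors="replace")}
        for i, path in enumerate(sorted(root.rglob("*.txt")))
    ]

def import_request(docs: list[dict]) -> urllib.request.Request:
    """Bulk import: Typesense expects one JSON document per line (JSONL)."""
    body = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
    return urllib.request.Request(
        f"{TYPESENSE}/collections/{COLLECTION}/documents/import?action=upsert",
        data=body,
        headers={"X-TYPESENSE-API-KEY": API_KEY, "Content-Type": "text/plain"},
        method="POST",
    )

def search_request(query: str) -> urllib.request.Request:
    """Search over both fields; typo tolerance is applied server-side."""
    params = urlencode({"q": query, "query_by": "title,body"})
    return urllib.request.Request(
        f"{TYPESENSE}/collections/{COLLECTION}/documents/search?{params}",
        headers={"X-TYPESENSE-API-KEY": API_KEY},
    )

if __name__ == "__main__":
    req = import_request(documents_from_dir(Path(".")))
    with urllib.request.urlopen(req) as resp:  # needs a running Typesense server
        print(resp.read().decode())
```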

cplusplus searchengine server indexing documents api flexible exocortex
GitHub - jsonresume/resume-schema: JSON-Schema is used here to define and validate our proposed resume json https://github.com/jsonresume/resume-schema
Thu 10 Sep 2020 08:14:53 PM PDT archive.org

A formal schema for representing a résumé or CV as a JSON document so that it's machine readable.
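For a sense of what such a document looks like, here is a minimal sketch using the schema's `basics` and `skills` sections. The person and values are invented, only a few fields are shown, and the result should be validated against the schema itself rather than trusted. The `skill_keywords` helper is hypothetical and only shows the payoff of a machine-readable résumé.

```python
import json

# Field names follow the JSON Resume "basics" and "skills" sections; values are made up.
resume = {
    "basics": {
        "name": "Jane Example",
        "label": "Systems Administrator",
        "email": "jane@example.com",
        "summary": "Keeps the servers up and the backups tested.",
    },
    "skills": [
        {"name": "Search infrastructure", "keywords": ["Xapian", "Elasticsearch", "Typesense"]},
    ],
}

def skill_keywords(doc: dict) -> list[str]:
    """Because the format is machine readable, pulling data back out is trivial."""
    return [kw for skill in doc.get("skills", []) for kw in skill.get("keywords", [])]

if __name__ == "__main__":
    text = json.dumps(resume, indent=2)
    print(text)
    print(skill_keywords(json.loads(text)))
```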

schema json resume documents personal business
GitHub - simon987/sist2: Lightning-fast file system indexer and search tool https://github.com/simon987/sist2
Sat 09 Nov 2019 11:04:10 PM PST archive.org

A fast, multi-threaded application that takes apart files, indexes them, and shoves them into Elasticsearch. Tries to be portable. Relies upon Elasticsearch, unfortunately. Indices can be transported elsewhere (say you've indexed offline storage media) and loaded into the engine.
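The pipeline described above (process files in parallel, then ship the results to Elasticsearch) can be sketched with the standard library. This is not sist2's implementation or index format: it only records path, size, modification time and a SHA-1 per file, and the `files` index name is hypothetical. The output is the NDJSON body that Elasticsearch's `_bulk` API expects, an action line followed by a document line for each file.

```python
import hashlib
import json
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

INDEX = "files"  # hypothetical index name

def describe(path: Path) -> dict:
    """Per-file work: basic metadata plus a content hash.
    A real indexer would also extract text and format-specific metadata here."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
    st = path.stat()
    return {"path": str(path), "size": st.st_size, "mtime": int(st.st_mtime), "sha1": h.hexdigest()}

def bulk_body(root: Path, workers: int = os.cpu_count() or 4) -> str:
    """Walk root, describe files in parallel, and return an Elasticsearch _bulk body."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(describe, files))  # map() preserves input order
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": rec["path"]}}))  # action
        lines.append(json.dumps(rec))  # document source
    return "".join(line + "\n" for line in lines)  # every line must end in a newline

if __name__ == "__main__":
    Path("files.ndjson").write_text(bulk_body(Path(".")))
```

Because the body is just a file, it can be produced on one machine and loaded into another Elasticsearch instance later, for example with `curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk' --data-binary @files.ndjson`. Threads help here because `hashlib` releases the GIL while hashing large buffers.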

python search index documents data
GitHub - jivoi/awesome-osint: A curated list of amazingly awesome OSINT https://github.com/jivoi/awesome-osint
Mon 15 Jul 2019 03:55:00 PM PDT archive.org

A curated list of amazingly awesome open source intelligence tools and resources. Open-source intelligence (OSINT) is intelligence collected from publicly available sources. In the intelligence community (IC), the term "open" refers to overt, publicly available sources (as opposed to covert or clandestine sources).

awesome osint search socialnetworks documents realtime
alephdata/aleph: Search and browse documents and data; find the people and companies you look for. https://github.com/alephdata/aleph
Tue 29 Jan 2019 02:12:55 PM PST archive.org

Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured (CSV, XLS, SQL) data for easy browsing and search. It is built with investigative reporting as a primary use case. Aleph allows cross-referencing mentions of well-known entities (such as people and companies) against watchlists, e.g. from prior research or public datasets. Web-based search. Processing includes optical character recognition, language and encoding detection and named entity extraction. Load structured entity graph data from databases and CSV files. This allows navigation of complex datasets like company registries, sanctions lists or procurement data.
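Cross-referencing mentions against a watchlist starts with normalizing names so that "ACME Holdings, Ltd." and "Acme Holdings" can meet. The sketch below is a deliberately naive, hypothetical illustration of that one step (accent and case folding plus whole-phrase matching). Aleph itself works on its FollowTheMoney entity model with far more sophisticated matching, which this does not reproduce.

```python
import re
import unicodedata

def normalize(name: str) -> str:
    """Fold case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.findall(r"\w+", no_accents.casefold()))

def cross_reference(text: str, watchlist: list[str]) -> list[str]:
    """Return watchlist entries whose normalized name appears as a whole phrase in text."""
    haystack = f" {normalize(text)} "  # pad so matches align on word boundaries
    return [entry for entry in watchlist if f" {normalize(entry)} " in haystack]
```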

data documents analysis browsing python webapps indexing exploration exocortex
Projects/Tracker - GNOME Wiki! https://wiki.gnome.org/Projects/Tracker
Fri 26 Oct 2018 01:55:25 PM PDT archive.org

A personal file system indexing and search application. Part of the GNOME desktop. Indexes file contents, metadata, and location to better help you find things. Also lets you do your own tagging of the things it keeps track of. Uses D-Bus for IPC and SPARQL for search. Uses multiple ontologies for different kinds of files (including multimedia content).

exocortex desktop personal search indexing foss engine gnome documents metadata
East Bay Municipal Utility District :: Emergency preparedness https://www.ebmud.com/customers/emergency-preparedness/
Thu 06 Sep 2018 03:21:13 PM PDT archive.org

EBMUD recommends customers take some simple steps to be prepared for an emergency. Downloadable documents at this page.

bayarea emergency documents pdf emergencies preparation earthquake
Textricator https://textricator.mfj.io/
Tue 31 Jul 2018 01:02:01 PM PDT archive.org

Textricator is a tool for extracting text from PDFs and generating structured data (CSV or JSON). It can even work on OCR'ed documents. Describe what the document's contents look like with a YAML file and it'll extract the data using those fields. Can also be used as a Java library.

GitHub: https://github.com/measuresforjustice/textricator

software text data extractor java documents pdf
Open Semantic Search: Your own search engine for documents, images, tables, files, intranet & news https://www.opensemanticsearch.org/
Mon 11 Jun 2018 02:39:36 PM PDT archive.org

Free Software for your own Search Engine, Explorer for Discovery of large document collections, Media Monitoring, Text Analytics, Document Analysis & Text Mining platform based on Apache Solr or Elasticsearch open-source enterprise-search and Open Standards for Linked Data, Semantic Web & Linked Open Data integration.

Usage tutorial here: https://www.opensemanticsearch.org/doc/tutorial

GitHub: https://github.com/opensemanticsearch

Of course it has an API: https://www.opensemanticsearch.org/doc/admin/rest-api

exocortex leandra search software searchengine documents data foss rest api spider indexing
U.S. Military Code Names http://www.designation-systems.net/usmilav/codenames.html
Tue 20 Mar 2018 03:29:05 AM PDT archive.org
operations information code us documents military
4684 links, including 339 private
Shaarli - The personal, minimalist, super-fast, database free, bookmarking service by the Shaarli community - Theme by kalvn