David MacKay has put his textbook online, free to download in a variety of formats. If you find it useful, consider buying a copy.
A massive corpus of annotated, part-of-speech-tagged n-grams for use in machine learning and NLP. It is free to download, though grabbing all of it takes a while because the data is split into many small files. It is licensed under Creative Commons BY-NC-SA 3.0. The corpus also breaks out the relationships between words and their parts of speech.
MyMemory is an online translation system that combines machine translation with human-contributed translations, probably with some form of machine learning on the back end. Users can upload files containing their own translations to improve the service's accuracy; these documents, called memories, can be either public or private. The team is also working to make translations easier to search. MyMemory offers a REST API that we can use.
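As a rough illustration of calling that API, here is a minimal Python sketch using only the standard library. It assumes MyMemory's public `get` endpoint at `https://api.mymemory.translated.net/get`, which takes the text in a `q` parameter and a language pair such as `en|it` in `langpair`, and returns JSON with the best match under `responseData.translatedText`. The helper names (`build_query_url`, `extract_translation`, `translate`) are hypothetical; check the current API documentation for rate limits and optional parameters before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint for MyMemory's "get" call; verify against the API docs.
MYMEMORY_GET = "https://api.mymemory.translated.net/get"


def build_query_url(text: str, source: str, target: str) -> str:
    """Build a GET URL asking MyMemory to translate `text` from `source` to `target`.

    Hypothetical helper: the `q` and `langpair` parameter names are assumptions
    based on MyMemory's documented query format.
    """
    params = {"q": text, "langpair": f"{source}|{target}"}
    return MYMEMORY_GET + "?" + urllib.parse.urlencode(params)


def extract_translation(payload: dict) -> str:
    """Pull the best translation out of an already-decoded JSON response."""
    return payload["responseData"]["translatedText"]


def translate(text: str, source: str, target: str, timeout: float = 10.0) -> str:
    """Send the request over the network and return the translated string."""
    url = build_query_url(text, source, target)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_translation(payload)


if __name__ == "__main__":
    # Only prints the request URL; call translate(...) to actually hit the service.
    print(build_query_url("Hello World!", "en", "it"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to test offline; `translate("Hello World!", "en", "it")` performs the real request.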