Google Books | Aaron Swartz Day and International Hackathon


Originally posted on January 11, 2016:

Brewster Kahle at the Internet Archive, January 24, 2013

From the San Francisco Aaron Swartz Memorial. January 24, 2013.

Link to Video on the Internet Archive.

Brewster does a great job of explaining Aaron’s “Open Source Life,” and how “bulk downloading” (although it got Aaron into trouble) is, in itself, not only “not a crime,” but a desirable action with outcomes that benefit the public.

He also sheds light on Aaron’s ongoing quests to make U.S. legal court documents (via PACER) and works in the Public Domain (via GoogleBooks) more publicly accessible (rather than locking both up behind paywalls or with cumbersome downloading restrictions).

Brewster Kahle:

I learned from Aaron what living an Open Source life was like. I think he really did live that way. He floated and helped others. He gave everything away. He really wasn’t tied to an institution. He really was not a company man in any sense. He was really quite pure in his motivations, and it made him incredibly effective at cutting through a lot of the stuff that most of us deal with.

An open source life.

He was able to keep his self interests at bay, which is kind of remarkable for a lot of us. But he was able to do it. And he was able to communicate well with an open smile and a kind heart. He had a way of communicating with this energy on things that mattered and he had a genius at finding things that mattered to millions of people. There are lots of things to work on, but the things that he worked on were incredibly effective.

We first met, I think, in 2002 at the Eldred Supreme Court case in Washington DC, where we drove a Bookmobile across the country, celebrating the Public Domain by giving away books that kids made, and also then at the Creative Commons Launch. But I really got to know Aaron when he said “I’d really like to help make the Open Library website with the Internet Archive,” to go and give books and integrate books into the Internet itself. And he said “I’ve got this cool technology, called ‘Infogami,’ it really made it possible to make Reddit happen. Let’s use it again for this other thing.”

And it was wonderful to work with him, but it was really unlike working with anybody else I’ve ever met. You certainly couldn’t tell him what to do, he just kind of did what was the right thing to do, and he was right certainly a lot more often than I was. We also worked together in other areas, when he was a champion of open access, especially of the Public Domain. Bringing public access to the Public Domain.

Most people think that’s kind of an obvious thing. Doesn’t “the Public Domain” mean that it’s publicly accessible? Of course all of us say “No!” It’s sort of like there are these National Parks, with moats and walls and gun turrets sort of pointing out, in case someone wanted to come near the Public Domain. And Aaron didn’t think this was right. And he spent a lot of time and effort freeing these materials.

One of the first ones that we were actively working together on was freeing government court cases, so that anybody could see this without having to have special privilege or money, and also to make it so you could data mine it, and go and look at these things in a very different way. So he freed and liberated a lot of court cases from the PACER system, and uploaded them, in bulk, to the Internet Archive, so that people could have access to these. There are now 4 million documents, from 800,000 cases, that have been used by 6 million people, because of the project that Aaron Swartz and others helped start.

It was an interesting project because it went over many different organizations, each playing a role and all cooperating in a very non-corporate way. It was a very Aaron-style way of making things happen. And the idea of making court documents and legal documents available more easily struck a chord with me because, in college, I was trying to figure out how I was gonna try to get out of the draft. And my college didn’t have a legal collection, and the only way that I could try to get to legal court documents was to get an ID from my professor and break into the Harvard Law Library to go and read court documents. That sucked! It really makes no sense, and Aaron not only sort of saw that it doesn’t make sense; he decided he was going to try to help solve this. Not just for himself, but for everyone.

Then there were other Public Domain collections, like the Google Books Collection. Google Books was a library project to go and digitize lots and lots of books. A lot of them were Public Domain. Google would make them available from their website, but really, really painfully. It would make it so if you wanted one book, you could get one book. If you wanted 100 books, they would turn off your IP address forever. This is no way to have public access to the Public Domain, and the Internet Archive started getting these uploads of “Google Books.” Going faster, and faster, and faster. Like, “Well, where are these coming from?” Well, it turns out it’s Aaron. He and a bunch of friends figured out that they could go and get a bunch of computers to go slowly enough to just clock through tons of Google Books and upload them to the Internet Archive. Interestingly, Google never got upset about it. The libraries, on the other hand, grumbled. Which is so… Well anyway. They’ll get over it.

So, when this started happening, we said “Ok. What’s going on? Should we be concerned?” The answer was “No, it’s Public Domain.” We just made sure that we got the cataloging data right, and we linked back to Google, so that if you’re on the book, you can go back to the original page and see the da da da da da. And it all worked well.

But there it was. Aaron doing it again; bringing access to the Public Domain.

What is crushing to me is that Aaron got ensnared by the Federal Government for doing something that the Internet Archive actively encourages others to do for our collections, and we think all libraries should encourage, which is: Bulk downloading to support data mining and other research using computers. This is just the way the world works.

The first step for a computer to read and analyze materials is to download a set of documents. When Aaron did this from one library, JSTOR, they strongly objected, and demanded that MIT find and stop that user, which then led U.S. Prosecutors to pull out their worst techniques.

Did anybody stop to ask if bulk downloading is a crime? I say “No. Bulk downloading is not, in itself, a crime.” Let’s stop this practice of discouraging bulk downloading, because there are encouraging projects that are learning amazing new things by having computers be part of the research process. Let’s not stop this and discourage young people from coming up with new and different ways to learn things from our libraries.

What resulted, in this case, was tragic, and not necessary. Really, what we want is computers to be able to read. Aaron knew this. We’re all building this, and he got ensnared anyway. Let’s let our computers read.

Because of this tragedy, JSTOR, whom I talked to this morning, and the Internet Archive, have agreed to meet to discuss the broad issue of data mining and web crawling. I hope that we really make progress. At least there’s reasons to be positive.

This assault on Aaron would disillusion, discourage and depress any principled young man, and if there ever was a principled young man, it was Aaron Swartz.

We miss you, and we will carry on your important work.