To be clear: It’s two events in one – if you get there early enough (between 7 and 8:30pm) to learn about Police Surveillance and filing public records requests, great! If not, no biggie, just grab a drink and join the party.
“Raw Thought” – a musical, artistic, thought-provoking combination of DJs (Mochipet with a live drummer, Tha Spyryt, Ozlo Glowing and Cain MacWitish), visuals on many screens by Projekt Seahorse, an immersive art installation by Grumpy Green, historical collages by the Swartz-Manning VR Museum Team, and a Psychedelic Chill room. The fun starts sometime before 10 and goes till 2am.
All pics below of Mochipet and kitties link to Mochipet songs :-)
Lisa Rein: So you mentioned that you’ve been feeling very experimental lately, and I saw on your Instagram that you’ve been messing with a lot of new equipment.
Mochipet: I always do that.
LR: Well what are you playing with recently?
Mochipet: I’m doing a lot of modular synth stuff. I’ll probably be doing some modular synth stuff on Friday.
LR: Is that just a type of synthesizer? What’s “modular synth” exactly?
Mochipet: Modular synth is basically kind of like a synthesizer, with a bunch of parts. You can take separate parts and make a frankenstein thing of whatever you want.
It’s like if you had the ability to take the EQ off of one synthesizer, and then took the amp of another synthesizer, and then put them all together. All the pieces are modular. They’re separate. So you can put them all together and make a brand new thing.
LR: So it allows you to customize your sound?
Mochipet: Yes, it’s very customized because everybody can put things together in very different ways. It’s kind of like LEGOs I guess. That’s a great analogy: yes, just like LEGOs. It’s just parts.
LR: Do the parts have to be a certain brand?
Mochipet: No. There are many brands of modules. That’s the cool thing about it is that it’s a very decentralized system. So, basically, there’s a standard.
It’s kind of like the Internet: If people follow the standard their pages will work in a browser. This is kind of like that with modular synths. They all have a certain voltage. They all have a certain voltage range, for notes, and things like that.
There’s kind of a standard that this German company Doepfer made that other people just adopted. So there’s a lot of really small operations, individual people, making modules.
Doepfer made their specs open. There’s a lot of standards now where I think people realize if you make them open then you’re gonna get a lot more use out of it. If it was a closed system, nobody would use it. It would be useless. You need open systems in order for people to be able to participate, and that really opens the door up for a lot of individuals to do really unique things. Because everybody thinks differently.
Rather than having a big bureaucratic company with standards and rules dictating whatever their idea of what the industry should be like. There’s none of that. Instead, it’s just random people making different things. But they all work together. So you can connect anything to anything and it will work, and you can make unique things out of it that nobody could ever make before.
Mochipet: Yeah yeah. It’s a really cool thing. It’s kind of new. Doepfer came out with it many years ago, but the whole modular synth “scene” kind of thing is pretty new. I mean like five or ten years old. People are doing it just because they love it. They are making really interesting instruments and they like coming up with ideas. Some of these modules, there are only like 50 of them. They’ll make 50. And they’ll sell em, and that’s it.
LR: So some of them are rare and hard to obtain?
Mochipet: Yes. Some of them become rare. Some of them are very hard to find. There’s a lot of them that are made all over the world. There’s this guy in China that makes really cool ones. There’s people in Italy that make really cool modules. There’s this company Make Noise, here in the states that’s very popular. It’s kinda nerdy. It’s kind of like open source programming, but with music. It’s like people can write little programs or functions or whatever and put it into the system. And then people can take it and ya know, do whatever they want with it. Do new things with it.
This one company, Mutable Instruments. They’re in France. All these companies are just like, one guy. There’s like a guy who designs the modules and he tells a guy how many knobs. But it’s just those guys. There’s no team. So he (the guy in France) started doing digital modules, which incorporate computer programming within them. All his stuff is open source too. So, you can take his code and make the module, or add more stuff to it, or change it. There’s an open source community spirit to it, which is really nice.
LR: Does he actually release it under an open source license?
LR: (Lisa looks it up online.) Hey cool it’s a Creative Commons: cc-by-sa 3.0 license.
We will be discussing this lawsuit and the Aaron Swartz Day Police Surveillance Project in general (its templates, latest results from Sacramento & other cities in California) at this month’s Raw Thought Salon on March 8th – from 7-9pm.
A “Property of the People” Freedom of Information Act (FOIA) lawsuit has obtained documents proving that Aaron was scooped up in an FBI investigation as far back as 2007, even before the PACER project, in 2008. (Previously we thought the PACER project was the first time the FBI had concerned itself with Aaron.)
Aaron had been erroneously swept up in a 2007 terrorist investigation that, most likely, caused law enforcement agencies (FBI, DOJ) to treat him with rougher hands during their subsequent encounters with him.
His email was only scooped up because the Feds were probably using National Security Letters (NSLs) to get all of a university department’s email headers, in bulk, from a computer science department that Aaron had emailed. (More on NSLs here.) Long story short, NSLs enable the FBI to demand information from entities without court approval. (No warrants. No judicial oversight.)
How specifically the FBI came to possess Swartz’s email data remains unclear.
But after reviewing the document and other related files, several legal experts told Gizmodo the most likely explanation was that the FBI had used a National Security Letter (NSL), a ubiquitous tool for obtaining email header data at the time. An NSL would have enabled federal agents to demand access to the data and then impose a gag order to maintain secrecy around the investigation, all without a judge’s approval.
Authorized under the Stored Communications Act, in cases of suspected terrorism or espionage, these letters enable the FBI to seize a variety of electronic records under its own authority. While agents cannot use an NSL to acquire the contents of an email message, the FBI’s notes appear to show that, in Swartz’s case, it sought only “email headers,” data the FBI would argue falls well within the scope of its power to seize.
NSLs are overreaching, post-9/11 creations that we’ve all learned a lot about these last few years because Brewster Kahle at the Internet Archive went public with his experience with them, and then he worked with the ACLU and the EFF to challenge NSLs as being unconstitutional. Here’s a great story about it by Richard Koman for ZDNet, where Brewster Kahle offers a cookbook for fighting security letters:
Just talked to Brewster Kahle at the Internet Archive about their successful settlement with the FBI of a lawsuit over a National Security Letter. The FBI had demanded personal information on a user; the Archive replied with a lawsuit challenging the propriety of the NSL.
Page 4 – Explains the VPN and “user traffic mixing”
Page 4 – Explains Static IP Address Management
Page 5 – Explains “Virtual Private Servers”
Page 6 – Explains “Point of Presence Locations” to allow personas to appear to originate from different locations
Page 7 – Explains the “Secure Operating Environment”
Page 8 – Says $2,760,000 again
Detailed version of the story:
As I was preparing Barrett Brown and Trevor Timm’s segment from the Aaron Swartz Day Evening Event for publication, and transcribing some of it, I realized that Barrett and Aaron had actually kinda known each other.
This was amazing to me, as I had asked Barrett to participate in the last two years of Aaron Swartz Day because his projects had felt so on-target with Aaron’s concerns and values, not because I knew that they had ever exchanged emails, much less collaborated at any point.
Barrett himself had basically forgotten about it until recently at the end of his talk with Trevor Timm during last year’s Evening Event. As I was transcribing the talk last week, my ears perked up as Barrett explained their interactions:
“He (Aaron) once offered to do an FOIA request on persona management. One of my interests back then. One of these disinformation propaganda methodologies that have come out of the intelligence contract industries, and had been encouraged by various states. Something that I think is very dangerous. So he offered to do his thing on that. To explore the possibilities and see if we could get some information on it. And the interesting thing about that is that I’d sort of forgotten about it until very recently. I’m not sure where that was left. I’m not sure if he got some results back.”
Aha! The hunt was on – I wanted to find this request and find out what information ever came back for it. I still didn’t know what “persona management” was exactly, but I could guess the outcome – sock puppets. I had always known that sock puppets were very dangerous, from the first day I learned of their existence en masse.
I had been working with Muckrock intensely all week myself, filing dozens of public records requests to numerous police and sheriff departments for the Aaron Swartz Police Surveillance Project. (Which just revamped its Muckrock templates, by the way.)
I wrote Barrett an email immediately asking what ever came of it. He said, to his knowledge, it was filed, but he didn’t know if anything was ever sent back on it. He sent me this Project PM link on “Persona Management”:
Persona management entails the use of software by which to facilitate the use of multiple fake online personas, or “sockpuppets,” generally for the use of propaganda, disinformation, or as a surveillance method by which to discover details of a human target via social interactions. Various incarnations of this capability have been discovered in the form of patents, U.S. military contracts, and e-mail discussions among intelligence contractors.
My first idea was to go back to the original story by Jason Leopold in Truthout that was published immediately after Aaron’s death. There was a link in it to all of Aaron’s FOIA requests, but it was broken. So I wrote Michael Morisy, founder of Muckrock, and asked him about it. He not only gave me a good link to every FOIA request Aaron Swartz ever filed on Muckrock, and what he received back, but also sent me the exact FOIA request in question, asking about persona management software, as well as the document that came back.
“Hey, I believe this is the same RFP that became public in 2011, though I’m not entirely sure. But yes, that’s exactly what it’s used for; social media accounts are the main vector, and as seen by the recent NYT story on the Israeli firm that Trump campaign approached regarding this, it’s definitely been marketed to entities as a means of influencing elections (in this case, influencing GOP delegates).” (I have reposted the NYT story on our blog.)
This software allows you to have sock puppets on steroids: it provides VPNs for masking your geographical location, with the ability to actually pull in feeds from the geographical location you are claiming to be in, so you make the right comments and comment on local posts/issues and such.
Sock puppets and fake personas were not a new invention, of course. But you usually had to be a technical wizard of sorts to be able to pull it off. You would have to actually write things in a certain voice and monitor the input and output feeds of a given location on your own. Perhaps a gifted individual could have 5 or 10 of these going at once, but that would be impressive.
In contrast, using the software described in the RFP that was retrieved from Aaron’s FOIA request on Barrett’s behalf, someone could have dozens or even hundreds of these things going at once, without having to remember everything. In addition, the software could refresh one’s memory about a certain profile before having to interact with a human. And human interactions come so rarely anyway. Almost all social media interactions are passive – like email, as opposed to conversations in real time.
From the NY Times story:
“A top Trump campaign official requested proposals in 2016 from an Israeli company to create fake online identities, to use social media manipulation and to gather intelligence to help defeat Republican primary race opponents and Hillary Clinton, according to interviews and copies of the proposals.
The Trump campaign’s interest in the work began as Russians were escalating their effort to aid Donald J. Trump…
The campaign official, Rick Gates, sought one proposal to use bogus personas to target and sway 5,000 delegates to the 2016 Republican National Convention by attacking Senator Ted Cruz of Texas, Mr. Trump’s main opponent at the time.” – NY Times article by Mark Mazzetti, Ronen Bergman, David D. Kirkpatrick and Maggie Haberman.
Here’s a clip from the film “The Internet’s Own Boy” – Directed by Brian Knappenberger – which explains the PACER project in more detail. [This is background for our Next Raw Thought Salon on March 8th.]
PACER is the name of the website that lawyers use to retrieve legal documents from current and past court cases. These documents form the precedents that make up “the law,” yet to access documents on PACER you must have a credit card and pay per page. (It costs a dime or more for *each* page, so you can see how it can add up quickly.)
You can understand why this “pay to see the law” system could present a problem for anyone who doesn’t have a credit card or is unfamiliar with the details of legal proceedings.
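To make “it can add up quickly” concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a flat per-page fee with no caps or waivers, which is a simplification; the function name `pacer_cost` is made up for illustration and is not part of any PACER tool. The ten-cent rate is the “dime” mentioned above, and the 19,856,160-page figure is the download total reported in the New York Times story reprinted further down.

```python
from decimal import Decimal


def pacer_cost(pages: int, rate_per_page: Decimal = Decimal("0.10")) -> Decimal:
    """Total fee for `pages` pages at a flat per-page rate.

    Assumption: a simple flat rate, ignoring any per-document caps or
    fee waivers that the real system may apply.
    """
    return pages * rate_per_page


# A single 30-page filing at a dime a page.
print(pacer_cost(30))

# A case file that runs to a few thousand pages.
print(pacer_cost(3_000))

# The 19,856,160 pages from the bulk download, at a dime a page.
print(pacer_cost(19_856_160))
```

Even at pennies a page, a bulk collection of that size lands in the millions of dollars under this simplified model, which is the whole problem with “pay to see the law.”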
Aaron learned of a program which enabled free access to PACER via a small group of libraries across the country, and coordinated with a friend to download millions of PACER documents.
The FBI didn’t like it, and investigated him for a while, including surveillance at his parents’ home. But ultimately it had to let it go, because Aaron hadn’t actually done anything illegal.
Below is a transcription of the PACER section of “The Internet’s Own Boy” (Directed by Brian Knappenberger)
Brewster Kahle – Founder, Internet Archive:
“How can you bring public access to the public domain? It may sound obvious that you would have public access to the public domain, but in fact, it’s not true. So, the public domain should be free to all, but it’s often locked up. There’s often guard cages. It’s like having a National Park but with a moat around it and gun turrets pointed out, in case somebody might want to come and actually enjoy the Public Domain.
One of the things Aaron was particularly interested in was bringing public access to the public domain. It was one of the things that got him into so much trouble.”
Stephen Shultze – Former Fellow, Berkman Center for Internet and Society at Harvard:
“I had been trying to get access to Federal Court records in the United States. What I discovered was a puzzling system, called PACER, which stands for “Public Access to Court Electronic Records.”
I started Googling and that’s when I ran across Carl Malamud.”
Narrator: “Access to legal materials in the United States is a 10 billion dollar per year business.”
Carl Malamud – Founder, Public.Resource.org
“PACER is just this incredible abomination of government services. Ten cents a page. It’s this most brain dead code you’ve ever seen. You can’t search it. You can’t bookmark anything. You’ve gotta have a credit card. And these are “public records.”
U.S. District Courts are very important. That’s where a lot of our seminal litigation starts. Civil Rights cases. Patent cases. All sorts of stuff. And journalists and students and citizens and lawyers all need access to PACER and it fights em every step of the way.
People without means can’t see the law as readily as people with that American Express card. It’s a poll tax on access to justice.”
Tim O’Reilly, Publisher
“The law is the operating system of our democracy, and you have to pay to see it? That’s not much of a democracy.”
Stephen Shultze: “They make about 120 million dollars a year on the PACER system and it doesn’t cost anything near that, according to their own records.
In fact, it’s illegal. The E-government Act of 2002 states that the courts may charge “only to the extent necessary” in order to reimburse the costs of running PACER.”
Narrator: “As the founder of Public.Resource.org, Malamud wanted to protest the PACER charges.
He started a program called “The PACER Recycling Project.” People could upload documents they had already paid for to a free database, so others could use them.”
Carl Malamud: “The PACER people were getting a lot of flak from Congress and others about public access. And so they put together this system where, in seventeen (17) libraries across the country, there was free PACER access. That’s one library every 22,000 square miles I believe. So it wasn’t like really convenient.
I encouraged volunteers to join the “Thumb Drive Corps” and download docs from the public access libraries and upload them to the PACER recycling site. People take a thumb drive into one of these libraries and they download a bunch of documents and then send em to me. And it was just a joke. In fact if you clicked on “Thumb Drive Corps,” the Wizard of Oz, ya know, the munchkins singing, video clip came up.
But of course, I get this phone call from Steve Shultze and Aaron saying “Gee, we’d like to join the Thumb Drive Corps.”
Stephen Shultze: “Around that time, I ran into Aaron at a conference. So I approached him and said ‘hey, I’m thinking about doing an intervention on the PACER problem.’”
Narrator: “Shultze had already developed a program that could automatically download PACER documents from the trial libraries. Swartz wanted to take a look.”
Stephen Shultze: “So, I showed him the code. And I didn’t know what would come next, but as it turns out, over the next few hours at that conference, he was off sitting in a corner, improving my code, recruiting a friend of his that lived near one of these libraries to go into the library and to begin testing his improved code, and at some point the folks at the court realized something’s not going quite according to plan.”
Carl Malamud: “And data started to come in, and come in, and come in. Soon there were 760 GB of PACER docs. About 20 million pages.”
Narrator: “Using information retrieved from the trial libraries, Swartz was conducting massive automated parallel downloading of the PACER system. He was able to acquire more than 2.7 million Federal Court Documents. Almost 20 million pages of text.”
Carl Malamud: “Now, I’ll grant you that 20 million pages perhaps exceeded the expectations of the people running the pilot access project, but surprising a bureaucrat isn’t illegal.”
They also got the attention of the FBI, who began to stake out Swartz’ parents’ house in Illinois.
Carl Malamud: “I get a tweet from his mother saying ‘Call me!’ And I’m like what the hell’s going on here? So, I finally got a hold of Aaron, and Aaron’s mother is like ‘oh my god FBI, FBI, FBI’ ”
Noah Swartz: “An FBI agent drives down our home’s driveway trying to see if Aaron is like, in his room. And I remember being home that day and wondering why this car was driving down our driveway and just driving back out. That’s weird. Like five years later I read the FBI file and I’m like my goodness – that was the FBI agent, in my driveway.”
Carl Malamud: “He (Aaron) was terrified. He was totally terrified. He was way more terrified after the FBI actually called him up on the phone and tried to sucker him in to coming down to a coffee shop without a lawyer. He said he went home and laid down on the bed, and was shaking.”
Narrator: The downloading also uncovered massive privacy violations in the court documents. Ultimately, the courts were forced to change their policies as a result.
And the FBI closed their investigation without bringing charges.
Cory Doctorow: “To this day, I find it remarkable that anybody, even at the most remote podunk field office of the FBI, thought that a fitting use for taxpayer dollars was investigating people for theft on the grounds that they had made the law public. How can you call yourself a “law man,” and think there can possibly be anything wrong in this whole world with making the law public?”
An Effort to Upgrade a Court Archive System to Free and Easy
By JOHN SCHWARTZ
FEB. 12, 2009
Aaron Swartz used a free trial of the government’s Pacer system to download 19,856,160 pages of documents in a campaign to place the information free online. Credit Michael Francis McElroy for The New York Times
Americans have grown accustomed to finding just about anything they want online fast, and free. But for those searching for federal court decisions, briefs and other legal papers, there is no Google.
Instead, there is Pacer, the government-run Public Access to Court Electronic Records system designed in the bygone days of screechy telephone modems. Cumbersome, arcane and not free, it is everything that Google is not.
Recently, however, a small group of dedicated open-government activists teamed up to push the court records system into the 21st century — by simply grabbing enormous chunks of the database and giving the documents away, to the great annoyance of the government.
“Pacer is just so awful,” said Carl Malamud, the leader of the effort and founder of a nonprofit group, Public.Resource.org. “The system is 15 to 20 years out of date.”
Worse, Mr. Malamud said, Pacer takes information that he believes should be free — government-produced documents are not covered by copyright — and charges 8 cents a page. Most of the private services that make searching easier, like Westlaw and Lexis-Nexis, charge far more, while relative newcomers like AltLaw.org, Fastcase.com and Justia.com, offer some records cheaply or even free. But even the seemingly cheap cost of Pacer adds up, when court records can run to thousands of pages. Fees get plowed back to the courts to finance technology, but the system runs a budget surplus of some $150 million, according to recent court reports.
To Mr. Malamud, putting the nation’s legal system behind a wall of cash and kludge separates the people from what he calls the “operating system for democracy.” So, using $600,000 in contributions in 2008, he bought a 50-year archive of papers from the federal appellate courts and placed them online. By this year, he was ready to take on the larger database of district courts.
Those courts, with the help of the Government Printing Office, had opened a free trial of Pacer at 17 libraries around the country. Mr. Malamud urged fellow activists to go to those libraries, download as many court documents as they could, and send them to him for republication on the Web, where Google could get to them.
Aaron Swartz, a 22-year-old Stanford dropout and entrepreneur who read Mr. Malamud’s appeal, managed to download an estimated 20 percent of the entire database: 19,856,160 pages of text.
Then on Sept. 29, all of the free servers stopped serving. The government, it turns out, was not pleased.
A notice went out from the Government Printing Office that the free Pacer pilot program was suspended, “pending an evaluation.” A couple of weeks later, a Government Printing Office official, Richard G. Davis, told librarians that “the security of the Pacer service was compromised. The F.B.I. is conducting an investigation.”
Lawyers for Mr. Malamud and Mr. Swartz told them that they appeared to have broken no laws, noting nonetheless that it was impossible to say what angry government officials might do.
At the administrative office of the courts, a spokeswoman, Karen Redmond, said she could not comment on the fate of the free trial of Pacer, or whether there had been a criminal investigation into the mass download.
The free program “is not terminated,” Ms. Redmond said. “We’ll just have to see what happens after the evaluation.” As for the system’s cost, she said: “We’re about as cheap as we can get it. We’re talking pennies a page.”
Carl Malamud has been leading the effort to push the court records system into the 21st century. Credit Heidi Schumann for The New York Times
Meanwhile, the 50 years of appellate decisions remain online and Google-friendly, and the 20 million pages of lower court decisions are available in bulk form, but are not yet easily searchable. “I want the whole database in 2009,” Mr. Malamud said.
Mr. Malamud, 49, has a long record of trying to balance openness with privacy, and has also pushed the Securities and Exchange Commission and the Patent and Trademark Office to put their records online free. But the issue is a thorny one with court documents, which often contain personal information.
Daniel J. Solove, a professor at the George Washington University Law School, noted that marketers skim court records for personal data, and making records easier to troll will put even more data at risk. “It’s taking away this middle ground that offered a lot of protection, practically, and throwing it into this radically wide open box,” he said.
But this argument for what is known as “practical obscurity” does not convince Peter A. Winn, a privacy expert who is an assistant United States attorney in Washington State. Noting that he was speaking only for himself, he argued that the courts developed rules over the last 400 years to protect privacy.
“It worked in the bricks-and-mortar age — it should work in the electronic age,” Mr. Winn said. The administrative office of the courts, he said, should take on the role of policing privacy on its databases. “This is going to take focus and a lot of hard work,” he said.
Mr. Malamud agrees that the court system needs to do a better job of protecting privacy. He found thousands of documents in which the lawyers and courts had not properly redacted personal information like Social Security numbers, a violation of the courts’ own rules. There was data on children in Washington, names of Secret Service agents, members of pension funds and more.
“They’re pretty spectacular blunders,” he said. He sent letters to the clerks of individual courts around the country. After some initial inaction, and repeated and increasingly spirited notices from Mr. Malamud, most of the offending documents were pulled from the databases to be redacted.
Ms. Redmond, of the administrative office of the courts, said the courts comb through the documents “on a regular basis” and tell lawyers to redact confidential information. The number of violations, she noted, was relatively small.
Mr. Malamud scoffed at that. “This is a large number of transgressions, and this is illegal,” he said. “The law doesn’t say that you should only publish a small number of Social Security numbers!”
Mr. Malamud said his years of activism had led him to set a long-shot goal: serving in the Obama administration, perhaps even as head of the Government Printing Office. The thought might seem far-fetched — Mr. Malamud is, by admission, more of an at-the-barricades guy than a behind-the-desk guy. But he noted that he published more pages online last year than the printing office did.
Mr. Malamud represents a perspective of openness and transparency that is much in tune with the new administration’s, said Lawrence Lessig, a law professor at Harvard who is a leading advocate for free culture. “The principles are those that Carl has been at the center of defining,” he said.
The idea also seems to have a measure of appeal for John D. Podesta, a longtime fan of Mr. Malamud and head of the Obama transition team, who stopped short, however, of anything resembling an endorsement. “He would certainly shake things up,” Mr. Podesta said, laughing.
Mr. Malamud says he is not counting on the new administration’s being quite that bold. Besides, he said, he keeps himself awfully busy doing what he believes the government ought to be doing anyway.
“If called, I will certainly serve,” he said. “But if not called, I will probably serve anyway.”
After scouring social media accounts and all other available information to compile a dossier on the psychology of any persuadable delegate, more than 40 Psy-Group employees would use “authentic looking” fake online identities to bombard up to 2,500 targets with specially tailored messages meant to win them over to Mr. Trump.
The messages would describe Mr. Cruz’s “ulterior motives or hidden plans,” or they would appear to come from former Cruz supporters or from influential individuals with the same background or ideology as a target…
Each approach would “look authentic and not part of the paid campaign,” the proposal promised. The price tag for the work was more than $3 million…
A third document emphasized “tailored third-party messaging” aimed at minority, suburban female and undecided voters in battleground states. It promised to create and maintain fake online personas that would deliver messages highlighting Mr. Trump’s merits and Mrs. Clinton’s weaknesses or revealing “rifts and rivalries within the opposition.”
Brewster does a great job of explaining to us about Aaron’s “Open Source Life,” and how “bulk downloading” (although it got Aaron into trouble) is, in itself, not only “not a crime,” but a desirable action with outcomes that benefit the public.
He also sheds light on Aaron’s ongoing quests to make U.S. legal court documents (via PACER) and works in the Public Domain (via GoogleBooks) more publicly accessible (rather than locking both up behind paywalls or with cumbersome downloading restrictions).
I learned from Aaron what living an Open Source life was like. I think he really did live that way. He floated and helped others. He gave everything away. He really wasn’t tied to an institution. He really was not a company man in any sense. He was really quite pure in his motivations, and it made him incredibly effective at cutting through a lot of the stuff that most of us deal with.
An open source life.
He was able to keep his self interests at bay, which is kind of remarkable for a lot of us. But he was able to do it. And he was able to communicate well with an open smile and a kind heart. He had a way of communicating with this energy on things that mattered and he had a genius at finding things that mattered to millions of people. There are lots of things to work on, but the things that he worked on were incredibly effective.
We first met, I think, in 2002 at the Eldred Supreme Court case in Washington DC, where we drove a Bookmobile across the country, celebrating the Public Domain by giving away books that kids made, and also then at the Creative Commons Launch. But I really got to know Aaron when he said “I’d really like to help make the Open Library website with the Internet Archive,” to go and give books and integrate books into the Internet itself. And he said “I’ve got this cool technology, called ‘Infogami,’ it really made it possible to make Reddit happen. Let’s use it again for this other thing.”
And it was wonderful to work with him, but it was really unlike working with anybody else I’ve ever met. You certainly couldn’t tell him what to do, he just kind of did what was the right thing to do, and he was right certainly a lot more often than I was. We also worked together in other areas, when he was a champion of open access, especially of the Public Domain. Bringing public access to the Public Domain.
Most people think that’s kind of an obvious thing. Doesn’t “the Public Domain” mean that it’s publicly accessible? Of course all of us say “No!” It’s sort of like there are these National Parks, with moats and walls and gun turrets sort of pointing out, in case someone wanted to come near the Public Domain. And Aaron didn’t think this was right. And he spent a lot of time and effort freeing these materials.
One of the first ones that we were actively working together on was freeing government court cases, so that anybody could see them without having to have special privilege or money, and also to make it so you could data mine them, and go and look at these things in a very different way. So he freed and liberated a lot of court cases from the PACER system, and uploaded them, in bulk, to the Internet Archive, so that people could have access to these. There are now 4 million documents, from 800,000 cases, that have been used by 6 million people, because of the project that Aaron Swartz and others helped start.
It was an interesting project because it went over many different organizations, each playing a role and all cooperating in a very non-corporate way. It was a very Aaron-style way of making things happen. And the idea of making court documents and legal documents available more easily struck a chord with me because, in college, I was trying to figure out how I was gonna try to get out of the draft. And my college didn’t have a legal collection, and the only way that I could try to get to legal court documents was to get an ID from my professor and break into the Harvard Law Library to go and read court documents. That sucked! It really makes no sense, and Aaron not only sort of saw that it doesn’t make sense; he decided he was going to try to help solve this. Not just for himself, but for everyone.
Then there were other Public Domain collections, like the Google Books collection. Google Books was a library project to go and digitize lots and lots of books. A lot of them were Public Domain. Google would make them available from their website, but really, really painfully. It would make it so if you wanted one book, you could get one book. If you wanted 100 books, they would turn off your IP address forever. This is no way to have public access to the Public Domain, and the Internet Archive started getting these uploads of “Google Books.” Going faster, and faster, and faster. Like, “Well, where are these coming from?” Well, it turns out it’s Aaron. He and a bunch of friends figured out that they could go and get a bunch of computers to go slowly enough to just clock through tons of Google Books and upload them to the Internet Archive. Interestingly, Google never got upset about it. The libraries, on the other hand, grumbled. Which is so… Well anyway. They’ll get over it.
So, when this started happening, we said “Ok. What’s going on? Should we be concerned?” The answer was “No, it’s Public Domain.” We just made sure that we got the cataloging data right, and we linked back to Google, so that if you’re on the book, you can go back to the original page and see the da da da da da. And it all worked well.
But there it was. Aaron doing it again; bringing access to the Public Domain.
What is crushing to me is that Aaron got ensnared by the Federal Government for doing something that the Internet Archive actively encourages others to do for our collections, and we think all libraries should encourage, which is: Bulk downloading to support data mining and other research using computers. This is just the way the world works.
The first step for a computer to read and analyze materials is to download a set of documents. When Aaron did this from one library, JSTOR, they strongly objected, and demanded that MIT find and stop that user, which then led U.S. Prosecutors to pull out their worst techniques.
Did anybody stop to ask if bulk downloading is a crime? I say “No. Bulk downloading is not, in itself, a crime.” Let’s stop this practice of discouraging bulk downloading, because there are encouraging projects that are learning amazing new things by having computers be part of the research process. Let’s not stop this and discourage young people from coming up with new and different ways to learn things from our libraries.
What resulted, in this case, was tragic, and not necessary. Really, what we want is computers to be able to read. Aaron knew this. We’re all building this, and he got ensnared anyway. Let’s let our computers read.
Because of this tragedy, JSTOR, whom I talked to this morning, and the Internet Archive have agreed to meet to discuss the broad issue of data mining and web crawling. I hope that we really make progress. At least there are reasons to be positive.
This assault on Aaron would disillusion, discourage and depress any principled young man, and if there ever was a principled young man, it was Aaron Swartz.
We miss you, and we will carry on your important work.