Have YOUR Books Been Used to Train AI? – by James M. Walsh, Esq.


“In 2023, a group of educational and professional publishers, including Macmillan Learning and McGraw Hill, sued LibGen. This time the court ordered LibGen to pay $30 million in damages, in what TorrentFreak called ‘one of the broadest anti-piracy injunctions we’ve seen from a U.S. court.’ But that fine also went unpaid, and so far authorities have been largely unable to constrain the spread of these libraries online. Seventeen years after its creation, LibGen continues to grow.” (The Atlantic)


A vestige of the Cold War era, Library Genesis (or LibGen) has its roots in clandestine information sharing, and is a massive collection of millions of scholarly books, audiobooks, journals, general-interest novels, comics, etc. LibGen is a Soviet Union / Netherlands-based shadow library project born out of the underground document sharing that Mikhail Gorbachev legalized in the 1980s. Until that point, printed media was strictly controlled by the state, and it remains primarily state controlled today. Today, the Russian internet is known as RuNet.

Library Genesis is not without its detractors. It has been described as “a digital warehouse of stolen intellectual property, neatly stacked with pirated books, academic papers, and various works authors and publishers never approved.” The Authors Guild, a plaintiff in current litigation over LibGen, released “Meta’s Massive AI Training Book Heist: What Authors Need to Know” on March 20, 2025.

In separate court filings (California & New York), it is alleged that Meta Platforms, Inc. and other tech giants (Microsoft Corporation & OpenAI, LLC et al.) trained their generative Artificial Intelligence programs either on LibGen’s massive archives or, as OpenAI has admitted of its own LLMs, on “large, publicly available datasets that include copyrighted works.” Meta’s AI software program is known as LLaMA – Large Language Model Meta AI (Llama 3). Venerated authors such as John Grisham, Scott Turow, and David Baldacci have joined in the fray as plaintiffs in New York.

“LibGen hosts more than 7.5 million books and 81 million research papers, making it one of the largest online libraries of pirated work in the world.” The federal court is contemplating class action status in pending litigation in the Northern District of California. It appears that Plaintiffs (Authors, et al.) are to be automatically included in this ongoing litigation unless they opt out. (The presiding Judge has yet to certify class action status.) The case boils down to a single question. “On Meta’s motion, the court dismissed all the claims with the exception of plaintiffs’ assertion that Meta’s alleged unauthorized copying of the plaintiffs’ books to train LLaMA constituted direct copyright infringement, which defendant (Meta) did not move to dismiss. The dismissal of all the claims save the negligence claim was granted with leave to amend; the negligence claim was dismissed with prejudice.” Until now, fines and injunctions against rogue, underground online libraries have been unable to constrain or deter their proliferation.

The defense to the remaining direct copyright infringement claim will rest upon whether Meta’s harvesting of LibGen data to train LLaMA constitutes Fair Use under Federal Copyright law. Ditto for the data harvesting from copyrighted works in the matter of Microsoft and OpenAI in New York’s Southern District.

WHAT IS FAIR USE?

Fair use is an affirmative defense to Copyright infringement. Generally, an individual or corporation may use copyrighted material for purposes of commentary, criticism, education, scholarship, or research.

Fair use is determined on a case-by-case basis. Factors considered in determining whether something constitutes fair use include:

  • The purpose and character of the use, including whether such use is of a commercial nature, or is for non-profit educational purposes;
  • The nature of the copyrighted work;
  • The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  • The effect of the use upon the potential market for or value of the copyrighted work.

The biggest hurdle for Microsoft, OpenAI, and Meta Platforms, Inc. will be the fact that the development of generative AI is a concerted, commercial, for-profit undertaking, and not designed for non-profit educational purposes. Microsoft has invested $13 billion in OpenAI and is alleged to own a 49% stake in the tech behemoth. Internal communications obtained in discovery clearly indicated that Meta employees were acutely aware that harvesting data from LibGen to “educate” its generative AI software posed substantial legal risks. In the California litigation, Meta is also alleged to have intentionally removed Copyright Management Information (CMI) from harvested works.

“The use of copyrighted works to train AI systems remains [hotly] contested legal territory. Both AI developers and creators have valid interests at stake. There is a clear need to balance technological innovation with sustainable models for original content creation.

“Finding the right balance between these interests will likely require a combination of legal precedent, new business models and thoughtful policy development.”

Publishing giant HarperCollins has entered into licensing agreements with generative AI firms. Others have followed this direction. A pressing question raised by the Authors Guild is how royalties should be shared when publishing houses license authors’ published works. Licensors were quick to formulate a 50-50 split of revenues, but the Authors Guild suggests that a 75-25 split in favor of authors is more equitable.

Forbes notes that Meta Platforms, Inc.’s market capitalization is pegged at around 1.8 trillion dollars as of March 25, 2025. Surprisingly, its coverage of Meta’s harvesting of data to feed its Llama 3 is squarely in favor of compensating authors.

Copyright law can be dry and esoteric. Its origins date back to General George Washington and the founding of our nation. It has evolved ever so slowly. The advent of generative AI software is certain to bring much-needed change and clarification.

As Dan Pontefract so aptly noted in his Forbes exposé, “Companies must develop sustainable, lawful partnerships with content creators, authors, publishers, and the like. The tech companies must be put into a position to respect copyrights, intellectual property, and the simple human dignity behind creative effort. Innovation cannot excuse exploitation.”

The Authors Guild advises authors to expressly prohibit AI harvesting of their published works on the Copyright page: “No AI Training.” That’s akin to wearing a belt and suspenders, I suppose. In my opinion, federal Copyright law clearly protects authors whose data and publications are being harvested by unscrupulous big tech.

We shall soon see how the U.S. District Courts for the Northern District of California and the Southern District of New York rule, as decisions are expected this summer.

If you believe Meta or another firm has used your book to train their AI, the Authors Guild provides a form letter for authors to use when contacting those companies.

The Atlantic has a searchable database for authors to learn if their work was used. Unfortunately, you have to pay to search the database.


Maximum Impact by Leo A. Murray & James M. Walsh, Esq.

JAMES M. WALSH, ESQ. is a former Navy JAGC officer and a recipient of the American Bar Association’s coveted LAMP Award for excellence in military legal assistance practice. A rolling stone, J.M. has globetrotted most of his adult life. After the military, J.M. pursued commercial real estate development, leasing, and asset management. He resides in Catania, Sicily. He spent almost twenty years in the Commonwealth of Pennsylvania’s Luzerne, Erie & Lackawanna Counties. His handiwork as an editor and author is interspersed throughout this novel. Leo A. Murray fondly refers to J.M. as his collaborative, literary ‘Coach’ or ‘Lieutenant.’ Agnes claims that he has gypsy in his heart and rabbit in his feet.

James’ thriller, Maximum Impact, written with co-author Leo Murray, was published by Abuzz Press.


