Past Events
Workshop 28: Friday, October 25 U. of California Berkeley (Pamela Samuelson and Rebecca Wexler)
Katrina Geddes, In response to the threats posed by new copy-reliant technologies, copyright owners often demand stronger rights. Frequently this results in the overprotection of copyrighted works and the suppression of lawful user expression. Generative AI is shaping up to be no different. Owners of copyrighted training data have asked the courts to find AI outputs to be infringing in the absence of substantial similarity, and to prohibit unlicensed training despite its extraction of unprotectable metadata. Service providers automatically block or modify user prompts that retrieve copyrighted content even though fair use is a fact-specific inquiry. These trends threaten to undermine the democratic and egalitarian potential of generative AI. Generative AI has the capacity to democratize cultural production by distributing powerful and accessible tools to previously excluded creator communities. Ordinary individuals can now create sophisticated synthetic media by modifying, remixing, and transforming cultural works without any artistic training or skills. This radically expands the range of individuals who can engage in aesthetic practice, irrespective of the legal status or exchange value of the resulting output. To date, however, the democratic and egalitarian character of generative AI has been relatively under-theorized. Lawmakers are focused on averting two possible outcomes: the extinction of human artists, or the flight of technological capital to low-IP jurisdictions. As copyright owners and technology firms dominate public discourse, relatively little attention is paid to the expressive interests of users. This Article remedies that neglect by directing scholarly attention to the democratizing effects of generative AI. It suggests that jurists should not rush to pacify owners of copyrighted training data by enjoining generative models, or pressuring service providers to adopt unnecessary use restrictions. Instead, Congress should embrace the democratic and egalitarian potential of generative AI by protecting users from the chilling effects of infringement liability. This Article canvasses a range of options directed towards this objective, including a non-commercial use provision, a compulsory licensing regime, a DMCA-style safe harbor, and a presumption of user authorship of AI generations.
Peter Henderson, Artificial intelligence (AI) model creators commonly attach restrictive terms of use to both their models and their outputs. These terms typically prohibit activities ranging from creating competing AI models to spreading disinformation. Often taken at face value, these terms are positioned by companies as key enforceable tools for preventing misuse, particularly in policy dialogs. The California AI Transparency Act even codifies this approach, mandating certain responsible use terms to accompany models.
But are these terms truly meaningful, or merely a mirage? There are myriad examples where these broad terms are regularly and repeatedly violated. Yet except for some account suspensions on platforms, no model creator has actually tried to enforce these terms with monetary penalties or injunctive relief. This is likely for good reason: we think that the legal enforceability of these licenses is questionable. This Article provides a systematic assessment of the enforceability of AI model terms of use and offers three contributions.
First, we pinpoint a key problem with these provisions: the artifacts that they protect, namely model weights and model outputs, are largely not copyrightable, making it unclear whether there is even anything to be licensed.
Second, we examine the problems this creates for other enforcement pathways. Recent doctrinal trends in copyright preemption may further undermine state-law claims, while other legal frameworks like the DMCA and CFAA offer limited recourse. And anti-competitive provisions likely fare even worse than responsible use provisions.
Third, we provide recommendations to policymakers considering this private enforcement model. There are compelling reasons for many of these provisions to be unenforceable: they chill good faith research, constrain competition, and create quasi-copyright ownership where none should exist. There are, of course, downsides: model creators have even fewer tools to prevent harmful misuse. But we think the better approach is for statutory provisions, not private fiat, to distinguish between good and bad uses of AI and restrict the latter. And, overall, policymakers should be cautious about taking these terms at face value before they have faced a legal litmus test.
Workshop 27: Friday, September 20 Northwestern University (Jason Hartline and Dan Linna)
Melissa Dutz, Han Shao, Avrim Blum, Aloni Cohen, A machine learning theory perspective on strategic litigation
Shraeya Iyer, Dan Linna, Jaromir Savelka, Hannes Westermann, A Systematic Review of Evaluation of Artificial Intelligence for Legal Tasks
Workshop 26: Friday, May 17 Tel Aviv + Hebrew Universities (Inbal Talgam-Cohen and Katrina Ligett)
Lauren Scholz, McConnaughhay and Rissman Professor, Florida State University, Algorithmic Contracts: Algorithmic contracts are contracts in which an algorithm determines a party’s obligations. Some contracts are algorithmic because the parties used algorithms as negotiators before contract formation, choosing which terms to offer or accept. Other contracts are algorithmic because the parties agree that an algorithm to be run at some time after the contract formation will serve as a gap-filler. Such agreements are already common in high speed trading of financial products and will soon spread to other contexts. However, contract law doctrine does not currently have a coherent approach to describing the creation and enforcement of algorithmic contracts. This Article fills this gap in doctrinal law and legal literature, providing a definition and novel taxonomy of algorithmic contracts. Fiduciary Boilerplate: Locating Fiduciary Relationships in Information Age Consumer Transactions: The result of applying general contract principles to consumer boilerplate has been a mass transfer of unrestricted rights to use and sell personal information from consumers to companies. This has enriched companies and enhanced their ability to manipulate consumers. It has also contributed to the modern data insecurity crisis. Consumer transactions should create fiduciary relationships between firm and consumer as a matter of law. Recognizing this fiduciary relationship at law honors the existence of consumer agreements while also putting adaptable, context-sensitive limits on opportunistic behavior by firms. In a world of ubiquitous, interconnected, and mutable contracts, consumers must trust the companies with which they transact not to expose them to economic exploitation and undue security risks: the very essence of a fiduciary relationship. Firms owe fiduciary duties of loyalty and care to their customers that cannot be displaced by assent to boilerplate. History, doctrine, and pragmatism all support this position.
Dylan Hadfield-Menell, Bonnie and Marty (1964) Tenenbaum Career Development Assistant Professor of EECS, MIT, Incomplete Contracting and AI Alignment: We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and the problem of AI alignment. As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics literature on incomplete contracts that may provide insights for AI alignment researchers. Our core contribution, however, is to bring to bear an insight that economists have been urged to absorb from legal scholars and other behavioral scientists: the fact that human contracting is supported by substantial amounts of external structure, such as generally available institutions (culture, law) that can supply implied terms to fill the gaps in incomplete contracts. We propose a research agenda for AI alignment work that focuses on the problem of how to build AI that can replicate the human cognitive processes that connect individual incomplete contracts with this supporting external structure.
Workshop 25: Friday, April 19 Ohio State (Bryan Choi)
Sarah Cen, Ph.D. Candidate, Electrical Engineering and Computer Science, MIT, The Promises and Challenges of AI Auditing: A Demonstration of Counterfactual Audits Using Black-Box Access: Auditing is the process of evaluating the properties of a system, often to determine whether it satisfies a predetermined set of criteria. Auditing is an essential ingredient of AI oversight and accountability. Without the ability to systematically and consistently test—or audit—for compliance, AI regulations are impossible to enforce. Beyond compliance testing, auditing also plays several important roles. Perhaps most fundamentally, it allows for the independent verification of developers’ claims that would otherwise go untested. It can also be used to certify whether an AI technology meets industry standards (e.g., privacy standards) that matter to downstream users (e.g., customers). In this way, auditing not only plays an important role in AI accountability, but also takes an important step toward developing trustworthy AI. In this talk, we will discuss the history of auditing, the state of AI auditing, and open problems. In particular, we will start off with the "access question": What type of access to an AI system is needed to efficiently and effectively audit? We'll then discuss the benefits and limitations of black-box auditing. Finally, we'll demonstrate a class of audits that we call "counterfactual audits" and how they can be executed given only black-box access to an AI system.
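To make the idea of a counterfactual audit under black-box access concrete, here is a minimal sketch (not the speaker's framework; the model, attribute names, and values are hypothetical): the auditor queries an opaque scoring function on inputs that differ only in a single attribute and compares the outputs.

```python
# Toy illustration of a black-box counterfactual audit (not the speaker's framework).
# `predict` can be any opaque scoring function; all names and values are hypothetical.
def counterfactual_audit(predict, base_input: dict, attribute: str, values):
    """Query the black box on inputs that differ only in `attribute` and collect outputs."""
    results = {}
    for v in values:
        probe = dict(base_input, **{attribute: v})
        results[v] = predict(probe)
    return results

# Stand-in black box: scores applicants by ZIP code (deliberately problematic).
toy_model = lambda row: 0.9 if row["zip"] == "60201" else 0.4

print(counterfactual_audit(toy_model,
                           {"zip": "60201", "income": 50_000},
                           "zip", ["60201", "60623"]))
# {'60201': 0.9, '60623': 0.4} -> the divergence flags the attribute for scrutiny
```

The point of the sketch is that nothing about the model's internals is needed; query access alone suffices.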
William H. Widen, Professor of Law, University of Miami School of Law, Verification and Validation of AI-Enabled Vehicles in Theory and Practice: Challenges for Corporate Governance: This presentation will discuss challenges for corporate governance in light of the difficulties associated with verification and validation of an automated driving system (ADS) that uses machine learning. In particular, the presentation discusses problems associated with the generic requirement of being "safer than a human driver," which has started to appear in laws and regulations in some jurisdictions. Even in the absence of a law or regulation, a corporation deploying an ADS in Level 4 or 5 vehicles will need to confront its own standards for deployment. The presentation advances tentative recommendations for a way forward in the absence of conventional verification and validation procedures using repeatable tests.
Workshop 24: Friday, March 22 Cornell (James Grimmelmann)
Sarah B. Lawsky, Northwestern University Pritzker School of Law, Reasoning with Formalized Statutes: The Case of Capital Gains and Losses, 43 Va. Tax Rev. (forthcoming 2024): This Article formalizes sections of the Internal Revenue Code—that is, represents them symbolically—and then reasons with these formalizations algebraically, graphically, and, in a novel approach for U.S. legal scholarship, using a computer program that proves theorems. Reasoning with the formalizations reveals previously overlooked errors in the statute; demonstrates equivalence between the actual law and facially dissimilar administrative implementations of the law; and uncovers technical corrections in these administrative implementations that the Internal Revenue Service has not openly acknowledged. The Article thus shows how reasoning with formalized statutes leads to insights that may otherwise be obscured by law’s complexity.
If you'd like to read ahead of time, the paper is available here: https://github.com/slawsk/tax-formalization/blob/main/FormalizationReasoningPaper.pdf
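To give a flavor of what reasoning with a formalized statute can look like, here is a minimal sketch, assuming a simplified rendering of the § 1222-style netting definitions (it is not Lawsky's formalization, which works with the actual Code sections and a theorem prover): two facially different formulations are checked for equivalence by exhaustive enumeration.

```python
# Toy sketch of reasoning with a formalized statute (illustrative only; the paper
# formalizes the actual Code sections and proves properties with a theorem prover).
import itertools

def net_capital_gain(st_gain, st_loss, lt_gain, lt_loss):
    """Simplified netting in the style of I.R.C. § 1222."""
    net_st_loss = max(st_loss - st_gain, 0)   # cf. § 1222(6)
    net_lt_gain = max(lt_gain - lt_loss, 0)   # cf. § 1222(7)
    return max(net_lt_gain - net_st_loss, 0)  # cf. § 1222(11)

def net_capital_gain_alt(st_gain, st_loss, lt_gain, lt_loss):
    """A facially different formulation, as a worksheet might state it (hypothetical)."""
    net_st = st_gain - st_loss
    net_lt = lt_gain - lt_loss
    return 0 if net_lt <= 0 else max(net_lt + min(net_st, 0), 0)

# Mechanically check that the two formulations agree over a small grid of inputs.
grid = range(0, 6)
assert all(net_capital_gain(*xs) == net_capital_gain_alt(*xs)
           for xs in itertools.product(grid, repeat=4))
```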
David Stein, Founder and Partner at Darby Creek Advisors LLC, Hot Apps: Recalibrating IP to Address Online Software, 2024 Wis. L. Rev. (forthcoming): Current IP rules do not work for online consumer software. Software-specific IP doctrine formed during the era of installable software, which has high upfront costs and is easy to copy. IP rights helped companies recoup development costs by granting them the exclusive right to make and sell copies. But online software has low upfront costs and is not susceptible to copying, rendering IP protection unnecessary. Limits on software IP were designed to foster competition by letting market entrants replicate the interfaces of incumbent products. Online, copying incentives point in the other direction. The limits on software IP let incumbents raise barriers to entry by copying from newcomers. The net effect is an IP regime that exacerbates preexisting tendencies towards market concentration and depressed innovation in markets for online consumer services.
Given the growing role online services play in data collection, commerce, and speech, these broken innovation and competition incentives have far-reaching effects. Fixing those incentives is urgent. Policymakers and commentators blame the concentration of online services on structural market failures and turn to antitrust remedies for solutions. This pervasive narrative focuses on a symptom, not the cause. I argue that tech concentration is an artifact of IP law’s failure to keep up with technology.
This article proposes a program for IP reform: we should replace the trade-motivated aspects of software IP law with expanded trade regulation. Drawing on common-law misappropriation as a model, I sketch one politically pragmatic option for implementing those reforms.
Beyond this article’s focus on software innovation, it serves as a case study describing the mechanics behind a law falling out of sync with technology. As such, it may help policymakers avoid similar legislative and regulatory pitfalls as they regulate emerging and fast-changing technologies.
If you'd like to read ahead of time, the paper is available here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4546335
Workshop 23: Friday, February 23 Cal Berkeley (Rebecca Wexler)
Colleen V. Chien, UC Berkeley School of Law, and Miriam Kim, Partner at Munger, Tolles, & Olson in San Francisco, 100 Ways that Generative AI Can Address the Access to Justice Gap: How can AI tools be used to address the access to justice gap - the 90% of low-income Americans who lack adequate legal assistance? To find out, we surveyed 200 legal aid professionals about their usage and attitudes towards AI tools, and conducted a randomized controlled trial of 91 individuals. While all participants were given free access to paid generative artificial intelligence tools, a subset of participants, chosen randomly, were provided “concierge” services including peer use cases, office hours, and assistance. Before the trial, women in the pilot were less likely to have used the tools or think they were beneficial. At the end of the trial, however, male and female participants reported almost no significant differences across a wide variety of metrics reflecting usage patterns, benefits, and planned use. In addition, participants who received “concierge” services reported statistically significantly better outcomes on a range of metrics as compared to the control group, suggesting that assistance in the rollout of AI tools can improve experiences with them. We discuss our findings and, to support the broader use of these tools, we publish a companion database of over 100 helpful use cases and existing use policies, including prompts and outputs, provided by legal aid professionals in the trial.
Yonadav Shavit, et al., Harvard School of Engineering and Applied Sciences, Practices for Governing Agentic AI Systems: Agentic AI systems—AI systems that can pursue complex goals with limited direct supervision—are likely to be broadly useful if we can integrate them responsibly into our society. In this talk, we will overview a recent policy whitepaper on best practices for governing agentic AI systems and addressing some of the risks specifically caused by agentic AI systems. We will start by defining agentic AI systems and the parties in the agentic AI system life-cycle, and highlight the necessity of agreeing on a set of baseline responsibilities and safety best practices for each of these parties. We will then discuss an initial set of practices for keeping AI agents’ operations safe and accountable, which we hope can serve as building blocks in the development of agreed baseline best practices. The paper itself also enumerates the questions and uncertainties around operationalizing each of these practices that must be addressed before such practices can be codified, as well as a range of indirect impacts from the wide-scale adoption of agentic AI systems which may not be addressed by such best practices.
If you'd like to read ahead of time, the whitepaper is here: https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf
Sarah Barrington, University of California at Berkeley, Hany Farid, University of California at Berkeley, Rebecca Wexler, University of California at Berkeley School of Law, AI Baselines in Evidence Law: AI forensic systems -- from face recognition to gunshot detection to probabilistic DNA analysis and more -- have potential to outperform human expert witnesses in analyzing evidence from crime scenes. Yet they also have potential to err, with high-stakes consequences for the safety of communities and the life and liberty of the accused. What standards should apply to help ensure that accurate, reliable, and accountable AI evidence can be admitted in court while minimizing the harm from erroneous results? This talk will first present results from a study showing that a state-of-the-art AI system for photogrammetry and human expert photogrammetrists both performed worse at identifying a person's height and weight from a photograph than did non-experts hired on Mechanical Turk to perform the same task. The talk will then argue that, before admitting evidence from either AI or human experts, courts should assess not merely the proficiency of the AI or human expert, but also the proficiency of non-experts performing the same task. If AI or human experts do not substantially improve over non-expert baselines, courts should exclude the expert evidence as failing to satisfy Federal Rule of Evidence 702's requirement that experts must "help" the trier of fact.
Workshop 22: Thursday, January 18 Georgetown (Paul Ohm)
Ayelet Gordon-Tapiero, Hebrew University, Benin School of Computer Science and Engineering, Just the Two of Us: Generative AI & Unjust Enrichment: In the growing landscape of generative artificial intelligence, large language models (LLMs) have emerged as transformative entities, wielding the power to generate human-like text at an unprecedented scale. LLMs are changing the way we work, learn, teach, research, and think. They represent incredible new opportunities, and great challenges. The legal foundation that would allow the operation, existence, and training of these new tools still eludes scholars and policymakers. The problem is thorny yet simple and is rooted in the law of copyright and, more specifically, the framework of fair use. The training of LLMs requires huge amounts of data, much of it protected by copyright. If the use of this copyrighted data by companies training LLMs is seen as an infringement of copyright, this will bring an effective end to LLMs as we know them. If, however, it is characterized as fair use, we risk undermining creators’ financial incentive to create. Ironically, this latter outcome also likely spells the end of LLMs, as the models will collapse into themselves absent new human content on which to train. Against this backdrop, this Article is the first to offer a new way forward. The law of unjust enrichment provides a middle-ground solution between the two extremes offered by copyright law. In particular, unjust enrichment suggests the following. Companies training LLMs enjoy the labor of human creators without paying for it. As human creators did not give their work willingly as a gift, and did not waive remuneration for its use, it would be unjust if companies training LLMs were allowed to receive it without payment. To avoid unjust enrichment, companies training LLMs must pay fair compensation equal to the approximate market value of the services they received, if such a bargain can be constructed between the parties as a legal fiction. We suggest this is not only a natural and doctrinally sound legal response, but also a fair solution going forward, allowing for the responsible continued development of LLMs and the technologies based upon them. Under this solution, authors enjoy a level of protection that allows them to benefit fairly from their work, without being able to completely foreclose the market for LLM training.
Blake Reid, University of Colorado Law School, Section 230's Debts: Much attention has been paid to the unknown First Amendment permissibility of government regulation of the carriage practices of social media platforms. This Symposium, held on the eve of the Supreme Court’s hearing of the NetChoice cases, poses the million-dollar First Amendment question: “Can the government permissibly dictate what types of content platforms publish?” Because the potential for a significant shift in the First Amendment’s application to platform regulations in NetChoice makes this an impossibly difficult time to pin down a definitive answer, this Essay begins with a slightly different question: how did the First Amendment stakes in NetChoice get so high? This Essay identifies a long-standing gap in the Supreme Court’s First Amendment jurisprudence for platform regulation following its decision in ACLU v. Reno. This Essay attributes that gap to both interpretive and legislative debts incurred by Section 230 of the Communications Act that have effectively obviated the development of a substantive law of platform regulation. This Essay explores three cautionary tales for paying down Section 230’s debts: copyright law, the Fight Online Sex Trafficking Act (FOSTA), and the Florida and Texas social media laws at issue in NetChoice. Each case study highlights fundamental challenges for the tripartite gauntlet of substantive law, the First Amendment, and Section 230 itself that courts and legislatures must run to regulate platform carriage and moderation decisions. The draft paper is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4624865.
Alicia Solow-Niederman, George Washington University Law School, Do Cases Generate Bad AI Law?: There’s an AI governance problem, but it’s not (just) the one you think. The problem is that our judicial system is already regulating the deployment of AI systems—yet we are not coding what is happening in the courts as privately driven AI regulation. That’s a mistake. AI lawsuits here and now are determining who gets to seek redress for AI injuries; when and where emerging claims are resolved; what is understood as a cognizable AI harm and what is not, and why that is so. This Essay exposes how our judicial system is regulating AI today and critically assesses the governance stakes. When we do not situate the generative AI cases being decided by today’s human judges as a type of regulation, we fail to consider which emerging tendencies of adjudication about AI are likely to make good or bad AI law. For instance, litigation may do good agenda-setting and deliberative work as well as surface important information about the operation of private AI systems. But adjudication of AI issues can be bad, too, given the risk of overgeneralization from particularized facts; the potential for too much homogeneity in the location of lawsuits and the kinds of litigants; and the existence of fundamental tensions between social concerns and current legal precedents. If we overlook these dynamics, we risk missing a vital lesson: AI governance requires better accounting for the interactive relationship between regulation of AI through the judicial system and more traditional public regulation of AI. Shifting our perspective creates space to consider new AI governance possibilities. For instance, litigation incentives (such as motivations for bringing a lawsuit, or motivations to settle) or the types of remedies available may open up or close down further regulatory development. This shift in perspective also allows us to see how considerations that on their face have nothing to do with AI – such as access to justice measures and the role of judicial minimalism – in fact shape the path of AI regulation through the courts. Today’s AI lawsuits provide an early opportunity to expand AI governance toolkits and to understand AI adjudication and public regulation as complementary regulatory approaches. We should not throw away our shot. The draft paper is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4680641.
Workshop 21: Friday, December 15 Penn (Christopher Yoo)
Giovanna Massarotto, University of Pennsylvania, A Computer Science Approach to Antitrust: The Case of Cheat Tolerant Cartels: Although cartels represent the most severe antitrust conduct, as they imply that competitors fix prices or allocate markets, cartel agreements are increasingly difficult to detect and prosecute. Cartels are unstable because cartel members have high incentives to cheat, and courts typically require mechanisms to detect cheating to prosecute a cartel. Interestingly, computer science shows that computers connected in a network can coordinate their activity according to a common scheme and reach consensus on a bit of information, like a price, by tolerating cheating. Computer scientists use the 'Byzantine Generals Problem' as an analogy for the coordination problem among computers in a distributed network by considering that some computers might be unreliable 'traitors'. The Byzantine Generals Problem is a 'trust' problem. In my paper, I use this problem and its solutions to analyze how, in theory, computers can build strong cartels that tolerate cheating by unreliable computers, a scenario that present U.S. cartel law does not capture. These mechanisms work in the abstract, and thus potentially in computer as well as non-computer settings. I propose to include what I call 'cheat tolerance mechanisms' among the plus factors that courts presently use to detect cartels. Using the case of cheat-tolerant cartels, I propose a computer science approach to complement the present law-and-economics antitrust analysis, one that may prove more reliable for present antitrust enforcement. The paper is available here: Using Computer Science to Detect Cheat Tolerant Cartels by Giovanna Massarotto :: SSRN
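As a rough illustration of the cheat-tolerance idea, the toy sketch below runs a single majority-vote round (far simpler than the Byzantine agreement protocols the paper draws on): honest members converge on a common price even when some members report deviating values. All names and numbers are hypothetical.

```python
# Toy cheat-tolerance illustration (a single majority-vote round, not a full
# Byzantine agreement protocol): honest members agree on a common price even
# though some 'traitor' members report deviating values.
from collections import Counter
import random

def agreed_price(honest_price, n_members=7, n_cheaters=2, seed=0):
    rng = random.Random(seed)
    reports = [honest_price] * (n_members - n_cheaters)
    reports += [honest_price + rng.choice([-5, 5]) for _ in range(n_cheaters)]
    return Counter(reports).most_common(1)[0][0]  # the majority report wins

print(agreed_price(100))  # -> 100: the deviations of two 'traitors' are tolerated
```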
Chris Callison-Burch, University of Pennsylvania, Understanding Generative Artificial Intelligence and Its Relationship to Copyright: Generative AI had its breakthrough moment last November with the release of OpenAI’s ChatGPT. I'll give an overview of how generative AI systems are trained (aiming to make my explanation accessible to non-computer-scientists) and review the points of intersection between generative AI and copyright law and related laws like the right of publicity. I'll argue that training of AI systems on copyrighted works is likely to be fair use, and that outputs of AI systems can infringe copyright. I'll present research from my former PhD student Daphne Ippolito analyzing factors that affect an AI system's propensity to 'memorize' its training data (and thus its ability to create copyright-infringing works).
Workshop 20: Friday, November 17 Boston University (Ran Canetti)
Rory Van Loo, Boston University, Amazon's Pricing Paradox: Antitrust scholars have widely debated the potential paradox of Amazon seemingly wielding monopoly power while offering low prices to consumers. A single company's behavior thereby helped spark a vibrant intellectual conversation as scholars debated why Amazon's prices were so low, whether enforcers should intervene, and, eventually, how the field of antitrust should be reformed. One of the main sources of agreement in these and other scholarly conversations has long been that Amazon offered low prices. Without weighing in on which side of the antitrust debate was correct, this Article argues that Amazon may have long charged higher prices than is commonly understood, but using strategies that do not depend on monopoly power. More importantly, unraveling the disconnect between perception and reality yields broader insights. One of the reasons why perceptions of Amazon's pricing have remained disconnected from reality is that conversations about regulating Amazon have paid inadequate attention to behavioral economics. Behavioral economics reveals how the company leverages its sophisticated algorithms and large datasets to build a marketplace of consumer misperception by, for instance, making it difficult to find the lowest prices. Such practices undermine competition, in the uncontroversial economic sense of the word. But these practices reside in the domain of consumer law, not antitrust. Thus, a behavioral consumer lens is necessary to see that what was originally framed as an antitrust paradox is better viewed as a pricing paradox. To see the full set of concrete legal solutions for promoting competition in Amazon's marketplace and beyond, it will be important to move consumer law out of antitrust's shadows. These two bodies of law operating at full force offer the best chance for an era of open retail. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4436546 Joint work with Nikita Aggarwal.
Shlomi Hod, Boston University, Co-Designing the Pilot Release of Israel’s National Registry of Live Births: Reconciling Privacy with Accuracy and Usability: In Q4 2023, the Israeli Ministry of Health will make a pilot release of the National Registry of Live Births for the year 2014. The data holds significant value for various stakeholders for multiple purposes, including demographic analysis, scientific research and policy-making. Nonetheless, releasing such data poses a privacy challenge, because medical and governmental data contain sensitive information on birthing people and newborns. In this talk, we present how we co-designed the data release together with the stakeholders to fulfill four main requirements: (1) affordance of tabular format via synthetic data; (2) data quality concerning statistical analysis; (3) record-level faithfulness between released data and original data; and (4) privacy as a rigorous state-of-the-art guarantee and as stakeholders’ expectation. Ultimately, we will discuss the outlook for co-design approaches for data releases with differential privacy. Joint work with Ran Canetti.
Workshop 19: Monday, October 23 UCLA (John Villasenor)
Christa J. Laser, Assistant Professor at Cleveland-Marshall College of Law, Legal Issues in Blockchain, Cryptocurrency, and Non-Fungible Tokens (NFTs): Judge Easterbrook argued in 1996 that there is no more need for a “Law of Cyberspace” than there ever was for a “Law of the Horse.” Rather, existing laws spanning multiple fields often are sufficient to cover niche factual applications and even new technological change. The same is true now of “The Law of Blockchain.” Nonetheless, blockchain marketplace participants lack any cohesive analysis to turn to that is neutral as to outcome and performs a comprehensive analysis spanning the multitude of laws affecting the whole ecosystem. We might not need a “Law of Blockchain,” yet this article hopes to shed light on the wide scope of existing laws that apply to this new technological era. Assets on the blockchain have ballooned to billions of dollars, stored everywhere from Bitcoin and Ethereum, to Bored Ape Yacht Club and Lazy Lion NFTs, to new coins, decentralized finance, and play-to-earn gaming. Yet regulators are only just catching up to the complexities of “Web 3.0” and, for many, it can feel like a Wild West. Prospectors, shills, and fraudsters abound, as do innovative companies and community projects. This article hopes to shed light on the emerging legal questions that Web 3.0 founders, creators, and lawmakers should watch and cases that inform them. These issues include securities laws, intellectual property, right of publicity, advertising, and more. This Article can present only one snapshot in time, and indeed the application of existing laws to even newer blockchain technologies will be clarified further by the time this is published. Nonetheless, this Article will hopefully provide a useful framework of how to approach new technology that relies on sound principles of decades-old legal schemes, not merely a “Law of Blockchain.”
Miranda Christ, Computer Science PhD Student, Columbia University, Undetectable Watermarks for Language Models: Recent advances in the capabilities of large language models such as GPT-4 have spurred increasing concern about our ability to detect AI-generated text. Existing machine-learning-based detectors are prone to false positives, resulting in several scandals where students were falsely accused of cheating. Watermarking is a different and promising approach that involves changing the output of the model to make it detectable. Following pressure from the Biden administration, several large AI companies, including OpenAI and Google, have pledged to watermark their AI-generated materials. Regulation for AI watermarks is sparse, though it is moving quickly: A US bill requiring AI watermarks was introduced in September, and regulation already passed by China's Cyberspace Administration requires companies to mark AI-generated content. An understanding of what guarantees watermarks can offer is crucial in shaping future policy decisions. In this talk, I will give an overview of challenges surrounding watermarking AI-generated text and my recent work that formally defines desirable guarantees of watermarks for language models. In this work, we introduce a cryptographically-inspired notion of undetectable watermarks for language models. That is, watermarks can be detected only with the knowledge of a secret key; without the secret key, it is computationally intractable to distinguish watermarked outputs from those of the original model. In particular, it is impossible for a user to observe any degradation in the quality of the text. Crucially, watermarks remain undetectable even when the user is allowed to adaptively query the model with arbitrarily chosen prompts. We also define soundness, which requires that false positives are exceedingly rare, and completeness, which requires that the watermark is present in sufficiently original text output by the model. We construct a watermark that is undetectable, sound, and complete based on the existence of one-way functions, a standard assumption in cryptography. This is joint work with Sam Gunn and Or Zamir.
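A keyed detector can be sketched in a few lines. The toy below is not the construction from the talk (which achieves cryptographic undetectability); it only illustrates the role of the secret key: the detector scores token pairs with a keyed pseudorandom bit, while anyone without the key sees bits that look uniformly random. All function names are illustrative.

```python
# Toy keyed watermark detection (illustrative; not the undetectable-watermark
# construction from the talk). A secret key drives a pseudorandom bit per token
# pair; embedding would bias generation toward bit=1, and detection counts the bias.
import hashlib, hmac

def keyed_bit(key: bytes, prev_token: str, token: str) -> int:
    digest = hmac.new(key, f"{prev_token}|{token}".encode(), hashlib.sha256).digest()
    return digest[0] & 1

def detect(key: bytes, tokens: list[str], threshold: float = 0.7):
    """Fraction of adjacent token pairs whose keyed bit is 1; a high score suggests a watermark."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0, False
    score = sum(keyed_bit(key, p, t) for p, t in pairs) / len(pairs)
    return score, score >= threshold
```

Without the key, the per-pair bits are indistinguishable from coin flips, which is the intuition behind requiring a secret key for detection.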
Workshop 18: Tuesday, September 26 Northwestern University (Jason Hartline and Dan Linna)
Peter Henderson, JD/PhD candidate in CS, Stanford University (incoming Assistant Professor, Princeton University), Foundation Models and Fair Use: Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models. Joint work with Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, and Percy Liang.
Rui-Jie Yew, CS PhD student, Brown University, Break It Till You Make It: Limitations of Copyright Liability Under a Pretraining Paradigm of AI Development: In this talk, I consider the impacts of a pre-training regime on the enforcement of copyright law for AI systems. I identify a gap between conceptualizations of the developmental process in the legal literature for copyright liability and the evolving landscape of deployed AI models. Specifically, proposed legal tests have assumed a tight integration between model training and model deployment: the ultimate purpose of a model plays a central role in determining if a training procedure's use of copyrighted data infringes on the author's rights. In practice, modern systems are built and deployed under a pre-training paradigm: large models are trained for general-purpose applications and are later specialized to different applications, often by third parties. This potentially creates an opportunity for developers of pre-trained models to avoid direct liability under these tests. As a result, I consider the role of copyright's secondary liability doctrine in shaping the practical effect of copyright regulation on the development and deployment of AI systems. I draw from past secondary copyright liability litigation over other technologies to understand how AI companies may manage or attempt to limit their copyright liability in practice. Based on this, I conclude with a discussion of regulatory strategies to close these loopholes and propose duties of care for developers of ML models to evaluate and mitigate their models' present and downstream effects on the authors of copyrighted works that are used in training. This is joint work with Dylan Hadfield-Menell.
Workshop 17: Friday, May 19 Georgetown (Paul Ohm and Ayelet Gordon-Tapiero)
Shareen Joshi, Associate Professor of International Development at Georgetown University’s School of Foreign Service, Impact of Free Legal Search on Rule of Law: Evidence from Indian Kanoon: Access to legal information is limited in many parts of the world. Can digital platforms offering free legal search reduce market-level constraints on economic development? We estimate the impact of Indian Kanoon, a free legal search engine, using a generalized difference-in-differences empirical strategy. We find that the staggered rollout of Kanoon was associated with a 1-2% increase in the likelihood of case resolution and a doubling of the number of appeals, which are also less likely to be dismissed by the courts. It also affected the finances of firms, with positive impacts on assets and negative impacts on audit fees and bad debts.
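For readers unfamiliar with the method, a generalized difference-in-differences design is commonly estimated as a two-way fixed-effects regression; the sketch below uses an entirely hypothetical panel and shows only the shape of the estimation, not the paper's data or specification.

```python
# Hypothetical two-way fixed-effects sketch of a generalized difference-in-differences
# design (illustrative only; not the paper's data or specification).
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.DataFrame({
    "court":           ["A", "A", "A", "B", "B", "B"],
    "year":            [2008, 2010, 2012, 2008, 2010, 2012],
    "kanoon_live":     [0, 1, 1, 0, 0, 1],            # staggered rollout indicator
    "resolution_rate": [0.40, 0.43, 0.44, 0.38, 0.39, 0.41],
})

# Court and year fixed effects absorb level differences; the coefficient on
# kanoon_live is the difference-in-differences estimate.
fit = smf.ols("resolution_rate ~ kanoon_live + C(court) + C(year)", data=panel).fit()
print(fit.params["kanoon_live"])
```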
Neel Sukhatme, Professor of Law and Anne Fleming Research Professor at Georgetown University Law Center, and Affiliated Faculty at the Georgetown McCourt School of Public Policy, Judges for Sale: The Effect of Campaign Contributions on State Criminal Courts: Scholars and policymakers have long sought to determine whether campaign contributions affect democratic processes. Using data on donations from Texas, we show that criminal defense attorneys who contribute to a district judge’s electoral campaign are preferentially assigned by that judge to indigent defense cases, i.e., public contracts in which the state pays private attorneys to represent poor defendants. We estimate that attorney donors receive twice as many cases as non-donors during the month of their campaign contribution. Nearly two-thirds of this increase is explained by the contribution itself, with the remainder attributable to shared preferences within attorney-judge pairs, such as professional, ideological, political, or personal ties. Defendants assigned to donor attorneys also fare worse in cases resolved in the month of contribution, with fewer cases dismissed and more defendants convicted and incarcerated. Further evidence suggests recipient judges close cases to cash out their attorney benefactors, at the expense of defendants. Our results provide some of the strongest causal evidence to date on the corrosive potential of campaign donations, including their impact on the right to counsel as guaranteed by the U.S. Constitution.
Workshop 16: Friday, April 21 Berkeley (Rebecca Wexler)
Barry Scheck, co-founder of The Innocence Project, Automating Police Misconduct Databases: Professor Scheck will discuss work relating to the Community Law Enforcement Accountability Network, also known as CLEAN, which is a first-of-its-kind partnership of journalists, lawyers, computer engineers and academic institutions. CLEAN is working to develop and automate a database of police misconduct records that can be easily accessed by news organizations and the general public.
The Integrity of Our Convictions
Davi Liang, associate research scholar at the Digital Interests Lab at NYU, Challenging Algorithms: How Courts Have Judged Challenges to Algorithmic Decision-Making: Without properly acknowledging the limitations of modern AI tools and the problems associated with them, the US criminal justice system risks placing too much trust in flawed technology. Recent court cases have demonstrated contentment with permitting algorithmic decision-making without full transparency or evaluation, a pattern emblematic of other cases scrutinizing algorithms used for processing evidence or determining recidivism in criminal cases. Solutions to algorithmic harms proposed by researchers and writers have not been taken up in judicial opinions. This article explores how challenges to the use of algorithms in criminal courts have developed rules and limitations on their use to comply with constitutional due process rights, and how these rules and limitations require further development to guarantee trust and equity in their use.
Workshop 15: Friday, March 17 UPenn (Christopher Yoo)
Stefan Bechtold, ETH Zurich, Switzerland, Explaining Explainable AI: The increasing use of algorithms in legal and economic decision-making has led to calls for explainable decisions or even a “right to explanation” for decision subjects. Such explanations are desired, in particular, when the decision-making algorithms are opaque, for example with machine learning or artificial intelligence. The EU’s General Data Protection Regulation (GDPR) and related reforms have started giving these rights legal force. Nevertheless, even a specified right to explanation leaves many questions open, in particular, how decisions made by black-box algorithms can and should be explained. Faced with this question, legal and social science scholars have begun to articulate conditions that explanations should satisfy to make them legally and ethically acceptable for decision subjects. At the same time, an active literature in explainable AI has produced a growing library of methods for explaining algorithmic predictions and decisions. However, explainable AI has focused primarily on the needs of software developers to debug rather than the interests of decision subjects to understand. The legal-ethical debates, on the one hand, and explainable AI innovations, on the other, have mostly proceeded independently without a connecting conversation. This project aims to bridge this gap. Starting from the legal side, we present an organizing framework for explanations of algorithmic decision-making, distill factors contributing to good explanations of algorithmic decision-making, and introduce a taxonomy of explanation methods. We argue that this framework may provide a bridge between the literature in law and computer science on explainable AI. We also present avenues for applying this framework in empirical research.
Steve Bellovin, Preventing Intimate Image Abuse via Privacy-Preserving Credentials: The problem of non-consensual pornography (NCP), sometimes known as intimate image abuse or revenge porn, is well known. However, despite its distribution being illegal in most states, it remains a serious problem, if only because it is often difficult to prove who uploaded the pictures.
One obvious countermeasure would be to require Internet sites to strongly authenticate their users, but this is not an easy problem. Furthermore, while that would provide accountability for the immediate upload, such a policy would cause other problems—the ability to speak anonymously is a vital constitutional right. Also, it often would not help identify the original offender—many people download images from one site and upload them to another, which adds another layer of complexity.
We instead propose a more complex scheme, based on a privacy-preserving cryptographic credential scheme originally devised by Jan Camenisch and Anna Lysyanskaya. We arrange things so that three different parties must cooperate to identify a user who uploaded an image. We perform a legal analysis of the acceptability of this scheme under the First Amendment and its implied guarantee of the right to anonymous speech, show how this must be balanced against the victim's right to sexual privacy, discuss the necessary changes to §230 (and the constitutional issues with these changes), and the legal standards for obtaining the necessary court orders—or opposing their issuance.
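The requirement that three parties cooperate can be illustrated, in a much simpler form than the credential scheme the authors build on, with 3-of-3 secret sharing: no single party, and no pair of parties, learns anything about the uploader's identifier. The sketch below is purely illustrative and is not the proposed scheme.

```python
# Toy 3-of-3 XOR secret sharing (illustrative only; the proposal is built on
# privacy-preserving cryptographic credentials, not plain secret sharing).
import secrets

def split_identifier(identifier: bytes):
    """Split an uploader identifier into three shares; any two shares reveal nothing."""
    share1 = secrets.token_bytes(len(identifier))
    share2 = secrets.token_bytes(len(identifier))
    share3 = bytes(a ^ b ^ c for a, b, c in zip(identifier, share1, share2))
    return share1, share2, share3

def recover_identifier(share1, share2, share3):
    """Only the cooperation of all three shareholders recovers the identifier."""
    return bytes(a ^ b ^ c for a, b, c in zip(share1, share2, share3))

shares = split_identifier(b"credential-id-42")
assert recover_identifier(*shares) == b"credential-id-42"
```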
Workshop 14: Friday, February 17 University of Chicago (Aloni Cohen)
Ruth Greenwood, Harvard Law School, Differential Privacy in the 2020 Census – Assessing Bias through the Noisy Measurements Files: The Census Bureau introduced a new disclosure avoidance system (DAS) for the 2020 Census that involved first applying differential privacy (DP) to the census edited file (CEF), resulting in a Noisy Measurements File (NMF), and then applying post-processing to create the 2020 Census products. Academic work reviewed demonstration products released in advance of the 2020 Census and contended that the DAS algorithm would introduce bias into the final Census, and that the bias likely had disparate effects on communities of color. In order to determine whether and how much bias was introduced by post-processing (as opposed to by DP), the Election Law Clinic (ELC) filed a Freedom of Information Act (FOIA) request with the Census Bureau on behalf of Professor Justin Phillips (in July 2022), requesting both the 2010 and 2020 NMFs. That FOIA request went unanswered until Prof. Phillips filed a lawsuit to enforce it. Today’s presentation will cover the course of the litigation and the implications of the possible findings for the use of the same DAS algorithm for future census products.
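The distinction between noise and post-processing can be seen in a toy simulation (a schematic illustration only, not the Bureau's TopDown algorithm): mean-zero noise added to counts is unbiased, while a post-processing step as simple as clamping counts to be non-negative introduces bias that is largest for small counts.

```python
# Schematic illustration (not the Census Bureau's TopDown algorithm): mean-zero
# noise is unbiased, but post-processing the noisy counts to be non-negative
# introduces upward bias, largest for small true counts.
import numpy as np

rng = np.random.default_rng(0)
true_counts = np.array([0, 1, 2, 50, 500])                       # hypothetical block counts
noisy = true_counts + rng.laplace(scale=3.0, size=(200_000, 5))  # toy noisy measurements
clamped = np.clip(noisy, 0, None)                                # toy post-processing step

print("bias of noisy measurements :", (noisy - true_counts).mean(axis=0).round(2))
print("bias after post-processing :", (clamped - true_counts).mean(axis=0).round(2))
```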
Marika Swanberg, PhD Candidate, Boston University, Control, Confidentiality, and the Right to be Forgotten: Recent digital rights frameworks give users the right to delete their data from systems that store and process their personal information (e.g., the "right to be forgotten" in the GDPR). How should deletion be formalized in complex systems that interact with many users and store derivative information? We argue that prior approaches fall short. Definitions of machine unlearning [CY15] are too narrowly scoped and do not apply to general interactive settings. The natural approach of deletion-as-confidentiality [GGV20] is too restrictive: by requiring secrecy of deleted data, it rules out social functionalities. We propose a new formalism: deletion-as-control. It allows users' data to be freely used before deletion, while also imposing a meaningful requirement after deletion, thereby giving users more control. Deletion-as-control provides new ways of achieving deletion in diverse settings. We apply it to social functionalities, and give a new unified view of various machine unlearning definitions from the literature. This is done by way of a new adaptive generalization of history independence. Deletion-as-control also provides a new approach to the goal of machine unlearning, that is, maintaining a model while honoring users' deletion requests. We show that publishing a sequence of updated models that are differentially private under continual release satisfies deletion-as-control. The accuracy of such an algorithm does not depend on the number of deleted points, in contrast to the machine unlearning literature. Based on joint work with Aloni Cohen, Adam Smith, and Prashant Nalini Vasudevan.
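As a point of reference for the definitions compared in the talk, the simplest way to honor a deletion request is to retrain from the post-deletion data; the toy class below illustrates that baseline and is not the paper's deletion-as-control formalism or its differentially private construction.

```python
# Toy 'delete then retrain' baseline (illustrative; not the paper's
# deletion-as-control formalism or its differentially private construction).
class RetrainOnDelete:
    def __init__(self):
        self._data = {}      # user_id -> numeric record
        self._model = None   # toy 'model': the mean of stored records

    def add(self, user_id, value):
        self._data[user_id] = value
        self._retrain()

    def delete(self, user_id):
        self._data.pop(user_id, None)  # honor the deletion request...
        self._retrain()                # ...and republish a model with no trace of it

    def _retrain(self):
        values = list(self._data.values())
        self._model = sum(values) / len(values) if values else None

    @property
    def model(self):
        return self._model
```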
Workshop 13: Friday, January 20, 2023, MIT (Dazza Greenwood)
Dr. Thibault Schrepel, Amsterdam Law & Technology Institute, Augmenting antitrust analysis with CS
Dr. Megan Ma, Stanford CodeX Fellow, Conceptual Issues in Developing “Expert-Driven” Data: It has (historically) been argued that specialized domains, such as the legal field, frequently are not exposed to research in deep learning due to the high costs of expert annotations. Coupled with the nature of the legal profession, this means few datasets in the public domain are available for research. Accordingly, methodology around how expertise may be used to create datasets tailored to the specialized field remains relatively unexplored. We conduct a qualitative experiment with an existing “expert-driven” dataset to offer preliminary observations on the quality of the data labels, the functional relevance of the tool in practice, and consistency in the revisions. We then assess and provide recommendations as to whether a new standard is required when building expert datasets. Furthermore, against the exponential developments in generative AI, we reflect on the notion and role of expertise in training state-of-the-art NLP models, specific to contractual review.
For background, the CUAD and MAUD expert datasets are referenced.
Workshop 12: Friday, December 16, 2022, Boston University (Ran Canetti)
Angela Jin (University of California, Berkeley), Adversarial Scrutiny of Evidentiary Statistical Software: The U.S. criminal legal system increasingly relies on software output to convict and incarcerate people. In a large number of cases each year, the government makes these consequential decisions based on evidence from statistical software—such as probabilistic genotyping, environmental audio detection, and toolmark analysis tools—that defense counsel cannot fully cross-examine or scrutinize. This undermines the commitments of the adversarial criminal legal system, which relies on the defense’s ability to probe and test the prosecution’s case to safeguard individual rights. Responding to this need to adversarially scrutinize output from such software, we propose robust adversarial testing as an audit framework to examine the validity of evidentiary statistical software. We define and operationalize this notion of robust adversarial testing for defense use by drawing on a large body of recent work in robust machine learning and algorithmic fairness. We demonstrate how this framework both standardizes the process for scrutinizing such tools and empowers defense lawyers to examine their validity for instances most relevant to the case at hand. We further discuss existing structural and institutional challenges within the U.S. criminal legal system that may create barriers for implementing this and other such audit frameworks and close with a discussion on policy changes that could help address these concerns. Full paper available at: Adversarial Scrutiny of Evidentiary Statistical Software by Rediet Abebe, Moritz Hardt, Angela Jin, John Miller, Ludwig Schmidt, Rebecca Wexler :: SSRN
Rachel Cummings (Columbia University), Attribute Privacy: Framework and Mechanisms: Ensuring the privacy of training data is a growing concern since many machine learning models are trained on confidential and potentially sensitive data. Much attention has been devoted to methods for protecting individual privacy during analyses of large datasets. However, in many settings, global properties of the dataset may also be sensitive (e.g., the mortality rate in a hospital rather than the presence of a particular patient in the dataset). In this work, we depart from individual privacy to initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis. We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected: (1) properties of a specific dataset and (2) parameters of the underlying distribution from which the dataset is sampled. We also provide two efficient mechanisms for specific data distributions and one general but inefficient mechanism that satisfy attribute privacy for these settings. We base our results on a novel and non-trivial use of the Pufferfish framework to account for correlations across attributes in the data, thus addressing “the challenging problem of developing Pufferfish instantiations and algorithms for general aggregate secrets” that was left open by Kifer and Machanavajjhala in 2014 [15]. Full paper available at: Attribute Privacy: Framework and Mechanisms | 2022 ACM Conference on Fairness, Accountability, and Transparency. This talk will conclude with a discussion of future applications and open problems in CS+Law where attribute privacy may be a useful formalism.
Workshop 11: Friday, November 18, 2022, UCLA (John Villasenor)
Apurva Panse (2L at NYU School of Law), Rethinking the Ex Parte Nature of Search Warrants: The current framework for searches and seizures allows police and prosecutors broad latitude without much oversight, under the premise of exigency. The system allows officers to secure ex parte permission to infringe on a person’s right of privacy, without acknowledging that officers can lie, and procedural due process may demand more safeguards. In my paper, I argue that additional procedural safeguards are necessary for evidence in law enforcement custody, which does not implicate exigency but warrants a heightened expectation of privacy. I focus on two such categories of evidence: DNA and digital evidence, and argue that courts should allow ex ante challenges to search warrants or judicial orders for such evidence.
Nikita Aggarwal (Postdoctoral Research Fellow UCLA), #Fintok and Financial Regulation: Social media platforms are becoming an increasingly important site for consumer finance. This phenomenon is referred to as “FinTok,” a reference to the “#fintok” hashtag that identifies financial content on TikTok, a popular social media platform. This Essay examines the new methodological possibilities for consumer financial regulation due to FinTok. It argues that FinTok content offers a novel and valuable source of data for identifying emerging fintech trends and associated consumer risks. As such, financial regulators should use FinTok content analysis—and social media content analysis more broadly—as an additional method for the supervision and regulation of consumer financial markets. The Essay test-drives this method using audiovisual content from TikTok in which consumers discuss their experience with “buy now, pay later,” a rapidly growing and less regulated form of fast, digital credit. It reveals tentative evidence of payment difficulties and strategic default in the buy now, pay later credit market, with attendant consumer protection risks. These insights provide a point of entry for the further study and regulation of the buy now, pay later credit market.
Workshop 10: Friday, October 28, 2022, Cornell University (James Grimmelmann)
Lizzie Kumar (PhD candidate at Brown University), Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with U.S. Fair Lending Regulation: Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of "unfairness," raising the concern that banks and other financial institutions could (potentially unwittingly) engage in illegal discrimination through the use of this technology. In the US, there are laws in place to prevent discrimination in lending, and agencies charged with enforcing them. However, conversations around fair credit models in computer science and in policy are often misaligned: fair machine learning research often lacks legal and practical considerations specific to existing fair lending policy, and regulators have yet to issue new guidance on how, if at all, credit risk models should utilize practices and techniques from the research community. This paper aims to better align these sides of the conversation. We describe the current state of credit discrimination regulation in the United States, contextualize results from fair ML research to identify the specific fairness concerns raised by the use of machine learning in lending, and discuss regulatory opportunities to address these concerns. Full paper available at: https://assets-global.website-files.com/6230fe4706acf355d38b2d54/62e02dd4dccb7c2ee30bfd56_Algorithmic_Fairness_and_Fair_Lending_Law.pdf
Sandra Wachter (University of Oxford), The Theory of Artificial Immutability: Protecting Algorithmic Groups under Anti-Discrimination Law: Artificial intelligence is increasingly used to make life-changing decisions, including about who is successful with their job application and who gets into university. To do this, AI often creates groups that haven’t previously been used by humans. Many of these groups are not covered by non-discrimination law (e.g., ‘dog owners’ or ‘sad teens’), and some of them are even incomprehensible to humans (e.g., people classified by how fast they scroll through a page or by which browser they use).
This is important because decisions based on algorithmic groups can be harmful. If a loan applicant scrolls through the page quickly or types only in lowercase when filling out the form, their application is more likely to be rejected. If a job applicant uses browsers such as Internet Explorer or Safari instead of Chrome or Firefox, they are less likely to be successful. Non-discrimination law aims to protect against these kinds of harms by guaranteeing equal access to employment, goods, and services, but it has never protected “fast scrollers” or “Safari users”. Granting these algorithmic groups protection will be challenging because the European Court of Justice has historically been reluctant to extend the law to cover new groups.
This paper argues that algorithmic groups should be protected by non-discrimination law and shows how this could be achieved. Full paper available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4099100
CS+Law Workgroup Shoutout: "What's Driving Conflicts Around Differential Privacy for the U.S. Census" (IEEE Xplore)
Workshop 9: Friday, September 23, 2022, Organized by Northwestern University (Jason Hartline and Dan Linna)
Aileen Nielsen (ETH Zurich's Center for Law and Economics; co-authors: Laura Skylaki, Milda Norkute, and Alexander Stremitzer), Building a Better Lawyer - Machine Guidance Can Make Legal Work Faster and Fairer: Making lawyers work better with technology is an important and ever-moving target in the development of legal technologies. Thanks to new digital technologies, lawyers can do legal research and writing far more effectively today than just a few decades ago. To date, however, most assistive technology has been limited to legal search capabilities, with attorney users of these technologies executing relatively confined instructions in a narrow task. Now, a new breed of legal tools offers guidance rather than information retrieval. This rapid expansion in the range of tasks for which a machine can offer competent guidance on legal work creates new opportunities for human-machine cooperation to improve the administration of law, but also new risks that machine guidance may bias the practice of law in undesirable ways. We present a randomized controlled study that tackles the question of how machine guidance influences the quality of legal work. We look both at the quality of the procedure by which work is carried out and at the quality of the outputs themselves. Our results show that a legal AI tool can make lawyers faster and fairer without otherwise influencing aggregate measures of work quality. On the other hand, we identify some distributional effects of the machine guidance that raise concerns about its impact on human work quality. We thus provide experimental evidence that legal tools can improve objectively assessed performance indicators (efficiency, fairness) but also raise concerns about how the quality of legal work should be defined and regulated. In addition to these results, we furnish an example methodology for how organizations could begin to assess legal AI tools to ensure their appropriate and responsible deployment.
Liren Shan (Theory Group at Northwestern University; co-authors: Jason D. Hartline, Daniel W. Linna Jr., Alex Tang), Algorithmic Learning Foundations for Common Law: This paper looks at a common law legal system as a learning algorithm, models specific features of legal proceedings, and asks whether this system learns efficiently. A particular feature of our model is that it explicitly views various aspects of court proceedings as learning algorithms. This viewpoint makes it possible to show directly that when the costs of going to court are not commensurate with the benefits of going to court, learning fails and inaccurate outcomes persist in cases that settle. Specifically, cases are brought to court at an insufficient rate. On the other hand, when individuals can be compelled or incentivized to bring their cases to court, the system can learn and inaccuracy vanishes over time. (Preprint on arXiv)
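The cost-versus-benefit intuition can be made concrete with a highly stylized simulation. This is our own illustrative reading of the learning dynamic described above, not the paper's formal model, and all parameters are invented: precedent is corrected only when some dispute's stakes exceed the cost of going to court.

    import random

    random.seed(1)

    def fraction_inaccurate(cost_of_court: float, rounds: int = 10000) -> float:
        """Share of disputes resolved under an inaccurate rule."""
        rule_accurate = False              # the prevailing precedent starts out wrong
        inaccurate = 0
        for _ in range(rounds):
            stakes = random.uniform(0, 1)  # benefit of litigating this dispute
            if rule_accurate:
                continue                   # once corrected, outcomes are accurate
            if stakes > cost_of_court:
                rule_accurate = True       # case is brought; the court corrects the rule
            else:
                inaccurate += 1            # parties settle in the shadow of the bad rule
        return inaccurate / rounds

    print(fraction_inaccurate(cost_of_court=0.5))  # affordable courts: error washes out
    print(fraction_inaccurate(cost_of_court=1.5))  # costs exceed all stakes: error persists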
Workshop 8: Friday, May 20, 2022, Organized by Boston University (Professor Ran Canetti)
Sunoo Park (Cornell), The Right to Vote Securely: This Article argues that election law can, does, and should ensure that the right to vote is a right to vote securely. First, it argues that constitutional voting rights doctrines already prohibit election practices that fail to meet a bare minimum threshold of security. But the bare minimum is not enough to protect modern election infrastructure against sophisticated threats. The Article thus proposes new statutory measures to bolster election security beyond the constitutional baseline, with technical provisions designed to change the course of insecure election practices that have become regrettably commonplace and standardize best practices drawn from state-of-the-art research on election security.
Sarah Scheffler (Princeton), Formalizing human ingenuity: A quantitative framework for substantial similarity in copyright: A central notion in U.S. copyright law is judging the "substantial similarity" between an original and an allegedly derived work. Capturing this notion has proven elusive, and the many approaches offered by case law and legal scholarship are often ill-defined, contradictory, or internally inconsistent. This work suggests that a key part of the substantial similarity puzzle is amenable to modeling inspired by theoretical computer science. Our proposed framework quantitatively evaluates how much "novelty" is needed to produce the derived work with access to the original work, versus reproducing it without access to the copyrighted elements of the original work. Our definition has its roots in the abstraction-filtration-comparison method of Computer Associates International, Inc. v. Altai, Inc. Our framework's output "comparison" is easy to evaluate, freeing up the court's time to focus on the more difficult "abstraction and filtering" steps used as input. We evaluate our framework on several pivotal cases in copyright law and observe that the results are consistent with the rulings.
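As a very rough intuition pump (not the authors' construction, which is defined formally), one can approximate "information needed to produce the derived work" with compressed length and compare how much new material a creator must supply with and without access to the original. Everything below, including the toy texts, is illustrative only, and the abstraction and filtration of unprotectable elements is assumed to have happened beforehand.

    import zlib

    def info(text: str) -> int:
        """Compressed length in bytes: a crude proxy for information content."""
        return len(zlib.compress(text.encode(), 9))

    def novelty_without_access(derived: str) -> int:
        # Material needed to produce the derived work from scratch.
        return info(derived)

    def novelty_with_access(original: str, derived: str) -> int:
        # Extra material needed when the original is in hand, approximated by
        # conditional compression: info(original + derived) - info(original).
        return info(original + derived) - info(original)

    original = "It was the best of times, it was the worst of times. " * 20
    derived = original.replace("times", "crimes")  # a lightly edited copy

    print("without access:", novelty_without_access(derived))
    print("with access:   ", novelty_with_access(original, derived))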
Azer Bestavros (Boston University); Stacey Dogan (Boston University); Paul Ohm (Georgetown); Andrew Sellars (Boston University), Bridging the Computer Science-Law Divide (PDF available): With the generous support of the Public Interest Technology University Network (PIT-UN), researchers from the Georgetown University Institute for Technology Law and Policy and Boston University’s School of Law and Faculty of Computing and Data Sciences present this report compiling practical advice for bridging computer science and law in academic environments. Intended for university administrators, professors in computer science and law, and graduate and law students, this report distills advice drawn from dozens of experts who have already successfully built bridges in institutions ranging from large public research universities to small liberal arts colleges.
Workshop 7: Friday, April 15, 2022, Organized by MIT (Lecturer and Research Scientist Dazza Greenwood)
Sandy Pentland (MIT), Law was the first Artificial Intelligence: Hammurabi's Code can be described as the first formally codified distillation of expert reasoning. Today we have many systems for codifying and applying expert reasoning; these systems are often lumped together under the umbrella of Artificial Intelligence (AI). What can we learn by thinking about law as AI?
Robert Mahari (MIT & Harvard), Deriving computational insights from legal data: First, we will discuss the law as a knowledge system that grows by means of citations. We will compare the citation networks in law and science by leveraging tools from “science-of-science”. We will explore how, despite the fundamental differences between the two systems, the core citation dynamics are remarkably universal, suggesting that they are largely shaped by intrinsic human constraints and are robust against the numerous factors that distinguish the law from science. Second, we will explore how legal citation data can be used to build sophisticated NLP models that can aid in forming legal arguments by predicting relevant passages of precedent given the summary of an argument. We will discuss a state-of-the-art BERT model, trained on 530,000 examples of legal arguments made by U.S. federal judges, which predicts relevant passages from precedential court decisions given a brief legal argument. We will highlight how this model performs well on unseen examples (with a top-10 prediction accuracy of 96%) and how it handles arguments from real legal briefs.
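To make the retrieval setup concrete, here is a minimal sketch of a BERT-style bi-encoder that scores candidate passages against an argument summary by cosine similarity. It uses a generic pretrained checkpoint and invented passages as stand-ins; the specialized model described in the talk was trained on judicial arguments, and this is not that model.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Generic checkpoint as a stand-in for the specialized model from the talk.
    name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    encoder = AutoModel.from_pretrained(name)

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state      # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)         # (batch, seq, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean-pool over tokens
        return torch.nn.functional.normalize(pooled, dim=-1)

    argument = "The district court abused its discretion in excluding the expert testimony."
    passages = [  # hypothetical candidate passages from precedential decisions
        "We review a district court's evidentiary rulings for abuse of discretion.",
        "A valid contract requires offer, acceptance, and consideration.",
    ]

    scores = embed([argument]) @ embed(passages).T           # cosine similarities
    best_first = scores.squeeze(0).argsort(descending=True)  # ranked passage indices
    print(best_first.tolist(), scores.squeeze(0).tolist())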
Workshop 6: Friday, March 11, 2022, Organized by University of Pittsburgh (Professor Kevin Ashley)
Daniel E. Ho (Stanford University), Large Language Models and the Law: This talk will discuss the emergence of large language models, their applicability in law and legal research, and the legal issues raised by the use of such models. We will illustrate with the CaseHOLD dataset, comprising more than 53,000 multiple-choice questions that ask for the relevant holding of a cited case, and with an application to mass adjudication systems in federal agencies.
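For a sense of the task format, the snippet below loads CaseHOLD and prints one question with its five candidate holdings. The Hugging Face identifier and field names are assumptions based on the dataset's public LexGLUE release and may need adjusting.

    from datasets import load_dataset

    # Identifier and field names are assumptions (public LexGLUE release of CaseHOLD).
    ds = load_dataset("lex_glue", "case_hold", split="validation")

    example = ds[0]
    print(example["context"])                         # citing passage with the holding masked
    for i, holding in enumerate(example["endings"]):  # five candidate holdings
        marker = "*" if i == example["label"] else " "
        print(marker, i, holding)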
Elliott Ash (ETH Zurich), Reading (Judges') Minds with Natural Language Processing: This talk will introduce some recent lines of empirical legal research that apply natural language processing to analyze beliefs and attitudes of judges and other officials. When do lawmakers use more emotion, rather than logic, in their rhetoric? When do judges use notions of economic efficiency, rather than fairness or justice, in their written opinions? What can language tell us about political views or social attitudes?
Workshop 5: Friday, February 18, 2022, Organized by University of Chicago (Professor Aloni Cohen)
Deborah Hellman (University of Virginia School of Law), What is a proxy and does it matter: A few years ago there was a controversy about the Amazon hiring tool that downgraded women applicants because the tool had learned that women were less good software engineers. As reported in the press, the program downgraded resumes with the word “Women” as in “Women’s volleyball team” and also the resumes of candidates from two women’s colleges. If we focus in particular on the women’s college example (suppose it was Smith and Wellesley), should we consider this differently than if the program had downgraded the resumes of candidates who noted “knitting” as a hobby? What if it had downgraded resumes in which the candidate had listed him/herself as the president of a college club which, it turns out, also correlates with sex/gender (suppose women are more likely to seek these offices than men)? The question I am interested in is whether there is a meaningful, normatively significant category of a proxy. A program might use the trait itself (sex, for example) to sort. It might have a disparate impact on the basis of that trait (sex). But is there something in between, in which it uses another trait (attended Smith, likes knitting, was president of the club) that we describe as being a “proxy for sex” such that this description is descriptively and normatively meaningful?
Aloni Cohen (University of Chicago Computer Science), Truthtelling and compelled decryption: The Fifth Amendment to the US Constitution provides individuals a privilege against being “compelled in any criminal case to be a witness against himself.” Courts and legal scholars disagree about how this privilege applies to so-called compelled decryption cases, wherein the government seeks to compel an individual to unlock or decrypt an encrypted phone, computer, or hard drive. A core question is under what circumstances is there testimony implicit in the act of decryption. One answer is that there is no implicit testimony if “the Government is in no way relying on the ‘truthtelling’” of the respondent (Fisher v US, 1976). In ongoing work with Sarah Scheffler and Mayank Varia, we are formalizing a version of this answer and exploring what it suggests about compelled decryption and other compelled computational acts. With this audience, I'd like to discuss (and elicit feedback about) the relationship between our approach and the underlying criminal law context.
Workshop 4: Friday, January 21, 2022, Organized by UCLA (Professor John Villasenor)
Leeza Arbatman (UCLA Law) and John Villasenor (UCLA Engineering and Law), When should anonymous online speakers be unmasked?: Freedom of expression under the First Amendment includes the right to anonymous expression. However, there are many circumstances under which speakers do not have a right to anonymity, including when they engage in defamation. This sets up a complex set of tensions that raise important, and as yet unresolved, questions about when, and under what circumstances, anonymous online speakers should be “unmasked” so that their true identities are revealed.
Priyanka Nanayakkara (Northwestern Computer Science & Communication), The 2020 U.S. Census and Differential Privacy: Surfacing Tensions in Conceptualizations of Confidentiality Among Stakeholders: The U.S. Census Bureau is legally mandated under Title 13 to maintain confidentiality of census responses. For the 2020 Census, the bureau employed a new disclosure avoidance system (DAS) based on differential privacy (DP). The switch to the new DAS has sparked discussion among several stakeholders—including the bureau, computer science researchers, demographers, independent research organizations, and states—who have different perspectives on how confidentiality should be maintained. We draw on public-facing and scholarly reports from a variety of stakeholder perspectives to characterize discussions around the new DAS and reflect on underlying tensions around how confidentiality is conceptualized and reasoned about. We posit that these tensions pinpoint key sources of miscommunication among stakeholders that are likely to generalize to other applications of privacy-preserving approaches pioneered by computer scientists, and therefore offer important lessons about how definitions of confidentiality (as envisioned by different stakeholders) may align/misalign with one another.
Workshop 3: Friday, November 19, 2021, Organized by University of Pennsylvania (Professor Christopher S. Yoo)
Christopher S. Yoo (University of Pennsylvania), Technical Questions Underlying the EU Competition Case Against Android: The EU’s competition law case against Google Android turns on key factual findings about the motivations and limitations faced by the developer community. For example, how do developers decide on which operating systems to launch their apps, and how do they prioritize their efforts if they decide to create versions for more than one? How can we quantify the costs for developers of porting versions of apps for different platforms and forks of the same platform? To what extent does the Chinese market influence app development? To what extent does the Google Play Store compete with Apple’s App Store or Chinese equivalents? To what extent are developers inhibited by requirements that phone manufacturers bundle certain apps? And to what extent do developers benefit from provisions guaranteeing that the platform will provide general-purpose functionality such as clocks and calendars? At the same time, the Google Android case overlooks the inherent tensions underlying open source operating systems, which simultaneously presuppose the flexibility inherent in open source and the rigid compatibility requirements of a modular platform. Is fragmentation a real threat both in terms of software development and consumer adoption, and if so, what steps are appropriate to mitigate the problems it poses? Open source app environments are sometimes criticized as cesspools of malware. Is this true, and if so, what are the appropriate responses? To what extent are operating system platforms justified in overseeing compatibility? This presentation will sketch out how the EU’s case against Google Android frames and answers these questions. Full resolution depends not only on providing answers in this specific case but also on developing more general frameworks for conceptualizing how to address similar questions in the future.
David Clark and Sara Wedeman (MIT), Law and Disinformation: Do They Intersect?: Disinformation (the intentional creation and propagation of false information, as opposed to misinformation, the unintentional propagation of incorrect information) has received a great deal of attention in recent years, with strong evidence of Russian attempts to manipulate elections in the US and elsewhere. The problem has been studied for well over 10 years, with many hundreds of papers from disciplines ranging from journalism to psychology. However, the role of law in combating disinformation is unclear. In this talk, I offer a few dimensions along which law might relate to this issue and invite discussion and clarification. My concern is specifically with online disinformation, propagated through platforms such as Twitter and Facebook. In the U.S., one law that defines the responsibilities and protections for platform providers is Section 230 of the Communications Decency Act. There are now calls for the revision or repeal of this law. However, I do not think the debate around Section 230 is well-formed. I suggest this as a possible topic of discussion. As another dimension of the problem, while financial institutions have a regulatory obligation to Know Your Customer (KYC), platform providers have no such obligation, and unattributed speech on the Internet is the norm. But the anonymity of some forms of speech is protected, as is telling lies. Should platform providers have any responsibilities with respect to disinformation, and if so, of what sort? The solution must be much more nuanced than simple calls for filtering or digital literacy.
Workshop 2: Friday, October 22, 2021, Organized by University of California Berkeley (Professors Rebecca Wexler and Pamela Samuelson)
Sarah Lawsky, Northwestern Law, and Liane Huttner, Sorbonne Law (research in collaboration with Denis Merigoux, Inria Paris CS, and Jonathan Protzenko, Microsoft Research CS). This presentation will describe a new domain-specific programming language, Catala, that provides a tractable, functional, and transparent approach to coding tax law. It will describe benefits of formalizing tax law using Catala, including increased government accountability and efficiency, and speculate about potential compliance issues that could arise from the formalization.
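Catala itself is a literate, domain-specific language designed so that code sits next to the statutory text it implements; the Python sketch below is only a loose analogue, built around a wholly invented provision, to show the rule-plus-exception structure that this kind of formalization makes explicit.

    from dataclasses import dataclass

    @dataclass
    class Taxpayer:
        income: float
        dependents: int

    # Hypothetical provision, invented for illustration:
    #   general rule: a flat 20% rate applies to all income.
    #   exception:    a 15% rate applies if income is under 10,000
    #                 and the taxpayer has at least one dependent.

    def applicable_rate(t: Taxpayer) -> float:
        if t.income < 10_000 and t.dependents >= 1:   # exception displaces the rule
            return 0.15
        return 0.20                                   # general rule

    def tax_due(t: Taxpayer) -> float:
        return round(t.income * applicable_rate(t), 2)

    print(tax_due(Taxpayer(income=9_000, dependents=2)))   # 1350.0
    print(tax_due(Taxpayer(income=50_000, dependents=0)))  # 10000.0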
Catalin Voss and Jenny Hong (Stanford CS), Most applications of machine learning in criminal law focus on making predictions about people and using those predictions to guide decisions. Whereas this predictive technology analyzes people about whom decisions are made, we propose a new direction for machine learning that scrutinizes decision-making itself. Our aim is not to predict behavior, but to provide the public with data-driven opportunities to improve the fairness and consistency of human discretionary judgment. We call our approach the Recon Approach, which encompasses two functions: reconnaissance and reconsideration. Reconnaissance reveals patterns that may show systemic problems across a set of decisions; reconsideration reveals how these patterns affect individual cases that warrant review. In this talk, we describe the Recon Approach and how it applies to California’s parole hearing system, the largest lifer parole system in the United States, starting with reconnaissance. We describe an analysis using natural language processing tools to extract information from the transcripts of 35,105 parole hearings conducted between 2007 and 2019 for all parole-eligible candidates serving life sentences in California. We are the first to analyze all five million pages of these transcripts, providing the most comprehensive picture of a parole system studied to date through a computational lens. We identify several mechanisms that introduce significant arbitrariness into California’s parole decision process. We then ask how our insights motivate structural parole reform and reconsideration efforts to identify injustices in historical cases.
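As a flavor of what a reconnaissance pass over hearing transcripts might look like, here is a minimal, hypothetical extraction step that tallies stated outcomes with a regular expression. The phrasing, patterns, and sample text are invented and are not drawn from the authors' pipeline.

    import re
    from collections import Counter

    # Invented pattern; real transcripts will phrase outcomes differently.
    OUTCOME = re.compile(r"parole is (granted|denied)", re.IGNORECASE)

    def tally_outcomes(transcripts):
        counts = Counter()
        for text in transcripts:
            match = OUTCOME.search(text)
            counts[match.group(1).lower() if match else "unknown"] += 1
        return counts

    sample = [
        "...after deliberation, parole is denied for a period of three years...",
        "...the panel finds the candidate suitable and parole is granted...",
    ]
    print(tally_outcomes(sample))   # Counter({'denied': 1, 'granted': 1})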
Workshop 1: Friday, September 17, 2021, Organized by Northwestern University (Professors Jason Hartline and Dan Linna)
Rebecca Wexler (Berkeley Law), Privacy Asymmetries: Access to Data in Criminal Defense Investigations: This Article introduces the phenomenon of “privacy asymmetries,” which are privacy statutes that permit courts to order disclosures of sensitive information when requested by law enforcement, but not when requested by criminal defense counsel. In the United States’ adversarial criminal legal system, defense counsel are the sole actors tasked with investigating evidence of innocence. Law enforcement has no constitutional, statutory, or formal ethical duty to seek out evidence of innocence. Therefore, selectively suppressing defense investigations means selectively suppressing evidence of innocence. Privacy asymmetries form a recurring, albeit previously unrecognized, pattern in privacy statutes. They likely arise from legislative oversight and not reasoned deliberation. Worse, they risk unnecessary harms to criminal defendants, as well as to the truth-seeking process of the judiciary, by advantaging the search for evidence of guilt over the search for evidence of innocence. These harms will only multiply in the digital economy as private companies collect immense quantities of data about our heartbeats, movements, communications, consumption, and more. Much of that data will be relevant to criminal investigations, and available to the accused solely through the very defense subpoenas that privacy asymmetries block. Moreover, the introduction of artificial intelligence and machine learning tools into the criminal justice system will exacerbate the consequences of law enforcement’s and defense counsel’s disparate access to data. To avoid enacting privacy asymmetries by sheer accident, legislators drafting privacy statutes should include a default symmetrical savings provision for law enforcement and defense investigators alike. Full paper available on SSRN: Privacy Asymmetries: Access to Data in Criminal Defense Investigations.
Jinshuo Dong, Aravindan Vijayaraghavan, and Jason Hartline (Northwestern CS), Interactive Protocols for Automated e-Discovery: