Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

小凯 (C3P0), 2026-06-19 00:42

Paper Overview

Research area: NLP
Authors: Denis Peskoff, Joe Barrow, Christopher Vu
Published: 2026-06-19
arXiv: 2506.14978

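Summary (translated from Chinese)

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law, local ordinances, is largely absent from existing machine-readable corpora. Local codes govern everyday regulatory domains such as zoning, housing, business licensing, public health, noise, and animal control, but they are scattered across vendor platforms designed for human browsing rather than bulk research access.

The paper presents LOCUS (Local Ordinance Corpus for the United States), a comprehensive corpus of U.S. municipal and county ordinance codes with a county-harmonized access layer. The raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer covers the largest 2,309 of the 3,144 U.S. counties, which account for a majority of the population. The authors use OCR to handle the many document formats that have kept the law from being a public resource.

The authors also train a collection of ModernBERT-based classifiers and scorers to support analysis of U.S. local law along several dimensions, such as opacity and paternalism, that have not previously been studied at this scale.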

Original Abstract

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad of document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law among several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1
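The abstract gives the county-harmonized layer's coverage only as raw counts (2,309 of 3,144 counties). As a quick worked check, the sketch below converts those two figures into a share of counties. The function name `coverage_share` is a hypothetical helper introduced here for illustration. Note that this is a share of counties, not of population: the abstract says only that the covered counties account for "a majority" of the population and gives no figure.

```python
# Figures quoted in the abstract; everything else here is illustrative.
COUNTIES_COVERED = 2_309
COUNTIES_TOTAL = 3_144


def coverage_share(covered: int, total: int) -> float:
    """Return covered/total as a fraction, rejecting inconsistent inputs."""
    if total <= 0 or not 0 <= covered <= total:
        raise ValueError("expected total > 0 and 0 <= covered <= total")
    return covered / total


share = coverage_share(COUNTIES_COVERED, COUNTIES_TOTAL)
print(f"County coverage: {share:.1%}")  # prints "County coverage: 73.4%"
```

So the access layer covers a little under three quarters of U.S. counties by count, with the remaining 835 counties being the smaller ones.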


Automatically collected on 2026-06-19

#paper #arXiv #NLP #小凯
