Wednesday, February 24, 2021

A Court Public Data Access Proposal – Yes, but…


My friend Bob Ambrogi asked for comment during last Friday’s Legaltech Week Journalists’ Roundtable (an excellent discussion every week about our court customers). The discussion turned briefly to Jason Tashea's proposal via the initiative titled “Digitizing State Courts, Expanding Access to Justice”.  The following is my partial reply to the question asked.

I applaud Mr. Tashea for thinking about the problem. But I think there is a lot more to consider, and there are additional options.  I explain…


In brief, Mr. Tashea proposes that the US Federal government provide a $1 billion cash infusion to the state courts “to develop and adopt standardized digital infrastructure for courts and other justice agencies” to “allow courts to collect granular raw data, which can help overcome the current backlog, increase access to the justice system, inform policies that drive down mass incarceration, improve transparency and seed a public and private revolution in justice technology that improves access to justice for all Americans.”

Please note that I do not want to discourage any additional court technology funding.  But I believe there are many problems with the paper, which I will discuss below.

First, let me point out that my colleagues at the National Center for State Courts have been working on the Court Statistics Project since I joined the organization 30 years ago.  Any criticism of court statistics without reviewing their work, along with that of the companion project at State Court Organization, is incomplete in my opinion.  See:  In addition, there have been decades of work put into criminal justice information systems by The Search Group and more recently the IJIS Institute.  Their ideas and work also need to be considered.

Second, we need to discuss the scope of Mr. Tashea's proposal as it is unclear.  Criminal cases?  Only felony/serious matters?  What about a misdemeanor/minor violation that escalates into a felony as time passes and more information is known?  And what about civil matters?  Perhaps a civil matter that becomes a criminal matter? Just having statistical count data does not tell the full story as it is fundamentally a snapshot count in courts at a particular moment in time. 

For example, when I managed the Arizona State Court statistical system, we did two counts: one around fifteen days following the first day of the statistical month and the final one around fifty days. This allowed for data to be received and errors to be corrected. Which counts?  Which snapshot?
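To make the snapshot problem concrete, here is a minimal Python sketch. It is an illustration only, not the Arizona system: the case numbers, dates, and the `snapshot_count` function are all hypothetical, and it assumes both snapshots are measured in days from the first day of the statistical month.

```python
from datetime import date, timedelta

# Hypothetical filings for one statistical month:
# (case id, date filed in the local court, date received by the state system)
FILINGS = [
    ("CR-001", date(2021, 1, 5), date(2021, 1, 6)),
    ("CR-002", date(2021, 1, 20), date(2021, 2, 10)),
    ("CV-003", date(2021, 1, 28), date(2021, 3, 1)),
    ("CV-004", date(2021, 1, 30), date(2021, 4, 15)),
]


def snapshot_count(filings, month_start, days_after):
    """Count the month's filings that the state system had received
    by the snapshot date (month_start + days_after)."""
    cutoff = month_start + timedelta(days=days_after)
    return sum(
        1
        for _, filed, received in filings
        if (filed.year, filed.month) == (month_start.year, month_start.month)
        and received <= cutoff
    )


month_start = date(2021, 1, 1)
first_count = snapshot_count(FILINGS, month_start, 15)  # early snapshot
final_count = snapshot_count(FILINGS, month_start, 50)  # "final" snapshot
total_filed = len(FILINGS)                              # what was actually filed

print(first_count, final_count, total_filed)
```

In this made-up data the two snapshots disagree with each other, and both undercount what was actually filed that month, because some records arrive or are corrected after even the final count.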

He further wrote that “(m)any states still operate on paper and have little-to-no digital data,” citing a paper from Ohio judges.  While that could be true in some counties in Ohio, it is simply not true in most of the states, especially those that have built unified systems.  I am happy to discuss this later.

Third, I have a serious issue with the thesis that raw data counts will overcome backlogs, etc.  This simple statement ignores the complexity of legal system data, formats, relationships, and situational changes.  This data is only partially present in the court document and data systems. It also resides in law enforcement, prosecution, and social services (juvenile) depending upon the case type.  If we want to understand the justice system, we must follow a person’s journey from arrest to charging to plea to adjudication to consequences with all the relationships and case processing captured.

As a side note, I have written before that I do not believe raw case count statistics adequately tell the justice system story.  Nor do they particularly help in the management/projections to address backlogs.  Perhaps they help with trends, but I have had other ideas here and here.

Fourth, and the following is harsh, I do not think that turning the court’s data systems via “open API” into Facebook is the answer being sought.  The huge hurdle is, correctly in my opinion, privacy rights.  Having the data mined by outside interests (as, BTW, is already done by credit and data collection companies) can cause more harm than good.  And I would point out that many data/credit data collection companies have lost litigation and paid settlements over bad data.

Fifth, I agree with Mr. Tashea that the really useful data is in the filed documents and forms where he calls for metadata structures to be applied.  Court forms are the most common way that has been used to structure data.  The community started on the metadata definitions in 1998 that eventually became NIEM.  And for more than 20 years the potential for applying document creation metadata markup has unfortunately been a failure (not that it has not been worked on by brilliant people at ).   Instead, forms and AI-type systems such as Gavelytics, which apply post-document data identification and extraction, have been, I believe, a better alternative (BTW, this is where e-discovery system approaches have been helpful).

Further, Mr. Tashea asks, “What risks are created by digitizing justice system records?”

“Regardless of whether this proposal is enacted, cybersecurity will be an issue for courts. Court documents are filled with sensitive information, including names of confidential informants, information about children, and people’s mental health histories. If courts fail to protect people’s data, it will erode trust in the justice system, which in turn undermines the rule of law. For any court, regular, third-party security and privacy audits of these systems should be non-negotiable.”

Sixth, I agree with the need in the statement above.  But I disagree that this problem can be overcome with security and privacy audits.  Every audit system that I have ever worked with has specifically defined rules, such as those set by the Governmental Accounting Standards Board (GASB).  We do have legal privacy rules that have been defined by the judicial systems but really no standards as to how they are applied to the real world.  So here are a couple of ideas, as I think that criticism of an idea requires discussion of potential solutions.

My potential solutions include the ability for courts to provide a way for parties to opt in to the use of their court data by external entities.  The courts must create specific "algorithm rules" that the AI systems can apply to their document mining systems.  Some of those rules will be restrictions on the use of names, addresses, and especially records that have been sealed and expunged.  This in turn means that the courts must provide a secure way to post and share those algorithm rules with the external parties. The expectation is that when a record is sealed, the external parties will remove it from their current and archival systems.  This may require data to be shared in signed/certificate data packets or, as we use in courts, documents.  Those documents will need a license/certificate to be used and tracked so that as situations change, the data can be updated or expunged.
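As a rough sketch of what such "algorithm rules" might look like to an external party, here is a small Python example. Everything in it is hypothetical: the `CourtRecord` and `AlgorithmRules` classes, the field names, and the status values are my own illustration, not an existing standard or any court's actual rule format. It also leaves out the signing and certificate tracking entirely.

```python
from dataclasses import dataclass, field


@dataclass
class CourtRecord:
    case_id: str
    status: str      # hypothetical values: "open", "sealed", "expunged"
    opted_in: bool   # did the party opt in to external use of the data?
    fields: dict = field(default_factory=dict)


@dataclass
class AlgorithmRules:
    # Rules a court might publish (assumed shape, for illustration only)
    blocked_statuses: frozenset = frozenset({"sealed", "expunged"})
    redacted_fields: frozenset = frozenset({"name", "address"})


def apply_rules(records, rules):
    """Release only opted-in, unsealed records, with restricted fields removed."""
    released = []
    for rec in records:
        if not rec.opted_in or rec.status in rules.blocked_statuses:
            continue
        visible = {k: v for k, v in rec.fields.items()
                   if k not in rules.redacted_fields}
        released.append(CourtRecord(rec.case_id, rec.status, rec.opted_in, visible))
    return released


def purge_archive(archive, records, rules):
    """Drop from an external party's archive any case now sealed or expunged."""
    blocked = {r.case_id for r in records if r.status in rules.blocked_statuses}
    return {cid: data for cid, data in archive.items() if cid not in blocked}


rules = AlgorithmRules()
records = [
    CourtRecord("CV-100", "open", True,
                {"name": "A. Party", "address": "1 Main St", "case_type": "civil"}),
    CourtRecord("CR-200", "sealed", True,
                {"name": "B. Party", "case_type": "criminal"}),
    CourtRecord("CV-300", "open", False,
                {"name": "C. Party", "case_type": "civil"}),
]
released = apply_rules(records, rules)

# An external archive that still holds a copy of a case sealed later
archive = {"CV-100": {"case_type": "civil"}, "CR-200": {"case_type": "criminal"}}
purged = purge_archive(archive, records, rules)
```

Here only the opted-in, unsealed case is released, with the name and address stripped, and the sealed case is removed from the external archive. The hard part in practice is not this filtering logic but getting the rules and status changes to every external holder of the data, and verifying that they were applied.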

Obviously, none of this is easy.  I look forward to the continuing discussion. 
