A couple of weeks ago I shared some of “my problems with pending case statistics”. Before that, I posted another note regarding an alternative for analyzing criminal justice data. I generally try not to complain about things without having a solution in mind. In this article, I will share the idea of using text analytics to work with a court’s largest data source: case documents and reports.
One might say, yes, it would be great if my computer could read and count things in court documents. But the reality has generally been that court documents are not in a format that computers can read, categorize, and count. My hope was that e-filing would reduce this problem, and it has. But now, thanks to AI, there are other possible solutions.
First, I have been monitoring the work being done by the “legal technology” community on “e-discovery” systems as part of a possible solution. Along the way, I recently stumbled across an article by a lawyer, Craig Ball, who wrote about possibly using the “Google Pinpoint” service for “e-discovery”. He wrote:
“(A) glimmer of hope crept over the
transom today as I dragged and dropped a container file holding 50,000 e-mail
messages into a free Google tool called Pinpoint.
Within minutes, Google converted
the emails to PDFs and ran optical character recognition (OCR) against embedded
imagery. I quickly realized that
Pinpoint hadn’t processed email attachments, so I grabbed the native
attachments and pointed Pinpoint to them.
The attachments uploaded, images were OCR’ed and audio files were
transcribed! Even handwritten items were
converted to searchable text!”
What? WHAT! We can get typed and handwritten documents converted to searchable text, and audio files transcribed? That is a huge barrier overcome, which is not surprising, since consuming data is what Google does.
Microsoft, of course, also has a cool tool for this kind of transformation called Computer Vision: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
And some of you thought there wasn’t much benefit to AI?
Second, how can we count things in documents? The approach is called text analytics or text mining. And, unsurprisingly, thought has already been put into applying it to criminal justice case matters.
A paper titled “Text Mining on Criminal Documents” was published in the International Journal of Advances in Electronics and Computer Science in 2016. This paper describes the concepts and gives some examples of how text mining can be used to count items (one could look at the case caption text, for example) and to capture decision-related data. I would also want to find and count relationship data between documents and between different or similar cases.
Many court documents have standard elements, such as the case title and case caption, that alone can be used to identify and count case events and actions. Court documents also include statutory and case references in specific formats that legal research systems have relied on for literally decades.
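As a rough illustration of counting from those formats, here is a minimal Python sketch. It assumes a caption written as "PLAINTIFF v. DEFENDANT" on its own line and federal citations written as "18 U.S.C. § 1343". The sample text, the patterns, and the function names are all made up for this example; real filings vary by jurisdiction and would need patterns tuned to local practice.

```python
import re
from collections import Counter

# Hypothetical sample text; real filings vary widely by jurisdiction.
SAMPLE = """\
STATE OF EXAMPLE v. JOHN DOE
Case No. 21-CR-00123
The defendant is charged under 18 U.S.C. § 1343 and 18 U.S.C. § 1956.
The court also considered 18 U.S.C. § 1343 in the companion matter.
"""

# Assumed caption shape: "PLAINTIFF v. DEFENDANT" on a single line.
CAPTION_RE = re.compile(
    r"^(?P<plaintiff>.+?)\s+v\.\s+(?P<defendant>.+)$", re.MULTILINE
)
# Assumed citation shape: "<title> U.S.C. § <section>".
USC_RE = re.compile(r"\b(\d+)\s+U\.S\.C\.\s+§\s*(\d+[a-z]?)")


def parse_caption(text):
    """Return (plaintiff, defendant) from the first caption-like line, or None."""
    match = CAPTION_RE.search(text)
    if match is None:
        return None
    return match.group("plaintiff").strip(), match.group("defendant").strip()


def count_citations(text):
    """Count how often each U.S.C. citation appears in the text."""
    return Counter(
        f"{title} U.S.C. § {section}" for title, section in USC_RE.findall(text)
    )


if __name__ == "__main__":
    print(parse_caption(SAMPLE))
    for citation, count in count_citations(SAMPLE).most_common():
        print(citation, count)
```

Run across a folder of OCR'ed documents, counts like these could be rolled up by case type or by statute, which is the kind of tally a case management database often cannot provide.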
Another thought leader in this space is Dr. Uwe Ewald, Director of the International Justice Analysis Forum. Dr. Ewald has, for example, applied text analytics to cases at the UN Criminal Tribunal in The Hague. His current work uses the Provalis Research text analytics software from Canada. While this is a proprietary software application, it provides a good example of what is possible.
Third, it is always helpful to count the same things in the same manner (at least for a specific jurisdiction or research program). To that end, we have some thoughts from the erudite Margaret Hagan of Stanford University. She has written about the legal taxonomy they have been developing, known as LIST. Might we also use this with text analytics/mining for privacy protection? If we can find the data or text that should be protected, that could help in this area.
She explains:
“LIST is a taxonomy of legal
issues, needs, and situations that people may face. It matches people’s life
situations to standard legal terms and codes. Stanford Legal Design Lab
maintains LIST.
LIST provides standard codes to use
in your civic and legal technology projects. It also maps to other legal
dictionaries and problem code taxonomies.
App & bot developers can use
LIST codes to encode people’s inputs and their responses.
If you build bots, conversational
agents, and other apps that go back-and-forth with users, then the LIST
taxonomy codes can help you tag what people are asking for help with. And you
can similarly encode the resources and links you’re offering to your users.”
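To make the idea of tagging text with taxonomy codes concrete, here is a minimal keyword-matching sketch in Python. The codes and keyword lists below are placeholders invented for this example and are not actual LIST codes; a real project would load the published taxonomy and would likely use something more robust than keyword lookup.

```python
import re
from collections import Counter

# Placeholder codes and keywords invented for illustration.
# These are NOT actual LIST codes.
HYPOTHETICAL_CODES = {
    "EX-HOUSING": ["eviction", "landlord", "lease"],
    "EX-FAMILY": ["custody", "divorce", "child support"],
    "EX-MONEY": ["debt", "garnishment", "bankruptcy"],
}


def tag_text(text):
    """Count keyword hits per placeholder code (case-insensitive, whole words)."""
    lowered = text.lower()
    tags = Counter()
    for code, keywords in HYPOTHETICAL_CODES.items():
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                tags[code] += hits
    return tags


if __name__ == "__main__":
    sample = "My landlord filed an eviction after the divorce."
    print(tag_text(sample).most_common())
```

Because every document ends up tagged with codes from the same list, counts from different courts or research programs can be compared like for like.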
Last, how does this subject relate to privacy? Transforming and text mining the documents will allow courts to locate the documents, and the data within them, that need protection, and then to build systems that control what information can be made accessible, when, and how. The “Best Practices for Court Privacy Policy Formation” report can provide guidance on how text mining tools might be configured and applied to that need.
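As a small sketch of what "finding the data that should be protected" might look like, the Python below flags and masks a few kinds of personal identifiers. The three patterns (US-style Social Security numbers, phone numbers, and dates of birth labeled "DOB" or "Date of Birth") and the sample sentence are assumptions made for this example. Which data elements actually need protection is a policy decision for each court, and production redaction would need far more careful rules and human review.

```python
import re

# Illustrative patterns only; the set of protected data elements is a
# policy decision and these formats are assumptions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DOB": re.compile(
        r"\b(?:DOB|Date of Birth):?\s*\d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE
    ),
}


def find_sensitive(text):
    """Return (label, start, end) for each match, ordered by position."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.start(), match.end()))
    return sorted(findings, key=lambda finding: finding[1])


def redact(text):
    """Replace each match with a bracketed placeholder naming its type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    sample = (
        "Defendant (DOB: 01/02/1980, SSN 123-45-6789) "
        "may be reached at 555-867-5309."
    )
    print(find_sensitive(sample))
    print(redact(sample))
```

The list of findings could feed a review queue, while the redacted text could be what a public access portal displays.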
In conclusion, we can now get the data into a format to which we can apply text mining tools. This rich set of data goes far beyond what is possible with our court case management system databases. Therefore, we should add this to our toolboxes for legal, policy, and sociological analytics.
--
Notes:
A list of Text Analysis tools is available at: https://monkeylearn.com/blog/text-analysis-tools/
Prof. Erich Schweighofer of the University of Vienna has been thinking and writing about this for several decades. Luckily, much of his work is now available via Google Scholar at: https://scholar.google.com/citations?user=GuNftZsAAAAJ&hl=en
Excellent information Jim. I hope these approaches can be embraced by courts and they can carve out a bit of R&D to mine such data (which leads to information, then knowledge, results and then review). Unfortunately for many in government though, we don't always find this leading/bleeding edge appetite to fund such positions.