
Friday, September 06, 2013

Ind. Courts - Looking further into the JTAC imposition of anti-robot/CAPTCHA verification requirement on the public

The ILB has written several posts this week about the sudden imposition of CAPTCHA on public access to the Odyssey Courts Case Search system, also sometimes referred to as "MyCase". The most recent post concluded:

Put another way, JTAC created a system with public funds, allowed everyone to use it for a while to see how useful it could be, then intentionally crippled the public's access to it, making the system impossible to use for any real work.

Last evening I received this response from Kathryn Dolan, Indiana Supreme Court, Public Information Officer:

Regarding the CAPTCHA discussion: 1) The Supreme Court is absolutely not intentionally crippling the public’s access to the system and 2) It is not impossible to use the free Odyssey system. Users simply have to enter the code so we can prevent data mining. Corporate entities, like your sponsor DoxPop, can work with us to get bulk data [http://www.in.gov/judiciary/admin/2459.htm] for a fee. Those companies can then resell the information. Odyssey is designed to give court information to court officers and members of the public for free.

Your allegation of intentional sabotage is very disappointing and beyond outrageous.

The following information is being provided to our users via twitter.com/incourts and Bits and Bytes Blog: Having some problems w Odyssey case search? See this info [http://indianacourts.us/blogs/jtac/2013/09/05/information-on-using-odyssey-free-access/]

ILB: To start, I want to make two points: The ILB did a lot of research last evening on the CAPTCHA issue, and the results follow.

There actually seem to be two issues. The CAPTCHA "feature" apparently is not working right, and JTAC (it is hard to know whom to refer to; I know it is not the Supreme Court that is working on these issues, so I will use the term JTAC) is working them out. There are compatibility problems with some browsers, and the CAPTCHA letters/numbers are evidently even more blurry than need be, although I'm told if you elect the "mike" function they clear up some.

But the issue the ILB has been writing about, joined by email messages from attorneys and journalists, is the sudden imposition by JTAC of the CAPTCHA requirement on every case search the public (anyone not on the court version of Odyssey) attempts.

Here I have not received any explanation from Ms. Dolan, except for assertions of "data mining."

So I turned elsewhere for answers. As it happens, one of the best-informed non-state experts on Odyssey is Ray Ontko, president of the ILB supporter, Doxpop, LLC. I sent him a note last evening, asking about data mining. The response:

The term "data mining" (http://en.wikipedia.org/wiki/Data_mining) is probably not the correct term. What they're trying to prevent is often called "web scraping" (http://en.wikipedia.org/wiki/Web_scraping), and is related to "screen scraping" and the general term "data scraping".

The judiciary is claiming that there are folks using automated programs that simulate the actions of a user to do extensive searching of mycase.in.gov to systematically collect information. When done at top speed, it can simulate the presence of a large number of users and may cause the system to appear slow to other users.

Doxpop does NOT engage in this practice. We were the first (of perhaps very few) that use the bulk data service offered by JTAC to obtain access to the information in a way that does not degrade the performance of the system. We pay handsomely for this (over $125,000 per year).

BTW, we are not permitted to resell the information in bulk, no matter how we reformulate or enhance the information. We are only allowed to provide access to the information in response to specific queries (e.g., by name or case or date).

As an aside, the ILB remembers when the Supreme Court, by order, finally agreed to make Odyssey bulk data available to third party vendors. Until then, as I wrote in this Sept. 14, 2011 post, as "Odyssey conversion expand[ed] to additional counties, it kick[ed] offline long-time third-party information providers such as Doxpop by not allowing such providers access to the case management data generated by the new Odyssey system." The change, which took another year to implement, allows Doxpop to purchase data from Odyssey counties and add them to its own extensive network, so that Doxpop now covers 86 counties.

The ILB sent Mr. Ontko a follow-up question:

Your normal law office or reporter sits down and looks to do a number of searches at once. As for "the scrapers", do you know what kind of entity they are and what they are looking for, case info, or mailing list type info, etc? But there has to be some happy medium for the others. I'm thinking that CAPTCHA should apply per session, rather than per search, for instance. Thoughts?

Of course, this question was somewhat adverse to Mr. Ontko's company's interests, since the more difficult and time-consuming the public system is to use, the more people who may subscribe to Doxpop for their information needs. But he courteously didn't mention that and sent this answer:

I'm not sure who the scrapers are. They could be background checking firms that are looking up people of interest to their clients who may be interested in criminal history or driving violations. They could be web site operators that allow searching across many websites for information about a person. They could be enterprising attorneys looking for unrepresented parties in their specialty or region of practice. They could be mailing list companies building lists of addresses. People can be very creative in the use of public information.

JTAC could have implemented the CAPTCHA (http://en.wikipedia.org/wiki/CAPTCHA) mechanism for mycase.in.gov in such a way that the CAPTCHA would only apply once per session instead of once per search. But, that would also make it easier for an automated program to circumvent the CAPTCHA; one could have a human provide the answer to the CAPTCHA at the beginning of the program and then the program would run without hurdles after that. By doing it once per search, the CAPTCHA becomes a practical obstacle for web scraping under most circumstances. Unfortunately, it also creates inconvenience for ordinary human users.
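To put rough numbers on the trade-off Mr. Ontko describes, here is a minimal Python sketch. It is not how mycase.in.gov works; the function name `captcha_prompts` and its parameters are hypothetical, invented only for illustration. It counts how many times a human would have to solve a CAPTCHA to run a given number of searches under each policy.

```python
from typing import Optional


def captcha_prompts(num_searches: int, searches_per_captcha: Optional[int]) -> int:
    """Count the CAPTCHAs a human must solve to run num_searches searches.

    searches_per_captcha=1 models "once per search".
    searches_per_captcha=None models "once per session".
    Any other positive value models "once every N searches".
    (Hypothetical model for illustration; not JTAC's implementation.)
    """
    if num_searches <= 0:
        return 0
    if searches_per_captcha is None:
        # A single solved CAPTCHA unlocks the whole session.
        return 1
    if searches_per_captcha < 1:
        raise ValueError("searches_per_captcha must be at least 1")
    # Ceiling division: one CAPTCHA per started block of searches.
    return -(-num_searches // searches_per_captcha)


if __name__ == "__main__":
    for label, n in (("law office, 5 searches", 5), ("scraper, 10,000 searches", 10_000)):
        per_search = captcha_prompts(n, 1)
        per_session = captcha_prompts(n, None)
        print(f"{label}: per search = {per_search}, per session = {per_session}")
```

Under this simple model, a per-session policy costs the scraper a single human-solved CAPTCHA no matter how many searches follow, which is the circumvention Mr. Ontko points to. A per-search policy costs the scraper one CAPTCHA per record, but it charges the five-search law office the same per-search price.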

The earlier ILB posts on this issue quoted some small law offices and journalists who have come to rely on Odyssey, but now are finding it a real pain to use. I don't often have a need to use it myself; my last search was during Charlie White's trial in Hamilton County. So I asked the office assistant in a one-man law firm -- "What do you use the system for?":

To check on Court dates;

To see when licenses have been (and not been) sent into the BMV for suspension so you can let the client know if he/she can drive at this point;

When a potential client calls in saying they have been charged with X (usually they are wrong);

When a potential client calls in to see if they have other cases pending (or even if they already have an attorney and are shopping or if they have a court appointed lawyer that they have "forgotten" about);

When a potential client calls in to check which Court the case has been assigned to (since, of course, we have our favorites);

To see what action the Court might have taken on a particular case (we get the paperwork, but, especially when other counties are involved, it is a little slow sometimes);

Sometimes to see what sentence has been on a similar case;

To check to make sure what has been decided in Court or by plea is what was entered into the record.

I know there are other things, but this should give you an idea.

Forgot. We also check for warrants.

I might file 3-5 appearances and continuances at a time. I fax in the request and then wait for the court to take action. Meanwhile one of the first things I do each morning is to call up these people on Odyssey (and people who did not have their license suspended yet) and see when their case is set for so I can send them notification. I do them all at once.

A person must enter the Captcha "word" first and then the case desired. If one misreads and/or mistypes, then the entire process must be repeated from the start. The same is true to locate a second case in the same county. Previously all that was needed was to just hit the back button and enter another search. However, now it is back to square one again.

ILB Conclusion. So, what is the answer? Something other than CAPTCHA, such as screening out certain IP addresses? Or adjusting CAPTCHA so that it is only required each session, or maybe after a dozen searches? Or ...?
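For readers curious what "something other than CAPTCHA" might look like in practice, one common approach is to limit how many searches a single IP address may run within a time window. The Python sketch below is purely illustrative: the class name `SlidingWindowLimiter`, the limits, and the example addresses are all assumptions, and nothing here reflects what JTAC actually runs.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests searches per client within window_seconds.

    Hypothetical illustration only; not a description of the Odyssey system.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client IP to the timestamps of its recent allowed searches.
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str, now: float) -> bool:
        """Return True if this search may proceed, False if it is over the limit."""
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed limit: a dozen searches per minute per address.
    limiter = SlidingWindowLimiter(max_requests=12, window_seconds=60.0)
    results = [limiter.allow("192.0.2.1", float(t)) for t in range(15)]
    print(results.count(True), "allowed;", results.count(False), "refused")
```

A limit like this would leave an office assistant checking a handful of cases each morning untouched while slowing a program that fires off searches at top speed. It has its own weaknesses: several people in one office may share a single address, and a determined scraper can spread requests across many addresses.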

Let me know your thoughts and experiences.

Posted by Marcia Oddi on September 6, 2013 10:51 AM
Posted to Indiana Courts