Arya Hariharan is an undergraduate student in the Department of Computer Science and Engineering at RV College of Engineering (RVCE), Bengaluru, India (2022-2026 batch). During her time at the college, Arya has worked on several industry projects with organizations such as HPCC Systems, Nokia Corporation and Samsung Research India, covering domains including machine learning, computer networking and generative AI. Curious and goal-driven, she is open to new opportunities and ready to learn about things she is unfamiliar with.

Poster Abstract

Lawyers and legal professionals often use digital databases to research and build strong cases, provide accurate legal advice and stay informed about legal changes. However, the volume of data can be overwhelming, and finding relevant references requires strong research skills. To address this issue, we propose an application that enhances legal research through data enrichment using HPCC Systems.

The application uses Natural Language Processing (NLP), specifically Named Entity Recognition (NER), to extract keywords from a case abstract entered by the user. It can recognize up to 12 different legal entities in any given abstract. These keywords are then used to search for case statements in a main cases dataset. The extracted keywords and their corresponding case statements are stored in a separate dataset containing 1,200,000 keywords. This dataset was sprayed onto the HPCC Systems cluster, where it was indexed for efficient reference searches. A full keyed join was performed between the keywords and the main indexed dataset to retrieve relevant references with one or more matches. The distributed architecture of HPCC Systems enabled parallelism, which improved search efficiency. Users interact with the application through a web interface, and once search and retrieval are complete, the relevant references are displayed through the same interface.
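The retrieval flow described above can be sketched in a few lines of Python. This is only an illustration, not the application's code: the real system runs NER and a keyed join on an HPCC Systems cluster, whereas this sketch substitutes a fixed term list for NER and an in-memory dictionary for the index. The names `CASES`, `LEGAL_TERMS`, `extract_keywords`, `build_index` and `search`, and the sample case statements, are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data; the real application searches a cases dataset
# stored and indexed on an HPCC Systems cluster.
CASES = {
    1: "The tenant sued the landlord for breach of contract over unpaid repairs.",
    2: "The court found negligence after the driver ignored a traffic signal.",
    3: "An employee alleged breach of contract and negligence by the employer.",
}

# Stand-in for the NER step: a fixed list of legal terms (an assumption).
LEGAL_TERMS = ["breach of contract", "negligence", "landlord", "employer"]


def extract_keywords(text):
    """Return the known legal terms that occur in the text (not real NER)."""
    lowered = text.lower()
    return [term for term in LEGAL_TERMS if term in lowered]


def build_index(cases):
    """Map each keyword to the set of case ids whose statement contains it."""
    index = defaultdict(set)
    for case_id, statement in cases.items():
        for term in extract_keywords(statement):
            index[term].add(case_id)
    return index


def search(abstract, index):
    """Look up each keyword of the abstract in the index and rank the
    matching cases by how many keywords they share with the abstract."""
    matches = defaultdict(int)
    for keyword in extract_keywords(abstract):
        for case_id in index.get(keyword, ()):
            matches[case_id] += 1
    # Most matches first; ties broken by case id for a stable order.
    return sorted(matches.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(CASES)
results = search(
    "Client claims negligence and breach of contract by her employer.", INDEX
)
print(results)
```

Each result pairs a case id with the number of matching keywords, so any case with one or more matches is returned, mirroring the "one or more matches" behaviour described in the abstract.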

The application outperforms comparable implementations built with Hadoop and with multithreaded Python. When tested on the same dataset using a single-node local cluster, the HPCC Systems application had an average latency of 1.7 seconds, while Hadoop had an average latency of 6 seconds. A multithreaded Python application was significantly slower, averaging 13 seconds to search a dataset of 300 records using five keywords.

The proposed application enhances legal research efficiency and accuracy by using NLP for keyword extraction and leveraging HPCC Systems for rapid data retrieval, ensuring quick and relevant reference searches. Among those who will benefit from this application are lawyers who wish to simplify the task of finding relevant legal references, academics and law students who usually conduct extensive research, and legal organizations overall.

Presentation

In this video recording, Arya provides a tour and explanation of her poster content.

Enhancing Legal Assistance through Data Enrichment with HPCC Systems:

Arya Hariharan - 2024 Poster Contest Resources