Data & Analytics
Description of the data model, relationships between entities, and data flow diagrams.
Overview
Transactional data in the platform is stored in an RDBMS (PostgreSQL). Data for analytics is enriched and pipelined into Elasticsearch indexes.
Persistence
PostgreSQL is used as the transactional database and Elasticsearch as the analytical store. The platform is cloud-agnostic and can be deployed on cloud or on-premise infrastructure.
Note:
Partitioning and sharding strategies for optimizing query performance are to be decided during implementation, based on business needs and data volumes.
Data replication for high availability is also to be decided during implementation. The general recommendation is to keep backups in different regions for disaster recovery (DR).
Logical Data Architecture
Data Sources
All services emit data when the entities they manage are created or edited. The data is pushed to a Kafka topic; the topics each service emits to are configured in the respective services.
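As a minimal illustration of this pattern (not taken from the actual service code), the sketch below shows a plain Kafka producer emitting a case event. The topic name save-case-application comes from the Events section below; the broker address, class name, key, and payload fields are assumptions made purely for illustration.

```java
// Hypothetical sketch: a DRISTI-style service emitting an entity event to its
// configured Kafka topic when a case is created. Payload shape is illustrative.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CaseEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // In the platform the topic name comes from service configuration;
        // it is hard-coded here only for illustration.
        String topic = "save-case-application";
        String payload = "{\"caseId\":\"CASE-2024-001\",\"status\":\"DRAFT\"}"; // illustrative payload

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by entity id so all events for the same case land on one partition
            producer.send(new ProducerRecord<>(topic, "CASE-2024-001", payload));
            producer.flush();
        }
    }
}
```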
Events
DRISTI follows event-driven architecture principles. All services publish data to Kafka topics, which are consumed by other services for further downstream processing. This section summarises the topics and the services that publish to or consume from them, with descriptions and the relevant index names (where applicable).
A formal eventing system with schema registries and versioning is on the roadmap.
| Service | Publishes / Consumes | Topics | Description | Index |
| --- | --- | --- | --- | --- |
| Dristi Event Processor (Analytics) | Consumes from | save-wf-transition | Workflow transitions published by the Workflow service. | - |
| Application | Publishes to | save-application-transform, update-application-transform | Publishes transformed application data. | application-index |
| Case | Publishes to | save-case-application, update-case-application, litigant-join-case, representative-join-case | Full payloads of case objects in save-case-application and update-case-application; partial payloads with litigant/representative details in the other topics. | - |
| Case | Consumes from | case-overall-status-topic, case-outcome-topic | Consumes analytics data to update case information. | - |
| Hearing | Consumes from | update-case-status-application | Consumes workflow status updates from the Case service. | - |
| Indexer | Consumes from | save-case-transform, update-case-transform, save-order-transform, update-order-transform, save-task-transform, update-task-transform | Reads enriched data from the Transformer service and pushes it to Elasticsearch. | case-index, order-index, task-index |
| Order | Publishes to | save-order-application-transform, save-order-case-transform | Publishes transformed order data, enriched by the Transformer service. | order-index |
| Persister | Consumes from | save-case-application, update-case-application, litigant-join-case, representative-join-case, update-order-application, update-order-transform, save-task-application | Processes and persists case, order, and task data into the database. | - |
| Scheduler | Publishes to | schedule-hearing, update-schedule | Publishes scheduling information for hearings. | - |
| Task | Publishes to | save-task-application, update-task-transform | Publishes task-related events after enrichment. | task-index |
| Transformer | Publishes to | save-application-transform, save-case-transform, save-hearing-transform, save-order-application-transform, update-application-transform, update-case-transform, update-hearing-transform, update-order-transform, update-task-transform | Publishes enriched application, case, hearing, order, and task data. The Indexer reads these and updates the corresponding indices in Elasticsearch. | application-index, case-index, hearing-index, order-index, task-index |
Here's another view of the same matrix for easier reference. This matrix shows which services publish to ("P") or consume from ("C") specific topics. If a service publishes to a topic and updates an index, the index name is included in parentheses. If there is no interaction, the cell is marked with "-".
| Topic | Case | Application | Hearing | Scheduler | Order | Task | Persister | Transformer | Indexer | Dristi Event Processor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| save-case-application | P | - | - | - | - | - | C | C | - | - |
| update-case-application | P | - | - | - | - | - | C | C | - | - |
| update-case-status-application | P | - | - | - | - | - | C | - | - | - |
| litigant-join-case | P | - | - | - | - | - | C | - | - | - |
| representative-join-case | P | - | - | - | - | - | C | - | - | - |
| case-overall-status-topic | C | - | - | - | - | - | C | - | - | P (case-index) |
| case-outcome-topic | C | - | - | - | - | - | C | - | - | P (case-index) |
| update-order-application | - | - | - | - | P | - | C | C | - | - |
| save-order-transform | - | - | - | - | - | - | - | P (order-index) | C | - |
| update-order-transform | - | - | - | - | - | - | - | P (order-index) | C | - |
| save-task-application | - | - | - | - | - | P | C | C | - | - |
| update-task-application | - | - | - | - | - | P | C | C | - | - |
| save-task-transform | - | - | - | - | - | - | - | P (task-index) | C | - |
| update-task-transform | - | - | - | - | - | - | - | P (task-index) | C | - |
| save-wf-transition | - | - | - | - | - | - | - | - | - | C |
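To make the publish/consume pattern in the matrix concrete, below is a hedged sketch of the Transformer-style hop in the pipeline: consume a raw case topic, enrich it, and republish to the *-transform topic that the Indexer picks up. The topic names come from the tables above; the broker address, consumer group, and the pass-through "enrichment" are placeholders, not the actual Transformer logic.

```java
// Hedged sketch of the consume-enrich-republish pattern: read raw case events,
// derive an enriched payload, and publish to the transform topic for the Indexer.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class CaseTransformerSketch {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");   // assumed broker
        consumerProps.put("group.id", "case-transformer");          // assumed consumer group
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(List.of("save-case-application", "update-case-application"));

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Placeholder enrichment: the real Transformer flattens and enriches
                    // the case payload before it reaches the Indexer.
                    String enriched = record.value();
                    String outTopic = record.topic().startsWith("save")
                            ? "save-case-transform" : "update-case-transform";
                    producer.send(new ProducerRecord<>(outTopic, record.key(), enriched));
                }
            }
        }
    }
}
```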
Data Flows
The underlying DIGIT platform allows asynchronous writes to the database via Kafka, and most DRISTI services use this model to write to their databases. Where a service needs to maintain ACID guarantees, it can write directly to the database, bypassing the persister; for example, payments and billing data is written directly to the RDBMS.
Reads happen from the RDBMS or Elasticsearch via search APIs.
Certain services also write to and read from Elasticsearch indexes directly. Refer to the specific service documentation for details.
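As a rough sketch of the two write paths (not taken from the actual service code), the snippet below contrasts an asynchronous write, where a service publishes to a persister topic, with a direct transactional JDBC write of the kind a payments/billing flow would use. The connection string, table, and column names are invented for illustration; the topic name comes from the matrix above.

```java
// Illustrative contrast of the two write paths: most services publish to a
// persister topic and let the persister write to PostgreSQL asynchronously,
// while flows that must stay ACID write to the RDBMS directly in a transaction.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WritePathsSketch {

    // Path 1: asynchronous write - emit the record and let the persister handle the SQL.
    static void writeViaPersister(KafkaProducer<String, String> producer, String json) {
        producer.send(new ProducerRecord<>("save-task-application", json)); // topic from the matrix above
    }

    // Path 2: direct, transactional write for flows that must be ACID (e.g. billing).
    static void writeDirectly(String paymentId, long amountPaise) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/dristi", "user", "secret")) { // assumed connection details
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO eg_payments (payment_id, amount) VALUES (?, ?)")) { // hypothetical table
                ps.setString(1, paymentId);
                ps.setLong(2, amountPaise);
                ps.executeUpdate();
                conn.commit();               // all-or-nothing within one transaction
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```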
Data flow for READS
Reads are enabled directly from the RDBMS or the analytical store.
TBD
Data flow for WRITES
The persister service subscribes to selected topics via configuration files. Data is read from these topics, translated into SQL, and written to the relational database.
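The sketch below approximates what the persister does for a single topic, assuming a JSON payload with a cases array and an invented dristi_cases table; in reality the topic-to-SQL mapping lives in the persister's configuration files rather than in code.

```java
// Assumed sketch of the persister's job for one topic: read the message, pick
// the fields named in its mapping configuration, and run the corresponding INSERT.
// Topic, payload shape, table, and columns here are illustrative only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PersisterSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");       // assumed broker
        props.put("group.id", "persister");                     // assumed consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/dristi", "user", "secret")) { // assumed connection

            consumer.subscribe(List.of("save-case-application"));

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    JsonNode cases = mapper.readTree(record.value()).path("cases"); // assumed payload shape
                    for (JsonNode c : cases) {
                        try (PreparedStatement ps = conn.prepareStatement(
                                "INSERT INTO dristi_cases (id, case_number, status) VALUES (?, ?, ?)")) { // hypothetical table
                            ps.setString(1, c.path("id").asText());
                            ps.setString(2, c.path("caseNumber").asText());
                            ps.setString(3, c.path("status").asText());
                            ps.executeUpdate();
                        }
                    }
                }
            }
        }
    }
}
```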
The indexer service (if configured for a service) upserts data into the analytical store (Elasticsearch), also driven by configuration files. The attributes to be written to the analytical store and the final payload format are encoded in the indexer configuration. The indexer also performs basic enrichment (workflow, master data, etc.) via configuration. Because data is upserted, the latest version of each record is what is stored in the analytical database.
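The upsert itself can be pictured with the Elasticsearch document API: indexing a record under a stable id overwrites any earlier version, so the index always holds the latest copy. The sketch below assumes a local Elasticsearch at localhost:9200 and uses the case-index name from the Events section; the document id and payload are invented, and the real field selection and enrichment are driven by the indexer configuration.

```java
// Hedged sketch of an upsert into Elasticsearch: PUT the enriched document under
// a deterministic id so re-indexing the same id replaces the previous version.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class IndexerUpsertSketch {
    public static void main(String[] args) throws Exception {
        String caseId = "CASE-2024-001";                       // stable id -> deterministic upsert
        String enrichedDoc = """
                {"caseNumber":"CASE-2024-001","status":"PENDING_REGISTRATION","courtId":"KLKM52"}
                """;                                           // illustrative enriched payload

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/case-index/_doc/" + caseId)) // assumed ES endpoint
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(enrichedDoc))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body()); // 201 on create, 200 on overwrite
    }
}
```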
Shown below is an illustrative flow of writes in the system.
Data Governance and Security
TBD
Data Archival Strategies
TBD