Publication

Replicated Data Types for Graph Databases


Bibliographic details
Abstract: Graph databases have grown in popularity for applications with highly interconnected data, such as social, biological, and transportation networks. Such applications can be implemented on a spectrum of graph data models, from labelled graphs to property graphs. Graph databases also offer flexibility in schema enforcement, supporting schema-flexible and even schema-free approaches. They are used by large-scale applications that require low latency and high availability for millions of users worldwide. The usual approach to meeting these latency and availability demands is to replicate data across multiple data centers and rely on a weak consistency model. Under weak consistency, concurrent updates may cause the replicas' states to diverge. Conflict-free Replicated Data Types (CRDTs) resolve such conflicts deterministically, ensuring that all replicas converge to the same state. Even though CRDTs guarantee convergence, the final state may be invalid if the convergence rules do not take data invariants into account. For example, implementing a property graph CRDT requires more complex data types in the state, since nodes and edges carry sets of key-value property pairs; these are treated as invariants and create the possibility of invalid states. In addition, a property graph can be bound to a schema, in which case the schema becomes a graph invariant that must be maintained. In this work, we intend to develop specialized graph CRDTs in Datalog (Soufflé) for multiple graph models and multiple conflict policies, as Datalog offers a high level of abstraction, monotonicity, and safety.
Main authors: Almas, Luís Filipe Raimundo
Subject: CRDT; Graph databases; Weak Consistency; Invariants; Graph database Schemas; Graph database models
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
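The abstract's central point is that a CRDT can converge and still reach an invalid state when its merge rules ignore data invariants. This can be illustrated with a small Python sketch. It is not the dissertation's Soufflé implementation. It assumes a hypothetical add-wins graph built from two observed-remove sets (ORSet, GraphReplica, and dangling_edges are illustrative names). One replica adds an edge while another concurrently removes one of its endpoints. Both replicas then merge to the same state, yet that state contains an edge whose endpoint no longer exists.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ORSet:
    """Observed-remove set: each add carries a unique tag; remove tombstones observed tags."""
    adds: set = field(default_factory=set)     # {(element, tag)}
    removes: set = field(default_factory=set)  # {(element, tag)}

    def add(self, elem, tag):
        self.adds.add((elem, tag))

    def remove(self, elem):
        # Only tags observed locally are tombstoned, so a concurrent add survives (add-wins).
        self.removes |= {(e, t) for (e, t) in self.adds if e == elem}

    def merge(self, other):
        # Union of both components: commutative, associative, idempotent.
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self):
        return {e for (e, t) in self.adds - self.removes}


@dataclass
class GraphReplica:
    """Hypothetical graph CRDT: vertices and edges are independent OR-Sets."""
    replica_id: str
    vertices: ORSet = field(default_factory=ORSet)
    edges: ORSet = field(default_factory=ORSet)
    counter: int = 0

    def _tag(self):
        self.counter += 1
        return (self.replica_id, self.counter)

    def add_vertex(self, v):
        self.vertices.add(v, self._tag())

    def add_edge(self, u, v):
        # Local precondition: both endpoints exist at this replica.
        assert u in self.vertices.value() and v in self.vertices.value()
        self.edges.add((u, v), self._tag())

    def remove_vertex(self, v):
        # Local precondition: drop the incident edges that this replica has observed.
        for e in list(self.edges.value()):
            if v in e:
                self.edges.remove(e)
        self.vertices.remove(v)

    def merge(self, other):
        merged = GraphReplica(self.replica_id)
        merged.vertices = self.vertices.merge(other.vertices)
        merged.edges = self.edges.merge(other.edges)
        merged.counter = max(self.counter, other.counter)
        return merged


def dangling_edges(g):
    """Edges that violate the invariant 'every edge endpoint is a vertex'."""
    vs = g.vertices.value()
    return {(u, v) for (u, v) in g.edges.value() if u not in vs or v not in vs}


# Shared initial state, then two concurrent updates on different replicas.
r1 = GraphReplica("r1")
r1.add_vertex("a")
r1.add_vertex("b")
r2 = copy.deepcopy(r1)
r2.replica_id = "r2"

r1.add_edge("a", "b")   # replica r1 adds an edge a -> b ...
r2.remove_vertex("b")   # ... while replica r2 concurrently removes vertex b

m12 = r1.merge(r2)
m21 = r2.merge(r1)
print(dangling_edges(m12))  # {('a', 'b')}
```

Both merge orders yield vertices {"a"} and edges {("a", "b")}, so the replicas converge. Each local operation respected the invariant, but the merged state does not. A graph CRDT therefore needs an explicit conflict policy for this case, for example remove-wins on the endpoint or hiding dangling edges when the graph is read. This is the kind of policy choice the abstract refers to.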