Distributed Key Value Store from Scratch
July 11, 2025
Shreder: Building a Distributed Cache System
Welcome to Shreder, a high-performance distributed cache system built in Go. This tutorial will guide you through understanding, building, and using a production-ready distributed cache that implements consistent hashing and peer-to-peer replication.
What You'll Learn
By the end of this tutorial, you'll understand:
- How distributed caching works
- Consistent hashing algorithms
- Peer-to-peer replication strategies
- Building scalable Go applications
- RESTful API design for distributed systems
Architecture Overview
Shreder implements a distributed cache using three core components:
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Node 1    │◄──► │   Node 2    │◄──► │   Node 3    │
│    :8060    │      │    :8061    │      │    :8062    │
└─────────────┘      └─────────────┘      └─────────────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
                            │
               Hash Ring (Consistent Hashing)
Key Features
- Distributed Architecture: Data is automatically distributed across multiple nodes
- Consistent Hashing: Ensures even distribution and minimal data movement when nodes join/leave
- Automatic Replication: Data is replicated across peer nodes for fault tolerance
- RESTful API: Simple HTTP interface for cache operations
- High Availability: System continues operating even when individual nodes fail
Getting Started
Prerequisites
- Go 1.23.1 or later
- Basic understanding of HTTP and JSON
Installation
Clone and build the project:
git clone https://github.com/UnplugCharger/shreder.git
cd shreder
go mod tidy
go build -o bin/shreder main.go
Core Components Deep Dive
1. Hash Ring Implementation
The heart of our distributed system is the consistent hashing algorithm. Let's examine how it works:
// hash_ring/hashring.go
package hash_ring

import (
    "crypto/sha1"
    "sort"
    "sync"
)

type Node struct {
    ID      string
    Address string
}

type HashRing struct {
    nodes  []Node
    hashes []uint32
    lock   sync.RWMutex
}

func NewHashRing() *HashRing {
    return &HashRing{}
}

// hash maps a string to a position on the ring. The original listing omits
// this helper; a straightforward mapping from the first four bytes of the
// SHA-1 digest is assumed here.
func (h *HashRing) hash(key string) uint32 {
    sum := sha1.Sum([]byte(key))
    return uint32(sum[0])<<24 | uint32(sum[1])<<16 | uint32(sum[2])<<8 | uint32(sum[3])
}

func (h *HashRing) AddNode(node Node) {
    h.lock.Lock()
    defer h.lock.Unlock()

    hash := h.hash(node.ID)
    h.nodes = append(h.nodes, node)
    h.hashes = append(h.hashes, hash)
    sort.Slice(h.hashes, func(i, j int) bool { return h.hashes[i] < h.hashes[j] })
}

func (h *HashRing) GetNode(key string) Node {
    h.lock.RLock()
    defer h.lock.RUnlock()

    if len(h.nodes) == 0 {
        return Node{}
    }

    // Find the first position on the ring clockwise from the key's hash.
    hash := h.hash(key)
    index := sort.Search(len(h.hashes), func(i int) bool { return h.hashes[i] >= hash })
    if index == len(h.hashes) {
        index = 0 // wrap around the ring
    }

    // nodes are not stored in hash order, so scan for the node whose ID
    // hashes to the selected position.
    targetHash := h.hashes[index]
    for _, node := range h.nodes {
        if h.hash(node.ID) == targetHash {
            return node
        }
    }
    return h.nodes[0]
}
How it works:
- Each node is assigned a position on a virtual ring using SHA1 hashing
- Keys are also hashed to find their position on the ring
- A key is assigned to the first node clockwise from its position
- This ensures even distribution and minimal data movement when nodes change (a short usage sketch follows below)
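To see the ring in action, here is a minimal usage sketch. The import path follows the repository layout shown earlier; the node IDs and keys are arbitrary:

// A small program that distributes a few keys across three nodes.
package main

import (
    "fmt"

    "github.com/UnplugCharger/shreder/hash_ring"
)

func main() {
    ring := hash_ring.NewHashRing()
    ring.AddNode(hash_ring.Node{ID: "node1", Address: "localhost:8060"})
    ring.AddNode(hash_ring.Node{ID: "node2", Address: "localhost:8061"})
    ring.AddNode(hash_ring.Node{ID: "node3", Address: "localhost:8062"})

    // The same key always lands on the same node while the ring is unchanged.
    for _, key := range []string{"user:123", "session:abc", "order:42"} {
        node := ring.GetNode(key)
        fmt.Printf("%s -> %s (%s)\n", key, node.ID, node.Address)
    }
}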
2. In-Memory Cache with LRU Eviction
Our cache implements an LRU (Least Recently Used) eviction policy with TTL support:
// shreder/cache.go
type CacheItem struct {
    Value      string
    TimeToLive time.Time
}

// entry is what each list element holds; its definition is not part of the
// original listing and is inferred here from how it is used below.
type entry struct {
    key   string
    value CacheItem
}

type Cache struct {
    mu       sync.RWMutex
    Items    map[string]*list.Element
    eviction *list.List
    capacity int
}

func (c *Cache) Get(key string) (string, bool) {
    // Get mutates the eviction list (expiry removal, move-to-front), so it
    // needs the write lock rather than the read lock.
    c.mu.Lock()
    defer c.mu.Unlock()

    item, found := c.Items[key]
    if !found || time.Now().After(item.Value.(entry).value.TimeToLive) {
        if found {
            // Lazily drop the expired entry.
            c.eviction.Remove(item)
            delete(c.Items, key)
        }
        return "", false
    }

    c.eviction.MoveToFront(item)
    return item.Value.(entry).value.Value, true
}

func (c *Cache) Set(key string, value string, ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()

    // Replace any existing entry for the key.
    if item, found := c.Items[key]; found {
        c.eviction.Remove(item)
        delete(c.Items, key)
    }

    // Evict the least recently used entry once capacity is reached.
    if c.eviction.Len() >= c.capacity {
        c.evictLRU()
    }

    item := CacheItem{
        Value:      value,
        TimeToLive: time.Now().Add(ttl),
    }
    elem := c.eviction.PushFront(entry{key, item})
    c.Items[key] = elem
}
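Set calls evictLRU, which is not shown in the listing above. A minimal version consistent with the entry layout simply drops the element at the back of the list; check the repository for the actual implementation:

// evictLRU removes the least recently used entry (the back of the list).
// Sketched here only to complete the example.
func (c *Cache) evictLRU() {
    if oldest := c.eviction.Back(); oldest != nil {
        c.eviction.Remove(oldest)
        delete(c.Items, oldest.Value.(entry).key)
    }
}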
Key features:
- Thread-safe: Uses RWMutex for concurrent access
- LRU eviction: Automatically removes least recently used items when capacity is reached
- TTL support: Items expire after a specified time duration
- Background cleanup: Expired items are cleaned up automatically (a sketch of such a cleanup loop follows below)
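The cleanup loop itself is not part of the listing above. One minimal way to implement it, assuming the same Cache fields and entry type, is a ticker-driven sweep; the method name and interval are illustrative:

// startCleanup periodically sweeps out expired entries in the background.
func (c *Cache) startCleanup(interval time.Duration) {
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for range ticker.C {
            c.mu.Lock()
            for key, elem := range c.Items {
                if time.Now().After(elem.Value.(entry).value.TimeToLive) {
                    c.eviction.Remove(elem)
                    delete(c.Items, key)
                }
            }
            c.mu.Unlock()
        }
    }()
}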
3. HTTP Server and Request Routing
The server handles HTTP requests and routes them to the appropriate nodes:
// shreder/server.go
type CacheServer struct {
    cache    *Cache
    peers    []string
    mu       sync.Mutex
    selfID   string
    hashRing *hash_ring.HashRing
}

func (cs *CacheServer) SetHandler(w http.ResponseWriter, r *http.Request) {
    bodyBytes, err := io.ReadAll(r.Body)
    if err != nil {
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    var request setRequest
    if err := json.Unmarshal(bodyBytes, &request); err != nil {
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // setRequest and replicationHeader are defined elsewhere in the package.
    isReplication := r.Header.Get(replicationHeader) == "true"
    targetNode := cs.hashRing.GetNode(request.Key)

    if targetNode.Address == cs.selfID {
        // This node owns the key: store locally and fan out to peers,
        // unless the request is itself a replication from another node.
        cs.cache.Set(request.Key, request.Value, 10*time.Minute)
        if !isReplication {
            cs.replicaset(request.Key, request.Value)
        }
        w.WriteHeader(http.StatusOK)
    } else {
        // Restore the consumed body and forward to the responsible node.
        r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
        cs.forwardRequest(targetNode, r, w)
    }
}
Request flow:
- Client sends request to any node
- Node uses hash ring to determine responsible node
- If current node is responsible, it processes the request locally
- Otherwise, it forwards the request to the responsible node (a possible forwarding helper is sketched after this list)
- Data changes are replicated to peer nodes
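forwardRequest is referenced in SetHandler but not listed in this post. A plausible shape for it, assuming the same CacheServer type, is a simple proxy hop that relays the owning node's response:

// forwardRequest proxies the client's request to the node that owns the key.
// This is a sketch, not the repository's implementation.
func (cs *CacheServer) forwardRequest(target hash_ring.Node, r *http.Request, w http.ResponseWriter) {
    url := "http://" + target.Address + r.URL.Path
    if r.URL.RawQuery != "" {
        url += "?" + r.URL.RawQuery
    }

    proxyReq, err := http.NewRequest(r.Method, url, r.Body)
    if err != nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    proxyReq.Header = r.Header.Clone()

    resp, err := http.DefaultClient.Do(proxyReq)
    if err != nil {
        w.WriteHeader(http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()

    w.WriteHeader(resp.StatusCode)
    io.Copy(w, resp.Body)
}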
4. Peer-to-Peer Replication
Replication ensures data availability even when nodes fail:
// shreder/replication.go
func (cs *CacheServer) replicaset(key string, value string) {
    cs.mu.Lock()
    defer cs.mu.Unlock()

    req := struct {
        Key   string `json:"key"`
        Value string `json:"value"`
    }{
        Key:   key,
        Value: value,
    }

    data, err := json.Marshal(req)
    if err != nil {
        log.Printf("Failed to marshal replication payload: %v", err)
        return
    }

    for _, peer := range cs.peers {
        if peer != "" && peer != cs.selfID {
            // Replicate asynchronously so a slow peer doesn't block the caller.
            go func(peer string) {
                peerURL := peer
                if !strings.HasPrefix(peerURL, "http://") {
                    peerURL = "http://" + peerURL
                }

                client := &http.Client{}
                req, err := http.NewRequest("POST", peerURL+"/set", bytes.NewReader(data))
                if err != nil {
                    log.Printf("Failed to create replication request: %v", err)
                    return
                }
                req.Header.Set("Content-Type", "application/json")
                // Mark the request as a replication so the peer doesn't fan it out again.
                req.Header.Set(replicationHeader, "true")

                resp, err := client.Do(req)
                if err != nil {
                    log.Printf("Failed to replicate to peer %s: %v", peer, err)
                    return
                }
                defer resp.Body.Close()
            }(peer)
        }
    }
}
Replication strategy:
- Asynchronous replication to all peer nodes
- Special header prevents infinite replication loops
- Graceful error handling for failed replications
Running the System
Single Node Setup
Start a single cache node:
./bin/shreder --port :8060
Multi-Node Cluster Setup
Start a 4-node cluster:
# Terminal 1 - Node 1
./bin/shreder --port :8060 --peers localhost:8061,localhost:8062,localhost:8063

# Terminal 2 - Node 2
./bin/shreder --port :8061 --peers localhost:8060,localhost:8062,localhost:8063

# Terminal 3 - Node 3
./bin/shreder --port :8062 --peers localhost:8060,localhost:8061,localhost:8063

# Terminal 4 - Node 4
./bin/shreder --port :8063 --peers localhost:8060,localhost:8061,localhost:8062
API Usage Examples
Setting Values
Store a key-value pair:
curl -X POST http://localhost:8060/set \
  -H "Content-Type: application/json" \
  -d '{"key": "user:123", "value": "john_doe"}'
// JavaScript example
const response = await fetch('http://localhost:8060/set', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    key: 'user:123',
    value: 'john_doe'
  })
});
Getting Values
Retrieve a value by key:
curl "http://localhost:8060/get?key=user:123"
// JavaScript example
const response = await fetch('http://localhost:8060/get?key=user:123');
const value = await response.text();
console.log(value); // "john_doe"
Testing the System
Automated Testing
Run the comprehensive test suite:
# Start all nodes first
make server1 &
make server2 &
make server3 &

# Run tests
./test_replication.sh
Manual Testing
Test data distribution and replication:
# Set data on different nodes
curl -X POST http://localhost:8060/set -H "Content-Type: application/json" -d '{"key": "test1", "value": "hello"}'
curl -X POST http://localhost:8061/set -H "Content-Type: application/json" -d '{"key": "test2", "value": "world"}'

# Verify data is accessible from any node
curl "http://localhost:8062/get?key=test1"  # Should return "hello"
curl "http://localhost:8063/get?key=test2"  # Should return "world"
Performance Characteristics
Benchmarking
The system is designed for the following performance characteristics (a benchmark sketch for checking them on your own hardware follows this list):
- Throughput: Thousands of requests per second per node
- Latency: Sub-millisecond response times for cache hits
- Scalability: Read capacity grows roughly linearly with added nodes; writes are fanned out to every peer, so write throughput does not scale the same way
- Memory efficiency: LRU eviction with a fixed capacity prevents memory exhaustion
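A minimal Go benchmark sketch for the local cache path is shown below. It constructs the Cache directly from the fields shown earlier; adjust it if the package exposes a constructor:

package shreder

import (
    "container/list"
    "fmt"
    "testing"
    "time"
)

// BenchmarkCacheSetGet measures a combined Set+Get round trip against the
// in-memory cache.
func BenchmarkCacheSetGet(b *testing.B) {
    c := &Cache{
        Items:    make(map[string]*list.Element),
        eviction: list.New(),
        capacity: 10000,
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        key := fmt.Sprintf("key:%d", i%10000)
        c.Set(key, "value", time.Minute)
        c.Get(key)
    }
}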
Monitoring
Monitor your cluster with these key metrics:
# Check node health
curl http://localhost:8060/health

# Monitor cache statistics
curl http://localhost:8060/stats
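If your build does not expose a stats endpoint yet, the handler can be as small as serializing a few counters. A sketch assuming the CacheServer and Cache types shown earlier (the handler name and reported fields are illustrative):

// StatsHandler reports basic cache statistics as JSON.
func (cs *CacheServer) StatsHandler(w http.ResponseWriter, r *http.Request) {
    cs.cache.mu.RLock()
    stats := map[string]interface{}{
        "items":    len(cs.cache.Items),
        "capacity": cs.cache.capacity,
        "peers":    cs.peers,
    }
    cs.cache.mu.RUnlock()

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(stats)
}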
Production Considerations
Security
For production deployment, consider:
// Add authentication middleware
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // validateToken is whatever token check fits your deployment.
        token := r.Header.Get("Authorization")
        if !validateToken(token) {
            w.WriteHeader(http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}
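Wiring the middleware in is then one line per route. The sketch below assumes the CacheServer methods discussed earlier; a GetHandler is implied by the /get endpoint but is not listed in this post:

// Sketch: route registration with the auth middleware applied.
func startServer(cs *CacheServer, addr string) error {
    mux := http.NewServeMux()
    mux.Handle("/set", authMiddleware(http.HandlerFunc(cs.SetHandler)))
    mux.Handle("/get", authMiddleware(http.HandlerFunc(cs.GetHandler)))
    return http.ListenAndServe(addr, mux)
}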
Configuration Management
Use environment variables for configuration:
// main.go
func main() {
    port := os.Getenv("CACHE_PORT")
    if port == "" {
        port = ":8060"
    }

    peers := strings.Split(os.Getenv("CACHE_PEERS"), ",")

    cache := shreder.NewCacheServer(peers, port)
    cache.Start(port)
}
Docker Deployment
Create a Dockerfile for containerized deployment:
FROM golang:1.23.1-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o shreder main.go

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/shreder .
EXPOSE 8060
CMD ["./shreder"]
Advanced Features
Custom Eviction Policies
Extend the cache with custom eviction strategies:
type EvictionPolicy interface {
    ShouldEvict(item *CacheItem) bool
    OnAccess(key string)
}

type TimeBasedEviction struct {
    maxAge time.Duration
}

// ShouldEvict assumes CacheItem gains a CreatedAt field, which the base
// listing shown earlier does not have yet.
func (t *TimeBasedEviction) ShouldEvict(item *CacheItem) bool {
    return time.Since(item.CreatedAt) > t.maxAge
}
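Hooking a policy into the cache could then be a periodic pass over the items. The sketch below assumes Cache gains a policy field and CacheItem gains CreatedAt; all names here are illustrative, not part of the repository:

// applyPolicy evicts every item the configured policy flags.
func (c *Cache) applyPolicy() {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, elem := range c.Items {
        it := elem.Value.(entry).value
        if c.policy != nil && c.policy.ShouldEvict(&it) {
            c.eviction.Remove(elem)
            delete(c.Items, key)
        }
    }
}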
Metrics and Observability
Add Prometheus metrics:
import "github.com/prometheus/client_golang/prometheus"
var (
    cacheHits = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "cache_hits_total",
        Help: "Total number of cache hits",
    })

    cacheMisses = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "cache_misses_total",
        Help: "Total number of cache misses",
    })
)
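The counters also need to be registered and scraped. A minimal sketch using the standard client_golang registry and promhttp handler (the exposeMetrics helper is illustrative), with cacheHits.Inc() and cacheMisses.Inc() called from the Get path:

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func init() {
    // Register the counters declared above with the default registry.
    prometheus.MustRegister(cacheHits, cacheMisses)
}

// exposeMetrics mounts the Prometheus scrape endpoint on /metrics.
func exposeMetrics(mux *http.ServeMux) {
    mux.Handle("/metrics", promhttp.Handler())
}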
Troubleshooting
Common Issues
Node Discovery Problems:
# Check if nodes can reach each other
curl http://localhost:8061/health
Replication Failures:
# Check logs for replication errors
tail -f server1.log | grep "replication"
Memory Issues:
# Monitor memory usage
ps aux | grep shreder
Conclusion
You've now built a production-ready distributed cache system! Shreder demonstrates key concepts in distributed systems:
- Consistent hashing for data distribution
- Peer-to-peer replication for fault tolerance
- RESTful APIs for easy integration
- Concurrent programming with Go
Next Steps
Consider extending Shreder with:
- Persistent storage backends
- Advanced monitoring and alerting
- Authentication and authorization
- Compression and serialization options
- Multi-datacenter replication
The foundation you've built here can scale to handle production workloads and serves as an excellent starting point for more advanced distributed systems projects.
Happy caching! 🚀