Tutorial 16: Metrics & Monitoring with Actuator 📊
📋 Table of Contents
- Understanding the Question
- Actuator Basics
- Built-in Endpoints
- Custom Metrics
- Health Indicators
- Monitoring in Production
- Best Practices
- Practice Questions
Understanding the Question ❓
What are we trying to achieve?
Monitoring answers: "How is my application behaving right now?"
- Is it responding to requests?
- How fast are requests?
- How much memory is it using?
- Are there errors?
- What about external dependencies (database, APIs)?
Without monitoring, you only find out about problems when users complain.
The Monitoring Gap
text
WITHOUT MONITORING:
User: "Your app is slow!"
You: "How slow? When? Which endpoint?"
User: "... I don't know, it was just slow this morning"
You: "😞 Can't do anything with that information"
WITH MONITORING:
Dashboard shows:
├── Response times over last hour
├── Error rate per endpoint
├── Memory usage trending
├── Database query performance
├── Dependency health (external APIs)
└── Alert: "Error rate exceeded 1% at 3:45 PM"
You: Can pinpoint exact issueActuator Basics
Adding Actuator to Your Project
xml
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>That's it! No other configuration needed.
Enabling Endpoints
yaml
# application.yml
management:
# Expose all endpoints (careful in production!)
endpoints:
web:
exposure:
include: "*" # Exposes all endpoints
# For production, be selective:
# include: health,metrics,info,prometheus
# Exclude sensitive endpoints
endpoints:
web:
exposure:
exclude: env,configprops # Hide environment detailsWhat Actuator Provides (Automatically)
text
Before adding one line of code, you get:
/actuator/health - Is the application running?
/actuator/metrics - List available metrics
/actuator/metrics/{metric} - Get specific metric data
/actuator/info - Application version and info
/actuator/env - Environment variables (careful!)
/actuator/configprops - Configuration properties
/actuator/loggers - Logging configuration
/actuator/shutdown - Gracefully shutdown (disabled by default)
That's 7 endpoints with zero lines of code!Built-in Endpoints
Health Endpoint
The most important endpoint - is the application working?
bash
# Simple health check
GET /actuator/health
Response:
{
"status": "UP"
}
# Detailed health check (with groups)
GET /actuator/health/readiness
{
"status": "UP",
"components": {
"db": {
"status": "UP",
"details": {
"database": "PostgreSQL",
"validationQuery": "isValid()"
}
},
"livenessState": {
"status": "UP"
}
}
}Kubernetes Probes Use Health Endpoints
yaml
# kubernetes-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
template:
spec:
containers:
- name: app
image: my-app:1.0.0
# Kubernetes asks: Is the container alive?
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
# Kubernetes asks: Is the container ready for traffic?
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 20
periodSeconds: 5Metrics Endpoint
Access detailed application metrics:
bash
# List all available metrics
GET /actuator/metrics
Response:
{
"names": [
"jvm.memory.used",
"jvm.threads.live",
"http.server.requests",
"process.cpu.usage",
"system.load.average.1m",
...
]
}
# Get specific metric
GET /actuator/metrics/jvm.memory.used
Response:
{
"name": "jvm.memory.used",
"description": "The amount of used memory",
"baseUnit": "bytes",
"measurements": [
{
"statistic": "VALUE",
"value": 350000000
}
],
"availableTags": [
{
"tag": "area",
"values": ["heap", "nonheap"]
},
{
"tag": "id",
"values": ["PS Eden Space", "PS Survivor Space", ...]
}
]
}
# Get metrics filtered by tag
GET /actuator/metrics/jvm.memory.used?tag=area:heap
Response:
{
"measurements": [
{ "statistic": "VALUE", "value": 250000000 }
]
}Common Metrics Explained
text
JVM Metrics (Java Virtual Machine):
├── jvm.memory.used - Memory currently in use
├── jvm.memory.max - Maximum available memory
├── jvm.threads.live - Number of live threads
├── jvm.gc.memory.allocated - Memory allocated by GC
└── jvm.gc.pause - How long garbage collection took
Process Metrics (Your Application):
├── process.cpu.usage - CPU percentage
├── process.uptime - How long app has been running
└── process.files.open - Open file descriptors
HTTP Metrics (Request/Response):
├── http.server.requests - Total requests
├── http.requests.count - By status (2xx, 4xx, 5xx)
└── http.requests.duration - Response time percentiles
Custom Business Metrics:
├── users.created - Users created today
├── orders.processed - Orders processed
└── errors.authentication - Authentication failuresInfo Endpoint
Display application information:
yaml
# application.yml
info:
app:
name: User Service
description: Manages user accounts
version: @project.version@ # From pom.xml
company:
name: MyCompany
email: info@mycompany.com
java:
version: @java.version@
# Response from GET /actuator/info
{
"app": {
"name": "User Service",
"description": "Manages user accounts",
"version": "1.2.3"
},
"company": {
"name": "MyCompany",
"email": "info@mycompany.com"
},
"java": {
"version": "17.0.1"
},
"build": {
"timestamp": "2024-01-15T10:30:00Z",
"version": "1.2.3"
}
}Environment Endpoint (Use Carefully!)
bash
GET /actuator/env
Response:
{
"propertySources": [
{
"name": "systemProperties",
"properties": {
"java.version": "17.0.1",
...
}
},
{
"name": "systemEnvironment",
"properties": {
"PATH": "/usr/local/bin:/usr/bin",
"DB_PASSWORD": "****" # Masked!
...
}
}
]
}
# Specific property
GET /actuator/env/java.version⚠️ Security Warning: This exposes sensitive information. Disable in production:
yaml
management:
endpoints:
web:
exposure:
exclude: env,configprops # Hide from publicCustom Metrics
Creating Business Metrics
java
/**
* Track custom business events
*/
@Service
public class UserService {
private final UserRepository userRepository;
private final MeterRegistry meterRegistry; // Spring provides this!
// Counter: Count how many users created
private final AtomicInteger usersCreated;
// Timer: Track how long operations take
private final Timer userCreationTimer;
public UserService(UserRepository userRepository,
MeterRegistry meterRegistry) {
this.userRepository = userRepository;
this.meterRegistry = meterRegistry;
// Register counter
this.usersCreated = meterRegistry.counter("users.created").getId();
// Register timer
this.userCreationTimer = Timer.builder("users.creation.time")
.description("Time taken to create a user")
.publishPercentiles(0.5, 0.95, 0.99) // Report p50, p95, p99
.register(meterRegistry);
}
public User createUser(CreateUserRequest request) {
// Measure how long this takes
return userCreationTimer.recordCallable(() -> {
User user = new User();
user.setEmail(request.getEmail());
User saved = userRepository.save(user);
// Increment counter
usersCreated.incrementAndGet();
return saved;
});
}
}
// Metrics now available:
// GET /actuator/metrics/users.created
// → {"value": 42}
//
// GET /actuator/metrics/users.creation.time
// → {"p50": 100, "p95": 250, "p99": 500} (milliseconds)Common Metric Types
java
@Service
public class MetricsExamples {
private final MeterRegistry meterRegistry;
// COUNTER - Count occurrences (only goes up)
private AtomicInteger loginAttempts;
// GAUGE - Current value (can go up or down)
private AtomicInteger activeUsers;
// TIMER - Measure duration
private Timer requestTimer;
// DISTRIBUTION SUMMARY - Measure values (like response size)
private DistributionSummary responseSize;
public MetricsExamples(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
// Register counter - for events
this.loginAttempts = meterRegistry.counter("login.attempts").getId();
// Register gauge - current state
this.activeUsers = new AtomicInteger();
meterRegistry.gauge("users.active", activeUsers);
// Register timer - time operations
this.requestTimer = meterRegistry.timer("http.requests.time");
// Register distribution - measure values
this.responseSize = meterRegistry.summary("http.response.size");
}
public void login() {
loginAttempts.increment(); // Counter goes up
}
public void userConnected() {
activeUsers.incrementAndGet(); // Gauge goes up
}
public void userDisconnected() {
activeUsers.decrementAndGet(); // Gauge goes down
}
public void recordRequest(long durationMs, long responseSizeBytes) {
requestTimer.record(Duration.ofMillis(durationMs));
responseSize.record(responseSizeBytes);
}
}Monitoring with @Aspect
java
/**
* Automatically measure all @Monitored methods
*/
@Aspect
@Component
public class MonitoringAspect {
private final MeterRegistry meterRegistry;
@Around("@annotation(monitored)")
public Object monitor(ProceedingJoinPoint joinPoint,
Monitored monitored) throws Throwable {
String methodName = joinPoint.getSignature().getName();
Timer timer = meterRegistry.timer("method.execution." + methodName);
try {
return timer.recordCallable(() -> {
try {
return joinPoint.proceed();
} catch (Throwable e) {
meterRegistry.counter("method.errors." + methodName).increment();
throw e;
}
});
} catch (Exception e) {
throw e;
}
}
}
// Usage: Just add @Monitored annotation
@Service
public class UserService {
@Monitored
public User createUser(CreateUserRequest request) {
// Automatically tracked!
}
}
// Metrics generated:
// method.execution.createUser
// method.errors.createUserHealth Indicators
Custom Health Checks
java
/**
* Create custom health indicator for your dependencies
*/
@Component
public class PaymentServiceHealthIndicator extends AbstractHealthIndicator {
private final PaymentServiceClient paymentClient;
@Override
protected void doHealthCheck(Health.Builder builder) throws Exception {
try {
// Try to call payment service
boolean isHealthy = paymentClient.ping();
if (isHealthy) {
builder.up()
.withDetail("service", "Payment Service")
.withDetail("status", "operational")
.withDetail("latency_ms", 50);
} else {
builder.down()
.withDetail("reason", "Payment service not responding");
}
} catch (Exception e) {
builder.down()
.withException(e)
.withDetail("error", e.getMessage());
}
}
}
// GET /actuator/health returns:
{
"status": "UP",
"components": {
"paymentServiceHealthIndicator": {
"status": "UP",
"details": {
"service": "Payment Service",
"status": "operational",
"latency_ms": 50
}
},
"db": {
"status": "UP"
}
}
}Health Groups (For Kubernetes)
yaml
# application.yml
management:
health:
# Liveness: Is the app alive? (if false, restart container)
livenessState:
enabled: true
# Readiness: Can the app handle traffic? (if false, route traffic elsewhere)
readinessState:
enabled: true
group:
# Only check essential systems for liveness
liveness:
include: livenessState
# Check all systems for readiness
readiness:
include: readinessState,db,paymentServiceHealthIndicatorbash
# Kubernetes uses these:
# Liveness: GET /actuator/health/liveness
# Response: {"status": "UP"} or {"status": "DOWN"}
# Action: If DOWN → restart pod
#
# Readiness: GET /actuator/health/readiness
# Response: {"status": "UP"} or {"status": "DOWN"}
# Action: If DOWN → remove from service load balancerMonitoring in Production
Export to Prometheus
xml
<!-- Add Prometheus support -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Automatically exposes /actuator/prometheus endpoint -->yaml
# application.yml
management:
endpoints:
web:
exposure:
include: health,metrics,prometheus
metrics:
export:
prometheus:
enabled: truetext
GET /actuator/prometheus
Response: Prometheus format (readable by Prometheus scraper)
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 350000000
jvm_memory_used_bytes{area="nonheap"} 50000000
# Prometheus periodically pulls this (every 30 seconds)
# Stores in time-series database
# You can query and alert on itPrometheus Configuration
yaml
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: "spring-boot-app"
static_configs:
- targets: ["localhost:8080"]
metrics_path: "/actuator/prometheus"Grafana Dashboards
text
Prometheus → Grafana (dashboards and alerts)
Grafana queries Prometheus:
- "Give me avg response time over last hour"
- "Give me error rate per endpoint"
- "Alert if memory > 80%"
Grafana displays:
- Beautiful graphs
- Real-time dashboards
- Alert notificationsSpring Boot Metrics Stack
text
Your Application
↓
Spring Boot Actuator (MeterRegistry)
↓
Micrometer (vendor-agnostic metrics)
↓
Prometheus (time-series database)
↓
Grafana (visualization & alerting)
↓
PagerDuty (alert notifications)Best Practices
✅ DO: Expose Only Necessary Endpoints
yaml
# ✅ Good - Production config
management:
endpoints:
web:
exposure:
include: health,metrics,prometheus,info
exclude: env,configprops,shutdown❌ DON'T: Expose Everything
yaml
# ❌ Bad - Dangerous in production
management:
endpoints:
web:
exposure:
include: "*" # Exposes everything!✅ DO: Use Meaningful Metric Names
java
// ✅ Good
meterRegistry.counter("orders.created").increment();
meterRegistry.counter("orders.failed").increment();
// ❌ Bad - Vague
meterRegistry.counter("count1").increment();
meterRegistry.counter("error").increment();✅ DO: Monitor Dependencies
java
/**
* Track health of external services
*/
@Component
public class DatabaseHealthIndicator extends AbstractHealthIndicator {
@Override
protected void doHealthCheck(Health.Builder builder) throws Exception {
// Spring Boot already provides database health!
// But custom ones are useful for external services
}
}
// Monitor:
// - Database connections
// - External APIs
// - Message queues
// - Cache servers✅ DO: Correlate Metrics with Logs
java
// Use MDC to link metrics with logs
MDC.put("requestId", UUID.randomUUID().toString());
meterRegistry.counter("request.processed").increment();
logger.info("Processed request"); // Log includes requestId
// Now you can find all logs for a request:
// 1. See high latency in metrics
// 2. Search logs by requestId
// 3. Find root causePractice Questions
Question 1: What's the Difference Between Health and Metrics?
Q: When should I check /health vs /metrics?
A:
- /health - Binary: UP/DOWN - "Is the app working?"
- /metrics - Detailed: trends, latency, errors - "How is the app performing?"
text
Health: Quick status check (used by Kubernetes)
Metrics: Deep analysis (used by Grafana for optimization)Question 2: How Do You Alert on Metrics?
Q: How do I get notified when something goes wrong?
A: Use Prometheus + Grafana + alerting system:
text
Prometheus scrapes metrics
↓
Stores in database
↓
Grafana queries Prometheus
↓
Evaluates alerting rules (e.g., "error_rate > 1%")
↓
Sends alert to PagerDuty/Slack/EmailQuestion 3: Can You Update Metrics in Real-Time?
Q: Can I see metric changes immediately, or do they update on a schedule?
A: Metrics update immediately in memory, but Prometheus scrapes on a schedule (default 15 seconds). Grafana shows what Prometheus has collected.
text
Your app: Metric changes immediately ✓
Prometheus: Scrapes every 15 seconds
Grafana: Shows data from Prometheus (15-90 second delay)Key Takeaways
- Actuator provides free monitoring: Just add the dependency, get 7+ endpoints
- Health endpoint for availability: Binary UP/DOWN status
- Metrics for performance analysis: Track latency, errors, throughput
- Custom metrics for business events: Track orders, users, etc.
- Kubernetes integration: Health endpoints used for pod management
- Expose selectively: Don't expose env/configprops in production
- Prometheus + Grafana: Standard monitoring stack for Spring Boot
- Monitor dependencies: Database, APIs, caches health indicators
- Correlate metrics with logs: Use trace IDs to drill down
- Alert proactively: Set up alerts before problems affect users