DFR Checklist

The Developer First Responder, DFR, is a rotating on-call role responsible for monitoring system health, triaging issues, and serving as the first point of contact for operational concerns related to the Invoicing Platform Service, IPS. This rotation typically lasts one week. Additional responsibilities include facilitating team standups, responding to internal support requests, and documenting incidents and resolutions.

Quick Reference - Common scenarios
Get Started - Week setup and handoff
Daily Responsibilities - Time-tracking, standup, and monitoring
Investigation Best Practices - Handling outages, restarting systems, escalating
Mid-Month Tasks - Code freshness reviews
MIP Tasks - Monthly Invoicing Process monitoring and checklist

Quick Reference

Scenario	Steps
Got paged?	Check `#ips-signals` → Review observability dashboard → Escalate to `#ips-dev` if needed
Service down?	Check audit logs → Post in `#ips-signals` before restarting → Document incident
Support question?	Answer if possible → Tag SME if unsure → Add to standup parking lot if unresolved
End of rotation?	Close ticket → Update handoff notes → Brief incoming DFR on Monday

Get Started

Complete the following steps to set up for success:

Create a DFR-tracking ticket
- Project: Financial Systems
- Component: IPS
- Labels: DFR, IPS, Invoicing, NO-QA
- Workflow: In-Progress
- Tip: clone the previous week’s DFR-tracking ticket to use as a template
Attend Handoff meeting
- Usually Monday morning; Google Calendar title: “IPS-Ops Handoff”
- Discuss open issues and incomplete work with outgoing DFR
- If not on the calendar, request access from team lead
- Note: This meeting is optional - coordinate with outgoing DFR
Document the work
- Add comments to the DFR-tracking ticket to record completed tasks
- Use judgment on what’s worth documenting to reduce noise
- Include support channel Q&A to share knowledge with the team
Wrap up the week
- Close the DFR-tracking ticket
- Update any incomplete investigations for the incoming DFR
- Prepare notes for the next IPS-Ops Handoff meeting
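
As a minimal sketch, the fields listed under "Create a DFR-tracking ticket" can be captured in a small template so each week's ticket starts from the same values. The function name, dictionary keys, and summary format are assumptions for illustration; this is not the Jira API schema, and cloning the previous week's ticket remains the simpler path.

```python
from datetime import date


def build_dfr_ticket(week_start: date) -> dict:
    """Return the DFR-tracking ticket fields from the checklist as a plain dict.

    The keys and the summary format are hypothetical; map them onto whatever
    your ticketing tool expects.
    """
    return {
        "project": "Financial Systems",
        "component": "IPS",
        "labels": ["DFR", "IPS", "Invoicing", "NO-QA"],
        "status": "In-Progress",
        # Assumed title convention, not taken from the checklist.
        "summary": f"DFR tracking: week of {week_start.isoformat()}",
    }


if __name__ == "__main__":
    ticket = build_dfr_ticket(date(2026, 2, 2))
    for field, value in ticket.items():
        print(f"{field}: {value}")
```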

Daily Responsibilities

Time Tracking: all team members must log time in Jira against the current week’s DFR-tracking ticket for any DFR-related work. Log time for -

Facilitating or delegating tasks to others
Addressing production issues
Identifying, asking, or answering questions
Investigations, reporting, and verification
Incident remediation

Standup Facilitation: run the team’s Kanban board during daily standups, present any overnight issues and updates on ongoing issues, and discuss any issues allocated to the parking lot

System Monitoring

Email monitoring is not necessary, as all Invoicing system alerts, including those administered by Datadog and PagerDuty, should route to the primary monitoring Slack channels -

Channel	Purpose	Action Required
`#ips-signals`	System health alerts and anomalies	Investigate, document findings, escalate if needed
`#finsys-alerts`	Cross-system financial alerts	Monitor for IPS-related issues
`#support-ips`	Internal user support requests	Respond or route to the SME, subject matter expert
`#support-general`	General technical support	Monitor for IPS-related questions
`#incident-alerts`	New incident tickets	Triage issues tagged with IPS components

Secondary Monitoring: more #finsys-* channels may be relevant depending on integrations and dependencies; refer to the IPS Onboarding documentation for a complete list

Dashboard Checks: while Slack alerts should cover critical issues, occasional dashboard checks can help identify trends; check the Observability Dashboard for irregular patterns in service metrics and the SLO Dashboard to review daily service availability

Investigation Best Practices

Focus on work items that improve operational stability and reduce future incidents. Triage tasks by impact: handle production issues first, then quick wins (questions that take under 5 minutes to answer), and set expectations for non-urgent requests. Communication is key: close any feedback loops you open.
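
The triage order described above can be sketched as a simple sort. This is only an illustration: the `Request` type, its fields, and the time estimates are hypothetical, and the 5-minute cutoff is the only value taken from the checklist.

```python
from dataclasses import dataclass

QUICK_WIN_MINUTES = 5  # from the checklist: questions answerable in under 5 minutes


@dataclass
class Request:
    title: str
    is_production_issue: bool
    estimated_minutes: int  # rough guess made by the DFR


def triage_rank(req: Request) -> int:
    """Lower rank is handled first: production, then quick wins, then the rest."""
    if req.is_production_issue:
        return 0
    if req.estimated_minutes < QUICK_WIN_MINUTES:
        return 1
    return 2


def triage(requests: list[Request]) -> list[Request]:
    # sorted() is stable, so requests with the same rank keep their arrival order.
    return sorted(requests, key=triage_rank)


if __name__ == "__main__":
    queue = [
        Request("Refactor question", False, 45),
        Request("Where is the runbook?", False, 2),
        Request("Invoices failing in production", True, 60),
    ]
    for req in triage(queue):
        print(req.title)
```

Requests that land in the last group are the ones where setting expectations matters most, since they will wait the longest.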

Handling Health Check Outages: services may experience brief outages due to network issues or infrastructure changes. When investigating, check the audit logs first. Outages that recover in under 10 minutes typically don’t require deep investigation, but document any pattern of short, repeated outages.
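
A rough sketch of that rule of thumb follows. The 10-minute cutoff comes from the guidance above; the `Outage` type, the service names, and the choice of three short outages as a "pattern" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

QUICK_RECOVERY = timedelta(minutes=10)  # from the checklist
REPEAT_THRESHOLD = 3  # assumption: this many short outages counts as a pattern


@dataclass
class Outage:
    service: str
    started: datetime
    recovered: datetime

    @property
    def duration(self) -> timedelta:
        return self.recovered - self.started


def triage_outages(outages: list[Outage]) -> tuple[list[Outage], list[str]]:
    """Split outages into those needing investigation and services with a pattern.

    Returns (outages lasting 10 minutes or more, sorted names of services
    with REPEAT_THRESHOLD or more short outages).
    """
    needs_investigation = [o for o in outages if o.duration >= QUICK_RECOVERY]
    short_counts: dict[str, int] = {}
    for o in outages:
        if o.duration < QUICK_RECOVERY:
            short_counts[o.service] = short_counts.get(o.service, 0) + 1
    patterns = sorted(s for s, n in short_counts.items() if n >= REPEAT_THRESHOLD)
    return needs_investigation, patterns
```

Services returned in the second list have no single outage worth a deep dive, but the repetition itself is worth a comment on the DFR-tracking ticket.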

Before Restarting Services:

⚠️ Always communicate before manually restarting any service -

Post a brief note in #ips-signals or #ips-dev before proceeding
For QA environments, notify the QA team to avoid disrupting active testing
For Production restarts, communication is required

Escalation Path:

Document investigation in #ips-signals
Tag/ping relevant team members for additional perspective
If urgent and/or unresolved, escalate to #ips-dev
Add open questions to standup parking lot

Mid-Month Tasks

Code Freshness Review: monitor code freshness emails, usually titled “[Action Required] Some of your applications/libraries are stale or at-risk of being stale”, review flagged services and libraries, approve and merge outstanding library updates, and coordinate with the team if major updates are needed

Known Issues: a mid-month spike in the queued messages metric is expected and is related to database and IO-Poller performance; it typically resolves within an hour and will be addressed in the database migration
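
One way to tell the expected spike from a real problem is to check whether the metric has stayed elevated for longer than the usual hour. The sketch below assumes samples arrive as `(timestamp, queued_messages)` pairs in ascending time order; the function name and the threshold value are hypothetical and are not taken from any dashboard.

```python
from datetime import datetime, timedelta

EXPECTED_RESOLUTION = timedelta(hours=1)  # from the checklist: typically resolves within an hour


def spike_needs_attention(
    samples: list[tuple[datetime, int]], threshold: int
) -> bool:
    """Return True if the metric has been above threshold for over an hour.

    Only the most recent unbroken run above the threshold is considered,
    measured from its first sample to the latest sample.
    """
    spike_start = None
    for timestamp, queued in samples:
        if queued > threshold:
            if spike_start is None:
                spike_start = timestamp
        else:
            spike_start = None  # the spike cleared; start over
    if spike_start is None:
        return False
    return samples[-1][0] - spike_start > EXPECTED_RESOLUTION
```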

MIP Tasks

MIP, the Monthly Invoicing Process, covers the last couple of days of each month and is a heightened monitoring period. During MIP, be sure to:

Increase monitoring frequency: hourly checks instead of daily
Prioritize invoice processing and financial close activities
Be available for urgent fixes
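
The change in monitoring cadence can be sketched as a date check. The checklist says only "the last couple of days", so treating the window as exactly two days is an assumption, as are the function names; adjust `MIP_DAYS` to match the team's actual schedule.

```python
import calendar
from datetime import date, timedelta

MIP_DAYS = 2  # assumption: "the last couple of days" read as two days


def in_mip_window(day: date, mip_days: int = MIP_DAYS) -> bool:
    """Return True if day falls within the last mip_days days of its month."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day > days_in_month - mip_days


def check_interval(day: date) -> timedelta:
    """Hourly checks during MIP, daily checks otherwise (per the checklist)."""
    return timedelta(hours=1) if in_mip_window(day) else timedelta(days=1)


if __name__ == "__main__":
    today = date.today()
    print(f"{today}: check every {check_interval(today)}")
```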

Refer to the Invoicing Reference Guide for comprehensive information; key points include -

Code Freeze Protocol: Hot fixes for MIP issues are exempt from standard code freeze
Approval Process: Hot fixes don’t require pre-approval, but must be tracked
Communicate: Post all MIP deployments in #finsys-freeze-approvals
Coordinate: Include broader Financial Systems teams as needed

Note: FFEH is a practice documentation site created for portfolio purposes; while the patterns and approaches reflect real-world engineering practice, the onboarding materials below are fictional

Essential Resources:

How to Create and Maintain API Keys
How to Retry Failed Invoice Requests
Developer First Responder Process
Getting Ready for On-Call (PagerDuty Setup)

Reference Materials:

IPS Onboarding Guide
IPS Architecture Overview
Incident Response Playbook
IPS Runbook

Calendars & Schedules:

IPS DFR Rotation Calendar

_Last updated: [2026-02-02]

Maintained by: IPS Team @ Fake Company, Inc._