Site Reliability Engineer (Remote)
Description
The following states/districts are excluded from this job ad: AK, CA, CO, CT, DC, HI, LA, MA, MN, MO, NE, NV, NH, NJ, NM, NY, ND, OR, PR, RI, VT, WA, WY
Future Need - Actively Interviewing
Location: Remote in any United States jurisdiction not excluded from this job advertisement.
Be the technical backbone of reliability for a mission-critical Department of Veterans Affairs (VA) cloud platform. As a Site Reliability Engineer, you will serve as the liaison between platform and product teams, ensuring 99.99% availability across hundreds of applications through proactive incident response, capacity planning, and automation.
Position Description: The Site Reliability Engineer leads incident triage and root cause analysis, and drives automation to reduce operational toil across the platform.
Tasks/activities include, but are not limited to:
- Performs daily monitoring and reporting of performance metrics including the four Golden Signals and incident-free availability for each assigned product
- Participates in 24x7 on-call support rotations as needed, providing subject matter expertise for platform services and product dependencies
- Leads or supports incident response activities including triage, mitigation, resolution, and recovery in accordance with established standards
- Conducts root cause analysis (RCA) and develops corrective and preventive actions
- Produces incident reports and post-incident reviews within defined timelines
- Defines, implements, and maintains key performance indicators (KPIs) that provide real-time and historical insight into product reliability and availability
- Conducts capacity planning and load analysis ensuring systems meet current and forecasted demand across all environments
- Designs, implements, and maintains automation for deployment, testing, configuration, scaling, and recovery processes to reduce manual operational toil
- Validates system readiness prior to releases, patches, and upgrades
- Monitors and supports post-deployment activities ensuring stability and performance
- Participates in daily Change Control Board (CCB) meetings supporting safe, reliable, and repeatable releases across assigned products
- Serves as liaison between the platform team and product teams, providing subject matter expertise regarding platform and product architecture, configurations, and dependencies
Compensation & Benefits: The projected annual pay range for this position is $80,241 - $119,603. The offer depends on factors including, but not limited to, qualifications, experience, job responsibilities, and geographic location.
Oxley Enterprises, Inc. offers a full array of benefits including:
- Medical, dental, vision and prescription drug coverage for you and your family.
- Life insurance, short-term disability, and long-term disability coverage paid for by the Company.
- Supplemental coverages including Accident, Critical Illness, and Hospital.
- Additional life insurance coverage for you and your dependents.
- 401(k) plan with various options to select based on your retirement goals.
Oxley Enterprises®, Inc. is a certified service-disabled veteran-owned small business (SDVOSB), veteran-owned small business (VOSB), and woman-owned small business (WOSB) with 26 years of experience building and delivering quality IT systems and programs. Oxley has been ranked on the Inc. 5000 seven times (2016, 2017, 2018, 2021, 2023, 2024, 2025), is a 2019 - 2025 Department of Labor HIRE Vets Medallion Award winner, and is Virginia Values Veterans certified.
All qualified applicants will receive consideration for employment without regard to any status protected by applicable federal, state, or local law.
If you require a reasonable accommodation to apply for a position at Oxley Enterprises, Inc., please send an email to our Human Resources Department at: careers@oxleyenterprises.com with the following information:
- Subject line: Accommodation Request
- A description of your accommodation request
- Your contact information: full name, email address, and best number to reach you (optional)
We participate in the E-Verify program. http://www.dhs.gov/E-Verify
Requirements
Minimum/General Experience: 3 years of experience in site reliability engineering or platform operations
Minimum Education: Bachelor's degree in computer science, information technology, or a related field. Preferred certification: AWS Certified DevOps Engineer - Associate or Certified Kubernetes Administrator (CKA).
Essential Skills/Qualifications:
- Excellent experience performing daily monitoring and reporting of the four Golden Signals (latency, traffic, errors, and saturation)
- Excellent ability to participate in 24x7 on-call support rotations
- Excellent experience leading or supporting incident response including triage, mitigation, resolution, and recovery
- Excellent ability to conduct Root Cause Analysis (RCA) and develop corrective and preventive actions
- Excellent experience defining, implementing, and maintaining Key Performance Indicators (KPIs)
- Above average experience designing and maintaining automation for deployment, testing, configuration, scaling, and recovery processes
- Above average knowledge of Infrastructure as Code (IaC) practices
- Above average ability to support and maintain Terraform-based environment provisioning
- Working knowledge of capacity planning and load analysis
- Experience supporting a federal agency
- Excellent verbal and written communication skills
General Physical Requirements needed to perform the essential functions of this job may vary based on the location of the assignment.
- Assignment Location - Remote
- Sedentary Work - Exerting up to 10 pounds of force occasionally and/or a negligible amount of force frequently or constantly to lift, carry, push, pull or otherwise move objects.
- Typing, communicating, repetitive motions.
- Close visual acuity to prepare and analyze data, view computer monitors and read. May need to view presentation screens and other visual aids in a virtual setting.
- Inside environmental conditions with protection from outside elements.
Security: Active Federal Civilian Public Trust clearance
- U.S. citizen or permanent resident who has lived in the United States for at least 3 years
The Federal Civilian Public Trust investigation covers a 10-year period and, in some instances, lifetime events. It may include, but is not limited to, a review of:
- OPM Security Investigations Index (SII)
- DOD Defense Central Investigations Index (DCII)
- National Agency Check (NAC) records
- FBI name check
- FBI fingerprint check
- Credit report check
- Written inquiries to previous employers and references listed on the application for employment
- Potential interviews with the subject, spouse, neighbors, supervisors, and coworkers
- Law enforcement check
- Court records check
- Education check (attendance and degrees)