[Remote] Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI

Work from home · Full-time · Hiring

Note: This is a remote role open to candidates in the USA. Deloitte is leading an AI-first initiative aimed at transforming healthcare decision-making through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring that model behavior aligns with healthcare standards.

Responsibilities

Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning and alignment workflows
Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability
Train reasoning models for healthcare decisioning using verifiable-reward RL, designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes
Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance
Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control
Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling, turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale
Implement efficient fine-tuning strategies including LoRA, QLoRA, PEFT, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent
Optimize production inference performance (latency, throughput, quantization, and deployment efficiency) using frameworks such as vLLM, TensorRT-LLM, or TGI
Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar
Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics
Build healthcare-grade evaluation, co-designed with clinical experts: held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups
Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use
Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks
Collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning

Skills

Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field
Demonstrated depth in training and post-training large transformer-based language models in production or research: this is your craft, not coursework or a one-off fine-tune. That depth should include SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research
Hands-on experience with reasoning-model training and/or reinforcement learning with verifiable rewards (RLVR) workflows
Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows
Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures
Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray
Experience implementing efficient fine-tuning techniques such as LoRA, QLoRA, PEFT, and quantization-aware workflows
Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs
Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines
Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments
Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve
Limited immigration sponsorship may be available
Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems
Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama
Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute
Contributions to open-source AI projects, research publications, benchmark development, or model releases
Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector

Benefits

Substantial performance-based incentive opportunity designed to grow with the value you help create: startup-style upside, backed by a committed, well-capitalized platform
You may also be eligible for a discretionary annual incentive based on individual and organizational performance

Company Overview

Deloitte drives progress. Its firms around the world help clients become leaders wherever they choose to compete. The company was founded in 1900 and is headquartered in Marunouchi, Tokyo, Japan, with a workforce of 10,001+ employees. Its website is https://www2.deloitte.com/jp/en/pages/about-deloitte/articles/dtrs/dtrs-company-profile.html.

Company H1B Sponsorship

Deloitte has a track record of offering H1B sponsorships: 1,055 in 2026, 6,871 in 2025, 4,911 in 2024, 5,604 in 2023, 8,090 in 2022, 5,993 in 2021, and 10,388 in 2020. Please note that this does not guarantee sponsorship for this specific role.
