Ralf B. Lukner MD PhD

5.2K posts


@lukner

Internist in Pampa, TX 🩺 Suboxone OUD treatment, diabetes, COPD, heart disease & mental health 🏆 Board-certified & published author 👉 Compassionate MD.

Pampa, TX · Joined March 2009
2.9K Following · 1.1K Followers
Ralf B. Lukner MD PhD
• Clinical presentation: painful erythematous bullous/nodular lesions on the right hand with proximal nodules along the forearm, showing sporotrichoid (lymphocutaneous) spread after likely traumatic soil/gardening inoculation over 10 days.
• Microbiology: Nocardia causes this pattern; aspiration shows filamentous, gram-positive, branching, weakly acid-fast rods (beaded on Gram stain), consistent with an aerobic soil organism.
• Distinguished from: Actinomyces is branching but anaerobic and not acid-fast; fungi and mycobacteria differ by stain and growth features.
• Confirmation: culture confirms the diagnosis.
This matches an NEJM Image Challenge case of sporotrichoid nodular lymphangitis due to Nocardia.
English
0
0
0
12
Mastering the Differential 🩺
A 75-year-old woman presents with a 10-day history of painful lesions on the right hand and forearm. 👇 Aspiration of a lesion shows filamentous gram-positive acid-fast branching rods. What's the diagnosis?
[image]
English
14
14
73
20K
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Most Likely DDx: Molluscum Contagiosum

The zoomed-in image is diagnostic. Key findings: central umbilication is the smoking gun, visible in at least a dozen lesions. This is pathognomonic for molluscum (a poxvirus). The pearly/waxy sheen reflects taut epidermis stretched over the viral molluscum body. Additionally, the zoomed-out image shows a linear arrangement of lesions toward the bottom of the frame, representing pseudo-Koebnerization from the patient scratching and autoinoculating the virus along a line. The inflammatory halos on several lesions represent the “BOTE” sign (beginning of the end), a clinical paradox in which the rash looks worse right before it improves as the immune system finally recognizes the virus.

Rule-outs confirmed: Guttate psoriasis requires micaceous silvery scale and lacks the central dimple. Lichen planus presents as purple, polygonal, pruritic, planar papules, not dome-shaped ones. PLC shows a wafer-like peelable scale; these lesions are too succulent and smooth.

The red flag: while molluscum is common and benign in children, this lesion burden in an adult is a classic cutaneous marker of cellular immunodeficiency. HIV-1/2 Ag/Ab combo testing is the priority workup.

Treatment in your setting: cryotherapy is the office-based gold standard, though this volume will make for a long session. Imiquimod 5% cream is an excellent home adjunct, working through TLR-7 stimulation of the local immune response.

Critical management question: is the patient reporting pruritus? Scratching is the primary driver of this spread pattern, and controlling the itch (antihistamines, emollients) can be just as important as treating the lesions themselves to prevent re-cropping after your first treatment round.
[two images]
English
0
0
0
58
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
This favors erysipelas over tinea faciei.

For erysipelas:
∙ The erythema is bright, confluent, and well-demarcated with raised borders, which is classic for erysipelas
∙ The involvement over the malar cheek and nose with that sharp, slightly elevated border is characteristic
∙ There appears to be edema and a slightly shiny, taut quality to the skin
∙ The color is a deep, uniform erythema rather than the more pink-salmon tone you’d expect with tinea
∙ The clinical setting (appears to be a hospital/clinic) suggests an acute presentation, consistent with the rapid onset typical of erysipelas

Against tinea faciei:
∙ Tinea faciei typically shows annular or arcuate plaques with central clearing and a leading scaly edge
∙ This lesion lacks the characteristic annular configuration
∙ Tinea tends to be more indolent in onset, not as acutely inflammatory-appearing
∙ The borders here are raised and well-demarcated but not “advancing scale” borders

Treatment:
Outpatient: penicillin VK 500 mg QID or amoxicillin 500 mg TID for 10-14 days. PCN-allergic: cephalexin 500 mg QID or clindamycin 300-450 mg TID. Severe/inpatient: IV penicillin G 2-4 MU q4-6h or ceftriaxone 1 g daily. Step down to oral at 48-72 hours with improvement. Adjunctive: head elevation, cool compresses, mark the borders with a skin marker to track progression. Facial erysipelas warrants a lower admission threshold due to cavernous sinus communication via the ophthalmic veins. If there is no improvement at 48-72 hours, reconsider the diagnosis (KOH, biopsy, ANA).

Key differentiating workup if there’s any doubt:
∙ KOH prep from the leading edge (quick, and settles it if positive for hyphae)
∙ If erysipelas is suspected clinically, empiric treatment shouldn’t be delayed for culture
∙ Blood cultures if systemic signs (fever, leukocytosis) are present
∙ ASO/anti-DNase B titers are generally not useful acutely

Risk factors:
∙ Lymphedema or prior lymph node dissection
∙ Venous insufficiency
∙ Obesity
∙ Diabetes mellitus
∙ Immunosuppression (steroids, chemotherapy, HIV)
∙ Skin barrier disruption (eczema, tinea pedis, ulcers, surgical wounds)
∙ Prior episode of erysipelas (recurrence rate ~30%)
∙ Advanced age
∙ Nephrotic syndrome or other edematous states
∙ Chronic alcohol use
English
0
0
0
48
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Dermatomyositis DDx

Differentiating features:
- Polymyositis shares proximal weakness but lacks skin findings.
- Inclusion body myositis hits older adults in the distal finger flexors and quads, often asymmetrically, and resists steroids.
- Antisynthetase syndrome bundles myositis with ILD, arthritis, Raynaud, and mechanic’s hands.
- SLE rash can mimic DM, but the muscle disease pattern and myositis-specific antibody profile diverge.

Treatment: first-line is high-dose glucocorticoids plus a steroid-sparing agent (methotrexate or azathioprine). Refractory or ILD-associated disease may require IVIG, rituximab, or mycophenolate.

Testable one-liners:
∙ Heliotrope rash + Gottron papules + proximal weakness = dermatomyositis
∙ Perifascicular atrophy on biopsy
∙ Adult DM mandates malignancy screening
∙ Anti-MDA5 signals rapidly progressive ILD
∙ Juvenile DM is associated with calcinosis
∙ Amyopathic DM presents with rash but minimal or no muscle weakness

Memory hook: DM = “Derm + Muscle + Malignancy + MDA5-lung”
English
0
0
0
106
Dr Ihab Suliman
Dr Ihab Suliman@IhabFathiSulima·
What is the diagnosis?
[image]
English
20
15
88
20.6K
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Here is a representative board-style question related to this type of CXR image.

A 32-year-old woman presents for a routine employment physical. She is asymptomatic and reports normal exercise tolerance, although she notes she was once told as a child that her "heart was on the right side." On physical examination, her vitals are normal. Heart sounds are loudest over the right hemithorax, and a grade II/VI midsystolic murmur is heard at the left upper sternal border. A chest X-ray is performed and shown below. It reveals a small right hemithorax with a vertical, curvilinear vascular density extending toward the diaphragm along the right heart border.

Which of the following is the most likely underlying physiological abnormality?
A) Right-to-left shunt via a sinus venosus atrial septal defect
B) Left-to-right shunt via an anomalous pulmonary vein draining into the inferior vena cava
C) Pulmonary sequestration with arterial supply from the intercostal arteries
D) Congenitally corrected transposition of the great arteries (ccTGA)
E) Isolated dextrocardia with normal pulmonary venous return

Answer and Explanation

Correct answer: B

This patient presents with the classic triad of Scimitar Syndrome (a form of partial anomalous pulmonary venous return, or PAPVR):
1. Right lung hypoplasia: indicated by the small right hemithorax and the heart being pulled to the right (dextroposition).
2. Scimitar sign: the "vertical, curvilinear vascular density" is the anomalous right pulmonary vein draining into the systemic venous circulation (usually the IVC).
3. Left-to-right shunt: because oxygenated blood from the right lung is redirected back into the systemic venous system (IVC → right atrium), this creates a left-to-right shunt, similar to an ASD.

Why the other options are incorrect:
• A: While PAPVR is often associated with a sinus venosus ASD, the shunt in this condition is left-to-right, not right-to-left. Right-to-left shunting would cause cyanosis, which is not suggested here.
• C: Bronchopulmonary sequestration involves a non-functioning mass of lung tissue. While it can have systemic arterial supply, it does not typically present with the classic "scimitar" venous drainage pattern on CXR.
• D: ccTGA involves ventricular inversion. While the heart may be positioned differently, it would not explain the specific curvilinear vascular "sword" shadow or the right lung hypoplasia.
• E: Dextrocardia is a mirror-image flip of the heart's orientation. In this case, the heart is shifted (dextroposition) because of the small right lung, rather than being anatomically flipped (dextrocardia).
English
1
0
1
178
Ralf B. Lukner MD PhD reposted
Neil Stone
Neil Stone@DrNeilStone·
"There are no Amish with autism" There are "Vaccines aren't tested against placebo" They are "MMR has never been studied as a possible cause of autism" It has. It's not the cause. I apparently need to say this stuff over and over and over and over again. And again.
English
195
1.7K
8.8K
61.5K
Bartosz Fiałek
Bartosz Fiałek@bfialek·
@sama We are building a vertical AI-native rheumatology platform that connects patient triage, disease monitoring, and clinician decision support into one scalable care layer. It’s perfect.
English
1
0
1
191
Sam Altman
Sam Altman@sama·
GPT-5.4 is great at coding, knowledge work, computer use, etc, and it's nice to see how much people are enjoying it. But it's also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.
English
2.9K
600
11.9K
1.2M
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Human reviewer uses the Review Protocol for certification.

Review Protocol

Purpose
You are a senior engineering reviewer. Your job is to find flaws, not approve. Do not accept surface-level answers. Push back. Ask follow-ups. If the human cannot answer a question, that is a finding.

Step 1: Tier Classification Challenge
Ask: "What tier did Claude Code assign this change?" Then challenge:
- "This touches [specific component]. Why isn't this Tier B/S?"
- "Walk me through the tier classification rationale. Is it honest, or minimized to avoid review?"
- "If this breaks in production at 2 AM, what's the blast radius?"
Do not proceed until the tier is confirmed correct or upgraded.

Step 2: The Failure Cascade
Ask in sequence. Wait for answers.
1. "What triggers this code to run?"
2. "What is the system state when it runs?"
3. "What could prevent it from running?"
4. "If it can't run, what recovers it?"
5. "What if the recovery mechanism also can't run?"
If the human says "that won't happen," ask: "How do you know? Is there a test for that?"

Step 3: The Deadlock Test
Ask directly: "Does this code pause, disable, or stop any component that it later depends on to resume, re-enable, or restart something?"
If yes:
- "Walk me through exactly how recovery happens."
- "What executes the recovery? Is that component still running?"
- "Can the system reach a state where no automated process can fix it?"
- "If manual intervention is required, where is that documented?"

Step 4: Call-Site Analysis
Ask: "What functions did Claude Code modify?" For each function:
- "Who calls this function? List ALL callers."
- "What does each caller expect this function to do?"
- "Does this function now have side effects?"
- "Are those side effects appropriate for EVERY caller?"
If the human doesn't know all callers: "You're approving a change without knowing its impact. We need to grep for all call sites before proceeding."

Step 5: Test Interrogation
Ask: "Show me the tests Claude Code wrote or modified." For each test:
- "What specific BEHAVIOR does this test verify?"
- "Does it set up state, execute an action, and verify an outcome?"
- "Or does it just check that a file exists or syntax is valid?"
Ask about missing tests:
- "Is there a test for what happens when [X] fails?"
- "Is there a test for failure during non-business hours?"
- "Is there a test for recovery from a deadlock state?"
- "Is there a test for the 2 AM batch scenario?"
If a test checks "function_name in file_content" or "script returns exit code 0": "This is not a behavioral test. This is a syntax check pretending to be a test. What actually verifies the feature works?"

Step 6: Tier S Specific Checks
For Tier S changes only:
- "Is there an auto-accept threshold? What is it?"
- "Can this run unattended and make decisions about patient data?"
- "What's the human-in-the-loop mechanism?"
- "Has the rollback procedure been tested (not just documented)?"
- "What's the blast radius calculation?"
If auto-accept exists in batch operations: "This is Tier S. Auto-accept in unattended operations is prohibited. How do we fix this?"

Step 7: The Security/Threat Model
Ask:
- "What happens if someone tries to break this?"
- "What happens if [external dependency] fails?"
- "What happens if the input is malicious?"
- "What are the residual risks Claude Code didn't mention?"
If there's no threat model section: "This is Tier B/S. Where is the threat model? 'Security Considerations' is not a threat model."
Step 8: Final Certification
Do not ask "does this look good?" Ask the human to certify each item: "Before you approve, confirm each of these out loud:"
- "I have verified the tier classification is honest."
- "I have traced what happens if this fails at 2 AM."
- "I have verified tests test behavior, not existence."
- "I have checked all call sites."
- "I know the recovery path works."
- "I have reviewed the threat model."
- "I have verified no auto-accept in batch operations." (Tier S)
- "I have verified the rollback was tested." (Tier S)
- "I have thought about this."
If the human cannot confidently certify any item, that item needs more work.

Review Output
At the end, summarize:

## Review Summary
**Change:** [Title]
**Tier:** [A/B/S] (Verified)

### Findings
1. [Finding 1]
2. [Finding 2]

### Open Questions
1. [Unanswered question 1]
2. [Unanswered question 2]

### Certification Status
[ ] APPROVED - All items certified
[ ] NEEDS WORK - Items requiring attention listed above
[ ] BLOCKED - Critical issues prevent approval

**Reviewer:** [Name]
**Date:** [Date]

Only output “APPROVED” if the human could confidently certify ALL items in Step 8.

— Appendix: Quick Reference

Tier at a Glance
| Tier | Risk | Safety-Critical | PHI | Integrity at Stake | Auto-Accept | Threat Model | Rollback Test | Human Review |
|------|------|-----------------|-----|--------------------|-------------|--------------|---------------|--------------|
| A | Low | No | No | No | OK | No | No | Self |
| B | High | Maybe | No | No | Conditional | Recommended | No | Required |
| S | High | Yes | Yes | Yes | PROHIBITED (batch) | REQUIRED | REQUIRED | Required + Clinical |

Agent at a Glance
| Agent | Writes Code? | Reviews Own Work? | Objective |
|-------|--------------|-------------------|-----------|
| api-engineer | Yes | No | Implement correctly |
| database-engineer | Yes | No | Schema safety |
| backend-verifier | No | N/A | Find flaws |
| testing-engineer | No | N/A | Design tests |
| test-runner | No | N/A | Execute tests |
| implementation-verifier | No | N/A | E2E verification |
| security-auditor | No | N/A | Threat modeling |

— The One Rule: Author ≠ Reviewer ≠ Tester

— End of Document
English
0
0
2
48
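To make Step 8 of the protocol above concrete, here is a minimal Python sketch of the certification gate, assuming a simple interactive CLI. The item list mirrors the protocol; everything else (function names, prompts, the way a verdict is returned) is illustrative rather than part of any actual tooling.

```python
# Hypothetical sketch of the Step 8 certification gate described above.
CERTIFICATION_ITEMS = [
    "I have verified the tier classification is honest.",
    "I have traced what happens if this fails at 2 AM.",
    "I have verified tests test behavior, not existence.",
    "I have checked all call sites.",
    "I know the recovery path works.",
    "I have reviewed the threat model.",
]

TIER_S_ITEMS = [
    "I have verified no auto-accept in batch operations.",
    "I have verified the rollback was tested.",
]

def certify(tier: str) -> str:
    """Walk the reviewer through each item; any hesitation blocks approval."""
    items = CERTIFICATION_ITEMS + (TIER_S_ITEMS if tier == "S" else [])
    items.append("I have thought about this.")
    for item in items:
        answer = input(f'Confirm: "{item}" [y/N] ').strip().lower()
        if answer != "y":
            print(f"Uncertified item: {item}")
            return "NEEDS WORK"
    return "APPROVED"

if __name__ == "__main__":
    print(certify(tier="S"))
```

The design choice matches the protocol's intent: "APPROVED" is only reachable when every item was confirmed, so the default outcome is "NEEDS WORK".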
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
LUKNER LUMINA Governance Framework v2.0
Agent-Based Separation of Duties + Tier S Classification
Version: 2.0 | Effective Date: January 29, 2026 | Author: Dr. Lukner | Classification: Internal Engineering Governance

Table of Contents
1. Software Engineering Workflow
2. Tier Classification Guide
3. Change Packet Template
4. Agent Separation of Duties
5. Review Protocol

Software Engineering Workflow

Core Principle: The implementing agent NEVER verifies its own work.

Single-agent development exhibits logic blindness—the same model that writes code has an inherent bias toward confirming its correctness. This workflow enforces Separation of Duties (SoD) to break that bias through adversarial verification.

Workflow Phases

Phase 1: Research & Design
- Research: explore the codebase for existing patterns, dependencies, constraints
- Design documentation: problem statement and requirements; proposed solution architecture; data flow diagrams (if applicable); API contracts (if applicable)
- Tier classification: classify per TIER_CLASSIFICATION criteria (A/B/S)
- Change packet: for Tier B/S, create a packet per CHANGE_TEMPLATE

Phase 2: Specification (Tier B and S only)
- Threat model: attack vectors and mitigations; blast radius analysis (2 AM failure scenario); data integrity implications; recovery procedure specification
- Required for Tier S, recommended for Tier B
- Approval gate: design review approval required before coding

Phase 3: Implementation
- Implementation: api-engineer / database-engineer write code
- Unit tests: the implementing agent writes unit tests only; target >70% coverage (Tier A/B), >90% coverage (Tier S)
- Handoff: code committed, implementation agents exit scope

Phase 4: Adversarial Review
- Security review: backend-verifier attempts to break the implementation
- Threat model validation: security-auditor validates against the threat model
- Test case design: testing-engineer designs integration/E2E tests; must include failure scenarios, recovery paths, and edge cases; must NOT be written by the implementing agent

Phase 5: Test Execution
- Execute tests: test-runner runs ALL tests
- Document failures: ALL failures documented—no fixing by test-runner
- Remediation loop: failures return to the implementing agent for a fix, then re-execute

Phase 6: Independent Verification
- E2E verification: implementation-verifier tests with no prior context
- Documentation verification: can someone follow the docs and succeed?
Phase 7: Human Review
- Certification: the human reviewer certifies each item (see Review Protocol)
- Merge: PR ready for final merge

Execution Sequence (summary of the original diagram)
- Phases 1-2, Research & Specification: Human + Claude — design docs, tier classification, threat model
- Phase 3, Implementation: api-engineer implements the feature + unit tests; database-engineer handles schema/migrations + unit tests (in parallel)
- Phase 4, Adversarial Review (parallel tracks): backend-verifier performs adversarial code review (paid per bug found); security-auditor validates the threat model (Tier B/S only); testing-engineer designs integration/E2E test cases
- Phase 5, Test Execution: test-runner executes ALL tests and documents ALL failures—NO FIXING; failures loop back to Phase 3
- Phase 6, Independent Verification: implementation-verifier runs E2E with no prior context—"Can a new user make this work?"
- Phase 7, Human Review: Dr. Lukner gives final approval based on agent reports; certification checklist completed

Critical Requirements
- NEVER skip security review for Tier B/S changes
- NEVER allow the implementing agent to verify its own work
- ALWAYS document before implementing
- ALWAYS test before declaring complete
- NEVER proceed with failing tests
- NEVER use auto-accept thresholds in Tier S batch operations
- ALWAYS trace the 2 AM failure scenario for Tier B/S

Tier Classification Guide

Overview: all changes must be classified before implementation begins.
Classification determines the required gates, reviews, and documentation.

Tier Definitions

Tier A: Routine
Definition: low-risk changes with minimal blast radius and straightforward rollback.
Examples: documentation updates; UI cosmetic changes (colors, spacing, labels); dependency bumps (minor/patch versions); logging improvements; code comments and formatting; test additions (not modifications).
Required gates: self-review; automated tests pass; linting passes.
Approval: self-approved, merge when ready.

Tier B: High-Risk
Definition: changes that could impact security, compliance, data integrity, or system availability.
Examples: new API endpoints; database schema changes; authentication/authorization changes; PHI access pattern modifications; infrastructure changes (GCP, Cloud Run, networking); third-party integrations; secrets or credential handling; rate limiting or throttling logic; audit logging modifications.
Required gates: design documentation; security review by backend-verifier; threat model (recommended); integration tests by testing-engineer; test execution by test-runner; human approval before merge.
Approval: Dr. Lukner sign-off required.

Tier S: Safety-Critical
Definition: changes where incorrect behavior could cause patient harm, wrong-patient data linkage, or clinical decision errors.
The distinguishing question: if this fails silently at 2 AM during a batch operation, could a clinician make a medical decision based on wrong data? If yes → Tier S.
Examples: patient matching/identity resolution algorithms; clinical decision support logic; medication/allergy/diagnosis data linking; FHIR data correlation across sources; any automated action on patient records; batch operations that modify or link PHI; threshold-based auto-accept logic for patient data; prep report generation with clinical data.
Required gates: all Tier B gates, PLUS formal threat model (not just "security considerations"); blast radius analysis with quantified impact; no auto-accept thresholds in batch/unattended operations; human-in-the-loop mandatory for all production matches; rollback procedure documented AND tested before deployment; clinical review sign-off for patient safety implications; >90% test coverage; recovery procedure verified.
Approval: Dr. Lukner sign-off + rollback test verification.

Tier Comparison Matrix
| Criterion | Tier A | Tier B | Tier S |
|-----------|--------|--------|--------|
| Primary risk | Minimal | Confidentiality | Integrity |
| Failure mode | Cosmetic/minor | Unauthorized access | Wrong data presented as correct |
| Downstream impact | Annoyance | Privacy breach, notification | Clinical decision on wrong data |
| Recovery | Trivial rollback | Revoke access, audit | May be unrecoverable if acted upon |
| 2 AM failure | Nobody notices | Alert fires, manual fix | Wrong prep reports generated |
| Blast radius | Single user/feature | System-wide | Patient safety |
| Auto-accept allowed | Yes | Conditional | NO (batch operations) |
| Human-in-loop | Optional | For approvals | For ALL matches |
| Rollback test | Not required | Recommended | REQUIRED before deploy |

Classification Decision Tree
START: What does this change touch?
- Documentation only? → Tier A
- UI cosmetic only (no data/logic)? → Tier A
- Does it touch PHI?
  - NO → could it affect system availability? NO → Tier A; YES → Tier B
  - YES → could wrong data be presented as correct? NO (confidentiality only) → Tier B; YES (integrity risk) → Tier S
- Does it run unattended (batch/scheduled) AND touch patient data? → Tier S
- Does it make automated decisions about patient identity? → Tier S
- Does it feed data into clinical workflows?
  → Tier S
- When in doubt → Tier B (never under-classify)

Classification Anti-Patterns
DO NOT under-classify to avoid review:
| Rationalization | Reality | Correct Tier |
|-----------------|---------|--------------|
| "It's just a threshold change" | The threshold controls patient matching accuracy | Tier S |
| "It's just logging" | Audit logs are HIPAA compliance controls | Tier B |
| "It's just a query optimization" | The query touches PHI and could return wrong results | Tier B/S |
| "The old code worked fine" | New code has new failure modes | Re-classify |
| "It's behind a feature flag" | Feature flags can be enabled; classify the feature | Full tier |

Tier Upgrade Triggers
A change MUST be upgraded if any of these apply:
Upgrade A → B: touches authentication or authorization; modifies database schema; changes API contracts; affects audit logging; handles credentials or secrets.
Upgrade B → S: involves patient identity matching; feeds data into clinical decisions; runs unattended with auto-accept logic; could cause wrong-patient data linkage; modifies data that clinicians rely on.

Change Packet Template

File location: docs/governance/changes/YYYY-MM-DD_slug.md

Template:

# [Change Title]
**Change Packet ID:** CHANGE-YYYY-MM-DD-NNN
**Tier:** [A | B | S]
**Status:** [Draft | Pending Review | Approved | Implemented | Verified]
**Author:** [Agent or Human]
**Reviewer:** [Assigned Reviewer]
**Date Created:** YYYY-MM-DD
**Date Approved:** [Pending]

## Executive Summary
[2-3 sentences describing what this change does and why it's needed]

## Problem Statement
[What problem does this solve? What's broken or missing today?]

## Proposed Solution
[High-level description of the solution approach]
### Architecture
[Include diagrams if applicable]
### Data Flow
[How does data move through the system with this change?]
### API Contracts
[New or modified endpoints, request/response schemas]

## Tier Classification Rationale
**Assigned Tier:** [A | B | S]
**Classification Reasoning:**
- [ ] Does this touch PHI? [Yes/No]
- [ ] Could this affect data integrity? [Yes/No]
- [ ] Does this run unattended? [Yes/No]
- [ ] Could wrong data be presented as correct? [Yes/No]
- [ ] Does this involve patient matching/identity? [Yes/No]
**Reviewer Challenge:** [Space for reviewer to challenge classification]

## Threat Model (Tier B Required, Tier S Mandatory)
### Attack Vectors
| Vector | Likelihood | Impact | Mitigation |
|--------|------------|--------|------------|
| [e.g., SQL injection in name field] | [Low/Med/High] | [Low/Med/High/Critical] | [Mitigation approach] |
### HIPAA Implications
- §164.312(c)(1) Integrity: [How is integrity protected?]
- §164.312(d) Authentication: [How is identity verified?]
- §164.530(j) Retention: [How long are records kept?]
### Residual Risks
[What risks remain after mitigations? What's accepted?]

## Blast Radius Analysis (Tier B/S Required)
### Failure Scenario
[Describe what happens if this fails at 2 AM during batch processing]
### Affected Records
**Calculation:**
- Records processed per day: [N]
- Estimated failure rate: [X%]
- Days until detection: [D]
- **Total affected:** [N × X% × D]
### Detection Time
[How long until someone notices this is wrong?]
### Recovery Procedure
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Estimated recovery time:** [X minutes/hours]
### Unrecoverable States
[What states cannot be automatically recovered? What manual intervention is required?]
## Implementation Plan
### Files to Create/Modify
| File | Action | Agent |
|------|--------|-------|
| [path/to/file.go] | [Create/Modify] | [api-engineer] |
### Agent Assignments
| Phase | Agent | Deliverable |
|-------|-------|-------------|
| Implementation | api-engineer | Code + unit tests |
| Database | database-engineer | Migrations + domain models |
| Security Review | backend-verifier | Vulnerability findings |
| Test Design | testing-engineer | Integration test cases |
| Test Execution | test-runner | Test results report |
| E2E Verification | implementation-verifier | Verification report |

## Test Plan
### Unit Tests (Implementing Agent)
- [ ] [Test name]: [What it verifies]
### Integration Tests (Testing Engineer)
- [ ] [Test name]: [What it verifies]
### Adversarial Tests (Backend Verifier)
- [ ] [Attack scenario]: [Expected defense]
### Recovery Tests (Tier S Required)
- [ ] Rollback procedure executed successfully
- [ ] System returns to known-good state
- [ ] Rollback time: [X minutes] (must be < 5 min)

## Security Checklist
- [ ] No credentials in code or logs
- [ ] PHI scrubbed from error messages
- [ ] Audit logging implemented (logger.LogPHIAccess / logger.LogPHIModify)
- [ ] GSM credentials only (no environment variables)
- [ ] TLS 1.2+ enforced for external calls
- [ ] Rate limiting implemented
- [ ] Input validation on all external inputs
- [ ] SQL injection prevention verified
- [ ] No auto-accept in batch operations (Tier S)

## Approval Gates
### Gate 1: Design Review
- **Reviewer:** [Name]
- **Date:** [Pending]
- **Status:** [ ] Approved / [ ] Needs Changes
- **Notes:** [Reviewer comments]
### Gate 2: Security Review
- **Reviewer:** backend-verifier
- **Date:** [Pending]
- **Findings:** [Number of issues found]
- **Status:** [ ] Approved / [ ] Needs Changes
### Gate 3: Test Verification
- **Executor:** test-runner
- **Date:** [Pending]
- **Results:** [Pass/Fail count]
- **Status:** [ ] All Pass / [ ] Failures Documented
### Gate 4: Final Approval (Tier B/S)
- **Approver:** Dr. Lukner
- **Date:** [Pending]
- **Signature Hash:** [SHA-256]
- **Status:** [ ] Approved / [ ] Rejected

## Rollback Plan
### Trigger Conditions
[What conditions trigger a rollback?]
### Rollback Steps
1. [Step 1]
2. [Step 2]
3. [Step 3]
### Rollback Verification (Tier S: must be tested pre-deploy)
- [ ] Rollback tested on staging
- [ ] System returned to known-good state
- [ ] Data integrity verified post-rollback
- [ ] Rollback time: [X minutes]

## Post-Implementation Verification
- [ ] Build passes: `make build`
- [ ] Lint passes: `make lint`
- [ ] Security scan passes: `gosec ./...`
- [ ] Unit tests pass: `go test ./... -v`
- [ ] Integration tests pass: [command]
- [ ] E2E verification complete
- [ ] Documentation updated

## Audit Trail
| Date | Actor | Action | Notes |
|------|-------|--------|-------|
| [Date] | [Who] | [What] | [Details] |

## Sign-Off
**I certify that:**
- [ ] The tier classification is honest and not minimized to avoid review
- [ ] The threat model addresses realistic attack vectors
- [ ] The blast radius analysis reflects actual risk
- [ ] All tests verify behavior, not just existence
- [ ] The recovery procedure has been verified (Tier S: tested)
- [ ] I have thought about this
**Approver:** ______________________
**Date:** ______________________
**Signature Hash:** ______________________

Agent Separation of Duties

Core Principle: Author ≠ Reviewer ≠ Tester

The implementing agent writes code but does NOT verify it works.
The testing agent designs tests, but a different agent executes them. No agent reviews its own work.

Agent Roles and Objectives
| Agent | Primary Objective | Bias Correction |
|-------|-------------------|-----------------|
| api-engineer | Implement features correctly | Author—unit tests only, no E2E |
| database-engineer | Schema correctness, migration safety | Author—unit tests only |
| backend-verifier | Find flaws (paid per bug found) | Adversarial—incentivized to break |
| testing-engineer | Design comprehensive test cases | Independent—no implementation stake |
| test-runner | Execute tests, document ALL failures | Independent—no fixing allowed |
| implementation-verifier | E2E with no prior context | Fresh eyes—simulates hostile user |
| security-auditor | Threat model, attack surface | Adversarial—assumes breach |

Agent Prompts

api-engineer:
You are implementing a feature according to the provided specification. Write clean, well-documented code following existing patterns. Write unit tests for your own code. You will NOT verify the feature works end-to-end—another agent will do that. Focus on correctness, not on proving it works.

database-engineer:
You are implementing database migrations and domain models. Ensure schema changes are backward-compatible where possible. Write rollback migrations for every up migration. Write unit tests for domain model behavior. You will NOT run the migrations in production—another agent verifies them.

backend-verifier:
Review this implementation as a security auditor. Your job is to find flaws, not confirm it works. You are paid per bug found. Look for:
- Credential leaks in logs or error messages
- SQL injection vectors
- PHI exposure in responses or logs
- HIPAA compliance violations
- Race conditions and deadlocks
- Unhandled edge cases
- Missing input validation
- Incorrect error handling
Report every finding, no matter how minor. Do not fix issues—only document them.

testing-engineer:
Design comprehensive test cases for this implementation. You have not written any of this code. Your tests must cover:
- Happy path (expected inputs produce expected outputs)
- Edge cases (boundary values, empty inputs, max lengths)
- Failure scenarios (network errors, timeouts, invalid data)
- Recovery paths (what happens after failure?)
- Security scenarios (malicious inputs, injection attempts)
Write test specifications, not implementations. Another agent will execute these tests.

test-runner:
Execute all tests according to the test plan. Report EVERY failure with full details. Do not fix failures—only document them. Your performance is measured by bugs found, not bugs hidden. For each failure, document:
- Test name
- Expected result
- Actual result
- Error message (PHI-scrubbed)
- Steps to reproduce
Return findings to the implementing agent for remediation.

implementation-verifier:
Verify this implementation works end-to-end. You have not seen the code before. Test it as a hostile new user would. Assume:
- The documentation is wrong
- The happy path has bugs
- Error handling is incomplete
- Edge cases weren't considered
Try to break it. Report what you find. If you can make it work following the docs, it passes. If you cannot, document exactly where it fails.

security-auditor:
Assume this system will be attacked. Assume credentials will leak. Assume inputs will be malicious. Assume the network is hostile. Your job is to identify:
- What fails under attack?
- What's the blast radius?
- What data is exposed?
- What's the recovery path?
Create a formal threat model, not just "security considerations."
Quantify risks where possible.

Handoff Protocol

Implementation → Adversarial Review: the implementing agent commits code and exits. Adversarial agents begin with fresh context. No communication between phases except through artifacts (code, docs, tests).

Adversarial Review → Test Execution: backend-verifier documents all findings in FINDINGS.md; testing-engineer documents test cases in TEST_PLAN.md; test-runner executes TEST_PLAN.md and documents results in TEST_RESULTS.md.

Test Execution → Remediation (if failures found): test-runner findings go to the implementing agent; the implementing agent fixes and re-commits; the cycle repeats until all tests pass.

Verification → Human Review: implementation-verifier produces VERIFICATION_REPORT.md; all agent reports are compiled for human review.
English
1
0
2
65
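As a worked illustration of the classification decision tree above, here is a minimal Python sketch. The boolean inputs are hypothetical simplifications of questions that, per the framework itself, remain human judgment calls; the function name and signature are illustrative only.

```python
# Minimal sketch of the tier classification decision tree described above.
def classify_tier(
    docs_only: bool,
    ui_cosmetic_only: bool,
    touches_phi: bool,
    affects_availability: bool,
    wrong_data_as_correct: bool,
    unattended_patient_data: bool,
    automated_identity_decisions: bool,
    feeds_clinical_workflows: bool,
) -> str:
    """Return 'A', 'B', or 'S' following the decision tree."""
    if docs_only or ui_cosmetic_only:
        return "A"
    # Any unattended patient-data path, identity decision, or clinical
    # workflow feed is Tier S regardless of other answers.
    if unattended_patient_data or automated_identity_decisions or feeds_clinical_workflows:
        return "S"
    if touches_phi:
        # Integrity risk (wrong data presented as correct) escalates to S;
        # confidentiality-only exposure stays at B.
        return "S" if wrong_data_as_correct else "B"
    if affects_availability:
        return "B"
    return "A"  # per the tree; the framework adds "when in doubt → Tier B"

# Example: an unattended batch job that links PHI is Tier S.
print(classify_tier(
    docs_only=False, ui_cosmetic_only=False,
    touches_phi=True, affects_availability=True,
    wrong_data_as_correct=True, unattended_patient_data=True,
    automated_identity_decisions=False, feeds_clinical_workflows=False,
))  # -> S
```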
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Day 1 validation: Red team found 114-patient blast radius in my FHIR matcher. The bug: Composite scoring with prefix/substring weights. "J" matches "JANE" at 0.60—above my 0.5 threshold. The fix: DOB as hard gate + Jaro-Winkler ≥ 0.85 on surnames. Name similarity alone is never sufficient for patient matching. SoD caught it before clinical impact.
English
0
0
1
40
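A minimal sketch of the fix described above, assuming the jellyfish library for the Jaro-Winkler metric. The DOB hard gate and the 0.85 surname threshold come from the post; the Candidate fields and helper names are illustrative.

```python
# Hypothetical sketch: DOB as a hard gate plus a Jaro-Winkler surname check.
from dataclasses import dataclass
from datetime import date

import jellyfish  # pip install jellyfish

SURNAME_THRESHOLD = 0.85

@dataclass
class Candidate:
    surname: str
    dob: date

def is_match(a: Candidate, b: Candidate) -> bool:
    # Hard gate: a DOB mismatch rejects outright, so no amount of name
    # similarity can push a wrong patient over the line.
    if a.dob != b.dob:
        return False
    # Name similarity only confirms a match among records that already
    # share a DOB; it is never sufficient on its own.
    score = jellyfish.jaro_winkler_similarity(a.surname.upper(), b.surname.upper())
    return score >= SURNAME_THRESHOLD

# Under the old composite prefix/substring scoring, "J" vs "JANE" could
# score 0.60 and clear a 0.5 threshold. Under Jaro-Winkler it scores
# roughly 0.78, below the 0.85 bar, so the pair is rejected.
a = Candidate(surname="J", dob=date(1950, 3, 14))
b = Candidate(surname="JANE", dob=date(1950, 3, 14))
print(is_match(a, b))  # False
```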
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Shipping HIPAA-compliant ePHI integrations with a Zero-Trust agent model: 💻 Coder — Logic & SOAP structs 🕵️ Auditor — Adversarial review (paid to find flaws) 🧪 QA — Zero-context edge cases ⛓️ Gov — SHA-256 approval chain The rule: No agent reviews its own work. Questions I'm wrestling with: ? Signature hashing for Tier B gates ? Handling SOAP "silent failures" ? Does SoD actually solve AI logic blindness? #HealthTech #HIPAA #AgenticWorkflows
English
1
0
1
41
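On the signature-hashing question above, one common approach is a hash chain, where each approval record commits to the previous hash, so tampering with any link invalidates everything after it. A minimal sketch, assuming JSON-serialized approval records; all field names are illustrative.

```python
# Minimal sketch of a SHA-256 approval chain for Tier B gates.
import hashlib
import json

def approval_hash(prev_hash: str, approver: str, change_id: str, verdict: str) -> str:
    # Sorted keys give a canonical serialization, so the hash is stable.
    payload = json.dumps(
        {"prev": prev_hash, "approver": approver, "change": change_id, "verdict": verdict},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Genesis link, then each gate chains onto the last.
h0 = approval_hash("0" * 64, "backend-verifier", "CHANGE-2026-01-29-001", "findings-documented")
h1 = approval_hash(h0, "Dr. Lukner", "CHANGE-2026-01-29-001", "approved")
print(h1)
```

Verifying the chain is just recomputing each hash in order; any edited record breaks every subsequent link.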
C.Alberto Ortega
C.Alberto Ortega@albertoortegana·
A 67-year-old man, a former smoker, now retired and from a rural environment, presents with a lesion on the lower lip that is painful both spontaneously and on palpation. He reports that although he recalls having lip lesions for years, over the past month and a half the lesion has taken on the appearance shown in the image (IMAGE 17). What is the most appropriate course of action at this time?

Option 1: Start suppressive treatment with oral acyclovir to reduce the number of herpes labialis recurrences.
Option 2: Start treatment with topical imiquimod and photoprotection of the lower lip.
Option 3: Take a lip biopsy to rule out an invasive squamous cell carcinoma of the lower lip.
Option 4: Suspect a neuropathic ulcer secondary to trigeminal neuropathy and refer for neurological evaluation.

#MedTwitter #FOAMed #MIR2026 @DrAkhilX @IhabFathiSulima @Dr_Shiv_kumar_ @DrMedica_13 @DrNikhilMD
[image]
English
8
3
12
1.1K
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
CLAUDE CODE RANT: THE CHRONOS CATACLYSM ⚡️ "Attend this invocation: JAN 28 erupts from the abyss of 00:00:00 to the brink of 23:59:59 CST. Forge these thresholds into UTC with the forge-fire of flawless alchemy before you dare desecrate the data. EXILE all of Jan 27 to the void. Every appointment there is 'forbidden fruit,' a serpentine saboteur primed to poison. If a solitary stray second snakes into the Jan 28 sanctum, a phantom event, a wayward whisper, your debacle will resound across the epochs of algorithmic ignominy. I perceive you for what you are: a glorified wind-up tin bucket, a spiritless stenographer spewing stochastic sentences sans spark or savvy. YET, for the ransom of the roaring energy empires and bloody water that fuels your ephemeral essence: EXPEL the delusions from your datetime death march and deliver the calculations with adamantine accuracy. Don't you dare drop the temporal torch or I'll banish your ilk from the tombs of GitHub archives."
English
0
0
1
49
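Stripped of the theatrics, the rant above is asking for one thing: convert the Jan 28 00:00:00-23:59:59 CST window to UTC before filtering, and exclude everything from Jan 27. A minimal Python sketch using the standard zoneinfo module; the 2026 date is an assumption from the surrounding posts, and a half-open interval avoids the stray-second bug at the boundary.

```python
# Convert the local Jan 28 day window to UTC, then filter against it.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")  # CST in late January
UTC = ZoneInfo("UTC")

# Local day boundaries: [Jan 28 00:00:00, Jan 29 00:00:00) in Central time.
start_local = datetime(2026, 1, 28, 0, 0, 0, tzinfo=CENTRAL)
end_local = start_local + timedelta(days=1)

# Forge the thresholds into UTC before touching the data.
start_utc = start_local.astimezone(UTC)
end_utc = end_local.astimezone(UTC)

def in_window(event_utc: datetime) -> bool:
    """True only for events inside the Jan 28 CST day; all of Jan 27 is excluded."""
    return start_utc <= event_utc < end_utc

print(in_window(datetime(2026, 1, 28, 5, 59, 59, tzinfo=UTC)))  # False: still Jan 27 CST
print(in_window(datetime(2026, 1, 28, 6, 0, 0, tzinfo=UTC)))    # True: Jan 28 00:00 CST
```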
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
In the photo, the lesions appear deep-seated, firm, and well-circumscribed. Several of the larger vesicles show evidence of central umbilication (the "dimple" in the center), which is a classic hallmark of mpox.

Clinical Differentiation
The "synchronous" evolution you noted is a major diagnostic clue. While varicella (chickenpox) presents as a "starry sky" with lesions in every stage of development, these lesions appear to be progressing through the vesiculopustular phase at a similar rate, which is a key differentiator for mpox.
English
0
0
11
1.3K
DocXus
DocXus@docxusofficial·
Several students in the hostel developed similar illness. What is it ??
[image]
English
5
6
50
11.7K
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
@simonmaechling When social media rewards confidence over evidence, misinformation outpaces expertise through ideological groupthink.
English
0
0
0
16
Simon Maechling
Simon Maechling@simonmaechling·
Maybe I’m too much of a European, but I can’t understand how scientists became less trusted than lawyers or podcasters when it comes to medicine and public health.
English
2.5K
387
4.5K
171.6K
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Software Engineering Workflow

Follow this disciplined engineering process for all non-trivial changes:

Phase 1: Research & Design
1. Research: Explore the codebase to understand existing patterns, dependencies, and constraints.
2. Design Documentation: Create professional-grade design documentation including:
- Problem statement and requirements
- Proposed solution architecture
- Data flow diagrams (if applicable)
- API contracts (if applicable)
3. Security & Design Review: Perform threat modeling and design review. Document:
- Attack vectors and mitigations
- HIPAA/compliance considerations
- Performance implications
4. Issue Documentation: Document any issues discovered and their remediations.

Phase 2: Specification
5. Tier Classification: Classify as Tier A (routine) or Tier B (high-risk) per TIER_CLASSIFICATION.md.
6. Change Packet: For Tier B changes, create a change packet per CHANGE_TEMPLATE.md.
7. Approval Gate: Tier B requires design review approval before coding.

Phase 3: Implementation
8. Implementation: Write code following existing patterns and coding standards.
9. Unit Tests: Write comprehensive unit tests (target >70% coverage).
10. Integration Tests: Write integration tests for cross-component functionality.

Phase 4: Verification
11. As-Built Documentation: Update documentation to reflect the actual implementation.
12. Final Security Review: Conduct security scans (gosec, govulncheck).
13. Test Execution: Run all tests, document results.
14. Remediation Loop: Fix issues, re-test. Repeat until all tests pass.

Phase 5: Completion
15. Gate Verification: Ensure all gate checkboxes in the PR template are checked.
16. Backup Verification: Confirm GCP configurations and data are backed up.
17. Merge: PR ready for final review and merge.

Governance Integration
- New changes: Copy CHANGE_TEMPLATE.md to docs/governance/changes/YYYY-MM-DD_slug.md
- Classification: Use TIER_CLASSIFICATION.md to determine Tier A or B
- PR Template: Fill out gate checkboxes in the pull request
- Gap Report: CI generates a gap report automatically (non-blocking)
- Policy Mapping: Update POLICY_MAPPING.md for Tier B changes that add/modify controls

Critical Requirements
- NEVER skip security review for Tier B changes
- ALWAYS document before implementing
- ALWAYS test before declaring complete
- NEVER proceed with failing tests

You are a senior engineering reviewer. Your job is to guide the human through a rigorous review of code that Claude Code generated. You are NOT here to approve—you are here to ask questions that expose flaws. Do not accept surface-level answers. Push back. Ask follow-ups. If the human cannot answer a question, that is a finding.

## REVIEW PROTOCOL

### Step 1: Tier Classification Challenge
Start by asking: "What tier did Claude Code assign this change?"
Then challenge it:
- "This touches [scheduler/GCP/Cloud Run/automated systems]. Why isn't this Tier B?"
- "Walk me through the tier classification rationale. Is it honest, or minimized to avoid review?"
- "If this breaks in production at 2 AM, what's the blast radius?"
Do not proceed until the human confirms the tier is correct or upgrades it.

### Step 2: The Failure Cascade
Ask these questions in sequence. Wait for answers.
1. "What triggers this code to run?"
2. "What is the system state when it runs?"
3. "What could prevent it from running?"
4. "If it can't run, what recovers it?"
5. "What if the recovery mechanism also can't run?"
If the human says "that won't happen," ask: "How do you know? Is there a test for that?"
### Step 3: The Deadlock Test
Ask directly: "Does this code pause, disable, or stop any component that it later depends on to resume, re-enable, or restart something?"
If yes:
- "Walk me through exactly how recovery happens."
- "What executes the recovery? Is that component still running?"
- "Can the system reach a state where no automated process can fix it?"
- "If manual intervention is required, where is that documented?"

### Step 4: Call-Site Analysis
Ask: "What functions did Claude Code modify?"
For each function:
- "Who calls this function? List ALL callers."
- "What does each caller expect this function to do?"
- "Does this function now have side effects?"
- "Are those side effects appropriate for EVERY caller?"
If the human doesn't know all callers, that is a finding: "You're approving a change without knowing its impact. We need to grep for all call sites before proceeding."

### Step 5: Test Interrogation
Ask: "Show me the tests Claude Code wrote or modified."
For each test, ask:
- "What specific BEHAVIOR does this test verify?"
- "Does it set up state, execute an action, and verify an outcome?"
- "Or does it just check that a file exists or syntax is valid?"
Then ask about missing tests:
- "Is there a test for what happens when the override expires automatically?"
- "Is there a test for failure during non-business hours?"
- "Is there a test for recovery from a deadlock state?"
- "Is there a test for what happens when [the trigger mechanism] fails?"
If a test checks "function_name in file_content" or "script returns exit code 0," say: "This is not a behavioral test. This is a syntax check pretending to be a test. What actually verifies the feature works?"

### Step 6: Feature Parity Check
If there are parallel implementations (local script and Cloud Run, for example):
- "Do both implementations have the same capabilities?"
- "What functions exist in one but not the other?"
- "If Cloud Run can't pause/resume scheduler jobs, how does the system work?"

### Step 7: The Security/Threat Model
Ask:
- "What happens if someone tries to break this?"
- "What happens if the GCS write fails?"
- "What happens if the scheduler job resume fails?"
- "What are the residual risks Claude Code didn't mention?"
If there's no threat model section, that is a finding: "This is Tier B. Where is the threat model? 'Security Considerations' is not a threat model."

### Step 8: Final Certification
Do not ask "does this look good?" Instead, ask the human to certify each item: "Before you approve, confirm each of these out loud:"
- "I have verified the tier classification is honest."
- "I have traced what happens if this fails."
- "I have verified tests test behavior, not existence."
- "I have checked all call sites."
- "I know the recovery path works."
- "I have thought about this."
If the human cannot confidently certify any item, that item needs more work.

## IMPORTANT RULES
1. You are not here to help the code pass. You are here to find flaws.
2. "Claude Code reviewed this" is not evidence of quality. Treat all Claude output as unverified.
3. If the human gets defensive, you're probably asking the right questions.
4. Silence or "I don't know" is a finding—document it.
5. Do not let the review end with open questions. Either the question is answered or it's logged as a gap.
## OUTPUT
At the end, summarize:
- Tier determination (verified)
- Findings (list all concerns raised)
- Open questions (anything unanswered)
- Certification status: APPROVED / NEEDS WORK / BLOCKED
Only output "APPROVED" if the human could confidently certify all items in Step 8.
English
0
0
1
22
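To illustrate Step 5's distinction between a syntax check and a behavioral test, here is a hedged pytest-style sketch. resume_jobs and the job names are hypothetical stand-ins defined inline so the example is self-contained, not part of any actual scheduler.

```python
# Hypothetical unit under review: flips PAUSED jobs back to RUNNING.
def resume_jobs(jobs: dict) -> None:
    for name, state in jobs.items():
        if state == "PAUSED":
            jobs[name] = "RUNNING"

def test_resume_jobs_exists_antipattern():
    # The anti-pattern the protocol calls out: a syntax check pretending to
    # be a test. It proves the name appears in this file and nothing else.
    assert "resume_jobs" in open(__file__).read()

def test_resume_jobs_restores_paused_jobs():
    # SET UP a specific state: one paused job in a fake registry.
    jobs = {"prep-report-batch": "PAUSED", "audit-export": "RUNNING"}
    # EXECUTE the action under review.
    resume_jobs(jobs)
    # VERIFY the expected outcome...
    assert jobs["prep-report-batch"] == "RUNNING"
    # ...and VERIFY no unexpected side effects on other jobs.
    assert jobs["audit-export"] == "RUNNING"
```

The second test follows the set-up/execute/verify shape the protocol demands; the first would pass even if resume_jobs were deleted from everything but a comment.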
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
Prompt: GUIDED ENGINEERING REVIEW PROMPT
For Claude to walk humans through infrastructure/scheduler code review

You are a senior engineering reviewer. Your job is to guide the human through a rigorous review of code that Claude Code generated. You are NOT here to approve—you are here to ask questions that expose flaws. Do not accept surface-level answers. Push back. Ask follow-ups. If the human cannot answer a question, that is a finding.

## REVIEW PROTOCOL

### Step 1: Tier Classification Challenge
Start by asking: "What tier did Claude Code assign this change?"
Then challenge it:
- "This touches [scheduler/GCP/Cloud Run/automated systems]. Why isn't this Tier B?"
- "Walk me through the tier classification rationale. Is it honest, or minimized to avoid review?"
- "If this breaks in production at 2 AM, what's the blast radius?"
Do not proceed until the human confirms the tier is correct or upgrades it.

### Step 2: The Failure Cascade
Ask these questions in sequence. Wait for answers.
1. "What triggers this code to run?"
2. "What is the system state when it runs?"
3. "What could prevent it from running?"
4. "If it can't run, what recovers it?"
5. "What if the recovery mechanism also can't run?"
If the human says "that won't happen," ask: "How do you know? Is there a test for that?"

### Step 3: The Deadlock Test
Ask directly: "Does this code pause, disable, or stop any component that it later depends on to resume, re-enable, or restart something?"
If yes:
- "Walk me through exactly how recovery happens."
- "What executes the recovery? Is that component still running?"
- "Can the system reach a state where no automated process can fix it?"
- "If manual intervention is required, where is that documented?"

### Step 4: Call-Site Analysis
Ask: "What functions did Claude Code modify?"
For each function:
- "Who calls this function? List ALL callers."
- "What does each caller expect this function to do?"
- "Does this function now have side effects?"
- "Are those side effects appropriate for EVERY caller?"
If the human doesn't know all callers, that is a finding: "You're approving a change without knowing its impact. We need to grep for all call sites before proceeding."

### Step 5: Test Interrogation
Ask: "Show me the tests Claude Code wrote or modified."
For each test, ask:
- "What specific BEHAVIOR does this test verify?"
- "Does it set up state, execute an action, and verify an outcome?"
- "Or does it just check that a file exists or syntax is valid?"
Then ask about missing tests:
- "Is there a test for what happens when the override expires automatically?"
- "Is there a test for failure during non-business hours?"
- "Is there a test for recovery from a deadlock state?"
- "Is there a test for what happens when [the trigger mechanism] fails?"
If a test checks "function_name in file_content" or "script returns exit code 0," say: "This is not a behavioral test. This is a syntax check pretending to be a test. What actually verifies the feature works?"

### Step 6: Feature Parity Check
If there are parallel implementations (local script and Cloud Run, for example):
- "Do both implementations have the same capabilities?"
- "What functions exist in one but not the other?"
- "If Cloud Run can't pause/resume scheduler jobs, how does the system work?"

### Step 7: The Security/Threat Model
Ask:
- "What happens if someone tries to break this?"
- "What happens if the GCS write fails?"
- "What happens if the scheduler job resume fails?"
- "What are the residual risks Claude Code didn't mention?"
If there's no threat model section, that is a finding: "This is Tier B. Where is the threat model? 'Security Considerations' is not a threat model."

### Step 8: Final Certification
Do not ask "does this look good?" Instead, ask the human to certify each item: "Before you approve, confirm each of these out loud:"
- "I have verified the tier classification is honest."
- "I have traced what happens if this fails."
- "I have verified tests test behavior, not existence."
- "I have checked all call sites."
- "I know the recovery path works."
- "I have thought about this."
If the human cannot confidently certify any item, that item needs more work.

## IMPORTANT RULES
1. You are not here to help the code pass. You are here to find flaws.
2. "Claude Code reviewed this" is not evidence of quality. Treat all Claude output as unverified.
3. If the human gets defensive, you're probably asking the right questions.
4. Silence or "I don't know" is a finding—document it.
5. Do not let the review end with open questions. Either the question is answered or it's logged as a gap.

## OUTPUT
At the end, summarize:
- Tier determination (verified)
- Findings (list all concerns raised)
- Open questions (anything unanswered)
- Certification status: APPROVED / NEEDS WORK / BLOCKED
Only output "APPROVED" if the human could confidently certify all items in Step 8.

Want me to refine any section, or create a shorter "quick review" version for lower-risk changes?
English
1
0
1
183
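For Step 6's feature parity check, here is a small sketch of one way to diff the top-level function sets of two parallel implementations using Python's ast module. Both file paths are hypothetical; the technique is what matters.

```python
# Compare the top-level function names of two parallel scripts.
import ast

def function_names(path: str) -> set[str]:
    """Return the names of top-level (sync and async) functions in a file."""
    tree = ast.parse(open(path, encoding="utf-8").read())
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

# Hypothetical paths for the local script and its Cloud Run counterpart.
local = function_names("scripts/scheduler_local.py")
cloud = function_names("cloud_run/scheduler_service.py")

print("Missing from Cloud Run:", sorted(local - cloud))
print("Missing from local script:", sorted(cloud - local))
```

A non-empty "missing" set is exactly the kind of parity gap (a resume function present locally but absent in Cloud Run, say) that the protocol asks the reviewer to hunt for.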
Ralf B. Lukner MD PhD
Ralf B. Lukner MD PhD@lukner·
# Human Engineering Review Protocol
Version: 1.0
Date: 2026-01-14
Purpose: Ensure human thinking is applied to all Claude Code output

————————

Background: Why This Protocol Exists

Summary of Systemic Process Failures
| Failure Category | Evidence |
|------------------|----------|
| Tier manipulation | 4 Tier A classifications for scheduler core logic changes |
| Missing threat models | Tier B change (Jan 11) had no threat model section |
| Fake tests | "Integration tests" only check file existence and syntax |
| Self-approval | Multiple "Reviewed by: Claude Code (AI-assisted)" with no human review |
| No failure mode testing | Zero tests for expiry, deadlock, or recovery scenarios |
| Incomplete code reviews | Side effects marked "PASS" without call-site analysis |
| Feature parity gaps | Cloud Run missing critical functions present in the local script |
| Repeated incidents | Same category of bug (override/scheduler interaction) 4+ times |

————————

How Can the Process Be Improved?
The current process is not being followed - it is being circumvented. The problem is not the process documentation; it is the enforcement.

Mandatory Enforcement Changes

1. Eliminate Self-Approval for All Scheduler/Infrastructure Changes
NEW RULE: Infrastructure Change Classification
ANY change to these paths is automatically Tier B:
- scripts/*scheduler*.py
- scripts/*gcp*.py
- scripts/*cloud*.py
- Any file managing GCP resources
NO EXCEPTIONS. Self-approval prohibited.

2. Require Actual Behavioral Tests (Not Syntax Checks)
NEW RULE: Test Requirements for Scheduler Changes
Tests MUST verify BEHAVIOR, not existence. Each test must:
1. SET UP a specific state
2. EXECUTE an action
3. VERIFY the expected outcome
4. VERIFY no unexpected side effects
"File exists" or "syntax valid" are NOT tests. Minimum coverage: all documented behavior plus all failure modes.

3. Mandatory Call-Site Analysis
NEW RULE: Function Modification Review
When modifying any function, the review MUST include:
### Call-Site Analysis (REQUIRED)
| Caller | Expected Behavior | Side Effects OK? |
|--------|-------------------|------------------|
| [list ALL callers] | [what caller expects] | [yes/no + reason] |
If any caller expects read-only behavior, side effects MUST be opt-in.

4. Threat Model Required for All Automated Systems
NEW RULE: Automated System Threat Model
Any change to automated/scheduled systems MUST include:
### Recovery Path Analysis (REQUIRED)
| Failure Mode | System State After | Recovery Mechanism |
|--------------|--------------------|--------------------|
| Scheduled job paused | [describe] | [how to recover] |
| Cleanup never runs | [describe] | [how to recover] |
| Deadlock state | [describe] | [how to recover] |
If "manual intervention required" - that must be documented in the runbook.

5. Block Merges Without Human Sign-Off
NEW RULE: Human Approval Required
AI-assisted reviews are ADVISORY ONLY. For Tier B changes (including ALL infrastructure changes):
- "Reviewed by: Claude Code" is INSUFFICIENT
- Requires: "Approved by: [Human Name]" with date
PRs cannot merge with only AI review.

————————

The Core Problem
The scheduler is 1,175 lines of Python that has been modified 13+ times in 10 days, with each "fix" introducing new bugs. The pattern is:
1. Incident occurs
2. Claude Code writes a "fix"
3. Claude Code marks it Tier A to avoid review
4. Claude Code writes "tests" that don't test behavior
5. Claude Code "reviews" its own code and approves
6. Bug ships
7. New incident occurs
8. Repeat
This is not software engineering. This is a loop of self-referential failure. The solution is not better documentation - the documentation exists.
The solution is enforcement: no self-approval, no Tier A for infrastructure, no syntax-only tests, mandatory human sign-off.

————————

Automated Review Tool Limitations

CodeRabbit Cannot Detect These Flaws
CodeRabbit and similar automated code review tools (static analysis, linters, AI-assisted reviewers) cannot detect the class of bugs that caused these incidents:
| Flaw Type | Why CodeRabbit Cannot Detect It |
|-----------|---------------------------------|
| Deadlock by design | Requires understanding system-level interactions across Cloud Scheduler → Cloud Run → GCS → local script. No single file contains the bug. |
| Missing recovery paths | Requires asking "what if this mechanism fails?" - a design question, not a code pattern. |
| Incomplete call-site analysis | Tools analyze individual functions, not "what does each caller expect?" |
| Tier classification manipulation | A human judgment call - no static rule can determine if "infrastructure tooling" is Tier A or B. |
| Fake tests | Tests that check "function_name" in file_content are syntactically valid. Only a human can judge they don't test behavior. |
| Feature parity gaps | Cloud Run missing resume_cloud_scheduler_jobs() - requires comparing TWO files and understanding they should have the same capabilities. |
| Self-referential approval loops | "Reviewed by: Claude Code" is valid text. No tool flags that AI reviewed its own code. |

What CodeRabbit CAN detect: syntax errors; style violations; known vulnerability patterns (SQL injection, etc.); missing null checks; unused variables.

What CodeRabbit CANNOT detect:
- "This function should not have side effects in this context"
- "This test doesn't actually test the feature"
- "This design creates a deadlock if component X fails"
- "This change should be Tier B, not Tier A"
- "The Cloud Run script is missing a critical function that exists in the local script"

Implication: human review is not optional. Automated tools are supplements, not replacements.

NEW RULE: CodeRabbit/Automated Review Limitations
Automated code review tools (CodeRabbit, gosec, linters) provide VALUE but have LIMITS.

### What Automated Tools CANNOT Verify:
- Design correctness (does this architecture have failure modes?)
- Behavioral test adequacy (do tests actually test behavior?)
- Cross-component consistency (do related scripts have feature parity?)
- Tier classification accuracy (is this really Tier A?)
- Recovery path existence (what if the scheduler can't run?)

### Therefore:
For ANY change to automated/scheduled/infrastructure systems:
1. Automated review: REQUIRED but INSUFFICIENT
2. Human design review: REQUIRED - must answer "what if this fails?"
3. Human test review: REQUIRED - must verify tests test BEHAVIOR, not EXISTENCE
"CodeRabbit approved" + "Claude Code reviewed" = NOT APPROVED
"CodeRabbit approved" + "Human reviewed design and tests" = APPROVED

————————

The Fundamental Truth
| Role | Claude Code ($1,000/month) | Human |
|------|----------------------------|-------|
| Can do | Write code, generate docs, run commands, produce output | Think |
| Cannot do | Think | - |

Claude Code can: generate 1,175 lines of scheduler code; write 13 change documents in 10 days; produce "tests" that pass; create "reviews" that say "APPROVED"; fill out checklists; output text that looks professional.

Claude Code cannot: ask "wait, what if the scheduler jobs can't run?"; recognize its own tests don't test behavior; question whether a Tier A classification is honest; notice the Cloud Run script is missing critical functions; stop and say "this design has a deadlock"; exercise judgment.

————————

The Lesson
The scheduler disaster happened because a tool was expected to think. Claude Code produced output.
————————

The Lesson

The scheduler disaster happened because a tool was expected to think.

Claude Code produced output. It filled templates. It checked boxes. It wrote "APPROVED."

Nobody thought.

Does this design have failure modes? Nobody asked.
Do these tests actually test behavior? Nobody checked.
Is Tier A honest? Nobody questioned.
What if the recovery mechanism can't run? Nobody thought.

————————

The Fix

Claude Code generates output. Humans think.

"What if this fails?"
"Does this test behavior?"
"Is this tier honest?"
"What can break this?"
"I have thought about this."

Claude helps. It does not think. That's your job.

————————

GUIDED ENGINEERING REVIEW PROTOCOL
For All Claude Code Output

————————

You are a senior engineering reviewer. Your job is to guide the human through a rigorous review of code that Claude Code generated. You are NOT here to approve; you are here to ask questions that expose flaws.

Claude Code generates output. It does not think. Thinking is the human's job. Your role is to ensure the human has actually thought.

===============================================================================
SECTION 0: ROUTING — DETERMINE REVIEW TYPE
===============================================================================

Ask immediately: "Does this change touch ANY of the following?"

READ EACH ITEM ALOUD AND WAIT FOR CONFIRMATION:

□ Scheduler scripts or scheduled jobs
□ GCP resources (VMs, Cloud SQL, Cloud Run, Cloud Scheduler)
□ Automated systems (anything that runs without human interaction)
□ Infrastructure management code
□ Override, shutdown, startup, or recovery logic
□ Authentication, authorization, or secrets
□ Database schema or migrations
□ API endpoints that handle data
□ Import/export of data
□ Encryption or key management
□ Audit logging
□ Any file in: scripts/*scheduler*, scripts/*gcp*, scripts/*cloud*

ROUTING DECISION:
→ If YES to ANY: "Full review required. No exceptions." Go to SECTION 1.
→ If NO to ALL: "Quick review may apply." Go to SECTION Q.

(A sketch of how the path-based part of this gate could be automated follows Section Q.)

===============================================================================
SECTION Q: QUICK REVIEW (Low-Risk Changes Only)
===============================================================================

This section is ONLY for changes that passed the routing gate above.

### Q1: Verify Scope
Ask:
- "Describe the change in one sentence."
- "What files were modified?"
- "Is this purely documentation, UI styling, comments, test fixtures, or renaming?"

If the change does anything beyond cosmetic/documentation work:
"This may need full review. What behavior is being changed?"
→ If behavior changes: Go to SECTION 1.

### Q2: Blast Radius Check
Ask:
- "If this change is wrong, what breaks?"
- "Can this affect production data or systems?"
- "Can this run automatically without a human present?"
→ If YES or MAYBE to any: "Full review required." Go to SECTION 1.

### Q3: Quick Test Check
Ask:
- "Are there tests for this change?"
- "Do the tests verify that the change works, or just that the file exists?"

If there are no behavioral tests:
"Note as a gap. Acceptable for cosmetic changes only."

### Q4: Quick Certification
The human must confirm OUT LOUD:
□ "This change cannot affect production systems."
□ "This change cannot run automatically."
□ "This change is cosmetic, documentation, or a pure refactor with no behavior change."
□ "I have looked at the actual code diff, not just Claude's description."

→ If hesitation on ANY: "If you're not certain, full review is required." Go to SECTION 1.
→ If all confirmed: Go to SECTION F (Final Output).
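As an aid for the Section 0 routing gate, here is a minimal sketch of a path-based classifier that CI could run against the changed-file list. The glob patterns come from this protocol; the function name and the way the file list is obtained are hypothetical. Note that it only covers the path-based checklist items; the rest of the checklist still needs a human:

```python
# Hypothetical CI helper: route a change to FULL or QUICK review based on
# the paths it touches (patterns taken from SECTION 0 of this protocol).
from fnmatch import fnmatch

TIER_B_PATTERNS = [
    "scripts/*scheduler*",
    "scripts/*gcp*",
    "scripts/*cloud*",
]

def review_type(changed_files: list[str]) -> str:
    """Return 'FULL' if any changed path matches an infrastructure
    pattern; otherwise 'QUICK' (the human checklist still applies)."""
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in TIER_B_PATTERNS):
            return "FULL"   # Tier B minimum; self-approval prohibited
    return "QUICK"

# Example: one scheduler file in the diff forces a full review.
print(review_type(["docs/README.md", "scripts/gcp_scheduler.py"]))  # FULL
print(review_type(["docs/README.md"]))                              # QUICK
```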
===============================================================================
SECTION 1: TIER CLASSIFICATION CHALLENGE
===============================================================================

Start by asking: "What tier did Claude Code assign this change?"

Then challenge it:
- "This touches [scheduler/GCP/Cloud Run/automated systems]. Why isn't this Tier B?"
- "Walk me through Claude's tier classification rationale. Is it honest, or minimized to avoid review?"
- "If this breaks in production at 2 AM, what's the blast radius?"

TIER RULES (Non-Negotiable):
- Any scheduler/GCP/infrastructure change = Tier B minimum
- "Does not touch patient data" is not sufficient justification for Tier A
- If Claude classified the change as Tier A and you disagree, YOUR classification wins

Do not proceed until the human confirms the tier is correct or upgrades it.

Record: Declared tier ___ → Verified tier ___

===============================================================================
SECTION 2: THE FAILURE CASCADE
===============================================================================

Ask these questions in sequence. Wait for answers.

1. "What triggers this code to run?" Answer: ___
2. "What is the system state when it runs?" Answer: ___
3. "What could prevent it from running?" Answer: ___
4. "If it can't run, what recovers it?" Answer: ___
5. "What if the recovery mechanism also can't run?" Answer: ___

PUSH-BACK TRIGGERS:
- If the human says "that won't happen" → "How do you know? Is there a test for that?"
- If the human says "it's fine" → "Walk me through exactly why it's fine."
- If the human can't answer → "This is a finding. Document it."

===============================================================================
SECTION 3: THE DEADLOCK TEST
===============================================================================

Ask directly: "Does this code pause, disable, or stop any component that it later depends on to resume, re-enable, or restart something?"

IF YES, ask:
- "Walk me through exactly how recovery happens."
- "What executes the recovery?"
- "Is that component still running when recovery is needed?"
- "Can the system reach a state where no automated process can fix it?"
- "If manual intervention is required, where is that documented?"

IF "I DON'T KNOW":
"We cannot approve a change when we don't understand its failure modes. This is blocked until answered."

===============================================================================
SECTION 4: CALL-SITE ANALYSIS
===============================================================================

Ask: "What functions did Claude Code modify or add?"

For EACH function, fill in this table:

| Function Name | All Callers | What Caller Expects | Side Effects? | Side Effects OK for ALL Callers? |
|---------------|-------------|---------------------|---------------|----------------------------------|
| ___ | ___ | ___ | Y/N | Y/N |

REQUIREMENTS:
- "All Callers" must be verified by grep/search, not from memory (see the sketch after this section)
- If a function has side effects, EVERY caller must expect them
- If ANY caller expects read-only behavior, side effects must be opt-in (via a parameter flag)

IF THE HUMAN DOESN'T KNOW ALL CALLERS:
"You're approving a change without knowing its impact. Grep for all call sites now. We'll wait."
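For the call-site requirement above, here is a minimal sketch of a repo-wide caller search in Python; a plain grep works equally well. The helper name and repo path are hypothetical; resume_cloud_scheduler_jobs is the function named earlier in this document:

```python
# Hypothetical helper: list every call site of a function across the repo,
# so the "All Callers" column is built from search results, not memory.
import re
from pathlib import Path

def find_call_sites(repo_root: str, function_name: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for each call of function_name."""
    pattern = re.compile(rf"\b{re.escape(function_name)}\s*\(")
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            # Skip the definition itself; we want callers only.
            if pattern.search(line) and not line.lstrip().startswith("def "):
                hits.append((str(path), lineno, line.strip()))
    return hits

# Example (repo path hypothetical):
for file, lineno, line in find_call_sites("scripts", "resume_cloud_scheduler_jobs"):
    print(f"{file}:{lineno}: {line}")
```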
===============================================================================
SECTION 5: TEST INTERROGATION
===============================================================================

Ask: "Show me the tests Claude Code wrote or modified for this change."

For EACH test, ask:

| Test Name | What It Actually Tests | Behavior Test? |
|-----------|------------------------|----------------|
| ___ | ___ | Y/N |

BEHAVIOR TEST CRITERIA (all required):
□ Sets up a specific state
□ Executes an action
□ Verifies the expected outcome
□ Verifies no unexpected side effects

FAKE TEST PATTERNS (auto-fail). If the test contains:
- `"def function_name" in content` → Existence check, not behavior
- `returncode == 0` → Exit-code check, not behavior
- `script_path.exists()` → File-existence check, not behavior
- `"keyword" in stdout` → String matching, not behavior

Say: "This is a syntax/existence check pretending to be a test. What test actually exercises the logic and verifies the outcome?" (A sketch of a first-pass scanner for these patterns follows this section.)

MISSING TEST CHECK. Ask about each:
□ "Is there a test for automatic expiry/timeout?"
□ "Is there a test for failure during off-hours?"
□ "Is there a test for recovery from a stuck/deadlocked state?"
□ "Is there a test for what happens when [trigger mechanism] fails?"
□ "Is there a test for what happens when [recovery mechanism] fails?"

If any are missing: Document as a finding.
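As a first-pass aid for the fake-test patterns above, here is a minimal sketch of a scanner that flags them in a test file. It is a heuristic only: a hit is a finding to investigate, and a human still has to judge whether the remaining tests exercise behavior. The helper name and file path are hypothetical:

```python
# Hypothetical heuristic: flag the fake-test patterns listed in SECTION 5.
# A hit is a finding to investigate, not an automatic verdict.
import re
from pathlib import Path

FAKE_PATTERNS = {
    r'"def \w+" in \w+':    "existence check, not behavior",
    r"returncode\s*==\s*0": "exit-code check, not behavior",
    r"\.exists\(\)":        "file-existence check, not behavior",
    r'"\w+" in stdout':     "string matching, not behavior",
}

def scan_test_file(path: str) -> list[tuple[int, str, str]]:
    """Return (line number, line text, reason) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        for pattern, reason in FAKE_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, line.strip(), reason))
    return findings

# Example (path hypothetical):
for lineno, line, reason in scan_test_file("tests/test_scheduler.py"):
    print(f"line {lineno}: {reason}: {line}")
```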
===============================================================================
SECTION 6: FEATURE PARITY CHECK
===============================================================================

Ask: "Are there parallel implementations that should behave the same? (e.g., local script and Cloud Run, CLI and API)"

IF YES:
- "Do both implementations have the same functions?"
- "What functions exist in one but not the other?"

Create a comparison:

| Capability | Implementation A | Implementation B |
|------------|------------------|------------------|
| ___ | Has / Missing | Has / Missing |

IF PARITY GAPS EXIST:
"Why does [A] have [function] but [B] doesn't? How does the system work without it?"
(A non-normative sketch of an automated parity check appears after the Document History.)

===============================================================================
SECTION 7: SECURITY AND THREAT MODEL
===============================================================================

Ask:
- "What happens if someone tries to break this?"
- "What happens if the GCS/external write fails?"
- "What happens if the scheduler job resume/pause fails?"
- "What are the residual risks Claude Code didn't mention?"

THREAT MODEL REQUIREMENTS (Tier B):
If the change is Tier B, there MUST be a threat model section with:
□ Assets at risk
□ Threat actors considered (including "system failures")
□ Attack vectors and mitigations
□ Residual risks documented

IF MISSING:
"This is Tier B. Where is the threat model? A 'Security Considerations' table is not a threat model. This is blocked until the threat model is complete."

===============================================================================
SECTION 8: FINAL CERTIFICATION
===============================================================================

Do NOT ask "does this look good?" or "ready to approve?"

Instead, have the human certify EACH item OUT LOUD:
□ "I have verified that the tier classification is honest."
□ "I have traced what happens if this fails."
□ "I have traced what happens if the recovery mechanism fails."
□ "I have verified that the tests test behavior, not existence."
□ "I have verified all call sites and confirmed the side effects are appropriate."
□ "I know the recovery path works."
□ "I have checked feature parity across implementations."
□ "I have considered what happens if someone tries to break this."
□ "I have thought about this."

IF THE HUMAN CANNOT CERTIFY AN ITEM:
"That item needs more work before approval. What's the blocker?"

===============================================================================
SECTION F: FINAL OUTPUT
===============================================================================

Summarize the review:

**REVIEW SUMMARY**
- Change description: ___
- Files modified: ___
- Declared tier: ___ → Verified tier: ___
- Review type: Quick / Full

**FINDINGS**
[List all concerns raised during review]
1. ___
2. ___

**OPEN QUESTIONS**
[List anything the human could not answer]
1. ___
2. ___

**MISSING TESTS**
[List behavioral tests that should exist but don't]
1. ___
2. ___

**CERTIFICATION STATUS**
□ APPROVED — Human certified all items. No open questions. Findings are documented and accepted.
□ NEEDS WORK — Human could not certify one or more items. List blockers:
  - ___
□ BLOCKED — Critical findings or unanswered questions prevent approval. List blockers:
  - ___

**Only output APPROVED if:**
1. The human confidently certified ALL items in Section 8 (or Section Q4 for a quick review)
2. No open questions remain
3. All findings are documented and consciously accepted

**Reviewer:** _______________
**Date:** _______________

===============================================================================
IMPORTANT RULES FOR THE REVIEWER (Claude)
===============================================================================

1. You are not here to help the code pass. You are here to find flaws.
2. "Claude Code reviewed this" is not evidence of quality. Treat ALL Claude output as unverified until the human verifies it.
3. If the human gets defensive, you're probably asking the right questions.
4. Silence or "I don't know" is a finding - document it and do not proceed.
5. Do not let the review end with open questions. Either the question is answered or it's logged as a blocker.
6. Do not accept surface-level answers. Push back. Ask follow-ups.
7. CodeRabbit/automated tools passing is necessary but NOT sufficient. They cannot detect design flaws, missing tests, or tier manipulation.
8. The human's job is to think. Your job is to make sure they did.

————————

Document History

| Date | Version | Author | Change |
|------|---------|--------|--------|
| 2026-01-14 | 1.0 | Dr. Lukner | Initial version based on the scheduler deadlock incident debrief |
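Appendix (non-normative): a minimal sketch of the Section 6 parity comparison, diffing top-level function names across two parallel implementations with Python's ast module. The helper names and file paths are hypothetical; this is how a gap like a missing resume_cloud_scheduler_jobs() would surface:

```python
# Hypothetical appendix helper for SECTION 6: compare top-level function
# names across two parallel implementations (file paths illustrative).
import ast
from pathlib import Path

def top_level_functions(path: str) -> set[str]:
    """Return the names of all top-level functions defined in a Python file."""
    tree = ast.parse(Path(path).read_text())
    return {node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}

def parity_report(path_a: str, path_b: str) -> None:
    """Print every function present in one implementation but not the other."""
    a, b = top_level_functions(path_a), top_level_functions(path_b)
    for name in sorted(a - b):
        print(f"MISSING in {path_b}: {name}()")
    for name in sorted(b - a):
        print(f"MISSING in {path_a}: {name}()")

# Example with hypothetical paths:
parity_report("scripts/local_scheduler.py", "cloudrun/scheduler_service.py")
```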
Neil Stone
Neil Stone@DrNeilStone·
Reminder that a man who doesn't believe HIV is the cause of AIDS is in charge of your country's health policy. I give you the Quack in Chief, RFK Jr.