No Input Port

Phreable, 2022.

I went to three doctors about the same problem. Got three different recommendations. All of them involved surgery I wasn't confident in. None of the doctors seemed particularly confident either. I was supposed to pick one and hope.

That was the first time I understood the problem as a patient. I'd been watching it from the inside for over a decade.

Inside the Machine

I spent years in healthcare operations. Medical devices, imaging AI, EHR integrations. Fujifilm. Enlitic. Smaller companies in between. You see the system differently when you're selling into it, building for it, sitting in the rooms where the decisions get made.

At one company, I watched the decision get made not to run a clinical study on a product we were already selling. The results might come back unfavorable, and then they'd have to disclose them. Better not to know.

This wasn't corruption. It was the system working exactly as designed.

Family members would send me discharge summaries, radiology reports. Can you explain this? It read like a foreign language to them. It was written like one, too. Not because the information was complex, but because nothing in the system required it to be readable by the person it was about.

What I Built

In August 2022 I incorporated Phreable. The name came from PHR, personal health record. Y Combinator rejected it three months later.

The idea was Moneyball for health. Billy Beane didn't get access to better players. He took the same data everyone else had and found a better way to read it. Medical records already existed. They just needed someone to make them usable.

I built a Comprehensive Health Index. Seven dimensions of health collapsed into a single adaptive score, the way wRC+ collapses a dozen batting statistics into one number that actually tells you something. Treatment comparison tools that scored options against a patient's own goals, not population averages. A translation layer that turned discharge summaries into plain language.
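The mechanics of a score like that are simple to sketch. This is a minimal illustration of the idea, not Phreable's actual model: the dimension names and weights below are hypothetical, and the "adaptive" part is reduced to reweighting around missing data.

```python
# Hypothetical composite health index: several 0-100 dimension scores
# collapsed into one number, the way wRC+ collapses batting statistics.
# Dimensions and weights are illustrative only.

DIMENSIONS = {
    "cardiovascular": 0.20,
    "metabolic": 0.20,
    "sleep": 0.15,
    "mental": 0.15,
    "mobility": 0.10,
    "nutrition": 0.10,
    "preventive_care": 0.10,
}

def health_index(scores):
    """Weighted average of whichever dimensions are present.
    Renormalizing over missing dimensions is the (simplified)
    'adaptive' behavior: the score degrades gracefully instead
    of failing when a patient's record is incomplete."""
    present = {d: w for d, w in DIMENSIONS.items() if d in scores}
    total_weight = sum(present.values())
    return sum(scores[d] * w for d, w in present.items()) / total_weight

# A patient with only two dimensions on record still gets a score.
print(round(health_index({"cardiovascular": 60, "sleep": 80}), 1))
```

The hard part was never this arithmetic. It was getting trustworthy numbers into `scores` in the first place, which is where the rest of this story goes.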

The product worked in isolation. Clean inputs. Controlled conditions. Curated test cases.

When I fed it real patient data, everything fell apart. The logic was fine. The inputs were nothing like what I'd assumed.

ChatGPT had launched three months after I started building. Everyone believed AI could do anything. For a window, so did I.

GPT-3.5 had a context window of 4,096 tokens. Roughly three printed pages. A single discharge summary is longer than that. To process a real medical record, you had to cut it into chunks. And when you cut it into chunks, you destroyed the relationships that made it meaningful. The medication listed on page one was disconnected from the lab value on page three that explained why it was prescribed. The model couldn't see both at once. I was feeding it puzzle pieces and asking it to describe the picture.
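The failure mode is easy to reproduce. The sketch below uses character-based splitting as a stand-in for token-based chunking, and an invented toy record; the point is only that fixed-size chunks sever the medication from the lab value that explains it.

```python
# Toy demonstration of the chunking problem: a fixed-size splitter
# (stand-in for cutting a record into 4,096-token windows) separates
# related facts so that no single chunk contains both.

def chunk(text, size):
    """Split text into fixed-size pieces, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# A fake record: medication up front, the explanatory lab value at
# the end, pages of notes in between.
record = (
    "MEDICATIONS: metformin 500 mg twice daily. "
    + "Progress note text. " * 60  # filler standing in for pages of notes
    + "LABS: HbA1c 8.2% (elevated)."
)

chunks = chunk(record, 400)

# The medication and the lab value land in different chunks, so no
# prompt built from a single chunk can connect them.
has_both = any("metformin" in c and "HbA1c" in c for c in chunks)
print(has_both)  # prints False
```

Every chunk is locally coherent and globally meaningless, which is exactly what the model saw.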

Larger context windows arrived slowly. The capability I needed was always six months away.

And even when models improved, the accuracy problem didn't go away. Microsoft Research tested GPT-4 on medical questions. It scored 87%. Then they added realistic patient context: demographics, family history, socioeconomic factors. The kind of information a real patient brings to a real appointment. The score dropped to 62%. A 25-point collapse from adding reality to the question.

Eighteen percent of GPT-4's medical citations were fabricated. Among the ones that included DOIs, links to actual academic papers, 64% pointed to papers about entirely different topics. Wrong in a way that looked right.

Eighty percent of clinical data is unstructured. Free-text notes with no standard format. Nine billion pages are faxed in US healthcare every year. Those faxed documents enter the electronic medical record as scanned images. Low resolution. Angled. Degraded. Handwritten annotations in the margins.

I was asking a language model to reason about a patient's health based on text that was OCR'd from a fax of a photocopy of a handwritten note.

And even if the models had been perfect and the data had been clean, I couldn't get to the data. The middleware that connected to hospital systems cost $80,000 a year before implementation. Epic, which runs 40% of hospital EMRs, required a live customer before they'd list your integration. You needed the integration to get customers. The federal regulations that were supposed to open things up gave access to the structured 20%: demographics, medications, lab values, billing codes. Not the clinical notes. Not the reasoning.

Each problem was solvable in theory. But not all of them at once, and not by one person with no funding. Bad data going into limited models through expensive pipes, producing results I couldn't trust with anyone's health.

What It Cost

While I was building all of this, my blood pressure went up. My weight went up. My sleep fell apart. I was building a health product. My health was deteriorating.

So I went to a doctor. And I got exactly the experience I'd been trying to fix.

Your blood pressure's high. You're overweight. Here's a beta blocker. Here's a statin. Have a good life.

No investigation into why. No questions about what had changed. No curiosity about the fact that a year earlier, none of this had been a problem. Numbers on a chart, mapped to pills. The appointment lasted fifteen minutes. Most of that was the nurse taking vitals.

I was the patient I'd been trying to help. Sitting there, holding a prescription, knowing there was more going on than this visit was designed to find.

I could have walked in with a perfect dashboard. Every metric tracked, every trend annotated, a year of data organized and ready. The appointment would have gone exactly the same way. Nothing in the doctor's workflow, the billing code, or the insurance structure rewarded spending a single extra minute on a patient who'd done their homework.

There was no input port. The system didn't have a place to receive what I was trying to give it.

The technical walls were real. But they were surface barriers. Underneath was the incentive structure itself. The fifteen-minute appointment slot. The billing code that paid the same whether the doctor spent five minutes or fifty. The insurance reimbursement that measured nothing about outcomes. None of it was broken. That was the problem.


I stopped building Phreable.