The Gap Between Building ML Systems and Doing ML Research

I started at DASION in 2021 as a high school intern. By the time I enrolled at Berkeley this month, I had spent three years building ML systems that actually ran in clinical settings: models that processed real patient data, infrastructure that held 99.9% uptime, pipelines that clinicians depended on. I thought that experience would translate directly to research.

It doesn’t. Or at least, not in the way I expected.

The difference is subtle, but it matters enormously. In industry, the questions are: does this work? Can we ship it? Is it reliable enough? The success condition is the system running without breaking. In research, the questions are: why does this work? What does it tell us about the problem? Where does it fail, and what does that reveal?

I spent three years optimizing for the first set of questions. Getting to 93% diagnostic accuracy was a win. Understanding why the model failed on the other 7% — what those cases had in common, what the failure mode revealed about the representation — that wasn’t the priority. We had clinical partners waiting.

The shift I’m trying to make at Berkeley is genuinely hard. It requires slowing down in a way that feels unproductive. Sitting with a failure instead of patching it. Asking “what does this mean” instead of “how do we fix this.” It’s a different muscle, and I’m aware I haven’t built it yet.

What I’m hoping is that the industry experience doesn’t go away — it just gets reframed. I’ve seen what breaks in deployment. I know which failure modes matter and which ones are theoretical. That’s not nothing. But turning observation into contribution requires a different kind of work, and I’m just starting to understand what that looks like.