---
title: 'training with less data'
tags: 'journal'
date: 'Oct 18, 2025'
---

i was wondering if there's a way to know which data actually matters before you even train. like, can you look at your dataset and say "these 2k examples are worth more than those 10k"?

does more data always = better model?

[research](https://arxiv.org/abs/2001.08361) shows it's a power law, not linear:

```
100 samples → loss = 10
1,000 samples (10x more) → loss = 5 (not 1)
10,000 samples (10x more) → loss = 2.5 (not 0.5)
```
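
to make the shape concrete, here's a tiny sketch of the power law implied by those toy numbers (the exponent below comes from "10x data halves the loss", not from the paper's actual fits, which are smaller):

```python
import math

# toy power law: loss(N) = C * N^(-alpha)
# "10x more data halves the loss" implies alpha = log10(2) ≈ 0.30
alpha = math.log10(2)
C = 10 * 100**alpha  # pick C so that loss(100) = 10

for n in [100, 1_000, 10_000, 100_000]:
    loss = C * n ** (-alpha)
    print(f"{n:>7} samples -> loss ≈ {loss:.2f}")
```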

the diminishing returns [hold across seven orders of magnitude](https://www.pnas.org/doi/10.1073/pnas.2311878121).

what about finetuning? since the base model already knows a lot and you're just teaching it something specific, does the same rule apply?

yes, but you might only need 20-50% of your data to get 95% of the performance. so which 20-50%?

j morris showed that models have a [capacity limit](https://arxiv.org/abs/2505.24832). GPT-style models memorize ~3.6 bits per parameter.

this means a 1B parameter model can only memorize ~450MB of information. that's your budget.
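
the arithmetic checks out as a back-of-the-envelope estimate:

```python
params = 1_000_000_000   # 1B parameter model
bits_per_param = 3.6     # capacity estimate from the paper
capacity_bytes = params * bits_per_param / 8
print(f"~{capacity_bytes / 1e6:.0f} MB of memorizable information")  # ~450 MB
```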

training on more data doesn't increase the budget. it just spreads it thinner.

when you exceed capacity, the model is forced to generalize instead of memorize. this explains grokking - that moment when performance suddenly jumps.

so the question becomes: which data fills the budget?

if you have lots of data, keep hard examples. easy ones are redundant.

if you have little data, keep easy examples. hard ones might just be noise.
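
a minimal sketch of that rule, assuming you already have a per-example difficulty score (say, loss under a reference model) - the scoring and the keep fraction are stand-ins, not a specific paper's method:

```python
import numpy as np

def select(scores: np.ndarray, keep_frac: float, data_rich: bool) -> np.ndarray:
    """Pick which examples to keep given per-example difficulty scores.

    scores: higher = harder (e.g., loss under a reference model)
    data_rich: True  -> keep the hardest examples (easy ones are redundant)
               False -> keep the easiest (hard ones may be noise / mislabeled)
    """
    k = max(1, int(keep_frac * len(scores)))
    order = np.argsort(scores)  # sorted easy -> hard
    return order[-k:] if data_rich else order[:k]

# hypothetical usage: 10k scored examples, keep 30%
scores = np.random.rand(10_000)
kept_indices = select(scores, keep_frac=0.3, data_rich=True)
```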

[someone showed](https://arxiv.org/abs/2206.14486) you can discard 20% of ImageNet without hurting performance, and that good pruning metrics could beat power law scaling entirely.

how do you actually do this though?

there's [information bottleneck](https://adityashrm21.github.io/Information-Theory-In-Deep-Learning/) theory - find the maximally compressed mapping of the input that still preserves information about the output. keep only the data that tells you something useful.

practical methods exist:
- [coreset selection](https://arxiv.org/abs/1907.04018) - finds a small weighted subset that approximates the full dataset
- geometry-based pruning - preserve feature space structure
- uncertainty-based - keep what the model is uncertain about
- error-based - keep high-loss examples
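
as a taste of the geometry-based flavor, here's a k-center greedy sketch (a common coreset heuristic) over precomputed embeddings - the embeddings and the budget are placeholders, not any particular paper's recipe:

```python
import numpy as np

def k_center_greedy(emb: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Greedily pick the point farthest from the current selection,
    so the kept subset covers the embedding space."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(emb.shape[0]))]           # arbitrary first pick
    dists = np.linalg.norm(emb - emb[selected[0]], axis=1)
    for _ in range(budget - 1):
        idx = int(np.argmax(dists))                        # farthest remaining point
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(emb - emb[idx], axis=1))
    return selected

# hypothetical usage: 10k examples embedded in 512-d, keep 2k of them
emb = np.random.randn(10_000, 512).astype(np.float32)
subset = k_center_greedy(emb, budget=2_000)
```

(note the cost: budget passes of distance computations over the whole set - a small example of the "expensive to compute" problem below.)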

problem: most of these don't scale well, and the best ones are expensive to compute.

there's also this idea of [four scaling regimes](https://www.pnas.org/doi/10.1073/pnas.2311878121). basically asking two questions:

1. is the bottleneck your data or your model?
2. is the problem noise or lack of detail?

the second question distinguishes two limitations:

- **variance-limited:** error comes from noise in limited samples (like photos in a dark room)
- **resolution-limited:** can't capture fine-grained patterns (like a pixelated image)

knowing which regime you're in tells you whether more data helps or whether you need something else.
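
one crude way to probe this (my own framing, not the paper's recipe): train at a few dataset sizes and fit the log-log slope. if the fitted exponent is still healthy, more data keeps paying; if it has flattened out, the bottleneck is probably somewhere else:

```python
import numpy as np

# hypothetical (dataset size, validation loss) pairs from a few small runs
sizes = np.array([1_000, 4_000, 16_000, 64_000])
losses = np.array([3.10, 2.40, 1.95, 1.72])

# fit loss ≈ C * N^(-alpha) in log-log space
slope, _ = np.polyfit(np.log(sizes), np.log(losses), 1)
print(f"fitted data exponent alpha ≈ {-slope:.2f}")
```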

j morris has also shown that [embeddings](https://arxiv.org/abs/2505.12540) from different models converge to similar representation geometries.

if there's a universal geometry, maybe there's an optimal compression of training data that fills that structure efficiently.

there's also a ton of research on synthetic data that fits into this equation as well. a rabbit hole that i would love to dive into some other time.