Show HN: How do Markov chains differ from tiny LLMs?
I polished a Markov chain generator and trained it on an article by Uri Alon et al. [0].<p>It generates text that seems to me at least on par with tiny LLMs, such as those demonstrated by NanoGPT. Here is an example:<p>jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$ ./SLM10b_train UriAlon.txt 3<p>Training model with order 3...<p>Skip-gram detection: DISABLED (order < 5)<p>Pruning is disabled<p>Calculating model size for JSON export...<p>Will export 29832 model entries<p>Exporting vocabulary (1727 entries)...<p>Vocabulary export complete.<p>Exporting model entries...<p><pre><code> Processed 12000 contexts, written 28765 entries (96.4%)...
</code></pre>
JSON export complete: 29832 entries written to model.json<p>Model trained and saved to model.json<p>Vocabulary size: 1727<p>jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$ ./SLM9_gen model.json<p>Aging cell model requires comprehensive incidence data. To obtain such a large medical database of the joints are risk factors. Therefore, the theory might be extended to describe the evolution of atherosclerosis and metabolic syndrome. For example, late‐stage type 2 diabetes is associated with collapse of beta‐cell function. This collapse has two parameters: the fraction of the senescent cells are predicted to affect disease threshold . For each individual, one simulates senescent‐cell abundance using the SR model has an approximately exponential incidence curve with a decline at old ages In this section, we simulated a wide range of age‐related incidence curves. The next sections provide examples of classes of diseases, which show improvement upon senolytic treatment tends to qualitatively support such a prediction. model different disease thresholds as values of the disease occurs when a physiological parameter ϕ increases due to the disease. Increasing susceptibility parameter s, which varies about 3‐fold between BMI below 25 (male) and 54 (female) are at least mildly age‐related and 25 (male) and 28 (female) are strongly age‐related, as defined above. Of these, we find that 66 are well described by the model as a wide range of feedback mechanisms that can provide homeostasis to a half‐life of days in young mice, but their removal rate slows down in old mice to a given type of cancer have strong risk factors should increase the removal rates of the joint that bears the most common biological process of aging that governs the onset of pathology in the records of at least 104 people, totaling 877 disease category codes (See SI section 9), increasing the range of 6–8% per year. 
The two‐parameter model describes well the strongly age‐related ICD9 codes: 90% of the codes show R 2 > 0.9) (Figure 4c). This agreement is similar to that of the previously proposed IMII model for cancer, major fibrotic diseases, and hundreds of other age‐related disease states obtained from 10−4 to lower cancer incidence. A better fit is achieved when allowing to exceed its threshold mechanism for classes of disease, providing putative etiologies for diseases with unknown origin, such as bone marrow and skin. Thus, the sudden collapse of the alveoli at the outer parts of the immune removal capacity of cancer. For example, NK cells remove senescent cells also to other forms of age‐related damage and decline contribute (De Bourcy et al., 2017). There may be described as a first‐passage‐time problem, asking when mutated, impair particle removal by the bronchi and increase damage to alveolar cells (Yang et al., 2019; Xu et al., 2018), and immune therapy that causes T cells to target senescent cells (Amor et al., 2020). Since these treatments are predicted to have an exponential incidence curve that slows at very old ages. Interestingly, the main effects are opposite to the case of cancer growth rate to removal rate We next consider the case of frontline tissues discussed above.<p>[0] <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7963340/" rel="nofollow">https://pmc.ncbi.nlm.nih.gov/articles/PMC7963340/</a>
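For readers unfamiliar with the technique, a word-level Markov chain of order 3 can be sketched roughly as below. This is not the code behind SLM10b_train or SLM9_gen, which is not shown in this post. The function names `train` and `generate`, the whitespace tokenization, and the dict-of-lists model layout are all assumptions made for illustration, and the sketch leaves out the skip-gram detection, pruning, and JSON export mentioned in the log.

```python
import random
from collections import defaultdict


def train(text, order=3):
    """Map every `order`-word context to the list of words seen right after it.

    Hypothetical helper: tokenization is a plain whitespace split.
    """
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        # Repeated followers are kept, so sampling is frequency-weighted.
        model[context].append(words[i + order])
    return dict(model)


def generate(model, length=50, seed=None):
    """Sample up to `length` words by repeatedly looking up the last context."""
    if not model:
        return ""
    rng = random.Random(seed)
    contexts = sorted(model)          # sorted so a fixed seed is reproducible
    order = len(contexts[0])
    out = list(rng.choice(contexts))  # start from a random known context
    while len(out) < length:
        followers = model.get(tuple(out[-order:]))
        if not followers:
            break                     # dead end: context never seen in training
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    sample = "the cell model describes the cell model of the cell removal rate"
    print(generate(train(sample, order=3), length=12, seed=1))
```

With a small training text and order 3, most contexts have a single follower, so the output largely stitches together verbatim runs of the source. That is worth keeping in mind when comparing such output with a tiny LLM.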