Ask HN: High-performance computing learning path for data scientists

1 point by proudmo, about 2 months ago
I have a degree in mathematics and currently work as a data scientist. While I’m comfortable with Python and core machine learning techniques, I’ve realized that I need to deepen my understanding of high-performance computing (HPC) and performance engineering in order to optimize my code for speed and scale up algorithms for large systems.

Specifically, I’m interested in:

* Writing high-performance, memory-efficient code (e.g., using C++, SIMD, GPU, parallel computing)
* HPC system design and architecture
* Optimizing large-scale data processing and ML infrastructure
* Profiling, latency optimization, and memory management for data-heavy tasks

I’m looking for:

1. Books, resources, tutorials, and online degrees that can guide me from a strong mathematical and ML foundation into performance optimization
2. Effective learning paths to transition from a general data science role to working with performance-critical systems and large-scale compute environments

I’m keen to improve my ability to build more efficient systems and handle large datasets or complex models with near real-time performance where necessary.

Would love any recommendations, personal experiences, or resources to help guide my learning!