👋 Welcome to Cuterwrite's Blog

Implementing Local RAG Service: Integrating Open WebUI, Ollama, and Qwen2.5

This article shows how to build an efficient, easy-to-use Retrieval-Augmented Generation (RAG) service that runs locally, integrating Open WebUI, Ollama, and the Qwen2.5 model through Docker. The steps cover deploying Open WebUI, configuring Ollama to use the bge-m3 embedding model for document vectorization, and using Qwen2.5 as the generation model to answer user queries. The result is a local system that can retrieve documents and generate answers from them. This approach simplifies operation, improves data privacy, and broadens how generative AI can be applied.

Arm Matrix Acceleration: Scalable Matrix Extension SME

This article introduces the Scalable Matrix Extension (SME) in the Arm architecture. It focuses on SME's efficient matrix computation in Streaming SVE mode and on how the ZA array provides large-scale data storage with flexible access, giving high-performance computing applications hardware acceleration for matrix operations.

Arm Performance Optimization: Scalable Vector Extension SVE

This article introduces Arm's Scalable Vector Extension (SVE) and its enhanced version, SVE2. Both significantly improve the performance of data-intensive applications such as HPC and ML through variable-length vector registers, flexible per-lane predication, and a rich instruction set, and their software binary compatibility keeps code portable across different hardware platforms. SVE is also supported by ACLE (Arm C Language Extensions): developers can use SVE instructions directly in C/C++ code by calling the intrinsic functions declared in the arm_sve.h header, which makes efficient vectorized code easier to write.

RDMA: Memory Window

This article is reprinted from the Zhihu column post "14. RDMA Memory Window" by Savir. The IB (InfiniBand) protocol designed the Memory Window (MW) to make control of memory access permissions more flexible and convenient. The article covers what MW does, how it relates to the Memory Region (MR), its interfaces, and its classification. It also explains L_Key and R_Key in more depth than the earlier article on MR.

25 articles published · 60.67k in total

Built with Hugo
Theme Stack designed by Jimmy
Modified from the v3.27.0 branch version