Build and Evaluate High Performance Taxonomy-Based LLMs From Scratch
- Vincent Granville
- April 21, 2024
One obvious way to dramatically improve the quality of LLM and RAG systems is to use high-quality input sources, as opposed to just raw text from the crawled or parsed content. Combine it with specialization: one LLM per top domain, allowing the user to customize parameters and specify the domain in addition to standard concise […]
Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings
- Vincent Granville
- April 12, 2024
The new generation of RAG / LLM architecture is moving away from the original monolithic and generic OpenAI model, towards a collection of decentralized and specialized LLMs jointly organized and governed via multi-agent systems. The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, better results, with much lower […]
Extreme LLM: Case Study, Documentation, Best Practices, and Python sources
- Vincent Granville
- March 2, 2024
Extreme LLM, abbreviated as xLLM, relies on multiple specialized large language models, one per top category, to deliver highly relevant answers to specific questions, covering all of human knowledge or targeted content such as corporate repositories. The user, in addition to the classic prompt, is invited to select or guess top categories. Behind the scenes, […]
Probabilistic ANN: The Swiss Army Knife of GenAI
- Vincent Granville
- February 11, 2024
ANN (Approximate Nearest Neighbors) is at the core of fast vector search, itself central to GenAI, especially GPT and LLM. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]
New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness
- Vincent Granville
- January 27, 2024
This article covers three different GenAI topics. First, I introduce one of the best pseudo-random number generators (PRNG), with infinite period. Then I show how to evaluate the synthesized numbers using the full multivariate empirical distribution (same as KS that I used for NoGAN evaluation), but this time with ultra-fast radix search, a competitor to […]
My Top 10 GenAI Articles of the Year
- Vincent Granville
- December 22, 2023
Here is some good reading for the holiday season. More than just reading, as the material includes full Python implementations and datasets. The most up-to-date versions are in my new book Statistical Optimization for GenAI and Machine Learning, available here. As a courtesy, if you buy it by December 31, you are entitled to a […]
Genome: Synthesizing DNA Sequences with LLM Techniques
- Vincent Granville
- December 8, 2023
This methodology is not focused on genome data alone. The purpose is to design a generic solution that may also work in other contexts, such as synthesizing molecules. The problem involves dealing with a large amount of “text”. Indeed, the sequences discussed here consist of letter arrangements, from an alphabet that has 5 symbols: A, […]
10 GenAI Notebooks: OpenAI, LLM, RAG, GPT, and More
- Vincent Granville
- December 1, 2023
For developers and AI/ML professionals. This comprehensive free resource offered by our sponsor is designed to provide you with hands-on experience and deeper insights into building cutting-edge GenAI applications. 🌟 Special Opportunity: You can win a pair of Apple AirPods simply by following the tutorial and learning something new. How to Participate Follow these 2 […]
Easy Trick to Debias GenAI Models: Quantile Convolution
- Vincent Granville
- November 26, 2023
All of the GenAI apps that I tested, including my own, have the same problem. They cannot easily generate data outside the observation range. As an example, let’s focus on the insurance dataset discussed in my new book. I use it to generate synthetic data with GAN (generative adversarial networks) and the NoGAN models discussed […]
New Book: Understanding Deep Learning
- Vincent Granville
- November 16, 2023
By Simon Prince, computer science Professor at the University of Bath. To be published by MIT Press, Dec 2023. The author shares the associated Jupyter notebooks on his website, here. Very popular, it got over 5,000 likes when the author announced the upcoming book on LinkedIn. I pre-ordered my copy. Summary An authoritative, accessible, and […]