AI Shubka
Three AI engines walk into a bar in single file… • The Register

by ShubkaAi
February 8, 2026
in AI & Future Tech, AI breakthroughs (GPT updates, generative models), Best AI tools for creators, Robotics & automation, Tech forecasts


Developers looking to gain a better understanding of machine learning inference on local hardware can fire up a new llama engine.

Software developer Leonardo Russo has released llama3pure, which incorporates three standalone inference engines: a pure C implementation for desktops, a pure JavaScript implementation for Node.js, and a pure JavaScript version for web browsers that does not require WebAssembly.

“All versions are compatible with the Llama and Gemma architectures,” Russo explained to The Register in an email. “The goal is to provide a dependency-free, isolated alternative in both C and JavaScript capable of reading GGUF files and processing prompts.”

GGUF stands for GPT-Generated Unified Format; it is a common format for distributing machine learning models.
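The GGUF format opens with a small fixed header, per the published GGUF specification: a four-byte magic string, a version number, a tensor count, and a metadata key/value count, all little-endian. A minimal sketch of parsing that header in Node.js (illustrative only; this is not code from llama3pure):

```javascript
// Parse the fixed GGUF header, following the published GGUF spec:
// 4-byte magic "GGUF", uint32 version, uint64 tensor count,
// uint64 metadata key/value count, all little-endian.
function readGgufHeader(buf) {
  const magic = buf.toString("ascii", 0, 4);
  if (magic !== "GGUF") throw new Error("not a GGUF file");
  return {
    version: buf.readUInt32LE(4),             // format version (3 is current)
    tensorCount: buf.readBigUInt64LE(8),      // number of tensors in the file
    metadataKvCount: buf.readBigUInt64LE(16), // number of metadata key/value pairs
  };
}

// Build a fake 24-byte header to exercise the parser.
const demo = Buffer.alloc(24);
demo.write("GGUF", 0, "ascii");
demo.writeUInt32LE(3, 4);
demo.writeBigUInt64LE(291n, 8);
demo.writeBigUInt64LE(20n, 16);
console.log(readGgufHeader(demo)); // { version: 3, tensorCount: 291n, metadataKvCount: 20n }
```

After the header come the metadata key/value pairs (model architecture, tokenizer, hyperparameters) and the tensor descriptors, which is where the bulk of a real parser's work lies.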

Llama3pure is not intended as a replacement for llama.cpp, a widely used inference engine for running local models that’s significantly faster at responding to prompts. Llama3pure is an educational tool.

“I see llama3pure as a more flexible alternative to llama.cpp specifically when it comes to architectural transparency and broad hardware compatibility,” Russo explained. “While llama.cpp is the standard for high-performance optimization, it involves a complex ecosystem of dependencies and build configurations; llama3pure takes a different approach.”

Russo believes developers can benefit from having an inference engine in a single, human-readable file that makes evident the logic of file-parsing and token generation.

“The project’s main purpose is to provide an inference engine contained within a single file of pure code,” he said. “By removing external dependencies and layers of abstraction, it allows developers to grasp the entire execution flow – from GGUF parsing to the final token – without jumping between files or libraries. It’s built for those who need to understand exactly what the hardware is doing.”

Russo also sees utility for situations where the developer is running legacy software or hardware, where client-side WebAssembly isn’t an option, and where having an isolated tool without the potential for future dependency conflicts might be desirable.

The C and Node.js engines, he said, have been tested with Llama models up to 8 billion parameters and with Gemma models up to 4 billion parameters. The main limiting factor is the physical RAM required to host model weights.

The RAM required to run machine learning models on local hardware is roughly 1GB per billion parameters when the model is quantized at 8 bits. Double or halve the precision and you double or halve the memory required. Models are commonly quantized at 16 bits, so for a 1 billion-parameter model, 2GB would typically be required.
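That rule of thumb can be written as a one-line helper. The function name and examples are this article's illustration, not part of llama3pure:

```javascript
// Rough RAM estimate for model weights alone: (bits per weight / 8)
// bytes per parameter, so 8-bit quantization needs ~1 GB per billion
// parameters. Excludes KV cache and runtime overhead.
function weightRamGB(paramsBillions, quantBits) {
  return paramsBillions * (quantBits / 8);
}

console.log(weightRamGB(1, 16)); // 1B params at 16-bit -> 2 GB
console.log(weightRamGB(8, 8));  // 8B params at 8-bit  -> 8 GB
console.log(weightRamGB(8, 4));  // 8B params at 4-bit  -> 4 GB
```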

According to Russo, the calculation for GGUF weights is different.

“GGUF weights are loaded directly into RAM, which usually means the RAM usage matches the entire file size,” he explained. “You can reduce the context window size by passing a specific parameter (context_size) – a feature supported by most inference engines, including the three I designed. While reducing the context window is a common ‘trick’ to save RAM when running models locally, it also means the AI won’t ‘remember’ as much as it was originally designed to.”
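The RAM saved by a smaller context window comes from the KV cache, whose size grows linearly with context length. A back-of-the-envelope estimate, using Llama 3 8B's publicly documented shape (32 layers, 8 KV heads under grouped-query attention, head dimension 128); the numbers are this article's assumption, not taken from llama3pure:

```javascript
// KV cache bytes = 2 (keys + values) * layers * kv_heads * head_dim
//                  * context_length * bytes per element.
function kvCacheBytes(layers, kvHeads, headDim, contextLen, dtypeBytes) {
  return 2 * layers * kvHeads * headDim * contextLen * dtypeBytes;
}

// Llama 3 8B shape with an fp16 (2-byte) cache.
const full = kvCacheBytes(32, 8, 128, 8192, 2);  // 8K context  -> 1 GiB
const small = kvCacheBytes(32, 8, 128, 2048, 2); // 2K context  -> 256 MiB
console.log(full / 2 ** 30, "GiB;", small / 2 ** 20, "MiB");
```

Quartering the context window quarters the cache, which is why the trick matters on RAM-constrained machines.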

He also said that llama3pure is presently focused on single-turn inference. He expects to implement chat history state management at a later date.
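Single-turn inference means each request starts from the prompt alone, with no conversation state carried between calls. A generic greedy-decoding loop makes the distinction concrete; `toyNextToken` is a stand-in for a real forward pass and none of this is llama3pure's API:

```javascript
// Stub "model": returns the last token + 1, signalling end-of-sequence
// with -1 once it reaches 5. A real engine would run a forward pass here.
function toyNextToken(tokens) {
  const last = tokens[tokens.length - 1];
  return last >= 5 ? -1 : last + 1;
}

// Single-turn generation: a fresh token buffer on every call, so nothing
// persists between turns. Chat-history support would instead retain and
// extend this state across calls.
function generateSingleTurn(promptTokens, maxNew) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxNew; i++) {
    const next = toyNextToken(tokens);
    if (next === -1) break; // stop on end-of-sequence
    tokens.push(next);
  }
  return tokens;
}

console.log(generateSingleTurn([1, 2], 10)); // [ 1, 2, 3, 4, 5 ]
```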

For daily work, Russo says he uses Gemma 3 as a personal assistant, powered by his C-based inference engine, to ensure that sensitive data is handled privately and offline.

“For a coding assistant, I recommend Gemma 3 27B,” he said. “Regarding the latency concerns, while local models were historically slow, running optimized versions on modern hardware now provides an experience very close to cloud-based models like Claude and without the need to pay for such a service.”

While Russo expects common general-purpose AI assistance will continue to rely on cloud-hosted models, he foresees developers and businesses increasingly looking at local AI. Developer machines with 32GB or 48GB of RAM may lack the context windows available with cloud-hosted models, but they provide security and privacy without dependence on service providers.

Asked how he feels as a developer about the AI transition, Russo said he expects developers to eventually transition to AI supervisors.

“Since AI models present answers with high confidence – even when incorrect – a human expert must remain in the loop to verify the output,” he said. “Technical knowledge will not become obsolete; rather, it will become increasingly vital for auditing AI-generated work. 

“While job titles may change, senior developers will always be necessary to maintain these systems, creating a workflow significantly faster than human-only development. For junior and mid-level developers, AI offers the opportunity to learn faster than previous generations. If managed correctly, AI can facilitate a significant leap in the industry’s intellectual evolution.” ®



© 2026 aishubka - Smarter Business. & Automated Future. by aishubka.
