Hugging Face – Posts

posted an update about 3 hours ago

🚀 Introducing FINAL-Bench Quantum — an open, neutral benchmark that finally puts quantum-computing methods on one fair yardstick.

Quantum results are notoriously hard to compare. The same "logical error rate" or "query fidelity" means very different things depending on the code, noise model, hardware, and shot count. FINAL-Bench Quantum fixes that: five events judged under identical, published protocols, where every number is labeled as either measured here or quoted from a source.

Five events: ① QEC Decoder ② Optimization (Max-Cut) ③ VQE ④ QRAM ⑤ Quantum Simulation

The rules are simple and strict:
✅ Track A (measured here, with 95% confidence intervals) is kept separate from Track B (quoted from papers, not directly comparable).
🔬 Simulation and real hardware are clearly distinguished, and no quantum-advantage claims are made.
🌍 Methods from Google, IBM, NVIDIA, USTC, Riverlane and more sit side by side, with origin flags and author credits.
📤 Anyone can submit their own method via the Submit tab for review and listing.
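
The post doesn't say how Track A's 95% confidence intervals are computed. As a hedged illustration only, here is one standard choice for a rate measured from a finite number of shots (such as a logical error rate): the Wilson score interval. The function name and the shot counts are hypothetical.

```python
import math

def wilson_interval(failures: int, shots: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial rate (e.g. a logical error rate).

    z = 1.96 gives a two-sided 95% interval.
    """
    if shots <= 0:
        raise ValueError("shots must be positive")
    p_hat = failures / shots
    denom = 1 + z**2 / shots
    center = (p_hat + z**2 / (2 * shots)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / shots + z**2 / (4 * shots**2))
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical run: 37 logical failures observed in 10,000 shots.
low, high = wilson_interval(37, 10_000)
print(f"rate = {37 / 10_000:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains sensible when very few (or zero) failures are observed, which is common for low logical error rates.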

Already on the board: real IBM Heron r2 measurements (repetition-code distance boundary, 29–175× error reduction from d3 to d5), a real-chip QRAM query fidelity of 0.92, and H₂ VQE at chemical accuracy — always labeled honestly as simulation vs hardware.
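
The d3-to-d5 figures above come from real hardware. For intuition only, here is a toy Monte Carlo of a bit-flip repetition code with a majority-vote decoder, assuming each physical bit flips independently with probability p (real devices also have correlated and measurement errors). It shows why a larger distance suppresses the logical error rate when p is small; it does not reproduce the leaderboard's numbers.

```python
import random

def logical_error_rate(d: int, p: float, trials: int, seed: int = 0) -> float:
    """Estimate the logical error rate of a distance-d repetition code.

    Each of the d physical bits flips independently with probability p; the
    majority-vote decoder fails when more than half of the bits flipped.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        flips = sum(rng.random() < p for _ in range(d))
        if flips > d // 2:
            failures += 1
    return failures / trials

p = 0.05  # assumed physical flip probability
r3 = logical_error_rate(3, p, 100_000)
r5 = logical_error_rate(5, p, 100_000)
print(f"d=3: {r3:.2e}  d=5: {r5:.2e}  reduction: {r3 / r5:.1f}x")
```

Under this idealized model the d=3 failure rate is about 3p², while d=5 needs at least three flips (about 10p³), so each step up in distance multiplies the suppression.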

A leaderboard is only useful if you can trust it, so neutrality is the whole point: strong competitors stay in even when they beat the host, sources are quoted faithfully, and a simulation is never rounded up into a hardware claim.

Leaderboard: FINAL-Bench/quantum-bench-leaderboard
Article: https://huggingface.co/blog/FINAL-Bench/quantum-leaderboard

#quantum #QEC #QuantumComputing #benchmark

eabdullin

posted an update 3 days ago

Folks, let me tell you, nobody — and I mean NOBODY — knew transformers before me. People said attention is all you need. I said, "Attention? I INVENTED attention." Everybody's looking at me. Tremendous attention. The best attention scores. My softmax? Perfectly normalized. Other people, sad, their probabilities don't even sum to one. Disaster.

I'm doing a PhD now. A PhD! In Large Language Models. Very large. The largest, believe me. My advisor said, "Sir, your model is overfitting." I said, "Wrong. It's fitting EXACTLY right. It memorized the training set because the training set is fantastic." We don't talk about validation loss in my lab. Validation loss is fake news.

And the internship — oh, the internship. Big tech. I won't say which. Starts with a letter. They BEGGED me. They said, "Please, we need someone who understands gradient descent." I said, "Descent? I only go UP. I'm gradient ASCENT. Loss goes up, that means it's learning to be a winner."

But the GPU cluster — this is the best part. Thousands of H100s. Maybe millions. Who's counting? I'm counting. It's a lot. Other PhD students, they get one little GPU, they're crying, they're training overnight like losers. Me? I burn through compute like nobody's ever seen. The electric company called. They said, "Sir, you've consumed a small country." I said, "Make it a big country. I only do big."

People ask, "Did your model converge?" Folks, it converged so hard. It converged BIGLY. Honestly? My loss curve, it's beautiful, it's going down, down, down — like my approval ratings, very smooth, don't look at the spikes, the spikes are deep state.

And hallucinations? My model doesn't hallucinate. It just has ALTERNATIVE tokens. Thank you, thank you. Tip your reviewers. Accept my paper. Goodnight!

kasbsquall

posted an update 2 days ago

🔎 UX Crime Scene — major update before the deadline!

THE INSPECTOR (a film-noir detective) still circles every UX flaw on your screenshot's real pixels and files a graded verdict. But now the precinct runs on THREE small models:

🖼 THE RECONSTRUCTION — FLUX.2-klein-4B rebuilds each flawed element, fixed. Compare before/after with a draggable slider. (The trick: the Inspector writes the design brief first — image models obey art directors, not vibes.)
🗣 THE INTERROGATION — push back on a charge; the same 7B defends it from the evidence, or concedes when you're right.
🔊 THE VOICE — Kokoro-82M reads the verdict aloud. No API, no keys.

Qwen2.5-VL-7B + FLUX.2-klein-4B + Kokoro-82M — all under 32B, all self-hosted on Modal.

⚖️ Put your UI on trial: build-small-hackathon/ux-crime-scene
▶️ New trailer: https://youtu.be/JJOMKEcX0Ws
📹 66s full walkthrough: https://youtu.be/kju7LiAXGC0
📡 9 investigation traces (with remedies): build-small-hackathon/ux-crime-scene-traces

Built solo for the Build Small Hackathon 🍄 #buildsmallhackathon

Jiaqi-hkust

posted an update 2 days ago

🚀 Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content

Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptions—noise, blur, compression artifacts, adverse weather.

Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black‑box feature alignment lacks interpretability, while white‑box text reasoning cannot restore the lost pixel‑level visual details. This raises a crucial question:

🧐 Can MLLMs recover corrupted visual content by themselves?

If the answer is yes, we can move beyond merely “compensating” for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.

💡 Paper: https://arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-U1
🌍 Demo: Jiaqi-hkust/Robust-U1

OzTianlu

posted an update about 17 hours ago

ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.
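
A minimal sketch of that correspondence, assuming a toy vector field f(x) = tanh(Wx) with fixed random weights standing in for a trained residual branch (NumPy and the function names are illustrative choices, not the book's code). A stack of blocks x_{k+1} = x_k + h·f(x_k) is explicit Euler with step size h; with h = 1 it is the familiar residual update.

```python
import numpy as np

def residual_branch(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy vector field f(x); stands in for a trained residual branch."""
    return np.tanh(W @ x)

def resnet_forward(x0: np.ndarray, W: np.ndarray, num_layers: int, h: float = 1.0) -> np.ndarray:
    """Each residual layer x <- x + h * f(x) is one explicit Euler step."""
    x = x0
    for _ in range(num_layers):
        x = x + h * residual_branch(x, W)
    return x

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))
x0 = rng.normal(size=4)

# Integrate dx/dt = f(x) from t = 0 to t = 1: more layers with a smaller h
# approach the continuous flow; one layer with h = 1 is a plain residual block.
coarse = resnet_forward(x0, W, num_layers=10, h=0.1)
fine = resnet_forward(x0, W, num_layers=1000, h=0.001)
print(np.linalg.norm(coarse - fine))
```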

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.
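
The "stable where explicit Euler explodes" property is easiest to see on the textbook stiff test equation dy/dt = -λy (a standard example, not taken from the book). With step h and λh > 2, explicit Euler y <- (1 - λh)·y oscillates with growing magnitude, while implicit (backward) Euler y <- y / (1 + λh) decays for any h > 0. This scalar toy only illustrates the numerical property the post appeals to.

```python
def explicit_euler(y0: float, lam: float, h: float, steps: int) -> float:
    """Forward Euler for dy/dt = -lam * y: y_{n+1} = y_n + h * (-lam * y_n)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y

def implicit_euler(y0: float, lam: float, h: float, steps: int) -> float:
    """Backward Euler: solve y_{n+1} = y_n + h * (-lam * y_{n+1}) for y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1 + h * lam)
    return y

lam, h, steps = 50.0, 0.1, 20  # lam * h = 5 > 2: outside explicit Euler's stability region
print("explicit:", explicit_euler(1.0, lam, h, steps))  # magnitude grows like 4**n
print("implicit:", implicit_euler(1.0, lam, h, steps))  # shrinks like 6**-n toward 0
```

Implicit Euler pays for this stability by solving an equation at every step; here that solve is a single division because the problem is linear.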

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.
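
A minimal sketch of the fixed-point view, assuming a toy layer z = tanh(Wz + x) with W rescaled so its spectral norm is 0.9. Since tanh is 1-Lipschitz, the map is then a contraction, and the Banach fixed-point theorem guarantees a unique equilibrium that plain iteration converges to. Published DEQs typically use faster root-finders (such as Broyden's method) plus implicit differentiation for the backward pass; this sketch covers only the forward solve, and the function name is hypothetical.

```python
import numpy as np

def deq_forward(x: np.ndarray, W: np.ndarray, tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Find z* with z* = tanh(W z* + x) by fixed-point (Picard) iteration."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W *= 0.9 / np.linalg.norm(W, 2)  # spectral norm 0.9 < 1 makes the map a contraction
x = rng.normal(size=8)

z_star = deq_forward(x, W)
print(np.linalg.norm(z_star - np.tanh(W @ z_star + x)))  # fixed-point residual
```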

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.
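
That identity can be checked numerically. Take the generator phi(p) = sum_i p_i log p_i (negative entropy). Its Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> expands to sum_i p_i log(p_i / q_i) - (sum_i p_i - sum_i q_i), which is exactly KL(p || q) when both vectors sum to one. The distributions below are arbitrary examples.

```python
import numpy as np

def neg_entropy(p: np.ndarray) -> float:
    """Generator phi(p) = sum_i p_i log p_i (negative Shannon entropy)."""
    return float(np.sum(p * np.log(p)))

def bregman_neg_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, where grad phi(q) = log q + 1."""
    grad_q = np.log(q) + 1.0
    return neg_entropy(p) - neg_entropy(q) - float(np.dot(grad_q, p - q))

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
print(bregman_neg_entropy(p, q), kl(p, q))  # equal up to floating-point rounding
```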

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.
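
A minimal sketch of "flowing along a score field", assuming a 1-D Gaussian target N(mu, sigma^2) whose score grad log p(x) = -(x - mu) / sigma^2 is known in closed form. A diffusion model would instead learn the score with a network across many noise levels; here unadjusted Langevin dynamics simply starts from broad noise and repeatedly steps along the score with a small injected noise term. Function names and step settings are illustrative assumptions.

```python
import numpy as np

def score(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Score of N(mu, sigma^2): the gradient of the log-density."""
    return -(x - mu) / sigma**2

def langevin_sample(n: int, mu: float, sigma: float, step: float = 0.01,
                    num_steps: int = 2000, seed: int = 0) -> np.ndarray:
    """Unadjusted Langevin: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=10.0, size=n)  # start far from the target (broad noise)
    for _ in range(num_steps):
        x = x + step * score(x, mu, sigma) + np.sqrt(2 * step) * rng.normal(size=n)
    return x

samples = langevin_sample(n=10_000, mu=3.0, sigma=0.5)
print(samples.mean(), samples.std())  # close to the target's 3.0 and 0.5
```

The finite step size introduces a small bias in the stationary spread; shrinking `step` (with more steps) reduces it.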

The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.

Reubencf

posted an update 4 days ago

Millions speak Konkani. The internet barely knows it.

Today's major LLMs struggle with regional languages. They can't read, write or even recognize Konkani. So I built one that can.

Here is a working demo of the Konkani LLM I've been training. 👇

https://youtu.be/8K04ylbXh6k

danielhanchen

posted an update 4 days ago

Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.

It offers 4x faster text generation, thinking, image and video support, and 256K context. Run and train it via Unsloth Studio.
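
A rough back-of-envelope for how a 26B-total-parameter model can fit in 18GB: weight memory is about parameters x bits per weight / 8. The post doesn't say which quantization the GGUF uses, so the bit widths below are assumptions, and the estimate ignores KV cache and runtime overhead. In an "A4B" mixture-of-experts model only about 4B parameters are active per token, but all 26B weights typically still need to be held in memory.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); ignores KV cache and overhead."""
    return num_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 26e9  # "26B" total parameters, from the model name

for bits in (16, 8, 4.5):  # assumed precisions: BF16, 8-bit, roughly 4.5-bit quant
    print(f"{bits:>4} bits/weight -> {weight_memory_gb(TOTAL_PARAMS, bits):5.1f} GB")
```

Only the roughly 4-to-5-bit case lands under 18GB, which suggests an aggressive quantization is what makes the local run possible.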

GGUF: unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma

ovi054

posted an update about 14 hours ago

Color Grade Transfer LoRA ⚡

ovi054/QIE-2511-Color-Grade-Transfer-LoRA

I trained a LoRA that transfers the color grade from a target image directly onto a source image. No manual color grading needed. It is fine-tuned on the Qwen Image Edit 2511 model.

👉 Try it now:
build-small-hackathon/Color-Grade-Transfer
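
The LoRA's internals aren't described beyond its base model, so what follows is not its method. For comparison, here is a classical, non-learned baseline for the same task: Reinhard-style statistics transfer, which shifts and rescales each channel of the source to match the target's per-channel mean and standard deviation. Reinhard et al. work in the lαβ color space; this sketch uses RGB float arrays directly as a simplifying assumption, and the function name is hypothetical.

```python
import numpy as np

def transfer_color_stats(source: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Give `source` the per-channel mean and std of `target`.

    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1].
    """
    src_mean = source.mean(axis=(0, 1))
    src_std = source.std(axis=(0, 1))
    tgt_mean = target.mean(axis=(0, 1))
    tgt_std = target.std(axis=(0, 1))
    # Standardize each source channel, then rescale to the target's statistics.
    out = (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
source = rng.uniform(0.2, 0.6, size=(64, 64, 3))  # stand-in for the image to regrade
target = rng.uniform(0.3, 0.9, size=(64, 64, 3))  # stand-in for the reference grade
result = transfer_color_stats(source, target)
print(result.mean(axis=(0, 1)), target.mean(axis=(0, 1)))
```

Global statistics matching like this can't reproduce local or tone-dependent looks (e.g. teal shadows with warm highlights), which is where a learned image-edit model has room to do better.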

DavidAU

posted an update 1 day ago

Going Old School "FULL CREATIVE" with "New Thinking":

MN-GRAND-23.5B-Gutenberg-UNCENSORED-V2-GLM4.7-Thinking

The strongest, most creative (and uncensored) model, made up of three top Mistral Nemo fine-tunes franken-merged into an 81-layer model, then trained via Unsloth on a GLM 4.7 Flash thinking/reasoning dataset.

It features a hybrid thinking/instruct structure and has been updated with a modern Jinja template. Tuning has stabilized the "franken-merge" into a class 1 model that operates perfectly.

The talents of some of the best tuners merged into one giant model.
Includes several examples and detailed instructions.

And this model is very smart too.

NEO Imatrix GGUFS:
DavidAU/MN-GRAND-23.5B-Gutenberg-UNCENSORED-V2-GLM4.7-Thinking-NEO-Imatrix-GGUF

Source / Full Precision:
DavidAU/MN-GRAND-23.5B-Gutenberg-UNCENSORED-V2-GLM4.7-Thinking

TravisMuhlestein

posted an update 1 day ago

A question we kept running into while operating AI agents in production: How do you write a unit test for something that never returns the same answer twice?

At GoDaddy, we built a system called Veritas to help detect prompt regressions and model migration drift before changes reach production.

The core idea is simple:
Exact-match testing breaks down for LLMs.

What matters is whether the agent preserved the same meaning and intent.

We ended up using embeddings + cosine similarity as the primary evaluation signal. Rather than asking:

"Did the model generate the same response?"
We ask: "Did the model mean the same thing?"
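
A minimal sketch of that check. The post doesn't publish Veritas's code, embedding model, or threshold, so `embed` here is a hypothetical stand-in (a toy bag-of-words count vector that captures word overlap, not meaning, used only so the example runs) and the 0.8 threshold is an arbitrary assumption. In practice you would swap in a real sentence-embedding model and tune the threshold on known-good outputs.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_regression(baseline: str, candidate: str, threshold: float = 0.8) -> bool:
    """Flag a regression when the new output drifts too far from the baseline."""
    return cosine_similarity(embed(baseline), embed(candidate)) < threshold

baseline = "your domain renewal is scheduled for june 1"
candidate = "your domain renewal is scheduled for june 1 ."
drifted = "please contact support to cancel your account"
print(is_regression(baseline, candidate), is_regression(baseline, drifted))  # False True
```

The design choice is the same one the post describes: the test asserts on a similarity score crossing a threshold rather than on an exact string match.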

One of the more interesting findings was how often seemingly harmless prompt edits changed downstream behavior in ways that were difficult for human reviewers to catch.

Prompts aren't documentation.
Prompts are code.

Curious what others are using today for regression testing:

• LLM-as-judge?
• Embedding similarity?
• Human review?
• Custom eval frameworks?

https://www.godaddy.com/resources/news/veritas-catching-silent-ai-regressions-before-they-ship

Would love to compare approaches.