AI News – Bai Scope

OCRmyPDF Tutorial: Convert Scanned Documents into Searchable PDF/A Files with Sidecar Text Extraction and Batch Processing

AI NewsJune 30, 20264Views 0Likes 0Comments

def _purge(*prefixes): for name in [m for m in list(sys.modules) if any(m == p or m.startswith(p + ".") for p in prefixes)]: del sys.modules[name] def _load_ocrmypdf(): _purge("PIL", "ocrmypdf") import…

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

AI NewsJune 25, 20263Views 0Likes 0Comments

Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that matches. The model reads PDFs and images directly, then decodes against your schema. This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra,…

Zyphra Release Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude

AI NewsJune 15, 202613Views 0Likes 0Comments

Zyphra has released Zamba2-VL, a family of open vision-language models. The release covers three sizes: 1.2B, 2.7B, and 7B parameters. Each model is built on the Zamba2 hybrid SSM–Transformer backbone. Vision-language models (VLMs) read images and text together. They answer questions about charts, documents, and photos. Most open VLMs use a dense Transformer as…

A Hands-On Coding Tutorial on Qualcomm AI Hub Models for Classification, Object Detection, and Hardware-Aware Deployment

AI NewsJune 5, 202617Views 0Likes 0Comments

In this tutorial, we work through an end-to-end workflow for Qualcomm AI Hub Models. We start by setting up the required package, discovering the available model collection, and loading MobileNet-V2 for local PyTorch inference. We also handle an important input-shape issue by converting NHWC image tensors into the NCHW format expected by the model. From…

Liquid AI Releases LFM2.5-VL-450M: a 450M-Parameter Vision-Language Model with Bounding Box Prediction, Multilingual Support, and Sub-250ms Edge Inference

AI NewsMay 21, 202618Views 0Likes 0Comments

Liquid AI just released LFM2.5-VL-450M, an updated version of its earlier LFM2-VL-450M vision-language model. The new release introduces bounding box prediction, improved instruction following, expanded multilingual understanding, and function calling support — all within a 450M-parameter footprint designed to run directly on edge hardware ranging from embedded AI modules like NVIDIA Jetson Orin, to mini-PC…

Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation

AI NewsMay 16, 20269Views 0Likes 0Comments

For years, the computer vision community has operated on two separate tracks: generative models (which produce images) and discriminative models (which understand them). The assumption was straightforward — models good at making pictures aren’t necessarily good at reading them. A new paper from Google, titled “Image Generators are Generalist Vision Learners” (arXiv:2604.20329), published April 22,…

Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo

AI NewsMay 11, 202614Views 0Likes 0Comments

If you’ve ever watched a motion capture system struggle with a person’s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects, they come with articulated structure, fine surface details, and enormous variation in pose, clothing, lighting, and ethnicity.…

How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control

AI NewsMay 6, 202611Views 0Likes 0Comments

import random, numpy as np, torch, torch.nn as nn, torch.nn.functional as F import matplotlib.pyplot as plt from dataclasses import dataclass from typing import Tuple, Dict, List from torch.utils.data import Dataset, DataLoader try: from tqdm.auto import tqdm except Exception: def tqdm(x, **kwargs): return x SEED = 7 random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED) if device.type == "cuda": torch.backends.cudnn.benchmark = True @dataclass class WorldConfig: …

Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes

AI NewsMay 1, 202620Views 0Likes 0Comments

Video foundation models can paint a beautiful frame. They are still notoriously bad at remembering it. Push the camera through a corridor in Wan 2.1 or CogVideoX and walls warp, objects morph, and details vanish — the giveaway that these models are fitting 2D pixel correlations rather than simulating a coherent 3D scene. A team…

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

AI NewsApril 21, 202622Views 0Likes 0Comments

The open-source AI landscape has a new entry worth paying attention to. The Qwen team at Alibaba has released Qwen3.6-35B-A3B, the first open-weight model from the Qwen3.6 generation, and it is making a compelling argument that parameter efficiency matters far more than raw model size. With 35 billion total parameters but only 3 billion activated…