Introducing DecorLM

Interior design by conversation: a mood board, a floor plan, photoreal renders, and a shopping list of real furniture, in minutes. This post explains why that is technically hard and describes the system we built for it.

The problem behind the product

Interior design is how people express who they are and how they want to live — each design as distinct as the person it belongs to. Our lives are intimately connected to the spaces we inhabit: they are the backdrop for our memories, our hobbies, our time with the people we love. And yet making a space actually fit a life means aligning functional and visual expectations with the hard constraints of physical geometry, and that alignment is a professional skill. Which is why good interior design is, in practice, a luxury.

Even hiring a professional does not close the gap, because the gap is linguistic before it is financial. What an inhabitant truly wants and what they can convey in the language of professionals are different things, and the distance between them is where unsatisfactory results come from. People end up living in ill-fitting spaces, and the cost is not aesthetic — it is felt in physical and mental well-being, every day, in the place where they spend most of their lives. Designing a space that fits should not require learning to speak designer. It should be accessible to everyone.

Stated as a technical task, this is 3D indoor scene synthesis: given an unstructured, grammar-free description of preferences — “make it feel calmer, keep the bed, around £3,000” — deliver a 3D design solution that aligns with it. Hidden inside that one sentence are four distinct competences a system must have. It must interpret abstract input: “calmer” is not a parameter. It must know what belongs in a room beyond what was said — a bedroom needs a wardrobe whether or not the user mentions one, so the system needs the common sense to include high-frequency objects unprompted. It must reason about plausible spatial relationships drawn from an unrestricted vocabulary. And it must know real objects — their function, their dimensions, their style — because a design is ultimately a claim about specific things in specific places.

What the field has tried

Every serious approach to this task has hit a different wall, and the walls are instructive.

Learned priors. Generative models trained on datasets of expert-designed rooms — autoregressive transformers, diffusion models over layouts — produce diverse, realistic arrangements. But their performance is governed by the closed-set data they were trained on, which is inevitably a biased and incomplete sample of the world. They accept only structured text with predefined grammar, and they struggle to produce practical designs for real, unseen interiors — your oddly shaped room was not in the dataset, and your preferences do not fit the grammar.

Generate images, lift to 3D. A second family generates multi-view images or panoramas and lifts them into 3D via monocular depth estimation and stitching. These systems inherit the visual richness of image models but add no geometric discipline. They cannot integrate the constraints that make a design buildable, such as floor plans and walls; depth estimation injects uncertainty into the final mesh; and the loose semantic coupling across views produces rooms that are locally pretty and globally absurd. The canonical failure is multiple beds in one bedroom: each view was plausible, and nothing enforced that they describe the same room.

Rules and grammars. Classical methods encode placement know-how directly — hand-written rules, procedural grammars. They are precise, interpretable, and frozen: every design principle must be authored in advance, so they cannot absorb open-ended preference language, and they scale with the patience of whoever maintains the rulebook.

Ask a language model for the layout. The newest family hands the whole problem to an LLM and requests absolute coordinates for every object. For a handful of objects this works. At realistic density it collapses: published evaluations of direct coordinate prediction report roughly half to two-thirds of scenes containing furniture out of bounds — through walls, outside the floor plan — and single-step generation offers no interpretability about why anything is where it is.

The conclusion we draw from the landscape: language models are the only available generator with the world knowledge the task demands — they encode, from internet-scale data, what rooms are like, what objects do, and what “calmer” means. And retrieval from a product database, rather than asset generation, is the only path with direct real-world applicability — retrieved objects can be bought. But raw LLM output is not a design. Structured output without hard constraints, communication interfaces, and cross-checks is exciting at first glance and leaves much room for improvement. The interesting engineering is everything wrapped around the model.

Why this is hard

Here is the same conclusion, mechanism by mechanism — the specific places where the naive system breaks.

Coordinates demand bookkeeping models cannot do. Direct position prediction fails for a structural reason, not a prompting one. Each placement reshapes the feasible region of every remaining object, so emitting a layout token-by-token asks the model to do cumulative geometric bookkeeping across its own output — and attention over text is a poor ledger. Language models learned space from descriptions of space, and descriptions do not carry collision constraints.

So you ask for relations instead. Design intent is genuinely relational — the rug goes under the coffee table, the sofa faces the television — so the natural intermediate representation is a scene graph: objects as nodes carrying dimensions, rotation, and the clearance their children will need; edges drawn from a deliberately closed vocabulary of spatial prepositions — on, left of, right of, in front, behind, under, in the corner — plus an adjacency flag. The vocabulary is closed on purpose. Open spatial language invites the model to invent relations that sound right and resolve to nothing.
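A closed vocabulary is straightforward to enforce in code. The sketch below is a minimal illustration, not DecorLM's actual schema: the names `Relation`, `Node`, `Edge`, and `parse_edge`, the field names, and the choice of metres are all assumptions made for the example. Only the seven prepositions and the adjacency flag come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The closed set of spatial prepositions listed in the text."""
    ON = "on"
    LEFT_OF = "left of"
    RIGHT_OF = "right of"
    IN_FRONT = "in front"
    BEHIND = "behind"
    UNDER = "under"
    IN_THE_CORNER = "in the corner"


@dataclass
class Node:
    """One object in the scene graph (field names are hypothetical)."""
    name: str
    size_m: tuple[float, float, float]  # width, depth, height; metres assumed
    rotation_deg: float = 0.0
    child_clearance_m: float = 0.0      # space reserved for its children


@dataclass
class Edge:
    """child --relation--> parent, plus the adjacency flag."""
    child: str
    relation: Relation
    parent: str
    adjacent: bool = False


def parse_edge(child: str, relation_text: str, parent: str,
               adjacent: bool = False) -> Edge:
    # Relation(...) raises ValueError for any string outside the vocabulary,
    # so an invented relation is rejected instead of silently accepted.
    return Edge(child, Relation(relation_text), parent, adjacent)
```

Under this sketch, `parse_edge("rug", "under", "coffee table")` yields a valid edge, while an invented relation such as "nestled beside" raises `ValueError` before it can enter the graph.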

Then you discover the graph lies to you. A relational sketch can be locally plausible and globally impossible in at least four distinct ways. It can contain cycles — A behind B, B behind C, C behind A — which read fine sentence-by-sentence and make ordered placement impossible. It can contain physically absurd edges: an object placed left of something already against the west wall now lives outside the room; a floor lamp asked to support a bookcase violates gravity. It can be ambiguous about siblings — five ornaments “on the shelf” says nothing about their order, so something must impose one. And it can be quietly incompatible about size, with children whose combined footprint exceeds the surface of their parent. None of these can be fixed by asking the model to be more careful, because the model’s sizes and claims are themselves unreliable. They have to be caught by hard checks — computed predicates over the graph, boundary tests, adjacency feasibility, parent-child packing — that repair or reject edges before anything is placed.
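Two of the four failure modes above reduce to short computed predicates. The following is a minimal sketch under stated assumptions, not the checks DecorLM runs: `has_cycle` and `overpacked` are hypothetical names, edges are simplified to (child, parent) pairs, and footprints are axis-aligned rectangles. The area comparison is only a necessary condition for packing, so it catches the obvious violations and nothing more.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(edges):
    """Return True if the (child, parent) edges contain a cycle.

    Each child depends on its parent being placed first, so a cycle
    means no valid placement order exists.
    """
    deps = {}
    for child, parent in edges:
        deps.setdefault(child, set()).add(parent)
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError on a cycle
        return False
    except CycleError:
        return True


def overpacked(parent_top, child_footprints):
    """Return True if the children cannot possibly fit on the parent.

    parent_top and each footprint are (width, depth) pairs in the same
    unit. Comparing total area is necessary but not sufficient: passing
    this test does not prove that a packing exists.
    """
    parent_w, parent_d = parent_top
    children_area = sum(w * d for w, d in child_footprints)
    return children_area > parent_w * parent_d
```

For example, `has_cycle([("A", "B"), ("B", "C"), ("C", "A")])` flags the A-behind-B-behind-C-behind-A case, and seven 0.2 m by 0.2 m ornaments fail `overpacked` on a 0.8 m by 0.3 m shelf while five pass.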

A valid graph still has to be solved. Relations only become a room through search. Sort the graph topologically by depth — distance from the room’s fixed elements — and place layer by layer: walls and floor first, then what touches them, then what sits on those. Each preposition defines a feasible region; sample a position from it, check collisions, continue. When a feasible region comes back empty, the only honest move is to backtrack — tear out the layer above and re-place it — because the failure was caused upstream. This search is randomized, and in a small room with many objects it can fail to terminate at all: configurations that are feasible in principle but vanishingly unlikely to be sampled. Box approximations add their own artifacts — objects hovering above a curved surface their bounding box claims to touch. Placement, in other words, is an algorithms problem wearing a design problem’s clothes.
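The search described above can be sketched in a few dozen lines. This is a deliberately simplified illustration, not DecorLM's solver: every object is treated as a floor-standing, axis-aligned rectangle, the feasible region for each object is assumed to be the whole room rather than a region derived from its preposition, and the names `solve`, `place_layer`, `tries`, and `max_steps` are hypothetical. The step budget is there because, as noted, the unbounded search may not terminate.

```python
import random


def overlaps(a, b):
    """Axis-aligned overlap test for boxes given as (x, y, width, depth)."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def merge(layers_done):
    """Flatten a list of per-layer {name: box} dicts into one dict."""
    return {name: box for layer in layers_done for name, box in layer.items()}


def place_layer(layer, placed, room, rng, tries=200):
    """Place every (name, width, depth) in one layer, or return None."""
    room_w, room_d = room
    new = {}
    for name, w, d in layer:
        if w > room_w or d > room_d:
            return None  # feasible region is empty: the object cannot fit
        for _ in range(tries):
            box = (rng.uniform(0, room_w - w), rng.uniform(0, room_d - d), w, d)
            taken = list(placed.values()) + list(new.values())
            if not any(overlaps(box, other) for other in taken):
                new[name] = box
                break
        else:
            return None  # sampling budget exhausted for this object
    return new


def solve(layers, room, seed=0, max_steps=1000):
    """Layer-by-layer randomized placement with backtracking.

    layers is ordered by graph depth. Returns {name: box}, or None once
    the step budget is spent.
    """
    rng = random.Random(seed)
    done = []  # one {name: box} dict per successfully placed layer
    steps = 0
    while len(done) < len(layers) and steps < max_steps:
        steps += 1
        result = place_layer(layers[len(done)], merge(done), room, rng)
        if result is not None:
            done.append(result)
        elif done:
            done.pop()  # backtrack: tear out the previous layer, re-place it
    return merge(done) if len(done) == len(layers) else None
```

A call such as `solve([[("bed", 2.0, 1.6)], [("desk", 1.2, 0.6), ("chair", 0.5, 0.5)]], room=(4.0, 3.5))` returns one box per object, while a 5 m sofa in the same room returns `None` instead of looping forever.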

The model also runs out of room in its own head. Output budgets are finite, and long structured outputs degrade before they truncate: as a response grows, attention to the original brief decays, suggestions begin to repeat, and the model loses track of what it already placed. Past a modest scene size you cannot ask one model instance to select the furniture, define the relations, and emit validated structure in one breath. The work has to be decomposed — selection, spatial relations, schema, correction, refinement — each pass small enough to stay sharp. Decomposition rescues quality and introduces its own failure surface: every interface between passes is a place where intent can be dropped.

Evaluation has no ground truth. When a layout is wrong geometrically, you can measure it — out-of-bounds rates, collision losses. When it is wrong aesthetically, you are scoring taste. Vision-language judges correlate tolerably with human preference, but a single score conflates every failure mode — was it the graph, the assets, the palette, the camera angle? The pattern in published results is telling: systems that eliminate geometric errors entirely still earn middling ratings on atmosphere and colour scheme. Functional correctness is the solvable half of design. The judged half — does this room feel like what the person meant — stays genuinely hard, which is why our pipeline checks what can be checked with hard metrics and reserves judgment calls for points where a human approves.
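The measurable half is easy to state precisely. As a minimal sketch, assuming a rectangular room and axis-aligned boxes given as (x, y, width, depth), an out-of-bounds rate can be computed as the fraction of scenes with at least one offending object. The function name and the box format are assumptions for illustration; published evaluations may define the metric differently.

```python
def out_of_bounds_rate(scenes, room):
    """Fraction of scenes containing at least one box outside the room.

    scenes is a list of scenes, each a list of (x, y, width, depth) boxes.
    room is (width, depth); its corner is assumed to sit at the origin.
    """
    room_w, room_d = room

    def inside(box):
        x, y, w, d = box
        return x >= 0 and y >= 0 and x + w <= room_w and y + d <= room_d

    if not scenes:
        return 0.0
    bad = sum(1 for scene in scenes if not all(inside(b) for b in scene))
    return bad / len(scenes)
```

No comparable function exists for atmosphere or colour scheme, which is the asymmetry the paragraph above describes.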

And then, the asset gap. Everything above can go right and the room still falls apart at the last step, because the layout was computed over idealised boxes and the scene must be filled with actual 3D assets. Retrieval by joint text-shape embeddings is lossy: the nearest neighbour to “laptop” may be open when the planner assumed closed (two objects with the same name and entirely different proportions), and rescaling the asset to the predicted box distorts it. Canonical orientations disagree (which side of a desk is the front is a judgment call the database did not make for you), and retrieved textures ignore your palette. Generating assets instead of retrieving them solves none of this and adds the fatal flaw: generated furniture cannot be bought. That brings us back to the constraint our users set: the output must be a shopping list.

The solution

Those are the problems any system in this space must answer. We made three commitments that change the shape of the answers.

1. Start from the real room. A brief about an abstract rectangle produces a design for an abstract rectangle. DecorLM reconstructs your room first: a LiDAR scan on iOS, ordinary photos or video lifted to multi-view point clouds (VGGT with SAM-3 segmentation), a single panorama, or a floor plan you draw. The point cloud then goes through semantic floor-plan extraction in the SpatialLM family. That step tends to lose thin structures (doors, windows, openings), which is exactly the failure our first paper, Coarse Semantic Injection, attacks. A layout that blocks a door is not a design, so the layout planner scores every archetype against circulation clearance and door and window conflicts before anything is placed.

2. A plan, not a prompt. The whole engagement runs as a durable workflow — four tasks, twenty-four nodes — where each step produces a checked artifact the next step consumes: consultation, room capture, style discovery, internal layout planning, product sourcing, a concept board the user approves in chat, then 2D layout generation inside an explicit evaluation loop. That loop renders a top-down view each iteration and holds it against hard metrics — circulation above 30% of the floor, walkways wider than 60 cm, seating conversations at 2–3 m, television viewing angles under 30° — alongside an LLM judge, iterating until the score clears threshold. Free-form feedback routes upstream to style, layout, or sourcing, and the affected artifacts are rebuilt in the same conversation. Geometry is never asked for in one shot, and nothing reaches a render that has not passed its checks.

3. Real products as a first-class constraint. Sourcing searches a catalogue of purchasable furniture per layout zone — style match, dimensions that must fit the zone, budget filter, keep-items excluded — and persists a product schedule as the source of truth for everything downstream. The layout engine places the true dimensions of the actual product, which dissolves the scale-mismatch class of bugs: place what you will buy, and the render, the floor plan, and the FF&E sheet cannot disagree. Materials follow the same rule — image-to-PBR extraction (CHORD) so surfaces in the render behave like the finishes on the schedule.
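Two of the checks in the commitments above are concrete enough to sketch: the hard metrics of the evaluation loop and the per-zone sourcing filter. The thresholds (30%, 60 cm, 2–3 m, 30°) and the filter criteria come from the text; everything else is an assumption for illustration, including all class, field, and function names, the iteration cap `max_iters`, the rule that a layout must pass every hard metric and the judge, exact string equality as a stand-in for style match, and allowing a 90° rotation when testing fit.

```python
from dataclasses import dataclass


@dataclass
class LayoutMetrics:
    circulation_fraction: float  # share of floor area left free
    min_walkway_cm: float
    seating_distance_m: float
    tv_viewing_angle_deg: float


def hard_metric_failures(m):
    """Return the names of the hard metrics a layout fails (empty if none)."""
    failures = []
    if m.circulation_fraction <= 0.30:
        failures.append("circulation")
    if m.min_walkway_cm <= 60:
        failures.append("walkway")
    if not 2.0 <= m.seating_distance_m <= 3.0:
        failures.append("seating distance")
    if m.tv_viewing_angle_deg >= 30:
        failures.append("tv viewing angle")
    return failures


def evaluation_loop(propose, measure, judge, threshold, max_iters=10):
    """Iterate until a layout passes the hard metrics and the judge score."""
    for i in range(max_iters):
        layout = propose(i)
        if not hard_metric_failures(measure(layout)) and judge(layout) >= threshold:
            return layout
    return None  # budget spent without an acceptable layout


@dataclass(frozen=True)
class Product:
    sku: str
    category: str
    style: str
    width_cm: float
    depth_cm: float
    price: float


def fits(p, zone_w_cm, zone_d_cm):
    """True if the footprint fits the zone as-is or rotated by 90 degrees."""
    return ((p.width_cm <= zone_w_cm and p.depth_cm <= zone_d_cm)
            or (p.depth_cm <= zone_w_cm and p.width_cm <= zone_d_cm))


def source_zone(catalogue, zone_w_cm, zone_d_cm, style, budget, keep_categories):
    """Filter the catalogue for one zone; cheapest candidates first."""
    candidates = [
        p for p in catalogue
        if p.category not in keep_categories  # items the user keeps
        and p.style == style                  # simplified style match
        and p.price <= budget
        and fits(p, zone_w_cm, zone_d_cm)
    ]
    return sorted(candidates, key=lambda p: p.price)
```

Because `source_zone` returns real `Product` records with their true dimensions, the same records can feed the placement step, which is the point of treating the product schedule as the source of truth.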

What’s next

DecorLM is heading into an invite-only beta. The research that came out of building it lives in Research, and its place in the wider system — one specialist in a swarm of them — is the subject of AGI is Swarm Intelligence.

— Logarithms, London