Neural Shapes

STATUS: DRAFT

Did you know you can represent 3D shapes with a neural network?

Why?

The same reason it's useful to represent anything with a neural network. Sometimes it's hard to describe what something should be, but easy to say whether it is good or bad. The textbook example of this is identifying animals in images. It's hard to write the function (code) that takes in any image and outputs a dog or cat label. However, it's very easy for a person to say if a label is correct - e.g. "yes, that is a cat image". Neural networks let us represent any function with a fixed number of continuous parameters; think of them as dials you can twiddle. Using this representation, we can twiddle the dials with a process called Stochastic Gradient Descent (SGD) (think of a game of hotter-colder) until the network gets the answers right.
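To make the dial-twiddling concrete, here's a minimal SGD sketch in JAX. The tiny linear "network", the data, and the learning rate are all made up for illustration:

```python
import jax
import jax.numpy as jnp

# Toy "network": a single linear layer. params are the dials we twiddle.
def predict(params, x):
    w, b = params
    return x @ w + b

# Loss: how wrong are we? Lower means "hotter".
def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

# One SGD step: nudge every dial a little in the direction that reduces the loss.
@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))         # made-up inputs
y = jnp.sum(x, axis=1, keepdims=True)       # made-up targets
params = (jnp.zeros((3, 1)), jnp.zeros((1,)))
for _ in range(200):
    params = sgd_step(params, x, y)
```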

Similarly, for 3D shapes, sometimes we can say if a shape is good or bad, but we don't know exactly what it should look like. A fun example of this is those bridge building games. The player has to build a structure to allow a vehicle to cross a chasm. It's easy to check if the vehicle makes it, but knowing what the bridge should look like is the challenge. There are so many possible bridges that a brute-force search would take too long. Using a neural network lets us use the same dial-twiddling-hotter-colder approach. The technical name for this process of finding a shape is Topology Optimisation. Using a neural network to represent the shape was suggested by Hoyer et al. (2019).

How?

There are actually a lot of ways to represent parts using a fixed number of parameters.

Some early approaches used grey-scale images. There are a fixed number of pixels, and each value ranges continuously between 0 and 1. By tweaking the pixel values you can draw any shape. This works in 3D as well; you just use a 3D grid of voxels instead. This approach does have some drawbacks, such as pixelation: a diagonal line will have a staircase edge if you zoom in.
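As a rough sketch of that idea (the grid size and threshold here are arbitrary choices):

```python
import jax.numpy as jnp

# The "shape" is just a fixed grid of values between 0 and 1: the parameters.
density = jnp.full((32, 32, 32), 0.5)   # 32x32x32 voxels, all half-filled to start

# To read the shape back out, threshold the values: >= 0.5 counts as solid.
solid = density >= 0.5

# Twiddling any one of the 32*32*32 = 32768 numbers changes the shape,
# which is exactly what SGD needs: a fixed number of continuous dials.
```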

Gaussian Splatting is a popular technique at the moment. 3D scenes are reconstructed by positioning (splatting) blobs of a mathematically defined shape (Gaussians). It's sort of like 3D pointillism art, but the size of the blobs is varied. Splatting meets our need for a fixed number of continuous parameters. There are a fixed number of splats, each with position, rotation, size, colour, transparency, and other values. These are twiddled until the scene matches reference photos.
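Roughly, the parameter block for a set of splats looks something like the following. The field names, sizes, and splat count are illustrative, not the exact layout of any particular implementation:

```python
import jax.numpy as jnp

n_splats = 10_000  # fixed number of blobs

# One row of continuous values per splat; all of these get twiddled during fitting.
positions = jnp.zeros((n_splats, 3))   # where the blob sits (x, y, z)
rotations = jnp.zeros((n_splats, 4))   # orientation as a quaternion
scales    = jnp.ones((n_splats, 3))    # size along each axis
colours   = jnp.ones((n_splats, 3))    # RGB
opacities = jnp.ones((n_splats, 1))    # transparency
```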

However, the technique this post focuses on is called a neural representation. We create a network that takes a position (x, y, z) (3 values) as input and outputs a single number: the shortest distance from the input point to the surface of the shape. For a point on the surface of our 3D shape, the output should be 0. For a point 2 cm outside the shape, the output would be -2; points inside the shape get positive values. The network is predicting the signed distance field. By checking across a grid of points we can work out the boundary of the shape.
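Here's a small sketch of such a network in JAX. The layer sizes, activation, and grid resolution are arbitrary choices for illustration, not taken from any particular paper:

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(3, 64, 64, 1)):
    """Random weights for a small MLP: 3 inputs (x, y, z) -> 1 output (signed distance)."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def signed_distance(params, point):
    """Predicted shortest distance from `point` to the surface (0 on the surface)."""
    h = point
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]

params = init_mlp(jax.random.PRNGKey(0))

# Evaluate over a grid of points; the surface is wherever the prediction crosses zero.
xs = jnp.linspace(-1.0, 1.0, 16)
grid = jnp.stack(jnp.meshgrid(xs, xs, xs, indexing="ij"), axis=-1).reshape(-1, 3)
distances = jax.vmap(lambda p: signed_distance(params, p))(grid)
inside = distances > 0  # using the sign convention above: positive inside the shape
```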

What?

So what does that get us? Let's go back to how we'd solve the bridge game. We know that we need to connect the two sides of the chasm, so we can check that the points directly between the edges are above zero (i.e. inside the material). We also know that the bridge must be strong enough to support the weight of the vehicle. This is a little more complicated. We can do a stress analysis (using Finite Element Analysis, or FEA) to check that the stress is low enough everywhere in our part. There is a lot of software available to do this sort of analysis. However, there is a problem. We not only need the analysis to give us the value (what all current software does), but also the hotter-colder signal (gradients). For simple cases (linear-elastic, static) it's not too hard to implement the finite element analysis using a machine learning library like JAX so that we also get the gradients.
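As a toy illustration (not real FEA), here is a 1D bar split into elements, with the cross-section areas as the design variables. Everything about it is deliberately simplified, but it shows how writing the analysis in JAX gives back both the value and the hotter-colder signal:

```python
import jax
import jax.numpy as jnp

# Toy 1D bar "FEA": the bar is fixed at one end, split into n elements,
# and each element has its own cross-section area (the design variables).
E, L, n = 200e9, 0.1, 8               # Young's modulus, element length, element count
loads = jnp.zeros(n).at[-1].set(1e4)  # 10 kN pulling on the free-end node

def max_stress(areas):
    k = E * areas / L                                   # element stiffnesses
    # Assemble the stiffness matrix for the free nodes of a bar fixed at node 0.
    K = (jnp.diag(k + jnp.append(k[1:], 0.0))
         - jnp.diag(k[1:], 1) - jnp.diag(k[1:], -1))
    u = jnp.linalg.solve(K, loads)                      # nodal displacements
    strain = jnp.diff(jnp.concatenate([jnp.zeros(1), u])) / L
    return jnp.max(jnp.abs(E * strain))                 # worst stress in the bar

areas = jnp.full(n, 1e-4)                    # 1 cm^2 everywhere to start
stress = max_stress(areas)                   # the value: is the part strong enough?
hotter_colder = jax.grad(max_stress)(areas)  # the gradients: how each area changes it
```

Because the whole analysis is written with JAX operations, jax.grad returns the sensitivity of the worst stress to every design variable at once, which is exactly the signal SGD needs to twiddle the dials.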

So?

What does this all mean? Large Language Models are the poster child for a new way of working. Vibe-coding is still controversial, but as the models improve, more and more software developers report getting more done using them. This is Karpathy's "Software 2.0", where developers are not limited by what they can make, but by what they can verify. Other areas of engineering, like civil and mechanical, are nowhere near the tipping point we see in software. The earlier image-based approaches I've mentioned have been around since the 1970's. There is a whole host of reasons why engineers still tend to do the design themselves. Briefly, these include being locked into proprietary tools like CAD (Computer Aided Design) systems, stress analysis being more expensive (computationally) than checking the label of an image (cat? true/false), and the fact that the 3D shape isn't the end of the story (the engineer still has to figure out how to manufacture it). I think these fields will be slow to catch up, but I'm optimistic we'll see more tools that let engineers describe their objectives and use algorithms to search for solutions.