We present an infinite family of protocols to distill magic states for $T$-gates that has a low space overhead and uses an asymptotic number of input magic states to achieve a given target error that is conjectured to be optimal. The space overhead, defined as the ratio between the physical qubits to the number of output magic states, is asymptotically constant, while both the number of input magic states used per output state and the $T$-gate depth of the circuit scale linearly in the logarithm of the target error $\delta$ (up to $\log \log 1/\delta$). Unlike other distillation protocols, this protocol achieves this performance without concatenation and the input magic states are injected at various steps in the circuit rather than all at the start of the circuit. The protocol can be modified to distill magic states for other gates at the third level of the Clifford hierarchy, with the same asymptotic performance. The protocol relies on the construction of weakly self-dual CSS codes with many logical qubits and large distance, allowing us to implement control-SWAPs on multiple qubits. We call this code the "inner code". The control-SWAPs are then used to measure properties of the magic state and detect errors, using another code that we call the "outer code". Alternatively, we use weakly-self dual CSS codes which implement controlled Hadamards for the inner code, reducing circuit depth. We present several specific small examples of this protocol.