If a TM receives an input of size X, does that necessarily mean it ran X steps (for example, while the input was being written on the tape), or do we only count the running time after the input has already been written on the tape (in some magical way)?

No, writing the input does not cost any steps. The model we defined goes as follows: at step 0, the entire x is already written on the tape, and the rest of the tape is blank.
You can choose to do whatever you wish, including not reading any bit of x, but note that you need at least |x| steps to read the entire x.
Specifically, observe that our transition function is not fed the input symbol by symbol (as in a DFA or PDA); it depends on the current state and the content of the cell under the head. We can disregard that content if we want.
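
To make the step counting concrete, here is a minimal sketch of a single-tape simulator in Python. It is only an illustration under assumptions of mine, not the formal definition from the course: the names `run`, `ignore`, and `scan`, the blank symbol `_`, and the dictionary encoding of the transition function are all hypothetical. The input is placed on the tape before the step counter starts, so writing it costs nothing.

```python
BLANK = "_"  # assumed blank symbol, chosen for this sketch


def run(delta, start, halting, x, max_steps=10_000):
    """Run a single-tape TM on input x and return (final_state, steps)."""
    # Step 0: x is already on the tape and every other cell is blank.
    tape = {i: ch for i, ch in enumerate(x)}
    state, head, steps = start, 0, 0
    while state not in halting and steps < max_steps:
        symbol = tape.get(head, BLANK)
        # The transition depends only on (state, symbol under the head).
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return state, steps


# Machine A: halts after one step, whatever symbol is under the head.
ignore = {("q0", s): ("acc", s, "R") for s in "01" + BLANK}

# Machine B: moves right over the whole input, halts on the first blank.
scan = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", BLANK): ("acc", BLANK, "R"),
}

x = "10110"
steps_ignore = run(ignore, "q0", {"acc"}, x)[1]
steps_scan = run(scan, "q0", {"acc"}, x)[1]
print(steps_ignore, steps_scan)
```

In this sketch, machine A stops after a single step no matter how long x is, while machine B needs |x| + 1 steps (one per input symbol, plus one to see the blank that marks the end), which matches the "at least |x| steps to read the entire x" remark above.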