for memcpy, memmove, etc
some design choices:
some silly and not so silly challenges:
[moved in description]
the instruction could be restricted to sizes known statically
Why? That would reduce its utility by a lot and I don't think it's actually necessary for the implementation.
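Another note is that the differences between memcpy and memmove will have design consequences: memmove has to give the right result when the source and destination ranges overlap, while memcpy leaves the overlapping case undefined in C.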
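
To make the memcpy/memmove difference concrete, here is a minimal C sketch of the overlap handling a memmove-style operation needs. The name `copy_overlap_safe` is hypothetical, and this is not a proposed semantics for the instruction, only an illustration of why the copy direction matters once the two ranges can overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any proposal): a byte copy that stays
 * correct when [src, src+n) and [dst, dst+n) overlap, like memmove. */
static void *copy_overlap_safe(void *dst, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (d == s || n == 0) {
        return dst;
    }

    /* Compare addresses as integers, since relational comparison of
     * unrelated pointers is not defined in C. */
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below the source: copy forward, so each
         * source byte is read before it can be overwritten. */
        for (size_t i = 0; i < n; i++) {
            d[i] = s[i];
        }
    } else {
        /* Destination starts above the source: copy backward for the
         * same reason. */
        for (size_t i = n; i > 0; i--) {
            d[i - 1] = s[i - 1];
        }
    }
    return dst;
}
```

A memcpy-style operation can skip the direction check entirely, which is the kind of difference that would show up in the instruction's definition. Note also that `n` is an ordinary runtime value here; nothing in the overlap handling depends on the size being known statically.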