1. add docker-npu (Dockerfile and docker-compose.yml) 2. move cuda docker to docker-cuda and tiny changes to adapt to the new path