百科问答小站 logo
百科问答小站 font logo



使用pytorch时,训练集数据太多达到上千万张,Dataloader加载很慢怎么办? 第1页

  

user avatar   fang-niu-wa-28-17 网友的相关建议: 
      

下面是我见到过的写得最优雅的,预加载的dataloader迭代方式可以参考下:

使用方法就和普通dataloder一样 for xxx in trainloader .

主要思想就两点 , 第一重载 _iter 和 next_ ,第二点多线程异步Queue加载

       import numbers import os import queue as Queue import threading  import mxnet as mx import numpy as np import torch from torch.utils.data import DataLoader, Dataset from torchvision import transforms   class BackgroundGenerator(threading.Thread):     def __init__(self, generator, local_rank, max_prefetch=6):         super(BackgroundGenerator, self).__init__()         self.queue = Queue.Queue(max_prefetch)         self.generator = generator         self.local_rank = local_rank         self.daemon = True         self.start()      def run(self):         torch.cuda.set_device(self.local_rank)         for item in self.generator:             self.queue.put(item)         self.queue.put(None)      def next(self):         next_item = self.queue.get()         if next_item is None:             raise StopIteration         return next_item      def __next__(self):         return self.next()      def __iter__(self):         return self   class DataLoaderX(DataLoader):     def __init__(self, local_rank, **kwargs):         super(DataLoaderX, self).__init__(**kwargs)         self.stream = torch.cuda.Stream(local_rank)         self.local_rank = local_rank      def __iter__(self):         self.iter = super(DataLoaderX, self).__iter__()         self.iter = BackgroundGenerator(self.iter, self.local_rank)         self.preload()         return self      def preload(self):         self.batch = next(self.iter, None)         if self.batch is None:             return None         with torch.cuda.stream(self.stream):             for k in range(len(self.batch)):                 self.batch[k] = self.batch[k].to(device=self.local_rank,                                                  non_blocking=True)      def __next__(self):         torch.cuda.current_stream().wait_stream(self.stream)         batch = self.batch         if batch is None:             raise StopIteration         self.preload()         return batch   class MXFaceDataset(Dataset):     def __init__(self, root_dir, local_rank):         super(MXFaceDataset, self).__init__()         self.transform = transforms.Compose(             [transforms.ToPILImage(),              transforms.RandomHorizontalFlip(),              transforms.ToTensor(),              transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),              ])         self.root_dir = root_dir         self.local_rank = local_rank         path_imgrec = os.path.join(root_dir, 'train.rec')         path_imgidx = os.path.join(root_dir, 'train.idx')         self.imgrec = mx.recordio.MXIndexedRecordIO(path_imgidx, path_imgrec, 'r')         s = self.imgrec.read_idx(0)         header, _ = mx.recordio.unpack(s)         if header.flag > 0:             self.header0 = (int(header.label[0]), int(header.label[1]))             self.imgidx = np.array(range(1, int(header.label[0])))         else:             self.imgidx = np.array(list(self.imgrec.keys))      def __getitem__(self, index):         idx = self.imgidx[index]         s = self.imgrec.read_idx(idx)         header, img = mx.recordio.unpack(s)         label = header.label         if not isinstance(label, numbers.Number):             label = label[0]         label = torch.tensor(label, dtype=torch.long)         sample = mx.image.imdecode(img).asnumpy()         if self.transform is not None:             sample = self.transform(sample)         return sample, label      def __len__(self):         return len(self.imgidx)     




  

相关话题

  计算商品embedding然后平均得到用户embedding,会不会存在这种问题? 
  如何评价FAIR提出的ConvNeXt:CNN匹敌Swin Transformer? 
  Batch Normalization 训练的时候为什么不使用 moving statistics? 
  机器学习领域是否已经达到饱和? 
  领域自适应需要用到测试集数据,这样的方法有啥意义呢? 
  深度学习应用在哪些领域让你觉得「我去,这也能行!」? 
  把某人的 DNA 序列作为输入,正面照片作为输出,丢到深度神经网络里面学习,可行吗? 
  阿里的TDM树深度模型为什么很少有人用,是有哪些问题吗? 
  机器学习的理论方向 PhD 是否真的会接触那么多现代数学(黎曼几何、代数拓扑之类)? 
  从应用的角度来看,深度学习怎样快速入门? 

前一个讨论
如何看待上海市科委、中科院上海有机所和观视频联合制作的科普微电影《无处不在的手性之有机师姐》?
下一个讨论
表哥说机械比计算机经管都好,如何看待他的言论?





© 2025-02-27 - tinynew.org. All Rights Reserved.
© 2025-02-27 - tinynew.org. 保留所有权利