Inputting Image Data into TensorFlow for Unsupervised Deep Learning

Everyone is looking to TensorFlow to begin their deep learning journey. One issue that quickly arises for aspiring deep learners is that it is unclear how to use their own datasets. Tutorials go into great detail about network topology and training (rightly so), but most begin with, and never stop using, the MNIST dataset. Even new models people build with TensorFlow, like this Variational Autoencoder, remain fixated on MNIST. While MNIST is interesting, people new to the field usually have their sights set on more interesting problems with more exciting data (e.g. Learning Clothing Styles).

To help people more rapidly leverage their own data and the wealth of unsupervised models being created with TensorFlow, I developed a solution that (1) translates image datasets into files structured like the MNIST datasets (github repo) and (2) loads these datasets for use in new models.

To solve the first part, I modified existing solutions that demonstrate how to decode the MNIST binary files into csv, and added the option of saving the data as images in a directory (which also worked well for testing the decode/encode round trip):

import numpy as np
# import cv2
from PIL import Image
import gzip
import os
import struct

def _read32(bytestream):
  # MNIST headers store big-endian 32-bit unsigned integers
  dt = np.dtype(np.uint32).newbyteorder('>')
  return np.frombuffer(bytestream.read(4), dtype=dt)

def _read8(bytestream):
  # Pixel values are single unsigned bytes
  dt = np.dtype(np.uint8)
  return np.frombuffer(bytestream.read(1), dtype=dt)

def decode(imgf, outf, n, imgs=False):
  """
  Given a binary file, convert information into a csv or a directory of images

  If imgs is true, outf is a directory that will hold images
  directory MUST exist before this
  If imgs is false, outf will be a csv file
  """
  with gzip.open(imgf, 'rb') as bytestream:

    if imgs:
      savedirectory = outf
    else:
      o = open(outf, "w")

    # Read the big-endian header: magic number, image count, and dimensions
    magic_num = _read32(bytestream)[0]
    num_images = _read32(bytestream)[0]
    num_rows = _read32(bytestream)[0]
    num_cols = _read32(bytestream)[0]

    images = []

    # Read n images (capped at the number available in the file)
    for i in range(min(n, num_images)):
      image = []
      for j in range(num_rows * num_cols):
        image.append(_read8(bytestream)[0])
      images.append(image)

    if imgs:
      for j,image in enumerate(images):
        saveimage = np.array(image).reshape((num_rows,num_cols))
        result = Image.fromarray(saveimage.astype(np.uint8))
        result.save(outf + str(j) + '.png')
        # cv2.imwrite(outf + str(j) + '.png', np.array(image).reshape((num_rows,num_cols)))
    else:
      for image in images:
        o.write(",".join(str(pix) for pix in image)+"\n")
      o.close()
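
To sanity check the round trip, the decoder can be run directly on the standard MNIST training images. A minimal usage sketch (the output paths are placeholders; the directory for the png version must already exist):

# Write the first 100 training images as pngs into an existing directory
decode('train-images-idx3-ubyte.gz', 'mnist_pngs/', 100, imgs=True)

# Or dump them to a csv, one row of num_rows * num_cols pixel values per image
decode('train-images-idx3-ubyte.gz', 'mnist_train.csv', 100, imgs=False)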

I then reversed the process, paying close attention to how these files were originally constructed (here, at the end) and encoding the information as big endian:

def encode(imgd, imgf, ext='.png'):
  """
  Given a directory of black and white images (possibly stored with 3
  channels; they are converted to single-channel grayscale), pack them into
  a gzipped binary file with the same layout as the standard MNIST image
  files.

  imgd = the image directory; must include the trailing '/' in the string
  imgf = the filename for the output binary file
  ext  = the extension of the images to include
  """
  fs = [imgd + x for x in np.sort(os.listdir(imgd)) if x.endswith(ext)]
  num_imgs = len(fs)
  # Open in binary mode: struct.pack produces bytes
  o = open(imgf, "wb")

  # Write items in the header
  # Magic Number for train/test images
  o.write(struct.pack('>i', 2051))
  # Number of images
  o.write(struct.pack('>i', num_imgs))

  # Load the first image to get dimensions
  im = np.asarray(Image.open(fs[0]).convert('L'), dtype=np.uint8)
  # im = cv2.imread(fs[0])[:,:,0] # images must be one dimensional grayscale
  r,c = im.shape

  # Write the rest of the header
  o.write(struct.pack('>i', r)) # Number of rows in 1 image
  o.write(struct.pack('>i', c)) # Number of columns in 1 image

  # For each image, record the pixel values in the binary file
  for img in range(num_imgs):
    # For opencv, images must be one dimensional grayscale
    # im = cv2.imread(fs[img])[:,:,0]
    im = np.asarray(Image.open(fs[img]).convert('L'), dtype=np.uint8)
    # Write each pixel as a big-endian unsigned byte
    for i in range(im.shape[0]):
      for j in range(im.shape[1]):
        o.write(struct.pack('>B', im[i,j]))

  # Close the file
  o.close()

  # Gzip the file (matching how the MNIST files are distributed)
  f_in = open(imgf, 'rb')
  f_out = gzip.open(imgf + '.gz', 'wb')
  f_out.writelines(f_in)
  f_out.close()
  f_in.close()
  os.remove(imgf)
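
Running the encoder is then a one-liner. A quick usage sketch, where the directory and output names are placeholders of my own:

# Pack every png in my_images/ into my-images-idx3-ubyte.gz
encode('my_images/', 'my-images-idx3-ubyte', ext='.png')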

The TensorFlow tutorials let you import the MNIST data with a single function call. I modified the existing input_data function to support loading training and testing data created with the above procedure: input_any_data_unsupervised. When using this loader with previously constructed models, remember that it does not return labels, so lines that expect them need to be modified (e.g. the call to next_batch in line 5 of this model).
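
For illustration, here is roughly how a training loop changes. This is a hypothetical sketch: it assumes input_any_data_unsupervised is imported like the tutorial's input_data module and exposes a read_data_sets-style call, and the directory name is a placeholder:

import input_any_data_unsupervised

# Assumed interface: a dataset object whose next_batch returns images only
data = input_any_data_unsupervised.read_data_sets('my_data/')

# Before (supervised tutorials): batch_xs, batch_ys = mnist.train.next_batch(100)
# After (unsupervised data): there are no labels to unpack
batch_xs = data.train.next_batch(100)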

A system for supervised learning would require another encoding function for the labels and a modified input function that uses the gzipped labels; a sketch of such a label encoder follows.
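
Such a label encoder would closely mirror encode above. A minimal sketch, assuming the labels arrive as a sequence of integers in 0-255 (the function name and arguments are my own, but the magic number 2049 and the one-byte-per-label layout match the standard MNIST label files):

def encode_labels(labels, lblf):
  """
  Pack a sequence of integer labels (0-255) into a gzipped file with the
  same layout as the standard MNIST label files.
  """
  o = open(lblf, 'wb')
  o.write(struct.pack('>i', 2049))         # Magic number for label files
  o.write(struct.pack('>i', len(labels)))  # Number of labels
  for lbl in labels:
    o.write(struct.pack('>B', lbl))        # One unsigned byte per label
  o.close()

  # Gzip and remove the intermediate file, as in encode above
  f_in = open(lblf, 'rb')
  f_out = gzip.open(lblf + '.gz', 'wb')
  f_out.writelines(f_in)
  f_out.close()
  f_in.close()
  os.remove(lblf)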

I hope this helps bridge the gap between starting out with the MNIST TensorFlow tutorials and diving into your own problems with new datasets. Keep deeply learning!