Amy R. Johnson

Troubleshooting Performance in Ruby, Part 2

Once we knew what the source of the performance issues was, my mentor and I could begin optimizing. From our initial benchmarking, it was clear the get_pictures function needed some work. This is the function responsible for requesting photos with the given search term from Google’s image search API, then downloading them and reading them into memory. Here’s the function before optimization:

# takes 59.405176 seconds

def get_pictures
  @photo_tiles = ImageList.new
  limit = 50
  photos_per_query = 10
  api_key = API_KEY
  id = SEARCH_ENGINE_ID
  start = 1
  (1..limit).step(photos_per_query) { |start|
    source = "https://www.googleapis.com/customsearch/v1?q=#{search_term}&cx=#{id}&num=#{photos_per_query}&searchType=image&key=#{api_key}"
    data = JSON.load(open(source))
    (0...photos_per_query).each { |i|
      @photo_tiles.read(data["items"][i]["link"])
    }
  } rescue 'no more images found'
end
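As an aside, timings like the 59 seconds above can be captured with Ruby's standard Benchmark module. Here is a minimal sketch; the work inside the block is just a stand-in for a real step like get_pictures:

```ruby
require 'benchmark'

# Benchmark.realtime returns the wall-clock seconds the block took.
# The summing loop below stands in for real work such as downloading images.
elapsed = Benchmark.realtime do
  sum = 0
  100_000.times { |i| sum += i }
end
puts format('step took %f seconds', elapsed)
```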

  

The first thing we noticed by browsing through the links Google returned was that some of these images were huge! Since the images get resized down to small tiles in a photo mosaic, any extra time spent downloading larger images is completely wasted. One option would be to check each photo's size before downloading it, but thankfully Google's Custom Search API accepts an imgSize parameter that restricts the result set to small or medium photos, so no checks are necessary. Since we settled on a photo tile size of 10x10 pixels, even the thumbnails worked for our purposes.
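For reference, the rejected alternative (checking a photo's size before downloading it) could be done with an HTTP HEAD request, which returns only the headers, including Content-Length, without the image body. The helper names and the 100 KB cutoff below are hypothetical, not from the project's code:

```ruby
require 'net/http'
require 'uri'

MAX_TILE_SOURCE_BYTES = 100_000 # hypothetical cutoff

# Ask the server for headers only; Content-Length reports the size in
# bytes without transferring the image itself.
def content_length_of(link)
  uri = URI(link)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)['Content-Length'].to_i
  end
end

def small_enough?(bytes)
  bytes <= MAX_TILE_SOURCE_BYTES
end
```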

We also noticed that as the function ran it would seem to get hung up on individual photos. Visiting those links took us to unresponsive or slow pages. To combat that, we added a rescue clause to the read statement so the program carries on past a bad link instead of stalling.

# takes 14.296446 seconds
def get_pictures
  @photo_tiles = ImageList.new
  limit = 50
  photos_per_query = 10
  api_key = API_KEY
  id = SEARCH_ENGINE_ID
  (1..limit).step(photos_per_query) { |start|
    # imgSize=medium keeps oversized images out of the result set
    source = "https://www.googleapis.com/customsearch/v1?q=#{search_term}&cx=#{id}&num=#{photos_per_query}&searchType=image&start=#{start}&imgSize=medium&key=#{api_key}"
    data = JSON.load(open(source))
    (0...photos_per_query).each { |i|
      # skip dead or unresponsive links instead of hanging the whole run
      @photo_tiles.read(data["items"][i]["link"]) rescue puts 'Cannot read image'
    }
  } rescue 'no more images found'
end

Running the whole search process with benchmarking now gives:

$ i.complete_search(10, 10)

=> Getting pictures took 14.296446
resizing pictures took 0.054123
getting picture colors took 0.007894
search took 14.361774

Simply changing the picture size and adding a rescue clause reduced the total time by 75%! However, this example is for a small number of total tiles. My mentor and I decided we could reduce the total time even further by using multiple threads for the download process. That way, instead of waiting for each image to finish downloading before moving on to the next, the images can be downloaded concurrently. Since no download depends on any other, there's no need to fetch them one at a time, and spreading the work across threads cuts the time the download portion takes.
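Here is the fan-out/join pattern in miniature. The URLs are made up, and a sleep stands in for the network wait, so the example runs offline:

```ruby
urls = %w[a.jpg b.jpg c.jpg d.jpg]  # stand-ins for real image links
results = {}

# Start one thread per "download"; each sleep simulates network time.
threads = urls.map do |url|
  Thread.new do
    sleep 0.1
    results[url] = "bytes of #{url}"
  end
end

threads.each(&:join)  # block until every thread has finished
# All four waits overlap, so the batch takes ~0.1 s instead of ~0.4 s.
```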

However, reading the images into the ImageList once they've been downloaded has to happen one at a time. In the original function the files are downloaded and read in a single line, so we need to separate the downloading from the reading before we can add threads.

For each photo, we create a new thread (Thread.new) that opens the image from its link and writes it to a temporary file. Then, since every image needs to finish downloading before we start reading, we wait on each thread with Thread#join. Finally, we loop through the temporary files and read each one into the ImageList.

# assumes: require 'open-uri'; require 'json'; require 'rmagick'
def get_pictures
  @photo_tiles = ImageList.new
  limit = 50
  photos_per_query = 10
  api_key = API_KEY
  id = SEARCH_ENGINE_ID
  threads = []
  (1..limit).step(photos_per_query) { |start|
    source = "https://www.googleapis.com/customsearch/v1?q=#{self.search_term}&cx=#{id}&num=#{photos_per_query}&searchType=image&start=#{start}&key=#{api_key}&imgSize=medium&fileType=jpg"
    data = JSON.load(open(source))
    (0...photos_per_query).each { |i|
      # download each thumbnail to a temp file in its own thread
      threads << Thread.new {
        open("tmp/image_#{start}_#{i}", 'wb') do |file|
          file << open(data["items"][i]["image"]["thumbnailLink"]).read rescue puts 'Cannot read image'
        end
      }
    }
  } rescue 'no more images found'

  # wait for every download to finish before reading
  threads.each do |t|
    t.join
  end

  # reading into the ImageList has to happen serially
  (1..limit).step(photos_per_query) do |start|
    (0...photos_per_query).each do |i|
      @photo_tiles.read("tmp/image_#{start}_#{i}") rescue puts 'Cannot read image'
    end
  end
end

Now let’s check our performance:

$ i.make_mosaic(50, 50, 10, 10)

=> getting pixels took 0.959601
Getting pictures took 4.384112
resizing pictures took 0.055782
getting picture colors took 0.005212
search took 4.448468
matching pixels with image tiles took 0.087612
putting tiles in order took 0.089187
ordering photos took 0.007696
making mosaic took 0.290649
total time 5.883285

Dividing the work among multiple threads reduced the time the get_pictures function took from 14 seconds to 4 seconds. The whole mosaic operation now takes only 6 seconds, down from an original 69. However, this is still for only a small number of photo tiles. Larger mosaics with higher resolution tiles might require more performance improvements.
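One further improvement worth sketching (not something from the project's code): spawning one Thread.new per image doesn't scale to thousands of tiles, so a small fixed pool of workers draining a queue can cap how many downloads run at once. The job numbers and the doubling below are stand-ins for real download work:

```ruby
jobs = Queue.new                 # thread-safe FIFO from Ruby's core library
100.times { |i| jobs << i }      # pretend each number is an image to fetch

results = Queue.new
workers = Array.new(4) do        # at most 4 "downloads" in flight at once
  Thread.new do
    loop do
      i = begin
            jobs.pop(true)       # non-blocking pop...
          rescue ThreadError
            break                # ...raises ThreadError once the queue is empty
          end
      results << i * 2           # stands in for downloading image i
    end
  end
end

workers.each(&:join)             # wait for the pool to drain the queue
```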

To learn more about threads in Ruby, check out this tutorial from SitePoint.