Once we knew the source of the performance issues, my mentor and I could begin optimizing. Our initial benchmarking made it clear that the get_pictures function needed work. This function requests photos matching a given search term from Google's image search API, then downloads them and reads them into memory. Here's the function before optimization:
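As a rough illustration of the sequential download-and-read pattern, here is a minimal sketch. It is not the authors' code. It assumes the downloads go through Ruby's open-uri, and it leaves out the Google search step. The method name get_pictures_sequential and the injectable fetcher are hypothetical, and the fake fetcher only sleeps to stand in for network latency. Timing it with the standard Benchmark.realtime shows that total time grows with the number of images, because each download waits for the previous one.

```ruby
require "open-uri"
require "benchmark"

# Hypothetical default fetcher: open-uri's URI.open yields an IO, and &:read
# returns the whole response body as a String.
DEFAULT_FETCHER = ->(url) { URI.open(url, &:read) }

# Hypothetical sequential baseline: each download blocks until it finishes
# before the next one starts.
def get_pictures_sequential(urls, fetcher: DEFAULT_FETCHER)
  urls.map { |url| fetcher.call(url) }
end

# Fake fetcher that simulates 0.1 s of network latency per image,
# so the sketch runs without network access.
fake_fetcher = lambda do |url|
  sleep 0.1
  "bytes of #{url}"
end

urls = (1..5).map { |i| "https://example.com/photo#{i}.jpg" }
pictures = nil
elapsed = Benchmark.realtime do
  pictures = get_pictures_sequential(urls, fetcher: fake_fetcher)
end
puts format("%d images in %.2f s", pictures.size, elapsed)
```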
The first thing we noticed while browsing the links Google returned was that some of the images were huge! Since the images are shrunk into small tiles for the photo mosaic, the time spent downloading large images is completely wasted. One option would be to check each photo's size before downloading it, but thankfully Google provides size-range parameters that restrict the results to small or medium photos, so no per-image checks are necessary. Since we settled on a tile size of 10x10 pixels, even thumbnails worked for our purposes.
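The idea is to restrict image size in the search request itself rather than inspecting each file afterwards. A minimal sketch of building such a request URL follows. The endpoint and the q and img_size parameter names are placeholders, not Google's actual API. Check the documentation of whichever image search API you use for the real names and accepted values. URI.encode_www_form is the standard-library call that escapes the query string.

```ruby
require "uri"

# Placeholder endpoint: substitute the real search API URL.
SEARCH_ENDPOINT = "https://api.example.com/image-search"

# Builds a search URL that asks for small images only. The parameter names
# (q, img_size) are hypothetical stand-ins for the API's own size-range option.
def image_search_url(term, size: "small")
  query = URI.encode_www_form(q: term, img_size: size)
  "#{SEARCH_ENDPOINT}?#{query}"
end

puts image_search_url("golden retriever")
# → https://api.example.com/image-search?q=golden+retriever&img_size=small
```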
We also noticed that, as the function ran, it would sometimes hang on individual photos. Visiting those links led us to slow or unresponsive pages. To handle this, we added a rescue clause to the read statement so the program would move on in the case of a bad link.
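Here is a minimal sketch of the rescue idea, assuming the downloads go through open-uri. A rescue clause only helps with a hanging request if something raises an exception. open-uri's open_timeout and read_timeout options do that: they turn a stalled connection into a Net::OpenTimeout or Net::ReadTimeout error, which the rescue then catches. The helper name fetch_or_skip and the 5-second limits are hypothetical choices.

```ruby
require "open-uri"

# Hypothetical fetcher with timeouts (5 s is an arbitrary assumption), so a
# slow server raises instead of blocking forever.
TIMED_FETCHER = ->(url) { URI.open(url, open_timeout: 5, read_timeout: 5, &:read) }

# Returns the image bytes, or nil when the link is bad or too slow.
def fetch_or_skip(url, fetcher: TIMED_FETCHER)
  fetcher.call(url)
rescue StandardError => e
  # OpenURI::HTTPError, Net::OpenTimeout, Net::ReadTimeout and SocketError
  # are all StandardError subclasses, so one clause covers them.
  warn "skipping #{url}: #{e.class}"
  nil
end

# Usage with a fake fetcher in which one link is "dead".
flaky = lambda do |url|
  raise IOError, "simulated dead link" if url.include?("dead")
  "bytes of #{url}"
end

urls = ["https://example.com/ok.jpg", "https://example.com/dead.jpg"]
pictures = urls.map { |u| fetch_or_skip(u, fetcher: flaky) }.compact
puts pictures.size # → 1
```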
Running the whole search process with benchmarking now gives:
Simply changing the picture size and adding a rescue clause reduced the total time by 75%! However, this example uses only a small number of tiles. My mentor and I decided we could cut the total time even further by spreading the downloads across multiple threads. Since downloading one image doesn't depend on any other, there's no reason to wait for each image to finish before starting the next. The images can be downloaded concurrently, which shortens the download phase.
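The following sketch shows why concurrency pays off for downloads, using simulated downloads that sleep instead of touching the network. The fake_download helper is hypothetical. In CRuby, the global VM lock stops Ruby code from running in parallel. However, a thread that is blocked waiting (on a socket, or here on sleep) releases it, so the waits overlap. That makes I/O-bound work like downloading a good fit for threads.

```ruby
require "benchmark"

# Hypothetical stand-in for a download: sleeping mimics waiting on the network.
def fake_download(url)
  sleep 0.1
  "bytes of #{url}"
end

urls = (1..5).map { |i| "https://example.com/photo#{i}.jpg" }

# One download after another: roughly 5 x 0.1 s.
sequential = Benchmark.realtime { urls.map { |u| fake_download(u) } }

# One thread per URL; Thread#value waits for the thread and returns its result.
threaded = Benchmark.realtime do
  threads = urls.map { |u| Thread.new { fake_download(u) } }
  threads.map(&:value)
end

puts format("sequential: %.2f s, threaded: %.2f s", sequential, threaded)
```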
Reading the downloaded images, on the other hand, still has to happen one at a time. In the original function, each file is downloaded and read in a single line, so we needed to separate the downloading from the reading in order to use threads.
For each photo, we create a new thread (Thread.new) that opens the image link and writes its contents to a temporary file. Since every image must finish downloading before we read any of them, we then wait for all the threads using thread.join. Finally, we loop through the temporary files we created and read each one.
Now let’s check our performance:
Dividing the downloads among multiple threads cut the running time of the get_pictures function from 14 seconds to 4 seconds, and the whole mosaic operation now takes only 6 seconds, down from the original 69. This is still a mosaic with only a small number of tiles, though. Larger mosaics with higher-resolution tiles may need further performance improvements.
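Putting the reported timings in relative terms:

```latex
\begin{aligned}
\texttt{get\_pictures}: &\quad \frac{14 - 4}{14} \approx 71\% \text{ less time}, \quad \frac{14}{4} = 3.5\times \text{ speedup} \\
\text{whole mosaic}: &\quad \frac{69 - 6}{69} \approx 91\% \text{ less time}, \quad \frac{69}{6} = 11.5\times \text{ speedup}
\end{aligned}
```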
To learn more about threads in Ruby, check out this tutorial from SitePoint.