What are memes, again?

Meme by AI-Memer, Image by Atsuko Sato, Caption by OpenAI GPT-3, License: CC BY-SA 4.0

The Wiktionary defines the word meme as “any unit of cultural information, such as a practice or idea, that is transmitted verbally or by repeated action from one mind to another in a comparable way to the transmission of genes.” The term originated in Richard Dawkins’ book, The Selfish Gene. In the age of the Internet, the term has been narrowed to mean a piece of content, typically an image with a funny caption, that is spread online via social media.

![]() AI-Memer Components, Diagram by Author, pie photo by W.carter

The user starts by entering a search query to find a background image, like “apple pie”. The system then checks for matching images in Wikimedia Commons and the OpenImages dataset. Both datasets have corresponding text descriptions of the images. I use the CLIP encoders from OpenAI to first perform a semantic search on the text descriptions. A semantic search looks for matching concepts, not just keywords.

![]()

I then perform a semantic search on the images themselves. The user checks out the top 10 images that match the query and selects their favorite. Either the GPT-3 model from OpenAI or the GPT-Neo model from EleutherAI is then used to generate 10 possible captions. The user selects the best caption to create the new meme, which can be downloaded.

The background images are pulled from two sources: Wikimedia Commons and the OpenImages dataset. I use OpenAI’s CLIP to perform the semantic search. CLIP performs two functions, encoding both text and images into “embeddings”, which are strings of numbers that represent the gist of the original data. The CLIP model was pre-trained on 400 million pairs of images and text labels, such that the embeddings encoded from an image land close to the embeddings encoded from its text label. For more information about how CLIP works, check out my article here.

Wikimedia Commons hosts over 73 million JPEG files. Most of them are released with permissive rights, like the Creative Commons Attribution license. I use Goldsmith’s Wikipedia search API to find the top 3 pages related to the text query and gather the image descriptions using the CommonsAPI on the Magnus Toolserver. There are typically 3 to 10 images on a Wikipedia page, so about 9 to 30 images come down in total. I use Python’s shutil.copyfileobj() function to download the image files.

The OpenImages dataset from Google comprises 675,000 photos scraped from Flickr that were all released under the Creative Commons Attribution license.
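Goldsmith’s wikipedia package is a thin wrapper around the MediaWiki search API. As a rough illustration of the kind of request involved (the exact parameters AI-Memer uses aren’t shown here, so treat these as assumptions), the top-3 page search can be expressed as a plain MediaWiki API call:

```python
from urllib.parse import urlencode

def build_search_url(query: str, limit: int = 3) -> str:
    """Build a MediaWiki full-text search URL for the top `limit` pages.
    Illustrative only: Goldsmith's `wikipedia` package issues a request
    like this under the hood when you call wikipedia.search(query,
    results=limit)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

url = build_search_url("apple pie")
print(url)
```

Fetching that URL returns JSON listing the matching page titles, from which the image descriptions can then be gathered.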
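The download step streams each image to disk with shutil.copyfileobj(), which copies any readable file-like object in chunks. A minimal sketch; here an in-memory BytesIO stream stands in for the HTTP response so the copy logic runs offline:

```python
import io
import shutil

def save_stream(source, dest_path: str, chunk_size: int = 1 << 16) -> None:
    """Copy a readable file-like object to disk in chunks.
    In practice `source` would be an open HTTP response, e.g.
    urllib.request.urlopen(image_url); a BytesIO stands in here."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)

# Stand-in for an HTTP response body:
fake_response = io.BytesIO(b"\x89PNG fake image bytes")
save_stream(fake_response, "background.png")
```

Because copyfileobj() works on chunks rather than reading the whole response into memory, it is a good fit for pulling down a few dozen large photos per query.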
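The two-stage semantic search, first over the text descriptions and then over the images themselves, boils down to ranking CLIP embeddings by cosine similarity to the encoded query. A sketch with tiny stand-in vectors (real CLIP embeddings are high-dimensional and come from the model’s text and image encoders, not hand-written lists):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, candidates, k):
    """Return the k candidate names whose embeddings are most similar
    to the query embedding."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: cosine(query_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Stand-in embeddings; in AI-Memer these come from CLIP's encoders.
query = [0.9, 0.1, 0.0]                    # encoded query, e.g. "apple pie"
descriptions = {
    "pie_photo.jpg": [0.8, 0.2, 0.1],      # "a baked apple pie"
    "orchard.jpg":   [0.5, 0.5, 0.2],      # "apple orchard in autumn"
    "skyline.jpg":   [0.0, 0.1, 0.9],      # "a city skyline at night"
}

# Stage 1: rank the text descriptions; stage 2 would re-rank the
# shortlisted files by the similarity of their *image* embeddings.
shortlist = top_k(query, descriptions, k=2)
print(shortlist)  # → ['pie_photo.jpg', 'orchard.jpg']
```

Because both encoders map into the same embedding space, the same ranking function serves for the description search and the follow-up image search.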
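For the caption step, both GPT-3 and GPT-Neo are prompted with context about the chosen image and asked for candidate captions. The article does not show the actual prompt, so the template below is purely illustrative:

```python
def build_caption_prompt(description: str, n: int = 10) -> str:
    """Assemble a hypothetical prompt asking a language model for meme
    captions. The wording is an assumption, not AI-Memer's real prompt;
    with GPT-3 it would go to the OpenAI completions API, with GPT-Neo
    to a locally hosted model."""
    return (
        f"Write {n} funny meme captions for a photo described as: "
        f"{description!r}.\n1."
    )

prompt = build_caption_prompt("a baked apple pie on a windowsill")
print(prompt)
```

The trailing "1." nudges the model to continue a numbered list, a common trick for getting exactly n candidates back; the user then picks the best of the 10.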