{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 5: Retrieval\n", "\n", "To complete the exercise, follow the instructions and complete the missing code and write the answers where required. All points, except the ones marked with **(N points)** are mandatory. The optional tasks require more independet work and some extra effort. Without completing them you can get at most 75 points for the exercise (the total number of points is 100 and results in grade 10). Sometimes there are more optional exercises and you do not have to complete all of them, you can get at most 100 points." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In the first part of this exercise you will implement some indexing operations used to retrieve information from a corpus of text documents. As the size of readily available text documents grows beyond all measures, methods for fast and user-friendly querying of information from text files are needed.\n", "\n", "Then we will look at some techniques for retrieval of images. We will see how images can be queried using similar techniques than text documents. The exercise concludes with an assigment on evaluation of retrieval systems.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment 1: Text retrieval\n", "\n", "The first assignment will address text retrieval. To make initial processing easy, we will be using NLTK library that you should install if you do not have it on your system." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# General import\n", "import re, math, sys\n", "import numpy as np\n", "\n", "# NLTK imports\n", "import nltk\n", "from nltk.corpus import stopwords, gutenberg\n", "from nltk.tokenize import word_tokenize, sent_tokenize\n", "\n", "PYTHONIOENCODING=\"UTF-8\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **nltk** library includes mechanisms for loading a number of different text corpuses. Load the **Gutenberg corpus** as well as some other resources by using the function ``nltk.download()``. We also download the image material that we will use later in the classical manner." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run this cell to download the data used in this exercise - the image dataset\n", "import zipfile, urllib.request, io\n", "zipfile.ZipFile(io.BytesIO(urllib.request.urlopen(\"http://data.vicos.si/lukacu/multimedia/exercise5.zip\").read())).extractall()\n", "\n", "# Download Gutenberg corpus\n", "nltk.download('gutenberg')\n", "# Download Punkt Tokenizer Models\n", "nltk.download('punkt')\n", "# Download Stop Words\n", "nltk.download('stopwords')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, with the help of the NLTK Book, familiarize yourself with the contents and structure of the corpus. If at any point of the exercise you find the Gutenberg corpus too small or otherwise unsuitable for your needs you are welcome to try other corpuses available on the internet." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from nltk.corpus import gutenberg\n", "\n", "# TODO: Show structure of Gutenberg corpus\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# TODO: familiarize/experiment with the content of Gutenberg corpus\n", "# e.g. show the number of words, display first few words, ..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Preprocess the raw data of each document in the Gutenberg corpus using the nltk inbuilt tokenizer. You can use the function ``nltk.word_tokenize()``. Remove stop words and punctuation to further reduce the amount of data you will need to process later on." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# TODO\n", "# Write a function that recives an array of words obtained from tokenized file\n", "# as an input parameter and returns an array of filtered words" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# TODO\n", "# Preprocess raw data of documents in Gutenberg corpus (use build in tokenizer 'word_tokenize')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * In order to simplify and speed up operations on sets of documents, they need to be indexed. This allows a fast look up if and where a query word or phrase appears in our corpus. Build an inverted index. For each unique token appearing in your corpus, make a list of the documents it appears in." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# TODO: Write an inverted index structure\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Index Gutenberg corpus documents and make some queries using Boolean logic (operation AND is an intersection of lists, operation OR is an intersection). Building the entire index takes a lot of time, test your structure on a smaller set (either just a few 100 characters from each document or just three documents), use the entire set when you know that everything works.\n", "\n", " For query \"encompass AND furies\" the system should return documents: 'whitman-leaves.txt', 'milton-paradise.txt'\n", "\n", " For query \"highbury OR behoves\" the system should return documents: 'austen-emma.txt', 'milton-paradise.txt'\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# TODO: Build an index and make some queries\n", "#for k, v in index.index.items():\n", "# if len(v) < 10: \n", "# print(k)\n", "# encompass, behoves, furies\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Extend your index with a stopword list and a stemmer for faster and more informative retrieval. Since we are assuming that documents are written in English, you can use `nltk.corpus.stopwords.words('english')` to get a list of stopwords. You can use Porter2, aslo known as Snowball stemmer `from nltk.stem.snowball.EnglishStemmer`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Build index with stopword support and a stemmer, demonstrate its use with some examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(5 points)** To allow querying of tokens that occur together (i.e. common phrases), a positional index can be used. Build a positional index on your corpus. This is an extension of the inverted index where each of the list elements containing the document index also stores a list of positions in the document where the token appears. Make sure you properly removed stop words in order to keep your computation relatively fast.\n", "\n", " Use the positional index to query phrases. That is, return the positions in documents where each of the words in your phrase occurs at approximately the same position in the document." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# TODO: Build positional index, demonstrate its use with some examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * The relevance of the documents in your corpus to the user's query can be measured by the term frequency, i.e. how many times the query term appears in each document. Extend your index structure so that it stores the number of appearances of each token in each document as well as the number of the documents in which a certain term appears." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# TODO: Count the number of appearances of each token in each document\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The absolute number of appearances is biased, therefore a different metric called **TF-IDF** (short for term frequency\u2013inverse document frequency) is commonly used to rank the relevance of documents containing a query. TF-IDF is computed as follows:\n", "\n", "\\begin{equation}\n", "\\mathrm{tfidf}_{t,d} = \\mathrm{tf}_{t,d} \\cdot \\mathrm{idf}_t,\n", "\\end{equation}\n", "\n", "where $\\mathrm{tf}_{t,d}$ is the frequency of the term $t$ in document $d$ and $\\mathrm{idf}_{t}$ equals to\n", "\n", "\\begin{equation}\n", "\\log_{10}{\\frac{N}{\\mathrm{df}_t}},\n", "\\end{equation}\n", "\n", "where $N$ is the number of documents in the corpus and $\\mathrm{df}_t$ is the number of documents of the corpus in which the term $t$ appears.\n", "\n", "Implement a system that returns the first $5$ most relevant documents from the corpus given a query. Note that your queries can contain more than one word. The score in that case is calculated as\n", "\n", "\\begin{equation}\n", "s(q,d) = \\sum_{t \\in q}{\\mathrm{tfidf}_{t,d}},\n", "\\end{equation}\n", "\n", "where $q$ is your query. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Write a function that returns the most relevant documents for a given query.\n", "\n", " For query \"lie AND reason AND book\" the five most relevant documents should be: bible-kjv, melville-moby_dick\n", "whitman-leaves, edgeworth-parents, chesterton-brown." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# TODO: Return 5 most relevant documents from the corpus for a given query\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(5 points)** Implement a system for handling typographical errors of queries on the user's part, use Levenshtein string distance (implemeted in NLTK as `nltk.metrics.distance.edit_distance`) to find the best match in your token list for query elements that are not contained in the corpus. Show that your system returns relevant results for misspelled queries.\n", "\n", " Demonstrate your spelling correction on a few examples, you can use Gutemberg dataset and assume that the language is English." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# TODO: Handle typographical errors in user's query and return relevant results for misspelled queries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment 2: Image retrieval\n", "\n", "**Important:** In this assignment you will implement several image retrival systems. The input to the system is a query image and the system should return the images in the database sorted by similarity to the query image. 
, "\n", "For each approach you will have to extract **a feature vector** from all the images, then compare your query image to all of the database images using the appropriate **distance measure**. You will also have to compare the performance of different features over all the images in your dataset by calculating an $N \\times N$ similarity matrix.\n", "\n", "**Hint:** You can calculate feature vectors for all images in advance, save them to a file and then use the stored matrix of feature vectors for the rest of the exercise. This approach will save you a lot of time, especially towards the end of the exercise. Do not say I did not warn you!\n", "\n", "You will test your retrieval system on a dataset that is based on the Caltech 101 dataset. It consists of approximately 5000 color images from 83 classes (some classes have been removed, others have been reduced in size to make all classes more balanced). You can use the function prepare_dataset to extract the images and their corresponding classes. The function returns a list of images along with the corresponding class labels. You can modify it to only extract (or ignore) specific classes and/or to resize the images for faster processing if you wish. For developing the system you are advised to use a smaller subset of classes with a smaller number of samples to speed up the computation (or alternatively, a subset of easier classes, e.g. accordion, faces, strawberry)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "from skimage import io\n", "\n", "def prepare_dataset(n_samples = 10):\n", "    # Loads images from the caltech_101 dataset and stores their classes;\n", "    # uses at most n_samples images from each class and\n", "    # keeps only color (3-channel) images\n", "    \n", "    # List of images\n", "    images_list = []\n", "    \n", "    # Corresponding list of classes\n", "    classes_list = []\n", "    \n", "    for subdir, _, files in os.walk('images'):\n", "        # Skip the root directory itself\n", "        if subdir == 'images':\n", "            continue\n", "        \n", "        class_name = os.path.basename(subdir)\n", "        \n", "        counter = 0\n", "        for filename in files:\n", "            if counter >= n_samples:\n", "                break\n", "            data = io.imread(os.path.join(subdir, filename))\n", "            if len(data.shape) == 3:\n", "                # Store the loaded image data and its class label\n", "                images_list.append(data)\n", "                classes_list.append(class_name)\n", "                counter += 1\n", "    \n", "    return images_list, classes_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Implement a feature extractor based on color histograms. Write a function that computes 3D color histograms for all images in your database. Use the numpy function histogramdd in the RGB color space, then reshape the $3$D histograms to $1$D vectors and stack them together in a $2$D matrix (the matrix is composed in a way that the $i$-th row includes the histogram of the $i$-th image).\n", "\n", " Write a script that tests your system by loading the database (use $8$ bins per color channel), using the fifth image in the database as a reference image."
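, "\n", " A hedged sketch of this feature-extraction step is given below; it assumes `images` is the list returned by `prepare_dataset` and all names are illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Hedged sketch: stack normalised 3D RGB histograms of all images into a 2D feature matrix.\n",
"# Assumes `images` is a list of color images, e.g. as returned by prepare_dataset above.\n",
"import numpy as np\n",
"\n",
"def histogram_features(images, bins = 8):\n",
"    features = np.zeros((len(images), bins ** 3))\n",
"    for i, image in enumerate(images):\n",
"        # Reshape image of size (h, w, 3) to a (h * w, 3) list of pixels\n",
"        pixels = np.reshape(image, (-1, 3))\n",
"        H, _ = np.histogramdd(pixels, bins = bins, range = [(0, 255)] * 3, density = True)\n",
"        # Flatten the 3D histogram into the i-th row of the feature matrix\n",
"        features[i, :] = H.ravel()\n",
"    return features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell below demonstrates the basic use of the `histogramdd` function on a single image."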
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Example usage of \"histogramdd\" function:\n", "#\n", "# Load an image\n", "example_image = io.imread(\"images/airplanes/image_0001.jpg\")\n", "# Reshape image of size (h, w, 3) to (h * w, 3)\n", "example_image_reshaped = np.reshape(example_image, (example_image.shape[0] * example_image.shape[1], 3))\n", "# Compute histograms for each color channel (and normalise it)\n", "H, _ = np.histogramdd(example_image_reshaped, bins = 8, normed = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * To compute the distance between the reference histogram and every other histogram in the database you will use Hellinger distance that is defined as:\n", "\n", " \\begin{equation}\n", " H(\\mathbf{h}_1,\\mathbf{h}_2) = \\sqrt{ \\frac{1}{2} \\sum_{i=0}^{N-1} \\Big( \\sqrt{h_1(i)} - \\sqrt{h_2(i)} \\Big)^2 }.\n", " \\end{equation}\n", "\n", " Note that low values of Hellinger distances signify high similarity while high values signify low similarity. This is exactly the opposite what is expected by the ROC curve algoritm (that you will use in the next assignment). This can be fixed by redefining the histogram distance measure. If we assume that $H(\\mathbf{h}_1,\\mathbf{h}_2)$ denotes the Helliner distance between histograms $\\mathbf{h}_1$ and $\\mathbf{h}_2$, we can define the new distance simply as\n", "\n", " \\begin{equation}\n", " \\rho(\\mathbf{h}_1,\\mathbf{h}_2) = 1 - H(\\mathbf{h}_1,\\mathbf{h}_2).\n", " \\end{equation}\n", "\n", " Compute the distances to all other images, sort the distances and display the first five matches and the reference image in a same figure." ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Implement a system that uses normalized cross-correlation of grayscale images. The normalized cross-correlation between two sequences of same size $\\mathbf{X}$ and $\\mathbf{Y}$, denoted as $NCC(\\mathbf{X},\\mathbf{Y})$ is defined as scalar product between normalized sequences:\n", "\n", " \\begin{equation}\n", " NCC(\\mathbf{X},\\mathbf{Y}) = \\frac{1}{N} \\frac{\\sum (x_i - \\bar{x}) (y_i - \\bar{y})}{ \\Big(\\sqrt{ \\frac{1}{N} \\sum (x_i - \\bar{x})^2 } \\Big) \\Big(\\sqrt{ \\frac{1}{N} \\sum (y_i - \\bar{y})^2 } \\Big) }.\n", " \\end{equation}\n", "\n", " where $\\bar{x}$ and $\\bar{y}$ denote the mean values of the elements in the sequences. Sequences are more similar if the correlation is higher. More information is available here.\n", " \n", " **Note:** One can only compute normalized cross-correlation for grayscale images of the same size. Since our database contains images of different sizes we have to first resize and/or crop images to get them to the reference size (the choice is up to you, but make it larger thatn 64x64 pixels). Then convert them to grayscale and reshape them into vectors of intensity values. \n", " \n", " Repeat the testing of the system in the same manner than in the previous task, load images, convert them to grayscale, select a reference image, compute the distances and display the first five matches. What is the system sensitive to? \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Implement a retrieval system that uses CNN features to calculate image similarity. 
, " You will need the PyTorch library and a pretrained neural network of your choice. Note that the architectures of the models differ, so you need to check the model documentation to find out which layer contains useful features (generally it is the penultimate layer). Its features should be one-dimensional vectors that you can compare using the Hellinger distance. Plot the ROC curve for the system and comment on its performance relative to the NCC and color histogram systems. You can use the sample code below as a starting point." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "from torchvision import models, transforms\n", "from skimage import io\n", "\n", "# Get pretrained ALEXNET model (you can also use other models)\n", "# Full list of pretrained models is available at: https://pytorch.org/docs/stable/torchvision/models.html\n", "model = models.alexnet(pretrained=True)\n", "\n", "# Remove last fully-connected layer (second to last layer usually contains useful features)\n", "new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])\n", "model.classifier = new_classifier\n", "\n", "# Read an image\n", "image = io.imread(\"images/airplanes/image_0001.jpg\")\n", "\n", "# Image transforms (for preprocessing)\n", "# Note: You must check the documentation of your chosen model to see what kind of \n", "# preprocessing procedure is required \n", "transform = transforms.Compose([\n", "    transforms.ToPILImage(),\n", "    transforms.Resize(256),       # Resize the image to 256x256\n", "    transforms.CenterCrop(224),   # Central crop the image to 224x224\n", "    transforms.ToTensor(),        # Convert it to a tensor\n", "    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # Subtract mean value\n", "                         std=[0.229, 0.224, 0.225])   # And normalize it using the std value\n", "])\n", "\n", "# Preprocess the input image and prepare a batch to be passed to the model\n", "image_preprocessed = transform(image)\n", "inference_batch = torch.unsqueeze(image_preprocessed, 0)\n", "\n", "# Put the model in EVAL (inference) mode\n", "model.eval()\n", "\n", "# Perform the inference on the given batch\n", "feature_vector = model(inference_batch)\n", "\n", "# Show values of the feature_vector tensor\n", "print(feature_vector)\n", "# Show the shape of the feature_vector tensor\n", "# Note: for a batch with one image the shape is (1, 4096), i.e. a single feature vector\n", "print(feature_vector.shape) \n", "\n", "# Convert tensor vector to numpy\n", "# Use this to check image similarities\n", "feature_vector_numpy = feature_vector.detach().numpy()\n", "\n", "# Uncomment to print the non-truncated feature vector\n", "# with np.printoptions(threshold=np.inf):\n", "#     print(feature_vector_numpy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(5 points)** Extend the retrieval method based on color histograms by including spatial information. Divide the input image into subregions, calculate the histogram for each region, then concatenate the histograms into a final feature vector. Experiment with different numbers of subregions and comment on the performance of the new method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: write your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(10 points)** Implement a bag-of-words (BoW) approach for image retrieval."
, " First, extract local features from each image (you can use SIFT or a similar descriptor found in the OpenCV library), then generate a codebook via a clustering method such as k-means. The image feature vectors are then histograms of the codewords and can be used for comparison. You do not have to use the entire dataset to build the dictionary; a quarter of it is enough.\n", "\n", " See the paper for further information on implementation details for BoW image retrieval." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# TODO: write your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment 3: ROC analysis\n", "\n", "The purpose of this assignment is to learn the theory and practical use of ROC analysis. Therefore you should first read the paper on ROC curves by Fawcett (see the link in the tasks below). While reading, pay special attention to sections $1$--$5$ and $7$--$8$. In the following tasks you will work on a given theoretical example of two classifiers ($C_1$ and $C_2$).\n", "\n", "We have a set of samples that we wish to classify into one of two classes, together with the ground truth class of each sample (denoted as $0$ and $1$). For each sample a classifier gives us a score based on which we can determine which class the sample should belong to (a score closer to $0$ means class $0$, a score closer to $1$ means class $1$). Below are the results for $8$ samples: their ground truth values ($\\xi_\\mathrm{id}$) and the score values for both classifiers ($\\xi_{C_1}$ and $\\xi_{C_2}$).\n", "\n", "\\begin{equation}\n", " \\begin{array}{ll}\n", " \\xi_\\mathrm{id} = & [\\ 1,\\ 0,\\ 1,\\ 1,\\ 1,\\ 0,\\ 0,\\ 0\\ ] \\\\\n", " \\xi_{C_1} = & [\\ 0.5,\\ 0.3,\\ 0.6,\\ 0.22,\\ 0.4,\\ 0.51,\\ 0.2,\\ 0.33\\ ] \\\\\n", " \\xi_{C_2} = & [\\ 0.04,\\ 0.1,\\ 0.68,\\ 0.22,\\ 0.4,\\ 0.11,\\ 0.8,\\ 0.53\\ ] \\\\\n", " \\end{array}\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * For the example above calculate and draw the ROC curves (by hand) for classifier $C_1$ as well as classifier $C_2$. Also calculate the area under the curve (AUC) for both classifiers. For classifier $C_1$ select a decision threshold (working point) $\\vartheta_{th1}=0.33$ and use it to calculate the confusion matrix and the $F$ measure score. Do the same for classifier $C_2$ using a threshold value $\\vartheta_{th2}=0.1$.\n", "\n", " **Question:** Based on Fawcett's theory, decide which classifier is better at the selected working points and motivate your decision. Which working point is optimal for each classifier?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * You will now calculate ROC curves for combined classifiers. We have two binary classifiers, $C_1$ ($\\vartheta_{th1}=0.33$) and $C_2$ ($\\vartheta_{th2}=0.1$), which means that classifier $C_1$ classifies a sample as class $1$ if its score is $\\xi_{C_1}(\\mathbf{x}_i) > \\vartheta_{th1}$ and as class $0$ otherwise; the same holds for classifier $C_2$ with its threshold."
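, "\n", "\n", " The cell below gives a hedged sketch of evaluating such a thresholded classifier on the example data (confusion-matrix counts and the $F$ measure); the function name is illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Hedged sketch: confusion-matrix counts and F measure of a thresholded classifier.\n",
"# Illustrative only; gt and scores are the example data from the task above.\n",
"import numpy as np\n",
"\n",
"gt     = np.array([1, 0, 1, 1, 1, 0, 0, 0])\n",
"scores = np.array([0.5, 0.3, 0.6, 0.22, 0.4, 0.51, 0.2, 0.33])  # classifier C1\n",
"\n",
"def confusion_and_f(gt, scores, threshold):\n",
"    pred = (scores > threshold).astype(int)\n",
"    tp = int(np.sum((pred == 1) & (gt == 1)))\n",
"    fp = int(np.sum((pred == 1) & (gt == 0)))\n",
"    fn = int(np.sum((pred == 0) & (gt == 1)))\n",
"    tn = int(np.sum((pred == 0) & (gt == 0)))\n",
"    precision = tp / (tp + fp) if tp + fp else 0.0\n",
"    recall = tp / (tp + fn) if tp + fn else 0.0\n",
"    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0\n",
"    return (tp, fp, fn, tn), f\n",
"\n",
"print(confusion_and_f(gt, scores, 0.33))  # C1 at its working point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n"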
, " The first combined classifier, $C_3$, can be obtained as the intersection of the decisions of the two basic classifiers, $C_3 = C_1 \\bigwedge C_2$ ($C_3$ classifies a sample as class $1$ if both basic classifiers classify it as class $1$). The second one can be obtained as a union of the decisions of the two basic classifiers, $C_4 = C_1 \\bigvee C_2$ ($C_4$ classifies a sample as class $1$ if at least one of the basic classifiers classifies it as class $1$). For each combined classifier calculate and plot its point in the ROC space together with the ROC curves from the previous tasks." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Use [sklearn.metrics.roc_curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html) to calculate ROC curves. Study the documentation and reproduce the results for the classifiers from the previous task. To obtain the AUC you can use [sklearn.metrics.roc_auc_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import roc_curve, roc_auc_score\n", "\n", "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * [Fawcett](https://www.sciencedirect.com/science/article/pii/S016786550500303X) explains how to select an optimal working point for a classifier from a ROC curve. The threshold should be at the point on the curve that is closest to the point $[0,1]$ in the ROC space. Write a function **get_wp** that calculates the optimal working point for the given ROC curve and returns the corresponding threshold value and $F$ score as results. Test your code on classifiers $C_1$ and $C_2$. Based on the AUC, decide which classifier performs better over all threshold values; based on the $F$ score at the optimal working point, decide which classifier is better at that point. Write a script that calculates and draws the ROC curves for classifiers $C_1$ and $C_2$ on the same plot; for both classifiers mark the optimal working point and set the title of the plot to display the values of the thresholds, $F$ scores and AUC for both classifiers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * The image retrieval systems can currently return the distances between each image in the database and the reference image. If we want to build a binary classifier, we have to determine the optimal threshold that can be used to decide whether an image belongs to the class of the reference image or not. We will use the ROC analysis that you have implemented in the previous tasks to determine the optimal threshold. You can use an image in the database as a reference image and compute distances to every other image in the dataset. If we select the first image we can generate a ROC curve for that image; however, this curve only tells us the properties of the system for this particular input and may not generalize. If we select a different image we can get a completely different ROC curve with a different optimal point. Instead, use the following procedure to compute the ROC curve over all images in the database.\n", "\n", " 1. Load the database.\n"
, " 2. For each image in the database:\n", "    * Use the selected image as the reference image.\n", "    * Compute the distances between the reference image and the remaining images in the database and store them in the vector $\\vartheta_\\mathrm{score}$"
, " (the vector must not include the distance of the reference image to itself).\n", "    * Use the provided ground-truth labels to compose a vector of binary ground truth for the given reference image by comparing the class of each remaining image to the class of the reference image. Save the binary ground truth to the vector $\\vartheta_\\mathrm{class}$.\n", "    * Extend the overall vectors for scores and ground truth by appending the new data: $\\varphi_\\mathrm{class} = \\varphi_\\mathrm{class} + \\vartheta_\\mathrm{class}$ and $\\varphi_\\mathrm{score} = \\varphi_\\mathrm{score} + \\vartheta_\\mathrm{score}$. \n", " 3. Use the composed overall vectors $\\varphi_\\mathrm{score}$ and $\\varphi_\\mathrm{class}$ to compute the ROC curve.\n", " \n", " Evaluate all three systems by plotting their ROC curves and compare them. Determine the optimal threshold value and plot it in the ROC space (on top of the ROC curve). Do not forget to label the axes of the plot.\n", " \n", " **Note:** Pay attention to the values produced by the individual distance measures: some measures produce lower values when samples are more alike, while others produce higher values in such cases." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: write your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **(5 points)** Compare the performance of several deep descriptors (e.g. the ResNet family, VGG) using ROC analysis. Download pretrained networks and extract the feature-extraction part of the model (remove the fully-connected layers at the end). It is a good idea to pre-compute the features for parts of the dataset and store them on disk, then load them for further analysis to save time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: write your code here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }