[PyTorch][TF] Add bounding-box by AnirudhDagar · Pull Request #1369 · d2l-ai/d2l-en · GitHub

[PyTorch][TF] Add bounding-box #1369


Merged
1 commit merged on Aug 16, 2020
27 changes: 24 additions & 3 deletions chapter_computer-vision/bounding-box.md
@@ -16,6 +16,18 @@
```{.python .input}
from mxnet import image, npx
npx.set_np()
```

```{.python .input}
#@tab pytorch
%matplotlib inline
from d2l import torch as d2l
```

```{.python .input}
#@tab tensorflow
%matplotlib inline
from d2l import tensorflow as d2l
```

Next, we will load the sample image used in this section. We can see a dog on the left side of the image and a cat on the right; they are the two main targets in this image.

@@ -24,18 +36,27 @@
```{.python .input}
img = image.imread('../img/catdog.jpg').asnumpy()
d2l.plt.imshow(img);
```

```{.python .input}
#@tab pytorch, tensorflow
d2l.set_figsize()
img = d2l.plt.imread('../img/catdog.jpg')
d2l.plt.imshow(img);
```

## Bounding Box

In object detection, we usually use a bounding box to describe the target location. A bounding box is a rectangular box determined by the $x$ and $y$ coordinates of its upper-left corner and the $x$ and $y$ coordinates of its lower-right corner. We will define the bounding boxes of the dog and the cat based on the coordinate information in the above image. The origin of the coordinate system is the upper-left corner of the image, with the positive $x$ axis pointing right and the positive $y$ axis pointing down.

```{.python .input}
#@tab all
# bbox is the abbreviation for bounding box
dog_bbox, cat_bbox = [60, 45, 378, 516], [400, 112, 655, 493]
```
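As a quick sanity check on this corner format, each box's width, height, and center can be derived directly from the four coordinates. A minimal sketch in plain Python (the helper name `box_stats` is illustrative, not part of the d2l library):

```python
# Hypothetical helper: derive width, height, and center from the
# (upper-left x, upper-left y, lower-right x, lower-right y) format
def box_stats(bbox):
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1, ((x1 + x2) / 2, (y1 + y2) / 2)

dog_bbox, cat_bbox = [60, 45, 378, 516], [400, 112, 655, 493]
print(box_stats(dog_bbox))  # (318, 471, (219.0, 280.5))
```

Both boxes are wider than 250 pixels and taller than 350, which matches the two large animals visible in the image.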

We can draw the bounding boxes on the image to check that they are accurate. Before drawing, we define the helper function `bbox_to_rect`, which converts a bounding box to the rectangle format that `matplotlib` expects.

```{.python .input}
#@tab all
#@save
def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (top-left x, top-left y, bottom-right x,
    # bottom-right y) format to matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)
```

@@ -50,6 +71,7 @@ def bbox_to_rect(bbox, color):
After drawing the bounding boxes on the image, we can see that the main outlines of the targets are basically inside the boxes.

```{.python .input}
#@tab all
fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));
```

@@ -63,7 +85,6 @@ fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

1. Find some images and try to label a bounding box that contains the target. Compare the time it takes to label a bounding box with the time it takes to label a category.

:begin_tab:`mxnet`
[Discussions](https://discuss.d2l.ai/t/369)
:end_tab:
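The conversion performed by `bbox_to_rect` is invertible, which gives a cheap way to verify it without rendering anything. A minimal sketch with hypothetical helper names (not part of the diff or the d2l library):

```python
def corners_to_mpl(bbox):
    # (upper-left x, upper-left y, lower-right x, lower-right y)
    # -> matplotlib format: ((upper-left x, upper-left y), width, height)
    return (bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1]

def mpl_to_corners(xy, width, height):
    # Inverse: recover the corner format from ((x, y), width, height)
    return [xy[0], xy[1], xy[0] + width, xy[1] + height]

bbox = [400, 112, 655, 493]
assert mpl_to_corners(*corners_to_mpl(bbox)) == bbox
```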
11 changes: 11 additions & 0 deletions d2l/tensorflow.py
@@ -873,6 +873,17 @@ def train_concise_ch11(trainer_fn, hyperparams, data_iter, num_epochs=2):
    print(f'loss: {animator.Y[0][-1]:.3f}, {timer.avg():.3f} sec/epoch')


# Defined in file: ./chapter_computer-vision/bounding-box.md
def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (top-left x, top-left y, bottom-right x,
    # bottom-right y) format to matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)


# Alias defined in config.ini
size = lambda a: tf.size(a).numpy()

11 changes: 11 additions & 0 deletions d2l/torch.py
@@ -1208,6 +1208,17 @@ def init_weights(m):
    print(f'loss: {animator.Y[0][-1]:.3f}, {timer.avg():.3f} sec/epoch')


# Defined in file: ./chapter_computer-vision/bounding-box.md
def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (top-left x, top-left y, bottom-right x,
    # bottom-right y) format to matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)


# Alias defined in config.ini

