feat(demo): add multimodal hello-world by hanxiao · Pull Request #2002 · jina-ai/serve · GitHub
Merged 4 commits on Feb 21, 2021
Binary file added .github/images/helloworld-multimodal.gif
27 changes: 22 additions & 5 deletions README.md
@@ -49,6 +49,8 @@ Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/ma

## Jina "Hello, World!" 👋🌍

### Fashion Image Search

Just starting out? Try Jina's "Hello, World" - a simple image neural search demo for [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies needed, simply run:

<a href="https://docs.jina.ai/">
@@ -57,13 +59,13 @@ Just starting out? Try Jina's "Hello, World" - a simple image neural search demo


```bash
jina hello mnist # more options in --help
jina hello fashion # more options in --help
```

...or even easier for Docker users, **no install required**:

```bash
docker run -v "$(pwd)/j:/j" jinaai/jina hello mnist --workdir /j && open j/hello-world.html
docker run -v "$(pwd)/j:/j" jinaai/jina hello fashion --workdir /j && open j/hello-world.html
# replace "open" with "xdg-open" on Linux
```

@@ -85,14 +87,29 @@ This downloads the Fashion-MNIST training and test dataset and tells Jina to ind
<img align="right" width="25%" src="https://github.com/jina-ai/jina/blob/master/.github/images/helloworld-chatbot.gif?raw=true" />
</a>

For NLP engineers, we provide a simple chatbot demo for answering Covid-19 questions. You will need PyTorch and Transformers, which can be installed along with Jina:
For NLP engineers, we provide a simple chatbot demo for answering Covid-19 questions. To run it:
```bash
pip install "jina[torch,transformers]"
pip install "jina[chatbot]"
jina hello chatbot
```

This downloads the [CovidQA dataset](https://www.kaggle.com/xhlulu/covidqa) and tells Jina to index 418 question-answer pairs with DistilBERT. Indexing takes about 1 minute on a CPU. Then it opens a webpage where you can type in questions and ask Jina.


### Multimodal Document Search

<a href="https://youtu.be/B_nH8GCmBfc">
<img align="right" width="25%" src="https://github.com/jina-ai/jina/blob/master/.github/images/helloworld-multimodal.gif?raw=true" />
</a>

A multimodal document contains multiple data types at the same time; for example, a PDF document often contains both figures and text. We provide a minimal multimodal document search demo. To run it:
```bash
pip install "jina[multimodal]"
jina hello multimodal
```

This downloads the [people image dataset](https://www.kaggle.com/ahmadahmadzada/images2000) and tells Jina to index 2000 image-caption pairs with MobileNet and DistilBERT. Indexing takes about 3 minutes on a CPU. Then it opens a webpage where you can query multimodal documents. We have prepared [a YouTube tutorial](https://youtu.be/B_nH8GCmBfc) that walks you through this demo.
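To make "a document with several data types" concrete, here is a minimal plain-Python sketch of an image-caption pair. It does not use Jina's `Document` API; the `Chunk` and `MultimodalDocument` classes, the `modality` field and the example paths are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """One piece of a document, tagged with the kind of data it holds."""
    modality: str  # e.g. "image" or "text"
    content: str   # a caption string, or a path/URI for an image


@dataclass
class MultimodalDocument:
    """A document whose chunks may carry different data types."""
    doc_id: str
    chunks: List[Chunk] = field(default_factory=list)

    def by_modality(self, modality: str) -> List[Chunk]:
        # Selecting chunks by modality is what allows each kind of data
        # to be sent to its own encoder.
        return [c for c in self.chunks if c.modality == modality]


def make_pair(doc_id: str, image_uri: str, caption: str) -> MultimodalDocument:
    """Build an image-caption pair as a two-chunk document."""
    return MultimodalDocument(doc_id, [Chunk("image", image_uri), Chunk("text", caption)])


if __name__ == "__main__":
    doc = make_pair("0", "people-img/0.jpg", "a man riding a bike")
    print([c.content for c in doc.by_modality("text")])
```

In the demo described above, image chunks would go to MobileNet and text chunks to DistilBERT; how the demo's flow combines the two results is defined in its YAML files, which this page does not show.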

## Get Started

| | |
@@ -774,7 +791,7 @@ with f:
```


That is the essence behind `jina hello mnist`. It is merely a taste of what Jina can do. We’re really excited to see what you do with Jina! You can easily create a Jina project from templates with one terminal command:
That is the essence behind `jina hello fashion`. It is merely a taste of what Jina can do. We’re really excited to see what you do with Jina! You can easily create a Jina project from templates with one terminal command:

```bash
pip install jina[hub] && jina hub new --type app
6 changes: 4 additions & 2 deletions cli/api.py
@@ -81,12 +81,14 @@ def hello_world(args: 'Namespace'):


def hello(args: 'Namespace'):
    if args.hello == 'mnist':
    if args.hello == 'fashion':
        from jina.helloworld import hello_world
    elif args.hello == 'chatbot':
        from jina.helloworld.chatbot import hello_world
    elif args.hello == 'multimodal':
        from jina.helloworld.multimodal import hello_world
    else:
        raise ValueError(f'{args.hello} must be one of [`mnist`, `chatbot`]')
        raise ValueError(f'{args.hello} must be one of [`fashion`, `chatbot`, `multimodal`]')

    hello_world(args)
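The `if`/`elif` chain in `hello` can also be written as a lookup table, so the list in the error message is derived from the supported demos instead of being typed twice. This is a minimal sketch, not Jina's code: the `_fashion`, `_chatbot` and `_multimodal` stubs are hypothetical placeholders for the real `hello_world` imports, and `args` is any object with a `hello` attribute.

```python
import argparse
from typing import Callable, Dict


# Hypothetical stand-ins for the per-demo `hello_world` functions.
def _fashion(args: argparse.Namespace) -> str:
    return 'fashion demo'


def _chatbot(args: argparse.Namespace) -> str:
    return 'chatbot demo'


def _multimodal(args: argparse.Namespace) -> str:
    return 'multimodal demo'


HELLO_DEMOS: Dict[str, Callable[[argparse.Namespace], str]] = {
    'fashion': _fashion,
    'chatbot': _chatbot,
    'multimodal': _multimodal,
}


def hello(args: argparse.Namespace) -> str:
    try:
        demo = HELLO_DEMOS[args.hello]
    except KeyError:
        # Name the rejected value and build the choices from the table itself.
        choices = ', '.join(f'`{name}`' for name in HELLO_DEMOS)
        raise ValueError(f'{args.hello!r} must be one of [{choices}]') from None
    return demo(args)
```

One trade-off: the original imports each demo lazily, so `jina hello fashion` never imports PyTorch. A table can keep that property by storing small loader functions that perform the import only when called.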

10 changes: 6 additions & 4 deletions cli/autocomplete.py
@@ -34,12 +34,14 @@ def _gaa(key, parser):
ac_table = {
'commands': ['--help', '--version', '--version-full', 'hello', 'pod', 'flow', 'optimizer', 'gateway', 'ping',
'check', 'hub', 'pea', 'log', 'client', 'export-api', 'hello-world'], 'completions': {
'hello mnist': ['--help', '--workdir', '--download-proxy', '--shards', '--parallel', '--uses-index',
'--index-data-url', '--index-labels-url', '--index-request-size', '--uses-query',
'--query-data-url', '--query-labels-url', '--query-request-size', '--num-query', '--top-k'],
'hello fashion': ['--help', '--workdir', '--download-proxy', '--shards', '--parallel', '--uses-index',
'--index-data-url', '--index-labels-url', '--index-request-size', '--uses-query',
'--query-data-url', '--query-labels-url', '--query-request-size', '--num-query', '--top-k'],
'hello chatbot': ['--help', '--workdir', '--download-proxy', '--uses', '--index-data-url', '--demo-url',
'--port-expose', '--parallel', '--unblock-query-flow'],
'hello': ['--help', 'mnist', 'chatbot'],
'hello multimodal': ['--help', '--workdir', '--download-proxy', '--uses', '--index-data-url', '--demo-url',
'--port-expose', '--unblock-query-flow'],
'hello': ['--help', 'fashion', 'chatbot', 'multimodal'],
'pod': ['--help', '--name', '--log-config', '--identity', '--hide-exc-info', '--port-ctrl', '--ctrl-with-ipc',
'--timeout-ctrl', '--ssh-server', '--ssh-keyfile', '--ssh-password', '--uses', '--py-modules',
'--port-in', '--port-out', '--host-in', '--host-out', '--socket-in', '--socket-out', '--dump-interval',
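The `hello` entry in `ac_table` is a hand-maintained copy of the sub-command names, which is why this PR edits it alongside the parser. A small check can catch the two drifting apart. The sketch below rebuilds only the `hello` sub-parsers with plain `argparse` as a stand-in for Jina's real parser (in a real test you would pass Jina's own parser), and reads the names from the `choices` mapping of the object returned by `add_subparsers()`. `AC_HELLO` is copied from the new completion entry; the function names are hypothetical.

```python
import argparse
from typing import Iterable, List, Tuple


def hello_subcommands() -> List[str]:
    """Stand-in for the `hello` sub-commands defined in jina/parsers/helloworld.py."""
    parser = argparse.ArgumentParser(prog='jina hello')
    spp = parser.add_subparsers(dest='hello')
    for name in ('fashion', 'chatbot', 'multimodal'):
        spp.add_parser(name)
    # `choices` maps each sub-command name to its sub-parser.
    return sorted(spp.choices)


# Copied from the new `ac_table['completions']['hello']` entry.
AC_HELLO = ['--help', 'fashion', 'chatbot', 'multimodal']


def check_completions(ac_entry: Iterable[str],
                      subcommands: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (sub-commands missing from completions, stale completion entries)."""
    listed = {item for item in ac_entry if not item.startswith('-')}
    expected = set(subcommands)
    return sorted(expected - listed), sorted(listed - expected)
```

Run against the old entry `['--help', 'mnist', 'chatbot']`, the check reports `fashion` and `multimodal` as missing and `mnist` as stale, which is exactly the drift this PR fixes by hand.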
34 changes: 17 additions & 17 deletions extra-requirements.txt
@@ -17,22 +17,22 @@
# 4. Tag `all` is reserved for representing all packages

scipy>=1.4.1: index, numeric, cicd
fastapi: devel, cicd, http, test, daemon
uvicorn>=0.12.1: devel, cicd, http, test, daemon
fluent-logger: logging, http, sse, dashboard, devel, cicd, test, daemon
fastapi: devel, cicd, http, test, daemon, chatbot, multimodal
uvicorn>=0.12.1: devel, cicd, http, test, daemon, chatbot, multimodal
fluent-logger: logging, http, sse, dashboard, devel, cicd, daemon
nmslib>=1.6.3: index
docker: devel, cicd, network, hub, test, daemon
torch>=1.1.0: framework, cicd
transformers>=2.6.0: nlp, cicd
docker: devel, cicd, network, hub, daemon
torch>=1.1.0: framework, cicd, chatbot, multimodal
transformers>=2.6.0: nlp, cicd, chatbot, multimodal
flair: nlp
paddlepaddle: framework, py37
paddlehub: framework, py37
tensorflow>=2.0: framework, cicd
tensorflow-hub: framework, py37
torchvision>=0.3.0: framework, cv
torchvision>=0.3.0: framework, cv, multimodal, cicd
onnx: framework, py37
onnxruntime: framework, py37
Pillow: cv, cicd, test
Pillow: cv, cicd, multimodal
annoy>=1.9.5: index
sklearn: numeric
plyvel: index
@@ -61,14 +61,14 @@ pytest-repeat: test
pytest-asyncio: test
flaky: test
mock: test
requests: http, devel, test, daemon
prettytable: devel, test
requests: http, devel, cicd, daemon
prettytable: devel, cicd
optuna: cicd, optimizer
websockets: http, devel, test, ws, daemon
wsproto: http, devel, test, ws, daemon
pydantic: http, devel, test, daemon
python-multipart: http, devel, test, daemon
aiofiles: devel, cicd, http, test, daemon
websockets: http, devel, cicd, ws, daemon
wsproto: http, devel, cicd, ws, daemon
pydantic: http, devel, cicd, daemon
python-multipart: http, devel, cicd, daemon
aiofiles: devel, cicd, http, daemon
pytest-custom_exit_code: cicd, test
bs4: test
aiostream: devel, cicd, test
bs4: cicd
aiostream: devel, cicd
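Each line of `extra-requirements.txt` reads `requirement: tag, tag, ...`, and the README's `pip install "jina[chatbot]"` and `pip install "jina[multimodal]"` rely on these tags becoming pip extras. The code that actually reads this file is not part of this diff, so the function below is only a sketch of how such a file could map to extras. It treats `all` as the union of every requirement, following rule 4 in the file header; `parse_extras` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_extras(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each tag to the requirements that carry it.

    Expects lines such as ``torch>=1.1.0: framework, cicd``; blank lines
    and ``#`` comments are skipped, and ``all`` collects every requirement.
    """
    extras: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        # Split on the last colon so a requirement spec may itself contain colons.
        requirement, sep, tags = line.rpartition(':')
        if not sep:
            raise ValueError(f'expected "requirement: tags", got {raw!r}')
        requirement = requirement.strip()
        for tag in (t.strip() for t in tags.split(',')):
            if tag:
                extras[tag].append(requirement)
        extras['all'].append(requirement)
    return dict(extras)
```

Under this reading of the new lines, the `multimodal` extra pulls in fastapi, uvicorn, torch, transformers, torchvision and Pillow.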
2 changes: 1 addition & 1 deletion jina/helloworld/__init__.py
@@ -53,7 +53,7 @@ def hello_world(args):
download_data(targets, args.download_proxy)

# these env vars are referenced in the index and query flow YAMLs
os.environ['RESOURCE_DIR'] = resource_filename('jina', 'resources')
os.environ['PATH'] += os.pathsep + resource_filename('jina', 'resources')
os.environ['SHARDS'] = str(args.shards)
os.environ['PARALLEL'] = str(args.parallel)
os.environ['HW_WORKDIR'] = args.workdir
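This hunk stops exporting `RESOURCE_DIR` and appends the resources directory to `PATH` instead, and the flow YAMLs in this PR now refer to `helloworld.encoder.yml` by bare name. The multimodal demo does the same with `flow-index.yml`. That only works if the config loader searches `PATH` for bare file names. The sketch below shows such a lookup under that assumption; `find_config` is a hypothetical helper, not Jina's resolver, and checking the current directory first is also an assumption.

```python
import os
from typing import Optional


def find_config(name: str, env_var: str = 'PATH') -> Optional[str]:
    """Return the first existing file called `name`, looking in the current
    directory first and then in each directory listed in `env_var`."""
    if os.path.isfile(name):
        return os.path.abspath(name)
    for directory in os.environ.get(env_var, '').split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a leading separator
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Appending to `PATH` (rather than prepending) means that any same-named file in a directory listed earlier wins, so the bundled YAMLs act as a fallback.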
4 changes: 2 additions & 2 deletions jina/helloworld/chatbot/__init__.py
@@ -51,7 +51,7 @@ def hello_world(args):
except:
pass # intentional pass, browser support isn't cross-platform
finally:
default_logger.success(f'You should see a chatbot page opened in your browser, '
f'if not you may open {args.demo_url} manually')
default_logger.success(f'You should see a demo page opened in your browser, '
f'if not, you may open {args.demo_url} manually')
if not args.unblock_query_flow:
f.block()
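Both the chatbot demo and the new multimodal demo try to open the demo page and always print its URL as a fallback, because browser support isn't cross-platform. A sketch of that pattern as a reusable helper follows. `open_demo_page` is hypothetical; it also checks the return value of `webbrowser.open` (documented as `False` when no browser could be launched) and catches `Exception` instead of using a bare `except:`, so Ctrl-C still interrupts.

```python
import webbrowser
from typing import Callable


def open_demo_page(url: str, opener: Callable[..., bool] = webbrowser.open) -> bool:
    """Try to open `url` in a new browser tab; always tell the user where the page is."""
    try:
        # new=2 asks for a new tab where the browser supports it.
        opened = bool(opener(url, new=2))
    except Exception:
        # Some setups raise instead of returning False.
        opened = False
    if opened:
        print(f'You should see a demo page opened in your browser, '
              f'if not, you may open {url} manually')
    else:
        print(f'Could not open a browser, please open {url} manually')
    return opened
```

Passing `opener` as a parameter keeps the helper testable without launching a real browser.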
2 changes: 1 addition & 1 deletion jina/helloworld/helper.py
@@ -112,7 +112,7 @@ def write_html(html_path):

colored_url = colored('https://opensource.jina.ai', color='cyan', attrs='underline')
default_logger.success(
f'🤩 Intrigued? Play with "jina hello mnist --help" and learn more about Jina at {colored_url}')
f'🤩 Intrigued? Play with "jina hello fashion --help" and learn more about Jina at {colored_url}')


def download_data(targets, download_proxy=None, task_name='download fashion-mnist'):
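The new demo reuses `download_data(targets, download_proxy=None, task_name=...)` with a `targets` mapping of the form `{name: {'url': ..., 'filename': ...}}`. The body of `download_data` is not part of this diff, so the function below is only a standard-library sketch of what a downloader with that input could do, including routing requests through an optional proxy. The name `download_targets`, the skip-if-present behaviour and the `.part` temporary file are assumptions of this sketch.

```python
import os
import urllib.request
from typing import Dict, Optional


def download_targets(targets: Dict[str, Dict[str, str]],
                     download_proxy: Optional[str] = None) -> None:
    """Fetch each target's `url` into its `filename`, skipping files that already exist."""
    handlers = []
    if download_proxy:
        # Route both HTTP and HTTPS through the same proxy.
        handlers.append(urllib.request.ProxyHandler(
            {'http': download_proxy, 'https': download_proxy}))
    opener = urllib.request.build_opener(*handlers)
    for target in targets.values():
        filename = target['filename']
        if os.path.exists(filename):
            continue  # this sketch treats an existing file as already downloaded
        os.makedirs(os.path.dirname(filename) or '.', exist_ok=True)
        tmp = filename + '.part'
        with opener.open(target['url']) as response, open(tmp, 'wb') as out:
            while True:
                chunk = response.read(1 << 16)
                if not chunk:
                    break
                out.write(chunk)
        # Rename only after a complete download, so an interrupted run leaves no half file.
        os.replace(tmp, filename)
```

`build_opener` keeps the default handlers, so `file://` URLs also work, which is handy for testing offline.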
61 changes: 61 additions & 0 deletions jina/helloworld/multimodal/__init__.py
@@ -0,0 +1,61 @@
import os
import webbrowser
from pathlib import Path

from pkg_resources import resource_filename

from .. import download_data
from ... import Flow
from ...importer import ImportExtensions
from ...logging import default_logger


def hello_world(args):
    Path(args.workdir).mkdir(parents=True, exist_ok=True)

    with ImportExtensions(required=True, help_text='this demo requires PyTorch and Transformers to be installed, '
                                                   'if you haven\'t, please do `pip install "jina[multimodal]"`'):
        import transformers, torch
        assert [torch, transformers]  #: prevent PyCharm from auto-removing the import above

    targets = {
        'people-img': {
            'url': args.index_data_url,
            'filename': os.path.join(args.workdir, 'dataset.zip')
        }
    }

    # download the data
    download_data(targets, args.download_proxy, task_name='download zip data')
    import zipfile
    with zipfile.ZipFile(targets['people-img']['filename'], 'r') as fp:
        fp.extractall(args.workdir)

    # these env vars are referenced in the index and query flow YAMLs
    os.environ['HW_WORKDIR'] = args.workdir
    os.environ['PATH'] += os.pathsep + os.path.join(resource_filename('jina', 'resources'), 'multimodal')

    # now comes the real work
    # load the index flow from a YAML file and index it!
    f = Flow.load_config('flow-index.yml')
    with f, open(f'{args.workdir}/people-img/meta.csv') as fp:
        f.index_csv(fp)

    # load the query flow and search it!
    f = Flow.load_config('flow-query.yml')
    # switch to REST gateway
    f.use_rest_gateway(args.port_expose)

    with f:
        try:
            webbrowser.open(args.demo_url, new=2)
        except Exception:
            pass  # intentional pass, browser support isn't cross-platform
        finally:
            default_logger.success(f'You should see a demo page opened in your browser, '
                                   f'if not, you may open {args.demo_url} manually')
        if not args.unblock_query_flow:
            f.block()
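After downloading, the multimodal demo extracts the archive into `args.workdir` with `ZipFile.extractall` and then opens `people-img/meta.csv` from it. The sketch below wraps that step and adds two checks that are not in the PR: it validates the archive with `ZipFile.testzip()` and confirms the expected file exists after extraction, so a bad download fails with a clear message instead of a later `open()` error. `extract_dataset` is a hypothetical helper name.

```python
import os
import zipfile


def extract_dataset(zip_path: str, workdir: str, expected: str) -> str:
    """Extract `zip_path` into `workdir` and return the path of `expected`.

    `expected` is a path relative to `workdir` that the archive must provide,
    e.g. 'people-img/meta.csv' in the multimodal demo.
    """
    target = os.path.join(workdir, expected)
    if os.path.isfile(target):
        return target  # already extracted on a previous run
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f'corrupt member in {zip_path}: {bad}')
        archive.extractall(workdir)
    if not os.path.isfile(target):
        raise FileNotFoundError(f'{expected} not found after extracting {zip_path}')
    return target
```

Skipping extraction when the expected file is already present makes repeated runs of the demo cheaper; delete the workdir to force a fresh extract.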
2 changes: 1 addition & 1 deletion jina/hub
Submodule hub updated 167 files
69 changes: 62 additions & 7 deletions jina/parsers/helloworld.py
@@ -32,18 +32,45 @@ def set_hello_parser(parser=None):
'to get detailed information about each sub-command', required=True)

set_hw_parser(
spp.add_parser('mnist',
help='Start a simple end2end fashion images index & search demo without any extra dependencies.',
spp.add_parser('fashion',
help='Start a simple end2end fashion images index & search demo. '
'This demo requires no extra dependencies.',
description='Run a fashion search demo',
formatter_class=_chf))

set_hw_chatbot_parser(
spp.add_parser('chatbot',
help='Start a simple Covid-19 chatbot. Pytorch and transformers are '
'required to run this demo',
help='''
Start a simple Covid-19 chatbot.

Remarks:

- Pytorch, transformers & FastAPI are required to run this demo. To install all dependencies, use

pip install "jina[chatbot]"

- The indexing could take 1~2 minutes on a CPU machine.
''',
description='Run a chatbot QA demo',
formatter_class=_chf))

set_hw_multimodal_parser(
spp.add_parser('multimodal',
help='''
Start a simple multimodal document search.

Remarks:

- Pytorch, torchvision, transformers & FastAPI are required to run this demo. To install all dependencies, use

pip install "jina[multimodal]"

- The indexing could take 2~3 minutes on a CPU machine.
- Downloading the dataset could take ~1 minute depending on your network.
''',
description='Run a multimodal search demo',
formatter_class=_chf))


def set_hw_parser(parser=None):
"""Set the hello world parser
@@ -112,11 +139,11 @@ def set_hw_chatbot_parser(parser=None):
default=resource_filename('jina', '/'.join(('resources', 'helloworld.flow.index.yml'))),
help='The yaml path of the index flow')
parser.add_argument('--index-data-url', type=str,
default='https://api.jina.ai/demo/chatbot/dataset.csv',
default='https://static.jina.ai/chatbot/dataset.csv',
help='The url of index csv data')
parser.add_argument('--demo-url', type=str,
default='https://api.jina.ai/demo/chatbot/',
help='The url of chatbot demo page')
default='https://static.jina.ai/chatbot/',
help='The url of the demo page')
parser.add_argument('--port-expose',
type=int,
default=8080,
@@ -127,3 +154,31 @@ def set_hw_chatbot_parser(parser=None):
parser.add_argument('--unblock-query-flow', action='store_true', default=False,
help='Do not block the query flow' if _SHOW_ALL_ARGS else argparse.SUPPRESS)
return parser


def set_hw_multimodal_parser(parser=None):
    """Set the parser for the multimodal hello world

    :param parser: the parser to configure
    :return: the new parser
    """
    if not parser:
        parser = set_base_parser()

    mixin_hw_base_parser(parser)
    parser.add_argument('--uses', type=str,
                        default=resource_filename('jina', '/'.join(('resources', 'multimodal', 'flow-index.yml'))),
                        help='The yaml path of the index flow')
    parser.add_argument('--index-data-url', type=str,
                        default='https://static.jina.ai/multimodal/people-img.zip',
                        help='The url of index zip data')
    parser.add_argument('--demo-url', type=str,
                        default='https://static.jina.ai/multimodal/',
                        help='The url of the demo page')
    parser.add_argument('--port-expose',
                        type=int,
                        default=8080,
                        help='The port of the host exposed to the public')
    parser.add_argument('--unblock-query-flow', action='store_true', default=False,
                        help='Do not block the query flow' if _SHOW_ALL_ARGS else argparse.SUPPRESS)
    return parser
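`set_hw_multimodal_parser` follows the usual `argparse` pattern: every option has a default, so `jina hello multimodal` runs with no flags, and `--unblock-query-flow` is hidden from `--help` through `argparse.SUPPRESS` unless `_SHOW_ALL_ARGS` is set, while still being accepted on the command line. The stand-alone sketch below shows that behaviour. It replaces `mixin_hw_base_parser` with `--workdir` and `--download-proxy` (two of the flags listed for `hello multimodal` in `cli/autocomplete.py`) using hypothetical defaults, and it omits `--uses` to avoid `resource_filename`.

```python
import argparse

SHOW_ALL_ARGS = False  # stand-in for Jina's _SHOW_ALL_ARGS switch


def build_multimodal_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='jina hello multimodal')
    # Hypothetical stand-ins for the options the base mixin would add.
    parser.add_argument('--workdir', type=str, default='./workdir',
                        help='The workdir for the demo')
    parser.add_argument('--download-proxy', type=str,
                        help='The proxy used when downloading data')
    parser.add_argument('--index-data-url', type=str,
                        default='https://static.jina.ai/multimodal/people-img.zip',
                        help='The url of index zip data')
    parser.add_argument('--demo-url', type=str,
                        default='https://static.jina.ai/multimodal/',
                        help='The url of the demo page')
    parser.add_argument('--port-expose', type=int, default=8080,
                        help='The port of the host exposed to the public')
    # Accepted on the command line, but left out of --help unless SHOW_ALL_ARGS is True.
    parser.add_argument('--unblock-query-flow', action='store_true', default=False,
                        help='Do not block the query flow' if SHOW_ALL_ARGS else argparse.SUPPRESS)
    return parser
```

For example, `build_multimodal_parser().parse_args([])` yields `port_expose=8080` and `unblock_query_flow=False`, while `--unblock-query-flow` never appears in the generated help text.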
4 changes: 2 additions & 2 deletions jina/resources/helloworld.flow.index.yml
@@ -4,8 +4,8 @@ with:
compress_hwm: 1024
pods:
- name: encode
uses: $RESOURCE_DIR/helloworld.encoder.yml
uses: helloworld.encoder.yml
parallel: $PARALLEL
- name: index
uses: $RESOURCE_DIR/helloworld.indexer.yml
uses: helloworld.indexer.yml
shards: $SHARDS
6 changes: 3 additions & 3 deletions jina/resources/helloworld.flow.query.yml
@@ -5,13 +5,13 @@ with:
compress_hwm: 1024
pods:
- name: encode
uses: $RESOURCE_DIR/helloworld.encoder.yml
uses: helloworld.encoder.yml
parallel: $PARALLEL
- name: index
uses: $RESOURCE_DIR/helloworld.indexer.yml
uses: helloworld.indexer.yml
shards: $SHARDS
polling: all
uses_after: $RESOURCE_DIR/helloworld.reduce.yml
uses_after: helloworld.reduce.yml
timeout_ready: 100000 # larger timeout as in query time will read all the data
- name: evaluate # optional evaluation, do another step for precision/recall computing
uses: _eval_pr # use internal evaluator on precision & recall
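The flow YAMLs still read `$PARALLEL` and `$SHARDS`, which `hello_world` exports to `os.environ` as strings before loading them. How Jina's YAML loader performs that substitution is not shown in this diff. As a rough illustration only, the standard library's `os.path.expandvars` does the same kind of `$NAME` replacement on a YAML string; `render_flow_yaml` and `FLOW_SNIPPET` are hypothetical, and this is not Jina's loader.

```python
import os

# Trimmed copy of the pods section from helloworld.flow.index.yml.
FLOW_SNIPPET = """\
pods:
  - name: encode
    uses: helloworld.encoder.yml
    parallel: $PARALLEL
  - name: index
    uses: helloworld.indexer.yml
    shards: $SHARDS
"""


def render_flow_yaml(text: str, shards: int, parallel: int) -> str:
    """Export the values the way hello_world() does, then expand $NAME references."""
    os.environ['SHARDS'] = str(shards)
    os.environ['PARALLEL'] = str(parallel)
    # expandvars leaves references to unset variables unchanged.
    return os.path.expandvars(text)
```

Because unset variables pass through untouched, a typo such as `$SHARD` would survive into the YAML as a literal string, so a loader that fails loudly on unresolved variables can be easier to debug.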