ImageScraper :page_with_curl:
A cool command-line tool that downloads images from a given webpage.
Demo
Click here to see it in action!
Download
tar file:
Grab the latest stable build from PyPI: https://pypi.python.org/pypi/ImageScraper
pip install (recommended):
You can also install it using pip:
$ pip install ImageScraper
Dependencies
Note that ImageScraper depends on lxml and requests.
If lxml fails to compile when installing through pip, install the libxml2-dev and libxslt-dev packages on your system.
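Before installing, it can help to check whether the two dependencies are already importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of ImageScraper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # lxml and requests are the dependencies named in this README.
    missing = missing_modules(["lxml", "requests"])
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("lxml and requests are both importable.")
```

If lxml is reported missing and pip cannot build it, the system packages mentioned above are the first thing to try.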
Usage
$ image-scraper [OPTIONS] URL
You can also use it in your Python scripts.
import image_scraper
image_scraper.scrape_images(URL)
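As a rough sketch of scripted use, the call above can be wrapped in a loop over several pages. The helper `scrape_many` and its `scrape` parameter are hypothetical; only `image_scraper.scrape_images(URL)` comes from this README, and its return value and the exceptions it may raise are not documented here, so the sketch ignores the former and catches the latter broadly.

```python
def scrape_many(urls, scrape=None):
    """Call `scrape` on each URL and collect failures instead of stopping.

    Returns a dict mapping each failed URL to the exception it raised.
    """
    if scrape is None:
        # Imported lazily so the helper can be tested without the package.
        import image_scraper  # requires `pip install ImageScraper`
        scrape = image_scraper.scrape_images

    failures = {}
    for url in urls:
        try:
            scrape(url)
        except Exception as exc:  # assumption: exception types are unknown
            failures[url] = exc
    return failures
```

For example, `scrape_many(["ananth.co.in/test.html"])` would scrape the test page used in the examples in this README and return an empty dict if nothing failed.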
Options
-h, --help Print help
-m, --max-images <number> Maximum number of images to be scraped
-s, --save-dir <path> Name of the folder to save the images
-g, --injected Scrape injected images
--formats [ [FORMATS ..]] Specify the formats of images to be scraped
--max-filesize <size> Limit on size of image in bytes (default: 100000000)
--dump-urls Print the URLs of the images
--scrape-reverse Scrape the images in reverse order
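To make the effect of `--formats`, `--max-images`, and `--scrape-reverse` concrete, here is a small sketch of how such options could narrow a list of image URLs. It is an illustration only, not ImageScraper's implementation: the function names are hypothetical, and the order in which the steps are applied (filter, then reverse, then truncate) is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def extension(url):
    """Return the lowercase file extension of a URL path, without the dot."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower().lstrip(".")


def select_urls(urls, formats=None, max_images=None, reverse=False):
    """Filter by format, optionally reverse, then keep at most max_images."""
    selected = list(urls)
    if formats:
        wanted = {f.lower().lstrip(".") for f in formats}
        selected = [u for u in selected if extension(u) in wanted]
    if reverse:
        selected.reverse()  # assumption: reversal happens before truncation
    if max_images is not None:
        selected = selected[:max_images]
    return selected
```

With this sketch, `select_urls(urls, formats=["gif"], max_images=2)` keeps at most the first two GIF URLs in the list.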
If you downloaded the tar:
Extract the contents of the tar file.
$ cd ImageScraper/
$ python setup.py install
$ image-scraper --max-images 10 [url to scrape]
Examples
Scrape all images
$ image-scraper ananth.co.in/test.html
Scrape at most 2 images
$ image-scraper -m 2 ananth.co.in/test.html
Scrape only gifs and download to folder ./mygifs
$ image-scraper -s mygifs ananth.co.in/test.html --formats gif
NOTE:
By default, a new folder called "images_
Issues
Q.) Not all images were downloaded?
The content may have been injected into the page via JavaScript, and this scraper doesn't run JavaScript.
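The sketch below illustrates why this happens. It parses a page's static HTML with Python's standard `html.parser` (not the lxml-based code ImageScraper itself depends on) and collects `<img>` sources. An image that a script would add at runtime never appears as a tag in the static markup, so it is not found. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def static_image_sources(html):
    """Return image sources visible without executing any JavaScript."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


PAGE = """
<html><body>
  <img src="static.png">
  <script>
    var img = document.createElement("img");
    img.src = "injected.png";
    document.body.appendChild(img);
  </script>
</body></html>
"""

print(static_image_sources(PAGE))  # prints ['static.png']
```

Only `static.png` is found, because `injected.png` exists only inside script text. The `-g, --injected` option listed under Options is the tool's own switch for injected images.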
Contribute
If you want to add features, improve existing ones, or report issues, feel free to send a pull request!