Create instagram_crawler.py #2508
yogeshwaran01 wants to merge 4 commits into TheAlgorithms:master from yogeshwaran01:patch-3
It crawls the Instagram page of the user and scrapes the data.
| from bs4 import BeautifulSoup | ||
| import json | ||
|
|
||
| headers = \ |
No backslashes please in Python code. See PEP8. The problem with backslashes is that whitespace to the right of the backslash breaks the script on a change that is invisible to the reader.
You can run your code through psf/black to autofix that issue as discussed in CONTRIBUTING.md
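A minimal sketch of the backslash-free alternative: Python continues a line implicitly inside parentheses, brackets, and braces, so no trailing backslash is needed. The shortened User-Agent value and the `total` expression are illustrative assumptions, not code from this PR.

```python
# Implicit continuation inside braces: no backslash after "headers =".
headers = {
    "User-Agent": (
        # Adjacent string literals are concatenated by the parser.
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) "
        "AppleWebKit/537.36 (KHTML, like Gecko)"
    )
}

# Long expressions can be wrapped in parentheses the same way.
total = (
    1
    + 2
    + 3
)
```

Trailing whitespace inside the brackets is harmless here, which is the property the backslash form lacks.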
|
|
||
| def __init__(self, username): | ||
| self.username = username | ||
| self.url = 'https://www.instagram.com/{}/'.format(username) |
| self.url = 'https://www.instagram.com/{}/'.format(username) | |
| self.url = f'https://www.instagram.com/{username}/' |
As discussed in CONTRIBUTING.md, please use f-strings where they make sense.
Do we really want to query the backend (Instagram) for every one of the fields below? Most of these values are very static and are unlikely to change minute-to-minute. Perhaps it would be better to have a self.user_data dict that contained the results of self.get_json() and the other methods could use that data.
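One way this could look, as a rough sketch rather than the PR's actual code: fetch once in `__init__`, store the result in `self.user_data`, and have the accessors read from it. The class name `InstagramUser`, the injected `fetch` callable, and `fake_fetch` are hypothetical stand-ins so the example runs without touching Instagram.

```python
from typing import Callable


class InstagramUser:
    """Sketch: query the backend once and let every accessor reuse the result."""

    def __init__(self, username: str, fetch: Callable[[str], dict]) -> None:
        self.username = username
        self.url = f"https://www.instagram.com/{username}/"
        # Single backend query; `fetch` is a hypothetical stand-in for
        # whatever self.get_json() does in the PR.
        self.user_data = fetch(self.url)

    def get_followers(self) -> int:
        return self.user_data["edge_followed_by"]["count"]

    def get_followings(self) -> int:
        return self.user_data["edge_follow"]["count"]


calls = []


def fake_fetch(url: str) -> dict:
    # Records each call so we can see how often the backend would be hit.
    calls.append(url)
    return {"edge_followed_by": {"count": 42}, "edge_follow": {"count": 7}}


user = InstagramUser("github", fake_fetch)
user.get_followers()
user.get_followings()
print(len(calls))  # prints 1: both accessors reused the cached dict
```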
| info = html_1(soup) | ||
| return info | ||
| except: | ||
| info = html_2(soup) | ||
| return info |
| info = html_1(soup) | |
| return info | |
| except: | |
| info = html_2(soup) | |
| return info | |
| return html_1(soup) | |
| except: # <-- This repo does not accept bare excepts | |
| return html_2(soup) |
Bare excepts are discussed in PEP8 and https://realpython.com/the-most-diabolical-python-antipattern/
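A possible shape for the fix, sketched under assumptions: `parse_primary` and `parse_fallback` are hypothetical stand-ins for `html_1` and `html_2`, and the exception types named here are only a guess at what the primary parser can raise. The point is that the `except` clause lists specific exceptions, so `KeyboardInterrupt`, `SystemExit`, and unrelated bugs are not swallowed.

```python
import json


def parse_primary(text: str) -> dict:
    # Hypothetical stand-in for html_1: expects a JSON object with a "user" key.
    return json.loads(text)["user"]


def parse_fallback(text: str) -> dict:
    # Hypothetical stand-in for html_2: returns an empty profile.
    return {}


def get_json(text: str) -> dict:
    try:
        return parse_primary(text)
    except (json.JSONDecodeError, KeyError):
        # Only the failures expected from the primary parser are caught;
        # anything else still propagates to the caller.
        return parse_fallback(text)
```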
| info = html_2(soup) | ||
| return info | ||
|
|
||
| def get_followers(self): |
Let's make this (and similar methods below) into a @property so that we can use the syntax instagram_user.number_of_followers (without the ()).
| def get_followers(self): | |
| @property | |
| def number_of_followers(self) -> int: |
Also, let's streamline to a one-line implementation like:
return self.get_json()['edge_followed_by']['count']
# or...
return self.data['edge_followed_by']['count']
In general, do not create a variable that you get rid of on the very next line unless the variable name really helps the reader understand something nonobvious.
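Putting the two suggestions above together, a minimal sketch could look like this. The class name `InstagramUser` and the way `data` is passed in are assumptions made so the example is self-contained.

```python
class InstagramUser:
    def __init__(self, data: dict) -> None:
        # `data` stands in for the parsed profile JSON (assumed shape).
        self.data = data

    @property
    def number_of_followers(self) -> int:
        # One-line implementation: no throwaway intermediate variable.
        return self.data["edge_followed_by"]["count"]


instagram_user = InstagramUser({"edge_followed_by": {"count": 7}})
print(instagram_user.number_of_followers)  # attribute access, no (); prints 7
```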
| followers = info['edge_followed_by']['count'] | ||
| return followers | ||
|
|
||
| def get_followings(self): |
| def get_followings(self): | |
| def get_number_of_followings(self) -> int: |
| following = info['edge_follow']['count'] | ||
| return following | ||
|
|
||
| def get_posts(self): |
| def get_posts(self): | |
| def get_number_of_posts(self) -> int: |
| posts = info['edge_owner_to_timeline_media']['count'] | ||
| return posts | ||
|
|
||
| def get_biography(self): |
| def get_biography(self): | |
| def get_biography(self) -> str: |
| user = Instagram('github') | ||
| print(user.is_verified()) | ||
| print(user.get_biography()) |
Code that is at global scope will be run by our Travis CI / pytest process as discussed in CONTRIBUTING.md.
| user = Instagram('github') | |
| print(user.is_verified()) | |
| print(user.get_biography()) | |
| if __name__ == '__main__': | |
| user = Instagram('github') | |
| print(f"{user.is_verified() = }") |
| print(f"{user.get_biography() = }") |
Co-authored-by: Christian Clauss <cclauss@me.com>
yogeshwaran01 left a comment
Some changes have been made as suggested by @cclauss.
| { | ||
| 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'} | ||
| headers = { | ||
| "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" |
| "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" | |
| "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) " | |
| "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" |
|
Hey @yogeshwaran01, TravisCI finished with status TravisBuddy Request Identifier: 303e78f0-024d-11eb-aba2-872ffb2742c8 |
Closing in favor of #2509
An algorithm that crawls the Instagram page of the user and scrapes the data.