A central repo for code, shared resources, workflows and roadmaps. Check the Projects section for project management plans.
- Project proposal (In development)
- Scrape data from all sites mentioned below. (In development)
- Clean the data and remove unnecessary variables and noise. (In development)
- Perform EDA (exploratory data analysis) to find the relationships between variables. (Pending)
- Build a model to predict an objective price for a house based on its parameters. (Pending)
- Deploy the model using Flask (API), HTML, CSS, JS and PHP/Laravel (backend). (Pending)
- Documentation (Pending)
- major-project
- csv_files
- raw (Raw unclean data after scraping)
- links (Links to be scraped)
- clean (Cleaned data)
- jupyter_notebooks (Notebooks to clean data and develop the model)
- scraping_files
- Housing price prediction from Towards Data Science
- Coursera linear regression - Only good for conceptual/theoretical learning. Uses outdated libraries.
- Vectorization - Important for understanding NumPy and pandas.
- Gradient descent - Important for understanding linear regression.
- Real estate prediction - For a brief overview of how the project will work.
- First-class functions - How Python functions can be treated as objects.
- For NumPy, pandas and Matplotlib
- Google's machine learning course
- pip - Python package manager
- Setting up a Python virtual environment in VSCode
- Data Science Handbook
- Mathematics for Machine Learning (book)
- Google Colab - Online GPU for extra power.
- Creating and importing a Python module (file) and using its functions
- Namespace of functions in a Python module
- How functions from other Python modules are called
- Python data types: lists, tuples, dictionaries and sets
- for loops and the map function
- Method call from a library vs function call, i.e. how obj.method() works
- Lambda functions/closures/anonymous functions
- Python virtual environments
- Basics of object-oriented Python. Python class constructors. How a method gets access to the object it is called on and how it manipulates that object.
- What is Anaconda?
- How pip works
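The Python topics above (first-class functions, `map`, lambdas, closures, constructors and how `obj.method()` binds the object) fit into one small sketch. The `House` class and the helper names are hypothetical, made up only for illustration:

```python
def square(x):
    """Plain function; like every Python function, it is an object."""
    return x * x


def make_multiplier(factor):
    """Return a closure that remembers `factor` from the enclosing scope."""
    def multiply(x):
        return x * factor
    return multiply


class House:
    """Toy class (hypothetical) showing how a method receives its object."""

    def __init__(self, price, area):
        # The constructor stores attributes on the new instance (self).
        self.price = price
        self.area = area

    def price_per_unit(self):
        # `self` is the object the method was called on.
        return self.price / self.area


# Functions can be assigned to variables and passed to other functions.
f = square
squares = list(map(f, [1, 2, 3]))

# A lambda is an anonymous, single-expression function.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))

# The closure keeps factor=3 alive after make_multiplier has returned.
triple = make_multiplier(3)
tripled = triple(5)

h = House(price=100, area=4)
# obj.method() is shorthand for Class.method(obj).
assert h.price_per_unit() == House.price_per_unit(h)
```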
- Correlation between variables and multicollinearity.
- Linear/multiple/logistic regression
- Squared error of the regression line
- p-value, level of significance, null and alternative hypotheses.
- Contour plot
- Fitting a line using the least squares method.
- Vectors, partial derivatives, the gradient of a function and gradient descent.
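A minimal sketch of fitting a line two ways, assuming a single feature and made-up data: the closed-form least squares solution (slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²), and gradient descent on the mean squared error using its partial derivatives. Both should land on (nearly) the same slope and intercept:

```python
def least_squares(xs, ys):
    """Closed-form slope and intercept minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def gradient_descent(xs, ys, lr=0.01, steps=10_000):
    """Minimize MSE = (1/n) * sum((m*x + b - y)^2) by stepping against its gradient."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to m and b.
        grad_m = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up points, roughly y = 2x + 1
```

With `lr=0.01` and 10,000 steps the descent converges on this toy data; a larger learning rate or unscaled features (e.g. area in square feet) can make it diverge, which is one reason features are usually standardized first.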
- Scatter plot
- Heatmap - To find multicollinearity
- Box plot
- Histograms
- Violin plot
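A heatmap for multicollinearity is a colour-coded view of the pairwise correlation matrix. The sketch below computes that matrix in plain Python and flags highly correlated feature pairs; the column values and the 0.8 cut-off are arbitrary assumptions, not project data. In practice `pandas.DataFrame.corr()` produces the same matrix and `seaborn.heatmap` draws it.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Correlation for every (name, name) pair; this is what a heatmap colours."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


def collinear_pairs(matrix, threshold=0.8):
    """Distinct feature pairs whose |r| exceeds the (assumed) threshold."""
    return sorted({tuple(sorted((p, q))) for (p, q), r in matrix.items()
                   if p != q and abs(r) > threshold})


# Made-up values for illustration only.
columns = {
    "area": [1000, 1500, 2000, 2500, 3000],
    "rooms": [3, 4, 6, 7, 9],
    "road_size": [12, 8, 14, 10, 13],
}
```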
- One-hot encoding
- Dummy variable trap
- Correlation does not imply causation
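One-hot encoding turns a categorical column such as location into 0/1 columns. Keeping a column for every category alongside an intercept causes the dummy variable trap: the dummy columns always sum to 1, so they are perfectly collinear with the intercept. Dropping one category (the baseline) avoids this. A plain-Python sketch, where the location values are only examples; `pandas.get_dummies(..., drop_first=True)` does the same for a DataFrame:

```python
def one_hot(values, drop_first=False):
    """One-hot encode a categorical column.

    With drop_first=True the first category (alphabetically) becomes the
    baseline, encoded as all zeros, which avoids the dummy variable trap.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


locations = ["Kathmandu", "Lalitpur", "Bhaktapur", "Kathmandu"]  # example values
```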
- Ridge regression - Because the data seems to have multicollinearity
- Linear regression - Related: Wald's test
- Logistic regression (?)
- Convolutional neural networks
- Random forest
- ID3
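Ridge regression adds a penalty `alpha * ||w||^2` to the squared error, which gives the closed form `w = (X^T X + alpha * I)^-1 X^T y`. The added `alpha * I` keeps the system solvable even when columns are perfectly collinear, which plain least squares cannot handle. A dependency-free sketch, assuming no intercept term (center the data first; adding a column of ones instead would also penalize the intercept); `sklearn.linear_model.Ridge` is the usual library route:

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(X, y, alpha):
    """Ridge weights w = (X^T X + alpha * I)^-1 X^T y, with no intercept term."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)
```

For example, with two identical columns `ridge([[1, 1], [2, 2], [3, 3]], [2, 4, 6], alpha=0.1)` splits the weight evenly between them, whereas `alpha=0` would hit a singular matrix.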
Use Weka as a testing tool, and possibly external libraries, to check how well the model performs.
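Whichever testing tool or library is used, a model's performance is usually judged on held-out data. A minimal sketch of a shuffled train/test split plus two common regression metrics, RMSE and R²; the function names mirror scikit-learn's `train_test_split`, `mean_squared_error` and `r2_score`, but this is a plain-Python illustration, not that library:

```python
import random
from math import sqrt


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the price."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A fixed `seed` keeps the split reproducible, so different models are compared on the same test rows.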
- Using the VSCode command palette
- The Spyder IDE - Optional for now
- Markdown - Very important for editing and adding resources.
- Land measurement - Details about the different units of measurement of land.
- Creating checkmarks in projects
- How Much Training Data is Required for Machine Learning? (Article)
- Install JupyterLab using conda to get the notebook dark theme.
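Listings quote area in different local units, so the Area feature needs one common unit before modelling. The sketch below converts to square feet using commonly cited values (1 ropani = 16 aana = 64 paisa = 256 daam = 5476 sq ft; 1 bigha = 20 kattha = 400 dhur = 72900 sq ft). Treat these constants as an assumption to cross-check against the land measurement resource listed above; real listings may also use other spellings (e.g. "ana") that would need extra keys:

```python
import re

# Square feet per unit (commonly cited values; verify before relying on them).
SQFT_PER_UNIT = {
    "ropani": 5476.0,
    "aana": 342.25,
    "paisa": 85.5625,
    "daam": 21.390625,
    "bigha": 72900.0,
    "kattha": 3645.0,
    "dhur": 182.25,
    "sqft": 1.0,
}


def to_sqft(text):
    """Convert strings like '4 aana' or '1 ropani 2 aana' to square feet."""
    total = 0.0
    # Each match is a (number, unit) pair, e.g. ('1', 'ropani').
    for amount, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([a-z]+)", text.lower()):
        if unit not in SQFT_PER_UNIT:
            raise ValueError(f"unknown land unit: {unit!r}")
        total += float(amount) * SQFT_PER_UNIT[unit]
    return total
```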
Go through the basic contents of the docs at least once:
- NumPy (vectorization and broadcasting in NumPy)
- Pandas
- Seaborn
- Matplotlib
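Vectorization and broadcasting in one example: standardizing each feature column with explicit Python loops, and then with a single NumPy expression where the per-column mean and standard deviation (shape `(3,)`) are broadcast across the `(3, 3)` matrix. The feature values are made up:

```python
import numpy as np


def standardize_loop(rows):
    """Column-wise z-scores computed with explicit Python loops."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        column = [rows[i][j] for i in range(n)]
        mean = sum(column) / n
        std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
        for i in range(n):
            out[i][j] = (rows[i][j] - mean) / std
    return out


# Made-up features: one row per house; columns are area (sq ft), rooms, floors.
X = np.array([
    [1369.0, 4.0, 2.0],
    [2738.0, 6.0, 3.0],
    [5476.0, 9.0, 3.0],
])

# Vectorized: X.mean(axis=0) and X.std(axis=0) have shape (3,) and are
# broadcast across every row of the (3, 3) matrix, with no Python loop.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

`np.std` uses the population formula (`ddof=0`) by default, the same as the loop version, so the two results match.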
S.N. | Name | Data Amount | Library required | Status | Remarks |
---|---|---|---|---|---|
1 | 99aana | 3K | BS4 | Completed | |
2 | Nepal Homes | 1K | Selenium & BS4 | Links fetched | Data to be fetched using BS4 |
3 | Basobaas | 2K | Selenium & BS4 | Links fetched | Data to be fetched using BS4 |
4 | 1Ropani | 600 | Selenium & BS4 | Completed | |
5 | Hamrobazar | 3K | | Halted | Presents a captcha to check for bots |
6 | Gharbazar | 340 | Selenium | Links fetched | Low amount of data. Scan for house keyword in title. |
7 | Gharghaderi | 300 | Selenium | Halted | Low amount of data |
8 | Housing Nepal | less than 300 | BS4 | Halted | Low amount of data |
9 | Real Estate In Nepal | less than 300 | BS4 | Halted | Low amount of data. Restriction for bots |
10 | Nepal Home Search | 140 | BS4 | Halted | Low amount of data |
11 | Nepal Realestates | less than 300 | BS4 | Halted | Low amount of data |
12 | The Realtors | 300 | Selenium & BS4 | Halted | Low amount of data. Scan titles for the house keyword. |
13 | GharJagga Nepal | 330 | Selenium | Halted | Low amount of data; uses infinite scrolling |
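For Gharbazar and The Realtors, the remarks suggest keeping only listings whose title mentions a house. A sketch of that filter on already-scraped titles, matching "house" as a whole word so that titles like "Warehouse for rent" are skipped; the listing dictionaries are hypothetical stand-ins for scraped rows:

```python
import re

# Whole-word, case-insensitive match for "house" or "houses".
HOUSE_PATTERN = re.compile(r"\bhouses?\b", re.IGNORECASE)


def is_house_listing(title):
    """True if the listing title mentions 'house' as a whole word."""
    return bool(HOUSE_PATTERN.search(title))


# Hypothetical scraped rows; only the title field is used here.
listings = [
    {"title": "House for sale in Budhanilkantha"},
    {"title": "Land on sale at Imadol"},
    {"title": "Beautiful bungalow-style HOUSE, Lalitpur"},
    {"title": "Warehouse for rent"},
]

houses = [item for item in listings if is_house_listing(item["title"])]
```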
- Price
- Location in District
- Number of rooms
- Number of floors
- Area
- Time posted
- Road size and road type *
- Room size *
- Bedrooms *
- Bathrooms *
- Car parking *
- Kitchen *
- Living room *
- Garage
- Furnished? *
- Guestroom *
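Prices are usually quoted in lakh (100,000) or crore (10,000,000) rupees, sometimes with South Asian comma grouping, so the Price column needs parsing into plain numbers during cleaning. A sketch that assumes formats like the ones in the docstring; a unit other than lakh/crore is not recognized and the bare number is returned, so check the raw data for other formats:

```python
import re


def parse_price(text):
    """Turn strings like 'Rs. 2.5 Crore', 'Rs 85 Lakhs' or 'NPR 1,20,00,000'
    into rupees. Returns None when no number is found (e.g. 'Price on call').
    """
    cleaned = text.lower().replace(",", "")
    match = re.search(r"(\d+(?:\.\d+)?)\s*(lakhs?|lacs?|crores?|cr)?\b", cleaned)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2)
    if unit is None:
        return value
    # 1 crore = 10,000,000 rupees; 1 lakh = 100,000 rupees.
    return value * (10_000_000 if unit.startswith("cr") else 100_000)
```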
- Files/Modules and variables = snake_case
- directories = all lowercase preferably without underscores
- classes = PascalCase
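A small hypothetical module showing these conventions together. The file name, class and method are illustrations only, and the flat rate is a placeholder rather than a real model:

```python
# price_utils.py  <- module file name in snake_case (hypothetical module)


class PriceModel:
    """Class names use PascalCase."""

    def __init__(self, rate_per_sqft):
        # Attributes and variables use snake_case.
        self.rate_per_sqft = rate_per_sqft

    def predict_price(self, area_sqft):
        """Functions and methods use snake_case too."""
        return area_sqft * self.rate_per_sqft
```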