Good Housing Deals
Is a certain real estate listing a good deal?
Good Housing Deals is a data-driven platform that helps users evaluate real estate listings for value and potential. It uses Python to analyze housing data, Playwright for web scraping, FastAPI for the backend, and uv for virtual environment management.
Features
- Scrapes listing data by city and state
- Averages prices by city and state
- Uses computer vision to determine whether the interior is modern or not
- Provides a user-friendly interface for evaluating listings and seeing whether they are above or below market value
- Shows all listings on a country-wide map
- Filters by price, number of bedrooms, number of bathrooms, square footage, or good-deal score
Why did I build this?
I built Good Housing Deals to help people make informed decisions when buying real estate. The housing market can be complex and overwhelming, and I wanted to create a tool that simplifies the process of evaluating listings. By leveraging data analysis and computer vision, I aimed to provide users with insights that go beyond just the listing price, helping them find properties that offer true value.
Development Story
I will most likely publish a YouTube video that explains this project in detail, including the technical aspects of web scraping, data analysis, and computer vision. But if you want to keep reading, here is the human version.
The idea
I am looking to move out of my home; however, I do not have all of the money in the world to do so. So I just need to find a good deal. I was looking at Facebook Marketplace and Mercadolibre for apartments, but I was not sure whether the prices were good or not. So I thought to myself: why not build a tool that can help me with that?
The data
I started by scraping data from various real estate listing websites using Playwright. I focused on gathering information such as price, location, number of bedrooms and bathrooms, square footage, and images of the interiors. This data was then stored in a structured format for analysis.
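To give a concrete idea of the shape of each record, here is a minimal sketch; the field names are illustrative rather than the exact schema:

```python
from dataclasses import dataclass, field


# Illustrative shape of a scraped listing record; field names are a
# sketch, not necessarily the exact schema used in the project.
@dataclass
class Listing:
    url: str
    price: float
    city: str
    state: str
    bedrooms: int
    bathrooms: int
    square_footage: float
    image_urls: list[str] = field(default_factory=list)
```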
The first roadblock: Anti-scraping measures and solutions
I was able to figure out how to scrape all of the data from my main city. However, when I kept running the scraper, the first thing I saw was a login screen, and at that moment it clicked: my IP had been flagged.
The solution: Rotating proxies, human-like interactions, and header changes
Fixing the scroll
The first thing I thought was: OK, so it seems the scroll is just too bot-like; I need to make it feel more human...
Then I learned that you can run JavaScript evals with Playwright, so what I did was perform an eval (which you can think of as just executing code in the devtools) to do a smooth scroll.
```python
import asyncio
import random

from playwright.async_api import Page


async def scroll_like_human(page: Page, pause: float = 1.5, max_scrolls: int = 30) -> None:
    # Get the initial scroll height of the page body
    previous_height = await page.evaluate("document.body.scrollHeight")
    for _ in range(max_scrolls):
        # 1. Pick a random scroll distance relative to the current position
        scroll_amount = random.randint(500, 800)
        # This evaluation simulates a smooth scroll by that random amount
        await page.evaluate(f"""
            window.scrollBy({{
                top: {scroll_amount},
                behavior: 'smooth'
            }});
        """)
        # 2. Wait for the smooth animation AND the network
        await asyncio.sleep(pause)
        # 3. Check if we actually moved or if the page height grew
        new_height = await page.evaluate("document.body.scrollHeight")
        current_scroll_pos = await page.evaluate("window.pageYOffset + window.innerHeight")
        # If the height hasn't changed AND we are at the bottom, we are done
        if new_height == previous_height and current_scroll_pos >= new_height:
            print("Reached the end of the page.")
            break
        previous_height = new_height
```
This piece of code made the scroll look human-like! The good part is that I can reuse this particular piece of code any time I need to scroll on a page.
Browser Fingerprinting
This is a very important concept. Browser fingerprinting is a technique websites use to identify visitors from the characteristics their browser exposes (user agent, screen size, installed fonts, WebGL renderer, and so on), and it is commonly used to tell bots apart from real users.
Manually configuring all of these data points is usually not ideal, so it is advisable to use the playwright-stealth package for this.
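A minimal sketch of wiring it in; this assumes the stealth_async helper that playwright-stealth has exposed in its 1.x releases, so check the package docs for the current API:

```python
import asyncio

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async


async def main() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        # Patches common fingerprint leaks (navigator.webdriver,
        # plugins, languages, ...) before any navigation happens
        await stealth_async(page)
        await page.goto("https://example.com")
        await browser.close()


asyncio.run(main())
```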
Random values are your friends!
Now, the thing with human movements is that they are random.
Bot detection usually checks things like the time you stay on a page, your scrolling, and your mouse movements, and if these are very predictable, it will usually flag your IP (and blacklist it in the worst cases).
I changed the code so that instead of relying on fixed values, I rely on random ranges, and this approach seems to work very well!
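For example, every fixed sleep became something like this (a hypothetical helper; the exact ranges vary per interaction):

```python
import asyncio
import random


# Hypothetical helper: pause for a human-ish, randomized amount of
# time instead of a fixed interval between interactions.
async def human_pause(low: float = 1.0, high: float = 2.5) -> None:
    await asyncio.sleep(random.uniform(low, high))
```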
Keystrokes vs Fill
Now, I was using fill() to fill in the search box on the website I scrape; however, this, again, is robot-like, and what we want is human-like behavior. Therefore, I switched to typing the query one key at a time, with a random delay between 1.5 and 2.5 seconds between keystrokes.
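A sketch of what that looks like; the selector is whatever matches the site's search box, and page.keyboard.type is standard Playwright:

```python
import asyncio
import random

from playwright.async_api import Page


async def type_like_human(page: Page, selector: str, text: str) -> None:
    await page.click(selector)
    for char in text:
        # Type one character, then pause a random, human-ish interval
        await page.keyboard.type(char)
        await asyncio.sleep(random.uniform(1.5, 2.5))
```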
Clicking instead of Redirecting
Humans click on things; they do not copy the href and paste it into the URL bar. Therefore, the change I made here was important: progressively clicking on the listings that are currently visible in the viewport instead of navigating to their URLs directly.
This definitely was the most challenging part of this project, lol; there are a lot of edge cases to check.
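Here is a rough sketch of the idea, with a hypothetical listing selector and none of the edge-case handling:

```python
from playwright.async_api import Page

LISTING_SELECTOR = "a.listing-card"  # hypothetical; depends on the site


async def click_visible_listings(page: Page) -> None:
    cards = page.locator(LISTING_SELECTOR)
    viewport = page.viewport_size or {"width": 0, "height": 0}
    for i in range(await cards.count()):
        card = cards.nth(i)
        box = await card.bounding_box()
        # Skip cards that are not currently inside the viewport
        if box is None or box["y"] < 0 or box["y"] > viewport["height"]:
            continue
        await card.click()
        # ... scrape the listing page here ...
        await page.go_back()
```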
Proxies
Now, the thing is that even though the bot behaves like a human, if I want to run ALL cities and ALL states, I cannot wait for one program to finish, so I need to introduce multiple Python agents that can do this in parallel.
For this, I need to know the number of cities that my country has, which in this case is ___.
Therefore, what I did was use a residential proxy service, and then have multiple agents scrape those cities, so I can store all the information I need to track prices for individual locations.
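Playwright lets you pass proxy credentials at launch, so each agent can run behind its own session. A minimal sketch, with placeholder credentials:

```python
import asyncio

from playwright.async_api import async_playwright

# Placeholder residential-proxy endpoint and credentials; many providers
# rotate the exit IP based on the session name in the username.
PROXY = {
    "server": "http://proxy.example.com:8000",
    "username": "user-session-1",
    "password": "secret",
}


async def launch_agent() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy=PROXY)
        page = await browser.new_page()
        await page.goto("https://example.com")
        await browser.close()


asyncio.run(launch_agent())
```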
Headers
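The same launch-time idea applies to headers: instead of Playwright's defaults, each context gets a realistic user agent and language headers. A sketch; the values below are illustrative:

```python
import random

from playwright.async_api import Browser, BrowserContext

# Illustrative pool of real-world user agents; in practice you would
# keep a larger, regularly updated list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


async def new_stealthy_context(browser: Browser) -> BrowserContext:
    return await browser.new_context(
        user_agent=random.choice(USER_AGENTS),
        extra_http_headers={"Accept-Language": "en-US,en;q=0.9"},
    )
```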
Cron Job Statuses, Sockets and Webhooks
Now, I created an interface so that I can see the agents' progress on the scraping, check whether they failed and why, and then store the relevant information (price changes) for each of the apartments.
Here, I created a job that runs on my server and, once it's finished, updates the status of a given city and state. If the scraping completed successfully, it shows up on the map.
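Since the backend is FastAPI, the status update can land on a small webhook endpoint like this sketch (route and payload names are my own):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


# Hypothetical payload a scraping job posts when it finishes
class JobStatus(BaseModel):
    city: str
    state: str
    status: str  # e.g. "completed" or "failed"
    error: str | None = None


@app.post("/webhooks/scrape-status")
async def update_scrape_status(payload: JobStatus) -> dict:
    # In the real app this would persist the status and push it over a
    # socket to connected clients so the map updates live.
    return {"ok": True, "city": payload.city, "state": payload.state}
```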
The final result
Now, here we have the full application that tracks the price of each listing on Mercadolibre, as well as whether it is still active on the platform, and it calculates the average per city, per state, and for the entire country too! Data is so nice.
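The averaging itself is the simple part; with the listings in a DataFrame it is a few group-bys (column names here are illustrative):

```python
import pandas as pd

# Illustrative column names; the real schema may differ
listings = pd.read_csv("listings.csv")

avg_by_city = listings.groupby(["state", "city"])["price"].mean()
avg_by_state = listings.groupby("state")["price"].mean()
country_avg = listings["price"].mean()
```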