Recently I got the idea to analyze house prices in Tashkent city. "www.zor.uz" is one of the most popular sites where people sell houses, cars, electronics, and so on, so I had to crawl it and collect the data.

Unfortunately, "www.zor.uz" does not keep old data: the oldest listings I could find there were just one month old. They seem to clean their DB every month, and I needed data from previous years. So I used the "Wayback Machine", which keeps capture points of websites from all around the world. It is really cool stuff to try: https://archive.org/index.php. But they use "https", so I had to download their certificate and register it with my JRE (a code sketch of this certificate setup is shown below).

I crawled that site and succeeded in getting data starting from 2009.08, which was the earliest capture point of www.zor.uz. Interestingly, the Wayback Machine captures websites periodically, depending on how often a website changes; they have their own logic for deciding when to capture sites in order to optimize their storage.
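Registering the certificate with the JRE is normally a keytool import into the JRE's cacerts keystore. As a minimal sketch of the same idea in Java, assuming the certificate has been saved locally as archive-org.crt (a hypothetical file name), the downloaded certificate can instead be loaded into an in-memory truststore, so the JRE installation itself stays untouched:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class ArchiveTrust {
    public static void main(String[] args) throws Exception {
        // Load the certificate exported from archive.org
        // (the file name is hypothetical).
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        Certificate cert;
        try (InputStream in = new FileInputStream("archive-org.crt")) {
            cert = cf.generateCertificate(in);
        }

        // Put it into an empty in-memory keystore and build
        // an SSLContext that trusts it.
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        ks.setCertificateEntry("archive-org", cert);

        TrustManagerFactory tmf = TrustManagerFactory
                .getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        HttpsURLConnection.setDefaultSSLSocketFactory(ctx.getSocketFactory());

        // HTTPS requests now validate against that certificate.
        HttpsURLConnection conn = (HttpsURLConnection)
                new URL("https://archive.org/").openConnection();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```

Note that installing this socket factory as the default means the JVM trusts only the certificates in that keystore, which is fine for a single-site crawler but too restrictive for a general-purpose client.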
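The post doesn't say how the capture points were enumerated, but the Wayback Machine exposes a public CDX API for exactly this. Below is a minimal sketch, assuming the standard endpoint at web.archive.org/cdx/search/cdx; the from=2009 bound matches the earliest capture mentioned above, and collapse=timestamp:8 keeps at most one capture per day:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ZorSnapshots {
    public static void main(String[] args) throws Exception {
        // The CDX API returns one capture per line;
        // fl= limits the fields to timestamp and original URL.
        String cdx = "https://web.archive.org/cdx/search/cdx"
                   + "?url=www.zor.uz&from=2009"
                   + "&fl=timestamp,original&collapse=timestamp:8";
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(cdx).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(" ");
                String timestamp = parts[0];  // e.g. 20090815120000
                // Each capture is replayable at a stable URL of the form
                // https://web.archive.org/web/<timestamp>/<original-url>
                System.out.println("https://web.archive.org/web/"
                        + timestamp + "/" + parts[1]);
            }
        }
    }
}
```

Each printed URL replays www.zor.uz as it looked at that timestamp, which is what the crawler would then fetch and parse for listings.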