Posts

Why Uzbekistan needs its own local CDN

Introduction Imagine that you're serving a website and the majority of your users are in Uzbekistan. In other words, your business targets the local market of Uzbekistan. To make your website faster you will need a CDN, which can help your business perform better. There are several reasons why your website can be slow without CDN acceleration: 1. No existing Tier 2 network. A Tier 2 network plays an important role in internet speed. It enables Tier 3 internet service providers to connect directly to the internet without another intermediate layer. In Uzbekistan, UzTelecom is the largest internet provider. According to the `traceroute` command, it uses the RETN Tier 2 network. Unfortunately, RETN does not have network lines in Uzbekistan according to their map (source). This means that the majority of internet traffic has to go through UzTelecom alone, which adds overhead and reduces internet speed. 2. Slow internet Uzbek
Recent posts

Reducing system load on cache servers by using Bloom Filter

Intro In this post, I want to share my experience of how a Bloom filter was used to reduce system load (CPU, RAM, disk operations) on our cache servers at CDNetworks. How did it all start? While working at CDNetworks, I was contacted by a recruiter about applying to a Japanese company named Rakuten. It was an interesting challenge, so I tried. I had a Skype interview with a technical recruiter, and he asked me, "What is a Bloom filter?" I did not know what it was. I failed the interview, but it taught me what a Bloom filter is. A Bloom filter is a probabilistic data structure that is similar to a HashMap but far more memory-efficient. If you hold a million URLs in a HashMap, it can reach up to 500 MB, whereas a Bloom filter can manage with 16 MB (more info here: http://ahikmat.blogspot.kr/2016/07/intro-bloom-filter-is-probabilistic.html). In other words, a Bloom filter is a clown with a bag full of balls marked with random integer numbers. If you ask him whether some ball wit

How to use VisualVM

VisualVM can be very helpful for finding performance bottlenecks in a Java application. It is one of the easiest profiling tools for Java. Download VisualVM: https://visualvm.github.io/ Run VisualVM and check the Java apps running locally. Remote Profiling. Run your Java application with the following JVM arguments: -Djavax.management.builder.initial= -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false The parameters above make your remote Java application listen for JMX connections on port 9010. Then you can connect to it from VisualVM via Menu->File->Add JMX connection. Type your hostname and port. Example: 192.168.10.10:9010 (IP address of the remote machine and port). Performance Profiling After you connect to your app from VisualVM, go to "Sampler"

Performance tuning for Web engine

Install Tsung on CentOS
Pre-requisites:
1. Install Erlang:
    sudo yum -y update && sudo yum -y upgrade
    sudo yum install epel-release
    sudo yum -y install erlang perl perl-RRD-Simple.noarch perl-Log-Log4perl-RRDs.noarch gnuplot perl-Template-Toolkit
2. Get Tsung:
    wget http://tsung.erlang-projects.org/dist/tsung-1.6.0.tar.gz
3. Extract and install:
    tar zxvf tsung-1.6.0.tar.gz
    cd tsung-1.6.0
    ./configure && make && sudo make install
Note: A sample XML configuration is located at /usr/share/doc/tsung/examples/http_simple.xml
Set up cluster testing with Tsung
1. Add the cluster node info to each node's "/etc/hosts":
    sudo vi /etc/hosts
    # cluster nodes
    192.168.10.10       n1
    192.168.10.11       n2
    192.168.10.12       n3
    192.168.10.13       n4
2. Set up the ~/.ssh/config file:
    vi ~/.ssh/config
    Host n1
       Hostname n1
       User tsung
       Port 722
       IdentityFile /home/tsung/.ssh/my_key_rsa7
    Host n2

Java: BloomFilter Benchmark

Intro A Bloom filter is a probabilistic data structure for testing whether an element is in a data set. Like a HashSet, it tells us whether the set contains a certain element or not. The difference is that a contains(element)=TRUE result is probabilistic: it may be a false positive. In our example we set the false-positive probability to 0.01, which means that about 1% of lookups for elements that were never inserted are expected to answer "It contains" incorrectly. Read more about Bloom filters here: https://en.wikipedia.org/wiki/Bloom_filter Scenario We create two arrays of random elements, with 1,000,000 elements in each array. We insert the first array into the BloomFilter, then iterate over the first array and check whether the BloomFilter contains each item. The second array is used only for checking non-existing elements. We do the same for the HashSet. Benchmarking code We used a customized version of the Bloom filter that accepts a byte array. (The previous version of this post encoded the string on every put and contains, which was misleading
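
The scenario in this excerpt can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the customized Bloom filter the post benchmarks: the class `BloomFilterSketch`, its `long` keys, the mixing function, and the double-hashing scheme are all hypothetical choices made for the example. It inserts one set of random keys, verifies that none of them is reported missing, and then measures how often keys that were never inserted are reported as present.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical minimal Bloom filter for long keys; not the customized version from the post.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int expectedInsertions, double falsePositiveProbability) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        numBits = (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveProbability) / (ln2 * ln2));
        numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        bits = new BitSet(numBits);
    }

    // 64-bit mixing function (an assumption for this sketch) that spreads the key's bits.
    private static long mix(long x) {
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }

    // Derives the i-th bit index from the two 32-bit halves of one hash (double hashing).
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32) | 1; // force an odd step so the probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(long key) {
        long hash = mix(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(hash, i));
        }
    }

    boolean mightContain(long key) {
        long hash = mix(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(hash, i))) {
                return false; // definitely not in the set
            }
        }
        return true; // probably in the set
    }

    // Inserts n random keys, then returns the observed false-positive rate on n other keys.
    static double measureFalsePositiveRate(int n, double fpp, long seed) {
        Random random = new Random(seed);
        BloomFilterSketch filter = new BloomFilterSketch(n, fpp);
        Set<Long> exact = new HashSet<>();
        while (exact.size() < n) {
            long key = random.nextLong();
            if (exact.add(key)) {
                filter.put(key);
            }
        }
        for (long key : exact) {
            if (!filter.mightContain(key)) {
                throw new IllegalStateException("a Bloom filter must never report a false negative");
            }
        }
        int falsePositives = 0;
        int checked = 0;
        while (checked < n) {
            long key = random.nextLong();
            if (exact.contains(key)) {
                continue; // only count keys that were never inserted
            }
            checked++;
            if (filter.mightContain(key)) {
                falsePositives++;
            }
        }
        return (double) falsePositives / n;
    }

    public static void main(String[] args) {
        double rate = measureFalsePositiveRate(1_000_000, 0.01, 42L);
        System.out.printf("observed false-positive rate: %.4f%n", rate);
    }
}
```

The HashSet plays the role of the exact reference here: it confirms that every inserted key is found and that the keys used for the second pass really are absent, so every "contains" answer in that pass is a false positive.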

NLP for Uzbek language

Natural language processing is an essential tool for text mining in the data analysis field. In this post, I want to share my approach to developing a stemmer for the Uzbek language. Uzbek is spoken by 27 million people around the world, and there is a large and growing amount of textual material in Uzbek on the internet. While working on my weekend project "FlipUz" (a news aggregator for Uzbek news sites), I stumbled on the problem of automatically tagging news into different categories. This requires a good NLP library, and I was not able to find one for Uzbek. That is how I became motivated to develop a stemmer for the Uzbek language. In short, stemming is an algorithm that removes suffixes from the end of a word, leaving its core part. For example: rabbits -> rabbit. As Uzbek is similar to Turkish, I was curious whether there was a stemmer for Turkish, and I found this: Turkish Stemmer with Snowball. Their key approach was to u
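
The general idea of stemming described in this excerpt can be sketched as naive suffix stripping. The code below is an assumption-laden illustration, not the author's stemmer and not the Snowball Turkish algorithm: the class `UzbekStemmerSketch`, the short suffix list (plural -lar and the case endings -ni, -ning, -ga, -da, -dan), and the minimum stem length are all hypothetical choices for the example.

```java
import java.util.Locale;

// Hypothetical naive suffix stripper; the suffix list is a small illustrative sample.
final class UzbekStemmerSketch {
    // A few common Uzbek noun suffixes, ordered longest first so that a longer
    // suffix is always tried before a shorter one.
    private static final String[] SUFFIXES = {"ning", "lar", "dan", "ni", "ga", "da"};

    // Assumed lower bound on stem length, to avoid stripping very short words.
    private static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String stem = word.toLowerCase(Locale.ROOT);
        boolean stripped = true;
        // Keep removing suffixes until none matches, so stacked suffixes are handled.
        while (stripped) {
            stripped = false;
            for (String suffix : SUFFIXES) {
                int remaining = stem.length() - suffix.length();
                if (stem.endsWith(suffix) && remaining >= MIN_STEM_LENGTH) {
                    stem = stem.substring(0, remaining);
                    stripped = true;
                    break;
                }
            }
        }
        return stem;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"kitoblarni", "maktabda", "kitobning"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

With this list, "kitoblarni" is reduced to "kitob" in two passes (first -ni, then -lar). A real stemmer needs a much larger suffix inventory and rules about which suffixes may follow which; this sketch will both over-stem and under-stem many words.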

How to use Docker

Docker
Docker official website: https://www.docker.com/
Setup Docker
Prepare a fresh installation of CentOS; I am using CentOS 6.7.
Update the yum repositories:
> rpm -iUvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
> yum update -y
Install Docker:
> yum -y install docker-io
Pull a container image; I am going to use the CentOS image. To pull the latest (CentOS 7):
> docker pull centos
Or:
> docker pull centos:centos6
Check which container images are installed:
> docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
centos              centos6             3bbbf0aca359        2 weeks ago         190.6 MB
centos              latest              ce20c473cd8a        2 weeks ago
Run a container from an image:
> docker run -i -t centos:centos6 /bin/bash
Note: this creates a container from the image (you can see the container ID as the hostname).
List containers:
> docker ps
> docker