
Posts

Showing posts from 2016

How to use VisualVM

VisualVM is very helpful for finding performance bottlenecks in a Java application. It is one of the easiest profiling tools for Java.

Download VisualVM: https://visualvm.github.io/

Run VisualVM and check the locally running Java apps.

Remote Profiling

Run your Java application with the following JVM arguments:

    -Djavax.management.builder.initial=
    -Dcom.sun.management.jmxremote
    -Dcom.sun.management.jmxremote.port=9010
    -Dcom.sun.management.jmxremote.local.only=false
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false

The parameters above make your remote Java application listen on port 9010. You can then connect to it from VisualVM via Menu -> File -> Add JMX Connection. Type your hostname and port. Example: 192.168.10.10:9010 (IP address of the remote machine and port).

Performance Profiling

After you connect to your app from VisualVM, go to "Sampler"

Performance tuning for Web engine

Install Tsung on CentOS

Pre-requisites:

1. Install Erlang:

    sudo yum -y update && sudo yum -y upgrade
    sudo yum install epel-release
    sudo yum -y install erlang perl perl-RRD-Simple.noarch perl-Log-Log4perl-RRDs.noarch gnuplot perl-Template-Toolkit

2. Get Tsung:

    wget http://tsung.erlang-projects.org/dist/tsung-1.6.0.tar.gz

3. Extract and install:

    tar zxvf tsung-1.6.0.tar.gz
    cd tsung-1.6.0
    ./configure && make && sudo make install

Note: Sample XML configurations are located in /usr/share/doc/tsung/examples/ (for example, http_simple.xml).

Set up Cluster Testing with Tsung

1. Add the cluster nodes to "/etc/hosts" on each node:

    sudo vi /etc/hosts

    # cluster nodes
    192.168.10.10       n1
    192.168.10.11       n2
    192.168.10.12       n3
    192.168.10.13       n4

2. Set up the ~/.ssh/config file:

    vi ~/.ssh/config

    Host n1
        Hostname n1
        User tsung
        Port 722
        IdentityFile /home/tsung/.ssh/my_key_rsa7
    Host n2

Java: BloomFilter Benchmark

Intro

A Bloom filter is a probabilistic data structure for testing whether an element is in a data set. It is similar to a HashSet in that it tells us whether the set contains a certain element or not. The difference is that an answer of contains(element) = TRUE is probabilistic: it can be a false positive, while an answer of FALSE is always correct. In our example we set the false-positive probability to 0.01, which means that about 1% of lookups for elements that were never inserted wrongly answer "it contains". Read more about Bloom filters here: https://en.wikipedia.org/wiki/Bloom_filter

Scenario

We create two arrays of random elements, each holding 1,000,000 elements. We insert the first array into the BloomFilter, then iterate over the first array and check whether each item is contained in the BloomFilter. The second array is used only for checking non-existing elements. We then do the same for HashSet.

Benchmarking code

We used a customized version of Bloom filter which can accept a byte array. (The previous version of this post encoded the string on every put and contains, which was misguiding
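The scenario above can be sketched roughly as follows. This is not the customized Bloom filter used in the post: `BloomFilterSketch`, its `mix` hash, and the use of `long` keys instead of byte arrays are all assumptions made to keep the example self-contained, and the timing is a naive `System.nanoTime` measurement rather than a proper benchmark harness.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Minimal Bloom filter over long keys (illustrative; not the post's implementation).
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int expectedInsertions, double falsePositiveRate) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        this.numBits = Math.max(1,
                (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2)));
        this.numHashes = Math.max(1,
                (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    void put(long element) {
        long h1 = mix(element);
        long h2 = mix(h1 + 0x9E3779B97F4A7C15L);
        // Double hashing: derive the k bit positions from two base hashes.
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1 + i * h2));
        }
    }

    boolean mightContain(long element) {
        long h1 = mix(element);
        long h2 = mix(h1 + 0x9E3779B97F4A7C15L);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1 + i * h2))) {
                return false; // definitely never inserted
            }
        }
        return true; // probably inserted, may be a false positive
    }

    private int index(long hash) {
        return (int) Math.floorMod(hash, (long) numBits);
    }

    // 64-bit mixing function (assumed hash; any well-distributed hash would do).
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    static int countHits(BloomFilterSketch filter, long[] items) {
        int hits = 0;
        for (long item : items) {
            if (filter.mightContain(item)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random random = new Random(42);
        long[] inserted = random.longs(n).toArray();
        long[] absent = random.longs(n).toArray(); // overlap with 'inserted' is very unlikely

        BloomFilterSketch filter = new BloomFilterSketch(n, 0.01);
        Set<Long> hashSet = new HashSet<>();
        for (long value : inserted) {
            filter.put(value);
            hashSet.add(value);
        }

        long start = System.nanoTime();
        int bloomHits = countHits(filter, inserted);
        int bloomFalsePositives = countHits(filter, absent);
        long bloomNanos = System.nanoTime() - start;

        start = System.nanoTime();
        int setHits = 0;
        for (long value : inserted) {
            if (hashSet.contains(value)) {
                setHits++;
            }
        }
        long setNanos = System.nanoTime() - start;

        System.out.println("Bloom hits on inserted: " + bloomHits + " of " + n);
        System.out.println("Bloom false-positive rate: " + (double) bloomFalsePositives / n);
        System.out.println("Bloom lookup time (ms): " + bloomNanos / 1_000_000);
        System.out.println("HashSet hits: " + setHits + ", time (ms): " + setNanos / 1_000_000);
    }
}
```

Every inserted element should be reported as present, while the measured false-positive rate on the second array should land near the configured 0.01. The HashSet stores every element, whereas this filter keeps only about 9.6 bits per expected element.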

NLP for Uzbek language

Natural language processing is an essential tool for text mining in the data analysis field. In this post, I want to share my approach to developing a stemmer for the Uzbek language.

The Uzbek language is spoken by 27 million people around the world, and there is a large and growing amount of Uzbek text on the internet. While working on my weekend project "FlipUz" (a news aggregator for Uzbek news sites), I ran into the problem of automatically tagging news into different categories. This requires a good NLP library, and I was not able to find one for the Uzbek language. That is what motivated me to develop a stemmer for Uzbek.

In short, stemming is an algorithm that removes suffixes from the end of a word, leaving the core part of the word. For example: rabbits -> rabbit. As Uzbek is similar to Turkish, I was curious whether there was a stemmer for Turkish. And I found this: Turkish Stemmer with Snowball. Their key approach was to u
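To make the idea of stemming concrete, here is a minimal suffix-stripping sketch. It is neither the stemmer developed in this post nor the Snowball approach: the class name `SuffixStemmerSketch`, the minimum stem length, and the suffix list (a small subset of common Uzbek noun endings such as the plural -lar and the case endings -ning, -ni, -ga, -da, -dan) are assumptions chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Naive suffix stripper (illustrative only; a real stemmer needs far more rules).
class SuffixStemmerSketch {
    // Assumed, incomplete suffix list, ordered longest first.
    private static final List<String> SUFFIXES =
            Arrays.asList("ning", "lar", "dan", "ni", "ga", "da");

    // Assumed guard so that very short words are not stripped down to nothing.
    private static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String current = word.toLowerCase(Locale.ROOT);
        boolean stripped = true;
        // Keep removing one trailing suffix at a time until none matches.
        while (stripped) {
            stripped = false;
            for (String suffix : SUFFIXES) {
                int stemLength = current.length() - suffix.length();
                if (current.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                    current = current.substring(0, stemLength);
                    stripped = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"kitoblarni", "maktabda", "kitob"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

A blind stripper like this also cuts words whose root merely happens to end in one of the listed suffixes, which is one reason practical stemmers add ordering rules and exceptions.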