Monday, September 19, 2016

How to use VisualVM

VisualVM is very helpful for finding performance bottlenecks in a Java application.
It is one of the easiest profiling tools for Java.


Download VisualVM
https://visualvm.github.io/


Run VisualVM to see the Java applications running on the local machine.


Remote Profiling
Run your Java application with the following JVM arguments:
-Djavax.management.builder.initial=
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
These parameters make the remote Java application listen for JMX connections on port 9010. Note that authentication and SSL are disabled here, so use these settings only on a trusted network.
Then you can connect to it from VisualVM via File -> Add JMX Connection.
Enter the remote machine's IP address (or hostname) and the port, for example: 192.168.10.10:9010
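
If VisualVM cannot connect, it can help to first confirm that the JMX port is reachable from your machine. The following is a minimal sketch of such a check; the helper name jmx_port_reachable is hypothetical and not part of VisualVM or the JDK. A successful check only shows that the TCP port is open, not that the JMX handshake will succeed.

```python
import socket


def jmx_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it does not speak JMX/RMI,
    it only tests that something is listening on the port.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For the example address above you would call jmx_port_reachable("192.168.10.10", 9010) and expect True once the remote application is running.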

Performance Profiling
      After you connect to your app from VisualVM, go to the "Sampler" tab and press the "CPU" button.
Sort by "Total Time (CPU)" so that the biggest CPU consumers appear at the top of the list.
This gives you a rough idea, but not much detail. To get detailed information,
press the "Snapshot" button, which opens the following view:





VisualVM lets you monitor in real time which functions consume the most CPU.

This window is very important. From it, you can find which functions, classes,
or packages are making your Java application slow.
This is the key step in resolving performance issues in a Java application.
You can play with the sorting options, navigate through the callers,
and check the other tabs such as "Hot Spots".

Memory Profiling
      Analysing memory usage works the same way. Press the "Memory" button in the Sampler window.
Sort by "Bytes" so that the data types (or classes) consuming the most memory appear at the top.
You can also take a "Snapshot" to see more details about the current state.

Conclusion
   VisualVM is very helpful for monitoring, analysing, and tuning Java application performance.
This is an essential task when developing scalable, distributed, high-performance applications.














Performance tuning for Web engine


Install Tsung on CentOS

Pre-requisites:
1. Install Erlang:

sudo yum -y update && sudo yum -y upgrade
sudo yum install epel-release

sudo yum -y install erlang perl perl-RRD-Simple.noarch perl-Log-Log4perl-RRDs.noarch gnuplot perl-Template-Toolkit


2. Get Tsung
wget http://tsung.erlang-projects.org/dist/tsung-1.6.0.tar.gz

3. Extract and Install
tar zxvf tsung-1.6.0.tar.gz
cd tsung-1.6.0
./configure && make && sudo make install

Note: A sample XML configuration is located at /usr/share/doc/tsung/examples/http_simple.xml

Set Up Cluster Testing with Tsung

1. Add the cluster nodes to "/etc/hosts" on each node
sudo vi /etc/hosts
# cluster nodes
192.168.10.10       n1
192.168.10.11       n2
192.168.10.12       n3
192.168.10.13       n4

2. Set up the ~/.ssh/config file
vi ~/.ssh/config
Host n1
  Hostname n1
  User tsung
  Port 722
  IdentityFile /home/tsung/.ssh/my_key_rsa7
Host n2
  Hostname n2
  User tsung
  Port 722
  IdentityFile /home/tsung/.ssh/my_key_rsa7
.....

Test and Visualize Results
1. Start tsung on master server
tsung -f /home/tsung/test/selected_scenario.xml start

2. Plot graphs with Perl script
/usr/lib/tsung/bin/tsung_stats.pl --stats /home/tsung/.tsung/log/$tsung_path/tsung.log

Change Kernel params

vi /etc/sysctl.conf

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 65000

Apply the changes without rebooting:
sudo sysctl -p

Source: http://tsung.erlang-projects.org/user_manual/faq.html#why-do-i-have-error-connect-emfile-errors

References:
https://gist.github.com/huberflores/2827890
https://github.com/ngocdaothanh/tsart
https://gist.github.com/clasense4/47438a884cabca9e66c8
http://www.jeramysingleton.com/install-erlang-and-elixir-on-centos-7-minimal/
https://jackiechen.org/2015/12/04/use-tsung-to-test-https-site/

Tuesday, July 19, 2016

Java: BloomFilter Benchmark

Intro

A Bloom filter is a probabilistic data structure for testing whether an element is a member of a set.
It is similar to a HashSet in that it tells us whether the set contains a certain element or not. The difference is that an answer of contains(element)=TRUE is probabilistic: it may be a false positive, whereas FALSE is always correct.
In our example we set the false-positive probability to 0.01, which means that only about 1% of lookups for elements that were never added wrongly return TRUE.
Read more about Bloom filters here: https://en.wikipedia.org/wiki/Bloom_filter
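
To make the idea concrete, here is a minimal Bloom filter sketch in Python. It is not the customized Java implementation benchmarked in this post; the class name, the SHA-256 based double hashing, and the parameter names are assumptions chosen for illustration. It accepts byte strings, sets k bits per added element, and answers "contains" only if all k bits are set.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over byte strings (illustration only)."""

    def __init__(self, expected_items, fpp):
        # Standard sizing: m bits in total and k hash functions.
        ln2 = math.log(2)
        self.m = max(1, math.ceil(-expected_items * math.log(fpp) / ln2 ** 2))
        self.k = max(1, round(self.m / expected_items * ln2))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one SHA-256
        # digest (double hashing: h1 + i * h2).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item):
        # False is always correct; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(expected_items=1000, fpp=0.01)
    bf.add(b"http://example.com/a")
    print(bf.contains(b"http://example.com/a"))  # True: added elements are never missed
    print(bf.contains(b"http://example.com/b"))  # usually False; rarely a false positive
```

Because only bits are stored and never the elements themselves, the filter cannot list its contents or remove an element, which is the price of the small footprint.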

Scenario

We create two arrays of random elements, each containing 1,000,000 elements.
We insert the first array into the BloomFilter, then iterate over the first array again and check whether the BloomFilter contains each item. The second array is used only for checking non-existing elements.
We then repeat the same procedure for HashSet.
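
The scenario can be sketched roughly as follows. This is not the Java code used for the numbers in this post: it is a Python illustration that times only the built-in set, and the function name benchmark_set and its result keys are hypothetical. Absolute timings will differ from the Java results.

```python
import random
import time


def benchmark_set(n, seed=42):
    """Time add() and membership checks on a built-in set with n random items."""
    rng = random.Random(seed)
    # Two arrays of random 8-byte elements: one to insert, one never inserted.
    existing = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(n)]
    missing = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(n)]

    container = set()
    start = time.perf_counter()
    for item in existing:
        container.add(item)
    add_seconds = time.perf_counter() - start

    # Look up every inserted element.
    start = time.perf_counter()
    hits = sum(1 for item in existing if item in container)
    hit_seconds = time.perf_counter() - start

    # Look up elements that were never inserted.
    start = time.perf_counter()
    false_hits = sum(1 for item in missing if item in container)
    miss_seconds = time.perf_counter() - start

    return {
        "add_seconds": add_seconds,
        "hit_seconds": hit_seconds,
        "miss_seconds": miss_seconds,
        "hits": hits,
        "false_hits": false_hits,
    }


if __name__ == "__main__":
    result = benchmark_set(100_000)
    print("add(): %.3fs" % result["add_seconds"])
    print("contains(), existing: %.3fs" % result["hit_seconds"])
    print("contains(), non-existing: %.3fs" % result["miss_seconds"])
```

Dividing the element count by the elapsed seconds gives the elements/s figures in the same form as the output below.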

Benchmarking code

We used a customized version of the Bloom filter that accepts a byte array.
(The previous version of this post encoded a string on every put and contains call, which skewed the measured performance of the Bloom filter.)

Source code:
(The source code is not organized for compilation; please adapt it for your own use.)

Performance output:

Output
Testing BloomFilter  1000000 elements
add(): 0.176s, 5681818.181818183 elements/s
contains(), existing: 0.171s, 5847953.216374269 elements/s
Testing HashSet  1000000 elements
add(): 0.181s, 5524861.878453039 elements/s
contains(), existing: 0.08s, 1.25E7 elements/s

Memory size:


BloomFilter is the winner here. With 99% correctness, its memory footprint is almost 40 times smaller than that of HashSet.


If we reduce correctness to 90%, the memory footprint becomes about 80 times smaller than that of HashSet.
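
The step from 40 times to 80 times matches the standard sizing formula for an optimally configured Bloom filter, m = -n * ln(p) / (ln 2)^2 bits for n elements and false-positive probability p. The sketch below assumes the benchmarked filter is sized this way; the function names are hypothetical. It gives about 9.6 bits per element for p = 0.01 and about 4.8 bits for p = 0.1, which is exactly half because ln(0.1) is half of ln(0.01).

```python
import math


def optimal_bits_per_element(fpp):
    """Bits per stored element for an optimally sized Bloom filter."""
    return -math.log(fpp) / (math.log(2) ** 2)


def optimal_size_bytes(n, fpp):
    """Total size in bytes for n elements at false-positive probability fpp."""
    return math.ceil(n * optimal_bits_per_element(fpp) / 8)


if __name__ == "__main__":
    for fpp in (0.01, 0.1):
        print(fpp,
              round(optimal_bits_per_element(fpp), 2),
              optimal_size_bytes(1_000_000, fpp))
```

Note that the size depends only on the number of elements and the chosen probability, not on how large each element is, which is where the saving over storing the elements themselves comes from.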

Conclusion

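We saw that BloomFilter is roughly as fast as HashSet for add(), although contains() on existing elements took about twice as long (0.171s vs. 0.08s). However, it is far more space efficient.
If we keep a list of URLs in a HashSet in memory, replacing it with a BloomFilter can shrink it about 40 times. For example, if the occupied memory is 500 MB, it can be reduced to about 12.5 MB with a correctness of 99%.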

Sunday, May 8, 2016

NLP for Uzbek language

    Natural language processing is an essential tool for text mining in the field of data analysis. In this post, I want to share my approach to developing a stemmer for the Uzbek language.
     Uzbek is spoken by 27 million people around the world, and there is a large and growing amount of Uzbek text on the internet.
While working on my weekend project "FlipUz" (a news aggregator for Uzbek news sites), I ran into the problem of automatically tagging news into different categories. This requires a good NLP library, and I was not able to find one for Uzbek.
That is what motivated me to develop a stemmer for the Uzbek language.
      In short, stemming is an algorithm that strips suffixes from the end of a word, leaving its core part. For example: rabbits -> rabbit.
As Uzbek is similar to Turkish, I was curious whether there was a stemmer for Turkish. And I found this: Turkish Stemmer with Snowball. Its key approach was to use a finite state machine.
This was an interesting approach, and I liked its simplicity. The only thing I had to do was to model the suffix transformations as a state machine.
A stemmer can be applied to all kinds of words, so as a first step I targeted nouns only.
Therefore, I created the state machine for nouns:
While drawing this diagram, I referred to the Uzbek phonetics and word formation rules in the Uzbek language book. The book was very helpful, although I have used only a small part of it so far.
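
As a rough illustration of the idea only (this is not the stemmer developed in this post), a naive stemmer can strip a single known suffix. The function name strip_suffix and its parameters are hypothetical.

```python
def strip_suffix(word, suffixes=("s",)):
    """Remove the first matching suffix, keeping at least a two-letter stem."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(strip_suffix("rabbits", ("s",)))     # rabbit
    print(strip_suffix("kitoblar", ("lar",)))  # kitob ("books" -> "book" in Uzbek)
```

Such a one-pass approach breaks down for agglutinative languages like Uzbek, where several suffixes are chained in a fixed order, which is why a state machine is a better fit.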

I used Python for its ease of use and its rich set of external libraries.
Here is the source code:
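# Requires the third-party fysom package (pip install fysom).
from fysom import Fysom


def stem(word):
    """Strip Uzbek noun suffixes from the end of word and return the stem."""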
    fsm = Fysom(initial='start',
                    events=[
                    ('dir', 'start', 'b'),
                    ('dirda', 'start', 'b'),
                    ('ku', 'start', 'b'),
                    ('mi', 'start', 'b'),
                    ('mikan', 'start', 'b'),
                    ('siz', 'start', 'b'),
                    ('day', 'start', 'b'),
                    ('dek', 'start', 'b'),
                    ('niki', 'start', 'b'),
                    ('dagi', 'start', 'b'),
                    ('mas', 'start', 'd'),
                    ('ning', 'start', 'f'),
                    ('lar', 'start', 'g'),
                    ('lar', 'e', 'g'),
                    ('dan', 'd', 'e'),
                    ('da', 'd', 'e'),
                    ('ga', 'd', 'e'),
                    ('ni', 'd', 'e'),
                    ('dan', 'start', 'e'),
                    ('da', 'start', 'e'),
                    ('ga', 'start', 'e'),
                    ('ni', 'start', 'e'),
                    ('lar', 'f', 'g'),
                    ('miz', 'start', 'h'),
                    ('ngiz', 'start', 'h'),
                    ('m', 'start', 'h'),
                    ('si', 'start', 'h'),
                    ('i', 'start', 'h'),
                    ('ng', 'start', 'h'),
                    ('miz', 'f', 'h'),
                    ('ngiz', 'f', 'h'),
                    ('m', 'f', 'h'),
                    ('si', 'f', 'h'),
                    ('i', 'f', 'h'),
                    ('ng', 'f', 'h'),
                    ('miz', 'e', 'h'),
                    ('ngiz', 'e', 'h'),
                    ('m', 'e', 'h'),
                    ('si', 'e', 'h'),
                    ('i', 'e', 'h'),
                    ('ng', 'e', 'h'),
                    ('lar', 'h', 'g'),
                    ('dagi', 'g', 'start')
                    ]
                )
    # word[i:j] is the candidate suffix; j marks the end of the part not yet stripped.
    i = len(word) - 1
    j = len(word)
    while(True):
        if (i<=0):
            break
        v = word[i:j]
        #print v
        res = fsm.can(v)
        if (res):
            if (v == 'i' and fsm.can(word[i-1:j])):
                i = i - 1
                continue
            fsm.trigger(v)
            if (fsm.current == 'h'):
                if (word[i-1:i]=='i'):
                    i = i - 1 #skip i
                    if (word[i-1:i]=='n' ):
                            # the 'ning' suffix
                        fsm.current = 'start'
                        continue
            elif (fsm.current == 'b'):
                fsm.current = 'start'
            j = i
            # print fsm.current
        i =  i - 1
    return word[:j]
It is also available on GitHub.
We are collaborating with fellow Uzbek developers to build a full-featured NLP library for the Uzbek language.
The next step is to apply stemming to verbs.
Let me know if you have any ideas on this. Thanks.

Test outputs:

print(stem('mahallamizdagilardanmisiz'))
mahalla