Using Solr to Search Magento by Partial SKU


I recently needed to implement searching by partial SKU in Magento while using Solr. A quick search on the internet turned up a pair of posts (here and here) that were nearly identical. Both posts offer a bit of a walkthrough of the code, though without explaining what all of it does. I’m going to take the time here to break down the configuration changes needed to make this work. I will also discuss where my configuration differs from the one proposed in those posts.

First, the full code (for the TL;DR of you):

In schema.xml add the following changes:

1) Towards the bottom of the document inside the “schema” node add:
<copyField source="sku" dest="sku_partial" />

2) Inside the “fields” node add:
<field name="sku_partial" type="sku_partial" indexed="true" stored="true"/>

3) Inside the “types” node add:
<fieldType name="sku_partial" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="1000" side="front" />
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="1000" side="back" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
    </analyzer>
</fieldType>

4) Next, in solrconfig.xml find the “requestHandler” node(s) used for your store’s locale. (You can do this by searching for “magento_en” for English, or “magento_fr” for French.) Now change the following lines from:
<str name="qf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0</str>
<str name="pf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0</str>

to:
<str name="qf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0 sku_partial^1.0</str>
<str name="pf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0 sku_partial^1.0</str>

Now I’ll walk you through the configuration, line by line and explain what each line is actually telling Solr to do.

1) <copyField source="sku" dest="sku_partial" />
This code is telling Solr to copy the data from the SKU attribute that Magento sends it into a new field called “sku_partial”. We will define this field in one of the following steps. We do this so that we can manipulate how Solr treats that field without affecting the original data.

2) <field name="sku_partial" type="sku_partial" indexed="true" stored="true"/>
This is where we define the custom field that we copied the data into. Notice we are using a “type” of “sku_partial”. That is a custom field type that we will set up next (hint: that’s where the magic happens to allow us to search on partial values).

3) Now I will go through the fieldType definition line by line:

<fieldType name="sku_partial" class="solr.TextField">
This sets up the custom field type and inherits from solr.TextField. This is just a plain text value (alphanumeric).

<analyzer type="index">
This tells us that the enclosed lines are used during indexing as opposed to during querying.

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
This is a tokenizer. A tokenizer defined in the indexer breaks up each document into many parts. Those parts are then treated as separate pieces of information that are examined during search. In this case, Solr is breaking up the document using white space (spaces, tabs, new lines, etc.).

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="1000" side="front" />
This line defines a filter to apply to the document. Filters allow you to further manipulate the contents of the document. In this case we are applying an NGramFilterFactory. This allows us to further break the content up. Here we have told Solr to break up the contents into chunks as small as 3 characters all the way up to 1000 characters. So text such as “supercalifragilisticexpialidocious” gets broken into:
sup
supe
super
....

Why would you want to do this? Well, suppose you searched on “super”? You would expect to find “supercalifragilisticexpialidocious”. By breaking up the word this way, it makes it easier for Solr to find this match since one of the parts it will have indexed will be “super”.

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="1000" side="back" />
Notice the only thing different on this line is that side is set to “back”. This tells Solr to do the same grouping, but to work the document backwards. This yields terms like:

cious
ious
ous

This serves the same purpose as “front” does by providing more possibilities for a search term to match against. (I’ll concede that my chosen document word doesn’t lend itself well as an example here. Imagine a document composed of a large paragraph of text. Each word in that text would be subjected to this filter which would allow it to match partial words).
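Solr does all of this internally, but if it helps to visualize it, here is a rough PHP sketch (not Solr’s actual code) of the front and back grams those two filter lines produce for a single term, using a made-up SKU of “MYPRODUCT1234”:

// Rough illustration of the grams produced by the two NGram filter lines
// above (side="front" and side="back") for a single term. Not Solr code.
function grams($term, $minGramSize = 3, $fromFront = true)
{
    $result = array();
    for ($length = $minGramSize; $length <= strlen($term); $length++) {
        $result[] = $fromFront ? substr($term, 0, $length) : substr($term, -$length);
    }
    return $result;
}

print_r(grams('MYPRODUCT1234'));            // MYP, MYPR, MYPRO, ... MYPRODUCT1234
print_r(grams('MYPRODUCT1234', 3, false));  // 234, 1234, T1234, ... MYPRODUCT1234

Every one of those grams ends up in the index, which is why a shopper typing just a chunk of a SKU can still get a hit.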

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
A StopFilter strips designated stop words from the document. Stop words are words such as “a”, “an”, and “is”. These words are read from the file designated by the “words” attribute (in this case stopwords.txt). Why remove words? Because they are deemed irrelevant and likely to cause false positives. Consider a query for “an apple”. Without stop word removal, we would return every document that contained the word “an”, not just those that contain “apple” or “an apple”.

<filter class="solr.LowerCaseFilterFactory"/>
This filter causes everything in the document to be converted to lowercase before being stored in the index. This allows for case-insensitive searches.

<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
This filter removes any duplicate terms from the index. This reduces storage and simplifies the index allowing for cleaner results. There are times you may not want to use this though.  The frequency of a term showing up in a document can yield more relevant results. (Finding the word “red” 4 times in a block of text should make that result more relevant than one where “red” is only found once)

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
The SnowballPorterFilter stems each term down to its root form so that variations of a word match one another (“apples” and “apple” reduce to the same stem, as do “running” and “run”). The file specified by the “protected” attribute lists words that should be protected from stemming.

<analyzer type="query">
This tells us that the enclosed lines will be used during querying (as opposed to the indexer section we just finished).

<tokenizer class="solr.StandardTokenizerFactory"/>
This is a general purpose tokenizer.  It has some basic built-in rules for breaking apart each search term into various parts.

<filter class="solr.LowerCaseFilterFactory"/>
Just like in the indexer analyzer block, this filter allows for case-insensitive searches.  Here it is done by converting the search terms to lowercase before comparing them against the index.

<filter class="solr.TrimFilterFactory" />
This filter removes white space from both sides of each search term in the query.

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
A SynonymFilter does a key/value match against a synonym list and adds the matching terms to the search query. (GB, G, and gig are all synonyms for gigabyte; all of these would be used to search the index, allowing for a greater chance of matching.)

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
This functions just like the filter in the indexer.

<str name="qf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0 sku_partial^1.0</str>
<str name="pf">fulltext1_en^1.0 fulltext2_en^2.0 fulltext3_en^3.0 fulltext4_en^4.0 fulltext5_en^5.0 sku_partial^1.0</str>

Both of these lines add our new custom field with a boost of 1.0 (effectively, no boost) to the query fields (qf) and phrase fields (pf). Phrase fields come into play after the results have been generated. This is where you can affect the ranking of the results further.
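If you want to sanity-check the new field outside of Magento, you can query the request handler directly from a browser. This is only a hedged example: the host, port, core path, and handler name depend entirely on your installation (here I’m assuming the English handler named “magento_en”, as above, and a default local Solr install):

http://localhost:8983/solr/select?qt=magento_en&q=PROD123

A search for a partial value such as “PROD123” should now match products whose full SKU merely contains that fragment.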

While that covers what each line represents, here is what I did differently from the reference posts and why. In both of those posts, you will find they include this line in the query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
What this line does is tell Solr to split the search term into parts based on a guess at possible combined words. So Solr will see 58SKU2 as three separate terms: 58, SKU, and 2. This is not what we want, because now it will search for those terms and will not likely find the match we are looking for (if it finds one at all). So I removed the line from my configuration. In the spirit of this post, here is what each of the settings does.

generateWordParts="1"
This causes Solr to split out the letter parts of the term. Combined with splitOnCaseChange below, a search for “56MyProduct1234” generates the word parts “My” and “Product”.

generateNumberParts="1"
Like word parts above, this splits out the number parts of the term. So “56MyProduct1234” also generates “56” and “1234”.

catenateWords="0"
If this is set to “1”, adjacent word parts are glued back together and added to the query. For “56MyProduct1234” that means “MyProduct” is generated in addition to the individual parts.

catenateNumbers="0"
If this is set to “1”, adjacent number parts are glued back together. In “56MyProduct1234” the numbers are not adjacent, so nothing extra is generated; for a term like “500-42” it would also generate “50042”.

catenateAll="0"
If this is set to “1”, all of the split-out parts are glued back together into one token, so “56MyProduct1234” generates “56MyProduct1234” again in addition to the individual parts.

splitOnCaseChange="1"
This will split the term when the case of the term changes. E.g. MyProduct splits into “My” and “Product”.
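To make the problem concrete, here is a rough PHP illustration (again, not Solr’s actual implementation) of the alpha/numeric splitting described above, applied to the 58SKU2 example from earlier:

// Approximates how WordDelimiterFilterFactory breaks a SKU apart
// at every letter/number boundary. Not Solr's actual code.
$term  = '58SKU2';
$parts = preg_split('/(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])/', $term);
print_r($parts); // Array ( [0] => 58 [1] => SKU [2] => 2 )

None of those fragments is likely to match the sku_partial grams we built for the full SKU, which is exactly why I dropped the filter.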

My Thoughts on MageHero


MageHero – Awesome Magento Developers

If you don’t know what MageHero is and you are a developer working with Magento, go check it out first then come back to read the rest of this post.  If you are a merchant or recruiter, I would recommend you wait for now.  The site is very much a work in progress and isn’t really ready for you as an audience yet (I have no doubt that one day it will be one of your first stops when looking for a developer to hire.  That day just isn’t here yet).

First iteration

At the original launch of the site, it was nothing more than a page listing Magento developers to whom Kalen Jordan could refer people.  He had a simple problem and started with a simple solution.  Like many of us, he is constantly getting requests for help with someone’s Magento storefront that don’t necessarily fall into the “Go talk to Agency X” group but are more of the “my buddy John has a few spare hours where he can help you out” type.  So he set up a quick and dirty site that would allow people to register themselves by logging in with their GitHub account.  Once he approves them, he can then add those people to the list.

Then there was trouble…

What Kalen soon realized was that this wasn’t going to scale.  I’m sure he expected somewhere around a dozen freelancers to sign up with their availability listed.  Instead, the community at large barged in the door (at the time of this writing, there are about 180 developers, ranging from freelance to agency to end-merchant developers, signed up, with the site being less than 2 weeks old).  One person doing manual review was not going to work.  So how to move forward?

Voting

Let’s get the community to help.  This is actually a pretty good idea.  However, there are some things that should be ironed out first.  Who gets to vote?  What is the criteria used for voting?  How do you keep the list from being a “good ol’ boys club” listing only a few well-known developers?

Let’s take those one at a time and see how Kalen has approached it.

Who gets to vote? / How do you keep out the spammers?

This one is a moving target.  Initially it was anyone with 2 or more votes. Currently, it is anyone with 4 or more votes.  There is good reason for this change.  Since nearly everyone who joins gets a single vote from Kalen shortly after they join, this would only require each person to get just one more vote.  That’s a pretty thin margin that might allow someone voting access that could wreck the entire system.  By requiring 4 votes it helps improve the odds that voting members are really deserving of that right.  I’ll come back to the issue of voting a little later as I have some concerns on it.

What is the criteria used for voting?

Initially, this was pretty vague.  I think somewhat intentionally so, as Kalen was trying to let the community help form the site.  However, there is now a suggested description for voting criteria.  I say suggested as the voting system is still opinion based.  Voting is not (nor do I think it ever will be) policed.  However, as each vote is publicly viewable, you should consider your votes as endorsements of the person you are voting for.

How do you keep the list from being a “good ol’ boys club”?

This one is a bit tougher to solve.  It involves a great deal of trust in the community.  I haven’t seen anything specifically done by Kalen to make sure this doesn’t happen, so only time will tell if it does or not.  However, my money is on it not happening.  What I know and have seen of the Magento community is that it is doing its best to help each other.  From the great Alan Storm and his blog posts that have taught and inspired most of us, to the unbeatable Marius Strajeru answering questions on Stack Exchange so fast that it has led to a site dedicated to letting others know whether he is awake or asleep (as the only time others are able to get an answer in faster than Marius is when he is asleep).

To infinity and beyond

So where is the site headed?  I’m not quite sure.  I don’t know that even Kalen knows (if he does, he hasn’t shared anything that I’ve seen on it).  Some of his short-term plans are visible through Github issues.  These include the ability to post updates (per Kalen: “basically updates on what you’re working on – pretty much what we all use twitter for currently.”) and some potential monetization opportunities.  Whatever the case, I think MageHero is here to stay (and that’s a good thing)

My feedback

So if I haven’t gotten into trouble with some people based on my interpretations above, I suppose this is the part where I might step on some toes.  I won’t apologize for this if I do as Kalen has been pretty vocal in asking for feedback.  I’m not saying that my way is the right way.  I may not even be proposing alternatives.  All I’m trying to do is point out potential problem spots as I see them.

Back to voting

I’ve covered some of this above, but I think it bears reiterating.  There is a risk of really good developers joining and never receiving any votes.  The risk comes from these developers not being the type to “toot their own horn”.  They write great code and can get the job done.  However, because they tend to keep to themselves on Twitter, etc., they fly under the radar of the rest of the community.  Is this wrong?  Certainly not!  Some people just don’t care for the limelight.  That doesn’t mean they should not be able to be on the list.  In fact, I could argue it is likely some of them are more talented than some on the list with a lot of votes.  How do you solve this problem?  I don’t have a good solution to the voting issue.  However, I think Kalen is on the right track.  Making the number of votes less visible helps remove the popularity contest aspect, as it is not immediately clear that one person is ranked higher than another (and I don’t think that was the original intent anyway; the intent was to list quality developers).  Instead you see groupings.  You have group A, which has achieved enough votes to be a voter; group B, which has at least 1 vote; and group C, those with no votes.  Likely those in group C have only recently joined and will shortly be part of group B (or be weeded out as spam accounts).  This gives you a much flatter list.  With some of the plans Kalen has for providing ways for people to get more votes, I think those that are less vocal will be on more equal footing with the rest of the community.

While this is the issue that led to my original complaint against a voting system, it was not meant to say I don’t want to participate.  Over the course of my career I think I have done a pretty good job of trying to stand up for those that work under me or with me but are less vocal than I am.  That’s all I’m trying to do here.  I care about the Magento developer community as a whole – those that are at the top, as well as those that are the silent warriors building awesome sites for merchants but content to focus on writing code.

Why are you still on PHP 5.3?


Seriously, why?

Let’s start with some facts. PHP 5.3 officially experienced EOL (End Of Life) in March of 2013. That’s nearly 1 1/2 years ago (an eternity in tech years). The last bug fix for PHP 5.3 was released in December of 2013 – 8 months ago! There have probably been two dozen Flash bug-fix releases since then (seriously, Flash, please just die already).

If that isn’t enough to convince you to upgrade, how about a nice little case study in the performance gains PHP 5.4 affords you?

I recently worked on a server migration for a client. We’ll call them Client X. Client X was migrating from one hosting provider to another (for reasons not relevant to this post). The new provider does not support PHP 5.4 (crazy, I know. But let’s move on). So we initially setup the site on PHP 5.3.

In preparing for the switchover, we noticed the site was notably slower than on the old server, even though the server specs were very similar. As we started on the task of diagnosing the slowness we found a few things. First, the network card was set to half duplex on a 100Mb/s backend. Even after setting it to full duplex we still found performance was slow. We then had the provider move the server to a 1Gb/s backend. This made a major difference. Not only did we see a speed improvement by a factor of 2, but load capacity increased nearly threefold. A very big difference indeed. Yet the site was still slower.

At the same time that we were troubleshooting performance, we had also encountered a weird issue where SOAP requests were failing when zlib compression was enabled in PHP. After some internal debate, a business decision was made to “risk” running PHP 5.4 without support from the provider. After upgrading, we noticed the site felt much better – even comparable to the old server. We ran another load test. The results were surprising. Just by upgrading from PHP 5.3 to 5.4 we saw a 20% improvement in average response time and a 45% improvement in capacity.

So the lesson here? NOT upgrading from PHP 5.3 to at least 5.4 is downright crazy.

Load Test Results

Tests are in chronological order; the PHP 5.4 upgrade is the top test.

As always, article is provided “as is” without any warranties of any kind either expressed or implied. Any resemblance to actual persons, living or dead, is unintentional and purely coincidental. At participating locations only. Not responsible for loss of limb or life. Batteries not included. Beware of dog. Consult your physician before use. Keep away from sunlight, pets, and small children. Your mileage may vary.

Introducing TogglToJira


As a consultant, keeping track of my time is obviously very important. I use Toggl to track every task (billable or not) that I work on for each client. However, I also have to update each client’s Jira project with the time I spent on each ticket as well as our billing system’s timesheet (we’ll leave that for another post and another day). This used to be a very manual process. I would have to check Toggl reports for each day and then log into Jira, pull up the ticket, and add a worklog. Here is where TogglToJira comes in.

TogglToJira provides a basic level of automation. I no longer have to manage the worklog entries in Jira. I only have to keep track of my time in Toggl. Here is how it works. In Toggl, I have each client and project set up. Each task that I work on is simply the ticket number from Jira (in a future version I will pull the ticket number from either a tag or the Toggl task ID so that I can use the description field as the worklog comment). TogglToJira uses YAML files to understand which clients get worklog entries created as well as the URL for their Jira instance (and what credentials to use with Jira’s REST API). I do this because not every client uses Jira and not every task gets logged to a Jira ticket (we have internal non-billable tasks that don’t get logged to tickets, but the time still needs to be reported in our timesheet system). The YAML for the Toggl settings looks like this:

---
workspace_id: 
user_agent: TogglToJira 
api_token:

Now, as you can probably guess, finding the workspace_id is a little cumbersome.  I wish the Toggl API allowed you to either specify a workspace name or simply use your default workspace.  Alas, it doesn’t.  So to find your workspace ID, go to https://www.toggl.com/app/workspaces and click the little box to the left of the workspace name you want to work with.  Choose “Settings” from the dropdown menu.  You are taken to a page where you can edit the settings for that workspace.  Your workspace ID will be in the URL of your browser (it’s the number at the end of https://www.toggl.com/app/workspaces/edit/).

The user_agent can be left set at TogglToJira as long as you are not making any changes to the code (if you do plan to make changes, please set the user_agent to your email address.  This will allow Toggl to identify your API calls and let you know if you are doing something bad/wrong).

You can find your api_token on your profile page at https://www.toggl.com/app/profile.
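If you would rather not click through the UI, you can also pull the list of workspaces (and their IDs) straight from the Toggl API. This is just a hedged sketch against what I believe is the v8 workspaces endpoint, authenticating with your api_token; adjust it if Toggl changes the API:

// Hedged sketch: list workspace IDs and names via the Toggl v8 API.
// Basic auth uses your API token as the username and the literal
// string "api_token" as the password.
$ch = curl_init('https://www.toggl.com/api/v8/workspaces');
curl_setopt($ch, CURLOPT_USERPWD, 'YOUR_API_TOKEN:api_token');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$workspaces = json_decode(curl_exec($ch), true);
curl_close($ch);
foreach ((array) $workspaces as $workspace) {
    echo $workspace['id'] . ' => ' . $workspace['name'] . PHP_EOL;
}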

You also need to configure the YAML file for your Jira sites. It should look like this:

---
Sites:
    Jira1:
        url:
        user:
        pass:
    Jira2:
        url:
        user:
        pass:
Clients:
    Client Name From Toggl1:
        site: Jira1
    Client Name From Toggl2:
        site: Jira2

Here you will define sites and clients.  Each site corresponds to a Jira instance.  Many of our clients have their tickets managed in our Jira instance.  So I set those up under “Jira1” providing the base URL for the Jira instance (e.g. https://yourcompany.atlassian.net or https://yourcompany.jira.com) along with the same credentials I use to log into the web interface.  I setup each client using the name I gave the client in Toggl and then a reference to the site that client is using for Jira.
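For the curious, the worklog creation itself boils down to a POST against Jira’s REST API. The snippet below is only a sketch of that kind of call (not TogglToJira’s internal code), with made-up host, credentials, ticket key, and time values:

// Sketch of the Jira REST call that creates a worklog entry.
// 2700 seconds = 0.75h; the URL, credentials, and ticket key are placeholders.
$payload = json_encode(array(
    'comment'          => 'Logged from Toggl',
    'started'          => '2014-08-01T09:00:00.000+0000',
    'timeSpentSeconds' => 2700,
));
$ch = curl_init('https://yourcompany.atlassian.net/rest/api/2/issue/CLIENT1-123/worklog');
curl_setopt_array($ch, array(
    CURLOPT_USERPWD        => 'jira_user:jira_pass',
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
    CURLOPT_RETURNTRANSFER => true,
));
$response = curl_exec($ch);
curl_close($ch);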

Now for the fun part: running it.  You can execute the process from a terminal window by running:

php bootstrap.php [date in format yyyy-MM-dd]

The date is optional.  If you don’t specify it, then it will use yesterday’s date.  Below is an example of the output you should see:

******** Created Jira Worklogs **********
Array
(
[0] => Client1 ticket CLIENT1-123 was logged on 2014-08-01 for 0.75h
[1] => Client1 ticket CLIENT1-456 was logged on 2014-08-01 for 0.5h
[2] => Client2 ticket CLIENT2-789 was logged on 2014-08-01 for 0.25h
)
******* No Jira Worklogs Created ********
Array
(
[0] => Internal task Timesheet was NOT logged on 2014-08-01 for 0.25h
)
*****************************************

I’ll admit, the output is a bit rudimentary. But it works for my needs. Check out the project on GitHub and let me know what you think in the comments below.

Fixed Price Sales Rules (Coupons)


Did you know Magento 1.x supports “fixed price” sales rules (e.g., coupons) out of the box? Not many people do, and there’s a reason for that. There is a one-line bug that has been left unfixed for over 4 years now (http://www.magentocommerce.com/bug-tracking/issue?issue=8627). Here is a workaround to allow you to use them:
Add the following to an Observer:

public function workaroundForFixedPricedSalesRules($observer)
{
    // Grab the sales rule "Actions" form as it is being prepared in the admin
    $form = $observer->getEvent()->getForm();
    $rule_actions_fieldset = $form->getElements()->searchById('action_fieldset');
    $simpleAction = $rule_actions_fieldset->getElements()->searchById('simple_action');

    // Append the missing "To Fixed Amount" option to the Actions dropdown
    $values = $simpleAction->getValues();
    $values[] = array(
        'value' => Mage_SalesRule_Model_Rule::TO_FIXED_ACTION,
        'label' => Mage::helper('salesrule')->__('To Fixed Amount'),
    );
    $simpleAction->setValues($values);
    return $this;
}

 

In your config.xml, add this to the <adminhtml><events> section:

<adminhtml_block_salesrule_actions_prepareform>
    <observers>
        <admin_salesrule_tofixed>
            <type>singleton</type>
<class>NAMESPACE_MODULENAME_Model_Observer</class>
            <method>workaroundForFixedPricedSalesRules</method>
        </admin_salesrule_tofixed>
    </observers>
</adminhtml_block_salesrule_actions_prepareform>

Now log in to the Admin panel, go to Promotions -> Shopping Cart Price Rules, and you will find “To Fixed Amount” available in the dropdown on the Actions tab. This will allow you to create a rule that forces the price of a product (or group of products) to a specific price regardless of what the current price is (unless, of course, the price is already lower than your new rule’s price). This means that you could have a rule that says “French Fries are always $1.99 when the customer enters code FRIES at checkout” and you don’t have to worry about what the current price of French Fries is (as opposed to trying to set a percent off and hoping that someone doesn’t decide to set the price to $2.00, breaking your percent-off coupon’s intent).

Who do you optimize for?


It’s a fairly well-known fact that search engines (especially Google) factor in website speed in how they rank your site in search results. With that in mind, do you still optimize website speed for your users? Or do you do this more to cater to SEO? I’m particularly curious after reading this post on creating the illusion of speed where one of the proposed actions is to “Pretend to work, even when you don’t” (they don’t actually encourage you to “lie” to your users, if anything the heading is a poor choice of words).  The article is clearly targeted at optimizing for the user and does offer some interesting techniques to use.  However, much of it seems like putting lipstick on a pig. Let me know your thoughts in the comments below.

Debugging Java Applications


This process is an addition to the regular debugging techniques that include:

  1. Check all the processes and resources (CPU, RAM, IO, disk space, etc.)
  2. Check log files for errors
  3. Check network connectivity
  4. Check DNS (the JVM by default caches host name resolutions, so if you have changed the IP for a certain name you’ll need to restart the app to pick up the new one)

——–

Alright, these are the additions:

  1. Java processes are usually multithreaded. So, if you just run “ps auxww | grep java”, chances are that you’ll only see the parent processes. If you want to see all the threads and the resources they are taking, do this (replace JAVA_PID with the parent process ID number):

    top -Hp JAVA_PID

    or if you don’t have “top” installed:

    ps -C java -L -o pcpu,cpu,nice,state,cputime,pid,tid

  2. Make a note of the highest CPU/memory consuming threads and their PIDs
  3. Next – we need to get a thread dump of all the threads (without affecting the application or stopping the server) – $JAVA_HOME is where your java path is and JAVA_PID is the parent process ID number (same one from step 1); $USER is the name of the user the application is running as:

    sudo -u $USER $JAVA_HOME/bin/jstack JAVA_PID > threads.out

  4. If everything went ok with step 3 – you’ll have a file with info about all the threads.
  5. You can examine the threads.out file and see which threads are BLOCKED or in “WAITING (on object monitor)” status. They will also tell you what LOCKED object they are waiting for. You can examine the file further to see exactly which thread is locking the object that all the blocked threads are waiting for. Then you can give the info to the developers and they should know how and what to fix (we hope).
  6. All the thread IDs in the threads.out file are in hex format. Now, take the thread ID you noted in step 2 and convert it into hex (let’s say the PID of the highest-CPU thread was 12345):

    printf "0x%x\n" 12345

    This will produce a result like: 0x3039

    Then you take the hex ID (in this case 0x3039) and track it down in the threads.out file to see the info for that particular thread.

  7. If you don’t have a commercial Java profiler (like Wily Introscope), note that since JDK 6 update 7 a visual profiler (VisualVM) has been included with the JDK.

 


This is a guest post by Nick Stoianov.  Nick is a System Administrator with expert experience configuring Apache, MySQL, and WebSphere.  He is very adept at troubleshooting web applications as well as tuning web server environments for performance.