How to Identify a MongoDB Performance Anti Pattern in Five Minutes

The other day I was looking at a web application that was using MongoDB as its central database. We were analyzing the application for potential performance problems and inside five minutes I detected what I must consider to be a MongoDB anti pattern and had a 40% impact on response time. The funny thing: It was a Java best practice that triggered it.

Analyzing the ApplicationThe first thing I always do is look at the topology of an application to get a feel for it.

Overall Transaction Flow of the Application

As we see it's a modestly complex web application and it's using MongoDB as its datastore. Overall MongoDB contributes about 7% to the response time of the application. I noticed that about half of all transactions are actually calling MongoDB so I took a closer look.

Flow of Transactions that access MongoDB, showing 10% response time contribution of MongoDB

Those transactions that actually do call MongoDB spend about 10% of their response time in that popular document database. As a next step I wanted to know what was being executed against MongoDB.

Overview of all MongoDB commands. This shows that the JourneyCollection find and getCount contribute the most to response time

One immediately notices the first two lines, which contribute much more to the response time per transaction than all the others. What was interesting was that thegetCount on the JourneyCollection had the highest contribution time, but the developer responsible was not aware that he was even using it anywhere.

Things get interesting - the mysterious getCount callTaking things one level deeper, we looked at all transactions that were executing the ominous getCount on the JourneyCollection.

Transactions that call JourneyCollection.getCount spend nearly half their time in MongoDB

CIO, CTO & Developer Resources

What jumps out is that those particular transactions spend indeed over 40% of their time in MongoDB, so there was a big potential for improvement here. Another click and we looked at all MongoDB calls that were executed within the context of the same transaction as the getCount call we found so mysterious.

All MongoDB Statements that run within the same transaction context as the JourneyCollection.getCount

What struck us as interesting was that the number of executions per transaction of thefind and getCount on the JourneyCollection seemed closely connected. At this point we decided to look at the transactions themselves - we needed to understand why that particular MongoDB call was executed.

Single Transactions that execute the ominous getCount call

It's immediately clear that several different transaction types are executing that particulargetCount. What that meant for us is that the problem was likely in the core framework of that particular application rather than being specific to any one user action. Here is the interesting snippet:

The Transaction Trace shows where the getCount is executed exactly

We see that the WebService findJourneys spends all its time in the two MongoDB calls. The first is the actual find call to the Journey Collection. The MongoDB client is good at lazy loading, so the find does not actually do much yet. It only calls the server once we access the result set. We can see the round trip to MongoDB visualized in the call node at the end.

We also see the offending getCount. We see that it is executed by a method called sizewhich turns out to be com.mongodb.DBCursor.size method. This was news to our developer. Looking at several other transactions we found that this was a common pattern. Every time we search for something in the JourneyCollection the getCountwould be executed by com.mongodb.DBCursor.size. This always happens before we would really execute the send the find command to the server(which happens in the callmethod). So we used CompuwareAPM DTM's (a.k.a dynaTrace) developer integration and took a look at the offending code. Here is what we found:

The code looks harmless enough; we execute a find, create an array for the result and fill it. The offender is the location.size(). MongoDBs DBCursor is similar to the ResultSet in JDBC, it does not return the whole data set at once, but only a subset. As a consequence it doesn't really know how many elements the find will end up with. The only way for MongoDB to determine the final size seems to be to execute a getCountwith the same criteria as the original find. In our case that additional unnecessary roundtrip made up 40% of the web services response time!

An Anti-Patter triggered by a Best PracticeSo it turns out that calling size on the DBCursor must be considered an anti-pattern! The real funny thing is that the developer thought he was writing performant code. He was following the best practice to pre-size arrays. This avoids any unnecessary re-sizing. In this particular case however, that minor theoretical performance improvement led to a 40% performance degradation!

ConclusionThe take away here is not that MongoDB is bad or doesn't perform. In fact the customer is rather happy with it. But mistakes happen and similar to other database applications we need to have the visibility into a running application to see how much it contributes to the overall response time. We also need to have that visibility to understand which statements are called where and why.

In addition this also demonstrates nicely why premature micro optimization, without leveraging an APM solution, in production will not lead to better performance. In some cases - like this one - it can actually lead to worse performance.

Michael Kopp has over 12 years of experience as an architect and developer in the Enterprise Java space. Before coming to CompuwareAPM dynaTrace he was the Chief Architect at GoldenSource, a major player in the EDM space. In 2009 he joined dynaTrace as a technology strategist in the center of excellence. He specializes application performance management in large scale production environments with special focus on virtualized and cloud environments. His current focus is how to effectively leverage BigData Solutions and how these technologies impact and change the application landscape.

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

Business and IT leaders today need better application delivery capabilities to support critical new innovation. But how often do you hear objections to improving application delivery like, “I can harden it against attack, but not on this timeline”; “I can make it better, but it will cost more”; “I can deliver faster, but not with these specs”; or “I can stay strong on cost control, but quality will suffer”? In the new application economy, these tradeoffs are no longer acceptable. Customers will ...

The free version of KEMP Technologies' LoadMaster™ application load balancer is now available for unlimited use, making it easy for IT developers and open source technology users to benefit from all the features of a full commercial-grade product at no cost. It can be downloaded at FreeLoadBalancer.com.
Load balancing, security and traffic optimization are all key enablers for application performance and functionality. Without these, application services will not perform as expected or have the...

Hadoop as a Service (as offered by handful of niche vendors now) is a cloud computing solution that makes medium and large-scale data processing accessible, easy, fast and inexpensive.
In his session at Big Data Expo, Kumar Ramamurthy, Vice President and Chief Technologist, EIM & Big Data, at Virtusa, will discuss how this is achieved by eliminating the operational challenges of running Hadoop, so one can focus on business growth. The fragmented Hadoop distribution world and various PaaS soluti...

SYS-CON Events announced today that Dyn, the worldwide leader in Internet Performance, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY.
Dyn is a cloud-based Internet Performance company. Dyn helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through a world-class network and unrivaled, objective intelligence into Internet conditions, Dyn ensures...

VictorOps is making on-call suck less with the only collaborative alert management platform on the market.
With easy on-call scheduling management, a real-time incident timeline that gives you contextual relevance around your alerts and powerful reporting features that make post-mortems more effective, VictorOps helps your IT/DevOps team solve problems faster.

As organizations shift toward IT-as-a-service models, the need for managing and protecting data residing across physical, virtual, and now cloud environments grows with it. CommVault can ensure protection &E-Discovery of your data – whether in a private cloud, a Service Provider delivered public cloud, or a hybrid cloud environment – across the heterogeneous enterprise.
In his session at 16th Cloud Expo, Randy De Meno, Chief Technologist - Windows Products and Microsoft Partnerships, will disc...

Skytap Inc., has appointed David Frost as vice president of professional services. David joins Skytap from Deloitte Consulting where he served as Managing Director leading SAP, Cloud, and Advanced Technology Services. At Skytap, David will head the company's professional services organization, and spearhead a new consulting practice that will guide IT organizations through the adoption of DevOps best practices. David's appointment comes on the heels of Skytap's recent $35 million Series D fundin...

Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...

Cloud data governance was previously an avoided function when cloud deployments were relatively small. With the rapid adoption in public cloud – both rogue and sanctioned, it’s not uncommon to find regulated data dumped into public cloud and unprotected. This is why enterprises and cloud providers alike need to embrace a cloud data governance function and map policies, processes and technology controls accordingly.
In her session at 15th Cloud Expo, Evelyn de Souza, Data Privacy and Compliance...

Skeuomorphism usually means retaining existing design cues in something new that doesn’t actually need them. However, the concept of skeuomorphism can be thought of as relating more broadly to applying existing patterns to new technologies that, in fact, cry out for new approaches.
In his session at DevOps Summit, Gordon Haff, Senior Cloud Strategy Marketing and Evangelism Manager at Red Hat, will discuss why containers should be paired with new architectural practices such as microservices ra...

Roberto Medrano, Executive Vice President at SOA Software, had reached 30,000 page views on his home page - http://RobertoMedrano.SYS-CON.com/ - on the SYS-CON family of online magazines, which includes Cloud Computing Journal, Internet of Things Journal, Big Data Journal, and SOA World Magazine. He is a recognized executive in the information technology fields of SOA, internet security, governance, and compliance. He has extensive experience with both start-ups and large companies, having been ...

The Workspace-as-a-Service (WaaS) market will grow to $6.4B by 2018. In his session at 16th Cloud Expo, Seth Bostock, CEO of IndependenceIT, will begin by walking the audience through the evolution of Workspace as-a-Service, where it is now vs. where it going.
To look beyond the desktop we must understand exactly what WaaS is, who the users are, and where it is going in the future. IT departments, ISVs and service providers must look to workflow and automation capabilities to adapt to growing ...

There are many considerations when moving applications from on-premise to cloud. It is critical to understand the benefits and also challenges of this migration. A successful migration will result in lower Total Cost of Ownership, yet offer the same or higher level of robustness.
In his session at 15th Cloud Expo, Michael Meiner, an Engineering Director at Oracle, Corporation, will analyze a range of cloud offerings (IaaS, PaaS, SaaS) and discuss the benefits/challenges of migrating to each of...

Platform-as-a-Service (PaaS) is a technology designed to make DevOps easier and allow developers to focus on application development. The PaaS takes care of provisioning, scaling, HA, and other cloud management aspects. Apache Stratos is a PaaS codebase developed in Apache and designed to create a highly productive developer environment while also supporting powerful deployment options.
Integration with the Docker platform, CoreOS Linux distribution, and Kubernetes container management system ...

The industrial software market has treated data with the mentality of “collect everything now, worry about how to use it later.” We now find ourselves buried in data, with the pervasive connectivity of the (Industrial) Internet of Things only piling on more numbers. There’s too much data and not enough information.
In his session at @ThingsExpo, Bob Gates, Global Marketing Director, GE’s Intelligent Platforms business, to discuss how realizing the power of IoT, software developers are now focu...

Red Hat has launched the Red Hat Cloud Innovation Practice, a new global team of experts that will assist companies with more quickly on-ramping to the cloud. They will do this by providing solutions and services such as validated designs with reference architectures and agile methodology consulting, training, and support.
The Red Hat Cloud Innovation Practice is born out of the integration of technology and engineering expertise gained through the company’s 2014 acquisitions of leading Ceph s...

Operational Hadoop and the Lambda Architecture for Streaming Data
Apache Hadoop is emerging as a distributed platform for handling large and fast incoming streams of data. Predictive maintenance, supply chain optimization, and Internet-of-Things analysis are examples where Hadoop provides the scalable storage, processing, and analytics platform to gain meaningful insights from granular data that is typically only valuable from a large-scale, aggregate view. One architecture useful for capturing...

SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY.
Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes ...

The Internet of Things (IoT) promises to evolve the way the world does business; however, understanding how to apply it to your company can be a mystery. Most people struggle with understanding the potential business uses or tend to get caught up in the technology, resulting in solutions that fail to meet even minimum business goals.
In his session at @ThingsExpo, Jesse Shiah, CEO / President / Co-Founder of AgilePoint Inc., showed what is needed to leverage the IoT to transform your business. ...

DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, it is now feasible to create a rich desktop and tuned mobile experience with a single codebase, without compromising performance or usability.

An anatomy of startup ventures for the Internet of Things market. Like GE describes in their white paper Pushing the Boundaries of Mind and Machine, this is basically a process of innovating through more intelligent machines to reinvent workflow models.
For a useful overview as to what constitutes an ‘IoT startup’, check out one example for some key characteristics: Hutgrip. Hutgrip is a SaaS solution that replaces VPNs with the Cloud and real time analytics, with the headline points being:
Clear description of the business benefit the new technology will bring – Smarter automation of bi...

The Internet of Things has emerged as the universally accepted term for the ‘next big thing’ wave, not replacing but building upon the Cloud Computing cycle, which itself built upon SaaS and ASPs.
There are many technology aspects to this trend, which will be covered extensively throughout this guide and ongoing series, but overall our goal is to describe the associated startup venture opportunities.
Indeed it’s not limited to startups, the IoT represents a new product innovation platform for any and all businesses, and this is the overall theme of this paper.

Creating global change that is actually good for the entire world is a mammoth task. With a population of almost 7 Billion people as of 2015, the planet is taking a toll with surviving the brunt of keeping the works going. What role can Cloud Computing play in making it easier for all of us?

There is continued and escalating hype about "Big Data" and the cloud services that empower big data analytics. But to better understand and demystify this hype, whether exploring Big Data or traditional, structured or unstructured, we need to categorize cloud data into three primary forms.
Aggregate form of cloud data - whether public, hybrid or private - is especially popular with multi-site organizations. Data collected in different locations needs to be processed at some point and using the cloud to aggregate data in a common location is commonsensical for future processing or analyti...

One important differentiator between what passed for digital back in the dot-com days and today’s notion of digital is the role mobile plays. Yes, this company had a mobile site, and they had what the AVP claimed was a “mobile first” plan for their web content, but as yet they had yet to roll out any responsive design. In the final analysis, their digital effort up to this point boiled down to little more than better brochureware, a la 1990s web redesigns.
But more significantly, what was entirely missing from their digital achievements (although the AVP did indicate that it was a roadmap i...

Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities.
Accordingly, attendees at the upcoming 16th Cloud Expo at the Javits Center in New York June 9-11 will find fresh new content in a new track called PaaS | Containers & Microservices
Containers are not being considered for the first time by the cloud community, but a current era of re-consideration has pushed them to the top of the cloud agenda. With the launch of Docker's initial release in March of 2013, interest was revved up several notches. Then late last...

Over the last couple of years I have talked to numerous enterprise customers, analysts, industry pundits, and others interested in cloud technologies, and one thing is abundantly clear – Platform-as-a-Service (PaaS) seems to mean different things to different people. But the term PaaS is irrelevant – it's just noise. What is relevant, and what is important, is what PaaS does: enable applications. That's what enterprises care about. They want to accelerate application development to get products to market faster and into users' hands sooner.

RealTime Medicare Data analyzes huge volumes of Medicare data and provides analysis to their many customers on the caregiver side of the healthcare sector using HP Vertica.
Here to explain how they manage such large data requirements for quality, speed, and volume, we're joined by Scott Hannon, CIO of RealTime Medicare Data and he's based in Birmingham, Alabama. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Welcome to my four-part series on what I’m going to call the Art of DevOps. We will embark on a mission to reveal the extremely valuable intelligence that’s been collected about a unique strategy to continuously deliver assets to the operational battleground safely, securely and quickly. This strategy drives optimal monitoring of the frontlines and enhanced communications with the troops supporting the initial development. I do make the assumption that most of you are battle-worthy veterans in one or more of the environments that I’ll review.
Beyond this introduction blog, I’ll brief you on ...

Although programmers for decades have been starting their careers by writing a “Hello World” program, the phenomenon has remained as unknown to the general public as the song of the same name by Lady Antebellum. But the former is now about to change.
Programming is in fact the real “new way of working”. For years education for the general public was largely confined to learning how to use computers. Thus we created operators rather than programmers. And today operators of all kinds are rapidly being replaced by automation. Just look at the big banks where people who “worked with computers” ...

CIOs are increasingly concerned about cloud security. And they should be: with the recent outbreak of visible breaches at high-profile organizations like Target, Anthem, and others, and the subsequent damage they cause, corporations are scrambling to make sure their cloud applications, whether on private, public, or hybrid clouds, are safe.
But cloud security is complex. With ephemeral applications and services springing up around multiple data centers, with dozens or hundreds of independent microservices each with their own access mechanisms, with the widespread adoption of virtualization an...

In September, Macy’s announced that they will invest $1 Billion into their omni-channel strategy. When spending so much money the question that immediately comes up is how to measure the success? Key questions such as “Are conversion rates increasing as planned?” and “How good is the User Experience for each channel?” need answers. Since my first blog about omni-channel monitoring, I came across a variety of customer implementations. Today, I want to share with you how Solocal Group, provider of the French Yellow Pages (PagesJaunes), brings all key metrics together on a single dashboard....

I attended a Meetup yesterday in Mountain View, hosted by The Hive group on the subject of Lambda Architecture. Since I had never heard about this new phrase, my curiosity took me there. There was a panel discussion and panelists came from Hortonworks, Cloudera, MapR, Teradata, etc.
Lambda Architecture is a useful framework to think about designing big data applications. Nathan Marz designed this generic architecture addressing common requirements for big data based on his experience working on distributed data processing systems at Twitter. Some of the key requirements in building this archi...

One question that springs to mind is why invest in embedded physical SDN vSwitches which sit in-line between each and every appliance and their corresponding access ports? Why not simply apply the appropriate SDN configuration to the access switch on a per-port basis (assuming the access switch is SDN capable – of which there are a growing number)? Given that SDN (if present) is much more likely to be found in the data-centres and core networks than out to the access layer and remote branch offices, ONA would allow SDN policies to be pushed out further than existing SDN deployment would allo...

One of the neat things about microservices is the ability to segment functional actions into scalability domains. Login, browsing, and checkout are separate functional domains that can each be scaled according to demand. While one hopes that checkout is similarly in demand, it is unlikely to be as popular as browsing, after all, and the days of wasting expensive money on idle compute resources went out when the clouds descended.
In that same vein comes the ability to also create performance domains. After all, if you're scaling out a specific functional service domain you can also specify p...

The competition among public cloud providers is red hot, private cloud continues to grab increasing shares of IT budgets, and hybrid cloud strategies are beginning to conquer the enterprise IT world.

Big Data is driving dramatic leaps in resource requirements and capabilities, and now the Internet of Things promises an exponential leap in the size of the Internet and Worldwide Web.

The world of SDX now encompasses Software-Defined Data Centers (SDDCs) as the technology world prepares for the Zettabyte Age.

Add the key topics of WebRTC and DevOps into the mix, and you have three days of pure cloud computing that you simply cannot miss.

Cloud Expo - the world's most established event - offers a vast selection of 130+ technical and strategic Industry Keynotes, General Sessions, Breakout Sessions, and signature Power Panels. The exhibition floor features 100+ exhibitors offering specific solutions and comprehensive strategies. The floor also features two Demo Theaters that give delegates the opportunity to get even closer to the technology they want to see and the people who offer it.

Attend Cloud Expo. Craft your own custom experience. Learn the latest from the world's best technologists. Find the vendors you want and put them to the test.