When discussing a web application's performance or request profiling results, it is vital to distinguish between latency and response time.
A simple formula captures the difference between response time and latency:
response time = latency + processing time
Response time is the total time it takes for the web service to respond to a request, including all network latencies; it is the sum of the processing time and the latencies encountered along the way.
Processing time is usually the time the server takes from receiving the last byte of the request to returning the first byte of the response. It does not include the time the request takes to travel from the client to the server, or the time the response takes to travel from the server back to the client.
If we are talking about an API, the server usually does not start processing until it has received and read all the bytes of the request, since it needs to parse the request and understand how to satisfy it. And once the server starts rendering the response (has sent the first byte), the network latency is out of its control.
It is best to communicate this clearly to your team. You may write a web service that parses JSON on the fly and starts executing business logic before the request has fully arrived; in that case, processing time has a slightly different meaning, and you should measure it from the first received byte of the request.
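To make this concrete, here is a minimal TypeScript sketch of measuring processing time in a Node.js server under the stricter definition above, from the last received request byte to the first response byte. The port and handler logic are made up for illustration:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  const chunks: Buffer[] = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("end", () => {
    // The last byte of the request has arrived: processing starts here.
    const start = process.hrtime.bigint();
    const body = Buffer.concat(chunks).toString();

    // ... business logic would run here ...
    const payload = JSON.stringify({ echoedBytes: body.length });

    // Writing the head emits (approximately) the first response byte:
    // processing ends here.
    res.writeHead(200, { "Content-Type": "application/json" });
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`processing time: ${elapsedMs.toFixed(2)} ms`);
    res.end(payload);
  });
});

server.listen(8080);
```

If your service instead processes the request as it streams in, move the `start` measurement into the first `data` event rather than the `end` event.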
Latency is the duration a request spends waiting to be handled. Until the request is handled, it is latent - inactive or dormant. High latency indicates network problems, or a server that cannot keep up with the stream of incoming requests and is probably overloaded.
You might also be interested in round-trip latency: the time a request spends waiting to be handled plus the time it takes the response to travel back to the client. High round-trip latency usually means bad or slow networking.
To rule out networking problems, eliminate the network itself: if possible, host the application locally. This removes latency from the equation and lets you examine the response time on its own.
If the server code is under your control, you can track the processing time; subtracting it from the response time gives you the round-trip latency.
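A minimal client-side sketch of that subtraction, assuming the server reports its processing time in a hypothetical X-Processing-Time response header (the URL is also made up):

```typescript
async function measureRoundTripLatency(url: string): Promise<number> {
  const start = performance.now();
  const response = await fetch(url);
  await response.arrayBuffer(); // wait for the last response byte
  const responseTimeMs = performance.now() - start;

  // Hypothetical header set by a server under our control.
  const processingMs = Number(response.headers.get("X-Processing-Time") ?? 0);

  // response time = latency + processing time, therefore:
  return responseTimeMs - processingMs;
}

measureRoundTripLatency("http://localhost:8080/api/orders").then((latency) =>
  console.log(`round-trip latency: ${latency.toFixed(2)} ms`),
);
```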
Another way to debug latency is to log a timestamp when the client sends the request and another when the server starts processing it. The difference between the two timestamps is the latency.
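A sketch of this timestamp-logging approach, with the send time carried in a hypothetical X-Sent-At header. Note that it assumes the client and server clocks are synchronized (e.g., via NTP), since any clock skew lands directly in the measured value:

```typescript
import { createServer } from "node:http";

// Server side: compare the client's send timestamp against the server
// clock at the moment processing starts.
createServer((req, res) => {
  const sentAt = Number(req.headers["x-sent-at"] ?? Date.now());
  console.log(`latency: ${Date.now() - sentAt} ms`);
  res.end("ok");
}).listen(8080);

// Client side: stamp the request with the send time before dispatching it.
await fetch("http://localhost:8080", {
  headers: { "X-Sent-At": Date.now().toString() },
});
```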
You can use Chrome DevTools to debug the response time. In Chrome DevTools, it is shown as the Waiting time, or Time To First Byte (TTFB). You can also emulate a slow network connection by using network throttling.
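If you want the same number programmatically rather than by eyeballing DevTools, the browser's Resource Timing API exposes it. A small sketch (the URL is hypothetical, and cross-origin resources need a Timing-Allow-Origin header to expose detailed timings):

```typescript
const url = "http://localhost:8080/api/orders"; // hypothetical endpoint
await fetch(url);

const [entry] = performance.getEntriesByName(url) as PerformanceResourceTiming[];
if (entry) {
  // responseStart - requestStart is the wait between sending the request
  // and receiving the first byte of the response, i.e., the TTFB.
  const ttfb = entry.responseStart - entry.requestStart;
  console.log(`TTFB: ${ttfb.toFixed(2)} ms`);
}
```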