Security Considerations in Blue-Green Deployments


tl;dr: Blue-green deployment is a strong strategy for applications with critical uptime requirements, but if a deployment fixes critical security issues, make sure the definition of “deployment complete” is the decommissioning of the “blue” environment, not just the successful deployment of “green”.

Organizations have grown used to following Continuous Integration/Continuous Deployment (CI/CD) for software releases. Cloud offerings such as the AWS Code* utilities, Azure DevOps, or Google Cloud Source Repositories enable enterprises to accomplish CI/CD release cycles quickly and securely. Software upgrades flow through the same CI/CD tooling, and development teams need to choose how those upgrades are rolled out.

There are a few different ways development teams upgrade their applications:

  • Full cut-over: same infrastructure; the new codebase is deployed directly and migrated in one go.
  • Rolling deployment: same or new infrastructure; instances are gradually upgraded until all run the new version.
  • Immutable deployment: brand-new infrastructure and code for each deployment, migrated in one go.
  • Blue-green deployment: a parallel stack deployed alongside the current one in production, gradually phasing out the old instances once tests succeed on a slice of traffic sent to the new ones.

The blue-green strategy has, therefore, become quite popular for modern software deployment.

What is a Blue-Green deployment?
When you release an application via the blue-green deployment strategy, you gradually shift traffic to the new stack as tests succeed and your observability tooling (CloudWatch alarms, etc.) does not indicate any problems. You can run the stacks on containers (ECS/EKS/AKS/GKE) or on AWS Lambda/Azure Functions/Google Cloud Functions, and traffic shifting can be done with DNS solutions (Route 53/Azure DNS/Google Cloud DNS) or load balancers (AWS Elastic Load Balancer/Azure Load Balancer/Cloud Load Balancing). Simplistically: take your current (“blue”) deployment, create a full parallel stack (“green”), and use either DNS or a load balancer to carve out a slice of traffic to test the green stack. This is all happening in production, by the way. Once everything looks good, direct all traffic to green and decommission blue. Because it helps maintain operational resilience, this is a popular deployment strategy. AWS has a solid whitepaper that I recommend reviewing if you want to dive in from a solution-architecture standpoint.
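To make the mechanics concrete, here is a minimal, hypothetical sketch of the shift-and-watch loop in plain Python (not a real Route 53 or load-balancer API; the function and parameter names are invented for illustration):

```python
import random

def shift_traffic(total_steps, healthy):
    """Gradually raise the green stack's traffic weight from 0 to 100.

    `healthy` is a caller-supplied check (alarms, test results); if it
    ever fails, all traffic is returned to blue (weight 0).
    """
    green_weight = 0
    for step in range(1, total_steps + 1):
        green_weight = int(100 * step / total_steps)
        if not healthy(green_weight):
            return 0  # roll back: blue takes 100% again
    return green_weight

def route(green_weight, rng=random.random):
    """Pick a stack for one request based on the current green weight."""
    return "green" if rng() * 100 < green_weight else "blue"
```

For example, `shift_traffic(10, check)` steps green through 10%, 20%, ..., 100%, dropping all traffic back to blue the moment the check fails.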

Security Considerations

Some critical security issues (e.g., remote code execution via Log4j or Struts) demand immediate fixes because of their severity. If your blue-green deployment takes days and your tests run over a long period, then any security fix is only fully in place after a successful green deployment and blue decommission. If, during that window or before it, an attacker managed to get a foothold in the impacted blue environment, then decommissioning blue becomes essential before you can claim the issue is fully remediated. Incident responders and security operations professionals typically breathe a sigh of relief when the fix is deployed; software engineering teams typically consider the fix deployed when green is handling all traffic, which is unrelated to the decommissioning of blue. In this case, incident responders need to remember that the risk is not truly mitigated at deployment time; it is mitigated after the cut-over to green completes and blue is decommissioned. There is a subtle yet important difference here, and it comes down to shared vocabulary: as long as security operations and software development teams share the same definition of what “deployed” means, there are no misunderstandings.
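The shared-vocabulary point can be captured in a few lines. A hedged sketch (the names here are hypothetical, not from any real tooling) of the two competing definitions:

```python
def fix_deployed(green_serving_all_traffic):
    """The engineering definition: green handles 100% of traffic."""
    return green_serving_all_traffic

def fix_remediated(green_serving_all_traffic, blue_decommissioned):
    """The security definition: the vulnerable blue stack must also be
    gone, since a compromised blue environment keeps the risk alive."""
    return green_serving_all_traffic and blue_decommissioned
```

The gap between the two predicates is exactly the window in which a team may believe a critical issue is fixed while the vulnerable environment still exists.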


Security Considerations in use of AI/ML


The world of Artificial Intelligence (AI) seems to be exploding with the release of ChatGPT. But as soon as the chatbot came into the hands of the public, people started finding self-sabotaging queries (exploitable issues) at worst, and some weird interactions whereby people could write malware designed to evade Endpoint Detection and Response (EDR) tools.

What is AI?

Very simply, Artificial Intelligence (per Wikipedia) is intelligence demonstrated by machines. More technically, it is a set of algorithms that can do things a human does by making an inference, similar to humans, on the basis of data historically provided as a “reference” for decisions. This reference data is called training data. The data used to test how effectively the algorithm arrives at decisions from that reference is called test data. Any good machine learning course teaches how to design your data, how much of it to use for training versus testing, and how to measure performance, but that is not relevant to our discussion here. What is important is that the data you provide controls the decision-making of an artificially intelligent algorithm. This is a key difference from typical algorithms, where the code is more or less static and makes decisions based on certain program states; in an artificially intelligent system, the program can arrive at different decisions depending on how one decides to “train” the algorithms.
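As a toy illustration of the training/test split mentioned above (stdlib only, invented names), a common pattern is to shuffle the labeled data and hold a fraction out for testing:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=7):
    """Shuffle labeled rows and hold out a fraction as test data."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]   # (training data, test data)
```

The training portion shapes the model's decisions; the held-out portion measures whether those decisions generalize.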

What is ML?

Machine Learning (ML) is a subset of Artificial Intelligence (AI) in which algorithms evolve their decision-making on the basis of data that has been processed and tagged as training data. ML systems have been used for classifying spam and for anomaly detection in computer security. These systems tend to use statistical inference to establish a baseline and highlight situations where the input data does not fall within the norm. When operational data is used to train an ML-based system, one has to be careful not to incrementally alter the baseline of what is normal and what is not. Such “tilting” can happen over time, and it is important to protect against drift in these systems. Some drift is OK, but “bad drift” is not, and which is which is hard to predict. For example, say some data is classified inaccurately and accidentally (or maliciously) ends up in your training set; if it fundamentally alters the behavior of the ML model, the model becomes unreliable.
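A hedged sketch of the baseline idea, and of how poisoned training data can “tilt” it (hypothetical values, stdlib only):

```python
import statistics

def build_baseline(training_samples):
    """Summarize 'normal' as mean and standard deviation."""
    return statistics.mean(training_samples), statistics.stdev(training_samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev
```

With a baseline built from values around 10, a reading of 20 is flagged; retrain on data an attacker has slowly drifted upward (12, 13, ... 20) and the widened baseline lets the same reading pass as normal.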

What is Adversarial AI?

Adversarial Artificial Intelligence (AI) threats are ones where malicious actors design inputs to make models predict erroneously. There are a couple of different types of attack here: a poisoning attack (where models are trained on bad data controlled by adversaries) and an evasion attack (where the artificial intelligence system is made to draw a bad inference with a security implication). One way to understand these attacks: a poisoning attack is basically “garbage in, garbage out,” but with really “special” garbage, garbage that changes the behavior of the algorithm so that it returns an incorrect result when it has to make a decision. An evasion attack is different in that the decision is wrong because the input appears differently to the ML algorithm than it does to humans, e.g., Gaussian noise being classified as a human, or a fingerprint being matched incorrectly.
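A toy evasion example (a hypothetical keyword-based “model,” nothing like a production classifier): a tiny perturbation that humans read straight through makes the model's inference wrong:

```python
SUSPICIOUS = {"free", "winner", "prize"}  # toy artifact of "training"

def spam_score(text):
    """Fraction of words the toy model considers suspicious."""
    words = text.lower().split()
    return sum(w in SUSPICIOUS for w in words) / max(len(words), 1)

def evade(text):
    """Adversarial perturbation: 'free' -> 'fr33' reads the same to a
    human but no longer matches what the model learned."""
    return " ".join(w.replace("e", "3") for w in text.split())
```

The evaded text still delivers its message to a person, but the classifier's score collapses, which is precisely the human/model perception gap described above.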

Can we attack these systems in other ways?

In a paper, Google researchers presented a tool (TensorFuzz) with which they were able to find several varieties of bugs in Deep Neural Networks (DNNs). So typical software attack techniques work against deep neural networks too. Fuzzing has been used for decades and has been causing faults in code forever. At its core, fuzzing is simple: send garbage input that causes a failure in the program. It is just that failures in a DNN look different, and you want to ensure that software relying on the DNN for decisions handles such failures appropriately, with secure defaults, so they do not become security failures.
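This is not TensorFuzz itself, but the core fuzzing loop it builds on can be sketched in a few lines (the toy parser target and all names here are invented for illustration):

```python
import random

def parse_record(data):
    """Toy target: parses comma-separated integers, crashes on bad input."""
    return [int(field) for field in data.split(",")]

def fuzz(target, trials=200, seed=1):
    """Throw random garbage at `target`; collect inputs that crash it."""
    rng = random.Random(seed)
    alphabet = "0123456789,x"
    crashes = []
    for _ in range(trials):
        candidate = "".join(rng.choice(alphabet)
                            for _ in range(rng.randint(1, 8)))
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)   # a fault the caller must handle
    return crashes
```

Coverage-guided fuzzers like TensorFuzz refine the same loop by mutating inputs that reach new behavior rather than drawing them blindly.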

Protection mechanisms

There are a few simple ways to look at ML systems and their security. Microsoft released an excellent how-to on threat modeling ML systems. Additionally, using adversarial training data is imperative to ensure that artificially intelligent algorithms perform as you expect in the presence of adversarial inputs. When you rely on ML-based systems, it is all the more important to test them appropriately, and to keep testing them against baselines. Unfortunately, transparency of decision-making continues to be an issue for Deep Neural Networks, and AI/ML researchers still need to establish appropriate transparency measures.


What to do when things go wrong?


I blogged earlier about blameless post-mortems and how one gets to the point of being able to do them: by having operational rigor and observability. This is more of a lessons-learned post about what you do, and what you don't, when things go wrong.

Focusing on the Who?

A lot of the time, focusing on “who reported the issue?” is focusing on the wrong thing. Whether a report comes from a penetration test, a bug bounty researcher, or an internal security engineer, you need to make sure the impact and likelihood are clearly understood. Sometimes customers (who pay, or intend to pay, for your service) report problems; those are obviously more important.

Focusing on the How?

How a security issue gets reported is important. You might learn about a security issue via a bug report, via your own telemetry, or on Twitter! There are potential legal ramifications in each of these cases, and the risks may differ. When things become public without your knowledge, where you were not notified and the information is now out there, you have a role to play in instilling confidence in your current customers. The best approach tends to be sticking to facts without speculation. If you are working on an incident, say so. Don't say “we are the most secure” when you are the subject of a breach discussion, especially because you already have data showing you are not as secure. Identification and containment of the security issue are the top priorities; do not pull the people doing that work away to make public relations look good, because doing so will eventually make public relations bad! Involve lawyers in your communications process and mark communications with the right legal tags (“attorney-client privileged material”) so that if litigation happens you can clearly demarcate what can and cannot be part of discovery.

Focusing on the What?

“What” needs to be done has to be made clear with the help of an incident manager: the person who is best informed, is a subject matter expert, and leads the response process. Having this single-threaded ownership of the incident is incredibly important. The role of the incident manager is to ensure they have all the information they need to make decisions. This also streamlines public relations, legal needs, and incident cleanup (eradication and recovery), and it helps with swift, focused decision-making. Depending on impact, this can sometimes be crisis management; otherwise it is just another day in the security operations office. The key traits here are focus and goal-based decision-making. Adrenaline can run high and tempers can flare; that typically happens when you are unprepared to handle security incidents. The tempers and nervousness can be avoided by proactively running tabletop exercises and incident dry-runs and by having good runbooks. But all practice games do is prepare you for the real thing, and the real thing is how you handle a true incident. Use key stakeholders to arrive at the best decisions. There are often situations where no answer looks good, and that is where customer focus comes in: if you focus on the well-being of customers, you will rarely go wrong.

Focusing on the Why?

Capture incident response logs in tickets and communications so the timeline and actions are properly documented. After recovery is complete, do a blameless post-mortem of how you got there. Ensure you commit to the agreed-upon corrective actions on an agreed timeline and don't waver; this is part of the operational rigor one needs in order to keep incidents from happening in the future. Typically, issues happen because something was not prioritized as it should have been. Reprioritize so you can reassess. Sometimes the size of the incident makes your reprioritization almost coerced; it is OK to be coerced in that direction. You will find that the coercion is simply an acceleration of actions you should have taken up earlier. No one is perfect; just come out of it better!

Focusing on the Where?

Where you discuss the issue is important. When sizable incidents happen, discuss them openly with business leaders so that full awareness and feedback exist in “powerful forums.” This obviously does not mean breaking attorney-client privilege; it means briefing the highest levels of leadership on action items, impact, and post-mortem results. This enables the business to become resilient over time and to develop confidence in the security teams. If you need to make public releases, ensure that lawyers, security SMEs, and business leaders all read them first; only then publish. Never let a “left hand doesn't know what the right hand is doing” situation occur. This instills customer confidence in your process.


This was just an attempt to pen down my thoughts as they appeared in my brain. I am sure I forgot a lot, such as the stress of handling incidents, avoiding knee-jerk reactions, etc., but these are the most important things I felt were necessary to share. Remember, incident handling gets better with practice: you want that practice to happen in practice games, not in the Olympics! 🙂


The historical evolution of Cross-Site Request Forgery


Having been in application security for more than two decades, and now officially completing my 18th year of being meaningfully employed in that space, I have gathered a lot of crud in my brain. Most of it is the history of how things came to be. That stuff is likely not interesting to most, but I find it intriguing how some seemingly minor decisions by one software vendor can have a massive impact on the web application security industry.

Oh the dreaded IE…
Internet Explorer 4, 5, and 6, going back to the Windows XP days (or even earlier, I can't recall), had a behavior where the cookie jar was not shared: if you opened a new window to a site, you would have to log in again unless you used “Ctrl+N” to open the new window from an existing session. Each new process had its own cookie jar. For the uninitiated, the “cookie jar” is the browser's internal storage of cookies. Cookies are random-looking strings that act as “trust tokens” a web server places in the browser. Since HTTP is a connectionless protocol, the cookie is what preserves “state,” and it is exactly what authorization decisions in an HTTP context are typically based on. Cookies are stored in the cookie jar with a name, value, domain, and path (today there are a few other attributes, but that wasn't the case back in 2004-2005). The browser gets all these parameters from the Set-Cookie HTTP response header. Microsoft, the vendor of Internet Explorer, decided that each new IE window should have its own set of stored cookies, not shared with other windows. Mozilla Firefox and Google Chrome always had a shared cookie jar, if I recall correctly.

Along came a Cross-Site Request Forgery (CSRF)…

Jesse Burns from iSEC Partners (a security consultancy later acquired by NCC Group) wrote what I think was the seminal paper on Cross-Site Request Forgery. They called it “XSRF” back then because “XSS” was already in parlance. Thereafter, there were presentations about it by Microsoft in 2006. The attack was simple. The victim has a browser tab open in which they are logged into a site that has issued their session a cookie. Due to the browser same-origin policy (a concept Netscape designed in 1995), that cookie would be resent by the browser as a Cookie HTTP request header whenever a request was sent to the same domain, protocol (“scheme”), and port. There were a few idiosyncrasies of IE (which made it infamous back then), such as not complaining when the port number did not match: IE would consider the access allowed per the Same-Origin Policy (SOP). What does that mean? http://example.com and http://example.com:81 would be treated as the same origin! Weird, right? It wasn't the case with other browsers. This was also documented in Michal Zalewski's book The Tangled Web in 2011. Where am I going with this? While IE did some weird things, it did one good thing: isolate cookie jars. If you opened a new window in which an attacker's payload sent a request to the site that had handed you a cookie, the new IE window would have no interesting cookies to share with that site, inadvertently protecting the user from being a victim of CSRF. Yes, the hated IE protected users from CSRF! Who would have thought? That's how weird 2005 was 🙂

Fast forward…

All browser vendors today have shared cookie jars, because who doesn't like opening new tabs to their favorite cloud consoles without having to re-log in? So what did we do? We came up with another attribute for the Set-Cookie HTTP response header: the SameSite attribute, which restricts the cookie from being sent unless the request originates from a page on the same site as the cookie's issuer.
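In Python's standard library (3.8+), for instance, the attribute can be set like this; the attribute names mirror the Set-Cookie attributes discussed above:

```python
from http.cookies import SimpleCookie

def session_cookie(value):
    """Build a Set-Cookie header value with CSRF-resistant attributes."""
    cookie = SimpleCookie()
    cookie["session"] = value
    cookie["session"]["samesite"] = "Lax"  # withhold cookie on cross-site requests
    cookie["session"]["httponly"] = True   # keep it away from page scripts
    cookie["session"]["secure"] = True     # send over HTTPS only
    return cookie["session"].OutputString()
```

With SameSite=Lax, the browser refuses to attach the cookie to cross-site subresource and form POST requests, which is exactly the behavior IE's isolated cookie jars stumbled into by accident.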

So there you have it… the history of SameSite and how one of the most hated browsers of the day (IE) did one good thing for users – protect them from CSRF! 🙂


A brief history of SSRF


Server-Side Request Forgery (SSRF) is a security issue in which an attacker is able to get a server to send some type of request (these days, mostly HTTP(S) requests) that the server should never send on the attacker's behalf. The issue is a classic abuse of trust: the server tends to sit in a “trusted” environment (e.g., a DMZ, your cloud VPC, etc.) while the users of the application sit outside the trust boundary (e.g., mobile devices, cafes, homes, corporate environments, API clients inside and outside the cloud).
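The standard defense follows directly from that trust-boundary framing: validate the destination against an allowlist before the server fetches anything. A hedged sketch (the allowlist host is hypothetical):

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.partner.example"}  # hypothetical outbound allowlist

def is_safe_url(url):
    """Permit only http(s) URLs pointing at explicitly allowlisted hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False            # blocks file://, gopher://, etc.
    return parsed.hostname in ALLOWED_HOSTS
```

A real deployment also has to worry about redirects and DNS rebinding after this check, but hostname allowlisting is the first gate.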

A brief history
In this blog post, though, I won't be talking about all the fancy new things that have had SSRF issues; you can likely find a few hundred of those anyway! I am going to talk about the brief history of this issue and what happened before we gave it a “name” (SSRF). The earliest references to the name appear to come from a talk at Black Hat USA 2012, and the Wayback Machine tells me the CWE-918 page was authored sometime around 2013. If you look closely at the CWE-918 page, though, you will find old CVEs dating back to 2002 and 2004. There was a ShmooCon talk about it in 2008 too, but the term SSRF was not established until 2012.

The Issue

In 2010, I was working on a penetration test for a financial services firm, of a popular load balancer that offered a GTM (Global Traffic Manager) solution. The product let people log in and obtain restricted execution environments from which only certain applications could be exposed, useful for remote workers or untrusted entities that you want to give access to only specific applications. The issue was in a POST request after login, and IIRC even pre-login (though my details on this are fuzzy). To my knowledge the issue was never assigned a CVE and wasn't associated with a knowledge base article, though I may not know 100%. The guidance from the vendor was simple: update the software and move on. The issue was this: an authenticated (or unauthenticated) user could send an HTTP POST request to a page with a base64-encoded parameter containing a hostname, which would trigger a DNS request on the back end of the GTM appliance. The time it took for the response to come back would indicate whether the domain was legitimate or not. So I used a popular dictionary to enumerate hostnames, separating the ones that were legitimate on the internal network from the ones that were not, effectively mapping hosts on the internal network from out on the Internet.
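The enumeration technique is just a timing side channel. A hypothetical sketch (the `lookup` callable stands in for the POST request to the appliance; all names are invented):

```python
import time

def enumerate_hosts(lookup, candidates, threshold=0.05):
    """Infer which candidate hostnames the backend resolver knows,
    using only how long each opaque lookup takes."""
    found = []
    for host in candidates:
        start = time.monotonic()
        lookup(host)                    # we never see the response body
        elapsed = time.monotonic() - start
        if elapsed < threshold:         # quick answer => name resolved
            found.append(host)
    return found
```

Constant-time responses, which the vendor eventually implemented, close exactly this channel: when known and unknown names take the same time, the attacker learns nothing.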

The backstory
What the vendor of the GTM software did not know was how critical this application was to the customer's business. They seemed to be dragging their feet without updates, and meanwhile the customer, a financial institution with a lot at stake, could not go live. Pressure mounted on the IT staff to fix the issue, and the vendor, while responsive, was unable to give a firm date quickly. Remember, this was 13-14 years ago, prior to bug bounties, and responsible disclosure was still quite clunky! The customer was also advising me to push the software vendor so we could discuss. Thankfully, on the vendor side there was a solid security person who immediately understood the issue and its impact, and who advised the software teams to do what was right. They moved the processing to post-authentication, and they also added tokens, rate limits, and constant-time responses to fix the issue.

Fast forward
Today, obviously, things are a lot better. I wrote this blog post so that the older me can look back and point to it in a meaningful way, without forgetting the old experiences among the new.


Filing Tax Assessment Appeal in Jersey City


In this post, I will cover a how-to for filing a resident's tax appeal. It's quite simple. This is not meant to cover all special situations, but it should cover simple ones, for example if you live in a condo in Jersey City. For other situations, review the handbook listed below.
Most importantly, this appeal needs to be in the county's hands by Dec 1, 2022, otherwise it will be rejected. The tax officer said it is therefore really important to visit the office and hand it over in person. You can also send it via certified mail.

Important Links

  1. Where to get the appeal form for mid-year added/omitted assessment https://www.state.nj.us/treasury/taxation/pdf/other_forms/lpt/adomap.pdf
  2. N/A for mid-year: But if you are filing during the usual time January or Apr for annual tax changes use https://www.hcnj.us/wp-content/uploads/2021/12/a-1-petition-of-appeal.pdf
  3. Comparables are obtained from: https://www.zillow.com/b/20-2nd-st-jersey-city-nj-5XkRmF/
  4. Appeals handbook: https://secure.njappealonline.com/prodappeals/help/Hudson_InstructionsHandbook.pdf
  5. If you are filing an online appeal you can do so at http://www.njappealonline.com/prodappealonline/Home.aspx however, this site only works at certain times of the year. For example, in Nov 2022 the site is not accepting Hudson County appeals for some reason.

How to fill the form:

  • This is the link for the ratio for the Jersey City municipality (max value is 100%), i.e., the ratio of assessed value to sale price. The minimum Common Level Ratio in Jersey City for 2022 is 0.7426 and the max is 1.0. If your unit's value, as implied by comparable sales, falls within the range defined by the maximum and minimum, you do not qualify for an appeal.
    This is where you get the Common Level Ratio values: https://www.state.nj.us/treasury/taxation/pdf/lpt/chap123/2022ch123.pdf . E.g., say your unit was assessed at $1mn and comparable sale prices show the unit's true value is $950,000. That value is not within the range $1mn/1.0 to $1mn/0.7426, which means you qualify for an appeal. Your taxable value would then be $950,000 * 0.8737 (the average Common Level Ratio) = $830,015. At a 2.118% tax rate, that comes to $17,580.
  • This is what the fields look like:
    Block / Lot / Qualifier – you get these from your tax bill, and they can also be obtained from https://tax1.co.monmouth.nj.us/cgi-bin/prc6.cgi?district=0906&ms_user=monm by searching the site by address
  • Next, go to Zillow (https://www.zillow.com/b/20-2nd-st-jersey-city-nj-5XkRmF/) and find the comparable sales for your unit for the pretax year (if you are appealing a 2022 assessment, use 2021 sales). Then go to https://tax1.co.monmouth.nj.us/cgi-bin/prc6.cgi?district=0906&ms_user=monm, find the sale dates, and add that information to the form.
  • The prorated fields in the form can be left out because the county knows those values (so I did not fill those out, the county clerk did that for me)
  • Sign and date the form
  • You need to send one copy each to the following addresses via post or in-person (if the online system does not work).
    Hudson County Board of Taxation, Hudson County Plaza, 257 Cornelison Ave Room 303, Jersey City NJ 07302. You also need to send one copy to the city: Office of the City Assessor, 364 Martin Luther King Drive, Jersey City NJ 07305. Phone: 201-547-5131.
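The qualification test and the resulting tax from the Common Level Ratio example above can be checked with a few lines (2022 Jersey City numbers; the function name is made up and the figures are illustrative, not tax advice):

```python
def appeal_outcome(assessed, market_value,
                   ratio_min=0.7426, ratio_max=1.0,
                   ratio_avg=0.8737, tax_rate=0.02118):
    """Return (qualifies, new taxable value, resulting annual tax)."""
    # You qualify only if the market value falls outside the window
    # implied by the Common Level Ratio limits.
    low, high = assessed / ratio_max, assessed / ratio_min
    qualifies = not (low <= market_value <= high)
    taxable = market_value * ratio_avg if qualifies else assessed
    return qualifies, round(taxable), round(taxable * tax_rate)
```

For the $1mn assessment with $950,000 in comparable sales, this reproduces the $830,015 taxable value and roughly $17,580 in tax from the bullet above.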

Update 12/29/2022:

I did go to the Hudson County court and appealed my decision in person. The city's representatives were quite polite and the process was quite smooth: you just show up in court and either accept or reject the city's proposal. Once the judgment is reached, they mail it to you, and you can appeal it for 45 days. After that, the decision is binding for 2 years.


PCI SSC Forbids SSL and “Early TLS”


On April 15, 2015, the PCI SSC released PCI DSS v3.1. The main cause for concern for most merchants and other entities (called “entities” hereafter) that store, process, or transmit cardholder data is the prohibition of SSL and “early TLS.” The PCI SSC also released a supplement to assist entities in mitigating the issue. The supplement references the NIST guideline SP 800-52 rev1 for determining which ciphers are good and which are not.

The key question is: what does “early TLS” mean? Does it mean TLSv1.0 and TLSv1.1, or only TLSv1.0? Are entities supposed to disable all ciphers except those in TLSv1.2?

Answer is (in consultant speak) “it depends”. 🙂

TLSv1.1 does theoretically have ciphers that are not ideal, for example its CBC-mode ciphers. There is potential for attacks on them, given that CBC has fallen multiple times in the past couple of years (BEAST, POODLE).

Google Chrome labels the use of CBC-based ciphers obsolete, even when they are negotiated under TLSv1.1. Chrome essentially makes “obsolete cryptography” a function of whether TLSv1.2-based ciphers are in use.


Firefox allows TLSv1.0 to be disabled via configuration: type “about:config” in the address bar and adjust security.tls.version.min, where 0 means SSLv3, 1 means TLSv1.0, 2 means TLSv1.1, and 3 means TLSv1.2. For example, leaving the minimum at 1 allows TLSv1.0 as the lowest version.


Let’s start with what is definitely acceptable for PCI from a protocol-version standpoint: the TLSv1.2 suites. (Note, though, that the NULL and anonymous-DH suites in the list below provide no encryption and no authentication respectively, and should be excluded regardless of protocol version.)


 TLS_RSA_WITH_NULL_SHA256                  NULL-SHA256
 TLS_RSA_WITH_AES_128_CBC_SHA256           AES128-SHA256
 TLS_RSA_WITH_AES_256_CBC_SHA256           AES256-SHA256
 TLS_RSA_WITH_AES_128_GCM_SHA256           AES128-GCM-SHA256
 TLS_RSA_WITH_AES_256_GCM_SHA384           AES256-GCM-SHA384
 TLS_DH_anon_WITH_AES_128_CBC_SHA256       ADH-AES128-SHA256
 TLS_DH_anon_WITH_AES_256_CBC_SHA256       ADH-AES256-SHA256
 TLS_DH_anon_WITH_AES_128_GCM_SHA256       ADH-AES128-GCM-SHA256
 TLS_DH_anon_WITH_AES_256_GCM_SHA384       ADH-AES256-GCM-SHA384

Now let’s see what may potentially be acceptable from a TLSv1.1 perspective (from NIST SP 800-52 rev1):

TLS_RSA_WITH_AES_128_CBC_SHA            AES128-SHA

There is a problem, though, per the OpenSSL man page: if you are using OpenSSL, how do you ensure that the browser does not negotiate the vulnerable TLSv1.0 ciphers? The only real answer seems to be providing a server-preferred cipher order for negotiation and hoping the client does not cheat. Most likely, the browser will negotiate a better cipher when one exists on both the server and the client, and you avert the possibility of a bad cipher being negotiated.

According to experts, anything that uses CBC is inherently broken. But disabling TLSv1.0 may make the server inaccessible to various older Android devices. Also, if you are using older Java Development Kits (JDK 7 and below), remember that the default ciphers may not hit the spot for PCI.
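On the configuration side, modern Python makes the version floor explicit. This API postdates the original post, so treat it as the present-day equivalent of the cipher-string wrangling described above:

```python
import ssl

def pci_context():
    """TLS context that refuses SSLv3, TLSv1.0, and TLSv1.1 outright."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # forbid SSL and "early TLS"
    return ctx
```

Pinning a minimum protocol version is more robust than ordering cipher suites and hoping the peer cooperates.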

There is an excellent site by Ivan Ristic (SSL Labs) to test the SSL/TLS configuration of your Internet-facing servers and help you configure each type of server so you can become PCI compliant.

In conclusion, configure browsers to allow TLSv1.1 at a minimum, and configure servers to use TLSv1.2, to be PCI DSS compliant. The road to TLSv1.1 compatibility and PCI DSS is filled with potholes and pitfalls, so proceed at your own risk.