Detecting Credit Card Fraud

2023-11-24

Cassandra Clark

Introduction
Reading Data
Identify duplicates
Flagging Compromised Terminals
Bringing it All Together
Conclusion

Introduction

In our increasingly interconnected world fraud has become a pervasive force. Criminals worldwide are using increasingly sophisticated techniques to steal data and money from hapless victims. These techniques cause losses of billions of dollars worldwide. Fortunately, technology is incredibly helpful for identifying these attempts at fraud.

In this article, we will focus on credit card transactions specifically. We will demonstrate how to use Enso to bring in multiple data sources to flag transactions identified by a variety of systems as fraudulent, from stolen credit card details to compromised payment terminals.

Reading Data

To start things off we’re going to read in our dataset of credit card transactions.

Credit Card Transaction dataset

This dataset has a transaction_id which is a unique identifier per transaction, tx_datetime which shows the time the transaction occurred, customer_id which identifies each user, a terminal_id which identifies where the transaction took place, and tx_amount which is the amount of the transaction. There are two more columns, tx_time_seconds and tx_time_days, but we don’t need these for this example.

Cleaning up the data

Enso automatically infers data types for you, even with text files, but in this dataset, there are a few messy values in several fields that don’t quite conform to the correct types, so we can use the parse function to set these appropriately and discard any values that don’t conform.

Querying the database

From here, we bring in our second dataset. We use Enso’s native database support to query from our internal database directly using the same syntax as other operations. Here we can select only the three columns we need.

Database dataset

We’ll join the data where TRANSACTION_ID and CUSTOMER_ID match, and replace any empty TX_AMOUNT values with the Amount field found in the database. We can remove the unnecessary Transaction ID and Amount columns, as we’re done using them.

Identify duplicates

Identifying fraudulent transactions

We can identify duplicate transactions in our dataset by looking for cases where the same TRANSACTION_ID exists more than once. The aggregate method is perfect for this. We mark these duplicates as fraudulent and bring them back into our original dataset.

Now we’re going to add a row number so that we can apply our other fraud cases to the test dataset. While we’re at it, we’ll also select distinct records by CUSTOMER_ID, TRANSACTION_ID, and TX_AMOUNT, and truncate the TX_DATETIME field to remove the time, as we don’t need that level of specificity to create these cases.

Transforming the data

Flagging Compromised Terminals

Next, we’re going to input a list of terminals that have been flagged as being compromised by our system. This dataset includes only the terminal ID and the date it was flagged. The company’s policy is to flag all transactions from a compromised terminal for the next 28 days after the initial report. To do this, we add an end_date field using the set method.

Adding the end date

Let’s break this method down since it’s really powerful. Here we can select a column operation. We get a dropdown of all of the different calculations we can perform, and once we select one, like Date_Add, we get a dropdown listing our columns, so we can easily select what data we want to operate on. We set our period of 28 days, and then we set a new_name of end_date so that Enso creates a new field.

Adding the end date

We use Enso’s powerful join method to isolate all the transactions between two dates, thanks to the handy Between condition. This condition will automatically look for all dates between our start and end dates for each specific terminal. Then we can set our Fraud column to True, and select only the RowNum and Fraud fields as these are all that we need to update our original dataset.

Selecting fraudulent entries

Bringing it All Together

We use the merge method to update our primary dataset. We can look up the RowNum field, and this function will automatically update the value of our Fraud column for any row identified as related to a fraudulent transaction on a compromised terminal.

Post merge dataset

Updating the primary dataset

Finally, we can bring in a list of flagged customer transactions from another dataset. This dataset CustomerFraud.csv has a lot of extra information in it, but all we care about is marking the transactions as fraudulent, and using the TRANSACTION_ID to update our core dataset.

Marking externally flagged transactions

So, we set Fraud to True for these records, and then use the aggregate method to ensure unique transactions are being selected via the Group_By operation, and then we use lookup_and_replace once more, using TRANSACTION_ID to mark these transactions as fraudulent.

Conclusion

We have successfully created a dataset of all of the transactions in our system for this period, with fraudulent transactions correctly identified for downstream analysis, predictive modeling, or reporting use cases.

Final dataset

Fraud is a major threat worldwide, and it can take a variety of shapes, but using Enso we can quickly bring together and analyze data to develop better resiliency against it.