Re-id One of Two Duplicate Records

Forum for DBF Commander discussing.

Re-id One of Two Duplicate Records

Postby btorrean » Tue Jan 26, 2021 1:06 am

Dear Forum,

I have a dataset in which duplicate IDs were made for some records. There are never more than two records that share the same ID.

I am wondering, is it possible to re-number one of the two duplicate IDs in each duplicate set by finding out what the max ID number is for the entire dataset, and adding one to that number? And can this be done sequentially (i.e. one ID is renumbered, so the next ID that gets renumbered is one greater than the previous re-numbered ID)?

Here's the code I've come up with so far, but it doesn't work as desired. Column name with the duplicates = 'internalno'. There are 73 total columns in this dbf.

Code: Select all
UPDATE "D:\aggregate\PAW.DBF" SET internalno= (SELECT MAX(internalno) FROM "D:\aggregate\PAW.DBF")+1 WHERE (
SELECT internalno , COUNT(internalno )
FROM "D:\aggregate\PAW.DBF"
GROUP BY internalno
HAVING ( COUNT(internalno ) > 1) )


In total, there are 3,171 duplicate records out of 352,794 records. That's why I'm really hoping this can be automated.

If you can help, please let me know. Thanks!

Brian
btorrean
 
Posts: 8
Joined: Sun Jun 23, 2019 7:40 pm

Re: Re-id One of Two Duplicate Records

Postby Admin » Tue Jan 26, 2021 12:13 pm

The task you've described cannot be solved with a single UPDATE query. Here is the SELECT, that adds updated ID to your table (new_id field):

Code: Select all
select iif(nvl(t0.dup_id,0)=0, t0.internalno, (select max(internalno) from paw) + recno()) as new_id, t0.* from (
  select t0.internalno as dup_id, t1.* from (
    select internalno from paw group by internalno having count(internalno)>1
  ) as t0 right join paw as t1 on t0.internalno=t1.internalno   
) as t0 into table paw2


After that, you can update internalno field with the new values:
Code: Select all
update paw2 set internalno=new_id


If you need to remain the structure of your DBF, you can delete these 2 temporary fields new_id and dup_id:
Code: Select all
alter table paw2 drop column new_id

Code: Select all
alter table paw2 drop column dup_id


Please note, the SQL above changes all duplicated IDs, not just one of them. If you need to save one of them unchanged, please send me your sample DBF to support@elphsoft.com, so I could rewrite the query. In this case, the task is more complex, since the SQL will depend on other field(s) of the table.
Best regards,
Oleg Zhechkov
User avatar
Admin
Site Admin
 
Posts: 125
Joined: Wed Apr 21, 2010 7:27 pm

Re: Re-id One of Two Duplicate Records

Postby btorrean » Tue Jan 26, 2021 5:10 pm

Thank you, Oleg! I've just sent you an email with the sample database!

Brian
btorrean
 
Posts: 8
Joined: Sun Jun 23, 2019 7:40 pm

Re: Re-id One of Two Duplicate Records

Postby btorrean » Fri Feb 12, 2021 7:59 am

Oleg,

I finally got the code to do what I wanted it to do! Here are the process steps:

-- 1. Open table paw.dbf in DBF Commander

-- 2. ************** IMPORTANT ***********************
-- To purge deleted records from 'paw', use the following command. Do this before running queries, as not doing so appears to produce duplicate or missing records.
-- PLEASE NOTE: This command cannot be undone!
-- Where 'paw' is the name of the table to be purged.
Code: Select all
pack paw


-- 3. Run the following code on paw.dbf to create paw_test_01.dbf. The code produces an id named 'dup_id', which is the internalno, EXCEPT when the row is a duplicate, and the newest of its duplicate pair by patdate. Then, it is null.
Code: Select all
select t0.internalno as dup_id, t1.* from (
select internalno, min(patdate) as MinDate from paw group by internalno
) as t0 right join paw as t1 on t0.internalno=t1.internalno and t1.patdate = t0.MinDate
into table paw_test_01


-- 4. Run the following code on paw_test_01.dbf to create paw_test_02.dbf. The code creates an id which is the internalno, unless dup_id is null. Then, it creates a new id. At this point, the new internalno's are set in the new_id field. We could just drop the internalno field and rename new_id to internalno, but for the sake of data integrity, we will update the internalno field in step 5.
Code: Select all
select iif(nvl(t0.dup_id,0)=0, (select max(internalno) from paw_test_01) + recno(), t0.internalno) as new_id, t0.* from
paw_test_01 as t0
into table paw_test_02


-- 5. Run the following code on paw_test_02 to set the new internalno.
Code: Select all
update paw_test_02 set internalno = new_id


-- 6. Remove the fields: dup_id and new_id, as they are now unnecessary.

Thanks so much for your help! :D :D :D

Edit: 20210218_2040 - Updated process steps, as previous process steps were found to not exactly perform the required operations. These steps do.

Brian
btorrean
 
Posts: 8
Joined: Sun Jun 23, 2019 7:40 pm


Return to DBF Commander Professional

Who is online

Users browsing this forum: No registered users and 3 guests

cron